Experiments have been conducted to empirically determine the effectiveness of point-based DP update in speeding up value iteration. Eight problems are used in the experiments. In the literature, the problems are commonly referred to as 4x3CO, Cheese, 4x4, Part Painting, Tiger, Shuttle, Network, and Aircraft ID. We obtained the problem files from Tony Cassandra. Information about their sizes is summarized in the following table.
Problem Problem 4x3CO 11 4 11
Cheese 11 4 7
4x4 16 2 4
Painting 4 4 2
Tiger 2 2 3
Shuttle 8 2 3
Network 7 2 4
Aircraft ID 12 5 6
The effectiveness of point-based DP update is determined by comparing the standard value iteration algorithm VI and the modified value iteration algorithm VI1. The implementation of standard value iteration used in our experiments is borrowed from Hansen. Modified value iteration is implemented on top of Hansen's code. The discount factor is set at 0.95 and round-off precision is set at . All experiments are conducted on an UltraSparc II machine.
Table 1 shows the amounts of time VI and VI1 took to compute 0.01-optimal policies for the test problems. We see that VI1 is consistently more efficient than VI, especially on the larger problems. It is about 1.3, 2.8, 5, 62, 141, 173, and 49 times faster than VI on the first seven problems respectively. For the Aircraft ID problem, VI1 was able to compute a 0.01-optimal policy in less than 8 hours, while VI was only able to produce a 33-optimal policy after 40 hours.
Various other statistics are given in Table 2 to highlight computational properties of VI1 and to explain its superior performance. The numbers of standard DP updates carried out by VI and VI1 are shown at rows 1 and 3. We see that VI1 performed no more than 5 standard updates on the test problems, while VI performed more than 125. This indicates that point-based update is very effective in cutting down the number of standard updates required to reach convergence. As a consequence, VI1 spent much less time than VI in standard updates (row 2 and 4).
Row 5 shows the numbers of point-based updates carried out by VI1. We see that those numbers are actually larger than the numbers of standard updates performed by VI. This is expected. To see why, recall that point-based update is an approximation of standard update. Let be a set of vectors that is uniformly improvable. Use to denote the sets of vectors resulted from performing point-based update on . For any belief state b, we have . This means that point-based update improves but not as much as standard update. Consequently, the use of point-based update increases the total number of iterations, i.e the number of standard updates plus the number of point-based updates.
Intuitively, the better point-based update is as an approximation of standard update, the less the difference between the total number iterations that VI1 and VI need take. So, the ratio between those two numbers in a problem can be used, to certain extent, as a measurement of the quality of point-based update in that problem. We shall refer to it as the quality ratio of point-based update. Row 7 shows the quality ratios in the seven test problems. We see that the quality of point-based update is fairly good and stable across all the problems.
Row 8 shows, for each test problem, the ratio between the average time of a standard update performed by VI and that of a point-based update performed by VI1. Those ratios measure, to certain extent, the complexity of point-based update relative to standard update and hence will be referred to as the complexity ratios of point-based update. We see that, as predicted by the analysis in Section 5.3, point-based update is consistently less expensive than standard update. The differences are more than 200 times in the last four problems.
In summary, the statistics suggest that the quality of point-based update relative to standard update is fairly good and stable and its complexity is much lower. Together with the fact that point-based update can drastically reduces the number of standard updates, those explain the superior performance of VI1.
To close this section, let us note that while VI finds policies with quality very close to the predetermined criterion, VI1 usually finds much better ones (Table 3). This is because VI checks policy quality after each (standard) update, while VI1 does not do this after point-based updates.