next up previous
Next: Variations of Point-Based DP Up: Speeding Up the Previous: Computing the Bellman Residual

Empirical Results and Discussions

Experiments have been conducted to empirically determine the effectiveness of point-based DP update in speeding up value iteration. Eight problems are used in the experiments. In the literature, the problems are commonly referred to as 4x3CO, Cheese, 4x4, Part Painting, Tiger, Shuttle, Network, and Aircraft ID. We obtained the problem files from Tony Cassandra. Information about their sizes is summarized in the following table.

Problem tex2html_wrap_inline1136 tex2html_wrap_inline1710 tex2html_wrap_inline1712 Problem tex2html_wrap_inline1136 tex2html_wrap_inline1710 tex2html_wrap_inline1712
4x3CO 11 4 11 Cheese 11 4 7
4x4 16 2 4 Painting 4 4 2
Tiger 2 2 3 Shuttle 8 2 3
Network 7 2 4 Aircraft ID 12 5 6

The effectiveness of point-based DP update is determined by comparing the standard value iteration algorithm VI and the modified value iteration algorithm VI1. The implementation of standard value iteration used in our experiments is borrowed from Hansen. Modified value iteration is implemented on top of Hansen's code.gif The discount factor is set at 0.95 and round-off precision is set at tex2html_wrap_inline1722 . All experiments are conducted on an UltraSparc II machine.

Table 1 shows the amounts of time VI and VI1 took to compute 0.01-optimal policies for the test problems. We see that VI1 is consistently more efficient than VI, especially on the larger problems. It is about 1.3, 2.8, 5, 62, 141, 173, and 49 times faster than VI on the first seven problems respectively. For the Aircraft ID problem, VI1 was able to compute a 0.01-optimal policy in less than 8 hours, while VI was only able to produce a 33-optimal policy after 40 hours.

 

4x3CO Cheese 4x4 Paint Tiger Shuttle Network Aircraft
VI 3.2 13.9 27.15 37.84 79.14 5,199 12,478 -
VI1 2.4 5.0 5.30 .61 .56 30 253 27,676
Table 1: Time for Computing 0.01-Optimal Policies in Seconds.

Various other statistics are given in Table 2 to highlight computational properties of VI1 and to explain its superior performance. The numbers of standard DP updates carried out by VI and VI1 are shown at rows 1 and 3. We see that VI1 performed no more than 5 standard updates on the test problems, while VI performed more than 125. This indicates that point-based update is very effective in cutting down the number of standard updates required to reach convergence. As a consequence, VI1 spent much less time than VI in standard updates (row 2 and 4).gif

 

Problem 4x3CO Cheese 4x4 Paint Tiger Shuttle Network
DPU # 125 129 130 127 163 174 214
1.5ex[0cm][0cm]VI Time 2.00 7.63 17.83 33.39 70.44 3,198 8,738
DPU # 4 4 3 3 3 5 5
Time .05 .09 .15 .21 .09 13 82
1.5ex[0cm][0cm]VI1 PBDPU # 377 219 173 244 515 455 670
Time 2.32 4.86 5.09 .37 .45 10 139
Quality Ratio .33 .58 .74 .51 .31 0.31 .32
Complexity Ratio .38 .37 .21 .0057 .002 .0012 .005
Table 2: Detailed Statistics.

Row 5 shows the numbers of point-based updates carried out by VI1. We see that those numbers are actually larger than the numbers of standard updates performed by VI. This is expected. To see why, recall that point-based update is an approximation of standard update. Let tex2html_wrap_inline1142 be a set of vectors that is uniformly improvable. Use tex2html_wrap_inline1726 to denote the sets of vectors resulted from performing point-based update on tex2html_wrap_inline1142 . For any belief state b, we have tex2html_wrap_inline1732 . This means that point-based update improves tex2html_wrap_inline1142 but not as much as standard update. Consequently, the use of point-based update increases the total number of iterations, i.e the number of standard updates plus the number of point-based updates.

Intuitively, the better point-based update is as an approximation of standard update, the less the difference between the total number iterations that VI1 and VI need take. So, the ratio between those two numbers in a problem can be used, to certain extent, as a measurement of the quality of point-based update in that problem. We shall refer to it as the quality ratio of point-based update. Row 7 shows the quality ratios in the seven test problems. We see that the quality of point-based update is fairly good and stable across all the problems.

Row 8 shows, for each test problem, the ratio between the average time of a standard update performed by VI and that of a point-based update performed by VI1. Those ratios measure, to certain extent, the complexity of point-based update relative to standard update and hence will be referred to as the complexity ratios of point-based update. We see that, as predicted by the analysis in Section 5.3, point-based update is consistently less expensive than standard update. The differences are more than 200 times in the last four problems.

In summary, the statistics suggest that the quality of point-based update relative to standard update is fairly good and stable and its complexity is much lower. Together with the fact that point-based update can drastically reduces the number of standard updates, those explain the superior performance of VI1.

To close this section, let us note that while VI finds policies with qualitygif very close to the predetermined criterion, VI1 usually finds much better ones (Table 3). This is because VI checks policy quality after each (standard) update, while VI1 does not do this after point-based updates.

 

Problem 4x3CO Cheese 4x4 Paint Tiger Shuttle Network
VI .0095 .0099 .0099 .01 .0098 .0097 .0098
VI1 .0008 .0008 .0009 .0007 .0007 .00015 .001
Table 3: Quality of Policies Found by VI and VI1.


next up previous
Next: Variations of Point-Based DP Up: Speeding Up the Previous: Computing the Bellman Residual

Dr. Lian Wen Zhang
Thu Feb 15 14:47:09 HKT 2001