Experiments have been conducted to empirically determine the effectiveness of point-based DP update in speeding up value iteration. Eight problems are used in the experiments. In the literature, the problems are commonly referred to as 4x3CO, Cheese, 4x4, Part Painting, Tiger, Shuttle, Network, and Aircraft ID. We obtained the problem files from Tony Cassandra. Information about their sizes is summarized in the following table.

Problem
Problem 4x3CO 11 4 11
Cheese 11 4 7
4x4 16 2 4
Painting 4 4 2
Tiger 2 2 3
Shuttle 8 2 3
Network 7 2 4
Aircraft ID 12 5 6

The effectiveness of point-based DP update is determined by comparing the standard value iteration algorithm VI and the modified value iteration algorithm VI1. The implementation of standard value iteration used in our experiments is borrowed from Hansen. Modified value iteration is implemented on top of Hansen's code. The discount factor is set at 0.95 and round-off precision is set at . All experiments are conducted on an UltraSparc II machine.

Table 1 shows the amounts of time VI and VI1 took to compute 0.01-optimal policies for the test problems. We see that VI1 is consistently more efficient than VI, especially on the larger problems. It is about 1.3, 2.8, 5, 62, 141, 173, and 49 times faster than VI on the first seven problems respectively. For the Aircraft ID problem, VI1 was able to compute a 0.01-optimal policy in less than 8 hours, while VI was only able to produce a 33-optimal policy after 40 hours.

4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network | Aircraft | |

VI | 3.2 | 13.9 | 27.15 | 37.84 | 79.14 | 5,199 | 12,478 | - |

VI1 | 2.4 | 5.0 | 5.30 | .61 | .56 | 30 | 253 | 27,676 |

Various other statistics are given in Table 2 to highlight computational properties of VI1 and to explain its superior performance. The numbers of standard DP updates carried out by VI and VI1 are shown at rows 1 and 3. We see that VI1 performed no more than 5 standard updates on the test problems, while VI performed more than 125. This indicates that point-based update is very effective in cutting down the number of standard updates required to reach convergence. As a consequence, VI1 spent much less time than VI in standard updates (row 2 and 4).

Problem | 4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network | |

DPU # | 125 | 129 | 130 | 127 | 163 | 174 | 214 | |

1.5ex[0cm][0cm]VI | Time | 2.00 | 7.63 | 17.83 | 33.39 | 70.44 | 3,198 | 8,738 |

DPU # | 4 | 4 | 3 | 3 | 3 | 5 | 5 | |

Time | .05 | .09 | .15 | .21 | .09 | 13 | 82 | |

1.5ex[0cm][0cm]VI1 | PBDPU # | 377 | 219 | 173 | 244 | 515 | 455 | 670 |

Time | 2.32 | 4.86 | 5.09 | .37 | .45 | 10 | 139 | |

Quality Ratio | .33 | .58 | .74 | .51 | .31 | 0.31 | .32 | |

Complexity Ratio | .38 | .37 | .21 | .0057 | .002 | .0012 | .005 |

Row 5 shows the numbers of point-based updates carried out by
VI1. We see that those numbers are actually larger than the numbers of
standard updates performed by VI. This is expected.
To see why, recall that point-based update is an approximation of
standard update. Let be a set of vectors that is
uniformly improvable.
Use to
denote the sets of vectors resulted from performing
point-based update on .
For any belief state *b*,
we have .
This means that point-based update improves but not as
much as standard update. Consequently, the use of
point-based update increases the *total number of
iterations*, i.e the number of standard updates plus the number of
point-based updates.

Intuitively, the better point-based update is as an
approximation of standard update, the less
the difference between the total number iterations
that VI1 and VI need take.
So, the ratio between those two numbers in a problem
can be used, to certain extent, as a measurement of the quality of
point-based update in that problem. We shall refer to it
as the *quality ratio* of point-based update.
Row 7 shows the quality ratios in the seven test
problems.
We see that the quality of
point-based update is fairly good and stable across
all the problems.

Row 8 shows, for each test problem, the ratio
between the average time of a standard update performed by VI
and that of a point-based update performed by VI1.
Those ratios measure, to certain extent, the complexity
of point-based update relative to standard update and
hence will be referred to as the *complexity ratios* of
point-based update.
We see that, as predicted by the analysis in Section 5.3,
point-based update is consistently less
expensive than standard update. The differences are more than
200 times in the last four problems.

In summary, the statistics suggest that the quality of point-based update relative to standard update is fairly good and stable and its complexity is much lower. Together with the fact that point-based update can drastically reduces the number of standard updates, those explain the superior performance of VI1.

To close this section, let us note that while VI finds policies with quality very close to the predetermined criterion, VI1 usually finds much better ones (Table 3). This is because VI checks policy quality after each (standard) update, while VI1 does not do this after point-based updates.

Problem | 4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network |

VI | .0095 | .0099 | .0099 | .01 | .0098 | .0097 | .0098 |

VI1 | .0008 | .0008 | .0009 | .0007 | .0007 | .00015 | .001 |

Thu Feb 15 14:47:09 HKT 2001