Value iteration is a popular algorithm for finding ε-optimal policies for POMDPs. It typically performs a large number of DP updates before convergence, and DP updates are notoriously expensive. In this paper, we have developed a technique called point-based DP update for reducing the number of standard DP updates. The technique is conceptually simple and clean, and it can easily be incorporated into most existing POMDP value iteration algorithms. Empirical studies have shown that point-based DP updates can drastically cut down the number of standard DP updates and hence significantly speed up value iteration. Moreover, point-based DP update compares favorably with the more complex variations of it that we have considered. It also compares favorably with policy iteration.
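A point-based DP update improves the value function at individual belief points rather than over the entire belief simplex. The following is a minimal sketch of one such backup for a small discrete POMDP; the array layout, function name, and the tiny example below are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma):
    """One point-based DP backup at belief point b (illustrative sketch).

    b      : (S,) belief over states
    alphas : list of (S,) alpha-vectors representing the current value function
    T      : (A, S, S) transition probabilities T[a, s, s']
    O      : (A, S, Z) observation probabilities O[a, s', z]
    R      : (A, S) immediate rewards R[a, s]
    gamma  : discount factor
    Returns the backed-up alpha-vector that maximizes b . alpha, i.e. an
    improved value estimate at b only (not over the whole belief space).
    """
    A, S, _ = T.shape
    Z = O.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(A):
        vec = R[a].astype(float).copy()
        for z in range(Z):
            # For each observation, keep the alpha-vector that is best
            # at b for the successor belief induced by (a, z).
            gaz = [gamma * T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
            vec += max(gaz, key=lambda g: b @ g)
        val = b @ vec
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec
```

Because the maximization is carried out only at the given belief point, each backup produces a single vector and is much cheaper than a standard DP update, which must account for all beliefs at once.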
The algorithm presented in this paper still requires standard DP updates, which limits its ability to solve large POMDPs. One future direction is to investigate the properties of point-based value iteration as an approximation algorithm in its own right. Another is to design efficient algorithms for standard DP updates in special models. We are currently exploring the latter direction.
This research was supported by Hong Kong Research Grants Council Grant HKUST6125/98E. The authors thank Tony Cassandra and Eric Hansen for sharing their programs with us. We are also grateful to the three anonymous reviewers who provided insightful comments and suggestions on an earlier version of this paper.