Approximate POMDP Planning: Overcoming the Curse of History

Abstract

The POMDP framework is a generalized model for planning under uncertainty. However many still believe that POMDPs cannot scale to real-world-sized problems. There are two distinct but interdependent reasons for the limited scalability of POMDP algorithms. The first is the so-called curse of dimensionality, which was discussed in last week's ML lunch. The second is what we will call the curse of history: the number of distinct action-observation histories considered for planning grows exponentially with the planning horizon.
In this talk I will present the Point-Based Value Iteration algorithm for POMDP planning, which specifically addresses the curse of history. It approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by maintaining only one value hyper-plane per point, it is able to successfully solve large problems, including the robotic Tag domain, a POMDP version of the population game of lasertag.

Back to the Main Page

Charles Rosenberg

Last modified: Wed Mar 5 11:59:58 EST 2003