Partially observable Markov decision processes (POMDPs) model sequential decision-making
problems in which the effects of actions are nondeterministic and
the state of the world is not known with certainty.
They have attracted many researchers in Operations
Research and Artificial Intelligence because of
their potential applications in
a wide range of areas (Monahan 1982, Cassandra 1998b),
one of which is planning under uncertainty.
Unfortunately, there is still a significant gap between this
potential and actual applications,
primarily due to the lack of effective solution methods.
For this reason, much recent effort has been devoted to
finding efficient algorithms for POMDPs
(e.g., Parr and Russell 1995,
Hauskrecht 1997b, Cassandra 1998a,
Hansen 1998, Kaelbling *et al.* 1998,
Zhang *et al.* 1999).

Value iteration is a well-known algorithm for POMDPs (Smallwood and Sondik 1973, Puterman 1990). It starts with an initial value function and iteratively performs dynamic programming (DP) updates to generate a sequence of value functions. The sequence converges to the optimal value function. Value iteration terminates when a predetermined convergence condition is met.
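To make the shape of the algorithm concrete, here is a minimal sketch of the value-iteration loop. For simplicity it operates on a tiny fully observable MDP, where a value function is just a state-indexed table; a POMDP DP update instead manipulates a finite set of vectors representing a piecewise-linear value function, but the outer loop has the same structure. All names and numbers below are illustrative assumptions, not taken from the paper.

```python
GAMMA = 0.9      # discount factor (assumed)
EPSILON = 1e-6   # convergence threshold (assumed)

# Toy MDP: states 0..1, actions 0..1.
# P[a][s][t] = transition probability, R[a][s] = immediate reward.
P = [
    [[0.8, 0.2], [0.1, 0.9]],   # action 0
    [[0.5, 0.5], [0.3, 0.7]],   # action 1
]
R = [
    [1.0, 0.0],                 # action 0
    [0.0, 2.0],                 # action 1
]

def dp_update(V):
    """One DP (Bellman) update: returns a value function closer to optimal."""
    return [
        max(
            R[a][s] + GAMMA * sum(P[a][s][t] * V[t] for t in range(len(V)))
            for a in range(len(P))
        )
        for s in range(len(V))
    ]

def value_iteration(n_states):
    V = [0.0] * n_states               # initial value function
    while True:
        V_next = dp_update(V)
        # terminate when successive value functions are close enough
        if max(abs(a - b) for a, b in zip(V_next, V)) < EPSILON:
            return V_next
        V = V_next
```

The stopping test here is a simple max-norm check between successive value functions; real implementations often use a bound derived from the discount factor instead.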

Value iteration typically performs a large number of DP updates before it converges, and DP updates are notoriously expensive. In this paper, we develop a technique for reducing the number of DP updates.

A DP update takes (the finite representation of)
a value function as input and
returns (the finite representation of)
another value function. The output value function
is closer to the optimal value function
than the input.
In this sense, we say that a DP update *improves*
its input.
We propose an approximation to the DP update called the
*point-based DP update*. A point-based DP update also
improves its input, though possibly to a lesser degree than
a standard DP update. On the other hand,
it is computationally much cheaper.
During value iteration, we perform a number of point-based
DP updates between two consecutive standard DP updates.
Because point-based DP updates improve their inputs,
the number of standard DP updates can be reduced in this way.
The reduction comes at little cost, since
point-based DP updates take little time.
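The interleaving scheme just described can be sketched as the following loop. The function names, the convergence test, and the number of point-based updates per round are all stand-ins for illustration, not the paper's actual implementation.

```python
def interleaved_value_iteration(V, standard_dp_update, point_based_dp_update,
                                converged, n_point_updates=20):
    """Run cheap point-based DP updates between expensive standard ones.

    `standard_dp_update` and `point_based_dp_update` each map a value
    function to an improved value function; `converged` decides when
    two successive value functions are close enough to stop.
    (All of these are assumed interfaces, not the paper's code.)
    """
    while True:
        V_next = standard_dp_update(V)      # expensive, large improvement
        if converged(V, V_next):
            return V_next
        V = V_next
        # Cheap approximate updates: each still improves its input,
        # so fewer standard DP updates are needed overall.
        for _ in range(n_point_updates):
            V = point_based_dp_update(V)
```

Because both kinds of update improve their input, the loop makes at least as much progress per standard update as plain value iteration, while the extra work per round stays small.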

The rest of this paper is organized as follows. In the next section we shall give a brief review of POMDPs and value iteration. The basic idea behind point-based DP update will be explained in Section 3. After some theoretical preparations in Section 4, we shall work out the details of point-based DP update in Section 5. Empirical results will be reported in Section 6 and possible variations evaluated in Section 7. Finally, we shall discuss related work in Section 8 and provide some concluding remarks in Section 9.

Thu Feb 15 14:47:09 HKT 2001