Point-based value iteration starts with a set of vectors and generates a sequence of vector sets by repeatedly applying point-based update. The last set can be used to approximate the optimal value function.
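To make the scheme concrete, here is a minimal sketch of generic point-based value iteration on a tiny, made-up 2-state POMDP. The model numbers, the belief set `B`, and the lower-bound initialization are illustrative assumptions, not taken from the text; the backup shown is the standard point-based Bellman update at a belief point.

```python
# Hedged sketch of point-based value iteration on a hypothetical
# 2-state, 2-action, 2-observation POMDP. All model numbers are
# illustrative; only the backup structure is the point of the example.

GAMMA = 0.95
S, A, O = 2, 2, 2                     # states, actions, observations

# T[a][s][s'] transition, Z[a][s'][o] observation, R[a][s] reward
T = [[[0.9, 0.1], [0.1, 0.9]],        # action 0
     [[0.5, 0.5], [0.5, 0.5]]]        # action 1
Z = [[[0.8, 0.2], [0.2, 0.8]],
     [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0],
     [0.0, 1.0]]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def backup(b, vectors):
    """Point-based update: the best new vector at belief point b."""
    best = None
    for a in range(A):
        # for each observation, the alpha-projection best at b
        g_ao = []
        for o in range(O):
            projs = [[sum(T[a][s][sp] * Z[a][sp][o] * v[sp]
                          for sp in range(S)) for s in range(S)]
                     for v in vectors]
            g_ao.append(max(projs, key=lambda p: dot(b, p)))
        cand = [R[a][s] + GAMMA * sum(g[s] for g in g_ao)
                for s in range(S)]
        if best is None or dot(b, cand) > dot(b, best):
            best = cand
    return best

# A fixed set of belief points; start from one lower-bound vector.
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
vectors = [[min(min(row) for row in R) / (1 - GAMMA)] * S]
for _ in range(30):                   # repeated point-based updates
    vectors = [backup(b, vectors) for b in B]

def value(b):
    return max(dot(b, v) for v in vectors)
```

Note that each point-based update yields at most one vector per belief point, so the size of the vector set is controlled by the belief set rather than fixed a priori.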

Various methods for approximating the optimal value function have been developed previously. We will compare them against point-based value iteration along two dimensions: (1) whether they map one set of vectors to another, that is, whether they can be interleaved with standard updates, and (2) if they do, whether they can guarantee convergence when interleaved with standard updates.

Lovejoy (1993) proposes to approximate the optimal value function
of a POMDP using
the optimal value function of the underlying Markov
decision process (MDP). The latter is a function over the state space,
so the optimal value function is approximated by a single vector.
Littman *et al.* (1995b) extend this idea and approximate
the optimal value function using several vectors,
each of which corresponds to the Q-function of one action in
the underlying MDP. A further extension was recently introduced
by Zubek and Dietterich (2000). Their idea
is to base the approximation not on the underlying MDP but
on a so-called even-odd POMDP, which
is identical to the original POMDP except that the state
is fully observable at even time steps.
Platzman (1980) suggests approximating using the value functions of
one or more fixed suboptimal policies that are constructed heuristically.
Those methods do not start with a set of vectors
and hence do not map one set of vectors to another,
although they can easily be adapted to do so.
Even then, they all place a predetermined limit on the number of output vectors.
Consequently,
convergence is not guaranteed when they are
interleaved with standard updates.

Fast informed bound (Hauskrecht 1997a), Q-function curve fitting (Littman
*et al.* 1995b), and softmax curve fitting (Parr and Russell 1995)
do map a set of vectors to another.
However, they differ drastically
from point-based value iteration and from each other
in their ways of deriving the next set of vectors from the current one.
Regardless of the size of the current set,
fast informed bound and Q-function curve fitting
always produce one vector for each action.
In softmax curve fitting, the number of vectors is also determined
a priori, although it is not necessarily related to the number of actions.
Those methods can be interleaved with standard DP updates.
Unlike point-based value iteration, they themselves
may not converge (Hauskrecht 2000). Even when they do converge
on their own, the algorithms resulting from interleaving
them with standard updates do not necessarily converge,
due to the
a priori limits on the number of vectors.

Grid-based interpolation/extrapolation methods (Lovejoy 1991, Brafman 1997, Hauskrecht 1997b) approximate value functions by discretizing the belief space using a fixed or variable grid and by maintaining values only for the grid points. Values at non-grid points are estimated by interpolation/extrapolation when needed. Such methods cannot be interleaved with standard DP updates because they do not work with sets of vectors.
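As a concrete illustration of the interpolation step, consider a 2-state POMDP, whose belief space is the interval [0, 1]. A hypothetical sketch, with made-up grid points and values (not taken from any of the cited papers), might estimate the value at a non-grid belief by linear interpolation between neighbouring grid points:

```python
# Hedged sketch of grid-based interpolation over the belief space of
# a 2-state POMDP. The grid and the stored values are illustrative.

grid = [0.0, 0.25, 0.5, 0.75, 1.0]    # belief in state 1 at each grid point
vals = [10.0, 6.0, 4.0, 6.0, 10.0]    # values maintained at the grid points

def interpolate(b1):
    """Value estimate at belief (1 - b1, b1) by linear interpolation."""
    for i in range(len(grid) - 1):
        if grid[i] <= b1 <= grid[i + 1]:
            w = (b1 - grid[i]) / (grid[i + 1] - grid[i])
            return (1 - w) * vals[i] + w * vals[i + 1]
    raise ValueError("belief outside [0, 1]")
```

Note that such a method maintains only the scalar values at grid points; no set of vectors is produced, which is precisely why it cannot be interleaved with standard DP updates.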

There are grid-based methods that work with sets of vectors. Lovejoy's method to lower bound the optimal value function (Lovejoy 1991), for instance, falls into this category. This method is actually identical to point-based value iteration except for the way it derives the next set of vectors from the current one. Instead of using point-based update, it backs up on grid points in a regular grid. Convergence of this method is not guaranteed. The algorithm resulting from interleaving it with standard updates may not converge either.

The incremental linear-function method (Hauskrecht 2000) roughly corresponds to a variation of point-based value iteration that uses non-LP point-based update (Section 7.2) augmented with the Gauss-Seidel asynchronous update. The method does not have access to witness points, so it starts, for the purpose of backup, with the extreme points of the belief space and supplements them with projected points. This choice of points appears poor because it leads to a large number of vectors; consequently, the backup process is ``usually stopped well before'' convergence (Hauskrecht 2000).
