As indicated in the introduction,
we propose to perform point-based DP update a number of times
in between two standard DP updates.
To be more specific, we propose to modify value iteration
in the way as shown in Figure 3.
Note that the only change is at line 5.
Instead of assigning directly to ,
we pass it to a subroutine `POINT-BASED-VI` and
assign the output of the subroutine to .
The subroutine
functions in the same way as value iteration,
except that it performs
point-based DP updates rather than standard DP updates.
Hence we call it
*point-based value iteration*.

Figure 4 illustrates the basic idea behind modified value iteration in contrast to value iteration. When the initial value function is properly selected, the sequence of value functions produced by value iteration converges monotonically to the optimal value function. Convergence usually takes a long time partially because standard DP updates, indicated by fat upward arrows, are computationally expensive. Modified value iteration interleaves standard DP updates with point-based DP updates, which are indicated by the thin upward arrows. Point-based DP update does not improve a value function as much as standard DP update. However, its complexity is much lower. As a consequence, modified value iteration can hopefully converge in less time.

The idea of interleaving standard DP updates with approximate updates that back up at a finite number of belief points is due to Cheng (1988). Our work differs from Cheng's method mainly in the way we select the belief points. A detailed discussion of the differences will be given in Section 8.

**Figure 4:** Illustration of the Basic Idea behind Modified Value Iteration.

The modified value iteration algorithm raises three issues. First, what stopping criterion do we use for point-based value iteration? Second, how can we guarantee the stopping criterion can eventually be satisfied? Third, how do we guarantee the convergence of the modified value iteration algorithm itself? To address those issues, we introduce the concept of uniformly improvable value functions.

Thu Feb 15 14:47:09 HKT 2001