Reducing the Complexity of Point-Based DP Update

Solving linear programs is the most expensive operation in point-based update. An obvious way to speed up is to avoid linear programs. Point-based update solves linear programs and backs up on the belief points found so as to guarantee uniform improvability. If the linear programs are to be skipped, there must be some other way to guarantee uniform improvability. There is an easy solution to this problem. Suppose

is the set of vectors that we try to update and it is uniformly improvable. Let

be the set obtained from

by backing up only on the witness points, which can be done without solving linear programs. The set

might or might not be uniformly improvable. However, the union

is guaranteed to be uniformly improvable. Therefore we can reprogram point-based update to return the union in hope to reduce complexity. The resulting variation will be called non-LP point-based DP update.

Another way to reduce complexity is to simplify the backup operator (Section 3.1) using the idea behind modified policy iteration (e.g., Puterman 1990). When backing up from a set of vectors

at a belief point, the operator considers all possible actions and picks the one that is optimal according to the

-improving policy. To speed up, one can simply use the action found for the belief point by the previous standard update. The resulting operator will be called the MPI backup operator, where MPI stands for modified policy iteration. If

is the output of the previous standard update, the two actions often are the same. However, they are usually different if

is the result of several point-based updates following the standard update.

Table 5 shows, for each test problem, the number of standard updates and the amount of time that VI1 took when non-LP point-based update was used (together with the standard backup operator). Comparing the statistics with those for point-based update (Tables 1 and 2), we see that the number of standard updates is increased on all test problems and the amount of time is also increased except for the first three problems. Here are the plausible reasons. First, it is clear that non-LP point-based update does not improve a set of vectors as much as point-based update. Consequently, it is less effective in reducing the number of standard updates. Second, although it does not solve linear programs, non-LP point-based update produces extraneous vectors. This means that it might need to deal with a large number of vectors at later iterations and hence might not be as efficient as point-based update after all.

4x3CO Cheese 4x4 Paint Tiger Shuttle Network Aircraft

4 5 8 4 4 7 10 8

2.38 2.38 3.4 .75 .88 44 599 32,281

Table 5: Number of Standard DP Updates and Time That VI1 Took When Non-LP Point-Based Update is Used.

**Table 5:** Number of Standard DP Updates and Time That VI1 Took When Non-LP Point-Based Update is Used.
4x3CO	Cheese	4x4	Paint	Tiger	Shuttle	Network	Aircraft
4	5	8	4	4	7	10	8
2.38	2.38	3.4	.75	.88	44	599	32,281

Extraneous vectors can be pruned. As a matter of fact, we did prune vectors that are pointwise-dominated by others (hence extraneous) in our experiments. This is inexpensive. Pruning of other extraneous vectors, however, requires the solution of linear programs and is expensive. In Zhang et al. (1999), we have discussed how this can be done the most efficient way. Still the results were not as good as those in Table 5. In that paper, we have also explored the combination of non-LP point-based update with the MPI backup operator. Once again, the results were not as good as those in Table 5. The reason is that the MPI backup operator further compromises the quality of point-based update.

The quality of non-LP point-based update can be improved by using the Gauss-Seidel asynchronous update (Denardo 1982). Suppose we are updating a set

. The idea is to, after a vector is created by backup, add a copy of the vector to the set

right away. The hope is to increase the components of later vectors. We have tested this idea when preparing Zhang et al. (1999) and found that the costs almost always exceed the benefits. A reason is that asynchronous update introduces many more extraneous vectors than synchronous update.

In conclusion, point-based is conceptually simple and clean. When compared to its more complex variations, it seems to be the most effective in accelerating value iteration.