As mentioned in Section 3.1, point-based update is closely related to several exact algorithms for standard update, namely one-pass (Sondik 1971), linear-support (Cheng 1988), and relaxed-region (Cheng 1988). They all perform backups at a finite number of belief points. The difference is that these exact algorithms generate the points systematically, which is expensive, while point-based update generates the points heuristically.
There are several other exact algorithms for standard DP update. The enumeration/reduction algorithms (Monahan 1982, Eagle 1984) and incremental pruning (Zhang and Liu 1997, Cassandra et al. 1997) first generate a set of vectors that is not parsimonious and then prune extraneous vectors by solving linear programs. Point-based DP update never generates extraneous vectors. It might generate duplicate vectors; however, duplicates can be pruned without solving linear programs. The witness algorithm (Kaelbling et al. 1998) has two stages. In the first stage, it considers actions one by one. For each action, it constructs a set of vectors based on a finite number of systematically generated belief points, using an operator similar to the backup operator. In the second stage, vectors for different actions are pooled together and extraneous vectors are pruned.
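To illustrate the distinction, the following is a minimal sketch (not from the paper; function and parameter names are hypothetical) of how duplicate alpha-vectors can be removed by direct comparison, in contrast to extraneous-vector pruning, which requires solving a linear program per candidate vector:

```python
import numpy as np

def prune_duplicates(vectors, tol=1e-9):
    """Remove duplicate alpha-vectors by pairwise comparison.

    Duplicates are detected with simple numeric comparisons up to a
    tolerance -- no linear programs are needed. Extraneous (dominated
    but non-duplicate) vectors, by contrast, can only be identified by
    solving an LP that searches for a witness belief point.
    """
    kept = []
    for v in vectors:
        # Keep v only if it is not (numerically) identical to a kept vector.
        if not any(np.allclose(v, u, atol=tol) for u in kept):
            kept.append(v)
    return kept
```

The comparison can be made effectively constant-time per vector by hashing rounded components, but the quadratic scan above suffices to show that no optimization is involved.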
There are proposals to carry out standard update approximately by dropping vectors that are marginally useful (e.g., Kaelbling et al. 1998, Hansen 1998). Here is one idea along this line that we have empirically evaluated. Recall that to achieve ε-optimality, the stopping threshold for the Bellman residual should be ε(1−λ)/2λ. Our idea is to drop marginally useful vectors at various stages of standard update while keeping the overall error under ε(1−λ)/4λ, and to stop when the Bellman residual falls below ε(1−λ)/4λ. It is easy to see that ε-optimality is still guaranteed this way. We have also tried starting with a large error tolerance, in the hope of pruning more vectors, and gradually decreasing the tolerance to ε(1−λ)/4λ. Reasonable improvements have been observed, especially when one does not need the policy to be of high quality. However, such approximate updates are much more expensive than point-based updates. In the context of the modified value iteration algorithm, they are more suitable as alternatives to standard updates than to point-based update.
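The error-budget argument above can be sketched numerically. The snippet below (an illustration under the standard value-iteration stopping bound; the even split of the budget is an assumption, and the function names are hypothetical) computes the ε-optimality threshold and divides it between per-update pruning error and the residual test:

```python
def stopping_threshold(epsilon, lam):
    """Bellman-residual threshold guaranteeing epsilon-optimality.

    Standard result for discounted value iteration: if the Bellman
    residual max-norm ||V' - V|| falls below epsilon*(1-lam)/(2*lam),
    where lam is the discount factor, then V' is epsilon-optimal.
    """
    return epsilon * (1.0 - lam) / (2.0 * lam)

def split_budget(epsilon, lam):
    """Split the threshold into two halves (an assumed even split):
    one half bounds the error introduced by dropping marginally
    useful vectors during each update, the other is the residual
    stopping test. Their sum equals the full threshold, so
    epsilon-optimality is preserved.
    """
    half = stopping_threshold(epsilon, lam) / 2.0
    return half, half  # (pruning-error budget, residual threshold)
```

For example, with ε = 0.1 and λ = 0.95 the full threshold is 0.1 × 0.05 / 1.9 ≈ 0.0026, so each half of the budget is about 0.0013.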