Next: Point-Based DP Update Up: Point-Based DP Update: The Previous: Point-Based DP Update: The

## The Backup Operator

Let be a set of vectors and b be a belief state. The backup operator constructs a new vector in three steps:

1. For each action a and each observation z, find the vector in that has maximum inner product with -- the belief state for the case when z is observed after executing action a in belief state b. If there are more than one such vector, break ties lexicographically (Littman 1996). Denote the vector found by .
2. For each action a, construct a vector by:

3. Find the vector, among the 's, that has maximum inner product with b. If there are more than one such vector, break ties lexicographically. Denote the vector found by .

It has been shown (Smallwood and Sondik 1973, Littman 1996) that is a member of -- the set of vectors obtained by performing DP update on . Moreover, b is a witness point for .

The above fact is the corner stone of several DP update algorithms. The one-pass algorithm (Sondik 1971), the linear-support algorithm (Cheng 1988), and the relaxed-region algorithm (Cheng 1988) operate in the following way: They first systematically search for witness points of vectors in and then obtain the vectors using the backup operator. The witness algorithm (Kaelbling et al. 1998) employs a similar idea.

Dr. Lian Wen Zhang
Thu Feb 15 14:47:09 HKT 2001