next up previous
Next: The Algorithm Up: Point-Based DP Update: The Previous: Backing Up on Witness

Retaining Uniform Improvability

To address convergence issues, we assume that the input to point-based DP update is uniformly improvable and require its output to be also uniformly improvable. We will explain later how the assumption can be facilitated and how the requirement guarantees convergence of the modified value iteration algorithm. In this subsection, we discuss how the requirement can be fulfilled.

Point-based DP update constructs new vectors by backing up on belief points and the new vectors are all members of tex2html_wrap_inline1230 . Hence the output of point-based DP update is trivially dominated by tex2html_wrap_inline1230 . If the output also dominates tex2html_wrap_inline1142 , then it must be uniformly improvable by Lemma 2. The question is how to guarantee that the output dominates tex2html_wrap_inline1142 .

Consider the set tex2html_wrap_inline1144 resulted from backUpWitnessPoints. If it does not dominate tex2html_wrap_inline1142 , then there must exist a belief state b such tex2html_wrap_inline1478 . Consequently, there must exist a vector tex2html_wrap_inline1140 in tex2html_wrap_inline1142 such that tex2html_wrap_inline1484 . This gives us the following subroutine for testing whether tex2html_wrap_inline1144 dominates tex2html_wrap_inline1142 and for, when this is not the case, adding vectors to tex2html_wrap_inline1144 so that it does. The subroutine is called backUpLPPoints because belief points are found by solving linear programs.

 
 		   tex2html_wrap_inline1492 :

1. for each tex2html_wrap_inline1444

2. do {

3. tex2html_wrap_inline1496 .

4. if tex2html_wrap_inline1498 ,

5. tex2html_wrap_inline1500 .

6. tex2html_wrap_inline1502 .

7. tex2html_wrap_inline1452 .

8. } while ( tex2html_wrap_inline1498 ).

The subroutine examines vectors in tex2html_wrap_inline1142 one by one. For each tex2html_wrap_inline1140 in tex2html_wrap_inline1142 , it calls another subroutine dominanceCheck to try to find a belief point b such that tex2html_wrap_inline1484 . If such a point is found, it backs up on it, resulting in a new vector tex2html_wrap_inline1138 (line 5). By the property of the backup operator, b is a witness point of tex2html_wrap_inline1138 w.r.t tex2html_wrap_inline1230 (line 6). There cannot be any vector in tex2html_wrap_inline1144 that equals tex2html_wrap_inline1138 .gif Consequently, the vector is simply added to tex2html_wrap_inline1144 without checking for duplicates (line 7). The process repeats for tex2html_wrap_inline1140 until dominanceCheck returns NULL, that is when there are no belief points b such that tex2html_wrap_inline1484 . When backUpLPPoints terminates, we have tex2html_wrap_inline1560 for any vector tex2html_wrap_inline1140 in tex2html_wrap_inline1142 and any belief point b. Hence tex2html_wrap_inline1144 dominates tex2html_wrap_inline1142 .

The subroutine tex2html_wrap_inline1572 first checks whether there exists a vector tex2html_wrap_inline1138 in tex2html_wrap_inline1144 that pointwise dominates tex2html_wrap_inline1140 , that is tex2html_wrap_inline1580 for all states s. If such an tex2html_wrap_inline1138 exists, it returns NULL right away. Otherwise, it solves the following linear program tex2html_wrap_inline1586 . It returns the solution point b when the optimal value of the objective function is positive and returns NULL otherwise:gif

 
			LP tex2html_wrap_inline1592 :

1. Variables: x, b(s) for each state s

2. Maximize: x.

3. Constraints:

4. tex2html_wrap_inline1602 for all tex2html_wrap_inline1604

5. tex2html_wrap_inline1606 , tex2html_wrap_inline1608 for all states s.


next up previous
Next: The Algorithm Up: Point-Based DP Update: The Previous: Backing Up on Witness

Dr. Lian Wen Zhang
Thu Feb 15 14:47:09 HKT 2001