Let and be sets of vectors respectively generated
by VI (Figure 1) and VI1 (Figure 2) at line 3 in iteration *n*.
Suppose the initial set is uniformly improvable.
Using Lemma 2 and Corollary 1,
one can prove by induction that and are uniformly
improvable for all *n* and their induced value functions
increase with *n*.
Moreover, dominates and
is dominated by the optimal value function. It is well known that
converges to the optimal value function. Therefore,
must also converge to the optimal value function.

The question now is how to make sure that the initial set is uniformly improvable. The following lemma answers this question.

**Proof**: Use *V* to denote the value function induced
by the singleton set.
For any belief state *b*, we have

Therefore the value function, and hence the singleton set, is uniformly improvable.

Experiments (Section 6) have shown that VI1 is more efficient VI on a number of test problems. It should be noted, however, that we have not proved this is always the case. Moreover, complexity results by Papadimitriou and Tsitsiklis (1987) implies that the task of finding -optimal policies for POMDPs is PSPACE-complete. Hence, the worst-case complexity should remain the same.

Thu Feb 15 14:47:09 HKT 2001