Let and be sets of vectors respectively generated by VI (Figure 1) and VI1 (Figure 2) at line 3 in iteration n. Suppose the initial set is uniformly improvable. Using Lemma 2 and Corollary 1, one can prove by induction that and are uniformly improvable for all n and their induced value functions increase with n. Moreover, dominates and is dominated by the optimal value function. It is well known that converges to the optimal value function. Therefore, must also converge to the optimal value function.
The question now is how to make sure that the initial set is uniformly improvable. The following lemma answers this question.
Proof: Use V to denote the value function induced by the singleton set. For any belief state b, we have
Therefore the value function, and hence the singleton set, is uniformly improvable.
Experiments (Section 6) have shown that VI1 is more efficient VI on a number of test problems. It should be noted, however, that we have not proved this is always the case. Moreover, complexity results by Papadimitriou and Tsitsiklis (1987) implies that the task of finding -optimal policies for POMDPs is PSPACE-complete. Hence, the worst-case complexity should remain the same.