Consider the do-while loop of
`POINT-BASED-VI` (Figure 2).
Starting from an initial set of vectors, it generates a
sequence of sets. If the initial set is uniformly improvable,
then the value functions represented by the sets are
monotonically increasing and are upper bounded by the optimal
value function. As such, they converge to a value function (which
is not necessarily the optimal value function). The question is
when to stop the do-while loop.

A straightforward method would be to compute the distance between two consecutive sets and and stop when the distance falls below a threshold. To compute the distance, one needs to solve linear programs, which is time consuming. We use a metric that is less expensive to compute. To be more specific, we stop the do-while loop when

In words, we calculate the maximum difference between and at the witness points of vectors in and stop the do-while loop when this quantity is no larger than . Here is the threshold on the Bellman residual for terminating value iteration and is a number between 0 and 1. In our experiments, we set it at 0.1.

Thu Feb 15 14:47:09 HKT 2001