A value function *V*
is *represented*
by a set of vectors if it equals the value function induced
by the set.
When a value function is representable by a finite set of
vectors, there is a unique parsimonious set of vectors that
represents the function (Littman *et al.* 1995a).

Sondik (1971) has shown that
if a value function *V* is representable by a finite
set of vectors, then so is the value function
*TV*.
The process of obtaining the parsimonious representation
for *TV* from the parsimonious representation of *V* is usually
referred to as
*dynamic programming (DP) update*.
Let be the parsimonious set of vectors that represents *V*.
For convenience, we use to denote the
parsimonious set of vectors that represents *TV*.

In practice, value iteration for POMDPs is not carried out directly in terms of value functions themselves. Rather, it is carried out in terms of sets of vectors that represent the value functions (Figure 2). One begins with an initial set of vectors . At each iteration, one performs a DP update on the previous parsimonious set of vectors and obtains a new parsimonious set of vectors . One continues until the Bellman residual , which is determined by solving a sequence of linear programs, falls below a threshold.

**Figure 2:** Value Iteration for POMDPs.

Thu Feb 15 14:47:09 HKT 2001