next up previous
Next: Point-Based DP Update: The Up: Speeding Up the Previous: The Use of Point-Based

Uniformly Improvable Value Functions

Suppose V and U are two value functions. We say that U dominates V and write tex2html_wrap_inline1358 if tex2html_wrap_inline1360 for every belief state b. A value function V is said to be uniformly improvable if tex2html_wrap_inline1366 . A set tex2html_wrap_inline1144 of vectors dominates another set tex2html_wrap_inline1142 of vectors if the value function induced by tex2html_wrap_inline1144 dominates that induced by tex2html_wrap_inline1142 . A set of vectors is unformly improvable if the value function it induces is.


This lemma is obvious and is well known in the MDP community (Puterman 1990). Nonetheless, it enables us to explain the intuition behind the term ``uniformly improvable". Suppose V is a uniformly improvable value function and suppose value iteration starts with V. Then the sequence of value functions generated is monotonically increasing and converges to the optimal value function tex2html_wrap_inline1068 . This implies tex2html_wrap_inline1394 . That is, TV(b) is closer to tex2html_wrap_inline1398 than V(b) for all belief states b.

The following lemma will be used later to address the issues listed at the end of the previous section.


Proof: Since tex2html_wrap_inline1358 , we have tex2html_wrap_inline1416 by Lemma 1. We also have the condition tex2html_wrap_inline1418 . Consequently, tex2html_wrap_inline1420 . That is, U is uniformly improvable. tex2html_wrap_inline1424


Dr. Lian Wen Zhang
Thu Feb 15 14:47:09 HKT 2001