next up previous
Next: Computational Complexity Up: Finding a Policy Given Previous: Policy Iteration

Enhancement to Value Iteration and Policy Iteration

In practice, value iteration is much faster per iteration, but policy iteration takes fewer iterations. Arguments have been put forth to the effect that each approach is better for large problems. Puterman's modified policy iteration algorithm [91] provides a method for trading iteration time for iteration improvement in a smoother way. The basic idea is that the expensive part of policy iteration is solving for the exact value of tex2html_wrap_inline2010 . Instead of finding an exact value for tex2html_wrap_inline2010 , we can perform a few steps of a modified value-iteration step where the policy is held fixed over successive iterations. This can be shown to produce an approximation to tex2html_wrap_inline2010 that converges linearly in tex2html_wrap_inline1776 . In practice, this can result in substantial speedups.

Several standard numerical-analysis techniques that speed the convergence of dynamic programming can be used to accelerate value and policy iteration. Multigrid methods can be used to quickly seed a good initial approximation to a high resolution value function by initially performing value iteration at a coarser resolution [93]. State aggregation works by collapsing groups of states to a single meta-state solving the abstracted problem [15].

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996