In practice, value iteration is much faster per iteration, but policy
iteration takes fewer iterations. Arguments have been put forth that
each approach is the better choice for large problems.
Puterman's *modified policy iteration* algorithm [91]
provides a method for trading iteration time for iteration improvement
in a smoother way. The basic idea is that the expensive part of
policy iteration is solving exactly for the value function $V_\pi$ of
the current policy. Instead of computing $V_\pi$ exactly, we can
perform a few sweeps of a modified value-iteration step in which the
policy is held fixed across successive iterations. This can be shown
to produce an approximation to $V_\pi$ that converges linearly in
$\gamma$, the discount factor. In practice, this can result in
substantial speedups.
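For concreteness, here is a minimal sketch of the idea in Python,
under assumed tabular notation: a transition tensor `P[a, s, s']`, a
reward matrix `R[s, a]`, and a parameter `m` giving the number of
fixed-policy backup sweeps per evaluation phase (these names are
illustrative, not Puterman's). Setting `m = 0` recovers value
iteration, while letting `m` grow large approaches exact policy
iteration.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.95, m=5, tol=1e-6, max_iter=1000):
    """Sketch of modified policy iteration on a tabular MDP.

    P: transitions, shape (A, S, S); P[a, s, s2] = Pr(s2 | s, a).
    R: expected immediate rewards, shape (S, A).
    m: number of fixed-policy backup sweeps per evaluation phase.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Improvement step: one full Bellman backup and a greedy policy.
        Q = R + gamma * np.einsum('ask,k->sa', P, V)
        pi = Q.argmax(axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return pi, V_new
        V = V_new
        # Partial evaluation: m extra sweeps with pi held fixed, giving an
        # approximation to V_pi without solving the linear system exactly.
        P_pi = P[pi, np.arange(S)]   # shape (S, S)
        R_pi = R[np.arange(S), pi]   # shape (S,)
        for _ in range(m):
            V = R_pi + gamma * P_pi @ V
    return pi, V
```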

Several standard numerical-analysis techniques that speed the
convergence of dynamic programming can be used to accelerate value and
policy iteration.
*Multigrid methods* can be used to quickly seed a good initial
approximation to a high-resolution value function by first
performing value iteration at a coarser resolution [93]; a
coarse-to-fine sketch appears below. *State aggregation* works by
collapsing groups of states into a single meta-state and solving the
abstracted problem [15].
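As an illustration of the multigrid idea (the coarse problem also
amounts to a uniform state aggregation), here is a hypothetical
Python sketch for a one-dimensional state space: solve a coarsely
discretized version of the problem first, then interpolate the
result to warm-start value iteration at the fine resolution.
`make_mdp` stands in for a problem-specific discretizer and is an
assumption of this sketch, not an interface from [93] or [15].

```python
import numpy as np

def value_iteration(P, R, gamma, V0, tol=1e-6):
    """Plain value iteration started from an initial guess V0."""
    V = V0.copy()
    while True:
        V_new = (R + gamma * np.einsum('ask,k->sa', P, V)).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def multigrid_seed(make_mdp, gamma, coarse_n=20, fine_n=200):
    """Coarse-to-fine seeding: make_mdp(n) is a hypothetical helper that
    discretizes the underlying problem into n states and returns (P, R)."""
    P_c, R_c = make_mdp(coarse_n)
    V_coarse = value_iteration(P_c, R_c, gamma, np.zeros(coarse_n))
    # Interpolate the coarse solution onto the fine grid; the warm start
    # typically reduces the number of fine-resolution sweeps substantially.
    fine_x = np.linspace(0.0, 1.0, fine_n)
    coarse_x = np.linspace(0.0, 1.0, coarse_n)
    V0 = np.interp(fine_x, coarse_x, V_coarse)
    P_f, R_f = make_mdp(fine_n)
    return value_iteration(P_f, R_f, gamma, V0)
```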
