Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees
H. Brendan McMahan, Maxim Likhachev, and Geoff Gordon
Carnegie Mellon University
Abstract
Markov decision processes (MDPs) are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function. The performance of Bounded RTDP is greatly aided by the introduction of a new technique for efficiently finding suitable upper bounds; this technique can also be used to provide informed initialization to a wide range of other planning algorithms.
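To make the idea concrete, the following is a minimal sketch of the bound-maintaining trial structure the abstract describes, on a hypothetical toy stochastic shortest-path MDP (a short chain with a risky "jump" action) invented here for illustration. The state space, cost structure, helper names (`successors`, `backup`, `trial`), initial bounds, and the gap-weighted successor sampling with threshold parameter `tau` are all illustrative assumptions, not the paper's exact algorithm or guarantees.

```python
import random

random.seed(0)

# Hypothetical toy stochastic shortest-path MDP (not from the paper):
# states 0..N-1, goal state N-1, unit cost per step.
N = 6
GOAL = N - 1
ACTIONS = (0, 1)

def successors(s, a):
    """Action 0 steps forward surely; action 1 jumps two steps but may stall."""
    if a == 0:
        return [(min(s + 1, GOAL), 1.0)]
    return [(min(s + 2, GOAL), 0.6), (s, 0.4)]

def backup(v, s):
    """Bellman backup of the value bound v at state s; returns a greedy action."""
    best_a = min(ACTIONS,
                 key=lambda a: 1.0 + sum(p * v[t] for t, p in successors(s, a)))
    v[s] = 1.0 + sum(p * v[t] for t, p in successors(s, best_a))
    return best_a

# Monotone bounds on the optimal cost-to-go: vl <= v* <= vu everywhere.
vl = [0.0] * N         # admissible lower bound (zero heuristic)
vu = [float(N)] * N    # crude upper bound (cost of some proper policy suffices)
vl[GOAL] = vu[GOAL] = 0.0

def trial(start, tau=10.0):
    """One bounded-RTDP-style trial: back up both bounds along a trajectory,
    act greedily on the optimistic lower bound, and sample each successor in
    proportion to its remaining bound gap, so search focuses on uncertain
    states; when the expected gap is negligible, the trial stops early."""
    path, s = [], start
    while s != GOAL:
        path.append(s)
        backup(vu, s)
        a = backup(vl, s)  # greedy action under the lower (optimistic) bound
        weights = [(t, p * (vu[t] - vl[t])) for t, p in successors(s, a)]
        total = sum(w for _, w in weights)
        if total < (vu[start] - vl[start]) / tau:
            break          # successors are already well understood; end trial
        r = random.uniform(0.0, total)
        for t, w in weights:
            r -= w
            if r <= 0:
                s = t
                break
    for s in reversed(path):  # back up again on the way out
        backup(vu, s)
        backup(vl, s)

START = 0
for _ in range(50):
    trial(START)
print(vl[START], vu[START])  # current lower and upper bounds at the start
```

Because the lower bound starts admissible and the upper bound starts above any optimal cost, Bellman backups keep them bracketing the optimal value, so the gap `vu[START] - vl[START]` directly certifies the quality of the greedy partial policy without ever touching states far from the start-to-goal region.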