
Other Model-Based Methods

Methods originally proposed for solving MDPs when a model is given can also be used within model-based methods.

RTDP (real-time dynamic programming) [8] is another model-based method that uses Q-learning to concentrate computational effort on the areas of the state space that the agent is most likely to occupy. It is specific to problems in which the agent is trying to reach a particular goal state and the reward everywhere else is 0. By taking the start state into account, it can find a short path from the start to the goal without necessarily visiting the rest of the state space.
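
The idea of focusing backups on states the agent actually reaches can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the 5x5 deterministic grid, the discount factor, the optimistic initial values, the tie-breaking order, and the function names. The sketch also applies full one-step backups against a known model along greedy trajectories from the start state, which is one common way RTDP is presented, rather than sampled Q-learning updates.

```python
# Illustrative trial-based sketch in the spirit of RTDP (assumed setup, not the
# original algorithm's code): deterministic grid, reward 1 on entering the goal,
# reward 0 everywhere else.
N = 5                                         # grid is N x N (assumption)
START, GOAL = (0, 0), (0, 2)                  # hypothetical start and goal
GAMMA = 0.9                                   # discount factor (assumption)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Known model: move one cell, or stay put when the move leaves the grid."""
    r, c = state[0] + action[0], state[1] + action[1]
    nxt = (r, c) if 0 <= r < N and 0 <= c < N else state
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def q_value(V, state, action):
    """One-step lookahead value of taking `action` in `state`."""
    nxt, reward = step(state, action)
    return reward + GAMMA * V[nxt]


def rtdp(max_trials=1000, max_steps=1000):
    # Optimistic initial values (an upper bound on the true values) push the
    # agent toward states it has not yet backed up.
    V = {(r, c): 1.0 for r in range(N) for c in range(N)}
    V[GOAL] = 0.0                             # goal is terminal
    visited = set()
    for _ in range(max_trials):
        state, changed = START, False
        for _ in range(max_steps):
            if state == GOAL:
                break
            visited.add(state)
            qs = [q_value(V, state, a) for a in ACTIONS]
            best = max(qs)
            if abs(best - V[state]) > 1e-12:  # back up only the current state
                V[state] = best
                changed = True
            state = step(state, ACTIONS[qs.index(best)])[0]  # act greedily
        if not changed:                       # a trial with no updates: stop
            break
    return V, visited


def greedy_path(V, max_steps=100):
    """Follow the greedy policy from START under the current values."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        state = path[-1]
        best = max(ACTIONS, key=lambda a: q_value(V, state, a))
        path.append(step(state, best)[0])
    return path
```

Calling `rtdp()` returns the value table together with the set of states that were ever backed up, and `greedy_path(V)` then reads off a start-to-goal path. States that no trial passes through keep their initial values, which is the sense in which the rest of the state space need not be visited.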

The Plexus planning system [33, 55] exploits a similar intuition. It starts by making an approximate version of the MDP which is much smaller than the original one. The approximate MDP contains a set of states, called the envelope, that includes the agent's current state and the goal state, if there is one. States that are not in the envelope are summarized by a single "out" state. The planning process is an alternation between finding an optimal policy on the approximate MDP and adding useful states to the envelope. Action may take place in parallel with planning, in which case irrelevant states are also pruned out of the envelope.
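
The alternation between solving the approximate MDP and growing the envelope can be sketched as follows. This is a minimal illustration under stated assumptions, not the Plexus implementation: the 4x4 grid with slippery moves, the value of 0 assigned to the "out" state, the rule for choosing which states to add (every outside state reachable in one step under the current policy), and all function names are hypothetical. Pruning of irrelevant states and acting in parallel with planning are not shown.

```python
# Illustrative envelope-based planning loop (assumed setup and extension rule).
GAMMA = 0.9                                   # discount factor (assumption)
N = 4                                         # grid is N x N (assumption)
START, GOAL, OUT = (0, 0), (0, 3), "out"      # OUT summarizes all outside states
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def move(state, delta):
    r, c = state[0] + delta[0], state[1] + delta[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state


def transitions(state, action):
    """Full model: intended move w.p. 0.8, each perpendicular slip w.p. 0.1."""
    dr, dc = action
    slips = [(dc, dr), (-dc, -dr)]
    return [(0.8, move(state, action))] + [(0.1, move(state, s)) for s in slips]


def envelope_transitions(envelope, state, action):
    """Approximate model: any successor outside the envelope becomes OUT."""
    return [(p, s2 if s2 in envelope else OUT)
            for p, s2 in transitions(state, action)]


def q_value(V, envelope, state, action):
    # Reward 1 on entering the goal, 0 otherwise.
    return sum(p * ((1.0 if s2 == GOAL else 0.0) + GAMMA * V[s2])
               for p, s2 in envelope_transitions(envelope, state, action))


def solve(envelope, sweeps=200):
    """Value iteration restricted to the envelope; GOAL and OUT are absorbing."""
    V = {s: 0.0 for s in envelope}
    V[OUT] = 0.0
    for _ in range(sweeps):
        for s in envelope:
            if s != GOAL:
                V[s] = max(q_value(V, envelope, s, a) for a in ACTIONS)
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, envelope, s, a))
              for s in envelope if s != GOAL}
    return V, policy


def extend(envelope, policy):
    """Add outside states the current policy can fall into (assumed rule)."""
    fringe = {s2 for s, a in policy.items()
              for _, s2 in transitions(s, a) if s2 not in envelope}
    return envelope | fringe


def plan(rounds=3):
    envelope = {(0, c) for c in range(N)}     # initial envelope: one row
    history = []                              # (envelope size, value of START)
    for _ in range(rounds):
        V, policy = solve(envelope)
        history.append((len(envelope), V[START]))
        envelope = extend(envelope, policy)
    return envelope, history
```

Because the "out" state is given value 0 and all rewards are non-negative, each solved envelope gives a pessimistic estimate of the start state's value, and that estimate cannot decrease as states are added. A different value for the "out" state or a more selective extension rule would change this behavior.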

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996