
Computing Optimal Policies by Learning Models

The previous section showed how it is possible to learn an optimal policy without knowing the transition model T(s,a,s') or the reward function R(s,a), and without even learning those models en route. Although many of these methods are guaranteed to find optimal policies eventually and use very little computation time per experience, they make extremely inefficient use of the data they gather, and therefore often require a great deal of experience to achieve good performance. In this section we still begin by assuming that the models are not known in advance, but we examine algorithms that operate by learning them. These algorithms are especially important in applications where computation is cheap and real-world experience is costly.
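To make "learning the models" concrete, here is a minimal sketch that assumes a tabular, discounted MDP with hashable states and actions. It keeps count-based (maximum-likelihood) estimates of T(s,a,s') and R(s,a) from experience tuples <s, a, r, s'>, then solves the learned model by value iteration as if it were the true MDP. The names `TabularModel` and `value_iteration` and the parameters `gamma` and `tol` are hypothetical illustrations, not code from the survey.

```python
from collections import defaultdict


class TabularModel:
    """Count-based (maximum-likelihood) estimates of T(s,a,s') and R(s,a).

    Hypothetical helper for illustration; assumes a finite, tabular MDP.
    """

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.visits = defaultdict(int)        # (s, a) -> times action a was tried in s
        self.reward_sum = defaultdict(float)  # (s, a) -> total reward observed

    def update(self, s, a, r, s_next):
        """Record one experience tuple <s, a, r, s'>."""
        self.next_counts[(s, a)][s_next] += 1
        self.visits[(s, a)] += 1
        self.reward_sum[(s, a)] += r

    def T(self, s, a, s_next):
        """Estimated probability of reaching s_next after taking a in s."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0  # untried pair: no information, return 0 by assumption
        return self.next_counts[(s, a)].get(s_next, 0) / n

    def R(self, s, a):
        """Estimated expected immediate reward for taking a in s."""
        n = self.visits.get((s, a), 0)
        return self.reward_sum[(s, a)] / n if n else 0.0


def value_iteration(model, gamma=0.9, tol=1e-8):
    """Solve the learned model as if it were correct (assumed discount gamma).

    Uses in-place (Gauss-Seidel) sweeps over the states that have been visited.
    """
    sa_pairs = list(model.visits)
    states = {s for s, _ in sa_pairs}
    V = defaultdict(float)  # unvisited successor states default to value 0
    while True:
        delta = 0.0
        for s in states:
            best = max(
                model.R(s, a)
                + gamma * sum(model.T(s, a, s2) * V[s2] for s2 in model.next_counts[(s, a)])
                for (s1, a) in sa_pairs
                if s1 == s
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return dict(V)
```

For example, feeding the model a two-state chain (state 0 moves to state 1 with reward 0, state 1 loops to itself with reward 1) and solving with `gamma=0.9` gives values close to 9 for state 0 and 10 for state 1. Every stored count is reused on every planning sweep, which is the sense in which such methods spend computation to save experience.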

Certainty Equivalent Methods
Dyna
Prioritized Sweeping / Queue-Dyna
Other Model-Based Methods

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996