
Computing Optimal Policies by Learning Models

 

The previous section showed how it is possible to learn an optimal policy without knowing the models T(s,a,s') or R(s,a), and without even learning those models en route. Although many of these methods are guaranteed to find optimal policies eventually and use very little computation time per experience, they make extremely inefficient use of the data they gather and therefore often require a great deal of experience to achieve good performance. In this section we still assume that the models are not known in advance, but we examine algorithms that operate by learning them. These algorithms are especially important in applications in which computation is considered cheap and real-world experience costly.
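As a rough illustration of what "learning these models" can mean in the tabular case, the sketch below keeps running statistics of observed experience tuples (s, a, r, s') and returns empirical-frequency estimates of T(s,a,s') and R(s,a). The class name `TabularModel` and its methods are hypothetical names introduced here for illustration, and the empirical-frequency estimator is an assumption rather than a method prescribed by the text.

```python
from collections import defaultdict


class TabularModel:
    """Empirical estimates of T(s,a,s') and R(s,a) from observed experience.

    Hypothetical helper for illustration only; states and actions are
    assumed to be hashable values.
    """

    def __init__(self):
        # (s, a) -> {s_next: number of times s_next followed (s, a)}
        self.next_counts = defaultdict(lambda: defaultdict(int))
        # (s, a) -> sum of rewards observed after taking a in s
        self.reward_sum = defaultdict(float)
        # (s, a) -> number of times a was taken in s
        self.visits = defaultdict(int)

    def update(self, s, a, r, s_next):
        """Record one experience tuple (s, a, r, s')."""
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def T(self, s, a, s_next):
        """Estimated probability of reaching s_next after taking a in s."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0  # assumption: no data yet, report zero
        return self.next_counts[(s, a)].get(s_next, 0) / n

    def R(self, s, a):
        """Estimated expected immediate reward for taking a in s."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0  # assumption: no data yet, report zero
        return self.reward_sum[(s, a)] / n


model = TabularModel()
model.update(0, "a", 1.0, 1)
model.update(0, "a", 0.0, 1)
model.update(0, "a", 2.0, 0)
```

Each real-world experience updates the counts once, and the resulting estimates can then be reused for as much offline computation as desired, which is the trade-off the paragraph above describes.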





Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996