Friday 18 Nov 1994, 1:30pm, WeH 4623
ON REINFORCEMENT LEARNING AND FUNCTION APPROXIMATION
Richard S. Sutton
Reinforcement learning is a broad class of optimal control methods
based on estimating value functions from experience, simulation, or
search. Many of these methods, e.g., dynamic programming and
temporal-difference learning, build their estimates in part on the
basis of other estimates. This is worrisome because, in practice, the
estimates may never become exact: on large problems, parameterized
function approximators such as neural networks must be used, and these
can represent the value function only approximately. In these
cases there are as yet no strong theoretical guarantees of
convergence, and empirical results have been mixed, including both
impressive successes, such as Tesauro's championship backgammon
player, and disappointing failures, such as Boyan and Moore's recent
attempt to apply dynamic programming to simple control problems with
continuous state spaces. In this talk, I will present positive
empirical results for all the control tasks attempted by Boyan and
Moore, and for one that is significantly larger. Moreover, I will
present evidence that we cannot get by with simpler "Monte Carlo"
(lambda=1) methods; they perform substantially worse on these tasks.
I will conclude by speculating on what might be required of a function
approximator in order to ensure good convergence.
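The contrast drawn above between bootstrapping temporal-difference methods and Monte Carlo (lambda=1) methods can be made concrete with a small sketch of semi-gradient TD(lambda) on a linear function approximator. The code below is only an illustration of the general technique, not the implementation used in the talk; the episode format and the toy features in the usage example are invented for the sketch.

```python
def td_lambda_linear(episodes, num_features, alpha=0.1, gamma=1.0, lam=0.9):
    """Semi-gradient TD(lambda) with a linear value function.

    Each episode is a list of (feature_vector, reward) pairs; the
    feature vector encodes the state, and every episode is assumed to
    end in a terminal state whose value is zero.
    """
    w = [0.0] * num_features              # weights of the linear approximator
    for episode in episodes:
        z = [0.0] * num_features          # eligibility trace, reset each episode
        for t, (phi, reward) in enumerate(episode):
            # current estimate, and the estimate for the successor state
            v = sum(wi * xi for wi, xi in zip(w, phi))
            if t + 1 < len(episode):
                phi_next = episode[t + 1][0]
                v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
            else:
                v_next = 0.0              # terminal state has value zero
            # TD error: the update bootstraps on the estimate v_next
            delta = reward + gamma * v_next - v
            # decay the trace and add the current features
            z = [gamma * lam * zi + xi for zi, xi in zip(z, phi)]
            # move every recently visited feature's weight toward the target
            w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w

# Toy two-state chain with one-hot features: s0 (reward 0) -> s1 (reward 1)
# -> terminal.  With gamma=1 the true values are v(s0) = v(s1) = 1.
episodes = [[([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)]] * 300
w = td_lambda_linear(episodes, num_features=2)
```

With lambda=1 the traces never decay within an episode, so each weight is in effect updated toward the full return from its state onward, which is exactly the Monte Carlo estimate; with lambda<1 the updates lean on the bootstrapped value of the successor state, which is what makes the quality of the intermediate estimates matter.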