Friday 18 Nov 1994, 1:30pm, WeH 4623

ON REINFORCEMENT LEARNING AND FUNCTION APPROXIMATION

Richard S. Sutton

Reinforcement learning is a broad class of optimal control methods based on estimating value functions from experience, simulation, or search. Many of these methods, e.g., dynamic programming and temporal-difference learning, build their estimates in part on the basis of other estimates. This is worrisome because, in practice, the estimates may never become exact; on large problems, parameterized function approximators such as neural networks must be used. In these cases there are as yet no strong theoretical guarantees of convergence, and empirical results have been mixed, including both impressive successes, such as Tesauro's championship backgammon player, and disappointing failures, such as Boyan and Moore's recent attempt to apply dynamic programming to simple control problems with continuous state spaces.

In this talk, I will present positive empirical results for all the control tasks attempted by Boyan and Moore, and for one that is significantly larger. Moreover, I will present evidence that we cannot get by with simpler "Monte Carlo" (lambda=1) methods; they perform substantially worse on these tasks. I will conclude by speculating on what might be required of a function approximator in order to assure good convergence.
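
For concreteness, the sketch below illustrates the kind of bootstrapping method the abstract refers to: TD(lambda) prediction with a linear function approximator on a small random-walk problem. It is a minimal illustrative example, not the speaker's code; the one-hot features, step sizes, and the random-walk task itself are assumptions standing in for the larger approximators (e.g., neural networks) and control tasks discussed in the talk. With lambda=1 the update behaves like a Monte Carlo method; with lambda<1 each estimate is built in part from other estimates.

    import numpy as np

    def td_lambda(n_states=5, n_episodes=200, alpha=0.1, gamma=1.0, lam=0.9, seed=0):
        """TD(lambda) value prediction with linear function approximation
        on a simple random walk (illustrative sketch only)."""
        rng = np.random.default_rng(seed)
        w = np.zeros(n_states)                     # weights of the linear value function
        phi = np.eye(n_states)                     # one-hot feature vector for each state

        for _ in range(n_episodes):
            s = n_states // 2                      # start in the middle state
            z = np.zeros(n_states)                 # eligibility trace
            done = False
            while not done:
                s_next = s + rng.choice([-1, 1])   # random step left or right
                done = s_next < 0 or s_next >= n_states
                r = 1.0 if s_next >= n_states else 0.0   # reward only at the right terminal
                v = w @ phi[s]
                v_next = 0.0 if done else w @ phi[s_next]
                delta = r + gamma * v_next - v     # TD error: estimate built from another estimate
                z = gamma * lam * z + phi[s]       # accumulate the trace for the visited state
                w += alpha * delta * z             # update all recently visited weights
                s = s_next
        return w

    if __name__ == "__main__":
        # For this 5-state walk the true values are 1/6, 2/6, ..., 5/6.
        print(td_lambda(lam=0.9))

Rerunning with lam=1.0 gives the Monte Carlo-style variant contrasted in the abstract; the bootstrapping (lam<1) and Monte Carlo (lam=1) cases differ only in how far credit is propagated back along the trace.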