CALL FOR PARTICIPATION

REINFORCEMENT LEARNING: TO MODEL OR NOT TO MODEL, THAT IS THE QUESTION

Workshop at the Fourteenth International Conference on Machine Learning (ICML-97)
Vanderbilt University, Nashville, TN
July 12, 1997

www.cs.cmu.edu/~ggordon/ml97ws

Recently there has been some disagreement in the reinforcement learning
community about whether finding a good control policy is helped or
hindered by learning a model of the system to be controlled. Recent
reinforcement learning successes (Tesauro's TD-gammon, Crites' elevator
control, Zhang and Dietterich's space-shuttle scheduling) have all been
in domains where a human-specified model of the target system was known
in advance, and all have made substantial use of that model. On the
other hand, real robot systems have learned tasks both by model-free
methods and via learned models. The debate has been exacerbated by the
lack of fully satisfactory algorithms on either side for comparison.

Topics for discussion include (but are not limited to):

o Case studies in which a learned model either contributed to or
  detracted from the solution of a control problem. In particular, does
  one method have better data efficiency? Time efficiency? Space
  requirements? Final control performance? Scaling behavior?

o Computational techniques for finding a good policy given a model from
  a particular class -- that is, what are good planning algorithms for
  each class of models?

o Approximation results of the form: if the real system is in class A,
  and we approximate it by a model from class B, we are guaranteed to
  get "good" results as long as we have "sufficient" data.

o Equivalences between techniques of the two sorts: for example, if we
  learn a policy of type A by direct method B, it is equivalent to
  learning a model of type C and computing its optimal controller.

o How to take advantage of uncertainty estimates in a learned model.

o Direct algorithms combine their knowledge of the dynamics and the
  goals into a single object, the policy. Thus, they may have more
  difficulty than indirect methods if the goals change (the "lifelong
  learning" question). Is this an essential difficulty?

o Does the need for an online or incremental algorithm interact with
  the choice between direct and indirect methods?

Preliminary schedule of talks:

 9:00- 9:30  Chris Atkeson
             "Introduction"
 9:30-10:15  Jeff Schneider
             "Exploiting Model Uncertainty Estimates for Safe Dynamic
              Control Learning"
10:15-10:45  Discussion break
10:45-11:15  David Andre, Nir Friedman, and Ronald Parr
             "Generalized Prioritized Sweeping"
11:15-12:00  Scott Davies, Andrew Y. Ng, and Andrew Moore
             "Applying Model-Based Search to Reinforcement Learning"
12:00- 1:00  LUNCH BREAK
 1:00- 1:45  Rich Sutton
             "Multi-Time Models: A Unified View of Modeling and Not
              Modeling"
 1:45- 2:15  Doina Precup and Rich Sutton
             "Multi-time Models for Reinforcement Learning"
 2:15- 2:45  Howell, Frost, Gordon, and Wu
             "Real-Time Learning of Vehicle Suspension Control Laws"
 2:45- 3:15  Discussion break
 3:15- 3:45  Leonid Kuvayev and Rich Sutton
             "Approximation in Model-Based Learning"
 3:45- 4:15  Geoff Gordon
             "Wrap-up"
 4:15- 5:00  Discussion

Organizers:

  Chris Atkeson (cga@cc.gatech.edu)
  College of Computing
  Georgia Institute of Technology
  801 Atlantic Drive
  Atlanta, GA 30332-0280

  Geoff Gordon (ggordon@cs.cmu.edu)
  Computer Science Department
  Carnegie Mellon University
  5000 Forbes Ave
  Pittsburgh, PA 15213-3891
  (412) 268-3613, (412) 361-2893

Contact: Geoff Gordon (ggordon@cs.cmu.edu)