CALL FOR PARTICIPATION

REINFORCEMENT LEARNING: TO MODEL OR NOT TO MODEL, THAT IS THE QUESTION

Workshop at the Fourteenth International Conference on Machine Learning (ICML-97)
Vanderbilt University, Nashville, TN
July 12, 1997

www.cs.cmu.edu/~ggordon/ml97ws

Recently there has been some disagreement in the reinforcement learning
community about whether finding a good control policy is helped or
hindered by learning a model of the system to be controlled. Recent
reinforcement learning successes (Tesauro's TD-gammon, Crites' elevator
control, Zhang and Dietterich's space-shuttle scheduling) have all been
in domains where a human-specified model of the target system was known
in advance, and all have made substantial use of that model. On the
other hand, real robot systems have learned tasks both by model-free
methods and via learned models. The debate has been exacerbated by the
lack of fully satisfactory algorithms on either side for comparison.

Topics for discussion include (but are not limited to):

o Case studies in which a learned model either contributed to or
  detracted from the solution of a control problem. In particular, does
  one method have better data efficiency? Time efficiency? Space
  requirements? Final control performance? Scaling behavior?

o Computational techniques for finding a good policy given a model from
  a particular class -- that is, what are good planning algorithms for
  each class of models?

o Approximation results of the form: if the real system is in class A,
  and we approximate it by a model from class B, we are guaranteed to
  get "good" results as long as we have "sufficient" data.

o Equivalences between techniques of the two sorts: for example, if we
  learn a policy of type A by direct method B, it is equivalent to
  learning a model of type C and computing its optimal controller.

o How to take advantage of uncertainty estimates in a learned model.

o Direct algorithms combine their knowledge of the dynamics and the
  goals into a single object, the policy. Thus, they may have more
  difficulty than indirect methods if the goals change (the "lifelong
  learning" question). Is this an essential difficulty?

o Does the need for an online or incremental algorithm interact with
  the choice between direct and indirect methods?

Preliminary schedule of talks:

 9:00- 9:30  Chris Atkeson
             "Introduction"
 9:30-10:15  Jeff Schneider
             "Exploiting Model Uncertainty Estimates for Safe Dynamic
              Control Learning"
10:15-10:45  Discussion break
10:45-11:15  David Andre, Nir Friedman, and Ronald Parr
             "Generalized Prioritized Sweeping"
11:15-12:00  Scott Davies, Andrew Y. Ng, and Andrew Moore
             "Applying Model-Based Search to Reinforcement Learning"
12:00- 1:00  LUNCH BREAK
 1:00- 1:45  Rich Sutton
             "Multi-Time Models: A Unified View of Modeling and Not
              Modeling"
 1:45- 2:15  Doina Precup and Rich Sutton
             "Multi-time Models for Reinforcement Learning"
 2:15- 2:45  Howell, Frost, Gordon, and Wu
             "Real-Time Learning of Vehicle Suspension Control Laws"
 2:45- 3:15  Discussion break
 3:15- 3:45  Leonid Kuvayev and Rich Sutton
             "Approximation in Model-Based Learning"
 3:45- 4:15  Geoff Gordon
             "Wrap-up"
 4:15- 5:00  Discussion

Organizers:

  Chris Atkeson (cga@cc.gatech.edu)
  College of Computing
  Georgia Institute of Technology
  801 Atlantic Drive
  Atlanta, GA 30332-0280

  Geoff Gordon (ggordon@cs.cmu.edu)
  Computer Science Department
  Carnegie Mellon University
  5000 Forbes Ave
  Pittsburgh, PA 15213-3891
  (412) 268-3613, (412) 361-2893

Contact: Geoff Gordon (ggordon@cs.cmu.edu)