Workshop Call for Papers

Twelfth International Conference on Machine Learning

Value Function Approximation in Reinforcement Learning

July 9, 1995

Granlibakken Resort, Tahoe City, California, U.S.A.

This workshop will explore the issues that arise in reinforcement learning when the value function cannot be learned exactly, but must be approximated. It has long been recognized that approximation is essential on large, real-world problems because the state space is too large to permit table-lookup approaches. In addition, we need to generalize from past experiences to future ones, which inevitably involves making approximations. In principle, all methods for learning from examples are relevant here, but in practice only a few have been tried, and fewer still have been effective. The objective of this workshop is to bring together all the strands of reinforcement learning research that bear directly on the issue of value function approximation. We hope to survey what works and what doesn't, and to achieve a better understanding of what makes value function approximation special as a learning-from-examples problem.

  • Overview
  • Workshop Format
  • Who Should Attend
  • Submission Information
  • Important Dates
  • Organizers
  • Contact Address
    Overview

    The key computational idea underlying reinforcement learning is the iterative approximation of the value function---the mapping from states (or state-action pairs) to an estimate of the long-term future reward obtainable from that state. For large problems, the approximation of the value function must involve generalization from examples to reduce memory requirements and training time. Moreover, generalization may help beat the curse of dimensionality for problems in which the complexity of the value function increases sub-exponentially with the number of state variables. Generalizing function approximators have been used effectively in reinforcement learning as far back as Samuel's checker player, which used a linear approximator, and Michie and Chambers' BOXES system, which used state aggregation. Tesauro's TD-Gammon, which used a backpropagation network, provides a tantalizing recent demonstration of just how effective this can be.
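
    To make the key idea concrete, here is a minimal sketch (not code from the workshop or its submissions) of TD(0) value prediction with a linear function approximator, V(s) ~ w . phi(s), on a small random-walk task; the environment, features, step size, and episode count are illustrative assumptions:

        import numpy as np

        N_STATES = 5        # non-terminal states 0..4, with terminals off both ends
        GAMMA = 1.0         # undiscounted episodic task
        ALPHA = 0.1         # step size

        def phi(s):
            # One-hot features; with these, TD(0) reduces to the tabular rule.
            x = np.zeros(N_STATES)
            x[s] = 1.0
            return x

        def step(s, rng):
            # Random walk: move left or right with equal probability;
            # reward +1 only on exiting to the right, 0 otherwise.
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:
                return None, 0.0        # left terminal
            if s2 >= N_STATES:
                return None, 1.0        # right terminal
            return s2, 0.0

        rng = np.random.default_rng(0)
        w = np.zeros(N_STATES)          # weights of the linear approximator
        for episode in range(2000):
            s = N_STATES // 2           # start in the middle state
            while s is not None:
                s2, r = step(s, rng)
                v_next = 0.0 if s2 is None else w @ phi(s2)
                td_error = r + GAMMA * v_next - w @ phi(s)
                w += ALPHA * td_error * phi(s)      # TD(0) update on the weights
                s = s2

        print(np.round(w, 2))   # drifts toward the true values [1/6, 2/6, ..., 5/6]

    Replacing the one-hot features with coarser, generalizing features is exactly where the questions discussed in this workshop arise.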

    However, almost all of the theory of reinforcement learning depends on the assumption of a tabular representation of the value function, for which generalization is impossible. Moreover, several researchers have argued that there are serious hazards intrinsic to approximating value functions with reinforcement learning methods. Just how serious these hazards are remains an important open question. In this workshop we will survey the substantial recent and ongoing work pertaining to this question, and we will seek to answer it or, failing that, to identify the remaining outstanding issues.
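
    To illustrate the kind of hazard at issue, here is a minimal sketch (with assumed feature values, step size, and discount factor, not an example drawn from the workshop submissions): a single transition between two states whose linear value estimates share one weight, trained on repeatedly in isolation, drives that weight to infinity.

        # Two states share one weight w: V(s1) = 1*w and V(s2) = 2*w.
        # Repeatedly applying the TD(0) update to the transition s1 -> s2
        # (reward 0) multiplies w by 1 + ALPHA*(2*GAMMA - 1) each time,
        # which exceeds 1 whenever GAMMA > 0.5, so w diverges.
        GAMMA = 0.99
        ALPHA = 0.1

        w = 1.0
        for k in range(51):
            td_error = 0.0 + GAMMA * (2.0 * w) - (1.0 * w)   # r + gamma*V(s2) - V(s1)
            w += ALPHA * td_error * 1.0                      # feature of s1 is 1
            if k % 10 == 0:
                print(k, round(w, 3))                        # grows without bound

    Whether, and under what training regimes, such blow-ups occur with realistic function approximators is part of what the workshop aims to clarify.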

    If the existing theory of reinforcement learning no longer applies when an approximate value function is learned, what are we to do? The workshop will explore the problem and the following possible responses, among others:

    1. Ignore the problem and proceed empirically, perhaps discovering as we build applications that some seem to work and some don't. A danger with this approach is that successes will be reported loudly and failures whispered quietly, and we may become a community of tweakers.
    2. Analyze what properties function approximators need in order to work well with current reinforcement learning algorithms: are there function approximators specially suited to reinforcement learning?
    3. Invent new reinforcement learning methods that are specifically designed to work well with function approximators.
    4. Invent new theory for value function approximation. What bounds can be placed on the errors? Can particular function approximators give us better bounds or stability guarantees? Can online training?
    5. Ask what we can learn from the literature of other fields.

    Workshop Format

    Several focused sessions of short (~15-minute) talks, each followed by moderated discussion.

    Who Should Attend

    All researchers with empirical or theoretical experience with value function approximation in reinforcement learning or dynamic programming. As an expression of interest, please submit as soon as possible a short (one paragraph to one page) statement of your research interests in value function approximation, or a copy of a paper you have written in this area. We will use this material to help select and organize the sessions.

    If you would like to make a presentation at the workshop, please also submit an extended abstract (2-5 pages) by May 1, 1995.

    Submission Information

    Electronic submissions (ASCII or PostScript) are preferred. Submissions by e-mail or post should be sent to the contact address below by May 1, 1995.

    Important Dates

    Submissions due: May 1, 1995
    Workshop: July 9, 1995

    Organizers

    Andrew Moore (chair) - Carnegie Mellon University
    Rich Sutton - Stow Research
    Justin Boyan - Carnegie Mellon University

    Program Committee

    John Tsitsiklis - MIT, Lab for Information and Decision Sciences
    Satinder Singh - MIT, Brain and Cognitive Sciences
    Leemon Baird - Wright-Patterson Air Force Base

    Contact Address

    Andrew W. Moore
    Smith Hall 221
    Robotics Institute
    Carnegie Mellon University
    Pittsburgh, PA 15213
    awm@cs.cmu.edu

    http://www.cs.cmu.edu:8001/Web/Groups/reinforcement/ml95/
