Next: Acknowledgments Up: Reinforcement Learning: A Survey Previous: Robotics and Control



A variety of reinforcement-learning techniques work effectively on small problems, but very few of them scale well to larger ones. This is not because researchers have done a bad job of inventing learning techniques, but because it is very difficult to solve arbitrary problems in the general case. To solve highly complex problems, we must give up tabula rasa learning techniques and begin to incorporate bias that will give leverage to the learning process.

The necessary bias can come in a variety of forms, including the following:

shaping:
The technique of shaping is used in training animals [45]: a teacher presents very simple problems to solve first, then gradually exposes the learner to more complex problems. Shaping has been used in supervised-learning systems, and can be used to train hierarchical reinforcement-learning systems from the bottom up [59], and to alleviate problems of delayed reinforcement by decreasing the delay until the problem is well understood [37, 38].
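As a concrete illustration, a shaping schedule can be sketched as a loop over tasks of increasing difficulty. The routine `learn_episode` and the consecutive-success criterion below are hypothetical stand-ins for whatever learner and task family are at hand:

```python
def train_with_shaping(learn_episode, difficulties, required_successes=3):
    """Present tasks in order of increasing difficulty (shaping).

    `learn_episode(d)` is an assumed user-supplied routine that runs one
    learning episode at difficulty `d` and returns True on success.  The
    learner stays at each difficulty until it succeeds several times in
    a row, then graduates to the next, harder version of the problem.
    """
    completed = []
    for d in difficulties:
        streak = 0
        while streak < required_successes:
            # Reset the streak on failure; advance only when reliable.
            streak = streak + 1 if learn_episode(d) else 0
        completed.append(d)
    return completed
```

The graduation criterion (a fixed success streak) is only one of many possible choices; any measure of competence at the current difficulty would serve.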

local reinforcement signals:
Whenever possible, agents should be given reinforcement signals that are local. In applications in which it is possible to compute a gradient, rewarding the agent for taking steps up the gradient, rather than just for achieving the final goal, can speed learning significantly [73].
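For instance, when a distance-to-goal measure is available, a local signal of this kind might be sketched as follows; the distance arguments and the small step cost are illustrative assumptions, not details from the cited work:

```python
def shaped_reward(goal_reached, old_dist, new_dist, step_cost=0.01):
    """Local reinforcement: reward progress up the gradient toward the
    goal, not only arrival at the goal itself.

    `old_dist` and `new_dist` are the agent's distances to the goal
    before and after the step; the distance function is assumed to be
    computable by the agent.  The return value is positive when the
    step reduced the distance, negative when it increased it.
    """
    if goal_reached:
        return 1.0
    return (old_dist - new_dist) - step_cost
```

Because every step yields informative feedback, the agent need not stumble all the way to the goal before learning anything.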

imitation:
An agent can learn by ``watching'' another agent perform the task [59]. For real robots, this requires perceptual abilities that are not yet available. But another strategy is to have a human supply appropriate motor commands to a robot through a joystick or steering wheel [89].
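One simple way to exploit such human-supplied commands, sketched here with hypothetical names and values, is to seed a value table optimistically along the teacher's trajectory, so that early exploration is biased toward the demonstrated behavior:

```python
def seed_from_demonstration(demo, bonus=0.5):
    """Initialize a Q-table from a demonstrated trajectory.

    `demo` is a list of (state, action) pairs supplied by a teacher
    (e.g. recorded from a joystick).  Each demonstrated pair receives a
    small optimistic value, so a greedy learner initially prefers the
    teacher's actions; subsequent learning updates can override the
    seed wherever the teacher's choices turn out to be suboptimal.
    """
    q = {}
    for state, action in demo:
        q[(state, action)] = q.get((state, action), 0.0) + bonus
    return q
```

The bonus value is an arbitrary illustrative constant; in practice it would be tuned against the scale of the task's rewards.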

problem decomposition:
Decomposing a huge learning problem into a collection of smaller ones and providing useful reinforcement signals for the subproblems is a very powerful way to bias learning. Most interesting examples of robotic reinforcement learning employ this technique to some extent [28].
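A minimal sketch of such a decomposed signal, under the illustrative simplifying assumption that the subproblems can be expressed as an ordered sequence of subgoal states:

```python
def subtask_reward(state, subgoals, index):
    """Reinforcement signal for a problem decomposed into subgoals.

    `subgoals` is the ordered list of subgoal states and `index` is the
    agent's current subtask.  The agent is rewarded for completing its
    *current* subgoal, yielding a much denser signal than a single
    reward delivered only at the final goal.  Returns the reward and
    the (possibly advanced) subtask index.
    """
    if index < len(subgoals) and state == subgoals[index]:
        return 1.0, index + 1  # subgoal reached: reward and advance
    return 0.0, index
```

Real decompositions are rarely this clean, but the principle is the same: each subproblem gets its own informative reinforcement.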

reflexes:
One thing that keeps agents that know nothing from learning anything is that they have a hard time even finding the interesting parts of the space: they wander around at random without ever getting near the goal, or they are always ``killed'' immediately. These problems can be ameliorated by programming a set of ``reflexes'' that cause the agent to act initially in some way that is reasonable [73, 107]. These reflexes can eventually be overridden by more detailed and accurate learned knowledge, but they at least keep the agent alive and pointed in the right direction while it is trying to learn. Recent work by Millan [78] explores the use of reflexes to make robot learning safer and more efficient.
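The override behavior can be sketched as follows; the tabular value representation and the confidence threshold are assumptions made for illustration, not a prescription from the cited work:

```python
def act(state, q_values, reflex, threshold=0.1):
    """Choose an action, falling back on a hand-coded reflex until the
    learned values for this state are informative enough.

    `q_values` maps (state, action) pairs to learned values; `reflex`
    is an assumed hand-coded policy mapping a state to an action.  The
    reflex keeps the agent behaving sensibly early on and is overridden
    once some learned value for the state exceeds `threshold`.
    """
    candidates = {a: v for (s, a), v in q_values.items() if s == state}
    if not candidates or max(candidates.values()) < threshold:
        return reflex(state)  # learned values not yet trusted
    return max(candidates, key=candidates.get)
```

Early in learning the reflex dominates; as values for a state grow past the threshold, the learned policy takes over state by state.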

With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. Much work remains, however, and many interesting questions are open, both for learning techniques themselves and especially for methods of approximating, decomposing, and incorporating bias into problems.


Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996