Skinnerbots

Dave Touretzky, Nathaniel Daw

Problem:

We are developing computational theories of operant conditioning and applying them to train mobile robots to execute complex behaviors. This project also aims to elucidate the neuroscience of animal learning by mapping features of the computational model onto their anatomical substrates in the animal brain.

Unlike classical (Pavlovian) conditioning, in which animals learn simple associations between pairs of stimuli, operant conditioning involves animals learning to interact with their environments in order to produce rewarding outcomes.

Impact:

An attempt to model phenomena described in the instrumental learning literature will serve the dual purpose of improving the abilities of robot learners while at the same time yielding a fresh, computationally-oriented perspective on animal learning and its neural bases.

State of the Art:

While classical conditioning has a well-developed theory, implemented in the Rescorla-Wagner model and its descendants, there is at present no comprehensive theory of operant conditioning. Moreover, mobile robots trained by reinforcement learning methods such as Q-learning have not come close to matching the sophistication and versatility of animal learners. Efforts toward understanding instrumental learning through computer simulations have, to this point, addressed only elementary phenomena such as encouraging or suppressing a single motor action. And while much progress has recently been made in understanding the neural basis of learning from reward, the theory underlying this research is grounded in reinforcement learning, and thus appears too simple to explain real animal behavior.
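For readers unfamiliar with the two baselines named above, the following is a minimal sketch of their textbook update rules, not the models developed in this project. The function names, the dictionary-based representation, and the parameter values are assumptions made for illustration, and the single learning rate `alpha` stands in for the separate stimulus and reinforcer rate parameters of the original Rescorla-Wagner formulation.

```python
def rescorla_wagner_trial(weights, present, reward, alpha=0.1):
    """Apply one Rescorla-Wagner update (illustrative sketch).

    weights: dict mapping stimulus name -> associative strength
    present: stimuli presented on this trial
    reward:  outcome magnitude on this trial (e.g. 1.0 or 0.0)
    All present stimuli share one prediction error.
    """
    prediction = sum(weights[s] for s in present)
    error = reward - prediction
    for s in present:
        weights[s] += alpha * error
    return error


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update (illustrative sketch).

    q: dict mapping (state, action) -> estimated value; missing entries are 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def blocking_demo(trials=200):
    """Train on light alone, then on light plus tone (hypothetical stimuli)."""
    weights = {"light": 0.0, "tone": 0.0}
    for _ in range(trials):
        rescorla_wagner_trial(weights, ["light"], reward=1.0)
    for _ in range(trials):
        rescorla_wagner_trial(weights, ["light", "tone"], reward=1.0)
    return weights
```

In `blocking_demo`, the light already predicts the reward by the time the tone is introduced, so the shared prediction error is close to zero and the tone acquires almost no associative strength. Both rules adjust scalar values attached to stimuli or to single state-action pairs, which is the level of simplicity the paragraph above refers to.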

Approach:

We have constructed two symbolic-level models of animal learning and tested them on an RWI B21 mobile robot. The first ([3],[4]), a "bottom-up" model, describes the process by which animals build up sequences of complex actions out of simple constituents. On the robot, this model was able to learn a variant of the classic Delayed Match to Sample task. The second ([2],[1]), a "top-down" model, addresses the ways in which existing behaviors can be shaped through a process of "behavior editing" to achieve new goals. Using this model, we were able to teach the mobile robot to play "fetch." We have also made some preliminary attempts to identify the areas of the animal brain associated with the different types of learning that, according to our computational models, contribute to conditioning. These studies have concentrated primarily on a group of structures known as the basal ganglia.

Future Work:

We are working on a third, unified theory of conditioning that combines elements of the earlier "top-down" and "bottom-up" models with a new focus on perceptual and predictive learning. Because this model will combine, for the first time, several different kinds of learning that we postulate contribute to animal conditioning, it should also provide a strong basis for identifying different functional systems within the animal brain.

**Figure 1:** Robot maintainer Greg Armstrong sends a reward signal to Amelia using the wireless mouse in his right hand.

Bibliography

1: Lisa M. Saksida, Scott M. Raymond, and David S. Touretzky.
Shaping robot behavior using principles from instrumental conditioning.
Robotics and Autonomous Systems, 22(3/4):231-249, 1998.
2: Lisa M. Saksida and David S. Touretzky.
Application of a model of instrumental conditioning to mobile robot control.
In Paul S. Schenker and Gerard T. McKee, editors, Sensor Fusion and Decentralized Control in Autonomous Robotic Systems, volume 3209, pages 55-66. SPIE, 1997.
3: David S. Touretzky and Lisa M. Saksida.
Skinnerbots.
In P. Maes, M. Mataric, J. A. Meyer, J. Pollack, and S. W. Wilson, editors, Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 285-294, Cambridge, MA, 1996. MIT Press.
4: David S. Touretzky and Lisa M. Saksida.
Operant conditioning in skinnerbots.
Adaptive Behavior, 5(3/4):219-247, 1997.
