Dave Touretzky, Nathaniel Daw
We are developing computational theories of operant conditioning and applying them to train mobile robots to execute complex behaviors. This project also aims to elucidate the neuroscience of animal learning by mapping features of the computational model onto their anatomical substrates in the animal brain.
Unlike classical (Pavlovian) conditioning, in which animals learn simple associations between pairs of stimuli, operant conditioning is the process by which animals learn to interact with their environments to produce rewarding outcomes.
Modeling phenomena described in the instrumental learning literature serves a dual purpose. It should improve the abilities of robot learners, and it should also yield a fresh, computationally oriented perspective on animal learning and its neural bases.
While classical conditioning has a well-developed theory, implemented in the Rescorla-Wagner model and its descendants, there is at present no comprehensive theory of operant conditioning. Moreover, mobile robots trained by reinforcement learning methods such as Q-learning have not come close to matching the sophistication and versatility of animal learners. Efforts toward understanding instrumental learning through computer simulations have, to this point, addressed only elementary phenomena such as encouraging or suppressing a single motor action. And while much progress has recently been made in understanding the neural basis of learning from reward, the theory underlying this research is grounded in reinforcement learning, and thus appears too simple to explain real animal behavior.
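For comparison with the open problem in operant conditioning, the Rescorla-Wagner model mentioned above can be sketched in a few lines. This is a minimal sketch assuming the textbook form of the rule, ΔV_X = α_X β (λ − ΣV), where the sum runs over every cue present on a trial. The function name `rescorla_wagner`, the salience values, and the trial counts are hypothetical choices for illustration and are not part of this project's models. The example reproduces blocking, a classic phenomenon the model explains: a cue B added to an already predictive cue A acquires almost no associative strength.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# One trial: the set of conditioned stimuli (cues) present, and whether the US occurred.
Trial = Tuple[Set[str], bool]


def rescorla_wagner(trials: Iterable[Trial],
                    alpha: Dict[str, float],
                    beta: float = 1.0,
                    lam: float = 1.0,
                    V: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Hypothetical sketch of the Rescorla-Wagner rule.

    alpha: salience of each cue; beta: learning rate tied to the US;
    lam: maximum associative strength the US supports;
    V: optional starting strengths (copied, not mutated).
    """
    V = dict(V) if V is not None else {cue: 0.0 for cue in alpha}
    for cues, us_present in trials:
        target = lam if us_present else 0.0
        # The prediction error is shared by all cues present on this trial.
        prediction = sum(V[c] for c in cues)
        error = target - prediction
        for c in cues:
            V[c] += alpha[c] * beta * error
    return V


alpha = {"A": 0.3, "B": 0.3}  # illustrative saliences (assumption)

# Blocking: A is first paired with the US alone, then A and B together.
after_phase1 = rescorla_wagner([({"A"}, True)] * 100, alpha)
blocked = rescorla_wagner([({"A", "B"}, True)] * 100, alpha, V=after_phase1)

# Control: A and B are paired with the US together from the start.
control = rescorla_wagner([({"A", "B"}, True)] * 100, alpha)

print(f"blocked: V_B = {blocked['B']:.3f}")
print(f"control: V_B = {control['B']:.3f}")
```

Because every present cue is updated from the same summed prediction error, A's strength near λ leaves almost no error for B to learn from in the blocking condition, whereas in the control the two cues split λ between them.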
We have constructed two symbolic-level models of animal learning and tested them on an RWI B21 mobile robot. The first ([3],[4]), a "bottom-up" model, describes the process by which animals build up sequences of complex actions out of simple constituents. On the robot, this model was able to learn a variant of the classic Delayed Match to Sample task. The second ([2],[1]), a "top-down" model, addresses the ways in which existing behaviors can be shaped through a process of "behavior editing" to achieve new goals. Using this model, we were able to teach the mobile robot to play "fetch." We have made some preliminary attempts to identify areas of the animal brain associated with the different types of learning that our computational models postulate contribute to conditioning. These studies have concentrated primarily on a group of structures known as the basal ganglia.
We are working on a third, unified theory of conditioning which combines elements from the previous "top-down" and "bottom-up" models along with a new focus on perceptual and predictive learning. Because this model will combine for the first time several different kinds of learning which we postulate contribute to animal conditioning, it should also provide a strong basis for identifying different functional systems within the animal brain.