Center for the Neural Basis of Cognition
Carnegie Mellon University and University of Pittsburgh
NSF LIS 9720350
Many parts of the mammalian brain are known to contribute to incremental learning. Although the cortex may play a central role, other areas that make vital contributions include the basal ganglia, hippocampus, amygdala, and cerebellum.
It is important to understand the roles of these various areas and their interactions with each other during learning. The development of theories of incremental learning will provide a better understanding of how the process occurs and may result in improved approaches to the development of skills in both humans and machines.
| Name | Role | Institution | Department |
| --- | --- | --- | --- |
| David S. Touretzky | PI | CMU | Computer Science |
| Julie A. Fiez | co-PI | Pitt | Psychology |
| Tai Sing Lee | co-PI | CMU | Computer Science |
| James McClelland | co-PI | CMU | Psychology (and CNBC co-director) |
| William Skaggs | co-PI | Pitt | Neuroscience |
| Nathaniel D. Daw | grad | CMU | Computer Science |
| Adam G. Thomas | grad | Pitt | Neuroscience |
| Cyrus McCandless | grad | Pitt | Neuroscience |
| Mauricio Delgado | grad | Pitt | Neuroscience |
| Lisa M. Saksida | grad | CMU | Robotics |
| Stella X. Yu | grad | CMU | Robotics |
| Greg Armstrong | technician | CMU | Robotics |
The temporal difference (TD) learning rule of Sutton and Barto has shown some success in explaining data on dopamine neuron responses in the substantia nigra and ventral tegmental area of primates. This model predicts cumulative discounted reward using exponential discounting.
However, behavioral studies with varying reward schedules suggest that animals seek to maximize rate of reward rather than exponentially discounted future reward.
Tsitsiklis (1997) recently proposed a rate maximization version of TD.
Adopting this rate maximization rule, Daw and Touretzky developed a simulation of animal learning as an alternative to the exponential discounting model.
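For reference, the two prediction errors being contrasted can be written in their standard textbook forms. This is a sketch under stated assumptions, not necessarily the exact notation used in the simulation: $V(t)$ is the value estimate at time $t$, $r(t)$ the reward, $\gamma$ the discount factor, $\rho$ the estimated average reward per time step, and $\eta$ a small learning rate for $\rho$.

$$
\delta_{\text{disc}}(t) = r(t) + \gamma\, V(t+1) - V(t)
$$

$$
\delta_{\text{rate}}(t) = r(t) - \rho + V(t+1) - V(t), \qquad \rho \leftarrow \rho + \eta\,\bigl(r(t) - \rho\bigr)
$$

In the first form, future rewards are shrunk by $\gamma$ at each step. In the second, nothing is discounted; instead each reward is measured relative to the running average rate $\rho$.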
Results for rate maximization are as good as the original TD model at matching dopamine neuron response data:
This graph shows the dopamine error signal calculated using the rate maximization equation above. It is virtually indistinguishable from the graph in Montague, Dayan, and Sejnowski (1996), J. Neurosci. 16(5):1936-1947, where exponential discounting was used.
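The similarity of the two error signals can be illustrated with a toy simulation. The following is a minimal sketch, not the Daw and Touretzky simulation itself: the trial layout (25 time steps, cue at step 5, reward at step 15), the learning rates, and the tapped-delay-line representation that runs from cue onset to reward are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_td(rule, n_trials=500, n_steps=25, cue_t=5, reward_t=15,
                alpha=0.3, gamma=0.98, eta=0.01):
    """Run repeated cue-reward trials and return the TD error traces of the
    first and last trial. All task and learning parameters are illustrative."""
    n_taps = reward_t - cue_t + 1   # one weight per time step from cue to reward
    w = np.zeros(n_taps)            # value weights on the tapped delay line
    rho = 0.0                       # running estimate of reward per time step

    def value(t):
        k = t - cue_t               # time elapsed since cue onset
        return w[k] if 0 <= k < n_taps else 0.0

    traces = []
    for _ in range(n_trials):
        delta = np.zeros(n_steps)
        for t in range(n_steps):
            r = 1.0 if t == reward_t else 0.0
            if rule == "average":
                # Rate maximization: reward is measured relative to rho.
                delta[t] = r - rho + value(t + 1) - value(t)
                rho += eta * (r - rho)
            elif rule == "discounted":
                # Exponential discounting of the next value estimate.
                delta[t] = r + gamma * value(t + 1) - value(t)
            else:
                raise ValueError("rule must be 'average' or 'discounted'")
            k = t - cue_t
            if 0 <= k < n_taps:
                w[k] += alpha * delta[t]   # update the value of the current state
        traces.append(delta)
    return traces[0], traces[-1]

first_avg, last_avg = simulate_td("average")
first_disc, last_disc = simulate_td("discounted")
```

In this sketch the error at index `cue_t - 1` is computed on the transition into the cue state, and the error at index `reward_t` is computed when the reward is delivered. Under both rules the error starts out at the time of reward and, after training, appears at cue onset instead. The main visible difference is that under the rate rule the error sits slightly below zero (at about `-rho`) during intervals that carry no prediction.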
The goal is to detect changes in striatal activity as the rat learns a lever press task.

Pictured above is the Neuralynx Cheetah parallel recording setup in the Skaggs lab. This system can record simultaneously from 12 tetrodes. The purple boxes at bottom are amplifiers; the plugboard is used to configure the EEG and tetrode channels for data collection. At right is a Sun SparcStation for recording the data and analyzing it offline.

GPe: globus pallidus external segment; GPi: globus pallidus internal segment; PFC: prefrontal cortex; SNpc: substantia nigra pars compacta (has dopamine cells); SNpr: substantia nigra pars reticulata; STN: subthalamic nucleus; VP: ventral pallidum; VTA: ventral tegmental area (has dopamine cells).

Nine subjects were run on a 1.5-T GE Signa whole-body scanner, using a 2-interleave spiral pulse sequence. TR = 1500 msec. TE = 34 msec.
Results:




See the other CNBC LIS project for discussion of experimental work by McClelland and colleagues on language remediation (L/R discrimination in native Japanese speakers), based on this theory of cortical plasticity.
The theory states that speakers may fail to discriminate between two similar stimuli, despite repeated exposure, because the "feature map" is trapped in a local minimum in weight space that classifies the stimuli identically.
Exposure to versions of the stimuli that have been artificially pulled further apart, emphasizing the differences between them (a technique successfully used by Merzenich, Tallal, and colleagues in humans and primates), should get the system out of the local minimum.
Complementing the experimental work, McClelland and Thomas are seeking, as part of this LIS project, to develop more biologically realistic versions of their model of cortical learning. This will facilitate comparison of the model with physiological experiments on learning in visual cortex being done by Tai Sing Lee in his portion of this LIS project.

Examples of inputs generated from the prototypes used in each simulation: (A) four corner prototypes, (B) the two overlapping prototypes, (C) the two exaggerated prototypes, and (D) a single center prototype. The two stimuli in (B) are initially confused by the model, but training on (C) allows the model to discriminate (B). Cases (E) through (G) are degraded versions of (A) through (C), generated with a greater spread for the Gaussian.
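To make the idea of overlapping versus exaggerated prototypes concrete, here is a minimal sketch that builds Gaussian input patterns and measures how similar two of them are. The 7 x 7 grid, the prototype coordinates, and the use of cosine similarity as an overlap measure are assumptions for illustration only; they are not taken from the McClelland and Thomas model.

```python
import numpy as np

def make_input(center, spread, size=7):
    """Gaussian bump of activation on a size x size grid, peaked at `center`
    (row, col). `spread` is the standard deviation of the Gaussian."""
    rows, cols = np.mgrid[0:size, 0:size]
    dist_sq = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist_sq / (2.0 * spread ** 2))

# Hypothetical prototype locations, loosely following cases (A) through (D).
PROTOTYPES = {
    "corners": [(1, 1), (1, 5), (5, 1), (5, 5)],
    "overlapping": [(3, 2.5), (3, 3.5)],
    "exaggerated": [(3, 1), (3, 5)],
    "center": [(3, 3)],
}

def overlap(a, b):
    """Cosine similarity between two input patterns (1.0 means identical)."""
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def pair_overlap(name, spread):
    """Overlap between the two inputs generated from a two-prototype case."""
    first, second = (make_input(c, spread) for c in PROTOTYPES[name])
    return overlap(first, second)

close_pair = pair_overlap("overlapping", spread=1.0)
far_pair = pair_overlap("exaggerated", spread=1.0)
degraded_far_pair = pair_overlap("exaggerated", spread=2.0)
```

Under these assumptions the overlapping pair shares most of its activation, which is the situation in which a feature map can settle on a single response for both stimuli, while the exaggerated pair shares very little. Increasing the spread, as in the degraded cases, raises the overlap again.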

In one experiment, monkeys are being trained to detect a pop-out stimulus in a field of homogeneous stimuli. The pop-out detection performance of the monkey is a function of the orientation difference between the pop-out stimulus and the distractor stimuli.
(a) No difference (b) 10° pop-out (c) 45° pop-out
From previous experiments, Lee has found that the neural correlates of the pop-out signal emerge in the later part of V1 neurons' responses. The following graph shows the population response in V1, comparing a homogeneous field such as (a) above with an obvious pop-out stimulus (c). The pop-out response emerges around 80 msec after stimulus onset. The case marked "single" denotes presentation of a single stimulus element instead of a stimulus in an array of distractors. Responses for stimulus arrays are lower than for single stimuli, presumably due to lateral inhibition in cortex.
The monkey is trained to detect the pop-out target at one location, while the other location is left untrained. The orientation difference between the pop-out stimulus and the distractor stimuli is decreased over time, as the monkey grows more proficient.
The hypothesis is that shaping the performance of the monkey this way will induce a change in the later part of the responses of the neurons at the trained location, but not the untrained location. For example, if there is no differential response between cases (a) and (b) initially, with training, we should be able to see a difference in the later part of the V1 neurons' responses.
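The shaping procedure described above, in which the orientation difference shrinks as performance improves, can be sketched as a simple block-wise rule. This is a hypothetical schedule for illustration only: the accuracy criterion, the shrink factor, the floor, and the function names are assumptions and do not describe the actual training protocol.

```python
def next_orientation_difference(current_deg, block_accuracy,
                                criterion=0.8, shrink=0.75, floor_deg=5.0):
    """Return the orientation difference (degrees) for the next block.
    The difference shrinks only when the last block met the accuracy criterion,
    and never drops below `floor_deg`. All thresholds are hypothetical."""
    if block_accuracy >= criterion:
        return max(floor_deg, current_deg * shrink)
    return current_deg

def shaping_schedule(block_accuracies, start_deg=45.0):
    """Orientation difference used in each block, given per-block accuracies."""
    diffs = [start_deg]
    for accuracy in block_accuracies:
        diffs.append(next_orientation_difference(diffs[-1], accuracy))
    return diffs

# Example: the task gets harder after blocks 1 and 3, but not after block 2.
schedule = shaping_schedule([0.9, 0.6, 0.85])
```

A rule of this kind keeps the task near the limit of the animal's current ability, which is the sense in which performance is "shaped" toward finer orientation differences.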
Lee is exploring two methods of recording for this task. One is the traditional single electrode, single-unit recording in which he records from a large number of cells before, during, and after training. However, the cells recorded in each session will be different, so one must look for changes in the population response.
The other recording method is parallel multi-unit recording using a microelectrode array.
This method could make it possible to monitor multiple cells in parallel over a long period of time.
The ability to monitor the evolution of basic properties of the cells as the training progresses should provide deeper insights into the biological mechanisms underlying perceptual shaping, an important kind of incremental learning.

Dave Touretzky Last modified: Sat Dec 15 02:42:33 EST 2001