The Biological Basis of Incremental Learning

David S. Touretzky
Julie A. Fiez
Tai Sing Lee
James L. McClelland
William Skaggs

Center for the Neural Basis of Cognition
Carnegie Mellon University and University of Pittsburgh

NSF LIS 9720350


Project Description

This project is concerned with learning that is incremental in nature, resulting in skills that improve over time. Such learning is responsible for the development of perceptual discriminations, stimulus associations, and motor skills, and is seen in both humans and animals. The goal of the research is a systems-level neural theory of incremental learning. The project combines computer simulations with neurophysiological recording from the brains of behaving rats and monkeys, functional magnetic resonance imaging in humans, and robotic implementation.

It is known that many parts of the mammalian brain contribute to incremental learning. Although the cortex may play a central role, other brain areas known to make vital contributions include the basal ganglia, hippocampus, amygdala, and cerebellum.

It is important to understand the roles of these various areas and their interactions with each other during learning. The development of theories of incremental learning will provide a better understanding of how the process occurs and may result in improved approaches to the development of skills in both humans and machines.


Personnel

Name                  Role        Institution  Department
David S. Touretzky    PI          CMU          Computer Science
Julie A. Fiez         co-PI       Pitt         Psychology
Tai Sing Lee          co-PI       CMU          Computer Science
James McClelland      co-PI       CMU          Psychology (and CNBC co-director)
William Skaggs        co-PI       Pitt         Neuroscience

Nathaniel D. Daw      grad        CMU          Computer Science
Adam G. Thomas        grad        Pitt         Neuroscience
Cyrus McCandless      grad        Pitt         Neuroscience
Mauricio Delgado      grad        Pitt         Neuroscience
Lisa M. Saksida       grad        CMU          Robotics
Stella X. Yu          grad        CMU          Robotics

Greg Armstrong        technician  CMU          Robotics

Studying Multiple Brain Areas Involved in Reward


Temporal Difference Learning and Reward

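Dopamine is often described as the brain's reward signal. More precisely, dopamine cell firing appears to signal reward prediction error: the difference between the reward received and the reward that was expected.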

Figure from Schultz et al. (1997) showing (a) dopamine cell firing in response to an unpredicted reward, (b) dopamine firing in response to a conditioned stimulus that predicts reward, with no firing to the primary reward, and (c) depression of dopamine activity at the time when a predicted reward failed to arrive.

The temporal difference (TD) learning rule of Sutton and Barto has shown some success in explaining data on dopamine neuron responses in the substantia nigra and ventral tegmental area of primates. This model predicts cumulative discounted reward using exponential discounting.
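As a rough illustration of how a TD error can reproduce the three firing patterns described above, the sketch below trains a tabular TD(0) predictor on a trial in which a cue is followed by a reward. It is a minimal sketch under stated assumptions, not the simulation used in this project: the names `run_trial`, `GAMMA`, `ALPHA`, `T`, `CUE`, and `REWARD`, and all parameter values, are hypothetical.

```python
# Tabular TD(0) over time steps within a trial (all values are illustrative).
GAMMA = 0.98          # exponential discount factor (assumed)
ALPHA = 0.2           # learning rate (assumed)
T = 20                # time steps per trial
CUE, REWARD = 5, 15   # step at which the cue / the reward occurs

def run_trial(V, rewarded=True):
    """Run one trial, updating V in place; return the TD error at each step."""
    delta = [0.0] * T
    for t in range(T - 1):
        r = 1.0 if (rewarded and t + 1 == REWARD) else 0.0
        # The cue arrives unpredictably, so pre-cue steps are treated as value 0.
        v_now = V[t] if t >= CUE else 0.0
        v_next = V[t + 1] if t + 1 >= CUE else 0.0
        delta[t + 1] = r + GAMMA * v_next - v_now
        if t >= CUE:
            V[t] += ALPHA * delta[t + 1]
    return delta

V = [0.0] * T
naive = run_trial(V)                          # first trial, before learning
for _ in range(500):
    trained = run_trial(V)                    # after many rewarded trials
omitted = run_trial(list(V), rewarded=False)  # probe on a copy of V

for name, d in (("naive", naive), ("trained", trained), ("omitted", omitted)):
    print(name, "cue:", round(d[CUE], 2), "reward time:", round(d[REWARD], 2))
```

In this toy setting the error is positive at the reward time before learning, moves to the cue after learning, and turns negative at the expected reward time when the reward is withheld.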

However, behavioral studies with varying reward schedules suggest that animals seek to maximize rate of reward rather than exponentially discounted future reward.

Tsitsiklis (1997) recently proposed a rate maximization version of TD.

Adopting this rate maximization rule, Daw and Touretzky developed a simulation of animal learning as an alternative to the exponential discounting model.
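For orientation, the two kinds of error signal can be written in their standard textbook forms. The notation here ($V$ for the value estimate, $r_{t+1}$ for the reward, $\gamma$ for the discount factor, $\bar{r}$ for the estimated average reward per time step, $\eta$ for its learning rate) is assumed for illustration and is not copied from the Daw and Touretzky simulation.

```latex
\begin{align}
  % exponentially discounted TD error
  \delta_{t+1} &= r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \\
  % average-reward (rate maximization) TD error
  \delta_{t+1} &= r_{t+1} - \bar{r} + V(s_{t+1}) - V(s_t) \\
  % running estimate of the reward rate
  \bar{r} &\leftarrow \bar{r} + \eta\,(r_{t+1} - \bar{r})
\end{align}
```

The average-reward form drops the discount factor and instead subtracts the background reward rate from every reward.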

The rate maximization model matches the dopamine neuron response data as well as the original TD model does:

This graph shows the dopamine error signal calculated using the rate maximization equation above. It is virtually indistinguishable from the graph in Montague, Dayan, and Sejnowski (1996), J. Neurosci. 16(5):1936-1947, where exponential discounting was used.

Novel Prediction

A novel prediction of our rate maximization model is that dopamine cell response to unexpected rewards should be reduced if background reward rate is increased. This could easily be tested experimentally.
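The reasoning behind this prediction can be sketched numerically. In an average-reward TD error, the background reward rate is subtracted from each reward, so an unexpected reward that does not change the predicted value yields an error of reward minus rate. The code below is a hedged illustration only: `estimate_rate`, `unexpected_reward_error`, the periodic reward schedule, and the learning rate `eta` are all assumptions, not the project's model.

```python
def estimate_rate(period, steps=5000, eta=0.01):
    """Running-average estimate of reward per time step when one unit of
    reward is delivered every `period` steps (hypothetical schedule)."""
    rho = 0.0
    for t in range(steps):
        r = 1.0 if t % period == 0 else 0.0
        rho += eta * (r - rho)
    return rho

def unexpected_reward_error(rho, magnitude=1.0):
    """Average-reward TD error for a reward that leaves predicted value
    unchanged, i.e. V(s') == V(s), so the error is r - rho."""
    return magnitude - rho

sparse = unexpected_reward_error(estimate_rate(period=20))  # low background rate
dense = unexpected_reward_error(estimate_rate(period=5))    # high background rate
print("error with sparse background:", round(sparse, 3))
print("error with dense background:", round(dense, 3))
```

Under these assumptions the error to the same unexpected reward is smaller when background rewards are more frequent, which is the direction of the predicted effect.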

Modeling Animal Behavior

The new model using rate maximization also matches the animal behavioral data on discounting of delayed rewards. The exponential discounting model does not match animal behavior, because animals discount hyperbolically.

Left: data from four pigeons (Mazur 1987) showing indifference point for small immediate reward (2 sec. access to food) vs. larger delayed reward (6 sec. access to food). Slope > 1 shows that discounting is not exponential. Right: data from our simulation of this task also produce slope > 1.
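Why slope distinguishes the two discounting schemes can be checked with a short calculation. The sketch assumes the plot relates the delay to the small reward (x axis) to the delay at which the large reward is equally valued (y axis), and uses Mazur's hyperbolic form A / (1 + kD). The function names and the values of `gamma` and `k` are hypothetical; the reward amounts of 2 and 6 come from the task described above.

```python
import math

def indifferent_delay_exponential(d_small, a_small=2.0, a_large=6.0, gamma=0.9):
    """Delay D solving a_large * gamma**D == a_small * gamma**d_small."""
    return d_small + math.log(a_large / a_small) / math.log(1.0 / gamma)

def indifferent_delay_hyperbolic(d_small, a_small=2.0, a_large=6.0, k=1.0):
    """Delay D solving a_large / (1 + k*D) == a_small / (1 + k*d_small)."""
    return ((a_large / a_small) * (1.0 + k * d_small) - 1.0) / k

def slope(f, d0=0.0, d1=10.0):
    """Slope of the indifference line between two small-reward delays."""
    return (f(d1) - f(d0)) / (d1 - d0)

print("exponential slope:", slope(indifferent_delay_exponential))
print("hyperbolic slope:", slope(indifferent_delay_hyperbolic))
```

Exponential discounting only shifts the indifference line, giving a slope of 1 for any discount factor, whereas hyperbolic discounting gives a slope equal to the ratio of the reward amounts, here 3.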


Recording from Striatum

Bill Skaggs and Cyrus McCandless are recording from cells in the rat striatum. This work complements the TD simulations of Touretzky and Daw, and the brain imaging work of Fiez and Delgado.

The goal is to detect changes in striatal activity as the rat learns a lever press task.

Pictured above is the Neuralynx Cheetah parallel recording setup in the Skaggs lab. This system can record simultaneously from 12 tetrodes. The purple boxes at bottom are amplifiers; the plugboard is used to configure the EEG and tetrode channels for data collection. At right is a Sun SparcStation for recording the data and analyzing it offline.


GPe: globus pallidus external segment; GPi: globus pallidus internal segment; PFC: prefrontal cortex; SNpc: substantia nigra pars compacta (has dopamine cells); SNpr: substantia nigra pars reticulata; STN: subthalamic nucleus; VP: ventral pallidum; VTA: ventral tegmental area (has dopamine cells).


fMRI Study of Reward Signal in Humans

Julie Fiez, Mauricio Delgado, and colleagues are using functional Magnetic Resonance Imaging (fMRI) to locate brain areas that are activated after presentation of reward in a single trial.

Nine subjects were run on a 1.5-T GE Signa whole-body scanner, using a 2-interleave spiral pulse sequence. TR = 1500 msec. TE = 34 msec.

Results:






Modeling Cortical Plasticity

Adam Thomas and James McClelland developed a model of cortical plasticity based on Kohonen's SOFM (Self-Organizing Feature Map) architecture.

See the other CNBC LIS project for discussion of experimental work by McClelland and colleagues on language remediation (L/R discrimination in native Japanese speakers), based on this theory of cortical plasticity.

The theory states that speakers may fail to discriminate between two similar stimuli, despite repeated exposure, because the "feature map" is trapped in a local minimum in weight space that classifies the stimuli identically.

Exposure to versions of the stimuli that have been artificially pulled further apart, emphasizing the differences between them (a technique successfully used by Merzenich, Tallal, and colleagues in humans and primates) should get the system out of the local minimum.

Complementing the experimental work, McClelland and Thomas are seeking as part of this LIS project to develop more biologically realistic versions of their model of cortical learning. This will facilitate comparison of the model with physiological experiments on learning in visual cortex being done by Tai Sing Lee in his portion of this LIS project.


Kohonen SOFM Network

The two-layer Kohonen network used in the simulations. The color of each unit represents activation values when a pattern is presented. In this example a pattern generated from the first prototype in row B in the panel below is presented, and the center unit in the representation layer is the "winner".
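For readers unfamiliar with Kohonen maps, the sketch below shows the generic winner-take-all update on a one-dimensional map. It is not the Thomas and McClelland model: the 1-D layout, the Euclidean winner rule, the Gaussian neighborhood, and the names `winner`, `som_step`, `lr`, and `sigma` are all assumptions chosen to keep the example short.

```python
import math

def winner(weights, x):
    """Index of the representation unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)),
    )

def som_step(weights, x, lr=0.1, sigma=1.0):
    """One Kohonen update: move the winner and its map neighbors toward x."""
    win = winner(weights, x)
    for i, w in enumerate(weights):
        # Neighborhood strength falls off with distance from the winner on the map.
        h = math.exp(-((i - win) ** 2) / (2.0 * sigma ** 2))
        for j in range(len(w)):
            w[j] += lr * h * (x[j] - w[j])
    return win

# Three units on a line, each with a 2-D weight vector (illustrative values).
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
pattern = [0.9, 1.0]
print("winner:", som_step(weights, pattern, lr=0.5, sigma=0.5))
print("weights after one step:", weights)
```

Because neighbors of the winner are also pulled toward each input, two stimuli that share a winner keep reinforcing the same region of the map, which is the kind of trapped state the theory describes.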


Sample Inputs

Examples of inputs generated from the various prototypes used in each simulation. The prototypes from which each input was generated include:

(A) four corner prototypes

(B) the two overlapping prototypes

(C) the two exaggerated prototypes, and

(D) a single center prototype.

The two stimuli in (B) are initially confused by the model, but training on (C) allows the model to discriminate (B).

Cases (E) through (G) are degraded versions of (A) through (C), using a greater spread for the Gaussian.


Results From SOFM Model


Degraded input (DI) makes stimuli indiscriminable. But exposure to exaggerated inputs (EI) causes the feature map to realign its discrimination boundaries, thereby pulling the two stimulus classes apart. Now mildly degraded inputs can be discriminated.


Neural Mechanism of Perceptual Learning

In order to gain a better understanding of the neural mechanisms underlying perceptual shaping, which the Thomas-McClelland model attempts to explain at a more conceptual level, Tai Sing Lee and his students are conducting a series of neurophysiological experiments on awake monkeys.

In one experiment, monkeys are being trained to detect a pop-out stimulus in a field of homogeneous stimuli. The pop-out detection performance of the monkey is a function of the orientation difference between the pop-out stimulus and the distractor stimuli.

(a) No difference    (b) 10° pop-out    (c) 45° pop-out

Seeing the Pop-Out Effect in V1

Behaviorally, preattentive detection of pop-out is possible at about a 20° orientation difference. But with training, smaller orientation differences can be detected.

From previous experiments, Lee has found that the neural correlates of the pop-out signal emerge at the later part of V1 neurons' responses. The following graph shows the population response in V1, comparing a homogeneous field such as (a) above with an obvious pop-out stimulus (c). The pop-out response emerges around 80 msec after stimulus onset. The case marked "single" denotes presentation of a single stimulus element instead of a stimulus in an array of distractors. Responses for stimulus arrays are lower than for single stimuli, presumably due to lateral inhibition in cortex.


Experimental Paradigm

By recording from visual cortex as monkeys learn a pop-out discrimination task, Lee hopes to observe the gradual acquisition of a stimulus discrimination in V1 with experience. To do this, Lee is recording the responses of an ensemble of V1 neurons at two retinotopic locations.

The monkey is trained to detect the pop-out target at one location, while the other location is left untrained. The orientation difference between the pop-out stimulus and the distractor stimuli is decreased over time, as the monkey grows more proficient.

The hypothesis is that shaping the performance of the monkey this way will induce a change in the later part of the responses of the neurons at the trained location, but not the untrained location. For example, if there is no differential response between cases (a) and (b) initially, with training, we should be able to see a difference in the later part of the V1 neurons' responses.

Lee is exploring two methods of recording for this task. One is traditional single-electrode, single-unit recording, in which he records from a large number of cells before, during, and after training. However, the cells recorded in each session will be different, so one must look for changes in the population response.

The other recording method is parallel multi-unit recording using a microelectrode array.


Microelectrode Array

Lee and his students are currently experimenting with chronic implantation of the Utah array (Bionic Inc.), with 100 electrodes.

This method could enable them to monitor multiple cells in parallel over a long period of time.

The ability to monitor the evolution of basic properties of the cells as the training progresses should provide deeper insights into the biological mechanisms underlying perceptual shaping, an important kind of incremental learning.


Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Dave Touretzky
Last modified: Sat Dec 15 02:42:33 EST 2001