Timing and Partial Observability in Reinforcement Learning Models of the Dopamine System

Nathaniel Daw - joint work with Aaron C. Courville and David S. Touretzky.


  I will review a series of influential attempts in theoretical neuroscience and psychology to understand the responses of animal neurons, and the behavior of animals themselves, as driven by a simple reinforcement learning (RL) algorithm. The bulk of the talk will draw on somewhat more advanced RL techniques to address a number of interrelated shortcomings in existing accounts, all of which are based on temporal difference (TD) learning with a tapped-delay-line representation of state. Such theories do not provide a satisfactory account of variability in the timing between events, nor of the partial observability of world state, and they are inconsistent with dominant psychological theories about how animals learn at multiple timescales. These model-free accounts also fail to explain why animals' behavior shows evidence of a great deal of knowledge about neutral stimulus-stimulus contingencies. We develop a TD learning account based on a richer formal model, the partially observable semi-Markov process, to address these issues. I will conclude by seeking discussion of a number of puzzling aspects arising from the research, particularly a tension between model-based and sample-based model-free learning methods, both of which are indicated by certain aspects of the data.

Charles Rosenberg
Last modified: Fri Sep 27 12:58:51 EDT 2002