12:00, 24 Sep 1997, WeH 7220

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Tom Dietterich
Department of Computer Science
Oregon State University

This talk will describe work in progress on a hierarchical decomposition of the value function called the MAXQ decomposition. I will introduce the MAXQ formalism, which provides a graphical language for describing hierarchical reinforcement learning problems. The MAXQ decomposition can be used to formalize the work of Singh, Kaelbling, and Dayan on hierarchical reinforcement learning. The decomposition can be viewed in two ways: (a) as a form of value function approximation, and (b) as a form of procedural abstraction. Many interesting issues arise, including the following: (a) what value functions can be represented by a given MAXQ decomposition? (b) how can we choose which parts of the global state to make available to the individual subtasks within the MAXQ hierarchy? (c) how can the MAXQ hierarchy be efficiently evaluated? (d) how can it be efficiently trained? and (e) how can the hierarchical credit assignment problem be solved? I will use examples drawn from our work on the Kanfer-Ackerman Air Traffic Control task.
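
For orientation, here is a minimal sketch of the core recurrence behind the decomposition, written in the notation of the later published MAXQ formulation; this notation is an assumption on my part and may differ from the work-in-progress form presented in the talk.

    % Sketch of the MAXQ value function decomposition (assumed notation).
    % Q(i, s, a): value of invoking child subtask a within parent subtask i
    % in state s. It splits into the value of completing a itself and a
    % completion term C for the reward earned finishing i after a terminates.
    Q(i, s, a) = V(a, s) + C(i, s, a)
    % V(i, s): value of subtask i in state s -- the best child Q-value if i
    % is composite, or the expected immediate reward if i is primitive.
    V(i, s) = \begin{cases}
                \max_{a} Q(i, s, a) & \text{if } i \text{ is composite} \\
                \bar{R}(s, i)       & \text{if } i \text{ is primitive}
              \end{cases}

In this view each subtask stores only its completion function C, and the value of a state under the hierarchy is recovered by recursively summing completion terms down to a primitive action, which is part of what makes questions (c) and (d) above nontrivial.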