12:00, 24 Sep 1997, WeH 7220
Hierarchical Reinforcement Learning
with the MAXQ Value Function Decomposition
Tom Dietterich
Department of Computer Science
Oregon State University
This talk will describe work-in-progress on a hierarchical
decomposition of the value function called the MAXQ decomposition. I
will introduce the MAXQ formalism, which provides a graphical language
for describing hierarchical reinforcement learning problems. The
MAXQ decomposition can be used to formalize the work of Singh,
Kaelbling, and Dayan on hierarchical reinforcement learning. The
decomposition can be viewed in two ways: (a) as a form of value
function approximation and (b) as a form of procedural abstraction.
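To make the decomposition concrete, here is a minimal sketch of how a MAXQ-style hierarchy can be evaluated recursively. It assumes the Q-value of a subtask splits into the value of the chosen child subtask plus a "completion" value for the parent, with primitive actions at the leaves; the toy hierarchy, table names, and numbers are all illustrative, not taken from the talk.

```python
# Illustrative MAXQ-style evaluation:
#   Q(parent, s, a) = V(a, s) + C(parent, s, a)
# where V(a, s) is the value of executing child subtask a in state s,
# and C(parent, s, a) is the expected value of completing the parent
# task after a finishes. All tables below are made-up toy numbers.

# Values of primitive actions (leaves of the hierarchy) in state 0.
V_primitive = {("north", 0): -1.0, ("south", 0): -1.0}

# Completion values C[(parent_task, state, child_subtask)].
C = {
    ("root", 0, "navigate"): 0.0,
    ("navigate", 0, "north"): -2.0,
    ("navigate", 0, "south"): -4.0,
}

# Hierarchy: each composite task lists its child subtasks/actions.
children = {"root": ["navigate"], "navigate": ["north", "south"]}

def V(task, state):
    """Value of executing `task` from `state` (recursive evaluation)."""
    if task not in children:  # primitive action: look up its value directly
        return V_primitive[(task, state)]
    # Composite task: best Q-value over its child subtasks.
    return max(Q(task, state, a) for a in children[task])

def Q(task, state, action):
    """Decomposed Q-value: child value plus completion value."""
    return V(action, state) + C[(task, state, action)]

print(Q("navigate", 0, "north"))  # -1.0 + -2.0 = -3.0
print(V("root", 0))               # max child Q plus completion = -3.0
```

The recursion mirrors the procedural-abstraction view: evaluating the root task calls its subtasks, which call theirs, down to primitive actions.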
Many interesting issues arise, including the following: (a) what value
functions can be represented by a given MAXQ decomposition? (b) how
can we choose which parts of the global state to make available to the
individual subtasks within the MAXQ hierarchy? (c) how can the MAXQ
hierarchy be efficiently evaluated? (d) how can it be efficiently
trained? and (e) how can the hierarchical credit assignment problem
be solved? I will use examples drawn from our work on the
Kanfer-Ackerman Air Traffic Control task.