Thu 17 Mar, 1:30, WeH 4601
POMDPs: A framework for reinforcement learning with hidden state?
Michael Littman
Traditional approaches to reinforcement learning make the arguably
bold assumption that the learner's input is sufficient to determine
the complete state of the environment in which the learner is
embedded. When this assumption is violated, for instance when
multiple environmental states give the same appearance to the agent,
there is no reason to expect standard RL techniques such as Q-learning
to act appropriately.
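This failure is easy to see in a tiny simulation. The example below is
hypothetical (not from the talk), and it uses a bandit-style,
one-step simplification of Q-learning: two hidden states present the
same observation, and their rewards conflict, so the observation-based
learner cannot separate the two situations.

```python
import random

# Hypothetical aliased world: hidden states s1 and s2 both look like
# observation "o". Action "left" pays +1 in s1 but -1 in s2; "right"
# pays the reverse, so no fixed observation-based choice wins in both.
REWARD = {("s1", "left"): 1.0, ("s1", "right"): -1.0,
          ("s2", "left"): -1.0, ("s2", "right"): 1.0}

def train(episodes=20000, alpha=0.01, seed=0):
    rng = random.Random(seed)
    q_obs = {("o", a): 0.0 for a in ("left", "right")}   # sees only "o"
    q_state = {(s, a): 0.0
               for s in ("s1", "s2") for a in ("left", "right")}
    for _ in range(episodes):
        s = rng.choice(["s1", "s2"])       # hidden state, drawn at random
        a = rng.choice(["left", "right"])  # uniform exploration
        r = REWARD[(s, a)]
        q_obs[("o", a)] += alpha * (r - q_obs[("o", a)])    # aliased update
        q_state[(s, a)] += alpha * (r - q_state[(s, a)])    # oracle update
    return q_obs, q_state

q_obs, q_state = train()
# q_state separates the actions cleanly (near +1 or -1 in each state),
# while q_obs averages the conflicting +1/-1 rewards toward zero,
# leaving the observation-based learner indifferent between actions.
```

The point is that the learner with access to the true state recovers the
right values, while the aliased learner's estimates collapse toward the
average and carry no usable preference.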
In the field of operations research, these types of hidden state
environments are formalized as "Partially-Observable Markov Decision
Processes" (POMDPs), and an extensive literature exists on the
structure of such problems. Chrisman (1992) introduced this formalism
to the RL community and described methods for learning a POMDP model
from data. Several of us at Brown (Cassandra, Kaelbling, and Littman,
1994) have been exploring the problem of taking a POMDP model and
determining an optimal policy for the environment. In this talk, I'll
present an introduction to the POMDP model and describe algorithms for
finding optimal policies. I'll also sketch a new algorithm we found
in the course of this work that guarantees epsilon-optimal policies
with significantly less computational effort than existing algorithms.
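A core piece of machinery in this setting is the belief state: a
probability distribution over hidden states, updated by Bayes' rule
after each action and observation, which converts the partially
observable problem into a fully observable one over beliefs. A minimal
sketch of that standard update (variable names are my own; `T` and `O`
are assumed transition and observation models, not code from the talk):

```python
def belief_update(b, a, z, T, O, states):
    """Bayes-filter update of belief b after action a and observation z.
    T[s][a][s2] = P(s2 | s, a); O[s2][a][z] = P(z | s2, a)."""
    new_b = {s2: O[s2][a][z] * sum(b[s] * T[s][a][s2] for s in states)
             for s2 in states}
    norm = sum(new_b.values())  # P(z | b, a); assumes z has nonzero prob.
    return {s: p / norm for s, p in new_b.items()}

# Two-state example: a noisy sensor reads "hot" 80% of the time in
# state A and 20% of the time in state B, with sticky dynamics.
states = ("A", "B")
T = {"A": {"go": {"A": 0.9, "B": 0.1}},
     "B": {"go": {"A": 0.1, "B": 0.9}}}
O = {"A": {"go": {"hot": 0.8, "cold": 0.2}},
     "B": {"go": {"hot": 0.2, "cold": 0.8}}}
b = belief_update({"A": 0.5, "B": 0.5}, "go", "hot", T, O, states)
# → {"A": 0.8, "B": 0.2}: seeing "hot" shifts belief toward state A.
```

Exact algorithms for POMDPs plan over this continuous belief space,
which is where the computational difficulty comes from.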
I'll conclude with some thoughts on the policy representations
suggested by POMDPs and their potential use in RL systems.