Thu 17 Mar, 1:30, WeH 4601

POMDP's: A framework for reinforcement learning with hidden state?

Michael Littman

Traditional approaches to reinforcement learning make the arguably bold assumption that the learner's input is sufficient to determine a complete description of the environment in which the learner is embedded. When this assumption is violated, for instance when multiple environmental states give the same appearance to the agent, there is no reason to expect standard RL techniques such as Q-learning to act appropriately. In the field of operations research, these hidden-state environments are formalized as "Partially Observable Markov Decision Processes" (POMDP's), and an extensive literature exists on the structure of such problems. Chrisman (1992) introduced this formalism to the RL community and described methods for learning a POMDP model from data. Several of us at Brown (Cassandra, Kaelbling, and Littman, 1994) have been exploring the problem of taking a POMDP model and determining an optimal policy for the environment.

In this talk, I'll present an introduction to the POMDP model and describe algorithms for finding optimal policies. I'll also sketch a new algorithm we developed in the course of this work that guarantees epsilon-optimal policies with significantly less computational effort than existing algorithms. I'll conclude with some thoughts on the policy representations suggested by POMDP's and their potential use in RL systems.
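
For those unfamiliar with the formalism, some standard background (conventional notation, not specific to the algorithms in the talk): a POMDP is usually written as a tuple <S, A, T, R, Omega, O>, where T(s, a, s') is the probability of reaching state s' from state s under action a, and O(s', a, o) is the probability of receiving observation o after that transition. Since the agent cannot see the state directly, it can instead maintain a belief state b, a probability distribution over S, which is updated after taking action a and observing o by the usual rule:

\[
  b'(s') \;=\; \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)},
  \qquad
  \Pr(o \mid a, b) \;=\; \sum_{s' \in S} O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s).
\]

Planning in a POMDP can then be viewed as planning in the (continuous) space of belief states, which is one reason finding optimal policies is so much harder than in the fully observable case.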