2:00, Friday April 8, WeH 7220
Monte Carlo Learning in Environments with Hidden State
Satinder Pal Singh^*
Department of Brain and Cognitive Sciences
MIT
Planning and search algorithms from AI and reinforcement
learning (RL) algorithms from machine learning form a sound
theoretical basis for building architectures that enable an agent to
determine autonomously the ``optimal'' actions in certain classes of
environments. Unfortunately, all of the theory and most of the
practice of these algorithms are limited to problems in which the
agent can sense the state of the environment completely. In many
real-world tasks, however, the state of the environment is ``hidden'',
i.e., only partially observable by the agent. Most previous general
approaches to hidden-state problems use computationally expensive
techniques to estimate the state of the environment, and use the
conventional algorithms on the estimated state.
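For concreteness, the core of such state-estimation approaches is a Bayesian belief update over the hidden state. The sketch below shows that update for an invented two-state example; the transition and observation probabilities are made up for illustration and are not from the talk:

```python
import numpy as np

# Illustrative belief update for a two-state hidden environment.
# T[a][s, s'] = P(s' | s, a); O[s', o] = P(o | s'). Numbers invented.
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
O = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def belief_update(b, a, o):
    """Bayes rule: b'(s') is proportional to O[s', o] * sum_s T[a][s, s'] * b(s)."""
    b_new = O[:, o] * (b @ T[a])
    return b_new / b_new.sum()

b = np.array([0.5, 0.5])        # uniform prior over hidden states
b = belief_update(b, a=0, o=1)  # posterior after taking action 0, seeing obs 1
```

Maintaining this belief vector exactly is what makes general state estimation expensive: its size grows with the number of hidden states, and planning over beliefs is harder still.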
This talk addresses how well an agent can do in hidden-state
problems without resorting to any form of state estimation. We
develop a new framework for solving such problems by
including stochastic decision policies in the search space, and by
defining the ``value'' or utility of a cluster of states. We also
discuss why the conventional RL framework needs to be extended and
present theoretical results about what popular RL algorithms do when
applied to hidden-state problems. Finally, we present a new,
computationally tractable, Monte Carlo algorithm for finding locally
optimal stochastic decision policies in a class of hidden-state
problems.
*Joint work with Tommi Jaakkola and Michael Jordan
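The flavor of the framework can be conveyed with a toy aliased-state problem. In the sketch below (the dynamics, rewards, and episodic Monte Carlo estimator are invented for illustration and are not the talk's algorithm), two hidden states emit the same observation; every deterministic memoryless policy gets trapped on a negative-reward self-loop, while a stochastic memoryless policy does strictly better:

```python
import random

def episode_return(p, horizon=50):
    """Average reward of one episode under the memoryless stochastic
    policy P(action 1 | observation) = p. In this toy problem, action 1
    moves s1 -> s2 with reward +1 but self-loops at s2 with reward -1;
    action 2 mirrors it (s2 -> s1, +1; self-loop at s1, -1)."""
    s = random.choice([1, 2])  # hidden state; both states look identical
    total = 0.0
    for _ in range(horizon):
        a = 1 if random.random() < p else 2
        if a == 1:
            total += 1.0 if s == 1 else -1.0
            s = 2
        else:
            total += 1.0 if s == 2 else -1.0
            s = 1
    return total / horizon

def mc_value(p, n_episodes=2000):
    """Monte Carlo estimate of the policy's expected average reward."""
    return sum(episode_return(p) for _ in range(n_episodes)) / n_episodes

v_det = mc_value(1.0)  # deterministic: always action 1; loops on -1
v_mix = mc_value(0.5)  # uniform stochastic: expected reward 0 per step
```

Here any deterministic policy earns roughly -1 per step in the long run, while the 50/50 stochastic policy earns about 0, showing why the search space must include stochastic policies and why Monte Carlo estimates of a policy's value suffice to compare them.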