Next: State-Free Deterministic Policies Up: Reinforcement Learning: A Survey Previous: Hierarchical Distance to Goal

Partially Observable Environments

In many real-world environments, it will not be possible for the agent to have perfect and complete perception of the state of the environment. Unfortunately, complete observability is necessary for learning methods based on MDPs. In this section, we consider the case in which the agent makes observations of the state of the environment, but these observations may be noisy and provide incomplete information. In the case of a robot, for instance, it might observe whether it is in a corridor, an open room, a T-junction, etc., and those observations might be error-prone. This problem is also referred to as the problem of ``incomplete perception,'' ``perceptual aliasing,'' or ``hidden state.''

In this section, we will consider extensions to the basic MDP framework for solving partially observable problems. The resulting formal model is called a partially observable Markov decision process or POMDP.

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996