next up previous
Next: Partially Observable Markov Decision Up: Finding Approximate POMDP Solutions Previous: Finding Approximate POMDP Solutions


Decision making is one of the central problems of artificial intelligence and robotics. Most robots are deployed into the world to accomplish specific tasks, but the real world is a difficult place in which to act--actions can have serious consequences. Figure 1(a) depicts a mobile robot, Pearl, designed to operate in the environment shown in Figure 1(b), the Longwood retirement facility in Pittsburgh. Real world environments such as Longwood are characterized by uncertainty; sensors such as cameras and range finders are noisy and the entire world is not always observable. A large number of state estimation techniques explicitly recognize the impossibility of correctly identifying the true state of the world [GBFK98,Ols00,GF02,KKR95,IB98] by using probabilistic techniques to track the location of the robot. Such state estimators as the Kalman filter [LDW91] or Markov localization [FBT99,TFBD00] provide a (possibly factored, [BK98]) distribution over possible states of the world instead of a single (possibly incorrect) state estimate.

Figure 1: A planner for the mobile robot Pearl, shown in (a), must be able to navigate reliably in such real environments as the Longwood at Oakmont retirement facility, shown in (b). The white areas of the map are free space, the black pixels are obstacles, and the grey areas again are regions of map uncertainty. Notice the large open spaces, and many symmetries that can lead to ambiguity in the robot's position. The map is $ 53.6m \times
37.9m$, with a resolution of $ 0.1m \times 0.1m$ per pixel.
\fbox{\includegraphics[height=2in]{figs/}} \fbox{\includegraphics[height=2in]{figs/}}

In contrast, controllers such as motion planners, dialogue systems, etc. rarely model the same notions of uncertainty. If the state estimate is a full probability distribution, then the controller often uses a heuristic to extract a single ``best'' state, such as the distribution's mean or mode. Some planners compensate for the inevitable estimation errors through robust control [Che00,BS01], but few deployed systems incorporate a full probabilistic state estimate into planning. Although the most-likely-state method is simple and has been used successfully by some real applications [NPB95], substantial control errors can result when the distribution over possible states is very uncertain. If the single state estimate is wrong, the planner is likely to choose an unreasonable action.

Figure 2 illustrates the difference between conventional controllers and those that model uncertainty. In this figure, the robot must navigate from the bottom right corner to the top left, but has limited range sensing (up to $ 2m$) and noisy dead reckoning.1 The impoverished sensor data can cause the robot's state estimate to become quite uncertain if it strays too far from environmental structures that it can use to localize itself. On the left (Figure 2a) is an example trajectory from a motion planner that has no knowledge of the uncertainty in the state estimate and no mechanism for taking this uncertainty into account. The robot's trajectory diverges from the desired path, and the robot incorrectly believes it has arrived at the goal. Not shown here are the state estimates that reflect the high uncertainty in the robot position. On the right (Figure 2b) is an example trajectory from a controller that can model the positional uncertainty, take action to keep the uncertainty small by following the walls, and arrive reliably at the goal.

Figure 2: Two possible trajectories for navigation in the Longwood at Oakmont environment. The robot has limited range sensing (up to $ 2m$) and poor dead-reckoning from odometry. (a) The trajectory from a conventional motion planner that uses a single state estimate, and minimizes travel distance. (b) The trajectory from a more robust controller that models the state uncertainty to minimize travel distance and uncertainty.
[Conventional controller]
[Robust controller]

The controller in Figure 2(b) was derived from a representation called the partially observable Markov decision process (POMDP). POMDPs are a technique for making decisions based on probabilistic estimates of the state of the world, rather than on absolute knowledge of the true state. A POMDP uses an a priori model of the world together with the history of actions taken and observations received in order to infer a probability distribution, or ``belief'', over the possible states of the world. The controller chooses actions, based upon the current belief, to maximize the reward it expects to receive over time.

The advantage to using POMDPs for decision making is that the resulting policies handle uncertainty well. The POMDP planning process can take advantage of actions that implicitly reduce uncertainty, even if the problem specification (e.g., the reward function) does not explicitly reward such actions. The disadvantage to POMDPs is that finding the optimal policy is computationally intractable. Existing techniques for finding exact optimal plans for POMDPs typically cannot handle problems with more than a few hundred states [Hau00,ZZ01]. Most planning problems involving real, physical systems cannot be expressed so compactly; we would like to deploy robots that plan over thousands of possible states of the world (e.g., map grid cells), with thousands of possible observations (e.g., laser range measurements) and actions (e.g., velocities).

In this paper, we will describe an algorithm for finding approximate solutions to real-world POMDPs. The algorithm arises from the insight that exact POMDP policies use unnecessarily complex, high-dimensional representations of the beliefs that the controller can expect to experience. By finding low-dimensional representations, the planning process becomes much more tractable.

We will first describe how to find low-dimensional representations of beliefs for real-world POMDPs; we will use a variant of a common dimensionality-reduction technique called Principal Components Analysis. The particular variant we use modifies the loss function of PCA in order to better model the data as probability distributions. Using these low-dimensional representations, we will describe how to plan in the low-dimensional space, and conclude with experimental results on robot control tasks.


... reckoning.1
For the purposes of this example the sensing and dead reckoning were artificially poor, but the same phenomenon would occur naturally in larger-scale environments.

next up previous
Next: Partially Observable Markov Decision Up: Finding Approximate POMDP Solutions Previous: Finding Approximate POMDP Solutions
Nicholas Roy 2005-01-16