Partially Observable Markov Decision Processes (POMDPs) have long been considered intractable for finding good controllers in real-world domains. In particular, the best algorithms to date for approximating the value function over the full belief space have not scaled beyond a few hundred states [PGT03a]. We have demonstrated, however, that real-world POMDPs can contain structured belief spaces; by finding and exploiting this structure, we have been able to solve POMDPs an order of magnitude larger than those solved by conventional value iteration techniques. We were also able to solve POMDPs of several kinds, from a simple, highly structured synthetic problem, to a robot navigation problem, to a problem with a factored belief space and relatively complicated probability distributions.
The algorithm we used to find this structure is related to Principal Components Analysis, with a loss function chosen specifically for representing probability distributions. The real-world POMDPs we have been able to solve are characterized by sparse distributions, and the Exponential-family PCA algorithm is particularly effective at compressing such data. There do exist POMDP problems that lack this structure, and for which this dimensionality-reduction technique will not work well; whether other, related dimensionality-reduction techniques (e.g., Isomap or Locally Linear Embedding [TdSL00,RSH02]) can be applied is a question for further investigation.
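To make the idea concrete, the following is a minimal sketch of an exponential-family PCA fit: the data B is modeled as exp(UV) for low-dimensional factors U and V, and the factors are found by alternating gradient steps on a Poisson-style loss. The data generation, dimensions, learning rate, and iteration count are all invented for illustration; this is not the implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse "belief" data: each row is a probability
# distribution over 20 states with most of its mass on a few states.
n_beliefs, n_states, n_dims = 30, 20, 3
B = rng.dirichlet(alpha=np.full(n_states, 0.1), size=n_beliefs)

# Exponential-family PCA with a Poisson-style link: model B ~ exp(U @ V),
# minimizing  sum(exp(U @ V) - B * (U @ V))  by alternating gradient
# steps on the low-dimensional coordinates U and the basis V.
U = 0.01 * rng.standard_normal((n_beliefs, n_dims))
V = 0.01 * rng.standard_normal((n_dims, n_states))

def loss(U, V):
    Z = U @ V
    return np.sum(np.exp(Z) - B * Z)

lr = 0.01
history = [loss(U, V)]
for _ in range(500):
    G = np.exp(U @ V) - B   # gradient of the loss w.r.t. (U @ V)
    U -= lr * G @ V.T       # gradient step on the coordinates
    V -= lr * U.T @ G       # gradient step on the basis
    history.append(loss(U, V))

# In practice the reconstructions exp(U @ V) would be renormalized to
# sum to one before being used as beliefs.
```

The multiplicative exp link is what makes this loss well suited to sparse distributions: near-zero probabilities cost little to represent, unlike under the squared-error loss of conventional PCA.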
There are a number of interesting possibilities for extending this algorithm to improve its efficiency or widen its domain of applicability. The loss function that we chose for dimensionality reduction was based on reconstruction error.
Another shortcoming of the approach described in this work is its assumption that all beliefs can be described using the same low-dimensional representation. It is relatively easy, however, to construct a problem that generates beliefs lying on two distinct low-dimensional surfaces; in the current formulation, the apparent dimensionality of such beliefs would be much higher than that of beliefs sampled from either surface alone.
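This effect can be illustrated with a small numerical sketch (all details hypothetical). Each "surface" below is the segment of beliefs interpolating between two fixed distributions, so each is individually low-dimensional; pooling samples from both surfaces doubles the numerical rank that any single linear representation must capture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = 10

def simplex_point():
    # A random probability distribution over the states.
    return rng.dirichlet(np.ones(n_states))

# Two "surfaces" of beliefs: each is the line segment between two
# fixed distributions, hence a rank-2 set of vectors.
p1, p2 = simplex_point(), simplex_point()
q1, q2 = simplex_point(), simplex_point()

t = rng.uniform(size=50)
surface1 = np.outer(t, p1) + np.outer(1 - t, p2)
s = rng.uniform(size=50)
surface2 = np.outer(s, q1) + np.outer(1 - s, q2)

def numerical_rank(X, tol=1e-8):
    # Count singular values above a tolerance.
    return int(np.sum(np.linalg.svd(X, compute_uv=False) > tol))

rank_one = numerical_rank(surface1)                          # rank 2
rank_both = numerical_rank(np.vstack([surface1, surface2]))  # rank 4
```

A single global representation must pay for the dimensions of every surface at once, even though any one belief only ever uses one surface; a mixture of local representations would avoid this cost.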
While this work has largely been motivated by finding better representations of beliefs, that is not the only approach to solving large POMDPs. Policy search methods [MPKK99] and hierarchical methods [PGT03b] have also been able to solve large POMDPs. It is interesting to note that controllers based on E-PCA representations are often essentially independent of policy complexity but strongly dependent on belief complexity, whereas policy search and hierarchical methods are strongly dependent on policy complexity but largely independent of belief space complexity. Progress in solving large POMDPs in general therefore seems likely to come from a combination of the two approaches.
The E-PCA algorithm finds a low-dimensional representation of the full belief space from sampled data. We demonstrated that the reliance on sampled data is not an obstacle for some real-world problems; indeed, using only sampled beliefs can be an asset for large problems, where generating and tracking beliefs is considerably easier than planning. It may nevertheless be preferable to compute a low-dimensional representation directly from the model parameters. Poupart & Boutilier [PB02] use the notion of a Krylov subspace to do this. However, the subspace computed by their algorithm may correspond exactly to that found by conventional PCA, and we have seen instances where PCA does a poor job of finding low-dimensional representations. The most likely explanation is that, for most problems, real-world beliefs lie not on low-dimensional planes but on curved surfaces. An extremely useful algorithm would be one that finds a subset of belief space closed under the transition and observation functions, but that is not constrained to find only planes.
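The Krylov-subspace idea can be sketched as follows: starting from an initial belief, repeatedly propagate through the model and orthogonalize, stopping once the propagated vector already lies in the span collected so far. This is a deliberately simplified illustration that folds in only a single transition matrix (the full construction also incorporates the observation probabilities), and every quantity here is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Hypothetical model pieces: one row-stochastic transition matrix T
# with T[i, j] = P(s'=j | s=i), and an initial belief b0.
T = rng.dirichlet(np.ones(n), size=n)
b0 = np.zeros(n)
b0[0] = 1.0

def krylov_basis(T, b, tol=1e-10, max_iter=20):
    """Orthonormal basis for span{b, T^T b, (T^T)^2 b, ...}."""
    basis = []
    v = b.copy()
    for _ in range(max_iter):
        # Modified Gram-Schmidt: remove components already in the basis.
        for u in basis:
            v = v - (u @ v) * u
        if np.linalg.norm(v) < tol:
            break  # the subspace is closed under propagation: done
        basis.append(v / np.linalg.norm(v))
        v = T.T @ basis[-1]  # propagate the belief one step forward
    return np.array(basis)

K = krylov_basis(T, b0)
```

Because the resulting subspace is a plane through belief space, it shares the central weakness noted above: beliefs that lie on a curved surface of intrinsically low dimension can still force the Krylov basis (like PCA) to use many dimensions.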