
Related Work

Many attempts have been made to use reachability analysis to constrain the set of beliefs considered during planning [Was97,Hau00,ZH01,PGT03a]. If the reachable set of beliefs is relatively small, then forward search to find this set is a perfectly reasonable approach, and the policy computed over these beliefs is optimal; however, it is relatively rare in real-world problems to be able to enumerate the reachable beliefs. Reachability analysis has also been used with some success as a heuristic for guiding search methods, especially for focusing computation on finding function approximators [Was97,Han98]. In this approach, however, the problem remains of how to compute the low-dimensional representation from the finite set of representative beliefs. Discretization of the belief space itself has been explored a number of times, including regular grid-based discretization [Lov91], regular variable-resolution approaches [ZH01], and non-regular variable-resolution representations [Bra97,Hau00]. In the same vein, state abstraction [BP96] has been explored to take advantage of factored state spaces; of particular interest is the algorithm of Hansen & Feng [HF00], which can perform state abstraction even in the absence of a prior factorization. So far, however, all of these approaches have fallen victim to the ``curse of dimensionality'' and have failed to scale to more than a few dozen states at most.
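
As a concrete illustration of the forward-search approach, the following sketch enumerates the beliefs reachable from an initial belief by breadth-first search over action-observation pairs. It is a minimal sketch only: the array conventions, the depth bound, and the rounding tolerance used to merge near-duplicate beliefs are our assumptions, not details of any of the cited algorithms.

    import numpy as np
    from collections import deque

    def reachable_beliefs(b0, T, O, depth, tol=1e-6):
        # b0: initial belief, shape (|S|,)
        # T:  T[a][s, s'] = P(s' | s, a)
        # O:  O[a][s', z] = P(z | s', a)
        key = lambda b: tuple(np.round(b / tol).astype(int))
        seen = {key(b0): b0}
        frontier = deque([(b0, 0)])
        while frontier:
            b, d = frontier.popleft()
            if d == depth:
                continue
            for a in range(len(T)):
                pred = T[a].T @ b                  # predicted next-state distribution
                for z in range(O[a].shape[1]):
                    bz = pred * O[a][:, z]         # weight by observation likelihood
                    if bz.sum() < 1e-12:           # z is impossible after a
                        continue
                    bz = bz / bz.sum()             # Bayes-filter normalization
                    if key(bz) not in seen:        # genuinely new belief
                        seen[key(bz)] = bz
                        frontier.append((bz, d + 1))
        return list(seen.values())

The search is practical exactly when this set stays small; when it does not, the enumeration grows without bound and some form of compression or discretization becomes necessary.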

The value-directed POMDP compression algorithm of Poupart & Boutilier [PB02] is a dimensionality-reduction technique that is closer in spirit to ours, if not in technique. This algorithm computes a low-dimensional representation of a POMDP directly from the model parameters $ R$, $ T$, and $ O$ by finding the Krylov subspace for the reward function under belief propagation. The Krylov subspace for a vector and a matrix is the smallest subspace that contains the vector and is closed under multiplication by the matrix. For POMDPs, the authors use the smallest subspace that contains the immediate reward vector and is closed under a set of linear functions defined by the state transitions and observation model. The major advantage of this approach is that it optimizes the correct criterion: the value-directed compression will only distinguish between beliefs that have different value. The major disadvantage of this approach is that the Krylov subspace is constrained to be linear. Using our algorithm with PCA instead of E-PCA, we can realize much of the same compression as the Poupart & Boutilier [PB02] method: we can take advantage of regularities in the same transition matrices $ T^{a,z}$ but not in the reward function $ R$. Unfortunately, as we have seen, beliefs are unlikely to lie on a low-dimensional hyperplane, and our results reported in section 3 indicate that linear compression will not scale to the size of problems we wish to address.
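
The subspace construction itself is straightforward to sketch: starting from the immediate reward vector, repeatedly apply each conditional transition matrix $ T^{a,z}$ and keep whatever component is orthogonal to the basis found so far. The code below is an illustrative sketch assuming dense numpy matrices, not Poupart & Boutilier's implementation.

    import numpy as np

    def krylov_basis(R, T_az, tol=1e-8):
        # R:    immediate reward vector, shape (|S|,)
        # T_az: list of |S| x |S| matrices, one per (action, observation) pair
        # Returns an orthonormal basis for the smallest subspace that
        # contains R and is closed under every matrix in T_az.
        basis = (R / np.linalg.norm(R))[:, None]
        frontier = [basis[:, 0]]
        while frontier:
            v = frontier.pop()
            for M in T_az:
                w = M @ v
                w = w - basis @ (basis.T @ w)   # project out the current subspace
                norm = np.linalg.norm(w)
                if norm > tol:                  # a new direction: grow the basis
                    w = w / norm
                    basis = np.column_stack([basis, w])
                    frontier.append(w)
        return basis                            # a belief b compresses to basis.T @ b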

Possibly the most promising approaches for finding approximate value functions are the point-based methods, which optimize the value function not over the entire belief space but only at specific beliefs. Cheng [Che88] described a method for backing up the value function at specific belief points in a procedure called ``point-based dynamic programming'' (PB-DP); these PB-DP steps are interleaved with standard backups as in full value iteration. Zhang & Zhang [ZZ01] improved this method by choosing the Witness points as the backup belief points, iteratively increasing the number of such points. The essential idea is that point-based backups are significantly cheaper than full backup steps; indeed, the algorithm described by Zhang & Zhang [ZZ01] outperforms Hansen's exact policy-search method by an order of magnitude on small problems. However, the need for periodic backups across the full belief space still limits the applicability of these algorithms to small abstract problems.
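
The cost difference is easiest to see in the backup operation itself. The sketch below performs a single point-based backup at one belief, returning a single $ \alpha$-vector instead of the exponentially many vectors a full backup can generate. It is a hedged illustration with assumed array conventions, common to the point-based methods discussed here rather than the implementation of any one of them.

    import numpy as np

    def point_backup(b, Gamma, R, T, O, gamma):
        # b:     belief, shape (|S|,)
        # Gamma: current value function as a non-empty list of alpha-vectors
        # R:     R[a] is the immediate reward vector for action a
        # T:     T[a][s, s'] = P(s' | s, a);  O: O[a][s', z] = P(z | s', a)
        best, best_val = None, -np.inf
        for a in range(len(T)):
            g_a = R[a].copy()
            for z in range(O[a].shape[1]):
                # back-project each alpha-vector through (a, z) ...
                cands = [T[a] @ (O[a][:, z] * alpha) for alpha in Gamma]
                # ... keeping only the one that is maximal at this belief
                g_a = g_a + gamma * max(cands, key=lambda g: g @ b)
            if g_a @ b > best_val:
                best, best_val = g_a, g_a @ b
        return best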

More recently, Pineau, Gordon & Thrun [PGT03a] have abandoned full value-function backups in favour of point-based backups only, in the ``point-based value iteration'' (PBVI) algorithm. Because PBVI backs up only at discrete belief points, its backup operator is polynomial rather than exponential (as in exact value iteration), and, even more importantly, the complexity of the value function remains constant across iterations. PBVI takes a fundamentally different approach to finding POMDP policies from the one described here, and it remains constrained by the curse of dimensionality in large state spaces. However, it has been applied successfully to problems at least an order of magnitude larger than its predecessors, and it is another example of an algorithm that can be used to make large POMDPs tractable.
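
A full PBVI iteration then simply applies such a point-based backup at every belief in a fixed set $ B$, which is why the value function never grows beyond $ \vert B\vert$ vectors. The sketch below reuses the hypothetical point_backup function above.

    import numpy as np

    def pbvi_sweep(B, Gamma, R, T, O, gamma):
        # One PBVI iteration over the fixed belief set B; the result has
        # at most |B| alpha-vectors, so the value function's complexity
        # stays constant from one sweep to the next.
        new = [point_backup(b, Gamma, R, T, O, gamma) for b in B]
        uniq = {tuple(np.round(g, 10)): g for g in new}   # drop duplicate vectors
        return list(uniq.values())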

E-PCA is not the only possible technique for non-linear dimensionality reduction; there exists a large body of work on alternative techniques such as Self-Organizing Maps [Koh82], Generative Topographic Mapping [BSW98], and Stochastic Neighbour Embedding [HR03]. Two of the most successful algorithms to emerge recently are Isomap [TdSL00] and Locally Linear Embedding [RS00]. Isomap extends PCA-like methods to non-linear surfaces by using geodesic distances, rather than Euclidean distances, as the distance metric between data samples. Locally Linear Embedding (LLE) can be considered a local alternative to Isomap's global reduction: it represents each point as a weighted combination of its neighbours, operating in two phases, first computing the reconstruction weights over the $ k$ nearest neighbours of each high-dimensional point, and then reconstructing the data in the low-dimensional co-ordinate frame from those weights. However, these algorithms do not contain explicit models of the kind of data (e.g., probability distributions) that they are attempting to model. One interesting line of research may be to extend these algorithms using different loss functions, in the same manner that PCA was extended to E-PCA.
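
For illustration, both algorithms are straightforward to try on sampled beliefs using library implementations; the sketch below assumes the scikit-learn library and a hypothetical file of sampled beliefs.

    import numpy as np
    from sklearn.manifold import Isomap, LocallyLinearEmbedding

    beliefs = np.load('sampled_beliefs.npy')   # hypothetical (n_samples, |S|) array

    # Isomap: geodesic rather than Euclidean distances between samples
    low_iso = Isomap(n_neighbors=10, n_components=3).fit_transform(beliefs)

    # LLE: phase 1 fits reconstruction weights over the k nearest neighbours,
    # phase 2 finds low-dimensional points that preserve those weights
    low_lle = LocallyLinearEmbedding(n_neighbors=10,
                                     n_components=3).fit_transform(beliefs)

Neither embedding treats the data as probability distributions, which is precisely the gap that a loss-function extension of the kind described above would address.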

