Ever since CGP  and SGP  a series of planners have been developed for tackling conformant and conditional planning problems - including GPT , C-Plan , PKSPlan , Frag-Plan , MBP , KACMBP , CFF , and YKA . Several of these planners are extensions of heuristic state space planners that search in the space of ``belief states'' (where a belief state is a set of possible states). Without full-observability, agents need belief states to capture state uncertainty arising from starting in an uncertain state or by executing actions with uncertain effects in a known state. We focus on the first type of uncertainty, where an agent starts in an uncertain state but has deterministic actions. We seek strong plans, where the agent will reach the goal with certainty despite its partially known state. Many of the aforementioned planners find strong plans, and heuristic search planners are currently among the best. Yet a foundation for what constitutes a good distance-based heuristic for belief space has not been adequately investigated.
Belief Space Heuristics: Intuitively, it can be argued that the heuristic merit of a belief state depends on at least two factors-the size of the belief state (i.e., the uncertainty in the current state), and the distance of the individual states in the belief state from a destination belief state. The question of course is how to compute these measures and which are most effective. Many approaches estimate belief state distances in terms of individual state to state distances between states in two belief states, but either lack effective state to state distances or ways to aggregate the state distances. For instance the MBP planner  counts the number of states in the current belief state. This amounts to assuming each state distance has unit cost, and planning for each state can be done independently. The GPT planner  measures the state to state distances exactly and takes the maximum distance, assuming the states of the belief state positively interact.
Heuristic Computation Substrates: We characterize several approaches to estimating belief state distance by describing them in terms of underlying state to state distances. The basis of our investigation is in adapting classical planning reachability heuristics to measure state distances and developing state distance aggregation techniques to measure interaction between plans for states in a belief state. We take three fundamental approaches to measure the distance between two belief states. The first approach does not involve aggregating state distance measures, rather we use a classical planning graph to compute a representative state distance. The second retains distinctions between individual states in the belief state by using multiple planning graphs, akin to CGP , to compute many state distance measures which are then aggregated. The third employs a new planning graph generalization, called the Labelled Uncertainty Graph ( ), that blends the first two to measure a single distance between two belief states. With each of these techniques we will discuss the types of heuristics that we can compute with special emphasis on relaxed plans. We present several relaxed plan heuristics that differ in terms of how they employ state distance aggregation to make stronger assumptions about how states in a belief state can co-achieve the goal through action sequences that are independent, positively interact, or negatively interact.
Our motivation for the first of the three planning graph techniques for measuring belief state distances is to try a minimal extension to classical planning heuristics to see if they will work for us. Noticing that our use of classical planning heuristics ignores distinctions between states in a belief state and may provide uninformed heuristics, we move to the second approach where we possibly build exponentially many planning graphs to get a better heuristic. With the multiple planning graphs we extract a heuristic from each graph and aggregate them to get the belief state distance measure. If we assume the states of a belief state are independent, we can aggregate the measures with a summation. Or, if we assume they positively interact we can use a maximization. However, as we will show, relaxed plans give us a unique opportunity to measure both positive interaction and independence among the states by essentially taking the union of several relaxed plans. Moreover, mutexes play a role in measuring negative interactions between states. Despite the utility of having robust ways to aggregate state distances, we are still faced with the exponential blow up in the number of planning graphs needed. Thus, our third approach seeks to retain the ability to measure the interaction of state distances but avoid computing multiple graphs and extracting heuristics from each. The idea is to condense and symbolically represent multiple planning graphs in a single planning graph, called a Labelled Uncertainty Graph ( ). Loosely speaking, this single graph unions the causal support information present in the multiple graphs and pushes the disjunction, describing sets of possible worlds (i.e., initial literal layers), into ``labels". The planning graph vertices are the same as those present in multiple graphs, but redundant representation is avoided. For instance an action that was present in all of the multiple planning graphs would be present only once in the and labelled to indicate that it is applicable in a planning graph projection from each possible world. We will describe how to extract heuristics from the that make implicit assumptions about state interaction without explicitly aggregating several state distances.
Ideally, each of the planning graph techniques considers every state in a belief state to compute heuristics, but as belief states grow in size this could become uninformed or costly. For example, the single classical planning graph ignores distinctions between possible states where the heuristic based on multiple graphs leads to the construction of a planning graph for each state. One way to keep costs down is to base the heuristics on only a subset of the states in our belief state. We evaluate the effect of such a sampling on the cost of our heuristics. With a single graph we sample a single state and with multiple graphs and the we sample some percent of the states. We evaluate state sampling to show when it is appropriate, and find that it is dependent on how we compute heuristics with the states.
Standardized Evaluation of Heuristics: An issue in evaluating the effectiveness of heuristic techniques is the many architectural differences between planners that use the heuristics. It is quite hard to pinpoint the global effect of the assumptions underlying their heuristics on performance. For example, GPT is outperformed by MBP-but it is questionable as to whether the credit for this efficiency is attributable to the differences in heuristics, or differences in search engines (MBP uses a BDD-based search). Our interest in this paper is to systematically evaluate a spectrum of approaches for computing heuristics for belief space planning. Thus we have implemented heuristics similar to GPT and MBP and use them to compare against our new heuristics developed around the notion of overlap (multiple world positive interaction and independence). We implemented the heuristics within two planners, the Conformant- planner ( ) and the Partially-Observable Non-Deterministic planner ( ). does handle search with non-deterministic actions, but for the bulk of the paper we discuss deterministic actions. This more general action formulation, as pointed out by , can be translated into initial state uncertainty. Alternatively, in Section 8.2 we discuss a more direct approach to reason with non-deterministic actions in the heuristics.
External Evaluation: Although our main interest in this paper is to evaluate the relative advantages of a spectrum of belief space planning heuristics in a normalized setting, we also compare the performance of the best heuristics from this work to current state of the art conformant and conditional planners. Our empirical studies show that planning graph based heuristics provide effective guidance compared to cardinality heuristics as well as the reachability heuristic used by GPT and CFF, and our planners are competitive with BDD-based planners such as MBP and YKA, and GraphPlan-based ones such as CGP and SGP. We also notice that our planners gain scalability with our heuristics and retain reasonable quality solutions, unlike several of the planners we compare against.
The rest of this paper is organized as follows. We first present the and planners by describing their state and action representations as well as their search algorithms. To understand search guidance in the planners, we then discuss appropriate properties of heuristic measures for belief space planning. We follow with a description of the three planning graph substrates used to compute heuristics. We carry out an empirical evaluation in the next three sections, by describing our test setup, presenting a standardized internal comparison, and finally comparing with several other state of the art planners. We end with related research, discussion, prospects for future work, and various concluding remarks.