State Distance Aggregation

Next: Summary of Methods for Up: Belief State Distance Previous: State Distance Assumptions

State Distance Aggregation

The aggregation function $\bigtriangledown$ plays an important role in how we measure the distance between belief states. When we compute more than one state distance measure, either exhaustively or by sampling a subset (as previously mentioned), we must combine the measures by some means, denoted $\bigtriangledown$ . There is a range of options for taking the state distances and aggregating them into a belief state distance. We discuss several assumptions associated with potential measures:

Positive Interaction of States: Positive interaction assumes that the most difficult state in requires actions that will help transition all other states in to some state in . In our example, this means that we assume the actions used to transition $S_{11}$ to $S_{32}$ will help us transition $S_{12}$ to $S_{31}$ (assuming each state in transitions to the closest state in ). Inspecting the action sequences, we see they positively interact because both need actions and . We do not need to know the action sequences to assume positive interaction because we define the aggregation $\bigtriangledown$ as a maximization of numerical state distances:

$dist(BS, BS') = \max\limits_{S\in {\cal M}(BS)} \min\limits_{S' \in {\cal M}(BS')} dist(S, S')$ .

The belief state distances are $dist(BS_1, BS_3) = \max(\min(14, 5),$ $\min(3, 7)) = 5$ and $dist(BS_2, BS_3) = \max(\min(8, 10)) = 8$ . In this case we prefer to . If each state distance is admissible and we do not sample from belief states, then assuming positive interaction is also admissible.
Independence of States: Independence assumes that each state in requires actions that are different from all other states in in order to reach a state in . Previously, we found there was positive interaction in the action sequences to transition $S_{11}$ to $S_{32}$ and $S_{12}$ to $S_{31}$ because they shared actions and . There is also some independence in these sequences because the first contains , and , where the second contains . Again, we do not need to know the action sequences to assume independence because we define the aggregation $\bigtriangledown$ as a summation of numerical state distances:

$dist(BS, BS') = \sum\limits_{S\in {\cal M}(BS)} \min\limits_{S' \in {\cal M}(BS')} dist(S, S')$ .

In our example, $dist(BS_1, BS_3) = \min(14, 5) + \min(3, 7) = 8$ , and $dist(BS_2, BS_3) = \min(8, 10) = 8$ . In this case we have no preference over and .
We notice that using the cardinality of a belief state $\vert{\cal M}(BS)\vert$ to measure is a special case of assuming state independence, where $\forall S, S' dist(S, S') = 1$ . If we use cardinality to measure distance in our example, then we have $dist(BS_1, BS_3) = \vert{\cal M}(BS_1)\vert = 2$ , and $dist(BS_2, BS_3) = \vert{\cal M}(BS_2)\vert = 1$ . With cardinality we prefer over because we have better knowledge in .
Overlap of States: Overlap assumes that there is both positive interaction and independence between the actions used by states in to reach a state in . The intuition is that some actions can often be used for multiple states in simultaneously and we should count these actions only once. For example, when we computed by assuming positive interaction, we noticed that the action sequences for $dist(S_{11}, S_{32})$ and $dist(S_{12}, S_{31})$ both used and . When we aggregate these sequences we would like to count and each only once because they potentially overlap. However, truly combining the action sequences for maximal overlap is a plan merging problem [19], which can be as difficult as planning. Since our ultimate intent is to compute heuristics, we take a very simple approach to merging action sequences. We introduce a plan merging operator $\Cup$ for $\bigtriangledown$ that picks a step at which we align the sequences and then unions the aligned steps. We use the size of the resulting action sequence to measure belief state distance:

$dist(BS, BS') = \Cup_{S\in {\cal M}(BS)} \min\limits_{S' \in {\cal M}(BS')} dist(S, S')$ .

Depending on the type of search, we define $\Cup$ differently. We assume that sequences used in progression search start at the same time and those used in regression end at the same time. Thus, in progression all sequences are aligned at the first step before we union steps, and in regression all sequences are aligned at the last step before the union.
For example, in progression $dist(S_{11}, S_{32}) \Cup dist(S_{12}, S_{31})$ $= (\{a_1, a_2\}, \{a_5\}, \{a_6, a_7\}) \Cup (\{a_1, a_7\}, \{a_3\})$ $= (\{a_1, a_2, a_7\}, \{a_5, a_3\}, \{a_6,$ $a_7\})$ because we align the sequences at their first steps, then union each step. Notice that this resulting sequence has seven actions, giving , whereas defining $\bigtriangledown$ as maximum gave a distance of five and as summation gave a distance of eight. Compared with overlap, positive interaction tends to under estimate distance, and independence tends to over estimate distance. As we will see during our empirical evaluation (in Section 6.5), accounting for overlap provides more accurate distance measures for many conformant planning domains.
Negative Interaction of States: Negative interaction between states can appear in our example if transitioning state $S_{11}$ to state $S_{32}$ makes it more difficult (or even impossible) to transition state $S_{12}$ to state $S_{31}$ . This could happen if performing action for $S_{11}$ conflicts with action for $S_{12}$ . We can say that cannot reach if all possible action sequences that start in $S_{11}$ and $S_{12}$ , respectively, and end in any $S \in {\cal M}(BS_3)$ negatively interact.
There are two ways negative interactions play a role in belief state distances. Negative interactions can allow us to prove it is impossible for a belief state to reach a belief state , meaning $dist(BS, BS') = \infty$ , or they can potentially increase the distance by a finite amount. We use only the first, more extreme, notion of negative interaction by computing ``cross-world'' mutexes [30] to prune belief states from the search. If we cannot prune a belief state, then we use one of the aforementioned techniques to aggregate state distances. As such, we do not provide a concrete definition for $\bigtriangledown$ to measure negative interaction.
While we do not explore ways to adjust the distance measure for negative interactions, we mention some possibilities. Like work in classical planning [23], we can penalize the distance measure to reflect additional cost associated with serializing conflicting actions. Additionally in conditional planning, conflicting actions can be conditioned on observations so that they do not execute in the same plan branch. A distance measure that uses observations would reflect the added cost of obtaining observations, as well as the change in cost associated with introducing plan branches (e.g., measuring average branch cost).

The above techniques for belief state distance estimation in terms of state distances provide the basis for our use of multiple planning graphs. We will show in the empirical evaluation that these measures affect planner performance very differently across standard conformant and conditional planning domains. While it can be quite costly to compute several state distance measures, understanding how to aggregate state distances sets the foundation for techniques we develop in the . As we have already mentioned, the conveniently allows us to implicitly aggregate state distances to directly measure belief state distance.

Next: Summary of Methods for Up: Belief State Distance Previous: State Distance Assumptions

2006-05-26