Single graph heuristics are usually uninformed because the projected belief state often corresponds to multiple possible states. The lack of accuracy is because single graphs are not able to capture propagation of multiple world support information. Consider the CBTC problem where the projected belief state is and we are using a single graph . If DunkP1 were the only action we would say that arm and clog can be reached at a cost of two, but in fact the cost is infinite (since there is no DunkP2 to support arm from all possible worlds), and there is no strong plan.

To account for lack of support in all possible worlds and sharpen the heuristic estimate, a set of multiple planning graphs is considered. Each is a single graph, as previously discussed. These multiple graphs are similar to the graphs used by CGP [30], but lack the more general cross-world mutexes. Mutexes are only computed within each graph, i.e. only same-world mutexes are computed. We construct the initial layer of each graph with a different state . With multiple graphs, the heuristic value of a belief state is computed in terms of all the graphs. Unlike single graphs, we can compute different world aggregation measures with the multiple planning graphs.

While we get a more informed heuristic by considering more of the states in , in certain cases it can be costly to compute the full set of planning graphs and extract relaxed plans. We will describe computing the full set of planning graphs, but will later evaluate (in Section 6.4) the effect of computing a smaller proportion of these. The single graph is the extreme case of computing fewer graphs.

To illustrate the use of multiple planning graphs, consider our example CBTC. We build two graphs (Figure 6) for the projected . They have the respective initial literal layers:

arm, clog, inP1, inP2 and

arm, clog, inP2, inP2 .

In the graph for the first possible world, arm comes in only through DunkP1 at level 2. In the graph for the second world, arm comes in only through DunkP2 at level 2. Thus, the multiple graphs show which actions in the different worlds contribute to support the same literal.

A single planning graph is sufficient if we do not aggregate state measures, so in the following we consider how to compute the achievement cost of a belief state with multiple graphs by aggregating state distances.

**Positive Interaction Aggregation:** Similar to GPT
[6], we can use the worst-case world to
represent the cost of the belief state
by using the
heuristic. The difference with GPT is that we
compute a heuristic on planning graphs, where they compute plans in
state space. With this heuristic we account for the number of
actions used in a given world, but assume positive interaction
across all possible worlds.

The heuristic is computed by finding a relaxed plan on each planning graph , exactly as done on the single graph with . The difference is that unlike the single graph relaxed plan , but like , the initial levels of the planning graphs are states, so each relaxed plan will reflect all the support needed in the world corresponding to . Formally:

where is the level of where a constituent of was first reachable.

Notice that we are not computing all state distances between states in and . Each planning graph corresponds to a state in , and from each we extract a single relaxed plan. We do not need to enumerate all states in and find a relaxed plan for each. We instead support a set of literals from one constituent of . This constituent is estimated to be the minimum distance state in because it is the first constituent reached in .

For CBTC, computing (Figure 6) finds:

inP1 , Flush ,

inP1 , Flush },

inP1 clog ,

clog , DunkP1 ,

clog , DunkP1 },

arm clog

and

inP2 , Flush ,

inP2 , Flush },

inP2 clog ,

clog , DunkP2 ,

clog , DunkP2 },

arm clog .

Each relaxed plan contains two actions and taking the maximum of the two relaxed plan values gives . This aggregation ignores the fact that we must use different Dunk actions each possible world.

**Independence Aggregation:** We can use the
heuristic to assume independence among the worlds in our belief
state. We extract relaxed plans exactly as described in the previous
heuristic and simply use a summation rather than maximization of the
relaxed plan costs. Formally:

where is the level of where a constituent of was first reachable.

For CBTC, if computing , we find the same relaxed plans as in the heuristic, but sum their values to get 2 + 2 = 4 as our heuristic. This aggregation ignores the fact that we can use the same Flush action for both possible worlds.

**State Overlap Aggregation:** We notice that in the two previous
heuristics we are either taking a maximization and not accounting
for some actions, or taking a summation and possibly accounting for
extra actions. We present the
heuristic to balance
the measure between positive interaction and independence of worlds.
Examining the relaxed plans computed by the two previous heuristics
for the CBTC example, we see that the relaxed plans extracted from
each graph have some overlap. Notice, that both
and
contain a Flush action
irrespective of which package the bomb is in - showing some
positive interaction. Also,
contains DunkP1,
and
contains DunkP2 - showing some
independence. If we take the layer-wise union of the two relaxed
plans, we would get a unioned relaxed plan:

inP1 , Flush ,

inP1 , inP2 , Flush },

inP1, inP2 clog ,

clog , DunkP1, DunkP2 ,

clog , DunkP1 , DunkP2 },

arm clog .

This relaxed plans accounts for the actions that are the same between possible worlds and the actions that differ. Notice that Flush appears only once in layer zero and the Dunk actions both appear in layer one.

In order to get the union of relaxed plans, we extract relaxed plans
from each
, as in the two previous heuristics.
Then if we are computing heuristics for regression search, we start
at the last level (and repeat for each level) by taking the union of
the sets of actions for each relaxed plan at each level into another
relaxed plan. The relaxed plans are *end-aligned*, hence the
unioning of levels proceeds from the last layer of each relaxed plan
to create the last layer of the
relaxed plan, then the
second to last layer for each relaxed plan is unioned and so on. In
progression search, the relaxed plans are *start-aligned* to
reflect that they all start at the same time, whereas in regression
we assume they all end at the same time. The summation of the
number of actions of each action level in the unioned relaxed plan
is used as the heuristic value. Formally:

where is the greatest level where a constituent of was first reachable.

For CBTC, we just found , so counting the number of actions gives us a heuristic value of .