 
 
 
 
 
   
 )
)
Single graph heuristics are usually uninformed because the
projected belief state  often corresponds to multiple
possible states. The lack of accuracy is because single graphs are
not able to capture propagation of multiple world support
information. Consider the CBTC problem where the projected belief
state is
 often corresponds to multiple
possible states. The lack of accuracy is because single graphs are
not able to capture propagation of multiple world support
information. Consider the CBTC problem where the projected belief
state is  and we are using a single graph
 and we are using a single graph  .  If DunkP1
were the only action we would say that
.  If DunkP1
were the only action we would say that  arm and
arm and  clog
can be reached at a cost of two, but in fact the cost is infinite
(since there is no DunkP2 to support
clog
can be reached at a cost of two, but in fact the cost is infinite
(since there is no DunkP2 to support  arm from all possible
worlds), and there is no strong plan.
arm from all possible
worlds), and there is no strong plan.
To account for lack of support in all possible worlds and sharpen
the heuristic estimate, a set of multiple planning graphs  is considered. Each
is considered. Each 
 is a single graph, as
previously discussed.  These multiple graphs are similar to the
graphs used by CGP [30], but lack the more
general cross-world mutexes.  Mutexes are only computed within
each graph, i.e. only same-world mutexes are computed. We
construct the initial layer
 is a single graph, as
previously discussed.  These multiple graphs are similar to the
graphs used by CGP [30], but lack the more
general cross-world mutexes.  Mutexes are only computed within
each graph, i.e. only same-world mutexes are computed. We
construct the initial layer 
 of each graph
 of each graph
 with a different state
 with a different state 
 . With
multiple graphs, the heuristic value of a belief state is computed
in terms of all the graphs.  Unlike single graphs, we can compute
different world aggregation measures with the multiple planning
graphs.
. With
multiple graphs, the heuristic value of a belief state is computed
in terms of all the graphs.  Unlike single graphs, we can compute
different world aggregation measures with the multiple planning
graphs.
While we get a more informed heuristic by considering more of the
states in 
 , in certain cases it can be costly to
compute the full set of planning graphs and extract relaxed plans.
We will describe computing the full set of planning graphs, but will
later evaluate (in Section 6.4) the effect of computing a smaller
proportion of these.  The single graph
, in certain cases it can be costly to
compute the full set of planning graphs and extract relaxed plans.
We will describe computing the full set of planning graphs, but will
later evaluate (in Section 6.4) the effect of computing a smaller
proportion of these.  The single graph  is the extreme case of
computing fewer graphs.
 is the extreme case of
computing fewer graphs.
To illustrate the use of multiple planning graphs, consider our
example CBTC. We build two graphs (Figure 6) for the
projected  . They have the respective initial literal layers:
. They have the respective initial literal layers:
 arm, clog, inP1,
arm, clog, inP1,  inP2
inP2 and
 and
 arm, clog,
arm, clog,  inP2, inP2
inP2, inP2 .
.
In the graph for the first possible world,  arm comes
in only through DunkP1 at level 2. In the graph for the second
world,
arm comes
in only through DunkP1 at level 2. In the graph for the second
world,  arm comes in only through DunkP2 at level 2. Thus, the
multiple graphs show which actions in the different worlds
contribute to support the same literal.
arm comes in only through DunkP2 at level 2. Thus, the
multiple graphs show which actions in the different worlds
contribute to support the same literal.
A single planning graph is sufficient if we do not aggregate state measures, so in the following we consider how to compute the achievement cost of a belief state with multiple graphs by aggregating state distances.
Positive Interaction Aggregation: Similar to GPT
[6], we can use the worst-case world to
represent the cost of the belief state  by using the
 by using the
 heuristic.  The difference with GPT is that we
compute a heuristic on planning graphs, where they compute plans in
state space.  With this heuristic we account for the number of
actions used in a given world, but assume positive interaction
across all possible worlds.
 heuristic.  The difference with GPT is that we
compute a heuristic on planning graphs, where they compute plans in
state space.  With this heuristic we account for the number of
actions used in a given world, but assume positive interaction
across all possible worlds.
The 
 heuristic is computed by finding a relaxed plan
 heuristic is computed by finding a relaxed plan
 on each planning graph
 on each planning graph 
 , exactly as
done on the single graph with
, exactly as
done on the single graph with 
 .  The difference is that
unlike the single graph relaxed plan
.  The difference is that
unlike the single graph relaxed plan  , but like
, but like  , the
initial levels of the planning graphs are states, so each relaxed
plan will reflect all the support needed in the world corresponding
to
, the
initial levels of the planning graphs are states, so each relaxed
plan will reflect all the support needed in the world corresponding
to  .  Formally:
.  Formally:
|  | 
 is the level of
 is the level of  where a constituent of
 where a constituent of
 was first reachable.
 was first reachable.
Notice that we are not computing all state distances between states
in  and
 and  .  Each planning graph
.  Each planning graph  corresponds to a
state in
 corresponds to a
state in  , and from each
, and from each  we extract a single relaxed
plan. We do not need to enumerate all states in
 we extract a single relaxed
plan. We do not need to enumerate all states in  and find a
relaxed plan for each.  We instead support a set of literals from
one constituent of
 and find a
relaxed plan for each.  We instead support a set of literals from
one constituent of  .  This constituent is estimated to be the
minimum distance state in
.  This constituent is estimated to be the
minimum distance state in  because it is the first constituent
reached in
 because it is the first constituent
reached in  .
.
For CBTC, computing 
 (Figure 6)
finds:
 (Figure 6)
finds:
 
 inP1
inP1 , Flush
, Flush ,
,
 inP1
inP1 ,
,
 Flush
Flush },
},
 inP1
inP1 clog
clog ,
,
 clog
clog , DunkP1
, DunkP1 ,
,
 clog
clog ,
,
 DunkP1
DunkP1 },
},
 arm
arm clog
clog 
and  
 inP2
inP2 , Flush
, Flush ,
,
 inP2
inP2 ,
,
 Flush
Flush },
},
 inP2
inP2 clog
clog ,
,
 clog
clog , DunkP2
, DunkP2 ,
,
 clog
clog ,
,
 DunkP2
DunkP2 },
},
 arm
arm clog
clog .
.
Each relaxed plan contains two actions and taking the
maximum of the two relaxed plan values gives 
 .  This aggregation ignores the fact that we must use different
Dunk actions each possible world.
.  This aggregation ignores the fact that we must use different
Dunk actions each possible world.
Independence Aggregation: We can use the 
 heuristic to assume independence among the worlds in our belief
state. We extract relaxed plans exactly as described in the previous
heuristic and simply use a summation rather than maximization of the
relaxed plan costs.  Formally:
heuristic to assume independence among the worlds in our belief
state. We extract relaxed plans exactly as described in the previous
heuristic and simply use a summation rather than maximization of the
relaxed plan costs.  Formally:
|  | 
 is the level of
 is the level of  where a constituent of
 where a constituent of
 was first reachable.
 was first reachable.
For CBTC, if computing 
 , we find the same
relaxed plans as in the
, we find the same
relaxed plans as in the 
 heuristic, but sum
their values to get 2 + 2 = 4 as our heuristic.  This aggregation
ignores the fact that we can use the same Flush action for both
possible worlds.
 heuristic, but sum
their values to get 2 + 2 = 4 as our heuristic.  This aggregation
ignores the fact that we can use the same Flush action for both
possible worlds.
State Overlap Aggregation: We notice that in the two previous
heuristics we are either taking a maximization and not accounting
for some actions, or taking a summation and possibly accounting for
extra actions. We present the 
 heuristic to balance
the measure between positive interaction and independence of worlds.
Examining the relaxed plans computed by the two previous heuristics
for the CBTC example, we see that the relaxed plans extracted from
each graph have some overlap. Notice, that both
 heuristic to balance
the measure between positive interaction and independence of worlds.
Examining the relaxed plans computed by the two previous heuristics
for the CBTC example, we see that the relaxed plans extracted from
each graph have some overlap. Notice, that both 
 and
 and 
 contain a Flush action
irrespective of which package the bomb is in - showing some
positive interaction. Also,
 contain a Flush action
irrespective of which package the bomb is in - showing some
positive interaction. Also, 
 contains DunkP1,
and
 contains DunkP1,
and 
 contains DunkP2 - showing some
independence. If we take the layer-wise union of the two relaxed
plans, we would get a unioned relaxed plan:
 contains DunkP2 - showing some
independence. If we take the layer-wise union of the two relaxed
plans, we would get a unioned relaxed plan:
 
 inP1
inP1 , Flush
, Flush ,
,
 inP1
inP1 ,
,
 inP2
inP2 ,
, 
 Flush
Flush },
},
 inP1, inP2
inP1, inP2 clog
clog ,
,
 clog
clog , DunkP1, DunkP2
, DunkP1, DunkP2 ,
,
 clog
clog ,
,
 DunkP1
DunkP1 ,
, 
 DunkP2
DunkP2 },
},
 arm
arm clog
clog .
.
This relaxed plans accounts for the actions that are the same between possible worlds and the actions that differ. Notice that Flush appears only once in layer zero and the Dunk actions both appear in layer one.
In order to get the union of relaxed plans, we extract relaxed plans
from each 
 , as in the two previous heuristics.
Then if we are computing heuristics for regression search, we start
at the last level (and repeat for each level) by taking the union of
the sets of actions for each relaxed plan at each level into another
relaxed plan. The relaxed plans are end-aligned, hence the
unioning of levels proceeds from the last layer of each relaxed plan
to create the last layer of the
, as in the two previous heuristics.
Then if we are computing heuristics for regression search, we start
at the last level (and repeat for each level) by taking the union of
the sets of actions for each relaxed plan at each level into another
relaxed plan. The relaxed plans are end-aligned, hence the
unioning of levels proceeds from the last layer of each relaxed plan
to create the last layer of the  relaxed plan, then the
second to last layer for each relaxed plan is unioned and so on. In
progression search, the relaxed plans are start-aligned to
reflect that they all start at the same time, whereas in regression
we assume they all end at the same time.  The summation of the
number of actions of each action level in the unioned relaxed plan
is used as the heuristic value. Formally:
 relaxed plan, then the
second to last layer for each relaxed plan is unioned and so on. In
progression search, the relaxed plans are start-aligned to
reflect that they all start at the same time, whereas in regression
we assume they all end at the same time.  The summation of the
number of actions of each action level in the unioned relaxed plan
is used as the heuristic value. Formally:
|  | 
 is the greatest level
 is the greatest level 
 where a constituent
of
 where a constituent
of  was first reachable.
 was first reachable.
For CBTC, we just found  , so counting the number of actions
gives us a heuristic value of
, so counting the number of actions
gives us a heuristic value of 
 .
.
 
 
 
 
