Single graph heuristics are usually uninformed because the projected belief state often corresponds to multiple possible states. The lack of accuracy is because single graphs are not able to capture propagation of multiple world support information. Consider the CBTC problem where the projected belief state is and we are using a single graph . If DunkP1 were the only action we would say that arm and clog can be reached at a cost of two, but in fact the cost is infinite (since there is no DunkP2 to support arm from all possible worlds), and there is no strong plan.
To account for lack of support in all possible worlds and sharpen the heuristic estimate, a set of multiple planning graphs is considered. Each is a single graph, as previously discussed. These multiple graphs are similar to the graphs used by CGP , but lack the more general cross-world mutexes. Mutexes are only computed within each graph, i.e. only same-world mutexes are computed. We construct the initial layer of each graph with a different state . With multiple graphs, the heuristic value of a belief state is computed in terms of all the graphs. Unlike single graphs, we can compute different world aggregation measures with the multiple planning graphs.
While we get a more informed heuristic by considering more of the states in , in certain cases it can be costly to compute the full set of planning graphs and extract relaxed plans. We will describe computing the full set of planning graphs, but will later evaluate (in Section 6.4) the effect of computing a smaller proportion of these. The single graph is the extreme case of computing fewer graphs.
To illustrate the use of multiple planning graphs, consider our example CBTC. We build two graphs (Figure 6) for the projected . They have the respective initial literal layers:
arm, clog, inP1, inP2 and
arm, clog, inP2, inP2 .
In the graph for the first possible world, arm comes in only through DunkP1 at level 2. In the graph for the second world, arm comes in only through DunkP2 at level 2. Thus, the multiple graphs show which actions in the different worlds contribute to support the same literal.
A single planning graph is sufficient if we do not aggregate state measures, so in the following we consider how to compute the achievement cost of a belief state with multiple graphs by aggregating state distances.
Positive Interaction Aggregation: Similar to GPT , we can use the worst-case world to represent the cost of the belief state by using the heuristic. The difference with GPT is that we compute a heuristic on planning graphs, where they compute plans in state space. With this heuristic we account for the number of actions used in a given world, but assume positive interaction across all possible worlds.
The heuristic is computed by finding a relaxed plan on each planning graph , exactly as done on the single graph with . The difference is that unlike the single graph relaxed plan , but like , the initial levels of the planning graphs are states, so each relaxed plan will reflect all the support needed in the world corresponding to . Formally:
Notice that we are not computing all state distances between states in and . Each planning graph corresponds to a state in , and from each we extract a single relaxed plan. We do not need to enumerate all states in and find a relaxed plan for each. We instead support a set of literals from one constituent of . This constituent is estimated to be the minimum distance state in because it is the first constituent reached in .
For CBTC, computing (Figure 6) finds:
inP1 , Flush ,
inP1 , Flush },
inP1 clog ,
clog , DunkP1 ,
clog , DunkP1 },
inP2 , Flush ,
inP2 , Flush },
inP2 clog ,
clog , DunkP2 ,
clog , DunkP2 },
arm clog .
Each relaxed plan contains two actions and taking the maximum of the two relaxed plan values gives . This aggregation ignores the fact that we must use different Dunk actions each possible world.
Independence Aggregation: We can use the heuristic to assume independence among the worlds in our belief state. We extract relaxed plans exactly as described in the previous heuristic and simply use a summation rather than maximization of the relaxed plan costs. Formally:
For CBTC, if computing , we find the same relaxed plans as in the heuristic, but sum their values to get 2 + 2 = 4 as our heuristic. This aggregation ignores the fact that we can use the same Flush action for both possible worlds.
State Overlap Aggregation: We notice that in the two previous heuristics we are either taking a maximization and not accounting for some actions, or taking a summation and possibly accounting for extra actions. We present the heuristic to balance the measure between positive interaction and independence of worlds. Examining the relaxed plans computed by the two previous heuristics for the CBTC example, we see that the relaxed plans extracted from each graph have some overlap. Notice, that both and contain a Flush action irrespective of which package the bomb is in - showing some positive interaction. Also, contains DunkP1, and contains DunkP2 - showing some independence. If we take the layer-wise union of the two relaxed plans, we would get a unioned relaxed plan:
inP1 , Flush ,
inP1 , inP2 , Flush },
inP1, inP2 clog ,
clog , DunkP1, DunkP2 ,
clog , DunkP1 , DunkP2 },
arm clog .
This relaxed plans accounts for the actions that are the same between possible worlds and the actions that differ. Notice that Flush appears only once in layer zero and the Dunk actions both appear in layer one.
In order to get the union of relaxed plans, we extract relaxed plans from each , as in the two previous heuristics. Then if we are computing heuristics for regression search, we start at the last level (and repeat for each level) by taking the union of the sets of actions for each relaxed plan at each level into another relaxed plan. The relaxed plans are end-aligned, hence the unioning of levels proceeds from the last layer of each relaxed plan to create the last layer of the relaxed plan, then the second to last layer for each relaxed plan is unioned and so on. In progression search, the relaxed plans are start-aligned to reflect that they all start at the same time, whereas in regression we assume they all end at the same time. The summation of the number of actions of each action level in the unioned relaxed plan is used as the heuristic value. Formally:
For CBTC, we just found , so counting the number of actions gives us a heuristic value of .