The heuristics computed on the can capture measures similar to the heuristics, but there exists a new opportunity to make use of labels to improve heuristic computation efficiency. A single planning graph is sufficient if there is no state aggregation being measured, so we do not mention such measures for the .
Positive Interaction Aggregation: Unlike heuristics, we do not compute positive interaction based relaxed plans on the . The approach to measure positive interaction across each state in a belief state is to compute multiple relaxed plans and take their maximum value. To get the same measure on the we would still need to extract multiple relaxed plans, the situation we are trying to avoid by using the . While the graph construction overhead may be lowered by using the , the heuristic computation could take too long. Hence, we do not compute relaxed plans on the to measure positive interaction alone, but we do compute relaxed plans that measure overlap (which measures positive interaction).
Independence Aggregation: Like positive interaction aggregation, we need a relaxed plan for every state in the projected belief state to find the summation of the costs. Hence, we do not compute relaxed plans that assume independence.
State Overlap Aggregation: A relaxed plan extracted from the to get the heuristic resembles the unioned relaxed plan in the heuristic. Recall that the heuristic extracts a relaxed plan from each of the multiple planning graphs (one for each possible world) and unions the set of actions chosen at each level in each of the relaxed plans. The relaxed plan heuristic is similar in that it counts actions that have positive interaction in multiple worlds only once and accounts for independent actions that are used in subsets of the possible worlds. The advantage of is that we find these actions with a single pass on one planning graph.
We are trading the cost of computing multiple relaxed plans for the cost of manipulating labels to determine what lines of causal support are used in what worlds. In the relaxed plan we want to support the goal with every state in , but in doing so we need to track which states in use which paths in the planning graph. A subgoal may have several different (and possibly overlapping) paths from the worlds in .
A relaxed plan is a set of layers: , where is a set of actions, is a set of effects, and is a set of clauses. The elements of the layers are labelled to indicate the worlds of where they are chosen for support. The relaxed plan is extracted from the level (i.e., the first level where is reachable, also described in Appendix A).
Please note that we are extracting the relaxed plan for in terms of clauses, and not literals, which is different than the and versions of relaxed plans. Previously we found the constituent of that was first reached on a planning graph and now we do not commit to any one constituent. Our rationale is that we were possibly using different constituents in each of the multiple graphs, and in this condensed version of the multiple graphs we still want to be able to support different constituents of the in different worlds. We could also use the constituent representation of in defining the layers of the relaxed plan, but choose the clausal representation of instead because we know that we have to support each clause. However with constituents we know we only need to support one (but we don't need to know which one).
The relaxed plan, shown in bold in Figure 7, for to reach in CBTC is listed as follows:
We start by forming with the clauses in , namely arm and clog; we label the clauses with because they need to be supported by all states in our belief state. Next, we support each clause in with the relevant effects from to form . For clog we use persistence because it supports clog in all worlds described by (this is an example of positive interaction of worlds). For arm the relevant effects are the respective from each Dunk action. We choose both effects to support arm because we need to support arm in all worlds of , and each effect gives support in only one world (this is an example of independence of worlds). We then insert the actions associated with each chosen effect into with the appropriate label indicating the worlds where it was needed, which in general is fewer worlds than where it is reachable (i.e. it is always the case that ). Next we form with the execution preconditions of actions in and antecedents of effects in , which are clog, inP1, and inP2, labelled with all worlds where an action or effect needed them. In the same fashion as level two, we support the literals at level one, using persistence for inP1 and inP2, and Flush for clog. We stop here, because we have supported all clauses at level one.
For the general case, extraction starts at the level where is first reachable from . The first relaxed plan layers we construct are , where contains all clauses , labelled as .
For each level , , we support each clause in by choosing relevant effects from to form . An effect is relevant if it is reachable in some of the worlds where we need to support (i.e. ) and the consequent gives a literal . For each clause, we have to choose enough supporting effects so that the chosen effect worlds are a superset of the worlds we need to support the clause, formally:
We think of supporting a clause in a set of worlds as a set cover problem where effects cover subsets of worlds. Our algorithm to cover the worlds of a clause with worlds of effects is a variant of the well known greedy algorithm for set cover . We first choose all relevant persistence effects that can cover worlds, then choose action effects that cover the most new worlds. Each effect we choose for support is added to and labelled with the new worlds it covered for . Once all clauses in are covered, we form the action layer as all actions that have an effect in . The actions in are labelled to indicate all worlds where any of their effects were labelled in .
We obtain the next subgoal layer, , by adding literals from the execution preconditions of actions in and antecedents of effects in . Each literal is labelled to indicate all worlds where any action or effect requires . We support the literals in in the same fashion as . We continue to support literals with effects, insert actions, and insert action and effect preconditions until we have supported all literals in .
Once we get a relaxed plan, the relaxed plan heuristic, , is the summation of the number of actions in each action layer, formally:
Thus in our CBTC example we have . Notice that if we construct the without mutexes for CBTC we reach the goal after two layers. If we had included mutexes the , then it would reach the goal after three layers. The way we use mutexes will not change our relaxed plan because we do not use mutexes to influence relaxed plan extraction. Mutexes only help to identify when a the belief state is not reachable from .