Modifying the Relaxed Planning Graph

It is necessary to modify the Relaxed Planning Graph expansion and plan-extraction phases to make it possible to apply the heuristic when the domain contains ADL actions. Work has been done on extending full graphplan planning graphs to reason with a subset of ADL actions [Koehler et al. 1997]; the approach taken in Marvin extends the relaxed planning graph structure to handle all of the available ADL constructs. The effect of the modifications is that the same heuristic estimate is obtained as if a precompiled STRIPS domain formulation was used.

When building a conventional relaxed planning graph the assumption is made that, in the first fact layer, all the facts present in the state to be evaluated are true and all other facts are, implicitly, false. Facts are then gradually accumulated by the application of actions, add effects adding facts to the spike [Long & Fox 1999]. Actions become applicable when their preconditions are all present; i.e. they have all been accumulated. The STRIPS actions used to build a conventional relaxed planning graph necessarily have no negative preconditions, so it is sufficient to consider when facts have a positive truth value and determine action applicability from this. ADL actions, however, can also have negative preconditions, corresponding to facts which must be false. Within a conventional relaxed planning graph, no record is made of whether it is possible for a given fact to have a negative truth value.

To handle negative facts within the relaxed planning graph used in Marvin, a second spike is added. As with the positive-fact spike, all the facts present in the state to be evaluated are true and all other facts are, implicitly, false. However, unlike the positive-fact spike, facts are then gradually eroded by the applications of actions; with their delete effects marking the fact in the negative-fact spike as having been deleted. The inherent relaxation on which the relaxed planning graph is founded is still preserved, though: delete effects have no effect on the positive-fact spike; and, similarly, add effects have no effect on the negative-fact spike.

If a precompiled STRIPS domain formulation was used, additional complimentary propositions are added to denote when each proposition is not true. These accumulate alongside the original domain propositions, and in this way are able to satisfy negative preconditions. The negative fact spike, as discussed, has the same effect, although rather than recording which propositions are available in a negated form at each layer, it records which propositions are not available in a negated form.

As discussed, ADL action preconditions are preprocessed such that negation is only applied to the leaves of the satisfaction tree; i.e. only applied to unit facts forming part of the actions' precondition structures. Within the relaxed planning graph a given fact leaf can now be marked as satisfied if either one of the following holds:

It is a positive fact leaf, and the fact contained therein has been added to the positive-fact spike.
It is a negative fact leaf, and the fact contained therein has either never been in the negative-fact spike or has since been marked as deleted.

Plan graph construction proceeds in a manner similar to that used to build a conventional relaxed planning graph. Each of the newly present or newly deleted facts are considered in turn, and their effects on the applicability of all available actions noted. Should the updating of the satisfaction tree of an action lead to it becoming applicable:

The action is added to the action spike, available at the next fact layer.
Previously unseen add effects are added to the positive-fact spike, available at the next fact layer.
Delete effects deleting a fact still present in the negative-fact spike mark the fact as being deleted and available to satisfy negative preconditions from the next fact layer.

For efficiency, the first action to achieve each fact is stored when it is added to the positive-fact spike, along with the earliest layer at which that action is applicable. Similarly, the first action that deletes each fact that has ever been in the negative-fact spike is noted. Relaxed plan extraction consists of regressing through the layers of the relaxed planning graph, selecting actions that achieve the goals that are to be achieved at each layer. Initially, each proposition in the goal state is added to the layer of goals for the layer in which it first appears (or disappears, in the case of negative goals). To extract a plan, the next goal is repeatedly taken from the deepest action layer with outstanding goals. Its first achieving action is added to the plan and its preconditions, taken from its satisfaction tree, are added to the goals for the first layer in which they appear. The process finishes when there are no more outstanding goals at any layer. If a sub-action (that is, an action created to represent the conditional effect of an ADL action, see Section 4.2) is chosen to achieve a given proposition, the preconditions of its parent action(s) are also added to the goals for the first layer in which they appear.

When considering adding the preconditions of an achieving action to the layer in which they appear, a collection of disjunctive preconditions may arise. In this situation, the first satisfied precondition or negative precondition in the disjunction is added as a subgoal in an earlier layer. This avoids adding many redundant actions to satisfy each of a the disjunctive preconditions, where only one needs to be satisfied. The precondition chosen to be satisfied from each collection of disjunctive preconditions is the first for which an achiever was found when building the relaxed planning graph, thus providing the same heuristic estimate as if the compiled STRIPS domain formulation was used. In the compiled STRIPS domain formulation, the disjunctive precondition would give rise to several action instantiations; the first applicable of these would be chosen as the achiever for the desired fact.

At the start of the planning process, a relaxed planning graph is constructed forward from the initial state. However, rather than stopping when the goal literals appear, graph construction stops when no more ground actions become applicable. The actions and propositions appearing in this relaxed planning graph are a superset of all the actions and propositions appearing in later relaxed planning graphs: these actions and propositions discovered are then used to form a cache detailing the proposition-action dependencies. Using this cached information, the code shown in Figure 6 can be used to determine the actions applicable in a given state, and the relaxed planning graphs used to calculate heuristic values can be extracted more efficiently.

Andrew Coles and Amanda Smith 2007-01-09