
Eager Derivation Replay

The derivation trace contains a sequence of instructions representing the choices that lie along the derivation path leading from the root of the search tree to the final plan in the leaf node. The trace is fitted to the context of the new search process by validating each choice in the new context and replaying the decision if it is still valid. In order to understand the validation process, we must first describe the decision steps that the planner takes in arriving at a solution to a planning problem. A planning problem is a 3-tuple $\langle I, G, {\cal A} \rangle$, where I is a complete description of the initial state, G is the description of the goal state, and $\cal A$ is the set of operators in STRIPS representation [7]. A ground operator sequence is said to be a solution for a planning problem if it can be executed from the initial state and the resulting state of the world satisfies the goal.
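To make this definition concrete, the following is a minimal sketch, in Python, of a STRIPS planning problem and the solution test. It is not taken from DERSNLP+EBL; all class and field names are illustrative assumptions.

from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical encoding of a ground STRIPS operator.
@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]

# A planning problem <I, G, A>.
@dataclass(frozen=True)
class PlanningProblem:
    initial: FrozenSet[str]          # I: complete description of the initial state
    goal: FrozenSet[str]             # G: conditions that must hold in the final state
    operators: FrozenSet[Operator]   # A: the available STRIPS operators

def is_solution(problem: PlanningProblem, plan: Tuple[Operator, ...]) -> bool:
    # A ground operator sequence is a solution if it is executable from I
    # and the resulting state satisfies G.
    state = set(problem.initial)
    for op in plan:
        if not op.preconditions <= state:    # not executable in this state
            return False
        state = (state - op.del_effects) | op.add_effects
    return problem.goal <= state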
  
Figure 5: An example of a case failure reason. [Figure not reproduced.]

DERSNLP+EBL is a refinement planner that solves a planning problem by navigating a space of potential solutions, each represented as a partly constructed plan. Syntactically, a plan in this space $\cal P$ can be seen as a set of constraints (see below). Semantically, a partial plan is a shorthand notation for the set of ground operator sequences that are consistent with its constraints. The latter is called the candidate set of the partial plan, and is denoted by $\langle\!\langle {\cal P} \rangle\!\rangle$. In particular, a partial plan is represented as a 6-tuple, $\langle {\cal S}, {\cal O}, {\cal B}, {\cal L}, {\cal E}, {\cal C} \rangle$ (a data-structure sketch is given after the following list), where

1.
$\cal S$ is the set of actions (step names) in the plan, each of which is mapped onto an operator in the domain theory. $\cal S$ contains two dummy steps: $t_I$, whose effects are the initial state conditions, and $t_G$, whose preconditions are the input goals, G.
2.
${\cal O}$ is a partial ordering relation on $\cal S$, representing the ordering constraints over the steps in $\cal S$.
3.
$\cal B$ is a set of codesignation (binding) and non-codesignation (prohibited binding) constraints on the variables appearing in the preconditions and postconditions of the operators which are represented in the plan steps, $\cal S$.
4.
$\cal L$ is a set of causal links of the form $\langle s, p, s' \rangle$, where $s, s' \in {\cal S}$, indicating that step s supplies the condition p to step s'.
5.
${\cal E}$ contains step effects, represented as $\langle s, e \rangle$, where $s \in \cal S$.
6.
${\cal C}$ is a set of open conditions of the partial plan, each of which is a tuple $\langle p, s \rangle$ such that p is a precondition of step s and there is no link supporting p at s in $\cal L$.
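For concreteness, a partial plan can be captured by a record with one field per constraint set. The following Python sketch is an illustrative assumption, not DERSNLP+EBL's actual representation; bindings and negated effects are simplified.

from dataclasses import dataclass
from typing import FrozenSet, Tuple

Step = str                             # step names, e.g. "t_I", "t_G", "s1"
CausalLink = Tuple[Step, str, Step]    # <s, p, s'>: s supplies condition p to s'

@dataclass
class PartialPlan:
    steps: FrozenSet[Step]                       # S, including dummy steps t_I and t_G
    orderings: FrozenSet[Tuple[Step, Step]]      # O: (s, s') means s precedes s'
    bindings: FrozenSet[Tuple[str, str, bool]]   # B: (v, u, True) codesignates v and u;
                                                 #    (v, u, False) prohibits the binding
    links: FrozenSet[CausalLink]                 # L: causal links
    effects: FrozenSet[Tuple[Step, str]]         # E: step effects <s, e>
    open_conds: FrozenSet[Tuple[str, Step]]      # C: open conditions <p, s>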

Planning consists of starting with a null plan (denoted by ${\cal P}_\emptyset$), whose candidate set corresponds to all possible ground operator sequences, and successively refining the plan by adding constraints until a solution is reached. Each planning decision represents a choice as to how to resolve an existing flaw in the plan, which is either an open condition (an unachieved goal) or a threat to a causal link. To understand how these choices are validated during the replay process, it is useful to think of a planning decision as an operator acting on a partly constructed plan. The possible choices available to DERSNLP+EBL are shown in Figure 6.
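Viewed this way, planning is a search in the space of partial plans. The following Python sketch shows the overall refinement loop; the function names are hypothetical, and the flaw-selection and node-selection strategies are deliberately left abstract:

def refine_search(root_plan, is_complete, select_flaw, expand):
    # Plan-space refinement: repeatedly pick a flaw in the active plan
    # (an open condition or a threat to a causal link) and branch on the
    # decisions that resolve it. 'expand' returns the child plans that
    # result from each applicable decision.
    frontier = [root_plan]               # the search starts from the null plan
    while frontier:
        plan = frontier.pop(0)           # node-selection strategy left abstract
        if is_complete(plan):            # no flaws remain: a solution
            return plan
        flaw = select_flaw(plan)
        frontier.extend(expand(plan, flaw))
    return None                          # search space exhausted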


  
Figure 6: Planning decisions are based on the active plan $\langle {\cal S,O,B,L,E,C} \rangle$ and have effects which alter the constraints so as to produce the new active plan $\langle {\cal S',O',B',L',E',C'} \rangle$. [Table of decision types not reproduced.]

Planning decisions have preconditions which are based on the existence of a flaw in the current active plan, and have effects which alter the constraints so as to eliminate the flaw. For example, the precondition of an establishment choice is the existence of an unachieved subgoal; its effect is the addition of a causal link that achieves this open condition. The precondition of a resolution decision is the existence of a threat, that is, a step which clobbers an existing causal link. The threat is resolved by adding a step ordering which either promotes or demotes the clobberer relative to the causal link.
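Building on the PartialPlan sketch above (helper names are hypothetical), the two decision types can be written as guarded plan transformers, with the assertions encoding the decision preconditions:

def establish(plan, open_cond, producer):
    # Establishment: requires that <p, s> is open and that 'producer'
    # (assumed here to be a step already in the plan) has p as an effect.
    # The effect of the decision is a new causal link <producer, p, s>.
    p, s = open_cond
    assert open_cond in plan.open_conds and (producer, p) in plan.effects
    return PartialPlan(
        steps=plan.steps,
        orderings=plan.orderings | {(producer, s)},
        bindings=plan.bindings,
        links=plan.links | {(producer, p, s)},
        effects=plan.effects,
        open_conds=plan.open_conds - {open_cond},
    )

def resolve_threat(plan, clobberer, link, promote):
    # Resolution: requires that 'clobberer' threatens the causal link
    # <s, p, s'>. Promotion orders the clobberer after the consumer s';
    # demotion orders it before the producer s.
    s, _, s2 = link
    new_ordering = (s2, clobberer) if promote else (clobberer, s)
    return PartialPlan(
        steps=plan.steps,
        orderings=plan.orderings | {new_ordering},
        bindings=plan.bindings,
        links=plan.links,
        effects=plan.effects,
        open_conds=plan.open_conds,
    )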

Before a decision is replayed, it is first compared with the current active plan to determine whether its precondition holds in the new context. Invalid decisions, those whose preconditions do not match, are skipped. Establishment decisions are ignored if the goals they achieve are not present as open conditions in the current active plan. Threat resolutions are skipped if the threat is not present. Previous choices which are justified in the current situation are used as guidance to direct the new search process. Replaying a valid decision involves selecting a match for that decision from among the children of the current active plan, and making this child the next plan refinement.
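The validation test itself is cheap, since it only inspects the flaws of the current active plan. A minimal sketch follows, assuming a trace instruction carries its decision type and the flaw it originally resolved; the encoding, including the "not "-prefixed negated effects, is purely illustrative:

def threatens(plan, clobberer, link):
    # A step threatens <s, p, s'> if it can delete p and can be ordered
    # between s and s'. The ordering test below is simplified; a full
    # check would consult the transitive closure of the orderings.
    s, p, s2 = link
    return ((clobberer, "not " + p) in plan.effects
            and (clobberer, s) not in plan.orderings
            and (s2, clobberer) not in plan.orderings)

def decision_is_valid(decision, plan):
    # Replay test: does the decision's precondition hold in the new context?
    kind, flaw = decision
    if kind == "establishment":
        return flaw in plan.open_conds          # the goal is still open here
    if kind == "resolution":
        clobberer, link = flaw
        return link in plan.links and threatens(plan, clobberer, link)
    return False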

DERSNLP+EBL's eager derivation replay strategy replays all of the applicable decisions in the trace in sequence. This replay strategy can be contrasted with that of PRODIGY/ANALOGY [40], where replay is alternated with from-scratch planning for extra goals not covered by a case. In eager derivation replay, each previous decision is eagerly adopted if it is justified in the current context. Since invalid instructions have been skipped, the skeletal plan that is the end result of replay is comparable to the product of the fitting phase in plan reuse [22,13]. In contrast to plan reuse, derivation replay does not alter the underlying planning strategy. Replay merely provides search control, directing the search as to which node to visit next. This means that DERSNLP+EBL inherits all of the properties of SNLP, including soundness, completeness, and systematicity.
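Putting the pieces together, eager replay is a single left-to-right pass over the trace. The sketch below reuses decision_is_valid from above; matching_child, which picks the refinement corresponding to a replayed decision from among the children of the active plan, is a hypothetical helper:

def eager_replay(trace, root_plan, children, matching_child):
    # Replay every valid decision in the trace, in sequence, skipping any
    # whose precondition fails in the new context. The returned plan is
    # the skeletal plan from which ordinary search then continues.
    plan = root_plan
    for decision in trace:
        if not decision_is_valid(decision, plan):
            continue                             # invalid instruction: skip it
        plan = matching_child(decision, children(plan))
    return plan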

A sample trace of SNLP's decision process is shown in Figure 7.

  
Figure 7: An example solution trace for DERSNLP+EBL. [Trace table not reproduced.]

The trace corresponds to a simple problem from the logistics transportation domain of [40], adapted for SNLP as in Figure 4. This problem contains the goal of getting a single package, OB1, to a designated airport, $l_d$. The derivation trace contains the choices that were made along the path from the root of the search tree to the final plan in the leaf node. Each instruction contains a description of both the decision taken and the basis for its justification in a new context.
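As an illustration, this problem can be encoded with the PlanningProblem sketch from earlier. The operator and predicate spellings below are assumptions made for the sake of the example, not the paper's actual domain encoding:

load = Operator("load-OB1", frozenset({"(at-ob OB1 l_i)", "(at-pl PL l_i)"}),
                frozenset({"(in OB1 PL)"}), frozenset({"(at-ob OB1 l_i)"}))
fly = Operator("fly-l_i-l_d", frozenset({"(at-pl PL l_i)"}),
               frozenset({"(at-pl PL l_d)"}), frozenset({"(at-pl PL l_i)"}))
unload = Operator("unload-OB1", frozenset({"(in OB1 PL)", "(at-pl PL l_d)"}),
                  frozenset({"(at-ob OB1 l_d)"}), frozenset({"(in OB1 PL)"}))

problem = PlanningProblem(
    initial=frozenset({"(at-ob OB1 l_i)", "(at-pl PL l_i)"}),
    goal=frozenset({"(at-ob OB1 l_d)"}),
    operators=frozenset({load, fly, unload}),
)
assert is_solution(problem, (load, fly, unload))   # load, fly, then unload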

