

Generating Macros

As a running example, we will use the solution plan for problem 1 in the Satellite domain shown in Figure 12. For each step, the figure shows the order in the linear plan, the action name, the argument list, the preconditions, and the effects. To keep the picture simple, we ignore static preconditions of actions. Static facts never occur as action effects, and therefore do not affect the interactions between preconditions and effects of actions.

Figure 12: The solution steps of problem 1 in the Satellite benchmark.
\includegraphics[width=\linewidth]{exampleipc4.eps}

In SOL-EP, macro-operators are extracted from the solutions of the training problems. Each training problem is first solved with no macros in use. The resulting plan can be represented as a solution graph, where each node represents a plan step (action) and edges model interactions between solution steps. Building the solution graph is step 1 (analysis) in our general four-step pattern. In IPC-4 we used a first implementation of the solution graph, which considers interactions only between two consecutive actions of a plan. Here two actions interact if they have at least one common argument, or if at least one of them has no arguments at all. Hence the implementation described in this article extracts only such two-action sequences as possible macros.
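To make the analysis step concrete, here is a minimal Python sketch (an illustration under our own assumptions, not the Macro-FF code) that extracts such candidate two-action macros from a linear plan. Each plan step is assumed to be represented as a pair of an action name and a tuple of instantiated arguments.

\begin{verbatim}
def interacts(a1, a2):
    """Two consecutive actions interact if they share at least one
    argument, or if at least one of them has no arguments at all."""
    (_, args1), (_, args2) = a1, a2
    return not args1 or not args2 or bool(set(args1) & set(args2))

def candidate_macros(plan):
    """Extract every pair of consecutive interacting plan steps
    as a candidate two-action macro."""
    return [(a, b) for a, b in zip(plan, plan[1:]) if interacts(a, b)]
\end{verbatim}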

The macro-actions extracted from a solution are translated into macro-operators by replacing their instantiated arguments with generic variables. This operation preserves the relative mapping between the arguments of the contained actions. Macro-actions with different sets of arguments can result in the same macro-operator. For the Satellite solution in Figure 12, the sequence TURN-TO followed by TAKE-IMAGE occurs three times. After replacing the constant arguments with generic variables, all occurrences yield the same macro-operator.
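The lifting step can be sketched as follows (same illustrative plan representation as above; the variable naming scheme ?x0, ?x1, ... is our own assumption). Equal constants are mapped to the same variable, which preserves the relative mapping between the arguments of the contained actions, so two macro-actions that lift to the same result correspond to the same macro-operator.

\begin{verbatim}
def lift(macro_action):
    """Replace the constants of a ground macro (a sequence of
    (name, args) pairs) with generic variables; equal constants
    are mapped to the same variable."""
    var_of = {}
    lifted = []
    for name, args in macro_action:
        new_args = tuple(var_of.setdefault(c, "?x%d" % len(var_of))
                         for c in args)
        lifted.append((name, new_args))
    return tuple(lifted)

# Illustrative arguments (not necessarily those of Figure 12):
pair = [("TURN-TO",    ("satellite0", "dir2", "dir0")),
        ("TAKE-IMAGE", ("satellite0", "dir2", "instr0", "thermo0"))]
print(lift(pair))
# (('TURN-TO', ('?x0', '?x1', '?x2')),
#  ('TAKE-IMAGE', ('?x0', '?x1', '?x3', '?x4')))
\end{verbatim}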

There are many pairs of actions in a solution, and a decision must be made as to which ones are going to be beneficial as macro-operators in a search. Macros are statically filtered according to the rules of Section 2.2.1, excluding the limitation on the number of preconditions, which is not critical in this algorithm, and the locality rule. Also, as said before, we use a different version of the chaining rule: we require that the operators of a macro have common variables, unless an operator has 0 parameters.
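At the lifted level, our version of the chaining rule amounts to a straightforward test (again an illustrative sketch; a macro-operator is assumed to be a sequence of (name, parameter-tuple) pairs, as produced by the lifting step above):

\begin{verbatim}
def satisfies_chaining(macro_operator):
    """Consecutive operators of a macro must share at least one
    variable, unless one of them has no parameters at all."""
    for (_, v1), (_, v2) in zip(macro_operator, macro_operator[1:]):
        if v1 and v2 and not (set(v1) & set(v2)):
            return False
    return True
\end{verbatim}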

Macro-operators are stored in a global list ordered by their weight, with smaller being better. Weights are initialized to 1.0 and updated in a dynamic ranking process using a gradient-descent method.

For each macro-operator $m$ extracted from the solution of a training problem, we re-solve the problem with $m$ in use. Let $L$ be the solution length when no macros are used, $N$ the number of nodes expanded to solve the problem with no macros, and $N_m$ the number of expanded nodes when macro $m$ is used. Then we use the difference $N - N_m$ to update $w_m$, the weight of macro $m$. Since $N - N_m$ can take arbitrarily large values, we normalize it by $N$ and map the result into the interval $(-1, 1)$ by

\begin{displaymath}\delta_m = \sigma ( \frac{N - N_m}{N} ) \end{displaymath}

where $\sigma$ is the sigmoid function

\begin{displaymath}\sigma(x) = \frac{2}{1 + e^{-x}} - 1. \end{displaymath}

Function $\sigma$ generates the curve shown in Figure 13. This particular definition of $\sigma$ was chosen because it is symmetric in $(0,0)$ (i.e., $\sigma(x) = -\sigma(-x)$) and bounded within the interval $(-1, 1)$. In particular, the symmetry property ensures that, if $N_m=N$, then the weight update of $m$ at the current training step is 0. The size of the boundary interval has no effect on the ranking procedure; it only scales all weight updates by a constant multiplicative factor. We used a sigmoid function bounded to $(-1, 1)$ as a canonical representation, which limits the absolute value of $\delta_m$ to at most 1.
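For concreteness, the bounded update $\delta_m$ can be computed as follows (an illustrative Python sketch of the two formulas above, not the actual implementation):

\begin{verbatim}
import math

def sigma(x):
    """Sigmoid, symmetric in (0, 0) and bounded to (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def delta(N, N_m):
    """Relative node savings of macro m, mapped into (-1, 1)."""
    return sigma((N - N_m) / float(N))
\end{verbatim}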

Figure 13: Sigmoid function.
\includegraphics[width=.5\linewidth]{sigmoid.ps}

The update formula also contains a factor that measures the difficulty of the training instance. The harder the problem, the larger the weight update should be. We use as the difficulty factor the solution length $L$ rather than $N$, since the former has a smaller variance over a training problem set. The formula for updating $w_m$ is

\begin{displaymath}w_m = w_m - \alpha \delta_m L \end{displaymath}

where $\alpha$ is a small constant (0.001 in our implementation). The value of $\alpha$ does not affect the ranking of macros. It was used only to keep macro weights within the vicinity of 1. See the second part of Section 3.4 for a comparison between CA-ED's frequency-based ranking and SOL-EP's gradient-descent ranking.
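Putting the pieces together, one training update of a macro weight can be sketched as below (illustrative code; ALPHA mirrors the constant $\alpha = 0.001$ above, and the sigmoid is inlined so that the snippet is self-contained):

\begin{verbatim}
import math

ALPHA = 0.001  # only scales the weights; it does not change the ranking

def update_weight(w_m, N, N_m, L):
    """One training update: w_m <- w_m - alpha * delta_m * L, where
    delta_m = sigma((N - N_m) / N). Macros that save expanded nodes
    (N_m < N) get smaller weights; harder problems (larger solution
    length L) produce larger updates."""
    delta_m = 2.0 / (1.0 + math.exp(-(N - N_m) / float(N))) - 1.0
    return w_m - ALPHA * delta_m * L
\end{verbatim}

After each training problem, the global list of macro-operators is re-sorted by these weights, with smaller weights ranked better.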

In CA-ED, only two macros are kept for future use, given the large extra costs associated with this type of macro. In SOL-EP we allow an arbitrary (but still small) number of macros to be used in search, given the smaller extra costs involved. SOL-EP macros have no preprocessing costs, and the cost per node in the search can be much smaller than in the case of CA-ED macros (see Table 4).

To decide the number of selected macros in a domain, a weight threshold $w_{im}$ is defined. This threshold can be seen as the weight of an imaginary macro $im$ with ``constant performance'' in all training instances. By ``constant performance'' we mean that, for each training instance,

\begin{displaymath}\frac{N - N_{im}}{N} = c,\end{displaymath}

where $c > 0$ is a constant parameter. The threshold $w_{im}$ is updated following the same procedure as for regular macros: The initial value of $w_{im}$ is set to 1. For each training problem, the weight update of $im$ is

\begin{displaymath}w_{im} = w_{im} - \alpha \delta_{im} L = w_{im} - \alpha \sigma(c) L. \end{displaymath}

After all training problems have been processed, macros with a weight smaller than $w_{im}$ are selected for future use. In experiments we set $c$ to 0.01. Given the competition's tight deadline, we invested limited time in studying this method and tuning its parameters. How to best determine the number of selected macros is still an open problem for us, which clearly needs more thorough study and evaluation.
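The resulting selection step can be sketched as follows (illustrative code under the assumptions above; weights maps each macro-operator to its final weight, lengths holds the solution lengths $L$ of the training problems, and C corresponds to the constant $c = 0.01$):

\begin{verbatim}
import math

ALPHA = 0.001
C = 0.01  # assumed "constant performance" of the imaginary macro im

def sigma(x):
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def select_macros(weights, lengths):
    """Keep every macro whose final weight is below the threshold
    w_im, i.e. every macro that did better overall than an imaginary
    macro saving a fraction C of the expanded nodes on each problem."""
    w_im = 1.0 - sum(ALPHA * sigma(C) * L for L in lengths)
    selected = [m for m, w in weights.items() if w < w_im]
    return sorted(selected, key=weights.get)
\end{verbatim}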

