Next: Using Macros at Run-Time Up: Using Macros from Solutions Previous: Motivation

## Generating Macros

As a running example, we will use the solution plan for problem 1 in the Satellite domain shown in Figure 12. For each step, the figure shows the order in the linear plan, the action name, the argument list, the preconditions, and the effects. To keep the picture simple, we ignore static preconditions of actions. Static facts never occur as action effects, and therefore do not affect the interactions between preconditions and effects of actions.

In SOL-EP, macro-operators are extracted from the solutions of the training problems. Each training problem is first solved with no macros in use. The found plan can be represented as a solution graph, where each node represents a plan step (action), and edges model interactions between solution steps. Building the solution graph is step 1 (analysis) in our general four-step pattern. In IPC-4 we used a first implementation of the solution graph, that considers interactions only between two consecutive actions of a plan. Here an interaction is defined if the two actions have at least one common argument, or at least one action has no arguments at all. Hence the implementation described in this article extracts only such two-action sequences as possible macros.

The macro-actions extracted from a solution are translated into macro-operators by replacing their instantiated arguments with generic variables. This operation preserves the relative mapping between the arguments of the contained actions. Macro-actions with different sets of arguments can result in the same macro-operator. For the Satellite solution in Figure 12, the sequence TURN-TO followed by TAKE-IMAGE occurs three times. After replacing the constant arguments with generic variables, all occurrences yield the same macro-operator.

There are many pairs of actions in a solution, and a decision must be made as to which ones are going to beneficial as macro-operators in a search. Macros are statically filtered according to the rules of Section 2.2.1 excluding the limitation of the number of preconditions, which is not critical in this algorithm, and the locality rule. Also, as said before, we use a different version of the chaining rule. We request that the operators of a macro have common variables, unless an operator has 0 parameters.

Macro-operators are stored in a global list ordered by their weight, with smaller being better. Weights are initialized to 1.0 and updated in a dynamic ranking process using a gradient-descent method.

For each macro-operator extracted from the solution of a training problem, we re-solve the problem with in use. Let be the solution length when no macros are used, the number of nodes expanded to solve the problem with no macros, and the number of expanded nodes when macro is used. Then we use the difference to update , the weight of macro . Since can take arbitrarily large values, we map it to a new value in the interval by

where is the sigmoid function

Function generates the curve shown in Figure 13. This particular definition of was chosen because it is symmetric in (i.e., ) and bounded within the interval . In particular, the symmetry property ensures that, if , than the weight update of at the current training step is 0. The size of the boundary interval has no effect on the ranking procedure, it only scales all weight updates by a constant multiplicative factor. We used a sigmoid function bounded to as a canonical representation, which limits the absolute value of between 0 and 1.

The update formula also contains a factor that measures the difficulty of the training instance. The harder the problem, the larger the weight update should be. We use as the difficulty factor the solution length rather than , since the former has a smaller variance over a training problem set. The formula for updating is

where is a small constant (0.001 in our implementation). The value of does not affect the ranking of macros. It was used only to keep macro weights within the vicinity of 1. See the second part of Section 3.4 for a comparison between CA-ED's frequency-based ranking and SOL-EP's gradient-descent ranking.

In CA-ED, only two macros are kept for future use, given the large extra-costs associated with this type of macros. In SOL-EP we allow an arbitrary (but still small) number of macros to be used in search, given the smaller extra-costs involved. SOL-EP macros have no preprocessing costs, and the cost per node in the search can be much smaller than in the case of CA-ED macros (see Table 4).

To decide the number of selected macros in a domain, a weight threshold is defined. This threshold can be seen as the weight of an imaginary macro with constant performance'' in all training instances. By constant performance'' we mean that, for each training instance,

where is a constant parameter. The threshold is updated following the same procedure as for regular macros: The initial value of is set to 1. For each training problem, the weight update of is

After all training problems have been processed, macros with a weight smaller than are selected for future use. In experiments we set to 0.01. Given the competition tight deadline, we invested limited time in studying this method and tuning its parameters. How to best determine the number of selected macros is still an open problem for us, which clearly needs more thourough study and evaluation.

Next: Using Macros at Run-Time Up: Using Macros from Solutions Previous: Motivation
Adi Botea 2005-08-01