Macro-Actions in Use

The structure and reusability of macro-actions depends on the underlying topology of the problem space under the given heuristic function. When a problem space contains many similar occurrences of the same plateaux (which happens when a problem contains much repeating structure) the effort involved in learning macro-actions to escape these plateaux efficiently can be richly rewarded. In principle, the most benefit is obtained when the problem space features large, frequently recurring plateaux, since large plateaux are the most time-consuming to explore and the effort would need to be repeated on every similar occurrence. Short macro-actions (of two or three actions) indicate that the problem space contains small plateaux (although these might arise frequently enough for learned macro-actions to still be beneficial).

Problems with repeating structure include: transportation problems, where the same basic sequences of actions have to be repeated to move groups of objects from their sources to their destinations; construction problems, in which many similar components need to be built and then combined into a finished artifact; and configuration problems, in which multiple components of an architecture need to go through very similar processes to complete their functions, etc. The Dining Philosophers and Towers of Hanoi problems are good examples of problems with repeating structure.

Although using macro-actions during search has advantages--they can offer search guidance and allow many actions to be planned in one step--considering them during the expansion of each state increases the branching factor. Thus, if a large number of unproductive macro-actions are generated the search space will become larger, making the problem harder, not easier, to solve. Whilst many of the plateau-escaping sequences are helpful in planning, some are specific to the situation in which they were derived, a situation which might not occur again in the plan. As macro-actions are learnt during the planning process--and there is no human intuition, or large test suite, to allow reusable macro-actions to be identified--care must be taken when deciding the points at which to consider their use in the planning process.

Plateau-escaping macro-actions are generated from situations in which the heuristic has broken down; therefore, the heuristic can be used as an indicator of when they are likely to be useful again during planning. As areas of repeating structure within the solution plan involve the application of similar (or identical) sequences of actions, they are likely to have similar heuristic profiles. In the case of plateau-escaping action sequences, the heuristic profile of the search landscape at their application is an initial increase (or no-change) of heuristic value, eventually followed by a fall to below the initial level--the profile occurring at a local minimum. If the plateau-escaping macro-actions are to be reusable, it is likely that the re-use will occur when the planning process is in a similar situation. As such, they are only considered for application in the exhaustive search step used to escape plateaux (both at the start or at any other point on a plateau).

Situations may arise where the use of macro-actions increases the makespan of the resulting plan due to redundant action sequences. For example, if in a simple game domain--with actions to move up, down, left or right-- a macro-action is formed for `left, left, left, left' and the optimal action sequence to escape a given plateau is `left, left, left' then `{left, left, left, left}, right' may be chosen if the state reached by moving left four times is heuristically better than the one reached by applying a single-step `left' action. Thus, macro-actions can have an adverse effect on plan quality.

Within the problem domains presented in IPC 4 [Hoffmann & Edelkamp 2005] was the encoding of the Dining Philosophers problem, translated from Promela into a PDDL encoding. When solving this problem, two important macro-actions are formed: an eleven-step macro-action upon completion of the first period of exhaustive search; and a three-step macro-action upon completion of the second. The solution plan requires these macro-actions to be repeated many times, something which now--as a result of the macro-actions--involves simply applying a single action that results in a strictly better state. Without the macro-actions, the planning process consists of repeated episodes of exhaustive search to find the same two sequences of actions each time.

**Figure 5:** Heuristic landscape over makespan, with and without macro-actions.
$\includegraphics[width=4in]{landscape}$

This behaviour can be seen in Figure 5 depicting the heuristic values of states generated with and without macro-actions, across the solution plan for the IPC 4 Dining Philosophers problem involving 14 philosophers. Initially, no macro-actions have been learnt so the search done by both approaches is identical. For the first 14 action choices the value of the heuristic, shown by the line in the graph, moves monotonically downwards as the planner is able to find actions to apply that lead to strictly better states.

After time step 14, the heuristic value begins to oscillate, at this point the planner has reached a plateau: there is no state with a strictly better heuristic value that can be reached by the application of just one action. As this is the first plateau reached, no macro-actions have been generated so the heuristic profiles are identical for both configurations. At time step 25 a state is reached that has a better heuristic value than that at time step 14. It is at this time that the plateau-escaping macro-action will be generated, memoising a lifted version of the sequence of actions that was used to escape the plateau. A brief period of search in which a strictly better state can be found at each choice point follows before the planner again hits a plateau.

The subsequent six plateaux consist of applying the same sequence of actions to six further pairs of philosophers; it can be seen that the heuristic fingerprints of the plateaux are identical. The version of Marvin in which macro-actions have been disabled repeats the expensive exhaustive search at each plateau: the heuristic value again goes through the process of increasing and then decreasing again before reaching a strictly-better state. The version using the plateau-escaping macro-actions, however, now has a single action to apply that achieves a strictly better state and search continues, stepping over the subsequent plateaux through the selection of macro-actions that yield strictly-better states.

When all of the larger plateaux have been overcome, a series of smaller plateaux are encountered. Again, it can be seen that for the first of these, both versions must complete a stage of exhaustive search; however, after the first of the smaller plateaux has been completed, the macro-action formed allows the subsequent plateaux to be bypassed. Finally, the plan finishes with a short previously unseen sequence of actions, where both versions must do exhaustive search.

Andrew Coles and Amanda Smith 2007-01-09