Journal of Artificial Intelligence Research 23 (2005)  533-585                                                                           Submitted 4/04;  published 5/05

© 2005 AI Access Foundation. All rights reserved.

Using Memory to Transform Search on the Planning Graph

Terry Zimmerman                                                                                    WIZIM@CS.CMU.EDU

Robotics Institute, Carnegie Mellon University

Pittsburgh, PA 15213-3890

Subbarao Kambhampati                                                                                 RAO@ASU.EDU

Department of Computer Science & Engineering

Arizona State University, Tempe AZ 85287-5406

 

Abstract

   The Graphplan algorithm for generating optimal make-span plans containing parallel sets of actions remains one of the most effective ways to generate such plans.  However, despite enhancements on a range of fronts, the approach is currently dominated in terms of speed, by state space planners that employ distance-based heuristics to quickly generate serial plans.  We report on a family of strategies that employ available memory to construct a search trace so as to learn from various aspects of Graphplan’s iterative search episodes in order to expedite search in subsequent episodes.  The planning approaches can be partitioned into two classes according to the type and extent of search experience captured in the trace.  The  planners using the more aggressive tracing method are able to avoid much of Graphplan’s redundant search effort, while planners in the second class trade off this aspect in favor of a much higher degree of freedom than Graphplan in traversing the space of ‘states’ generated during regression search on the planning graph.  The tactic favored by the second approach, exploiting the search trace to transform the depth-first, IDA* nature of Graphplan’s search into an iterative state space view, is shown to be the more powerful.  We demonstrate that distance-based, state space heuristics can be adapted to informed traversal of the search trace used by the second class of planners and develop an augmentation targeted specifically at planning graph search.  Guided by such a heuristic, the step-optimal version of the planner in this class clearly dominates even a highly enhanced version of Graphplan.  By adopting beam search on the search trace we then show that virtually optimal parallel plans can be generated at speeds quite competitive with a modern heuristic state space planner.

1.   Introduction

When Graphplan was introduced in 1995 (Blum & Furst, 1995) it became one of the fastest programs for solving the benchmark planning problems of that time and, by most accounts, constituted a radically different approach to automated planning.  Despite the recent dominance of heuristic state-search planners over Graphplan-style planners, the Graphplan approach is still one of the most effective ways to generate the so-called “optimal parallel plans”. State-space planners are drowned by the exponential branching factors of the search space of parallel plans (the exponential branching is a result of the fact that the planner needs to consider each subset of non-interfering actions).  Over the 8 years since its introduction, the Graphplan system has been enhanced on numerous fronts, ranging from planning graph construction efficiencies that reduce both its size and build time by one or more orders of magnitude (Smith & Weld, 1998; Long & Fox, 1999), to search speedup techniques such as variable and value ordering, dependency-directed backtracking, and explanation based learning (Kambhampati, 2000).  In spite of these advances, Graphplan has ceded the lead in planning speed to a variety of heuristic-guided planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Gerevini & Serina, 2002).  Notably, several of these exploit the planning graph for powerful state-space heuristics, while eschewing search on the graph itself.  Nonetheless, the Graphplan approach remains perhaps the fastest in parallel planning mainly because of the way it combines an iterative deepening A* (“IDA*”, Korf, 1985) search style with a highly efficient CSP-based incremental generation of applicable action subsets. 

We investigate here the use of available memory so as to surmount some of Graphplan’s major drawbacks, such as redundant search effort and the need to exhaustively search a k-length planning graph before proceeding to the k+1 length graph.  At the same time we wish to retain attractive features of Graphplan’s IDA* search such as rapid generation of parallel action steps and the ability to find step optimal plans.  The approach we describe remains rooted in iterative search on the planning graph but greatly expedites this search by building and maintaining a concise search trace. 

Graphplan alternates between two phases; one in which a data structure called a “planning graph” is incrementally extended, and a backward phase where the planning graph is searched to extract a valid plan. After the first regression search phase the space explored in any given episode is closely correlated with that conducted in the preceding episode.  The strategy we pursue in this work is to employ an appropriately designed trace of the search conducted in episode n (which failed to find a solution) to identify and avoid those aspects of the search that are provably unchanged in episode n+1, and focus effort on features that may have evolved.  We have identified precisely which features are dynamic across Graphplan search episodes and construct search traces that capture and exploit these features to different degrees.  Depending on its design a search trace may provide benefits such as 1) avoidance of much of Graphplan’s redundant search effort, 2) learning from its iterative search experience so as to improve its heuristics and the constraints embodied in the planning graph, and 3) realizing a much higher degree of freedom than Graphplan, in traversing the space of ‘states’ generated during the regression search process.   We will show that the third advantage is particularly key to search trace effectiveness, as it allows the planner to focus its attention on the most promising areas of the search space.

The issue of how much memory is the ‘right’ amount to use to boost an algorithm’s performance cuts across a range of computational approaches from search to the paging process in operating systems, and Internet browsing to database processing operations.  In our investigation we explore several alternative search trace based methods that differ markedly in terms of memory demands.  We describe four of these approaches in this paper.  Figure 1 depicts the pedigree of this family of search trace-based planners, as well as the primary impetus leading to the evolution of each system from its predecessor.  The figure also suggests the relative degree to which each planner steps away from the original IDA* search process underlying Graphplan.  The two tracks correspond to two genres of search trace that we have developed;

·         left track: The EGBG planners (Explanation Guided Backward search for Graphplan) employ a more comprehensive search trace focused on minimizing redundant search.

·         right track:  The PEGG planners (Pilot Explanation Guided Graphplan) use a more skeletal trace, incurring more of Graphplan’s redundant search effort in exchange for reduced memory demands and increased ability to exploit the state space view of the search space. 

The EGBG planner (Zimmerman & Kambhampati, 1999) adopts a memory intensive structure for the search trace as it seeks primarily to minimize redundant consistency-checking across Graphplan’s search iterations.  This proves to be effective in a range of smaller problems but memory constraints impede its ability to scale up.  Noting that Graphplan’s search process can be viewed as a specialized form of CSP search (Kambhampati, 2000), we explore some middle ground in terms of memory usage by augmenting EGBG with several methods known to be effective as speedup techniques for CSP problems.

 

 


 

Text Box: Figure 1:  Applying available memory to step away from the Graphplan search process;
                             A family of search trace-based planners

 Our primary interest in these techniques, however, is the impact on memory reduction and we describe how they accomplish this above and beyond any search speedup benefit they afford.  The implemented planner, me-EGBG, markedly outperforms EGBG in  speed and capabilities, but a variety of problems still lie beyond the planner’s reach due to memory constraints. 

The search trace structure used by the PEGG track planners trades off minimization of redundant search in exchange for a much smaller memory footprint.  In addition to its greatly reduced memory demands, the PEGG search trace structure can be exploited for its intrinsic state space view of what is essentially Graphplan’s CSP-oriented search space.  A significant speedup advantage of this approach over Graphplan and the EGBG track planners derives from its ability to employ the ‘distance-based’ heuristics that power many of the current generation of state-space planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Hoffman, 2001).  We adapt these heuristics to the task of identifying the most promising states to visit in the search trace and implement the approach first in the so-PEGG planner (‘step-optimal PEGG’, Zimmerman & Kambhampati, 2003).  So-PEGG outperforms even a highly enhanced version of Graphplan by up to two orders of magnitude in terms of speed, and does so while maintaining the guarantee of finding a step-optimal plan.

Finally we explore adoption of a beam search approach in visiting the state space implicit in the PEGG-style trace.  Here we employ the distance-based heuristics extracted from the planning graph itself, not only to direct the order in which search trace states are visited, but also to prune and restrict that space to only the heuristically best set of states, according to a user-specified metric.  We show that the planning graph can be further leveraged to provide a measure of the likelihood that a previously generated regression state might spawn new search branches at a higher planning graph level.  We term this metric ‘flux’ and employ it in an effective filter for states that can be skipped over even though they might appear promising based on the distance-based heuristic. Implemented in the PEGG system (Zimmerman & Kambhampati, 2003), this approach to exploiting a search trace produces a two-fold benefit over our previous approaches; 1) further reduction in search trace memory demands and 2) effective release from Graphplan’s exhaustive search of the planning graph in all search episodes.  PEGG exhibits speedups ranging to more than 300x over the enhanced version of Graphplan and is quite competitive with a recent state space planner using similar heuristics.  In adopting beam search PEGG necessarily sacrifices the guarantee of step-optimality but empirical evidence indicates the secondary heuristics are remarkably effective in ensuring the make-span of solutions produced are virtually at the optimal.

The fact that these systems successfully employ a search trace at all is noteworthy.  In general, the tactic of adopting a search trace for algorithms that explicitly generate node-states during iterative search episodes, has been found to be infeasible due to memory demands that are exponential in the depth of the solution.  In Sections 2 and 3 we describe how tight integration of the search trace with the planning graph permits the EGBG and PEGG planners to largely circumvent this issue.    The planning graph structure itself can be costly to construct, in terms of both memory and time; there are well-known problems and even domains that are problematic for planners that employ it.  (Post-Graphplan planners that employ the planning graph for some purpose include “STAN”, Long & Fox, 1999, “Blackbox”, Kautz & Selman, 1999, “IPP”, Koehler et al., 1997, “AltAlt”, Nguyen & Kambhampati, 2000, “LPG” Gerevini & Serina, 2002).  The planning systems described here share that memory overhead of course, but interestingly, we have found that search trace memory demands for the PEGG class of planners have not significantly limited the range of problems they can solve.   

The remainder of the paper is organized as follows:  Section 2 provides a brief overview of the planning graph and Graphplan’s search process.  The discussion of both its CSP nature and the manner in which the process can be viewed as IDA* search motivates the potential for employing available memory to accelerate solution extraction.  Section 3 addresses the two primary challenges in attempting to build and use a search trace to advantage with Graphplan: 1) How can this be done within reasonable memory constraints given Graphplan’s CSP-style search on the planning graph? and, 2) Once the trace is available, how can it most effectively be used?  This section briefly describes EGBG (Zimmerman & Kambhampati, 1999), the first system to use such a search trace to guide Graphplan’s search, and outlines the limitations of that method (Details of the algorithm are contained in Appendix A.)  Section 4 summarizes our investigations into a variety of memory reduction techniques and reports the impact of a combination of six of them on the performance of EGBG.  The PEGG planners are discussed in Section 5 and the performance of so-PEGG and PEGG (using beam search) are compared to an enhanced version of Graphplan, EGBG, and a modern, serial state-space planner.   Section 6 contains a discussion of our findings and Section 7 compares this work to related research.  Finally, Section 8 wraps up with our conclusions. 

1.   Background & Motivation: Planning Graphs and the Nature of Direct Graph Search

Here we outline the Graphplan algorithm and discuss traits suggesting that judicious use of additional memory might greatly improve its performance.  We touch on three related views of Graphplan’s search; 1) as a form of CSP, 2) as IDA* search and, 3) its state space aspect.

2.1 Construction and Search on a Planning Graph

The Graphplan algorithm employs two interleaved phases – a forward phase, where a data structure called a “planning graph” is incrementally extended, and a backward phase where the planning graph
is searched to extract a valid plan.  The planning graph consists of two alternating structures, called proposition lists and action lists.  At the bottom of Figure 2 is depicted a simple domain we will refer to as the Alpha domain and use for illustration in this study.  The figure shows four action and proposition levels of the planning graph engendered by the simple initial state given the domain.  We start with the initial state as the zeroth level proposition list.  Given a k-level planning graph, the extension of the graph structure to level k+1 involves introducing all actions whose preconditions are present in the kth level proposition list. In addition to the actions of the domain model, “no operation” actions are introduced, one for each condition in the kth level proposition list (abbreviated as “nop” in this paper’s figures, but also termed “persists” by others).  A “nop-C” action has C as its precondition and C as its effect. Given the kth level actions, the proposition list at level k+1 is constructed as just the union of the effects of all the introduced actions.  The planning graph maintains the dependency links between the actions at level k+1, their preconditions in the level k proposition list, and their effects in the level k+1 proposition list.

 During planning graph construction binary "mutex'' constraints are computed and propagated.  In Figure 2, the arcs denote mutex relations between pairs of propositions and pairs of actions.  The propagation starts at level 1 by labeling as mutex all pairs of actions that are statically interfering with each other (“static mutex”), that is their preconditions or effects are logically inconsistent.  Mutexes are then propagated from this level forward using two simple propagation rules.  Two propositions at level k are marked mutex if all actions at level k that support one proposition are mutex with all actions that support the second proposition.  Two actions at level 2 are then mutex if they are statically interfering or if a precondition of the first action is mutually exclusive with a precondition of the second .  (We term the latter “dynamic mutex”, since this constraint may relax at a higher planning graph level).[1]  The propositions themselves can also be either static mutex (one negates the other) or dynamic mutex (all actions supporting one proposition are mutex with all actions supporting the other).  To reduce Figure 2 clutter mutex arcs for propositions and their negations are omitted.

The search phase on a k-level planning graph involves checking to see if there is a sub-graph of the planning graph that corresponds to a valid solution to the problem.  Figure 3 depicts Graphplan search in a manner similar to the CSP variable-value assignment process.  Beginning with the propositions corresponding to the goals at level k, we incrementally select  a set of actions from the level k action list that support all the goals, such that no two actions selected for supporting two different goals are mutually exclusive (if they are, we backtrack and try to change the selection of actions).  This is essentially a CSP problem where the goal propositions at each level are the variables, actions that establish a proposition are the values, and the mutex conditions constitute constraints.  The search proceeds in depth-first fashion: Once all goals for a level are supported, we recursively call the same search process on the k-1 level planning graph, with the preconditions of the actions selected at level k as the goals for the k-1 level search.  The search succeeds when we reach level 0 (the initial state) and the solution is extracted by unwinding the recursive goal assignment calls.  This process can be viewed as a system for solving “Dynamic CSPs” (DCSP) (Mittal & Falkenhainer, 1990; Kambhampati 2000), wherein the standard CSP formalism is augmented with the concept of variables that do not appear (a.k.a. get activated) until other variables are assigned.

During the interleaved planning graph extension and search phases, the graph may be extended to a stasis condition, after which no further changes occur in actions, propositions, or mutex conditions.  A sufficient condition defining this “level-off” is a level where no new actions are introduced and no existing mutex conditions between propositions go away.  We will refer to all planning graph levels at or above level-off as ‘static levels’.  Note that although the graph becomes static at this point, finding a solution may require many more episodes composed of adding identical static levels and conducting regression search on the problem goals.

Like many fielded CSP solvers, Graphplan's search process benefits from a simple form of no-good learning.  When a set of (sub)goals for a level k is determined to be unsolvable, they are memoized at that level in a hash table.  Subsequently, when the backward search process later enters level k with a set of subgoals they are first checked against the hash table, and if a match is found the search Rounded Rectangle:     Goal  State  

Double Brace: W X Y ZRounded Rectangle:       Initial State                         





Double Brace:  YH I JDouble Brace:  WYHIJDouble Brace:  Y H I Double Brace:  Y J
process backtracks.  This constitutes one of three conditions for backtracking: the two others arise from attempts to assign static mutex actions and dynamic mutex actions (See the Figure 3 legend).

We next discuss Graphplan’s search from a higher-level view that abstracts away its CSP nature. 

2.2  Graphplan as State Space Search

From a more abstract perspective, Graphplan can be viewed as conducting regression state space search from the problem goals to the initial state.  In this view, the ‘states’ that are generated and expanded are the subgoals that result when the CSP process for a given set of subgoals finds a consistent set of actions satisfying the subgoals at that planning graph level (c.f. Kambhampati & Sanchez, 2000). In this view the “state-generator function” is effectively Graphplan’s CSP-style goal assignment routine that seeks a non-mutex set of actions for a given set of subgoals within a given planning graph level.  This view is depicted in Figure 4, where the top graph casts the CSP-style search trace of Figure 3 as a high-level state-space search trace.  The terms in each box depict the set of (positive) subgoals that result from the action assignment process for the goals in the higher-level state to which the box is linked.[2]