Journal of Artificial Intelligence Research 23 (2005)  533-585                                                                           Submitted 4/04;  published 5/05

© 2005 AI Access Foundation. All rights reserved.

Using Memory to Transform Search on the Planning Graph

Terry Zimmerman                                                                                    WIZIM@CS.CMU.EDU

Robotics Institute, Carnegie Mellon University

Pittsburgh, PA 15213-3890

Subbarao Kambhampati                                                                                 RAO@ASU.EDU

Department of Computer Science & Engineering

Arizona State University, Tempe AZ 85287-5406

 

Abstract

   The Graphplan algorithm for generating optimal make-span plans containing parallel sets of actions remains one of the most effective ways to generate such plans.  However, despite enhancements on a range of fronts, the approach is currently dominated in terms of speed by state space planners that employ distance-based heuristics to quickly generate serial plans.  We report on a family of strategies that employ available memory to construct a search trace so as to learn from various aspects of Graphplan’s iterative search episodes in order to expedite search in subsequent episodes.  The planning approaches can be partitioned into two classes according to the type and extent of search experience captured in the trace.  The planners using the more aggressive tracing method are able to avoid much of Graphplan’s redundant search effort, while planners in the second class trade off this aspect in favor of a much higher degree of freedom than Graphplan in traversing the space of ‘states’ generated during regression search on the planning graph.  The tactic favored by the second approach, exploiting the search trace to transform the depth-first, IDA* nature of Graphplan’s search into an iterative state space view, is shown to be the more powerful.  We demonstrate that distance-based, state space heuristics can be adapted to informed traversal of the search trace used by the second class of planners and develop an augmentation targeted specifically at planning graph search.  Guided by such a heuristic, the step-optimal version of the planner in this class clearly dominates even a highly enhanced version of Graphplan.  By adopting beam search on the search trace we then show that virtually optimal parallel plans can be generated at speeds quite competitive with a modern heuristic state space planner.

1.   Introduction

When Graphplan was introduced in 1995 (Blum & Furst, 1995) it became one of the fastest programs for solving the benchmark planning problems of that time and, by most accounts, constituted a radically different approach to automated planning.  Despite the recent dominance of heuristic state-search planners over Graphplan-style planners, the Graphplan approach is still one of the most effective ways to generate the so-called “optimal parallel plans”. State-space planners are drowned by the exponential branching factors of the search space of parallel plans (the exponential branching is a result of the fact that the planner needs to consider each subset of non-interfering actions).  Over the 8 years since its introduction, the Graphplan system has been enhanced on numerous fronts, ranging from planning graph construction efficiencies that reduce both its size and build time by one or more orders of magnitude (Smith & Weld, 1998; Long & Fox, 1999), to search speedup techniques such as variable and value ordering, dependency-directed backtracking, and explanation based learning (Kambhampati, 2000).  In spite of these advances, Graphplan has ceded the lead in planning speed to a variety of heuristic-guided planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Gerevini & Serina, 2002).  Notably, several of these exploit the planning graph for powerful state-space heuristics, while eschewing search on the graph itself.  Nonetheless, the Graphplan approach remains perhaps the fastest in parallel planning mainly because of the way it combines an iterative deepening A* (“IDA*”, Korf, 1985) search style with a highly efficient CSP-based incremental generation of applicable action subsets. 

We investigate here the use of available memory so as to surmount some of Graphplan’s major drawbacks, such as redundant search effort and the need to exhaustively search a k-length planning graph before proceeding to the k+1 length graph.  At the same time we wish to retain attractive features of Graphplan’s IDA* search such as rapid generation of parallel action steps and the ability to find step optimal plans.  The approach we describe remains rooted in iterative search on the planning graph but greatly expedites this search by building and maintaining a concise search trace. 

Graphplan alternates between two phases: a forward phase in which a data structure called a “planning graph” is incrementally extended, and a backward phase in which the planning graph is searched to extract a valid plan.  After the first regression search phase, the space explored in any given episode is closely correlated with that explored in the preceding episode.  The strategy we pursue in this work is to employ an appropriately designed trace of the search conducted in episode n (which failed to find a solution) to identify and avoid those aspects of the search that are provably unchanged in episode n+1, and focus effort on features that may have evolved.  We have identified precisely which features are dynamic across Graphplan search episodes and construct search traces that capture and exploit these features to different degrees.  Depending on its design, a search trace may provide benefits such as 1) avoidance of much of Graphplan’s redundant search effort, 2) learning from its iterative search experience so as to improve its heuristics and the constraints embodied in the planning graph, and 3) realizing a much higher degree of freedom than Graphplan in traversing the space of ‘states’ generated during the regression search process.  We will show that the third advantage is particularly key to search trace effectiveness, as it allows the planner to focus its attention on the most promising areas of the search space.

The issue of how much memory is the ‘right’ amount to use to boost an algorithm’s performance cuts across a range of computational approaches from search to the paging process in operating systems, and Internet browsing to database processing operations.  In our investigation we explore several alternative search trace based methods that differ markedly in terms of memory demands.  We describe four of these approaches in this paper.  Figure 1 depicts the pedigree of this family of search trace-based planners, as well as the primary impetus leading to the evolution of each system from its predecessor.  The figure also suggests the relative degree to which each planner steps away from the original IDA* search process underlying Graphplan.  The two tracks correspond to two genres of search trace that we have developed:

·         left track: The EGBG planners (Explanation Guided Backward search for Graphplan) employ a more comprehensive search trace focused on minimizing redundant search.

·         right track:  The PEGG planners (Pilot Explanation Guided Graphplan) use a more skeletal trace, incurring more of Graphplan’s redundant search effort in exchange for reduced memory demands and increased ability to exploit the state space view of the search space. 

The EGBG planner (Zimmerman & Kambhampati, 1999) adopts a memory intensive structure for the search trace as it seeks primarily to minimize redundant consistency-checking across Graphplan’s search iterations.  This proves to be effective in a range of smaller problems but memory constraints impede its ability to scale up.  Noting that Graphplan’s search process can be viewed as a specialized form of CSP search (Kambhampati, 2000), we explore some middle ground in terms of memory usage by augmenting EGBG with several methods known to be effective as speedup techniques for CSP problems.

 

 


 

Figure 1:  Applying available memory to step away from the Graphplan search process; a family of search trace-based planners

Our primary interest in these techniques, however, is the impact on memory reduction and we describe how they accomplish this above and beyond any search speedup benefit they afford.  The implemented planner, me-EGBG, markedly outperforms EGBG in speed and capabilities, but a variety of problems still lie beyond the planner’s reach due to memory constraints.

The search trace structure used by the PEGG track planners trades off minimization of redundant search in exchange for a much smaller memory footprint.  In addition to its greatly reduced memory demands, the PEGG search trace structure can be exploited for its intrinsic state space view of what is essentially Graphplan’s CSP-oriented search space.  A significant speedup advantage of this approach over Graphplan and the EGBG track planners derives from its ability to employ the ‘distance-based’ heuristics that power many of the current generation of state-space planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Hoffman, 2001).  We adapt these heuristics to the task of identifying the most promising states to visit in the search trace and implement the approach first in the so-PEGG planner (‘step-optimal PEGG’, Zimmerman & Kambhampati, 2003).  So-PEGG outperforms even a highly enhanced version of Graphplan by up to two orders of magnitude in terms of speed, and does so while maintaining the guarantee of finding a step-optimal plan.

Finally we explore adoption of a beam search approach in visiting the state space implicit in the PEGG-style trace.  Here we employ the distance-based heuristics extracted from the planning graph itself, not only to direct the order in which search trace states are visited, but also to prune and restrict that space to only the heuristically best set of states, according to a user-specified metric.  We show that the planning graph can be further leveraged to provide a measure of the likelihood that a previously generated regression state might spawn new search branches at a higher planning graph level.  We term this metric ‘flux’ and employ it in an effective filter for states that can be skipped over even though they might appear promising based on the distance-based heuristic.  Implemented in the PEGG system (Zimmerman & Kambhampati, 2003), this approach to exploiting a search trace produces a two-fold benefit over our previous approaches: 1) further reduction in search trace memory demands and 2) effective release from Graphplan’s exhaustive search of the planning graph in all search episodes.  PEGG exhibits speedups ranging to more than 300x over the enhanced version of Graphplan and is quite competitive with a recent state space planner using similar heuristics.  In adopting beam search, PEGG necessarily sacrifices the guarantee of step-optimality, but empirical evidence indicates that the secondary heuristics are remarkably effective in ensuring that the make-span of the solutions produced is virtually optimal.

The fact that these systems successfully employ a search trace at all is noteworthy.  In general, the tactic of adopting a search trace for algorithms that explicitly generate node-states during iterative search episodes has been found to be infeasible due to memory demands that are exponential in the depth of the solution.  In Sections 2 and 3 we describe how tight integration of the search trace with the planning graph permits the EGBG and PEGG planners to largely circumvent this issue.  The planning graph structure itself can be costly to construct, in terms of both memory and time; there are well-known problems and even domains that are problematic for planners that employ it.  (Post-Graphplan planners that employ the planning graph for some purpose include STAN (Long & Fox, 1999), Blackbox (Kautz & Selman, 1999), IPP (Koehler et al., 1997), AltAlt (Nguyen & Kambhampati, 2000), and LPG (Gerevini & Serina, 2002).)  The planning systems described here share that memory overhead of course, but interestingly, we have found that search trace memory demands for the PEGG class of planners have not significantly limited the range of problems they can solve.

The remainder of the paper is organized as follows:  Section 2 provides a brief overview of the planning graph and Graphplan’s search process.  The discussion of both its CSP nature and the manner in which the process can be viewed as IDA* search motivates the potential for employing available memory to accelerate solution extraction.  Section 3 addresses the two primary challenges in attempting to build and use a search trace to advantage with Graphplan: 1) How can this be done within reasonable memory constraints given Graphplan’s CSP-style search on the planning graph? and 2) Once the trace is available, how can it most effectively be used?  This section briefly describes EGBG (Zimmerman & Kambhampati, 1999), the first system to use such a search trace to guide Graphplan’s search, and outlines the limitations of that method.  (Details of the algorithm are contained in Appendix A.)  Section 4 summarizes our investigations into a variety of memory reduction techniques and reports the impact of a combination of six of them on the performance of EGBG.  The PEGG planners are discussed in Section 5, where the performance of so-PEGG and PEGG (using beam search) is compared to that of an enhanced version of Graphplan, EGBG, and a modern, serial state-space planner.  Section 6 contains a discussion of our findings and Section 7 compares this work to related research.  Finally, Section 8 wraps up with our conclusions.

2.   Background & Motivation: Planning Graphs and the Nature of Direct Graph Search

Here we outline the Graphplan algorithm and discuss traits suggesting that judicious use of additional memory might greatly improve its performance.  We touch on three related views of Graphplan’s search: 1) as a form of CSP, 2) as IDA* search, and 3) its state space aspect.

2.1 Construction and Search on a Planning Graph

The Graphplan algorithm employs two interleaved phases: a forward phase, where a data structure called a “planning graph” is incrementally extended, and a backward phase, where the planning graph is searched to extract a valid plan.  The planning graph consists of two alternating structures, called proposition lists and action lists.  The bottom of Figure 2 depicts a simple domain, which we refer to as the Alpha domain and use for illustration in this study.  The figure shows four action and proposition levels of the planning graph engendered by the simple initial state given the domain.  We start with the initial state as the zeroth level proposition list.  Given a k-level planning graph, the extension of the graph structure to level k+1 involves introducing all actions whose preconditions are present in the kth level proposition list.  In addition to the actions of the domain model, “no operation” actions are introduced, one for each condition in the kth level proposition list (abbreviated as “nop” in this paper’s figures, but also termed “persists” by others).  A “nop-C” action has C as its precondition and C as its effect.  Given the level k+1 actions, the proposition list at level k+1 is constructed as just the union of the effects of all the introduced actions.  The planning graph maintains the dependency links between the actions at level k+1, their preconditions in the level k proposition list, and their effects in the level k+1 proposition list.
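The extension step described above can be illustrated with a minimal Python sketch. It is not Graphplan's implementation: the `Action` type, the `extend_level` function, and the two-action toy domain are hypothetical names introduced here, literals are plain strings, and mutex bookkeeping is deliberately left out.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]   # preconditions (looked up in the level k proposition list)
    eff: FrozenSet[str]   # effects (added to the level k+1 proposition list)


def extend_level(props: Set[str], domain: List[Action]) -> Tuple[List[Action], Set[str]]:
    """Build the level k+1 action list and proposition list from the level k propositions."""
    # Domain actions whose preconditions are all present at level k.
    actions = [a for a in domain if a.pre <= props]
    # One "nop" per level k proposition C: precondition C, effect C.
    actions += [Action(f"nop-{p}", frozenset([p]), frozenset([p])) for p in sorted(props)]
    # The level k+1 proposition list is the union of the effects of all introduced actions.
    next_props = set().union(*(a.eff for a in actions))
    return actions, next_props


# Hypothetical toy domain (not the Alpha domain of Figure 2).
domain = [
    Action("a1", frozenset({"p"}), frozenset({"q"})),
    Action("a2", frozenset({"q", "r"}), frozenset({"s"})),
]
level0 = {"p"}
acts1, level1 = extend_level(level0, domain)
acts2, level2 = extend_level(level1, domain)
```

In this toy domain `a2` never becomes applicable because `r` is never produced, so the proposition list stops growing after the first extension.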

During planning graph construction binary “mutex” constraints are computed and propagated.  In Figure 2, the arcs denote mutex relations between pairs of propositions and pairs of actions.  The propagation starts at level 1 by labeling as mutex all pairs of actions that are statically interfering with each other (“static mutex”), that is, their preconditions or effects are logically inconsistent.  Mutexes are then propagated from this level forward using two simple propagation rules.  Two propositions at level k are marked mutex if all actions at level k that support one proposition are mutex with all actions that support the second proposition.  Two actions at level k+1 are then mutex if they are statically interfering or if a precondition of the first action is mutually exclusive with a precondition of the second.  (We term the latter “dynamic mutex”, since this constraint may relax at a higher planning graph level).[1]  The propositions themselves can also be either static mutex (one negates the other) or dynamic mutex (all actions supporting one proposition are mutex with all actions supporting the other).  To reduce clutter in Figure 2, mutex arcs for propositions and their negations are omitted.
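The two propagation rules can be sketched as follows. This is an illustrative Python fragment under stated assumptions: mutex pairs are represented as two-element `frozenset`s, static mutexes are taken as given input, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def proposition_mutexes(supporters, action_mutex):
    """Rule 1: two propositions at a level are mutex if every action supporting
    one is mutex with every action supporting the other.

    supporters:   dict mapping proposition -> set of supporting action names
    action_mutex: set of frozenset({a, b}) pairs that are mutex at this level
    """
    result = set()
    for p, q in combinations(sorted(supporters), 2):
        if all(frozenset((a, b)) in action_mutex
               for a in supporters[p] for b in supporters[q]):
            result.add(frozenset((p, q)))
    return result


def dynamic_action_mutexes(preconds, prev_prop_mutex):
    """Rule 2 (dynamic part): two actions are mutex if some precondition of one
    is mutex with some precondition of the other at the previous proposition level.

    preconds:        dict mapping action name -> set of preconditions
    prev_prop_mutex: set of frozenset({p, q}) pairs from the previous level
    """
    result = set()
    for a, b in combinations(sorted(preconds), 2):
        if any(frozenset((p, q)) in prev_prop_mutex
               for p in preconds[a] for q in preconds[b]):
            result.add(frozenset((a, b)))
    return result


# Hypothetical level: a1 and a2 are statically mutex; a1 also supports r.
supporters = {"p": {"a1"}, "q": {"a2"}, "r": {"a1", "a3"}}
static_mutex = {frozenset(("a1", "a2"))}
prop_mutex = proposition_mutexes(supporters, static_mutex)

# Hypothetical next-level actions and their preconditions.
preconds = {"b1": {"p"}, "b2": {"q"}, "b3": {"r"}}
act_mutex = dynamic_action_mutexes(preconds, prop_mutex)
```

Here `p` and `q` become mutex because their only supporters are mutex, whereas `r` is mutex with neither since it has a supporter that is compatible with each. The proposition mutex then makes `b1` and `b2` dynamically mutex at the next level.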

The search phase on a k-level planning graph involves checking to see if there is a sub-graph of the planning graph that corresponds to a valid solution to the problem.  Figure 3 depicts Graphplan search in a manner similar to the CSP variable-value assignment process.  Beginning with the propositions corresponding to the goals at level k, we incrementally select a set of actions from the level k action list that support all the goals, such that no two actions selected for supporting two different goals are mutually exclusive (if they are, we backtrack and try to change the selection of actions).  This is essentially a CSP in which the goal propositions at each level are the variables, actions that establish a proposition are the values, and the mutex conditions constitute constraints.  The search proceeds in depth-first fashion: Once all goals for a level are supported, we recursively call the same search process on the k-1 level planning graph, with the preconditions of the actions selected at level k as the goals for the k-1 level search.  The search succeeds when we reach level 0 (the initial state) and the solution is extracted by unwinding the recursive goal assignment calls.  This process can be viewed as a system for solving “Dynamic CSPs” (DCSP) (Mittal & Falkenhainer, 1990; Kambhampati, 2000), wherein the standard CSP formalism is augmented with the concept of variables that do not appear (i.e. are not activated) until other variables are assigned.

During the interleaved planning graph extension and search phases, the graph may be extended to a stasis condition, after which no further changes occur in actions, propositions, or mutex conditions.  A sufficient condition defining this “level-off” is a level where no new actions are introduced and no existing mutex conditions between propositions go away.  We will refer to all planning graph levels at or above level-off as ‘static levels’.  Note that although the graph becomes static at this point, finding a solution may require many more episodes composed of adding identical static levels and conducting regression search on the problem goals.
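The sufficient condition for level-off stated above amounts to two subset tests between consecutive levels. A minimal sketch, assuming actions are identified by name and proposition mutexes by `frozenset` pairs (both representation choices are assumptions made for illustration):

```python
def leveled_off(actions_prev, actions_new, prop_mutex_prev, prop_mutex_new):
    """Sufficient level-off test between two consecutive planning graph levels."""
    # No new actions are introduced at the newer level.
    no_new_actions = set(actions_new) <= set(actions_prev)
    # No proposition mutex present at the older level has gone away.
    no_mutex_lost = set(prop_mutex_prev) <= set(prop_mutex_new)
    return no_new_actions and no_mutex_lost


pq = frozenset(("p", "q"))
still_changing = leveled_off({"a1"}, {"a1"}, {pq}, set())   # a mutex relaxed
static = leveled_off({"a1"}, {"a1"}, {pq}, {pq})            # nothing changed
```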

Like many fielded CSP solvers, Graphplan's search process benefits from a simple form of no-good learning.  When a set of (sub)goals for a level k is determined to be unsolvable, they are memoized at that level in a hash table.  Subsequently, when the backward search process later enters level k with a set of subgoals they are first checked against the hash table, and if a match is found the search process backtracks.  This constitutes one of three conditions for backtracking: the two others arise from attempts to assign static mutex actions and dynamic mutex actions (See the Figure 3 legend).
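The regression search and its memoization can be sketched together in a few lines of Python. This is a simplified illustration rather than Graphplan's actual routine: the data layout (`supporters`, `preconds`, `mutex`), the function names, and the toy graphs are assumptions, and static and dynamic mutexes are merged into a single per-level set.

```python
def backward_search(goals, k, init, supporters, preconds, mutex):
    """Regression search on a k-level planning graph with no-good memoization.

    supporters[level][prop] -> action names at `level` that establish prop
    preconds[action]        -> frozenset of the action's preconditions
    mutex[level]            -> set of frozenset({a, b}) mutex action pairs
    Returns (plan, memos); plan lists the chosen actions for levels 1..k, or is None.
    """
    memos = {}  # level -> set of subgoal sets found unsolvable at that level

    def solve(subgoals, level):
        if level == 0:                                  # reached the initial state
            return [] if subgoals <= init else None
        if subgoals in memos.setdefault(level, set()):
            return None                                 # memo hit: backtrack
        plan = assign(sorted(subgoals), [], level)
        if plan is None:
            memos[level].add(subgoals)                  # memoize the no-good
        return plan

    def assign(remaining, chosen, level):
        if not remaining:                               # all goals at this level supported
            nxt_goals = frozenset().union(*(preconds[a] for a in chosen))
            tail = solve(nxt_goals, level - 1)
            return None if tail is None else tail + [sorted(chosen)]
        goal, rest = remaining[0], remaining[1:]
        for act in supporters[level].get(goal, ()):     # values for this variable
            if any(frozenset((act, c)) in mutex[level] for c in chosen):
                continue                                # mutex with a prior assignment
            nxt = chosen if act in chosen else chosen + [act]
            plan = assign(rest, nxt, level)
            if plan is not None:
                return plan
        return None

    return solve(frozenset(goals), k), memos


# Hypothetical two-level graph: a turns p into q, b turns q into r.
preconds = {"a": frozenset({"p"}), "b": frozenset({"q"}),
            "nop-p": frozenset({"p"}), "nop-q": frozenset({"q"})}
supporters = {1: {"q": ["a"], "p": ["nop-p"]},
              2: {"r": ["b"], "q": ["nop-q"], "p": ["nop-p"]}}
plan, _ = backward_search({"r"}, 2, {"p"}, supporters, preconds, {1: set(), 2: set()})

# Hypothetical failing case: the only supporters of q and r are mutex.
preconds2 = {"a": frozenset({"p"}), "b": frozenset({"p"})}
supporters2 = {1: {"q": ["a"], "r": ["b"]}}
plan2, memos2 = backward_search({"q", "r"}, 1, {"p"}, supporters2, preconds2,
                                {1: {frozenset(("a", "b"))}})
```

In the second call the subgoal set {q, r} cannot be supported at level 1, so it is recorded as a memo at that level. A later attempt to satisfy the same set at the same level would backtrack immediately on the memo check.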

We next discuss Graphplan’s search from a higher-level view that abstracts away its CSP nature. 

2.2  Graphplan as State Space Search

From a more abstract perspective, Graphplan can be viewed as conducting regression state space search from the problem goals to the initial state.  In this view, the ‘states’ that are generated and expanded are the subgoals that result when the CSP process for a given set of subgoals finds a consistent set of actions satisfying the subgoals at that planning graph level (cf. Kambhampati & Sanchez, 2000).  Here the “state-generator function” is effectively Graphplan’s CSP-style goal assignment routine that seeks a non-mutex set of actions for a given set of subgoals within a given planning graph level.  This view is depicted in Figure 4, where the top graph casts the CSP-style search trace of Figure 3 as a high-level state-space search trace.  The terms in each box depict the set of (positive) subgoals that result from the action assignment process for the goals in the higher-level state to which the box is linked.[2]


   Recognizing the state-space aspect of Graphplan’s search helps in understanding its connection to IDA* search.  This relationship was first noted and briefly discussed by Bonet and Geffner (1999); we highlight and expand upon it here.  There are three correspondences between the algorithms:

1.   Graphplan’s episodic search process in which all nodes generated in the previous episode are regenerated in the new episode (and possibly some new nodes), corresponds to IDA*’s iterative search.  Here the Graphplan nodes are the ‘states’ (sets of subgoals) that result when its regression search on a given plan graph level succeeds.  From this perspective the “node-generator function” is effectively Graphplan’s CSP-style goal assignment routine that seeks a non-mutex set of actions for a given set of propositions within a given planning graph level.

2.  From the state space view of Graphplan’s search (as in Figure 4), within a given search episode/iteration the algorithm conducts its search in the depth-first fashion of IDA*.  This ensures that the space requirements are linear in the depth of a solution node.

3.  The upper bound that is ‘iteratively deepened’ as in IDA* is the node-state heuristic f-value, f = g + h.  In this context h is the distance in terms of associated planning graph levels between a state generated in Graphplan’s regression search and the initial state[3] and g is the cost of reaching the state from the goal state in terms of number of CSP epochs (i.e. the numerical difference between the highest graph level and the state’s level).

  For our purposes, perhaps the most important observation is that the implicit f-value bound for a given iteration is just the length of the planning graph associated with that iteration.  That is, for any node-state, its associated planning graph level determines both the distance to the initial state (h) and the cost to reach it from the goal state (g), and the total must always equal the length of the plan graph.  This heuristic is clearly admissible; there can be no shorter distance to the goal because Graphplan exhaustively searches all shorter length planning graphs in (any) previous iterations.  It is this heuristic implicit in the Graphplan algorithm which guarantees that a step-optimal solution is returned.  Note that from this perspective all nodes visited in a given Graphplan search iteration implicitly have the same f-value:  g + h = length of planning graph.  We will consider implications of this property when we address informed traversal of Graphplan’s search space in Section 5.
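The bookkeeping behind this observation can be written out explicitly.  As a sketch, let k denote the length of the planning graph in the current iteration and let \ell (a symbol introduced here purely for illustration) denote the planning graph level with which a node-state S is associated:

```latex
\begin{align*}
h(S) &= \ell      && \text{(graph levels between $S$ and the initial state)}\\
g(S) &= k - \ell  && \text{(CSP epochs from the goal state at level $k$ down to $S$)}\\
f(S) &= g(S) + h(S) = (k - \ell) + \ell = k
\end{align*}
```

Every node-state in the iteration therefore carries the same f-value k, whatever its level \ell.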

  The primary shortcoming of a standard IDA* approach to search is that it regenerates so many of the same nodes in each of its iterations.  It has long been recognized that IDA*’s difficulties in some problem spaces can be traced to using too little memory (Russell, 1992; Sen & Bagchi, 1989).  The only information carried over from one iteration to the next is the upper bound on the f-value.  Graphplan partially addresses this shortcoming with its memo caches that store “no-goods” (states found to be inconsistent in successive episodes).  However, the IDA* nature of its search can make it an inefficient planner for problems in which the goal propositions appear non-mutex in the planning graph many levels before a valid plan can actually be extracted.

A second shortcoming of the IDA* nature of Graphplan’s search is that all node-states generated in a given Graphplan episode have the same f-value (i.e. the length of the graph).  As such, within an iteration (search episode) there is no discernible preference for visiting one state over another. We next discuss the use of available memory to target these shortcomings of Graphplan’s search.

3.   Efficient Use of a Search Trace to Guide Planning Graph Search

The search space Graphplan explores in a given search episode is defined and constrained by three factors: the problem goals, the plan graph associated with the episode, and the cache of memoized no-good states created in all previous search episodes.   Typical of IDA* search there is considerable similarity (i.e. redundancy) in the search space for successive episodes as the plan graph is extended. In fact, as discussed below, the backward search conducted at any level k+1 of the graph is essentially a “replay” of the search conducted at the previous level k with certain well-defined extensions.  More specifically, essentially every set of subgoals generated in the backward search of episode n, starting at level k, will be regenerated by Graphplan during episode n+1 starting at level k+1 (unless a solution is found first).[4]

Now returning to Figure 4 in its entirety, note that it depicts a state space tree structure corresponding to Graphplan’s search over three consecutive iterations.  The top graph, as discussed above, represents the subgoal ‘states’ generated in the course of Graphplan’s first attempt to satisfy the WXYZ goal of a problem resembling our running example.  (It is implied here that the W,X,Y,Z propositions are present in the planning graph at level 7 and that this is the first level at which no pair of them is mutex.)  In the second search episode (the middle Figure 4 graph), the same states are generated again, but each at one level higher.  In addition, these states are expanded to generate a number of children, shown in a darker shade.   (Since Figure 4 is a hypothetical variation of the Alpha domain problem detailed in Figures 2 and 3, all states created beyond the first episode are labeled only with state numbers representing the order in which they are generated.)  Finally, in the third episode, Graphplan regenerates the states from the previous two episodes in attempting to satisfy WXYZ at level 9, and ultimately finds a solution (the assigned actions associated with the figure’s double outlined subgoal sets) after generating the states shown with darkest shading in the bottom graph of Figure 4.

Noting the extent to which consecutive iterations of Graphplan’s search overlap, we investigate the application of additional memory to store a trace of the explored search tree.  The first implemented approach, EGBG (which is summarized in the following subsection), seeks to leverage an appropriately designed search trace to avoid as much of the inter-episode redundant search effort as possible (Zimmerman & Kambhampati, 1999).    

3.1 Aggressive Use of Memory in Tracing Search: The EGBG Planner

Like other types of CSP-based algorithms, Graphplan consumes most of its computational effort on a given problem in checking constraints.  An instrumented version of the planner reveals that typically 60-90% of the CPU run time is spent in creating and checking action and proposition mutexes, both during planning graph construction and during the search process.  (Mutex relations incorporated in the planning graph are the primary ‘constraints’ in the CSP view of Graphplan; Kambhampati, 2000.)  As such, this is an obvious starting point when seeking efficiency improvements for this planner and is the primary tactic adopted by EGBG.  We provide here only an overview of the approach, referring the interested reader to Appendix A for details.

EGBG exploits four features of the planning graph and Graphplan’s search process:

§         The set of actions that can establish a given proposition at level k+1 is always a superset of those establishing the proposition at level k. 

§         The “constraints” (mutexes) that are active at level k monotonically decrease with increasing planning graph levels.  That is, a mutex that is active at level k may or may not continue to be active at level k+1 but once it becomes inactive it never gets re-activated at future levels. 

§         Two actions in a level that are “statically” mutex (i.e. their effects or preconditions conflict with each other) will be mutex at all succeeding levels.

§         The problem goal set that is to be satisfied at a level k is the same set that will be searched on at level k+1 when the planning graph is extended.  That is, once a subgoal set is present at level k with no two propositions being mutex, it will remain so for all future levels.

Given an appropriate trace of the search conducted in episode n (which failed to find a solution) we would like to ignore those aspects of the search that are provably unchanged in episode n+1, and focus effort on only features that may have evolved.  If previous search failed to extract a solution from the k-length planning graph, search on the k+1 length graph can succeed only if one or more of the following conditions holds:

1.       The dynamic mutex condition between some pair of actions whose concurrent assignment was attempted in episode n no longer holds in episode n+1.

2.       For a subgoal that was generated in the regression search of episode n at planning graph level k, there is an action that establishes it in episode n+1 and first appears in level k+1. 

3.       An episode n regression state (subgoal set) at level k that matched a cached memo at that level has no memo-match when it is generated at level k+1 in episode n+1.

(The discussion in Appendix A formalizes these conditions.)  In each instance where one of these conditions holds, a complete policy must resume backward search under the search parameters associated with the instance in the previous episode, n.  Such resumed partial search episodes will either find a solution or generate additional trace subgoal sets to augment the parent trace.  This specialized search trace can be used to direct all future backward search episodes for this problem, and can be viewed as an explanation for the failure of the search process in each episode.  We hereafter use the terms pilot explanation (PE) and search trace interchangeably.  The following definitions are useful in describing the search process:

Search segment:  This is essentially a state, specifically a set of planning graph level-specific subgoals generated in regression search from the goal state (which is itself the first search segment).  Each EGBG search segment Sn, generated at planning graph level k, contains:

·      A subgoal set of propositions to be satisfied

·      A pointer to the parent search segment Sp (the state at level k+1 that gave rise to Sn)

·       A list of the actions that were assigned in Sp which resulted in the subgoals of Sn

·      A pointer to the PE level (as defined below) associated with Sn

·      A sequential list of results of the action consistency-checking process during the attempt to satisfy Sn’s  subgoals.  The possible trace results for a given consistency check are: static mutex, dynamic mutex, or action is consistent with all other prior assigned actions.  Trace results are stored as a list of bit vectors for efficiency.

A search segment therefore represents a state plus some path information, but we often use ‘search segment’ and ‘state’ interchangeably.  As such, all the boxes in Figure 4 (whether the state goals are explicitly shown or not) can be viewed as search segments. 
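The fields listed above map naturally onto a small record type.  The following Python sketch is only an illustration of that layout: the class and field names are hypothetical, the PE level is held as an integer index rather than a pointer, and the consistency-check trace is a plain list instead of the bit vectors mentioned above.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import FrozenSet, List, Optional, Tuple


class Check(IntEnum):
    """Possible trace results for one action consistency check."""
    STATIC_MUTEX = 0
    DYNAMIC_MUTEX = 1
    CONSISTENT = 2


@dataclass
class SearchSegment:
    subgoals: FrozenSet[str]               # propositions to be satisfied
    parent: Optional["SearchSegment"]      # Sp, the segment that gave rise to this one
    parent_actions: Tuple[str, ...]        # actions assigned in Sp yielding these subgoals
    pe_level: int                          # index of the associated PE level
    trace: List[Check] = field(default_factory=list)  # sequential check results


# Hypothetical usage: a root segment for the goals W, X, Y, Z and one child.
root = SearchSegment(frozenset("WXYZ"), None, (), pe_level=0)
child = SearchSegment(frozenset({"Y", "H"}), root, ("a1", "a2"), pe_level=1)
root.trace += [Check.CONSISTENT, Check.DYNAMIC_MUTEX, Check.CONSISTENT]
```

Because each of the three outcomes fits in two bits, a sequence like `root.trace` could be packed into a compact bit vector, which is the representation the text describes.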

Pilot explanation (PE):  This is the search trace.  It consists of the entire linked set of search segments representing the search space visited in a Graphplan backward search episode.  It is convenient to visualize it as in Figure 4: a tiered structure with separate caches for segments associated with search on each planning graph level.  We adopt the convention of numbering the PE levels in the reverse order of the plan graph: The top PE level is 0 (it contains a single search segment whose goals are the problem goals) and the level number is incremented as we move towards the initial state.  When a solution is found, the PE will necessarily extend from the highest plan graph level to the initial state, as shown in the third graph of Figure 4.

PE transposition:  When a state is first generated in search episode n it is associated with a specific planning graph level, say k.  The premise of using the search trace to guide search in episode n+1 is based on the idea of re-associating each PE search segment (state) generated (or updated) in episode n with the next higher planning graph level.  That is, we define transposing the PE as: For each search segment in the PE associated with a planning graph level k after search episode n, associate it with level k+1 for episode n+1.

Given these definitions, we note that the states in the PE after a search episode n on plan graph level k, loosely constitute the minimal set [5] of states that will be visited when backward search is conducted in episode n+1 at level k+1.  (This bound can be visualized by sliding the fixed tree of search segments in the first graph of Figure 4 up one level.)
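To make the level numbering and the transposition step concrete, here is a minimal sketch under our own assumptions: the class `PilotExplanation` and its methods are hypothetical names, and states are reduced to bare subgoal sets. Because PE levels are counted downward from the top of the planning graph, transposition in this sketch amounts to incrementing a single offset.

```python
from collections import defaultdict


class PilotExplanation:
    """Tiered store of states; PE level 0 holds the problem goals."""

    def __init__(self, graph_length):
        self.graph_length = graph_length   # planning graph level of the problem goals
        self.levels = defaultdict(list)    # PE level -> list of subgoal sets

    def add(self, pe_level, subgoals):
        self.levels[pe_level].append(frozenset(subgoals))

    def graph_level(self, pe_level):
        # PE levels are numbered in the reverse order of the planning graph.
        return self.graph_length - pe_level

    def transpose(self):
        # Re-associate every stored state with the next higher graph level.
        self.graph_length += 1


pe = PilotExplanation(graph_length=5)
pe.add(0, "WXYZ")   # top-level goal state
pe.add(1, "YH")     # a state generated one regression step down
before = [pe.graph_level(lvl) for lvl in sorted(pe.levels)]
pe.transpose()
after = [pe.graph_level(lvl) for lvl in sorted(pe.levels)]
print(before, after)  # [5, 4] [6, 5]
```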

3.2 Conducting Search with the EGBG Search Trace

EGBG builds the initial pilot explanation during the first regression search episode while tracing the search process with an augmented version of Graphplan’s “assign-goals” routine.  If no solution is possible on the k-length planning graph, the PE is transposed up one level, and key features of its previous search are replayed such that significant new search effort only occurs at points where one of the three conditions described above holds. During any such new search process the PE is augmented according to the search space visited. 

The EGBG search algorithm exploits its search trace in essentially bi-modal fashion: It alternates informed selection of a state from the search trace of its previous experience with a focused CSP-type search on the state’s subgoals.  Our discussion here of EGBG’s bi-modal algorithm revolves around the second mode; minimizing redundant search effort once a state has been chosen for visitation.  When we describe PEGG’s use of the search trace in Section 5 we will see that greater potential for dramatic efficiency increases lies with the first mode; the selection of a promising state from the search trace.

After choosing a state to visit, EGBG uses the trace from the previous episode to focus on only those aspects of the entailed search that could possibly have changed.  For each search segment Si at planning graph level k+1, visitation is a 4–step process:

1.    Perform a memo check to ensure the subgoals of Si are valid at level k+1

2.      ‘Replay’ the previous episode’s action assignment sequence for all subgoals in Si, using the segment’s ordered trace vectors.  Mutex checking is conducted on only those pairs of actions that were dynamic mutex at level k.  For actions that are no longer dynamic mutex, add the candidate action to Si’s list of consistent assignments and resume Graphplan-style search on the remaining goals.  Si is augmented and the PE extended in the process.  Whenever Si’s goals are successfully assigned, entailing a new set of subgoals to be satisfied at the lower level k, a child search segment is created, linked to Si, and added to the PE.

3.      For each Si subgoal in the replay sequence, check also for new actions appearing at level k+1 that establish the subgoal.  New actions that are inconsistent with a previously assigned action are logged as such in Si’s assignments.  For new actions that do not conflict with those previously assigned, assign them and resume Graphplan-style search from that point as for step 2.

4.      Memoize Si’s goals at level k+1 if no solution is found via the search process of steps 2 and 3.
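The replay in step 2 can be sketched as follows. This is a simplified illustration under our own assumptions, not EGBG's actual routine: the function name `replay_checks`, the tuple-based trace, and the `still_mutex` callback are hypothetical, and the memo check, new-action handling, and memoization of steps 1, 3, and 4 are not modeled. The point it shows is that only pairs recorded as dynamic mutex are retested; static mutex and consistent results are carried over unchanged.

```python
STATIC, DYNAMIC, OK = "static", "dynamic", "ok"


def replay_checks(trace, still_mutex):
    """Replay one segment's recorded consistency checks at the next graph level.

    trace: list of (action_a, action_b, result) tuples from the previous episode.
    still_mutex: callable(a, b) -> bool that retests a pair at the new level.
    Returns the updated trace, the pairs whose dynamic mutex relaxed, and the
    number of mutex tests actually performed.
    """
    updated, relaxed, retests = [], [], 0
    for a, b, result in trace:
        if result == DYNAMIC:          # the only result that can change
            retests += 1
            if not still_mutex(a, b):
                result = OK
                relaxed.append((a, b))  # search would resume from this point
        updated.append((a, b, result))
    return updated, relaxed, retests


# Hypothetical trace for one segment and hypothetical mutexes at the new level.
old_trace = [
    ("move-A", "move-B", STATIC),
    ("load", "fly", DYNAMIC),
    ("load", "drive", DYNAMIC),
    ("fly", "unload", OK),
]
mutex_at_new_level = {("load", "drive")}
updated, relaxed, retests = replay_checks(
    old_trace, lambda a, b: (a, b) in mutex_at_new_level
)
print(relaxed, retests)  # [('load', 'fly')] 2
```

Of the four recorded checks, only two are repeated, which is the source of the savings in mutex checking operations that the trace is designed to deliver.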

As long as all the segments in the PE are visited in this manner, the planner is guaranteed to find an optimal plan in the same search episode as Graphplan. Hereafter we refer to a PE search segment that is visited and extended via backward search to find a valid plan as a seed segment.  In addition, all segments that are part of the plan extracted from the PE we call plan segments.  Thus, in the third graph of Figure 4, S18 is the apparent seed segment while the plan segments (in bottom-up order) are: S30, S29, S18, S17, S16, S15, labeled segments YH, YHI, and the goal state WXYZ.

  In principle we have the freedom to traverse the search states encapsulated in the PE in any order and are no longer restricted to the (non-informed) depth-first nature of Graphplan’s search process.  Unfortunately, EGBG incurs a high overhead associated with visiting the search segments in any order other than bottom-up (in terms of PE levels).  If an ancestor of any state represented in the PE were to be visited before the state itself, EGBG’s search process would regenerate the state and any of its descendants (unless it first finds a solution).  There is a non-trivial cost associated with generating the assignment trace information in each of EGBG’s search segments; its search advantage lies in reusing that trace data without having to regenerate it.

On the other hand, top-down visitation of the segments in the PE levels is the degenerate mode.  Such a search process essentially mimics Graphplan’s, since each episode begins with search on the problem goal set and (with the exception of the replay of the top-level search segment’s assignments) regenerates, during its regression search, all the states generated in the previous episode plus possibly some new states.  The search trace provides no significant advantage under a top-down visitation policy. 

The bottom-up policy, on the other hand, has intuitive appeal since the lowest levels of the PE correspond to portions of the search space that lie closest to the initial state (in terms of plan steps).  If a state in one of the lower levels can in fact be extended to a solution, the planner avoids all the search effort that Graphplan would expend in reaching the state from the top-level problem goals. 

Adopting a bottom-up visitation policy amounts to layering a secondary heuristic on the primary IDA* heuristic, which is the planning graph length that is iteratively deepened.  Recalling from Section 2.2 that all states in the PE have the same f-value in terms of the primary heuristic, we are essentially biasing here in favor of states with low h-values.  Support for such a policy comes from work on heuristic guided state-space planning (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000) in which weighting h by a factor of 5 relative to the g component of the heuristic f-value generally improved performance.  However, unlike these state-space planning systems, for which this is the primary heuristic, EGBG employs it as a secondary heuristic so the guarantee of step optimality does not depend on its admissibility.  We have found bottom-up visitation to be the most efficient mode for EGBG and it is the default order for all EGBG results reported in this study. 
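As a small illustration of the bottom-up policy, the sketch below orders segments so that the deepest PE level (the one nearest the initial state) is visited first. The function name `bottom_up_order` is our own, and the PE levels attached to the segment names are hypothetical values chosen for the example rather than read from Figure 4.

```python
def bottom_up_order(segments):
    """Order (pe_level, name) pairs deepest PE level first.

    Python's sort is stable, so segments on the same level keep their
    original relative order.
    """
    return sorted(segments, key=lambda seg: -seg[0])


# Hypothetical PE contents: (PE level, segment label).
segments = [(0, "WXYZ"), (1, "YHI"), (2, "YH"), (2, "S15"), (3, "S18")]
order = [name for _, name in bottom_up_order(segments)]
print(order)  # ['S18', 'YH', 'S15', 'YHI', 'WXYZ']
```

Since every state in the PE shares the same f-value under the primary heuristic, sorting by depth in this way is equivalent to preferring the states with the lowest h-values.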

3.3  EGBG Experimental Results

Table 1 shows some of the performance results reported for the first version of EGBG (Zimmerman & Kambhampati, 1999).  Amongst the search trace designs we tried, this version is the most memory intensive and records the greatest extent of the search experience.  Runtime, the number of search backtracks, and the number of search mutex checks performed are compared to those of the Lisp implementation of the original Graphplan algorithm.  EGBG exhibits a clear advantage over Graphplan for this small set of problems:

·         Total problem runtime:  2.7 -  24.5x  improvement

·         Number of backtracks during search:  3.2 -  33x  improvement

·         Number of mutex checking operations during search:  5.5 -  42x  improvement

 

 

                      Standard Graphplan                    EGBG                                              Speedup ratios
Problem               Total time  Backtracks  Mutex checks  Total time  Backtracks  Mutex checks  Size of PE  Time    Bktrks   Mutex chks
BW-Large-B (18/18)    213         2823 K      121,400 K     79          880 K       21,900 K      7919        2.7x    3.2x     5.5x
Rocket-ext-a (7/36)   402         8128 K      74,900 K      40          712 K       3,400 K       1020        10.0x   11.4x    22x
Tower-5 (31/31)       811         7907 K      23,040 K      33          240 K       548 K         2722        24.5x   33x      42x
Ferry-6 (39/39)       319         5909 K      81,000 K      62          977 K       8,901 K       6611        5.1x    6.0x     9.1x

Table 1: Comparison of EGBG with standard Graphplan.
 Numbers in parentheses give number of time steps / number of actions respectively.  Search backtracks and
 mutex checks performed during the search are shown. "Size of PE" is pilot explanation size in terms of the
 final number of search segments.  “Standard Graphplan” is the Lisp version by Smith and Peot.

Since total time is, of course, highly dependent on both the machine as well as the coding language [6] (EGBG performance is particularly sensitive to available memory), the backtrack and mutex checking metrics provide a better comparative measure of search efficiency.  For Graphplan, mutex checking is by far the biggest consumer of computation time and, as such, the latter metric is perhaps the most complete indicator of search process improvements.  Some of the problem-to-problem variation in EGBG’s effectiveness can be attributed to the static/dynamic mutex ratio characterizing Graphplan’s action assignment routine.  The more action assignments rejected due to pair-wise statically mutex actions, the greater the advantage enjoyed by a system that doesn’t need to retest them.  Tower-of-Hanoi problems fall into this classification.
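The speedup ratios in Table 1 are simply the Graphplan figure divided by the corresponding EGBG figure. As a quick arithmetic check, the sketch below recomputes them from the table entries (backtracks and mutex checks in thousands); the dictionary layout and the `speedups` helper are our own. Ratios recomputed this way can differ from the printed ones in the last digit, since the table appears to round or truncate some entries.

```python
# Table 1 entries: (total time, backtracks in K, mutex checks in K).
table1 = {
    "BW-Large-B":   ((213, 2823, 121400), (79, 880, 21900)),
    "Rocket-ext-a": ((402, 8128, 74900),  (40, 712, 3400)),
    "Tower-5":      ((811, 7907, 23040),  (33, 240, 548)),
    "Ferry-6":      ((319, 5909, 81000),  (62, 977, 8901)),
}


def speedups(graphplan, egbg):
    """Ratio of each Graphplan metric to the matching EGBG metric."""
    return tuple(round(g / e, 1) for g, e in zip(graphplan, egbg))


ratios = {name: speedups(gp, eg) for name, (gp, eg) in table1.items()}
for name, (time_x, backtrack_x, mutex_x) in ratios.items():
    print(f"{name}: time {time_x}x, backtracks {backtrack_x}x, mutex checks {mutex_x}x")
```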

As noted in the original study (Zimmerman & Kambhampati, 1999), the range of problems that can be handled by this implementation is significantly restricted by the amount of memory available to the program at runtime.  For example, with a PE consisting of almost 8,000 search segments, the very modestly sized BW-Large-B problem challenges the available memory limit on our test machine.  We consider next an approach (me-EGBG in Figure 1) that occupies a middle ground in terms of memory demands amongst the search trace approaches we have investigated.

4.   Engineering to Reduce EGBG Memory Requirements: The me-EGBG Planner

The memory demands associated with Graphplan’s search process itself are not a major concern, since it conducts depth-first search with search space requirements linear in the depth of a solution node.  Since we seek to avoid the redundancy inherent in the IDA* episodes of Graphplan’s search by using a search trace, we must deal with a much different memory-demand profile.  The search trace design employed by EGBG has memory requirements that are exponential in the depth of the solution. However, the search trace grows in direct proportion to the search space actually visited, so that techniques which prune search also act to greatly reduce its memory demands.

We examined a variety of methods with respect to this issue, and eventually implemented a suite of seven that together have proven instrumental in helping EGBG (and later, PEGG) overcome memory-bound limitations.  Six of these are known techniques from the planning and CSP fields: variable ordering, value ordering, explanation based learning (EBL), dependency directed backtracking (DDB), domain preprocessing and invariant analysis, and transitioning to a bi-partite planning graph.  Four of the six most effective methods are CSP speedup techniques; however, our interest lies primarily in their impact on search trace memory demands.  While there are challenging aspects to adapting these methods to the planning graph and search trace context, that adaptation is not the focus of this paper.  Thus, details on the motivation and implementation of these methods are relegated to Appendix B.

The seventh method, a novel variant of variable ordering we call ‘EBL-based reordering’, exploits the fact that we are using EBL and have a search trace available.  Although the method is readily implemented in PEGG, the strict ordering of the trace vectors required by the EGBG search trace makes it costly to implement for that planner.  As such, ‘memory-efficient EGBG’ (me-EGBG) does not use EBL-based reordering and we defer further discussion until PEGG is introduced in Section 5.

4.1  Impact of Enhancements on EGBG Memory Demands

There are two major modes in which the first six techniques impact memory demand for me-EGBG: 1) Reduction in the size of the pilot explanation (search trace), either in the number of search segments (states), or the average trace content within the segments, and 2) Reduction in the requirements of structures that compete with the pilot explanation for available memory (i.e. the planning graph and the memo caches).  Admittedly, these two dimensions are not independent, since the number of memos (though not the size) is linear in the number of search segments.  We will nonetheless consider this partition in our discussion to facilitate the comparison of each method’s impact on the search trace.

In general, the impact of each of these enhancements on the search process depends significantly, not only on the particular problem, but also on the presence (or absence) of any of the other methods.  No single configuration of techniques proves to be optimal across a wide range of problems.  Indeed, due to the computational overhead associated with these methods, it is generally possible to find a class of problems for which planner performance degrades due to the presence of a given method. We therefore chose this set of techniques based on their joint average impact on the me-EGBG / PEGG memory footprint over an extensive variety of problems.

[Figure 5.  Axis label: % reduction in PE (search trace) memory requirement (ticks at 10 through 100).]

Figure 5 illustrates, for each method, the impact on memory reduction relative to the two dimensions above when the method operates in isolation from the others. The plot reflects results based on twelve problems in three domains (logistics, blocksworld, and tower-of-hanoi), chosen to include a mix of problems entailing large planning graphs, problems requiring extensive search, and problems requiring both.  The horizontal axis plots percent reduction in the end-of-run memory footprint of the combined memo caches and the planning graph.  The ratios along this axis are assessed based on runs with Graphplan (no search trace employed) where the memo cache and planning graph are the only globally defined structures of significant size that remain in the Lisp interpreted environment at run completion.[7]  Similarly, the vertical axis plots percent reduction in the space required for the PE at the end of EGBG runs with and without each method activated, and with the planning graph and memo cache structures purged from working memory.

The plot crossbars for each method depict the spread of reduction values seen across the twelve problems along both dimensions, with the intersection being the average.  The bi-partite planning graph, not surprisingly, impacts only the graph aspect, but five of the six methods are seen to have an impact on both search trace size and graph/memo cache size.  Of these, DDB has the greatest influence on PE size but little impact on the graph or memo cache size, while EBL has a more modest influence on the former and a larger impact on the latter (due both to the smaller memos that it creates and the production of more ‘general’ memos, which engender more backtracks).  Domain preprocessing/ invariant analysis can have a major impact on both the graph size and the PE size due to processes such as the extraction of invariants from operator preconditions.  It is highly domain dependent, having little effect in the case of blocksworld problems, but can be of great consequence in tower-of-Hanoi and some logistics problems.

That these six methods combined can complement each other is evidenced by the crossbars plotting space reduction when all six are employed at once.  Over the twelve problems average reduction in PE size approaches 90% and average reduction in the planning graph/memo cache aspect exceeds 80%.  No single method in isolation averages more than a 55% reduction along these dimensions.

The runtime reduction associated with each of these methods in isolation is also highly dependent on the problem and on which of the other methods are active.  In general, the relative time reduction for any two methods does not correlate closely with their relative memory reduction.  However, as with memory demands, we found that the techniques broadly complement each other such that a net speedup accrues.

All of the techniques listed above can be (and have been) used to improve Graphplan’s performance also, in terms of speed.  In order to focus on the impact of planning with the search trace, we use a version of Graphplan that has been enhanced by these six methods for all comparisons to me-EGBG and PEGG in this study (We hereafter refer to this enhanced version of Graphplan as GP-e). 

4.2 Experimental Results with me-EGBG

Table 2 illustrates the impact of the six augmentations discussed in the previous section on EGBG’s (and Graphplan’s) performance, in terms of both space and runtime.  Standard Graphplan, GP-e, EGBG, and me-EGBG are compared across 37 benchmark problems in a wide range of domains, including problems from the first three AIPS planning competitions held to date.  The problems were selected to satisfy three objectives: a subset that both standard Graphplan and EGBG could solve for comparison to me-EGBG, different subsets that exceed the memory limitations of each of the three planners in terms of either planning graph or PE size, and a subset that gives a rough impression of search time limitations.  

Not surprisingly, the memory efficient EGBG clearly outperforms the early version on all problems attempted.  More importantly, me-EGBG is able to solve a variety of problems beyond the reach of both standard Graphplan and EGBG.  Of the 37 problems, standard Graphplan solves 12, the original EGBG solves 14, GP-e solves 32, and me-EGBG solves 32.  Wherever me-EGBG and GP-e solve the same problem, me-EGBG is faster by up to a factor of 62x, and averages ~4x speedup.  Standard Graphplan (on the twelve problems it can solve), is bested by me-EGBG by factors ranging from 3x to over 1000x.

  The striking improvement of the memory efficient version of EGBG over the first version is not simply due to the speedup associated with the six techniques discussed in the previous section, but is directly tied to their impact on search trace memory requirements.  Table 2 indicates one of three reasons for each instance where a problem is not solved by a planner: 1) s: the planner is still in search after 30 cpu minutes, 2) pg: memory is exhausted, or 30 minutes is exceeded, during the planning graph building phase, 3) pe: memory is exhausted during search due to pilot explanation extension.  The third reason clearly favors me-EGBG, as the size of the PE (reported in terms of search segments at the time the problem is solved) indicates that it generates and retains in its trace up to 100x fewer states than EGBG.  This translates into a much broader reach for me-EGBG; it exhausts memory on 14% of the Table 2 problems compared to 49% for the first version of EGBG.  Regardless, GP-e solves three problems on which me-EGBG fails in 30 minutes due to search trace memory demands.
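The memory-exhaustion percentages quoted above can be reproduced with a short tally. The counts below are our own reading of the ‘pe’ and ‘pg’ entries in the EGBG and me-EGBG columns of Table 2, and treating every ‘pg’ entry as a memory failure is an assumption on our part, since that code also covers exceeding the 30 minute limit during graph building.

```python
TOTAL_PROBLEMS = 37

# Failure counts tallied from Table 2 (assumed reading; see the note above).
failures = {
    "EGBG":    {"pe": 14, "pg": 4},
    "me-EGBG": {"pe": 4,  "pg": 1},
}

shares = {
    planner: round(100 * sum(modes.values()) / TOTAL_PROBLEMS)
    for planner, modes in failures.items()
}
print(shares)  # {'EGBG': 49, 'me-EGBG': 14}
```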

The table also illustrates the dramatic impact of the speedup techniques on Graphplan itself.  The enhanced version, GP-e, is well over 10x faster than the original version on problems they can both solve in 30 minutes, and it can solve many problems entirely beyond standard Graphplan’s reach.  Nonetheless, me-EGBG modestly outperforms GP-e on the majority of problems that they both can solve.  Since the EGBG (and PEGG) planners derive their strength from using the PE to shortcut Graphplan’s episodic search process, their advantage is realized only in problems with multiple search episodes and a high fraction of runtime devoted to search.  Thus, no speedup is seen for grid-y-1 and all problems in the ‘mystery’, ‘movie’, and ‘mprime’ domains where a solution can be extracted as soon as the planning graph reaches a level containing the problem goals in a non-mutex state.

The bottom-up order in which EGBG visits PE search segments turns out to be surprisingly effective for many problems.  For Table 2 problems we found that in the great majority the PE for the final episode contains a seed segment (a state from which search will reach the initial state) within the deepest two or three PE levels.  This supports the intuition discussed in Section 3.2 and suggests that the advantage of a low h-value bias as observed for heuristic state-space planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000) translates to search on the planning graph.

Results for even the memory efficient version of EGBG reveal two primary weaknesses:

1.       The action assignment trace vectors that allow EGBG to avoid redundant search are somewhat costly to generate, make significant demands on available memory for problems that elicit large search (e.g. Table 2 problems: log-y-4, 8puzzle-1, freecell-2-1), and are difficult to revise when search experience alters drastically in subsequent visits.  

2.       Despite its surprising effectiveness in many problems, the bottom up visitation of PE search segments is inefficient in others.  For Table 2 problems such as freecell-2-1 and essentially all ‘schedule’ domain problems, when the planning graph gets extended to the level from which a solution can be extracted, that solution arises via a new search branch generated from the root search segment (i.e. the problem goal state).  Thus, the only seed segment in the PE is the topmost search segment, and bottom-up visitation of the PE states is more costly than Graphplan’s top-down approach. 

  The first shortcoming is particularly manifest in problems that do not allow EGBG to exploit the PE (e.g. problems in which a solution can be extracted in the first search episode).  The hit that EGBG takes on such problems relative to Graphplan is closely tied to the overhead associated with building its search trace. A compelling tactic to address the second shortcoming is to traverse the search space implicit in the PE according to state space heuristics.  We might wish, for example, to exploit any of the variety of state-space heuristics that have revolutionized state space planners in recent years (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Gerevini & Serina, 2002).  However, as we noted in Section 3.2, when we depart from a policy of visiting EGBG search segments in level-by-level, bottom-up order, we face more costly bookkeeping and high memory management overhead.  More informed traversal of the state-space view of Graphplan’s search space is taken up next, where we argue that it is perhaps the key benefit afforded by a trace of search on the planning graph.

 

 

Problem                   Graphplan cpu sec   EGBG                  me-EGBG               Speedup
(steps/actions)           Stnd.     GP-e      cpu sec   size of PE  cpu sec   size of PE  (me-EGBG vs. GP-e)

bw-large-B (18/18)        126       11.4      79        7919        9.2       2090        1.2x
huge-fct (18/18)          165       13.0      98        8410        9.1       2964        1.4x
rocket-ext-a (7/34)       s         3.5       40.3      1020        1.8       174         1.9x
att-log-a (11/79)         s         12.2      pe        -           7.2       1115        1.7x
gripper-8 (15/23)         125       14.2      88        9790        7.9       2313        1.8x
Tower-6 (63/63)           s         43.1      39.1      3303        7.6       80          5.7x
Tower-7 (127/127)         s         158       s         -           20.0      166         7.9x
8puzzle-1 (31/31)         667       57.1      pe        -           pe        >16000      (pe)
8puzzle-2 (30/30)         304       48.3      pe        -           26.9      10392       1.8x
TSP-12 (12/12)            s         454       pe        -           97.0      7155        4.7x

AIPS 1998
grid-y-1 (14/14)          388       16.7      393       19          16.9      15          1x
grid-y-2 (??/??)          pg        pg        pg        -           pg        -           ~
gripper-x-3 (15/23)       291       16.1      200       9888        8.4       2299        1.9x
gripper-x-4 (19/29)       s         190       pe        -           65.7      6351        2.9x
gripper-x-5 (23/35)       s         s         pe        -           433       13572       > 5x
log-y-4 (11/56)           pg        470       pg        -           pe        >25000      (pe)
mprime-x-29 (4/6)         15.7      5.5       6.6       4           5.5       4           1x
movie-x-30 (2/7)          .1        .05       .06       2           .05       2           1x
mysty-x-30 (6/14)         83        13.5      85        32          13.5      19          1x

AIPS 2000
blocks-10-1 (32/32)       s         101       pe        -           16.1      6788        6.3x
blocks-12-0 (34/34)       s         24.2      pe        -           14.5      3220        1.7x
logistics-10-0 (15/56)    s         30.0      s         -           16.3      1259        1.8x
logistics-11-0 (13/56)    s         78.6      pe        -           10.0      1117        7.9x
logistics-12-1 (15/77)    s         s         pe        -           1205      7101        > 2x
freecell-2-1 (6/10)       s         98.0      pe        -           pe        >12000      (pe)
schedule-8-5 (4/14)       pg        63.5      pg        -           42.9      6           1.5x
schedule-9-2 (5/13)       pg        58.1      pg        -           46.8      6           1.2x

AIPS 2002
depot-6512 (10/26)        239       5.1       219       4272        4.1       456         1.25x
depot-7654a (10/28)       s         32.5      s         -           14.8      1199        2.2x
driverlog-2-3-6a (10/24)  1280      2.8       807       1569        1.0       230         2.8x
driverlog-2-3-6e (12/28)  s         169       s         -           83.3      7691        2x
roverprob1425 (10/32)     s         18.9      979       10028       10.3      1522        1.8x
roverprob1423 (9/30)      s         170       pe        -           94.7      10217       1.8x
strips-sat-x-5 (7/22)     313       47.0      272       4111        23.0      2717        2.0x
strips-sat-x-9 (6/35)     s         s         s         -           84.4      306         >21x
ztravel-3-8a (7/25)       s         972       pe        -           15.6      1353        62x
ztravel-3-7a (10/21)      s         s         pe        -           pe        >20000      ~

Table 2: Search for step-optimal plans: EGBG, me-EGBG, standard & enhanced Graphplan
   Standard Graphplan: Lisp version by Smith and Peot
   GP-e: Graphplan enhanced per Section 4.1      me-EGBG:  memory efficient EGBG
   “Size of PE” is the final search trace size in terms of the number of "search segments"
   Search failure modes: ‘pg’  Exceeded 30 mins. or memory constraints during graph building
                         ‘pe’  Exceeded memory limit during search due to size of PE
                         ‘s’   Exceeded 30 mins. during search
   Parentheses adjacent to the problem name give (# of steps /  # of actions) in the solution.

5.   Focusing on the State Space View:  The so-PEGG and PEGG Planners

The costs associated with EGBG’s generation and use of its search trace are directly attributable to the storage, updating, and replay of the CSP value assignments for a search segment’s subgoals.  We therefore investigated a stripped down version of the search trace that abandons this tactic and focuses instead on the embodied state space information.  We will show that the PEGG planners employing this search trace (both so-PEGG, the step-optimal version and PEGG, a version using beam search), outperform the EGBG planners on larger problems.  The key difference between EGBG’s pilot explanation and the pared down, ‘skeletal’ PE used by the PEGG planners, is the elimination of the detailed mutex-checking information contained in the bit vectors of the former (i.e. the last item in the bullet list of EGBG search segment contents in Section 3.1).  The PEGG planners then apply state-space heuristics to rank the PE search segments based on their associated subgoal sets (states) and are free to visit this ‘state space’ in a more informed manner.  The tradeoff is that for each PE state so visited the planner must regenerate the CSP effort of finding consistent action assignments for the subgoals. 

Figure 6 illustrates the PEGG advantage in a small hypothetical search trace at the final search episode.  Here search segments in the PE at the onset of the episode appear in solid lines and all plan segments (states extendable to a valid plan) are shown as double-lined boxes.  The figure reflects the fact that typically there may be many such latent plan segments in diverse branches of the search trace at the solution-bearing episode.  Clearly a planner that can discriminate plan segment states from other states in the PE could solve the problem more quickly than a planner restricted to a bottom-up traversal (deepest PE level first).    State space heuristics endow the PEGG planners with this capability.


Figure 6:  The PE for the final search episode of a hypothetical problem.  Search segments in the PE at onset of search appear in solid lines, double-lined boxes represent plan segments, and dashed-line boxes are states newly generated in regression search during the episode.  Visitation order as dictated by the secondary heuristic is shown via numbering.

The so-PEGG planner visits every search segment in the PE during each search episode (comparable to Graphplan’s exhaustive search on a given length graph) thereby guaranteeing that returned plans are step-optimal.  As such, any advantage of heuristic-guided traversal is realized only in the final episode.   For many problems, the computational effort expended by Graphplan in the last search episode greatly exceeds that of all previous episodes combined, so this can still be a powerful advantage.  However, as we scale up to problems that are larger in terms of the number and size of search episodes, the cost of exhaustive search in even the intermediate episodes becomes prohibitive.  The planner we refer to simply as PEGG employs beam search, applying the search trace heuristics in all intermediate search episodes to visit only a select subset of the PE segments.  In so doing PEGG trades off the step-optimality guarantee for often greatly reduced solution times. 

 There are several challenges that must be dealt with to effectively use the pared down search trace employed by so-PEGG and PEGG, including adaptation and augmentation of distance-based heuristics to guide search trace traversal and dealing with memory management problems induced by the tactic of ‘skipping about the search space’.  Before we describe how we addressed such issues and give a more complete description of the algorithm, we first present some results that provide perspective on the effectiveness of these planners.

5.1  Experimental Results With so-PEGG and PEGG

Table 3 compares Graphplan (standard and GP-e), me-EGBG, so-PEGG, and PEGG over most of the same problems as Table 2, and adds a variety of larger problems that only the latter two systems can handle.  Table 2 problems that were easily solved for GP-e and me-EGBG (e.g. those in the AIPS-98 ‘movie’ and ‘mystery’ domains) are omitted from Table 3. Here, all planners that employ variable and value ordering (i.e. all except standard Graphplan), are configured to use value ordering based on the planning graph level at which an action first appears and goal ordering based on proposition distance as determined by the ‘adjusted-sum’ heuristic (which will be defined below).  There are a variety of other parameters for the so-PEGG and PEGG planners for which optimal configurations tend to be problem-dependent.  We defer discussion of these to Sections 5.3, 5.4, and 5.6 but note here that for the Table 3 results the following parameter settings were used based on good performance on average across a variety of domains and problems:

·    Secondary heuristic for visiting states: adjusted-sum with w0=1 (eqn 5-1)   

·    Beam search:  visit the best 20% (lowest f-value) search segments per search episode, with a minimum of 25 and a maximum of 50. Search segments with ‘flux’ lower than 10% of average are not visited regardless of heuristic rank. (wcf = .01, see section 5.6.1)
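The beam setting above can be sketched as follows.  This is a minimal illustration under stated assumptions, not the PEGG implementation: the `Segment` class, the `select_beam` function, and the treatment of ‘flux’ as a single number per segment are hypothetical, and we assume the 20% is taken over all segments currently in the PE and that the low-flux filter is applied before the beam is cut.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a PE search segment."""
    name: str
    f_value: float  # heuristic value; lower is better
    flux: float     # assumed scalar 'flux' measure (see Section 5.6.1)


def select_beam(segments, fraction=0.20, min_visits=25, max_visits=50,
                flux_cutoff=0.10):
    """Return the segments to visit in one search episode."""
    if not segments:
        return []
    avg_flux = sum(s.flux for s in segments) / len(segments)
    # Segments whose flux is below 10% of the average are never visited,
    # regardless of heuristic rank.
    eligible = [s for s in segments if s.flux >= flux_cutoff * avg_flux]
    # Beam width: 20% of the PE, clamped to the range [25, 50].
    width = max(min_visits,
                min(max_visits, math.ceil(fraction * len(segments))))
    # Visit the best (lowest f-value) eligible segments.
    return sorted(eligible, key=lambda s: s.f_value)[:width]
```

Under these assumptions a PE of 400 segments yields a beam of 50 (the maximum), while a PE of 40 segments yields a beam of 25 (the minimum), so small traces are still searched almost exhaustively.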

 

 

Problem           Graphplan                       me-EGBG               so-PEGG               PEGG                  Speedup
                  Stnd.     GP-e (enhanced)                             heur: adjsum          heur: adjsum-u        (PEGG vs. GP-e)
                  cpu sec   cpu sec (steps/acts)  cpu sec (steps/acts)  cpu sec (steps/acts)  cpu sec (steps/acts)

bw-large-B        194.8     11.4 (18/18)          9.2                   7.0                   4.1 (18/18)           2.8x
bw-large-C        s         s (28/28)             pe                    1104                  24.2 (28/28)          > 74x
bw-large-D        s         s (38/38)             pe                    pe                    388 (38/38)           > 4.6x
att-log-a         s         31.8 (11/79)          7.2                   2.9 (11/72)           2.2 (11/62)           14.5x
att-log-b         s         s                     pe                    s                     21.6 (13/64)          > 83x
Gripper-8         s         14.2 (15/23)          7.9                   30.6                  5.5 (15/23)           2.6x
Gripper-15        s         s                     pe                    s                     46.7 (31/45)          > 38.5x
Tower-7           s         158 (127/127)         20.0                  14.3                  6.1 (127/127)         26x
Tower-9           s         s (511/511)           232                   118                   23.6 (511/511)        > 76x
8puzzle-1         2444      57.1 (31/31)          pe                    31.1                  9.2 (31/31)           6.2x
8puzzle-2         1546      48.3 (30/30)          26.9                  31.3                  7.0 (32/32)           6.9x
TSP-12            s         454 (12/12)           97.0                  390                   6.9 (12/12)           51x

AIPS 1998
grid-y-1          388       16.7 (14/14)          17.9                  16.8                  16.8 (14/14)          1x
gripper-x-5       s         s                     433                   512                   110 (23/35)           > 16x
gripper-x-8       s         s                     pe                    s                     520 (35/53)           > 3.5x
log-y-5           pg        470 (16/41)           pe                    361                   30.5 (16/34)          15.4x

AIPS 2000
blocks-10-1       s         95.4 (32/32)          16.1                  18.7                  6.9 (32/32)           13.8x
blocks-12-0       ~         26.6 (34/34)          14.5                  23.0                  9.4 (34/34)           2.8x
blocks-16-2       s         s                     pe                    s                     28.1 (56/56)          > 64x
logistics-10-0    ~         30.0 (15/56)          16.6                  21                    7.3 (15/53)           4.1x
logistics-12-1    s         s                     1205 (15/77)          1101 (15/75)          17.4 (15/75)          > 103x
logistics-14-0    s         s                     pe                    s                     678 (13/74)           > 2.7x
freecell-2-1      pg        98.0 (6/10)           pe                    102                   19.5 (6/10)           > 92x
freecell-3-5      pg        1885 (7/16)           pe                    511                   101 (7/17)            18.7x
schedule-8-9      pg        300 (5/12)            615                   719                   719 (5/12)            (.42x)

AIPS 2002
depot-7654a       s         32.5 (10/28)          14.8                  12.9                  13.2 (10/26)          2.7x
depot-4321        s         s                     s                     s                     42.6 (14/37)          > 42x
depot-1212        s         s                     s                     s                     79.1 (22/53)          > 22.8x
driverlog-2-3-6e  s         166 (12/28)           83.3                  109                   80.6 (12/26)          2.1x
driverlog-3-3-6b  s         s                     pe                    1437 (11/39)          169 (14/45)           > 10.7x
roverprob1423     s         170 (9/30)            pe                    63.4                  15.0 (9/26)           11.3x
roverprob4135     s         s                     pe                    s                     379 (12/43)           > 4.7x
roverprob8271     s         s                     pe                    s                     220 (11/39)           > 8.2x
sat-x-5           313       45 (7/22)             43.0                  27.0                  25.1 (7/22)           1.7x
sat-x-9           s         s                     918                   9.9                   9.9 (6/35)            > 182x
ztravel-3-8a      s         972 (7/25)            15.6                  11.2                  15.1 (9/26)           119x
ztravel-3-7a      s         s                     pe                    s                     101 (10/23)           > 18x

Table 3:  so-PEGG and PEGG comparison to Graphplan, GP-e, and me-EGBG
          GP-e:     Graphplan enhanced per Section 4.1
          me-EGBG:  memory efficient EGBG
          so-PEGG:  step-optimal, search via the PE, segments ordered by adjusted-sum-u heuristic
          PEGG:     beam search, best 20% of segments in PE ordered by adjusted-sum-u heuristic
          Parentheses give (# of steps / # of actions) in plan.  Boldface values exceed a known step-optimal.
          See Table 2 for definitions of s, pg, and pe.

Focusing first on the GP-e, me-EGBG, and so-PEGG columns, we clearly see the impact of the tradeoff between storing and exploiting all the intra-segment action assignment information in the PE.  In this set of 37 problems, 16 result in me-EGBG exceeding available memory due to the size of the PE while only one pushes that limit for so-PEGG.  Seven of the problems that cause me-EGBG to run out of memory are actually solved by so-PEGG while the remainder exceed the time limit during search.  In addition, so-PEGG handles five problems in the table that GP-e fails on.  These problems typically entail extensive search in the final episode, where the PE efficiently shortcuts the full-graph search conducted by GP-e.  The speedup of so-PEGG relative to GP-e ranges from a modest slowdown on three problems to almost 87x on the Zeno-Travel problems, with an average of about 5x.  (Note that the speedup values reported in the table are not for so-PEGG.)

Generally, any planner using a search trace will underperform GP-e on single search episode problems such as grid-y-1, in which the cost of building the trace is not recovered.  The low overhead associated with building so-PEGG’s search trace means it suffers little relative to GP-e in this case.  On most problems that both me-EGBG and so-PEGG can solve, me-EGBG has the upper hand due to its ability to avoid redundant consistency-checking effort.  The fact that me-EGBG’s advantage over so-PEGG is not greater for such problems is attributable both to so-PEGG’s ability to move about the PE search space in the final search episode (versus me-EGBG’s bottom-up traversal) and to its lower overhead due to its more concise search trace.  Note that there is no obvious reason to prefer one state traversal order over the other in non solution-bearing episodes since these step-optimal planners visit all the states in their PE for these search episodes. [8]

Now turning attention to the PEGG results, it’s apparent that the beam search greatly extends the size of problems that can be handled.  PEGG solves ten larger problems of Table 3 that could not be solved by either so-PEGG or enhanced Graphplan.  Speed-wise, PEGG handily outperforms the other planners on every problem except schedule-8-9, where GP-e has a factor of 2.3x advantage.  As indicated by the table’s right-hand column, the speedup of PEGG over GP-e ranges from .42x to over 182x.  This is a conservative bound on PEGG’s maximum advantage relative to GP-e, since for the seventeen problems that GP-e fails to solve the GP-e runtime was taken to be the time limit of 1800 seconds.
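The way a speedup entry is assessed when GP-e fails can be illustrated with a short sketch.  The function name and the use of `None` to mark a GP-e failure are our own conventions for this example; only the 1800 second time limit and the cpu times come from the text and Table 3.

```python
TIME_LIMIT_SEC = 1800.0  # time limit stated in the text


def speedup_vs_gpe(gpe_sec, pegg_sec, time_limit=TIME_LIMIT_SEC):
    """Return (ratio, is_lower_bound) for PEGG relative to GP-e.

    gpe_sec is None when GP-e failed to solve the problem; the ratio is
    then computed from the time limit and is only a lower bound.
    """
    if gpe_sec is None:
        return time_limit / pegg_sec, True
    return gpe_sec / pegg_sec, False


# bw-large-B: both planners solve it (11.4 s vs. 4.1 s).
ratio_b, bound_b = speedup_vs_gpe(11.4, 4.1)
# bw-large-C: GP-e fails, PEGG needs 24.2 s.
ratio_c, bound_c = speedup_vs_gpe(None, 24.2)
```

For bw-large-B the ratio is about 2.8, matching the table entry, and for bw-large-C it is a lower bound of about 74, which is why the table reports "> 74x" for that problem.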

We defer further analysis of these results to Section 6 in order to first describe the PEGG algorithm and the advantages it extracts from its search trace.

5.2  The Algorithm for the PEGG Planners

The high-level algorithm for so-PEGG and PEGG is given in Figure 7.  As for Graphplan, search begins once the planning graph has been extended to the level where all problem goals first appear with no binary mutex conditions.  (The routine find_1st_level_with_goals is virtually the same as Graphplan’s and is not defined here.)  The first search episode is then conducted in Graphplan fashion, except that the assign_goals and assign_next_level_goals routines of Figure 8 initialize the PE as they create search segments that hold all states generated during the regression search process.  The assign_goals pseudo-code outlines the process of compiling “conflict sets” (see Appendix B) as a means of implementing DDB and EBL during the action assignment search.  The assign_next_level_goals routine illustrates the role of the top-level conflict set for recording a minimal no-good when search on a state is completed (EBL) and depicts how variable ordering need be done only once for a state (when the search segment is created).  A child segment is created and linked to its parent (extending the PE) in assign_next_level_goals whenever all parent goals are successfully assigned.  The assign_next_level_goals routine determines the subgoals for the child search segment by regressing the parent’s goals over the actions assigned and then checks to see if either the initial state has been reached or there are no remaining goals.  If so, success is signaled by returning the child search segment, which can then be used to extract the ordered actions in the plan.
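The regression step that produces a child segment’s subgoals can be sketched as below.  This is a simplified illustration rather than the routine of Figure 8: the `Action` type, `noop`, `regress_goals`, and `plan_found` are hypothetical names, delete effects and mutex checks are omitted, and the success test is our reading of “the initial state has been reached or there are no remaining goals”.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action; delete effects are omitted."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]


def noop(prop: str) -> Action:
    """Persistence action that carries a proposition back one level."""
    return Action(f"noop-{prop}", frozenset([prop]), frozenset([prop]))


def regress_goals(parent_goals: Iterable[str],
                  assigned: Iterable[Action]) -> FrozenSet[str]:
    """Subgoals of the child segment: preconditions of the assigned actions."""
    assigned = list(assigned)
    supported = set()
    for action in assigned:
        supported |= action.add_effects
    missing = set(parent_goals) - supported
    if missing:
        # Every parent goal must be assigned a supporting action first.
        raise ValueError(f"goals without a supporting action: {sorted(missing)}")
    subgoals = set()
    for action in assigned:
        subgoals |= action.preconditions
    return frozenset(subgoals)


def plan_found(subgoals: FrozenSet[str],
               initial_state: FrozenSet[str]) -> bool:
    """Assumed success test: no goals remain, or all hold initially."""
    return not subgoals or subgoals <= initial_state
```

For example, if goal W is assigned an action with precondition P and goal X is carried by a no-op, the child segment’s subgoals are {P, X}.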

Subsequent to the first episode, PEGG_plan enters an outer loop that employs the PE to conduct successive search episodes.  For each episode, the newly generated search segments from the previous episode are evaluated according to a state space heuristic, ranked, and merged into the already ordered PE.  In an inner loop each search segment is visited in turn by passing its subgoals to the Graphplan-like assign_goals routine. 

It is the exit conditions on the inner loop that primarily differentiate so-PEGG and PEGG.   Whereas so-PEGG will visit every search segment whose goals are not found to match a memo, PEGG restricts visitation to a best subset, based on a user-specified criterion.  As such, expansion of the planning graph can be deferred until a segment is chosen for visitation that transposes to a planning graph level exceeding the current graph length.  As a consequence, in some problems the PEGG planners may be able to extract a step-optimal solution while building one less level than other Graphplan-based planners.[9]
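The episode loop described in the last two paragraphs can be sketched as follows.  This is a skeleton under stated assumptions and not the PEGG_plan routine of Figure 7: `run_episodes`, `rank`, and `visit` are hypothetical names, the beam is reduced to a fixed width, and memo checks and planning graph extension are left out.

```python
import heapq


def run_episodes(first_episode_segments, rank, visit, max_episodes,
                 beam_width=None):
    """Episode loop over an ordered search trace (PE).

    rank(segment) -> number, lower is better (stand-in for the heuristic).
    visit(segment, episode) -> (plan_or_None, newly_created_segments).
    beam_width=None visits every segment (so-PEGG-like); an integer
    visits only the best-ranked segments (PEGG-like).
    """
    pe = []                                     # ordered search trace
    new_segments = list(first_episode_segments)
    for episode in range(2, max_episodes + 1):
        # Rank the segments created in the previous episode and merge
        # them into the already ordered PE.
        pe = list(heapq.merge(pe, sorted(new_segments, key=rank), key=rank))
        new_segments = []
        to_visit = pe if beam_width is None else pe[:beam_width]
        for segment in to_visit:
            plan, children = visit(segment, episode)
            if plan is not None:
                return plan, episode
            new_segments.extend(children)
    return None, max_episodes
```

The only difference between the two planners in this sketch is the slice taken from the ordered PE, which mirrors the observation that the inner loop’s exit condition is what primarily separates so-PEGG from PEGG.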




Figure 8:   PEGG / so-PEGG regression search algorithm for Graphplan-style regression search on subgoals while concurrently building the search trace (PE)

Note that PEGG’s algorithm combines both state-space and CSP-based aspects in its search:

·       It chooses for expansion the most promising state based on the previous search iteration and state space heuristics.  PEGG and so-PEGG are free to traverse the states in their search trace in any order.

·       A selected state is expanded in Graphplan’s CSP-style, depth-first fashion, making full use of all CSP speedup techniques outlined above.

The first aspect most clearly distinguishes PEGG from EGBG:  traversal of the state space in the PE is no longer constrained to be bottom-up and level-by-level.  As it was for EGBG, management of memory associated with the search trace is a challenge for PEGG once we stray from bottom-up traversal, but it is less daunting.  It will be easier to outline how we address this if we first discuss the development and adaptation of heuristics to search trace traversal.

5.3  Informed Traversal of the Search Trace Space

The HSP and HSP-R state space planners (Bonet & Geffner, 1999) introduced the idea of using the ‘reachability’ of propositions and sets of propositions (states) to assess the degree of difficulty of a relaxed version of a problem.  This concept underlies their powerful ‘distance based’ heuristics for selecting the most promising state to visit.  Subsequent work demonstrated how the planning graph can function as a rich source of such heuristics (Nguyen & Kambhampati, 2000).  Since the planning graph is already available to PEGG, we adapt and extend heuristics from the latter work to serve in a secondary heuristic role to direct PEGG’s traversal of its search trace states.  Again, the primary heuristic is the planning graph length that is iteratively deepened (Section 2.2), so the step-optimality guarantee for the so-PEGG planner does not depend on the admissibility of this secondary heuristic.

There are important differences between heuristic ranking of states generated by a state space planner and ordering of the search segments (states) in PEGG’s search trace.  For example, a state space planner chooses to visit a given state only once while the PEGG planners often must consider whether to revisit a state in many consecutive search episodes.  Ideally, a heuristic to rank states in the search trace should reflect level-by-level evolutions of the planning graph, since the transposition process associates a search segment with a higher level in each successive episode.  For each higher planning graph level that a given state is associated with, the effective regression search space ‘below’ it changes as a complex function of the number of new actions that appear in the graph, the number of dynamic mutexes that relax, and the no-goods in the memo caches.  Moreover, unlike a state space planner’s queue of previously unvisited states, the states in a search trace include all children of each state generated when it was last visited.  Ideally the value of visiting a state should be assessed independently of the value associated with any of its children, since they will be assessed in turn.  Referring back to the search trace depicted in Figure 6, we desire a heuristic that can, for example, discriminate between the #4 ranked search segment and its ancestor, top goal segment (WXYZ).  Here we would like the heuristic assessment of segment WXYZ to discount the value associated with its children already present in the trace, so that it is ranked based only on its potential for generating new local search branches.

We next discuss adaptation of known planning graph based heuristics for the most effective use with the search trace.

5.3.1 Adoption of distance-based state space heuristics

The heuristic value for a state, S, generated in backward