Journal of
Artificial Intelligence Research 23 (2005)
533585 Submitted 4/04; published 5/05
©
2005 AI Access Foundation. All rights reserved.
Using
Memory to Transform Search on the Planning Graph
Terry
Zimmerman WIZIM@CS.CMU.EDU
Robotics Institute,
Carnegie Mellon University
Pittsburgh, PA
152133890
Subbarao
Kambhampati RAO@ASU.EDU
Department of Computer Science &
Engineering
Arizona State University,
Abstract
The Graphplan algorithm
for generating optimal makespan plans containing parallel sets of actions
remains one of the most effective ways to generate such plans. However, despite enhancements on a range of
fronts, the approach is currently dominated in terms of speed, by state space
planners that employ distancebased heuristics to quickly generate serial
plans. We report on a family of
strategies that employ available memory to construct a search trace so as to
learn from various aspects of Graphplan’s iterative search episodes in order to
expedite search in subsequent episodes.
The planning approaches can be partitioned into two classes according to
the type and extent of search experience captured in the trace. The
planners using the more aggressive tracing method are able to avoid much
of Graphplan’s redundant search effort, while planners in the second class
trade off this aspect in favor of a much higher degree of freedom than
Graphplan in traversing the space of ‘states’ generated during regression
search on the planning graph. The tactic
favored by the second approach, exploiting the search trace to transform the
depthfirst, IDA* nature of Graphplan’s search into an iterative state space
view, is shown to be the more powerful.
We demonstrate that distancebased, state space heuristics can be
adapted to informed traversal of the search trace used by the second class of
planners and develop an augmentation targeted specifically at planning graph
search. Guided by such a heuristic, the
stepoptimal version of the planner
in this class clearly dominates even a highly enhanced version of Graphplan. By adopting beam search on the search trace
we then show that virtually optimal parallel plans can be generated at speeds
quite competitive with a modern heuristic state space planner.
1. Introduction
When Graphplan was introduced in 1995 (Blum & Furst, 1995) it became one of the fastest programs for solving the benchmark planning problems of that time and, by most accounts, constituted a radically different approach to automated planning. Despite the recent dominance of heuristic statesearch planners over Graphplanstyle planners, the Graphplan approach is still one of the most effective ways to generate the socalled “optimal parallel plans”. Statespace planners are drowned by the exponential branching factors of the search space of parallel plans (the exponential branching is a result of the fact that the planner needs to consider each subset of noninterfering actions). Over the 8 years since its introduction, the Graphplan system has been enhanced on numerous fronts, ranging from planning graph construction efficiencies that reduce both its size and build time by one or more orders of magnitude (Smith & Weld, 1998; Long & Fox, 1999), to search speedup techniques such as variable and value ordering, dependencydirected backtracking, and explanation based learning (Kambhampati, 2000). In spite of these advances, Graphplan has ceded the lead in planning speed to a variety of heuristicguided planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Gerevini & Serina, 2002). Notably, several of these exploit the planning graph for powerful statespace heuristics, while eschewing search on the graph itself. Nonetheless, the Graphplan approach remains perhaps the fastest in parallel planning mainly because of the way it combines an iterative deepening A* (“IDA*”, Korf, 1985) search style with a highly efficient CSPbased incremental generation of applicable action subsets.
We investigate here the use of available memory so as to surmount some of Graphplan’s major drawbacks, such as redundant search effort and the need to exhaustively search a klength planning graph before proceeding to the k+1 length graph. At the same time we wish to retain attractive features of Graphplan’s IDA* search such as rapid generation of parallel action steps and the ability to find step optimal plans. The approach we describe remains rooted in iterative search on the planning graph but greatly expedites this search by building and maintaining a concise search trace.
Graphplan alternates between two phases; one in which a data structure called a “planning graph” is incrementally extended, and a backward phase where the planning graph is searched to extract a valid plan. After the first regression search phase the space explored in any given episode is closely correlated with that conducted in the preceding episode. The strategy we pursue in this work is to employ an appropriately designed trace of the search conducted in episode n (which failed to find a solution) to identify and avoid those aspects of the search that are provably unchanged in episode n+1, and focus effort on features that may have evolved. We have identified precisely which features are dynamic across Graphplan search episodes and construct search traces that capture and exploit these features to different degrees. Depending on its design a search trace may provide benefits such as 1) avoidance of much of Graphplan’s redundant search effort, 2) learning from its iterative search experience so as to improve its heuristics and the constraints embodied in the planning graph, and 3) realizing a much higher degree of freedom than Graphplan, in traversing the space of ‘states’ generated during the regression search process. We will show that the third advantage is particularly key to search trace effectiveness, as it allows the planner to focus its attention on the most promising areas of the search space.
The issue of how much memory is the ‘right’ amount to use to boost an algorithm’s performance cuts across a range of computational approaches from search to the paging process in operating systems, and Internet browsing to database processing operations. In our investigation we explore several alternative search trace based methods that differ markedly in terms of memory demands. We describe four of these approaches in this paper. Figure 1 depicts the pedigree of this family of search tracebased planners, as well as the primary impetus leading to the evolution of each system from its predecessor. The figure also suggests the relative degree to which each planner steps away from the original IDA* search process underlying Graphplan. The two tracks correspond to two genres of search trace that we have developed;
· left track: The EGBG planners (Explanation Guided Backward search for Graphplan) employ a more comprehensive search trace focused on minimizing redundant search.
· right track: The PEGG planners (Pilot Explanation Guided Graphplan) use a more skeletal trace, incurring more of Graphplan’s redundant search effort in exchange for reduced memory demands and increased ability to exploit the state space view of the search space.
The EGBG planner (Zimmerman & Kambhampati, 1999) adopts a memory intensive structure for the search trace as it seeks primarily to minimize redundant consistencychecking across Graphplan’s search iterations. This proves to be effective in a range of smaller problems but memory constraints impede its ability to scale up. Noting that Graphplan’s search process can be viewed as a specialized form of CSP search (Kambhampati, 2000), we explore some middle ground in terms of memory usage by augmenting EGBG with several methods known to be effective as speedup techniques for CSP problems.
Our primary interest in these
techniques, however, is the impact on memory reduction and we describe how they
accomplish this above and beyond any search speedup benefit they afford. The implemented planner, meEGBG, markedly outperforms EGBG in
speed and capabilities, but a variety of problems still lie beyond the
planner’s reach due to memory constraints.
The search trace structure used by the PEGG track planners trades off minimization of redundant search in exchange for a much smaller memory footprint. In addition to its greatly reduced memory demands, the PEGG search trace structure can be exploited for its intrinsic state space view of what is essentially Graphplan’s CSPoriented search space. A significant speedup advantage of this approach over Graphplan and the EGBG track planners derives from its ability to employ the ‘distancebased’ heuristics that power many of the current generation of statespace planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Hoffman, 2001). We adapt these heuristics to the task of identifying the most promising states to visit in the search trace and implement the approach first in the soPEGG planner (‘stepoptimal PEGG’, Zimmerman & Kambhampati, 2003). SoPEGG outperforms even a highly enhanced version of Graphplan by up to two orders of magnitude in terms of speed, and does so while maintaining the guarantee of finding a stepoptimal plan.
Finally we explore adoption of a beam search approach in visiting the state space implicit in the PEGGstyle trace. Here we employ the distancebased heuristics extracted from the planning graph itself, not only to direct the order in which search trace states are visited, but also to prune and restrict that space to only the heuristically best set of states, according to a userspecified metric. We show that the planning graph can be further leveraged to provide a measure of the likelihood that a previously generated regression state might spawn new search branches at a higher planning graph level. We term this metric ‘flux’ and employ it in an effective filter for states that can be skipped over even though they might appear promising based on the distancebased heuristic. Implemented in the PEGG system (Zimmerman & Kambhampati, 2003), this approach to exploiting a search trace produces a twofold benefit over our previous approaches; 1) further reduction in search trace memory demands and 2) effective release from Graphplan’s exhaustive search of the planning graph in all search episodes. PEGG exhibits speedups ranging to more than 300x over the enhanced version of Graphplan and is quite competitive with a recent state space planner using similar heuristics. In adopting beam search PEGG necessarily sacrifices the guarantee of stepoptimality but empirical evidence indicates the secondary heuristics are remarkably effective in ensuring the makespan of solutions produced are virtually at the optimal.
The fact that these systems successfully employ a search trace at all is noteworthy. In general, the tactic of adopting a search trace for algorithms that explicitly generate nodestates during iterative search episodes, has been found to be infeasible due to memory demands that are exponential in the depth of the solution. In Sections 2 and 3 we describe how tight integration of the search trace with the planning graph permits the EGBG and PEGG planners to largely circumvent this issue. The planning graph structure itself can be costly to construct, in terms of both memory and time; there are wellknown problems and even domains that are problematic for planners that employ it. (PostGraphplan planners that employ the planning graph for some purpose include “STAN”, Long & Fox, 1999, “Blackbox”, Kautz & Selman, 1999, “IPP”, Koehler et al., 1997, “AltAlt”, Nguyen & Kambhampati, 2000, “LPG” Gerevini & Serina, 2002). The planning systems described here share that memory overhead of course, but interestingly, we have found that search trace memory demands for the PEGG class of planners have not significantly limited the range of problems they can solve.
The remainder of the paper is organized as follows: Section 2 provides a brief overview of the planning graph and Graphplan’s search process. The discussion of both its CSP nature and the manner in which the process can be viewed as IDA* search motivates the potential for employing available memory to accelerate solution extraction. Section 3 addresses the two primary challenges in attempting to build and use a search trace to advantage with Graphplan: 1) How can this be done within reasonable memory constraints given Graphplan’s CSPstyle search on the planning graph? and, 2) Once the trace is available, how can it most effectively be used? This section briefly describes EGBG (Zimmerman & Kambhampati, 1999), the first system to use such a search trace to guide Graphplan’s search, and outlines the limitations of that method (Details of the algorithm are contained in Appendix A.) Section 4 summarizes our investigations into a variety of memory reduction techniques and reports the impact of a combination of six of them on the performance of EGBG. The PEGG planners are discussed in Section 5 and the performance of soPEGG and PEGG (using beam search) are compared to an enhanced version of Graphplan, EGBG, and a modern, serial statespace planner. Section 6 contains a discussion of our findings and Section 7 compares this work to related research. Finally, Section 8 wraps up with our conclusions.
1. Background & Motivation: Planning Graphs and the Nature of Direct Graph Search
Here we outline the Graphplan algorithm and discuss traits suggesting that judicious use of additional memory might greatly improve its performance. We touch on three related views of Graphplan’s search; 1) as a form of CSP, 2) as IDA* search and, 3) its state space aspect.
2.1 Construction and Search on a Planning Graph
The Graphplan algorithm employs two interleaved
phases – a forward phase, where a data structure called a “planning graph” is
incrementally extended, and a backward phase where the planning graph
is searched to extract a valid plan. The
planning graph consists of two alternating structures, called proposition lists
and action lists. At the bottom of Figure
2 is depicted a simple domain we will refer to as the Alpha domain and use for illustration in this study. The figure shows four action and proposition
levels of the planning graph engendered by the simple initial state given the
domain. We start with the initial state
as the zeroth level proposition list.
Given a klevel planning graph, the extension of the graph structure to
level k+1 involves introducing all
actions whose preconditions are present in the k^{th} level proposition
list. In addition to the actions of the domain model, “no operation” actions
are introduced, one for each condition in the k^{th} level proposition
list (abbreviated as “nop” in this paper’s figures, but also termed “persists”
by others). A “nopC” action has C as
its precondition and C as its effect. Given the k^{th} level actions,
the proposition list at level k+1 is
constructed as just the union of the effects of all the introduced
actions. The planning graph maintains
the dependency links between the actions at level k+1, their
preconditions in the level k
proposition list, and their effects in the level k+1 proposition list.
During planning graph construction binary "mutex'' constraints are computed and propagated. In Figure 2, the arcs denote mutex relations between pairs of propositions and pairs of actions. The propagation starts at level 1 by labeling as mutex all pairs of actions that are statically interfering with each other (“static mutex”), that is their preconditions or effects are logically inconsistent. Mutexes are then propagated from this level forward using two simple propagation rules. Two propositions at level k are marked mutex if all actions at level k that support one proposition are mutex with all actions that support the second proposition. Two actions at level 2 are then mutex if they are statically interfering or if a precondition of the first action is mutually exclusive with a precondition of the second . (We term the latter “dynamic mutex”, since this constraint may relax at a higher planning graph level).[1] The propositions themselves can also be either static mutex (one negates the other) or dynamic mutex (all actions supporting one proposition are mutex with all actions supporting the other). To reduce Figure 2 clutter mutex arcs for propositions and their negations are omitted.
The search phase on a klevel planning graph involves checking to see if there is a subgraph of the planning graph that corresponds to a valid solution to the problem. Figure 3 depicts Graphplan search in a manner similar to the CSP variablevalue assignment process. Beginning with the propositions corresponding to the goals at level k, we incrementally select a set of actions from the level k action list that support all the goals, such that no two actions selected for supporting two different goals are mutually exclusive (if they are, we backtrack and try to change the selection of actions). This is essentially a CSP problem where the goal propositions at each level are the variables, actions that establish a proposition are the values, and the mutex conditions constitute constraints. The search proceeds in depthfirst fashion: Once all goals for a level are supported, we recursively call the same search process on the k1 level planning graph, with the preconditions of the actions selected at level k as the goals for the k1 level search. The search succeeds when we reach level 0 (the initial state) and the solution is extracted by unwinding the recursive goal assignment calls. This process can be viewed as a system for solving “Dynamic CSPs” (DCSP) (Mittal & Falkenhainer, 1990; Kambhampati 2000), wherein the standard CSP formalism is augmented with the concept of variables that do not appear (a.k.a. get activated) until other variables are assigned.
During the interleaved planning graph extension and search phases, the graph may be extended to a stasis condition, after which no further changes occur in actions, propositions, or mutex conditions. A sufficient condition defining this “leveloff” is a level where no new actions are introduced and no existing mutex conditions between propositions go away. We will refer to all planning graph levels at or above leveloff as ‘static levels’. Note that although the graph becomes static at this point, finding a solution may require many more episodes composed of adding identical static levels and conducting regression search on the problem goals.
Like many fielded CSP solvers, Graphplan's
search process benefits from a simple form of nogood learning. When a set of (sub)goals for a level k is
determined to be unsolvable, they are memoized
at that level in a hash table.
Subsequently, when the backward search process later enters level k with
a set of subgoals they are first checked against the hash table, and if a match
is found the search
process backtracks. This constitutes one
of three conditions for backtracking: the two others arise from attempts to
assign static mutex actions and dynamic mutex actions (See the Figure 3
legend).
We next discuss Graphplan’s search from a higherlevel view that abstracts away its CSP nature.
2.2 Graphplan as State Space Search
From a more abstract perspective, Graphplan can be viewed as conducting regression state space search from the problem goals to the initial state. In this view, the ‘states’ that are generated and expanded are the subgoals that result when the CSP process for a given set of subgoals finds a consistent set of actions satisfying the subgoals at that planning graph level (c.f. Kambhampati & Sanchez, 2000). In this view the “stategenerator function” is effectively Graphplan’s CSPstyle goal assignment routine that seeks a nonmutex set of actions for a given set of subgoals within a given planning graph level. This view is depicted in Figure 4, where the top graph casts the CSPstyle search trace of Figure 3 as a highlevel statespace search trace. The terms in each box depict the set of (positive) subgoals that result from the action assignment process for the goals in the higherlevel state to which the box is linked.[2]
Recognizing the statespace aspect of
Graphplan’s search helps in understanding its connection to IDA* search. First
noted and briefly discussed in (Bonet & Geffner, 1999), we highlight and expand
upon this relationship here. There are
three correspondences between the algorithms:
1. Graphplan’s episodic search process in which all nodes generated in the previous episode are regenerated in the new episode (and possibly some new nodes), corresponds to IDA*’s iterative search. Here the Graphplan nodes are the ‘states’ (sets of subgoals) that result when its regression search on a given plan graph level succeeds. From this perspective the “nodegenerator function” is effectively Graphplan’s CSPstyle goal assignment routine that seeks a nonmutex set of actions for a given set of propositions within a given planning graph level.
2. From the state space view of Graphplan’s search (ala Figure 4), within a given search episode/iteration the algorithm conducts its search in the depthfirst fashion of IDA*. This ensures that the space requirements are linear in the depth of a solution node.
3. The upper bound that is ‘iteratively deepened’ ala IDA* is the nodestate heuristic fvalue; f = g + h. In this context h is the distance in terms of associated planning graph levels between a state generated in Graphplan’s regression search and the initial state[3] and g is the cost of reaching the state from the goal state in terms of number of CSP epochs (i.e. the numerical difference between the highest graph level and the state’s level).
For our purposes, perhaps he most important observation is that the implicit fvalue bound for a given iteration is just the length of the planning graph associated with that iteration. That is, for any nodestate, its associated planning graph level determines both the distance to the initial state (h) and the cost to reach it from the goal state (g), and the total must always equal the length of the plan graph. This heuristic is clearly admissible; there can be no shorter distance to the goal because Graphplan exhaustively searches all shorter length planning graphs in (any) previous iterations. It is this heuristic implicit in the Graphplan algorithm which guarantees that a stepoptimal solution is returned. Note that from this perspective all nodes visited in a given Graphplan search iteration implicitly have the same fvalue: g + h = length of planning graph. We will consider implications of this property when we address informed traversal of Graphplan’s search space in Section 5.
The primary shortcoming of a standard IDA* approach to search is that it regenerates so many of the same nodes in each of its iterations. It has long been recognized that IDA*’s difficulties in some problem spaces can be traced to using too little memory (Russell, 1992; Sen & Bagchi, 1989). The only information carried over from one iteration to the next is the upper bound on the fvalue. Graphplan partially addresses this shortcoming with its memo caches that store “nogoods” states found to be inconsistent in successive episodes. However, the IDA* nature of its search can make it an inefficient planner for problems in which the goal propositions appear nonmutex in the planning graph many levels before a valid plan can actually be extracted.
A second shortcoming of the IDA* nature of Graphplan’s search is that all nodestates generated in a given Graphplan episode have the same fvalue (i.e. the length of the graph). As such, within an iteration (search episode) there is no discernible preference for visiting one state over another. We next discuss the use of available memory to target these shortcomings of Graphplan’s search.
2. Efficient Use of a Search Trace to Guide Planning Graph Search
The search space Graphplan explores in a given search episode is defined and constrained by three factors: the problem goals, the plan graph associated with the episode, and the cache of memoized nogood states created in all previous search episodes. Typical of IDA* search there is considerable similarity (i.e. redundancy) in the search space for successive episodes as the plan graph is extended. In fact, as discussed below, the backward search conducted at any level k+1 of the graph is essentially a “replay” of the search conducted at the previous level k with certain welldefined extensions. More specifically, essentially every set of subgoals generated in the backward search of episode n, starting at level k, will be regenerated by Graphplan during episode n+1 starting at level k+1 (unless a solution is found first).[4]
Now returning to Figure 4 in its entirety, note that it depicts a state space tree structure corresponding to Graphplan’s search over three consecutive iterations. The top graph, as discussed above, represents the subgoal ‘states’ generated in the course of Graphplan’s first attempt to satisfy the WXYZ goal of a problem resembling our running example. (It is implied here that the W,X,Y,Z propositions are present in the planning graph at level 7 and that this is the first level at which no pair of them is mutex.) In the second search episode (the middle Figure 4 graph), the same states are generated again, but each at one level higher. In addition, these states are expanded to generate a number of children, shown in a darker shade. (Since Figure 4 is a hypothetical variation of the Alpha domain problem detailed in Figures 2 and 3, all states created beyond the first episode are labeled only with state numbers representing the order in which they are generated.) Finally, in the third episode, Graphplan regenerates the states from the previous two episodes in attempting to satisfy WXYZ at level 9, and ultimately finds a solution (the assigned actions associated with the figure’s double outlined subgoal sets) after generating the states shown with darkest shading in the bottom graph of Figure 4.
Noting the extent to which consecutive iterations of Graphplan’s search overlap, we investigate the application of additional memory to store a trace of the explored search tree. The first implemented approach, EGBG (which is summarized in the following subsection), seeks to leverage an appropriately designed search trace to avoid as much of the interepisode redundant search effort as possible (Zimmerman & Kambhampati, 1999).
3.1 Aggressive Use of Memory in Tracing Search: The EGBG Planner
Like other types of CSPbased algorithms, Graphplan consumes most of its computational effort on a given problem in checking constraints. An instrumented version of the planner reveals that typically, 60  90% of the cpu runtime is spent in creating and checking action and proposition mutexes both during planning graph construction and the search process. (Mutex relations incorporated in the planning graph are the primary ‘constraints’ in the CSP view of Graphplan, Kambhampati, 2000) As such, this is an obvious starting point when seeking efficiency improvements for this planner and is the primary tactic adopted by EGBG. We provide here only an overview of the approach, referring the interested reader to Appendix A for details.
EGBG exploits four features of the planning graph and Graphplan’s search process:
§ The set of actions that can establish a given proposition at level k+1 is always a superset of those establishing the proposition at level k.
§ The “constraints” (mutexes) that are active at level k monotonically decrease with increasing planning graph levels. That is, a mutex that is active at level k may or may not continue to be active at level k+1 but once it becomes inactive it never gets reactivated at future levels.
§ Two actions in a level that are “statically” mutex (i.e. their effects or preconditions conflict with each other) will be mutex at all succeeding levels.
§ The problem goal set that is to be satisfied at a level k is the same set that will be searched on at level k+1 when the planning graph is extended. That is, once a subgoal set is present at level k with no two propositions being mutex, it will remain so for all future levels.
Given an appropriate trace of the search conducted in episode n (which failed to find a solution) we would like to ignore those aspects of the search that are provably unchanged in episode n+1, and focus effort on only features that may have evolved. If previous search failed to extract a solution from the klength planning graph, search on the k+1 length graph can succeed only if one or more of the following conditions holds:
1. The dynamic mutex condition between some pair of actions whose concurrent assignment was attempted in episode n no longer holds in episode n+1.
2. For a subgoal that was generated in the regression search of episode n at planning graph level k, there is an action that establishes it in episode n+1 and first appears in level k+1.
3. An episode n regression state (subgoal set) at level k that matched a cached memo at that level has no memomatch when it is generated at level k+1 in episode n+1.
(The discussion in Appendix A formalizes these conditions.) In each instance where one of these conditions does not hold, a complete policy must resume backward search under the search parameters associated with the instance in the previous episode, n. Such resumed partial search episodes will either find a solution or generate additional trace subgoal sets to augment the parent trace. This specialized search trace can be used to direct all future backward search episodes for this problem, and can be viewed as an explanation for the failure of the search process in each episode. We hereafter use the terms pilot explanation (PE) and search trace interchangeably. The following definitions are useful in describing the search process:
Search segment: This is essentially a state, specifically a set of planning graph levelspecific subgoals generated in regression search from the goal state (which is itself the first search segment). Each EGBG search segment S_{n }, generated at planning graph level k contains:
·
A
subgoal set of propositions to be satisfied
·
A pointer to the parent search segment (S_{p}_{
}), (the state at level k+1 that gave rise to S_{n})
· _{ }A list of the actions that were assigned in S_{p} which resulted in the subgoals of S_{n}
· A pointer to the PE level (as defined below) associated with the S_{n}
· A sequential list of results of the action consistencychecking process during the attempt to satisfy S_{n}’s subgoals. The possible trace results for a given consistency check are: static mutex, dynamic mutex, or action is consistent with all other prior assigned actions. Trace results are stored as a list of bit vectors for efficiency.
A search segment therefore represents a state plus some path information, but we often use ‘search segment’ and ‘state’ interchangeably. As such, all the boxes in Figure 4 (whether the state goals are explicitly shown or not) can be viewed as search segments.
Pilot explanation (PE): This is the search trace. It consists of the entire linked set of search segments representing the search space visited in a Graphplan backward search episode. It is convenient to visualize it as in Figure 4: a tiered structure with separate caches for segments associated with search on each planning graph level. We adopt the convention of numbering the PE levels in the reverse order of the plan graph: The top PE level is 0 (it contains a single search segment whose goals are the problem goals) and the level number is incremented as we move towards the initial state. When a solution is found, the PE will necessarily extend from the highest plan graph level to the initial state, as shown in the third graph of Figure 4.
PE transposition: When a state is first generated in search episode n it is associated with a specific planning graph level, say k. The premise of using the search trace to guide search in episode n+1 is based on the idea of reassociating each PE search segment (state) generated (or updated) in episode n with the next higher planning graph level. That is, we define transposing the PE as: For each search segment in the PE associated with a planning graph level k after search episode n, associate it with level k+1 for episode n+1.
Given these definitions, we note that the states in the PE after a search episode n on plan graph level k, loosely constitute the minimal set [5] of states that will be visited when backward search is conducted in episode n+1 at level k+1. (This bound can be visualized by sliding the fixed tree of search segments in the first graph of Figure 4 up one level.)
3.2 Conducting Search with the EGBG Search Trace
EGBG builds the initial pilot explanation during the first regression search episode while tracing the search process with an augmented version of Graphplan’s “assigngoals” routine. If no solution is possible on the klength planning graph, the PE is transposed up one level, and key features of its previous search are replayed such that significant new search effort only occurs at points where one of the three conditions described above holds. During any such new search process the PE is augmented according to the search space visited.
The EGBG search algorithm exploits its search trace in essentially bimodal fashion: It alternates informed selection of a state from the search trace of its previous experience with a focused CSPtype search on the state’s subgoals. Our discussion here of EGBG’s bimodal algorithm revolves around the second mode; minimizing redundant search effort once a state has been chosen for visitation. When we describe PEGG’s use of the search trace in Section 5 we will see that greater potential for dramatic efficiency increases lies with the first mode; the selection of a promising state from the search trace.
After choosing a state to visit, EGBG uses the trace from the previous episode to focus on only those aspects of the entailed search that could possibly have changed. For each search segment S_{i} at planning graph level k+1, visitation is a 4–step process:
1. Perform a memo check to ensure the subgoals of S_{i} are valid at level k+1
2. ‘Replay’ the previous episode’s action assignment sequence for all subgoals in S_{i}_{, }using the segment’s ordered trace vectors. Mutex checking is conducted on only those pairs of actions that were dynamic mutex at level k. For actions that are no longer dynamic mutex, add the candidate action to S_{i}’s list of consistent assignments and resume Graphplanstyle search on the remaining goals. S_{i ,}is augmented_{ }and the PE extended in the process. Whenever S_{i}’s goals are successfully assigned, entailing a new set of subgoals to be satisfied at lower level k, a child search segment is created, linked to S_{i }, and added to the PE.
3. For each S_{i} subgoal in the replay sequence, check also for new actions appearing at level k+1 that establish the subgoal. New actions that are inconsistent with a previously assigned action are logged as such in S_{i}’s assignments. For new actions that do not conflict with those previously assigned, assign them and resume Graphplanstyle search from that point as for step 2.
4. Memoize S_{i}’s goals at level k+1 if no solution is found via the search process of steps 2 and 3.
As long as all the segments in the PE are visited in this manner, the planner is guaranteed to find an optimal plan in the same search episode as Graphplan. Hereafter we refer to a PE search segment that is visited and extended via backward search to find a valid plan, as a seed segment. In addition, all segments that are part of the plan extracted from the PE we call plan segments. Thus, in the third graph of Figure 4, S_{18} is the apparent seed segment while the plan segments (in bottom up order) are; S_{30}, S_{29}, S_{18,} S_{17}, S_{16}, S_{15}, labeled segments YH, YHI, and the goal state WXYZ.
In principle we have the freedom to traverse the search states encapsulated in the PE in any order and are no longer restricted to the (noninformed) depthfirst nature of Graphplan’s search process. Unfortunately, EGBG incurs a high overhead associated with visiting the search segments in any order other than bottom up (in terms of PE levels). If an ancestor of any state represented in the PE were to be visited before the state itself, EGBG’s search process would regenerate the state and any of its descendents (unless it first finds a solution). There is a nontrivial cost associated with generating the assignment trace information in each of EGBG’s search segments; its search advantage lies in reusing that trace data without having to regenerate it.
On the other hand, topdown visitation of the segments in the PE levels is the degenerate mode. Such a search process essentially mimics Graphplan’s, since each episode begins with search on the problem goal set, and (with the exception of the replay of the toplevel search segment’s assignments) regenerates all the states generated in the previous episode plus possibly some new states during its regression search. The search trace provides no significant advantage under a topdown visitation policy.
The bottomup policy, on the other hand, has intuitive appeal since the lowest levels of the PE correspond to portions of the search space that lie closest to the initial state (in terms of plan steps). If a state in one of the lower levels can in fact be extended to a solution, the planner avoids all the search effort that Graphplan would expend in reaching the state from the toplevel problem goals.
Adopting a bottomup visitation policy amounts to layering a secondary heuristic on the primary IDA* heuristic, which is the planning graph length that is iteratively deepened. Recalling from Section 2.2 that all states in the PE have the same fvalue in terms of the primary heuristic, we are essentially biasing here in favor of states with low hvalues. Support for such a policy comes from work on heuristic guided statespace planning (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000) in which weighting h by a factor of 5 relative to the g component of the heuristic fvalue generally improved performance. However, unlike these statespace planning systems, for which this is the primary heuristic, EGBG employs it as a secondary heuristic so the guarantee of step optimality does not depend on its admissibility. We have found bottomup visitation to be the most efficient mode for EGBG and it is the default order for all EGBG results reported in this study.
3.3 EGBG Experimental Results
Table 1 shows some of the performance results reported for the first version of EGBG (Zimmerman & Kambhampati, 1999). Amongst the search trace designs we tried, this version is the most memory intensive and records the greatest extent of the search experience. Runtime, the number of search backtracks, and the number of search mutex checks performed is compared to the Lisp implementation of the original Graphplan algorithm. EGBG exhibits a clear advantage over Graphplan for this small set of problems;
· total problem runtime: 2.7  24.5x improvement
· Number of backtracks during search: 3.2  33x improvement
· Number of mutex checking operations during search: 5.5  42x improvement
Since total time is, of course, highly dependent on both the machine as well as
the coding language [6]
(EGBG performance is particularly sensitive to available memory), the backtrack
and mutex checking metrics provide a better comparative measure of search
efficiency. For Graphplan, mutex
checking is by far the biggest consumer of computation time and, as such, the
latter metric is perhaps the most complete indicator of search process
improvements. Some of the
problemtoproblem variation in EGBG’s effectiveness can be attributed to the
static/dynamic mutex ratio characterizing Graphplan’s action assignment
routine. The more action assignments
rejected due to pairwise statically mutex actions, the greater the advantage
enjoyed by a system that doesn’t need to retest them. TowerofHanoi problems fall into this
classification.
As noted in the original study (Zimmerman & Kambhampati, 1999) the range of problems that can
be handled by this implementation is significantly restricted by the amount of memory available to the program at runtime. For example, with a PE consisting of almost 8,000 search segments, the very modest sized BWLargeB problem challenges the available memory limit on our test machine. We consider next an approach (meEGBG in Figure 1) that occupies a middle ground in terms of memory demands amongst the search trace approaches we have investigated.
3. Engineering to Reduce EGBG Memory Requirements: The meEGBG Planner
The memory demands associated with Graphplan’s search process itself are not a major concern, since it conducts depthfirst search with search space requirements linear in the depth of a solution node. Since we seek to avoid the redundancy inherent in the IDA* episodes of Graphplan’s search by using a search trace, we must deal with a much different memorydemand profile. The search trace design employed by EGBG has memory requirements that are exponential in the depth of the solution. However, the search trace grows in direct proportion to the search space actually visited, so that techniques which prune search also act to greatly reduce its memory demands.
We examined a variety of methods with respect to this issue, and eventually implemented a suite of seven that together have proven instrumental in helping EGBG (and later, PEGG) overcome memorybound limitations. Six of these are known techniques from the planning and CSP fields: variable ordering, value ordering, explanation based learning (EBL), dependency directed backtracking (DDB), domain preprocessing and invariant analysis, and transitioning to a bipartite planning graph. Four of the six most effective methods are CSP speedup techniques, however our interest lies primarily in their impact on search trace memory demands. While there are challenging aspects to adapting these methods to the planning graph and search trace context, it is not the focus of this paper. Thus details on the motivation and implementation of these methods is relegated to Appendix B.
The seventh method, a novel variant of variable ordering we call ‘EBLbased reordering’, exploits the fact that we are using EBL and have a search trace available. Although the method is readily implemented in PEGG, the strict ordering of the trace vectors required by the EGBG search trace make it costly to implement for that planner. As such, ‘memoryefficient EGBG’ (meEGBG) does not use EBLbased reordering and we defer further discussion until PEGG is introduced in Section 5.
4.1 Impact of Enhancements on EGBG Memory Demands
There are two major modes in which the first six techniques impact memory demand for meEGBG: 1) Reduction in the size of the pilot explanation (search trace), either in the number of search segments (states), or the average trace content within the segments, and 2) Reduction in the requirements of structures that compete with the pilot explanation for available memory (i.e. the planning graph and the memo caches). Admittedly, these two dimensions are not independent, since the number of memos (though not the size) is linear in the number of search segments. We will nonetheless consider this partition in our discussion to facilitate the comparison of each method’s impact on the search trace.
In general, the impact of each these enhancements on the search process depends significantly, not only on the particular problem, but also on the presence (or absence) of any of the other methods. No single configuration of techniques proves to be optimal across a wide range of problems. Indeed, due to computational overhead associated with these methods, it is generally possible to find a class of problems for which planner performance degrades due to the presence of the method. We chose this set of techniques then, based on their joint average impact on the meEGBG / PEGG memory footprint over an extensive variety of problems.
Figure
5 illustrates for each method the impact on memory reduction relative to the
two dimensions above, when the method operates in isolation of the
others. The plot reflects results based on twelve problems in three domains
(logistics, blocksworld, and towerofhanoi), chosen to include a mix of
problems entailing large planning graphs, problems requiring extensive search,
and problems requiring both. The horizontal
axis plots percent reduction in the endofrun memory footprint of the combined
memo caches and the planning graph. The
ratios along this ordinate are assessed based on runs with Graphplan (no search
trace employed) where the memo cache and planning graph are the only globally
defined structures of significant size that remain in the Lisp interpreted
environment at run completion.[7] Similarly, the vertical axis plots percent
reduction in the space required for the PE at the end of EGBG runs with and without each method
activated, and with the planning graph and memo cache structures purged from
working memory.
The plot crossbars for each method depict the spread of reduction values seen across the twelve problems along both dimensions, with the intersection being the average. The bipartite planning graph, not surprisingly, impacts only the graph aspect, but five of the six methods are seen to have an impact on both search trace size and graph/memo cache size. Of these, DDB has the greatest influence on PE size but little impact on the graph or memo cache size, while EBL has a more modest influence on the former and a larger impact on the latter (due both to the smaller memos that it creates and the production of more ‘general’ memos, which engender more backtracks). Domain preprocessing/ invariant analysis can have a major impact on both the graph size and the PE size due to processes such as the extraction of invariants from operator preconditions. It is highly domain dependent, having little effect in the case of blocksworld problems, but can be of great consequence in towerofHanoi and some logistics problems.
That these six methods combined can complement each other is evidenced by the crossbars plotting space reduction when all six are employed at once. Over the twelve problems average reduction in PE size approaches 90% and average reduction in the planning graph/memo cache aspect exceeds 80%. No single method in isolation averages more than a 55% reduction along these dimensions.
The runtime reduction associated with each of these methods in isolation is also highly dependent on the problem and which of the other methods are active. In general, the relative time reduction for any two methods does not correlate closely with their relative memory reduction. However, we found that similarly, the techniques broadly complement each other such that net speedup accrues.
All of the techniques listed above can be (and have been) used to improve Graphplan’s performance also, in terms of speed. In order to focus on the impact of planning with the search trace, we use a version of Graphplan that has been enhanced by these six methods for all comparisons to meEGBG and PEGG in this study (We hereafter refer to this enhanced version of Graphplan as GPe).
4.2 Experimental Results with meEGBG
Table 2 illustrates the impact of the six
augmentations discussed in the previous section on EGBG’s (and Graphplan’s)
performance, in terms of both space and runtime. Standard Graphplan, GPe, EGBG, and meEGBG
are compared across 37 benchmark problems in a wide range of domains, including
problems from the first three AIPS planning competitions held to date. The problems were selected to satisfy three objectives:
a subset that both standard Graphplan and EGBG could solve for comparison to
meEGBG, different subsets that exceed the memory limitations of each of the
three planners in terms of either planning graph or PE size, and a subset that
gives a rough impression of search time limitations.
Not surprisingly, the memory efficient EGBG clearly outperforms the early version on all problems attempted. More importantly, meEGBG is able to solve a variety of problems beyond the reach of both standard Graphplan and EGBG. Of the 37 problems, standard Graphplan solves 12, the original EGBG solves 14, GPe solves 32, and meEGBG solves 32. Wherever meEGBG and GPe solve the same problem, meEGBG is faster by up to a factor of 62x, and averages ~4x speedup. Standard Graphplan (on the twelve problems it can solve), is bested by meEGBG by factors ranging from 3x to over 1000x.
The striking improvement of the memory efficient version of EGBG over the first version is not simply due to the speedup associated with the five techniques discussed in the previous section, but is directly tied to their impact on search trace memory requirements. Table 2 indicates one of three reasons for each instance where a problem is not solved by a planner: 1) s: planner is still in search after 30 cpu minutes, 2) pg: memory is exhausted or exceeded 30 minutes during the planning graph building phase, 3) pe: memory is exhausted during search due to pilot explanation extension. The third reason clearly favors meEGBG as the size of the PE (reported in terms of search segments at the time the problem is solved) indicates that it generates and retains in its trace up to 100x fewer states than EGBG. This translates into a much broader reach for meEGBG; it exhausts memory on 14% of the Table 2 problems compared to 49% for the first version of EGBG. Regardless, GPe solves three problems on which meEGBG fails in 30 minutes due to search trace memory demands
The table also illustrates the dramatic impact of the speedup techniques on Graphplan itself. The enhanced version, GPe, is well over 10x faster than the original version on problems they can both solve in 30 minutes, and it can solve many problems entirely beyond standard Graphplan’s reach. Nonetheless, meEGBG modestly outperforms GPe on the majority of problems that they both can solve. Since the EGBG (and PEGG) planners derive their strength from using the PE to shortcut Graphplan’s episodic search process, their advantage is realized only in problems with multiple search episodes and a high fraction of runtime devoted to search. Thus, no speedup is seen for gridy1 and all problems in the ‘mystery’, ‘movie’, and ‘mprime’ domains where a solution can be extracted as soon as the planning graph reaches a level containing the problem goals in a nonmutex state.
The bottomup order in which EGBG visits PE search segments turns out to be surprisingly effective for many problems. For Table 2 problems we found that in the great majority the PE for the final episode contains a seed segment (a state from which search will reach the initial state) within the deepest two or three PE levels. This supports the intuition discussed in Section 3.2 and suggests that the advantage of a low hvalue bias as observed for heuristic statespace planners (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000) translates to search on the planning graph.
Results for even the memory efficient version of EGBG reveal two primary weaknesses:
1. The action assignment trace vectors that allow EGBG to avoid redundant search are somewhat costly to generate, make significant demands on available memory for problems that elicit large search (e.g. Table 2 problems: logy4, 8puzzle1, freecell21), and are difficult to revise when search experience alters drastically in subsequent visits.
2. Despite its surprising effectiveness in many problems, the bottom up visitation of PE search segments is inefficient in others. For Table 2 problems such as freecell21 and essentially all ‘schedule’ domain problems, when the planning graph gets extended to the level from which a solution can be extracted, that solution arises via a new search branch generated from the root search segment (i.e. the problem goal state). Thus, the only seed segment in the PE is the topmost search segment, and bottomup visitation of the PE states is more costly than Graphplan’s topdown approach.
The first shortcoming is particularly manifest in problems that do not allow EGBG to exploit the PE (e.g. problems in which a solution can be extracted in the first search episode). The hit that EGBG takes on such problems relative to Graphplan is closely tied to the overhead associated with building its search trace. A compelling tactic to address the second shortcoming is to traverse the search space implicit in the PE according to state space heuristics. We might wish, for example, to exploit any of the variety of statespace heuristics that have revolutionized state space planners in recent years (Bonet & Geffner, 1999; Nguyen & Kambhampati, 2000; Gerevini & Serina, 2002). However, as we noted in Section 3.2, when we depart from a policy of visiting EGBG search segments in levelbylevel, bottomup order, we face more costly bookkeeping and high memory management overhead. More informed traversal of the statespace view of Graphplan’s search space is taken up next, where we argue that it’s perhaps the key benefit afforded by a trace of search on the planning graph.
Problem (steps/actions) 
Graphplan 
EGBG cpu sec
size of PE 
meEGBG (memory efficient EGBG) cpu sec size of PE 
SPEEDUP (meEGBG vs. GPe) 

cpu sec Stnd. GPe (enhanced) 

bwlargeB (18/18) 
126 
11.4 
79 
7919 
9.2 
2090 
1.2x 
hugefct (18/18) 
165 
13.0 
98 
8410 
9.1 
2964 
1.4x 
rocketexta (7/34) 
s 
3.5 
40.3 
1020 
1.8 
174 
1.9x 
attloga (11/79) 
s 
12.2 
pe 

7.2 
1115 
1.7x 
gripper8 (15/23) 
125 
14.2 
88 
9790 
7.9 
2313 
1.8x 
Tower6 (63/63) 
s 
43.1 
39.1 
3303 
7.6 
80 
5.7x 
Tower7 (127/127) 
s 
158 
s 

20.0 
166 
7.9x 
8puzzle1 (31/31) 
667 
57.1 
pe 

pe 
>16000 
(pe) 
8puzzle2 (30/30) 
304 
48.3 
pe 

26.9 
10392 
1.8x 
TSP12 (12/12) 
s 
454 
pe 

97.0 
7155 
4.7x 
AIPS 1998 
Graphplan 
GPe 
EGBG 
meEGBG 
Speedup 

gridy1 (14/14) 
388 
16.7 
393 
19 
16.9 
15 
1x 
gridy2 (??/??) 
pg 
pg 
pg 

pg 

~ 
gripperx3 (15/23) 
291 
16.1 
200 
9888 
8.4 
2299 
1.9x 
gripperx4 (19/29) 
s 
190 
pe 

65.7 
6351 
2.9x 
gripperx5 (23/35) 
s 
s 
pe 

433 
13572 
> 5x 
logy4 (11/56) 
pg 
470 
pg 

pe 
>25000 
(pe) 
mprimex29 (4/6) 
15.7 
5.5 
6.6 
4 
5.5 
4 
1x 
moviex30 (2/7) 
.1 
.05 
.06 
2 
.05 
2 
1x 
mystyx30 (6/14) 
83 
13.5 
85 
32 
13.5 
19 
1x 
AIPS 2000 
Graphplan 
GPe 
EGBG 
meEGBG 
Speedup 

blocks101 (32/32) 
s 
101 
pe 

16.1 
6788 
6.3x 
blocks120 (34/34) 
s 
24.2 
pe 

14.5 
3220 
1.7x 
logistics100 (15/56) 
s 
30.0 
s 

16.3 
1259 
1.8x 
logistics110 (13/56) 
s 
78.6 
pe 

10.0 
1117 
7.9x 
logistics121 (15/77) 
s 
s 
pe 

1205 
7101 
> 2x 
freecell21 (6/10) 
s 
98.0 
pe 

pe 
>12000 
(pe) 
schedule85 (4/14) 
pg 
63.5 
pg 

42.9 
6 
1.5x 
schedule92 (5/13) 
pg 
58.1 
pg 

46.8 
6 
1.2x 
AIPS 2002 
Graphplan 
GPe 
EGBG 
meEGBG 
Speedup 

depot6512 (10/26) 
239 
5.1 
219 
4272 
4.1 
456 
1.25x 
depot7654a (10/28) 
s 
32.5 
s 

14.8 
1199 
2.2x 
driverlog236a (10/24) 
1280 
2.8 
807 
1569 
1.0 
230 
2.8x 
driverlog236e (12/28) 
s 
169 
s 

83.3 
7691 
2x 
roverprob1425 (10/32) 
s 
18.9 
979 
10028 
10.3 
1522 
1.8x 
roverprob1423 (9/30) 
s 
170 
pe 

94.7 
10217 
1.8x 
stripssatx5 (7/22) 
313 
47.0 
272 
4111 
23.0 
2717 
2.0x 
stripssatx9 (6/35) 
s 
s 
s 

84.4 
306 
>21x 
ztravel38a (7/25) 
s 
972 
pe 

15.6 
1353 
62x 
ztravel37a (10/21) 
s 
s 
pe 

pe 
>20000 
~ 
4. Focusing on the State Space View: The soPEGG and PEGG Planners
The costs associated with
EGBG’s generation and use of its search trace are directly attributable to the
storage, updating, and replay of the CSP value assignments for a search segment’s
subgoals. We therefore investigated a
stripped down version of the search trace that abandons this tactic and focuses
instead on the embodied state space information. We will show that the PEGG planners employing
this search trace (both soPEGG, the
stepoptimal version and PEGG, a version using beam search), outperform the
EGBG planners on larger problems. The
key difference between EGBG’s pilot explanation and the pared down, ‘skeletal’
PE used by the PEGG planners, is the elimination of the detailed mutexchecking
information contained in the bit vectors of the former (i.e. the last item in
the bullet list of EGBG search segment contents in Section 3.1). The PEGG planners then apply statespace
heuristics to rank the PE search segments based on their associated subgoal
sets (states) and are free to visit this ‘state space’ in a more informed
manner. The tradeoff is that for each PE
state so visited the planner must regenerate the CSP effort of finding consistent
action assignments for the subgoals.
Figure 6 illustrates the PEGG advantage in a small hypothetical search trace at the final search episode. Here search segments in the PE at the onset of the episode appear in solid lines and all plan segments (states extendable to a valid plan) are shown as doublelined boxes. The figure reflects the fact that typically there may be many such latent plan segments in diverse branches of the search trace at the solutionbearing episode. Clearly a planner that can discriminate plan segment states from other states in the PE could solve the problem more quickly than a planner restricted to a bottomup traversal (deepest PE level first). State space heuristics endow the PEGG planners with this capability.
The soPEGG planner visits every search
segment in the PE during each search episode (comparable to Graphplan’s exhaustive
search on a given length graph) thereby guaranteeing that returned plans are
stepoptimal. As such, any advantage of
heuristicguided traversal is realized only in the final episode. For many problems, the computational effort
expended by Graphplan in the last search episode greatly exceeds that of all
previous episodes combined, so this can still be a powerful advantage. However, as we scale up to problems that are
larger in terms of the number and size of search episodes, the cost of
exhaustive search in even the intermediate episodes becomes prohibitive. The planner we refer to simply as PEGG
employs beam search, applying the search trace heuristics in all intermediate
search episodes to visit only a select subset of the PE segments. In so doing PEGG trades off the
stepoptimality guarantee for often greatly reduced solution times.
There are several challenges that must be dealt with to effectively use the pared down search trace employed by soPEGG and PEGG, including adaptation and augmentation of distancebased heuristics to guide search trace traversal and dealing with memory management problems induced by the tactic of ‘skipping about the search space’. Before we describe how we addressed such issues and give a more complete description of the algorithm, we first present some results that provide perspective on the effectiveness of these planners.
5.1 Experimental Results With soPEGG and PEGG
Table 3 compares Graphplan (standard and GPe), meEGBG, soPEGG, and PEGG over most of the same problems as Table 2, and adds a variety of larger problems that only the latter two systems can handle. Table 2 problems that were easily solved for GPe and meEGBG (e.g. those in the AIPS98 ‘movie’ and ‘mystery’ domains) are omitted from Table 3. Here, all planners that employ variable and value ordering (i.e. all except standard Graphplan), are configured to use value ordering based on the planning graph level at which an action first appears and goal ordering based on proposition distance as determined by the ‘adjustedsum’ heuristic (which will be defined below). There are a variety of other parameters for the soPEGG and PEGG planners for which optimal configurations tend to be problemdependent. We defer discussion of these to Sections 5.3, 5.4, and 5.6 but note here that for the Table 3 results the following parameter settings were used based on good performance on average across a variety of domains and problems:
· Secondary heuristic for visiting states: adjustedsum with w_{0}=1 (eqn 51)
· Beam search: visit the best 20% (lowest fvalue) search segments per search episode, with a minimum of 25 and a maximum of 50. Search segments with ‘flux’ lower than 10% of average are not visited regardless of heuristic rank. (w_{cf} = .01, see section 5.6.1)
Problem 
Graphplan 
meEGBG cpu sec (steps/acts) 
soPEGG heur:istic: adjsum cpu sec (steps/acts) 
PEGG heur: adjsumu cpu sec (steps/acts) 
Speedup (PEGG vs. GPe) 

cpu sec (steps/acts) Stnd. GPe (enhanced ) 

bwlargeB 
194.8 
11.4 (18/18) 
9.2 
7.0 
4.1 (18/18) 
2.8x 

bwlargeC 
s 
s (28/28) 
pe 
1104 
24.2 (28/28) 
> 74x 

bwlargeD 
s 
s (38/38) 
pe 
pe 
388 (38/38) 
> 4.6x 

attloga 
s 
31.8 (11/79) 
7.2 
2.9 (11/72) 
2.2 (11/62) 
14.5x 

attlogb 
s 
s 
pe 
s 
21.6 (13/64) 
> 83x 

Gripper8 
s 
14.2 (15/23) 
7.9 
30.6 
5.5 (15/23) 
2.6x 

Gripper15 
s 
s 
pe 
s 
46.7 (31/45) 
> 38.5x 

Tower7 
s 
158 (127/127) 
20.0 
14.3 
6.1 (127/127) 
26x 

Tower9 
s 
s (511/511) 
232 
118 
23.6 (511/511) 
> 76x 

8puzzle1 
2444 
57.1 (31/31) 
pe 
31.1 
9.2 (31/31) 
6.2x 

8puzzle2 
1546 
48.3 (30/30) 
26.9 
31.3 
7.0 (32/32) 
6.9x 

TSP12 
s 
454 (12/12) 
97.0 
390 
6.9 (12/12) 
51x 

AIPS 1998 
Stnd GP GPe 
meEGBG 
soPEGG 
PEGG 
Speedup 

gridy1 
388 
16.7 (14/14) 
17.9 
16.8 
16.8 (14/14) 
1x 

gripperx5 
s 
s 
433 
512 
110 (23/35) 
> 16x 

gripperx8 
s 
s 
pe 
s 
520 (35/53) 
> 3.5x 

logy5 
pg 
470 (16/41) 
pe 
361 
30.5 (16/34) 
15.4x 

AIPS 2000 
Stnd GP GPe 
meEGBG 
soPEGG 
PEGG 
Speedup 

blocks101 
s 
95.4 (32/32) 
16.1 
18.7 
6.9 (32/32) 
13.8x 

blocks120 
~ 
26.6 (34/34) 
14.5 
23.0 
9.4 (34/34) 
2.8x 

blocks162 
s 
s 
pe 
s 
28.1 (56/56) 
> 64 x 

logistics100 
~ 
30.0 (15/56) 
16.6 
21 
7.3 (15/53) 
4.1x 

logistics121 
s 
s 
1205 (15/77) 
1101 (15/75) 
17.4 (15/75) 
> 103x 

logistics140 
s 
s 
pe 
s 
678 (13/74) 
> 2.7x 

freecell21 
pg 
98.0 (6/10) 
pe 
102 
19.5 (6/10) 
>92x 

freecell35 
pg 
1885 (7/16) 
pe 
511 
101 (7/17) 
18.7x 

schedule89 
pg 
300 (5/12) 
615 
719 
719 (5/12) 
(.42x) 

AIPS 2002 
Stnd GP GPe 
meEGBG 
soPEGG 
PEGG 
Speedup 

depot7654a 
s 
32.5 (10/28) 
14.8 
12.9 
13.2 (10/26) 
2.7x 

depot4321 
s 
s 
s 
s 
42.6 (14/37) 
>42x 

depot1212 
s 
s 
s 
s 
79.1 (22/53) 
>22.8x 

driverlog236e 
s 
166 (12/28) 
83.3 
109 
80.6 (12/26) 
2.1x 

driverlog336b 
s 
s 
pe 
1437 (11/39) 
169 (14/45) 
> 10.7x 

roverprob1423 
s 
170 (9/30) 
pe 
63.4 
15.0 (9/26) 
11.3x 

roverprob4135 
s 
s 
pe 
s 
379 (12 / 43) 
> 4.7x 

roverprob8271 
s 
s 
pe 
s 
220 (11 / 39) 
> 8.2x 

satx5 
313 
45 (7/22) 
43.0 
27.0 
25.1 (7 / 22) 
1.7x 

satx9 
s 
s 
918 
9.9 
9.9 (6 / 35) 
>182x 

ztravel38a 
s 
972 (7/25) 
15.6 
11.2 
15.1 (9/26) 
119x 

ztravel37a 
s 
s 
pe 
s 
101 (10/23) 
> 18x 

Focusing first on the GPe, meEGBG,
and soPEGG columns, we clearly see
the impact of the tradeoff between storing and exploiting all the intrasegment
action assignment information in the PE.
In this set of 37 problems, 16 result in meEGBG exceeding available
memory due to the size of the PE while only one pushes that limit for soPEGG.
Seven of the problems that cause meEGBG
to run out of memory are actually solved by soPEGG
while the remainder exceed the time limit during search. In addition, soPEGG handles five problems in the table that GPe fails on. These problems typically entail extensive
search in the final episode, where the PE efficiently shortcuts the fullgraph
search conducted by GPe. The speedup advantage of soPEGG relative to GPe ranges between a modest slowdown on three problems to almost
87x on the ZenoTravel problems, with an average of about 5x. (Note that the speedup values reported in the
table are not for soPEGG.)
Generally, any planner using a search trace will under perform
GPe on single search episode problems such as gridy1, in which the cost of
building the trace is not recovered. The
low overhead associated with building soPEGG’s
search trace means it suffers little relative to GPe in this case. On most
problems that both meEGBG and soPEGG can solve, meEGBG has the upper hand due to its ability to avoid redundant
consistencychecking effort. The fact that meEGBG’s
advantage over soPEGG is not greater
for such problems is attributable both to soPEGG’s
ability to move about the PE search
space in the final search episode (versus meEGBG’s
bottomup traversal) and its lower
overhead due to its more concise search trace. Note that there is no obvious reason to prefer one state traversal order over the other in non solutionbearing episodes since these stepoptimal planners visit all the states in their PE for these search episodes. [8]
Now turning attention to the PEGG results, it’s apparent that the beam search greatly extends the size of problems that can be handled. PEGG solves ten larger problems of Table 3 that could not be solved by either soPEGG or enhanced Graphplan. Speedwise PEGG handily outperforms the other planners on every problem except schedule89, where GPe has a factor of 2.3x advantage. As indicated by the table’s righthand column, the speedup of PEGG over GPe ranges from .42x to over 182x. This is a conservative bound on PEGG’s maximum advantage relative to GPe since speedup values for the seventeen problems that GPe fails to solve were conservatively assessed at the time limit of 1800 seconds.
We defer further analysis of these results to Section 6 in order to first describe the PEGG algorithm and the advantages it extracts from its search trace.
5.2 The Algorithm for the PEGG Planners
The highlevel algorithm for soPEGG and PEGG is given in Figure 7. As for Graphplan, search begins once the planning graph it has been extended to the level where all problem goals first appear with no binary mutex conditions. (The routine, find_1st_level_with_goals is virtually the same as Graphplan’s and is not defined here). The first search episode is then conducted in Graphplan fashion, except that the assign_goals and assign_next_level_goals routines of Figure 8 initialize the PE as they create search segments that hold all states generated during the regression search process. The assign_goals pseudocode outlines the process of compiling “conflict sets” (see Appendix B) as a means of implementing DDB and EBL during the action assignment search. The assign_next_level_goals routine illustrates the role of the toplevel conflict set for recording a minimal nogood when search on a state is completed (EBL) and depicts how variable ordering need be done only once for a state (when the search segment is created). A child segment is created and linked to its parent_{ }(extending the PE) in assign_next_level_goals whenever all parent goals are successfully assigned. The assign_next_level_goals routine determines the subgoals for the child search segment by regressing the parent’s goals over the actions assigned and then checks to see if either the initial state has been reached or there are no remaining goals. If so, success is signaled by returning the child search segment which can then be used to extract the ordered actions in the plan.
Subsequent to the first episode, PEGG_plan enters an outer loop that employs the PE to conduct successive search episodes. For each episode, the newly generated search segments from the previous episode are evaluated according to a state space heuristic, ranked, and merged into the already ordered PE. In an inner loop each search segment is visited in turn by passing its subgoals to the Graphplanlike assign_goals routine.
It is the exit conditions on the inner loop that primarily differentiate soPEGG and PEGG. Whereas soPEGG will visit every search segment whose goals are not found to match a memo, PEGG restricts visitation to a best subset, based on a userspecified criterion. As such, expansion of the planning graph can be deferred until a segment is chosen for visitation that transposes to a planning graph level exceeding the current graph length. As a consequence, in some problems the PEGG planners may be able to extract a stepoptimal solution while building one less level than other Graphplanbased planners.[9]
Note that PEGG’s algorithm combines both statespace and CSPbased aspects in
its search:
· It chooses for expansion the most promising state based on the previous search iteration and state space heuristics. PEGG and soPEGG are free to traverse the states in its search trace in any order.
· A selected state is expanded in Graphplan’s CSPstyle, depthfirst fashion, making full use of all CSP speedup techniques outlined above.
The first aspect most clearly distinguishes PEGG from EGBG: traversal of the state space in the PE is no longer constrained to be bottomup and levelbylevel. As it was for EGBG, management of memory associated with the search trace is a challenge for PEGG once we stray from bottomup traversal, but it is less daunting. It will be easier to outline how we address this if we first discuss the development and adaptation of heuristics to search trace traversal.
5.3 Informed Traversal of the Search Trace Space
The HSP and HSPR state space planners (Bonet & Geffner, 1999) introduced the idea of using the ‘reachability’ of propositions and sets of propositions (states) to assess the difficulty degree of a relaxed version of a problem. This concept underlies their powerful ‘distance based’ heuristics for selecting the most promising state to visit. Subsequent work demonstrated how the planning graph can function as a rich source of such heuristics (Nguyen & Kambhampati, 2000). Since the planning graph is already available to PEGG, we adapt and extend heuristics from the latter work to serve in a secondary heuristic role to direct PEGG’s traversal of its search trace states. Again, the primary heuristic is the planning graph length that is iteratively deepened (Section 2.2), so the stepoptimality guarantee for the soPEGG planner does not depend on the admissibility of this secondary heuristic.
There are important differences between heuristic ranking of states generated by a state space planner and ordering of the search segments (states) in PEGG’s search trace. For example, a state space planner chooses to visit a given state only once while the PEGG planners often must consider whether to revisit a state in many consecutive search episodes. Ideally, a heuristic to rank states in the search trace should reflect levelbylevel evolutions of the planning graph, since the transposition process associates a search segment with a higher level in each successive episode. For each higher planning graph level that a given state is associated with, the effective regression search space ‘below’ it changes as a complex function of the number of new actions that appear in the graph, the number of dynamic mutexes that relax, and the nogoods in the memo caches. Moreover, unlike a state space planner’s queue of previously unvisited states, the states in a search trace include all children of each state generated when it was last visited. Ideally the value of visiting a state should be assessed independently of the value associated with any of its children, since they will be assessed in turn. Referring back to the search trace depicted in Figure 6, we desire a heuristic that can, for example, discriminate between the #4 ranked search segment and its ancestor, top goal segment (WXYZ). Here we would like the heuristic assessment of segment WXYZ to discount the value associated with its children already present in the trace, so that it is ranked based only on its potential for generating new local search branches.
We next discuss adaptation of known planning graph based heuristics for the most effective use with the search trace.
5.3.1 Adoption of distancebased state space heuristics
The heuristic value for a state, S, generated in backward search from the problem goals can be expressed as:
51) _{}
where: g(S) is the distance from S to the problem goals (e.g. in terms of steps)
h(S) is a distance estimate from S to the initial state (e.g. in steps)
w_{0} is an optional weighting factor
The value of g for any state