Heuristics for Forward-Chaining Planning

Many of the heuristics used to guide forward-chaining planners are based around solving an abstraction of the original, hard, planning problem with which the planner is presented. The most widely used abstraction involves planning using `relaxed actions', where the delete effects of the original actions are ignored. FF, HSP [Bonet & Geffner 2000] and UnPOP [McDermott 1996] use relaxed actions as the basis for their heuristic estimates, although FF was the first to count the number of relaxed actions in a relaxed plan connecting the goal to the initial state. Although ignoring delete lists turns out to be a powerful relaxation, at least for a class of planning domains, other relaxations are possible. More recently, work has been done using an abstraction based on causal graph analysis [Helmert 2004].

The initial approaches for calculating goal distance estimates, taken by planners such as HSP, calculated the cost of reaching the goal state from the state to be evaluated by doing a forwards reachability analysis from the state until the given goal appears. Two heuristics can be derived from this information: either the maximum of the steps-to-goal values--an admissible heuristic; or the sum of the steps-to-goal values--an inadmissible heuristic, which in practice is more informative. The disadvantage of the latter of these approaches is that it ignores any positive interactions (shared actions) between the action sequences for each goal: it is this problem which was addressed by the heuristic used in FF. In FF, a planning graph [Blum & Furst 1995] is built forward from the current state using relaxed actions--this is known as a relaxed planning-graph (RPG). A relaxed plan (one using relaxed actions) to achieve the goal state can be extracted from the RPG in polynomial time; the number of actions in this plan can be used to provide the heuristic value. As Graphplan does not provide a guarantee that the plan found will contain the optimum number of sequentialised actions (only that it will have the optimum makespan) the heuristic returned is inadmissible, but in practice the heuristic is more informative than any of those used previously.

Andrew Coles and Amanda Smith 2007-01-09