Accounting for Positive Interaction

The additive heuristic does not take reuse of actions (other than the dummy action $a_0$) into account, so it often overestimates the actual number of actions needed to complete a plan. The need to take positive interaction into account in order to obtain a more accurate heuristic estimate has been recognized in both state space planning (Nguyen & Kambhampati, 2000; Hoffmann & Nebel, 2001; Refanidis & Vlahavas, 2001) and plan space planning (Nguyen & Kambhampati, 2001). For IPC3 we used a slight modification of the additive heuristic to address the issue of action reuse:
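For reference, the plain additive heuristic assigns each literal the cost of achieving it in isolation, summing precondition costs with no credit for shared subgoals. A minimal sketch of this fixpoint computation for a ground STRIPS encoding follows; the function name and data representation are illustrative, not VHPOP's actual code:

```python
def h_add_costs(init, actions):
    """Additive heuristic cost of each literal, computed to fixpoint.

    actions: list of (name, preconditions, add_effects), with sets of literals.
    cost(q) = 0 if q holds initially, else the minimum over actions a
    adding q of 1 + sum of the costs of a's preconditions.
    """
    INF = float('inf')
    cost = {q: 0 for q in init}
    changed = True
    while changed:
        changed = False
        for _name, pre, add in actions:
            if all(p in cost for p in pre):  # all preconditions reachable
                c = 1 + sum(cost[p] for p in pre)
                for q in add:
                    if cost.get(q, INF) > c:
                        cost[q] = c
                        changed = True
    return cost
```

Because precondition costs are summed, an action that achieves several subgoals at once is charged once per subgoal, which is the source of the overestimation discussed above.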

$$
h_{\mathrm{addr}}(\pi) = \sum_{\stackrel{q}{\longrightarrow} a_i \,\in\, \mathcal{OC}(\pi)}
\begin{cases}
0 & \text{if $\exists\, a_j \in \mathcal{A}$ s.t. $q$ unifies with an effect of $a_j$ and $a_i \prec a_j \notin \mathcal{O}$} \\
h_{\mathrm{add}}(q) & \text{otherwise}
\end{cases}
$$
The underlying assumption of this heuristic cost function is that an open condition $\stackrel{q}{\longrightarrow} a_i$ that can possibly be resolved by linking to an effect of an existing action $a_j$ will not give rise to a new action when resolved. This can of course lead to an overly optimistic estimate of the number of actions required to complete the plan. The modified heuristic is still not admissible, however, since the same cost value as before is used for open conditions that cannot be linked to effects of existing actions. In other words, we account only for possible reuse of existing actions, not of potential actions.

To illustrate the difference between $h_{\mathrm{add}}(\pi)$ and $h_{\mathrm{addr}}(\pi)$, consider a planning domain with two action schemas $A_1$ and $A_2$, where $A_1$ has no preconditions and $A_2$ has a single precondition $q$. Assume that $q$ can only be achieved by an action instance of $A_1$. The heuristic cost of the literal $q$ is therefore 1 according to the additive heuristic. Consider now a plan $\pi$ with two unordered actions $a_1$ and $a_2$ ($a_i$ being an instance of action schema $A_i$) and a single open condition $\stackrel{q}{\longrightarrow} a_2$. We have $h_{\mathrm{add}}(\pi) = h_{\mathrm{add}}(q) = 1$, corresponding to the addition of a new instance of $A_1$ to achieve $q$, but $h_{\mathrm{addr}}(\pi) = 0$ because there is an action (viz. $a_1$) that is not ordered after $a_2$ and has an effect that unifies with $q$. Table 1 shows that taking reuse into account can have a significant impact on planning time in practice. The modified additive heuristic $h_{\mathrm{addr}}$ clearly dominates $h_{\mathrm{add}}$ in the DriverLog and ZenoTravel domains, despite incurring a higher overhead per generated plan. The results in the Satellite domain are more mixed, with $h_{\mathrm{add}}$ having a slight edge overall. We show planning times for the four flaw selection strategies used by VHPOP at IPC3. These and other novel flaw selection strategies are discussed in detail in Section 3.2.2.
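The two-action example above can be replayed with a small sketch of the reuse check. This is a ground simplification (a real lifted implementation, as in VHPOP, would unify open conditions against effect literals rather than compare them for equality); the data structures are hypothetical, not VHPOP's API:

```python
def h_addr(open_conditions, effects, orderings, h_add):
    """Reuse-aware additive heuristic over a plan's open conditions.

    open_conditions: list of (q, a_i) pairs for open conditions q -> a_i.
    effects: dict mapping each existing action to its set of effect literals.
    orderings: set of (a, b) pairs meaning a must precede b.
    An open condition q -> a_i costs 0 if some existing action a_j achieves q
    and is not constrained to come after a_i; otherwise it costs h_add[q].
    """
    total = 0
    for q, a_i in open_conditions:
        reusable = any(q in effs and (a_i, a_j) not in orderings
                       for a_j, effs in effects.items() if a_j != a_i)
        total += 0 if reusable else h_add[q]
    return total
```

With $a_1$ and $a_2$ unordered the single open condition is judged reusable and the estimate is 0; adding the ordering $a_2 \prec a_1$ rules out reuse and restores the plain additive cost of 1.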

Table 1: Planning times in seconds using different flaw selection strategies for a selection of problems in the DriverLog, ZenoTravel, and Satellite domains, showing the impact of taking reuse into account in the plan ranking heuristic. A dash (-) means that the planner ran out of memory (512 MB).

Problem        MW-Loc            MW-Loc-Conf       LCFR-Loc          LCFR-Loc-Conf
               hadd     haddr    hadd     haddr    hadd     haddr    hadd     haddr
DriverLog6     8.65     0.16     4.41     0.13     87.58    2.01     -        1.16
DriverLog7     3.66     0.34     0.63     0.17     21.15    1.28     1.57     0.22
DriverLog8     -        -        110.26   1.48     -        177.27   -        2.05
DriverLog9     -        0.33     -        0.28     -        -        -        -
DriverLog10    4.13     2.11     0.71     0.76     3.79     0.64     1.30     0.83
ZenoTravel6    -        0.93     17.41    2.90     25.09    0.95     11.24    2.82
ZenoTravel7    -        -        -        37.81    -        -        -        33.10
ZenoTravel8    -        15.48    -        37.99    -        -        -        6.45
ZenoTravel9    -        86.21    -        11.53    -        33.37    26.33    9.49
ZenoTravel10   -        26.59    -        21.22    -        21.20    -        18.22
Satellite6     0.36     0.22     0.37     0.24     0.32     0.21     0.40     0.24
Satellite7     0.49     0.37     0.54     0.84     0.55     0.51     0.62     -
Satellite8     1.09     -        1.29     0.84     0.85     0.83     1.25     0.68
Satellite9     2.41     -        2.11     -        1.84     -        2.50     -
Satellite10    1.53     1.12     1.95     1.11     1.50     1.36     2.08     1.37

Hoffmann and Nebel (2001) describe the FF heuristic, which takes positive interaction between actions into account by extracting a plan from the relaxed planning graph, and argue that accounting for action reuse is one of the factors contributing to FF's performance advantage over HSP. The FF heuristic can take reuse of potential actions into account, not just reuse of existing actions as with our modified additive heuristic. This should result in a better estimate of the actual plan cost, but requires that a plan be extracted from the relaxed planning graph for every search node, which can be costly. It would be interesting to see how the FF heuristic performs when used in a plan space planner.
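A simplified sketch of FF-style relaxed plan extraction, assuming a ground STRIPS encoding, illustrates the per-node cost involved; this is an illustration of the general technique, not Hoffmann and Nebel's implementation:

```python
def relaxed_plan_size(init, goal, actions):
    """Number of distinct actions in a relaxed plan (delete effects ignored).

    actions: list of (name, preconditions, add_effects), with sets of literals.
    Returns None if the goal is unreachable even in the relaxation.
    Counting each selected action once, even when it supports several
    subgoals, is how the FF heuristic accounts for positive interaction.
    """
    # Forward pass: build the fact layers of the relaxed planning graph.
    layers = [set(init)]
    applicable = []                      # actions applicable at each layer
    while not goal <= layers[-1]:
        acts = [a for a in actions if a[1] <= layers[-1]]
        facts = set(layers[-1])
        for _name, _pre, add in acts:
            facts |= add
        if facts == layers[-1]:
            return None                  # fixpoint reached without the goal
        applicable.append(acts)
        layers.append(facts)
    # Backward pass: greedily pick an achiever for each subgoal.
    plan, subgoals = set(), set(goal)
    for i in range(len(layers) - 1, 0, -1):
        for g in [g for g in subgoals if g not in layers[i - 1]]:
            name, pre, _add = next(a for a in applicable[i - 1] if g in a[2])
            plan.add(name)
            subgoals.discard(g)
            subgoals |= pre              # achiever's preconditions become subgoals
    return len(plan)
```

The forward pass alone is comparable in cost to computing the additive fixpoint, but the backward extraction must be repeated for every evaluated plan, which is the overhead noted above.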

The heuristic cost function used in REPOP (Nguyen & Kambhampati, 2001), a heuristic partial order planner working solely with ground actions, is defined using a serial planning graph. The heuristic is similar in spirit to the FF heuristic and, like the FF heuristic, can take reuse of potential actions into account. The REPOP heuristic also accounts for reuse of existing actions, but seemingly without considering ordering constraints, which our modified additive heuristic does. Furthermore, our $h_{\mathrm{addr}}$ heuristic always takes into account reuse of any existing action that achieves a literal $q$, while the REPOP heuristic only considers an existing action if it happens to be selected from the serial planning graph as the action that achieves $q$. The results in Table 2 indicate that the REPOP heuristic may be less effective than the additive heuristic (with and without reuse) in certain domains.

Håkan L. S. Younes