Peot and Smith's work provided the foundation for our subsequent exploration of the least-cost flaw repair (LCFR) strategy [Joslin & Pollack, 1994]. We hypothesized that the power of the DUnf strategy might come not from its relative ordering of threats and open conditions, but instead from the fact that DUnf has the effect of imposing a partial preference for least-cost flaw selection. DUnf will always prefer a forced threat, which, by definition has a repair cost of at most one; thus, in cases in which there is a forced threat, DUnf will make a low-cost selection. What about cases in which there are no forced threats? Then DUnf will have to select among open conditions, assuming there are any. If our hypothesis is correct, a version of DUnf that makes this selection using a least-cost strategy (i.e., DUnf-LC) ought to perform better than a version that uses one of the other strategies (i.e., bare DUnf or DUnf-FIFO). In fact, if it is the selection of low-cost repairs that is causing the search-space reduction, then the idea of treating threat resolution differently from open condition establishment ought to be abandoned. Instead, a strategy that always selects the flaw with minimal repair cost, regardless of whether it is a threat or an open condition, ought to show the best performance. This is the Least-Cost Flaw Repair (LCFR) strategy:
There are strong similarities between LCFR and certain heuristics that have been proposed and studied in the literature on constraint satisfaction problems (CSPs). This is perhaps not surprising, given that flaw selection in POCL planning corresponds in some fairly strong ways to variable selection in constraint programming. Flaws in a POCL planner represent decisions that are yet to be made, and that must be made before the plan will be complete; unbound variables play a similar role in constraint satisfaction problems (CSPs). Although there exist a number of heuristics for selecting a variable to branch on in solving a CSP [Kumar, 1992], one well-known heuristic that is often quite effective is the fail first principle, which picks the variable that is the ``most constrained'' when selecting a variable to branch on. A simple and common implementation of the fail first principle selects the variable with the smallest domain [Tsang93, 1993].
The intuition behind the fail first principle is that one should prune dead-end regions of the search as early as possible. The unbound variables that are most tightly constrained are likely to be points at which the current partial solution is most ``brittle'' in some sense, and by branching on those variables we hope to find a contradiction (if one exists) quickly. Similarly, LCFR can be thought of as selecting the ``most constrained'' flaws, resulting in better pruning.
A similar heuristic has also been adopted in recent work on controlling search in hierarchical task network (HTN) planning, in the Dynamic Variable Commitment Strategy (DVCS). DVCS, like LCFR, is based on a minimal-branching heuristic. Experimental analyses demonstrate that DVCS generally produces a well-focused search [Tsuneto, Erol, Hendler & Nau, 1996].
Our own initial experimental results, presented in Joslin and Pollack (1994), similarly supported the hypothesis that a uniform least-cost flaw repair strategy could be highly effective in reducing the size of the search space in POCL planning. In those experiments, we compared LCFR against four other strategies: UCPOP, DUnf, and DUnf-LC, as defined above, and a new strategy, UCPOP-LC which we previously called LCOS [Joslin & Pollack, 1994]:
At the same time, we observed that LCFR incurred an unwieldy overhead, often taking longer to solve a problem than UCPOP, despite the fact that it was searching far fewer nodes. In part this was due a particularly inefficient implementation of LCFR that we were using, but in part it resulted from the fact that computing repair costs is bound to take more time than simply popping a stack (as in a LIFO strategy), or finding a flaw of a particular type (as in a strategy that prefers threats). We therefore explored approximation strategies, which reduce the overhead of flaw selection by accepting some inaccuracy in the repair cost calculation. For example, we developed the ``Quick LCFR'' (or QLCFR) strategy, which calculates the repair cost of any flaw only once, when that flaw is first encountered. In any successor node in which the flaw remains unresolved, QLCFR assumes that its repair cost has not changed. Our experiments with QLCFR showed it to be a promising means of making a least-cost approach sufficiently fast to pay for its own overhead. Additional approximation strategies were studied by Srinivasan and Howe [Srinivasan & Howe, 1995], who experimented with three variations of LCFR, along with a fourth, novel strategy that moves some of the control burden to the user.