Having described our overall experimental design, we now turn to the analysis of the results. To begin, we sought to re-establish the claims we originally made in our earlier work. Specifically, we wanted first to reconfirm, using a larger data set, that least-cost flaw selection is an effective technique for reducing the size of the search space generated by a POCL planner. We therefore ran an experiment in which we compared the node counts for the five strategies we had earlier studied--LCFR, DUnf, DUnf-LC, UCPOP, and UCPOP-LC--plus one new one, DUnf-Gen, explained below.
The results of this experiment are shown in Figures and . The former is a log-log scatter plot, showing the performance of each of the six strategies on the 33 problems in the basic set. The problems were sorted by the minimal number of nodes generated on them by any of the six strategies. Thus, the left-hand side of the graph includes the problems that at least one of the six strategies found to be relatively easy, while the right-hand side has the problems that were hard for all six strategies. We omitted problems that none of the six strategies were able to solve. The actual number of nodes generated by each strategy is plotted on the Y-axis, against the minimal number of nodes for that problem, on the X-axis. LCFR's performance is highlighted with a line connecting its data points. This graph shows that, in general, LCFR generates small search spaces on this problem set, relative to the other strategies in this class. There were only six problems for which LCFR was not within 10% of the minimum. Three of these are in the Get-Paid/Uget-Paid class of problems--including two of the ``hardest'' problems (UGet-Paid3 and UGet-Paid4). We discuss this class of problems more in Section .
An alternative view of the data is given in Figure , which shows the aggregate performance of the six strategies, i.e., the average of their node-count %-overrun on the basic problems. As can be seen, LCFR has the smallest average %-overrun.
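The %-overrun metric can be made concrete with a small sketch. The node counts, strategy names, and problem names below are illustrative, not taken from our data; the sketch only shows how a strategy's overrun is measured against the per-problem minimum and then averaged.

```python
# Hypothetical node counts: strategy -> problem -> nodes generated.
node_counts = {
    "LCFR": {"p1": 100, "p2": 210},
    "DUnf": {"p1": 150, "p2": 200},
}

problems = ["p1", "p2"]

# Per-problem minimum over all strategies (the baseline for overrun).
minima = {p: min(counts[p] for counts in node_counts.values()) for p in problems}

def avg_overrun(strategy):
    """Average percentage by which a strategy exceeds the per-problem minimum."""
    overruns = [
        100.0 * (node_counts[strategy][p] - minima[p]) / minima[p]
        for p in problems
    ]
    return sum(overruns) / len(overruns)

print({s: avg_overrun(s) for s in node_counts})
```

With these made-up counts, the minima are 100 (p1) and 200 (p2), so LCFR's average overrun is 2.5% and DUnf's is 25%; the aggregate figures in the paper are computed in the same way over the full problem set.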
Figures and present similar views of the data for the Tileworld domain, while Figure gives the data for the Trains problems. On the Trains domain, these six strategies were only able to solve the easiest problem (Trains1), so we simply show the actual node counts in Figure . We have omitted two data points, because they were so extreme that their inclusion on the graph made it impossible to see the differences among the other strategies: LCFR and DUnf-Gen with S+OC+UC node selection took 28,218 and 35,483 nodes, respectively, to solve the problem.
For the Tileworld and Trains problems, we observed the same sorts of interactions between node and flaw selection strategies as were seen by Gerevini and Schubert. Specifically, LCFR performs relatively poorly with S+OC on the Tileworld problems, and it performs very poorly with S+OC+UC on the Trains problems. However, when paired with the other node-selection strategies, LCFR produces the smallest search spaces of any strategies in this class.
In sum, LCFR does tend to produce smaller search spaces than the other strategies in this class. But a question remains. LCFR uses a least-cost strategy, and a side effect of this is that it will prefer forced flaws, since forced flaws are low-cost flaws. It is therefore conceivable that LCFR's performance is mostly or even fully due to its preference for forced flaws, and not (or not greatly) influenced by its use of a least-cost strategy for unforced flaws. This same hypothesis could explain why DUnf-LC consistently outperforms DUnf, and why UCPOP-LC consistently outperforms UCPOP.
It was to address this issue that we included DUnf-Gen in our experiment. DUnf-Gen is a simple strategy that prefers forced flaws of any kind, and otherwise uses a LIFO regime. We would expect DUnf-Gen and LCFR to perform similarly, since they frequently make the same decision. Specifically, they will both select the same flaw in any node in which there is a forced flaw; they will differ when there are only unforced flaws, with DUnf-Gen selecting a most recently introduced flaw and LCFR selecting a least-cost flaw.
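The relationship between the two strategies can be sketched as follows. This is an illustrative simplification, not our planner's implementation: a flaw is represented as a pair (introduction order, number of possible repairs), and a flaw is treated as forced when it has at most one repair. Both functions and the flaw representation are assumptions made for the example.

```python
from typing import List, Tuple

# Flaw = (order in which it was introduced, number of possible repairs).
# A flaw with <= 1 repair is "forced" in this simplified model.
Flaw = Tuple[int, int]

def dunf_gen(flaws: List[Flaw]) -> Flaw:
    """Prefer any forced flaw; otherwise pick LIFO (most recently introduced)."""
    forced = [f for f in flaws if f[1] <= 1]
    if forced:
        return forced[0]
    return max(flaws, key=lambda f: f[0])

def lcfr(flaws: List[Flaw]) -> Flaw:
    """Pick a least-cost flaw, where cost is the number of possible repairs."""
    return min(flaws, key=lambda f: f[1])

# With only unforced flaws, the two strategies diverge:
flaws = [(0, 3), (1, 2), (2, 4)]
print(dunf_gen(flaws))  # LIFO choice: (2, 4)
print(lcfr(flaws))      # least-cost choice: (1, 2)
```

When the flaw list contains a forced flaw, both functions select it (a forced flaw is also a minimum-cost flaw), mirroring the agreement between the strategies described above; they diverge only on nodes with purely unforced flaws.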
In practice, DUnf-Gen's performance closely mimicked LCFR's. On the basic problem set it did only marginally worse than LCFR; indeed, it did marginally better when we reversed the order in which the planner adds the preconditions of each new step to the open list (see Section ). LCFR did somewhat better than DUnf-Gen on both the Trains and Tileworld problems, regardless of the order in which the preconditions were added to the open list, but the extent to which it did better varied.
Thus, the data is inconclusive about the value of using a least-cost strategy for unforced flaws. LCFR clearly benefits from selecting forced flaws early (as a side effect of preferring least-cost flaws), but it may not matter whether it continues to use a least-cost strategy for the unforced flaws. If indeed it is generally sufficient to use a least-cost strategy only for forced flaws, then ZLIFO's performance is somewhat less puzzling, since ZLIFO also prefers forced flaws. However, the puzzle is not completely resolved. After all, DUnf-Gen, like ZLIFO, prefers forced flaws and then makes LIFO-based decisions about unforced flaws, and while its performance is not clearly inferior to LCFR's, neither is it clearly superior. Even if the use of LIFO for unforced flaws does not obviously increase the search space, neither does it appear to decrease it.