Cross-boundary partial orderings

We performed a final set of comparisons to better understand what advantages, in terms of speed and plan quality, can be obtained by using hand-coded rather than fully-automated planners. In each category we compare the best-performing fully-automated planner with the best-performing hand-coded planner. For speed, we compare FF with TLPLAN at all levels. For quality, we compare LPG with TALPLANNER at the STRIPS level, with SHOP2 at the NUMERIC level and with TLPLAN at the remaining problem levels.

**Figure 13:** A comparison between the best of the fully-automated planners and the best of the hand-coded planners at each problem level.

**Figure 14:** Table of results for comparisons of fully-automated and hand-coded planners in terms of speed. Each cell represents a pair of planners being compared and presents the Z-value and the corresponding p-value identified from the Wilcoxon statistical table. The order of the planners' names in the title of the cell is significant: the first planner named is the one favoured by the comparison. Underneath each cell is an entry giving the size of the sample used.

**Figure 15:** Table of results of comparisons of plan quality between fully-automated and hand-coded planners.

The tables in Figures 14 and 15 show the results of the tests, and Figure 13 summarises the conclusions. TLPLAN is consistently faster than FF at all problem levels in which both participated, demonstrating that the control knowledge exploited by TLPLAN gives it a real speed advantage. Exactly why this should be the case remains to be seen: for several STRIPS domains, the control knowledge usually described as having been encoded appears to prune no states beyond those already pruned by an FF-style heuristic measure. The source of this added value is an interesting question for the community to consider when evaluating the advantages and disadvantages of the hand-coded approach.

TALPLANNER also produces consistently better concurrent plans than LPG at the STRIPS level. Again, explaining this result will require an in-depth analysis of the control information exploited by TALPLANNER. At the SIMPLETIME level, by contrast, LPG produces plans of consistently better quality than those of TLPLAN.

Interestingly, hand-coding control information does not appear to lead to any consistent improvement in plan quality across the data sets. It does seem to lead to a speed advantage, which indicates that, in general, control rules provide a basis for more efficient pruning than weak general heuristic measures. The Wilcoxon test measures neither the extent of this speed advantage nor the extent of the quality advantage obtained by preferring a fully-automated planner: because it works on the ranks of the paired differences, it detects consistent superiority rather than its magnitude. These trade-offs need closer analysis, but it is notable that the hand-coded planners did not obtain a uniform advantage, at least on the smaller problems that formed the common foundation for testing. Of course, hand-coded control knowledge can be developed to prioritise different aspects of the generated solutions, and further development of control rules might support the construction of more heavily optimised plans.
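The point that the Wilcoxon test ignores the size of an advantage can be made concrete with a small sketch. It assumes one standard form of the matched-pairs test, the signed-rank test with a normal approximation for Z; whether this is exactly the procedure behind Figures 14 and 15 is an assumption. The helper `wilcoxon_signed_rank` is hypothetical, it discards zero differences, averages tied ranks without a tie correction to the variance, and the timings are invented for illustration, not competition data. A positive Z favours the first planner passed in, mirroring the convention that the first planner named in a cell is the one favoured.

```python
import math
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, int]:
    """Hypothetical helper: Wilcoxon matched-pairs signed-rank test.

    a[i] and b[i] are two planners' scores (lower is better, e.g. seconds)
    on the same problem i. Returns (z, two-sided p, n). A positive z favours
    planner a. Uses the normal approximation with no tie correction to the
    variance.
    """
    # Paired differences; positive when planner a scored lower (better).
    # Problems on which both planners scored identically are discarded.
    diffs: List[float] = [y - x for x, y in zip(a, b) if y != x]
    n = len(diffs)
    if n == 0:
        raise ValueError("no non-zero paired differences")

    # Rank the absolute differences (1-based); tied values share the average rank.
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    # W+ sums the ranks of differences that favour planner a.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the standard normal
    return z, p, n


# Hypothetical timings in seconds on 8 shared problems (NOT competition data).
fast = [1.0, 2.0, 0.5, 3.0, 1.5, 4.0, 2.5, 0.8]
slow = [1.2, 2.5, 0.9, 3.6, 2.4, 5.0, 3.7, 2.1]

z, p, n = wilcoxon_signed_rank(fast, slow)
print(f"z={z:.2f} p={p:.3f} n={n}")  # prints: z=2.52 p=0.012 n=8

# Make the slow planner drastically slower on the last problem: its difference
# keeps the largest rank, so z and p are unchanged.
z2, p2, _ = wilcoxon_signed_rank(fast, slow[:-1] + [200.0])
print(z2 == z, p2 == p)  # prints: True True
```

Because only ranks enter W+, a planner that wins by a fraction of a second on every problem scores the same Z as one that wins by orders of magnitude. The normal approximation is the usual choice for moderate or large samples; for very small samples, exact critical values from a Wilcoxon table would be preferable.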