We perform pairwise comparisons between planners to ascertain whether any consistent pattern can be identified in their relative speed and plan quality. We focus first on comparing the fully-automated planners and, separately, the hand-coded planners. We then perform an additional set of analyses to try to determine the raw performance benefit obtained from the use of hand-coded control knowledge. To do this, we perform the Wilcoxon test on pairs crossing the boundary between the fully-automated and hand-coded planner groupings. Where the conclusion is that the improvement obtained is significant, all we can say is that control rules yield an improvement in performance. We cannot account for the price, in terms of effort to encode the rules, that must be paid to obtain this improvement. The understanding of what is involved in writing useful control knowledge is still anecdotal and it remains an important challenge to the community to quantify this more precisely.
One important consequence of the use of hand-crafted control knowledge is that to speak of ``planner'' performance blurs the distinction between the planning system itself and the control rules that have been produced to support its performance in each domain. Where a planner performs well, it is impossible to separate the contributions of the planning system, the architecture of that system (and the extent to which this contributes to the ease of expressing good control rules) and the sophistication of the control rules that have been used. We do not attempt to distinguish planner from control rules in the analysis that follows, but at least one competitor observed that results would have been significantly worse had there been less time to prepare, while, given more time, results could have been improved by concentrating on optimisation of plan metrics rather than simply on makespans.
This observation helps to highlight the fact that, for planners exploiting hand-coded control knowledge, the competition format should be seen as a highly constrained basis for evaluation of performance.
To summarise, we now present the hypotheses we are exploring in this section:
Null Hypothesis: There is no basis for any pairwise distinction between the performances of planners in terms of either the time taken to plan or the quality (according to the specified problem metrics) of the plans produced.
Alternative Hypothesis: The planners can be partially ordered in terms of their time performances and, separately, their quality performances.
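
A single pairwise comparison of the kind used to test these hypotheses can be sketched as follows. This is a minimal illustration, not the analysis code used for the competition. It assumes the matched-pairs signed-rank form of the Wilcoxon test with a normal approximation and no tie correction to the variance. The function name `wilcoxon_signed_rank` and the sample timings are hypothetical.

```python
import math


def wilcoxon_signed_rank(xs, ys):
    """Matched-pairs Wilcoxon signed-rank test (normal approximation).

    xs and ys hold the measurements (e.g. planning times) of two planners
    on the same problems. Returns (w_plus, w_minus, z, p_two_sided).
    Pairs with zero difference are dropped; the variance is not
    corrected for ties (a simplifying assumption of this sketch).
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum the ranks of positive and negative differences separately.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis, w_plus is approximately normal.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, z, p_two_sided


if __name__ == "__main__":
    # Hypothetical planning times (seconds) for two planners on ten problems.
    times_a = [1.2, 3.4, 2.2, 8.0, 5.1, 0.9, 4.4, 7.3, 6.0, 2.8]
    times_b = [1.9, 4.0, 2.1, 9.5, 7.7, 1.0, 6.2, 9.9, 6.4, 3.1]
    w_plus, w_minus, z, p = wilcoxon_signed_rank(times_a, times_b)
    print(f"W+={w_plus}, W-={w_minus}, z={z:.3f}, p={p:.4f}")
```

A negative `z` with a small `p` would suggest that the first planner tends to take less time than the second on the shared problems; the same procedure applies to plan quality by substituting the metric values. Repeating the test over all pairs within a grouping gives the raw material for the partial ordering named in the alternative hypothesis.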