The tables in Figures 16 to 20 are organised as follows. Tables in Figures 16, 18 and 20 contain the speed results found for the fully-automated, hand-coded and mixed pairs respectively. Tables in Figures 17, 19 and 20 contain the quality results for the same three groups. Each table has a row for each of the five competition levels (although empty rows have been omitted), and the columns represent the pairs of planners being compared. Each cell reports five pieces of data: the mean normalised performance for each of the two planners, the t-value computed, the degrees of freedom used (which are derived from the number of double hits at that level) and the resulting p-value. A positive t-value means that the magnitude difference is in favour of the planner identified second in the cell; a negative t-value is in favour of the planner identified first. Where the resulting t-value indicates a difference in magnitude that is not significant at the p=0.05 level we use a bold font.

In both speed and quality tests, an average performance smaller than 1 is favourable for a planner. Each value is the average, over the test set, of the planner's performance expressed as a proportion of the mean performance of the pair, so the two values reported in a cell sum to 2. Thus, an average performance of 0.66 for a planner (which will compare with an average performance of 1.34 for the other planner in the pair being considered) means that the first planner is, on average, roughly twice as fast as the second.
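The contents of a single table cell can be illustrated with a minimal sketch. It assumes that normalisation is applied per problem (each measurement divided by the mean of the pair on that problem) and that the t-value comes from a paired t-test on the normalised values, with differences taken as first minus second. The function name `compare_pair` and the sample data are hypothetical and are not taken from the competition results. The p-value is omitted because it requires the t-distribution, which the Python standard library does not provide.

```python
import math
import statistics


def compare_pair(first, second):
    """Compare two planners on the problems both solved (the double hits).

    `first` and `second` are equal-length lists of positive raw
    measurements (for example, run times) where smaller is better.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")

    # Assumption: normalise each measurement by the mean of the pair on
    # that problem, so the two normalised values always sum to 2.
    norm_first, norm_second = [], []
    for a, b in zip(first, second):
        pair_mean = (a + b) / 2.0
        norm_first.append(a / pair_mean)
        norm_second.append(b / pair_mean)

    # Paired differences, first minus second. A positive mean difference
    # means the first planner has the larger (worse) values, which is
    # in favour of the planner identified second.
    diffs = [x - y for x, y in zip(norm_first, norm_second)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_value = statistics.mean(diffs) / (sd / math.sqrt(n))

    return {
        "mean_first": statistics.mean(norm_first),
        "mean_second": statistics.mean(norm_second),
        "t": t_value,
        "df": n - 1,  # degrees of freedom from the number of double hits
    }


# Hypothetical run times (seconds) on four problems solved by both planners.
result = compare_pair([1.0, 2.0, 3.0, 2.0], [2.0, 4.0, 6.0, 5.0])
print(result)
```

With this made-up data the first planner is about twice as fast on every problem, so its mean normalised performance is below 1 and the t-value is negative, which under the convention above is in favour of the planner identified first.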

To summarise, the hypotheses being explored in this section are:

Null Hypothesis: The domains used in the competition were equally challenging to all planners at all levels.

Alternative Hypothesis: Domain/level combinations can be distinguished in terms of relative difficulty posed to different planners.

For ease of comparison with the results presented in Sections 7 and 8 we observe that, in this section, we are specifically concerned with a cross-domain analysis and with whether the planners agreed on which of the domain/level combinations were hard.