# Scaling Issues

Section 6 addressed the relative difficulty of problems without asking whether planners agree about the difficulty of specific problems. Its results allow us to conclude that there is no overall consensus about which of the competition domains and levels were hard, but they do not tell us which planners agreed or disagreed on particular domains and levels. To examine the relative scaling behaviour of planners we need to identify the extent of such agreement, because comparing how two planners scale requires a measure of difficulty that is meaningful to both planners in the comparison. The analysis described in this section therefore seeks statistical evidence of such agreement.
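Pairwise agreement between two planners can be made concrete with a rank correlation over their performance on the same instances. The sketch below is illustrative only: it uses Kendall's tau-a, which is one standard choice and not necessarily the measure used in this analysis. The timings are hypothetical, and the sketch assumes both planners solved every instance (handling unsolved instances is out of scope here).

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a between two equal-length sequences.

    A pair of items is concordant if both sequences order it the same way and
    discordant if they order it oppositely; tied pairs count as neither.
    Returns (concordant - discordant) / total number of pairs, in [-1, 1].
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical solution times (seconds) for two planners on the same five
# instances of one domain/level; both are assumed to solve every instance.
planner_a = [0.5, 1.2, 3.4, 2.0, 10.1]
planner_b = [0.7, 4.5, 5.0, 0.9, 8.3]
print(kendall_tau_a(planner_a, planner_b))  # 0.8
```

Here nine of the ten instance pairs are ordered the same way by both planners, so tau is (9 - 1) / 10 = 0.8.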

To evaluate scaling behaviour we first explore whether the competing planners agree on what makes a problem hard within a particular domain and level. Ensuring that a problem set consists of increasingly difficult problems might seem straightforward (for example, by generating instances of increasing size), but in practice it is not. Problem size and difficulty do not appear to be strongly correlated, whether size is measured as the number of objects, the number of relations or even the number of characters in a problem description. A coarse relationship can be observed, since very large instances take more time to parse and to ground, but small instances can sometimes present more difficult challenges than large ones. This indicates that factors other than size play an important part in determining whether planners can solve individual instances.
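The claim that size and difficulty are only weakly correlated can be checked with a rank correlation between a size measure and solution time. The sketch below computes Spearman's rho (Pearson correlation of average ranks, so ties are handled). The choice of Spearman's rho, the object counts and the timings are all hypothetical illustrations, not data or methods taken from the competition.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


# Hypothetical instances: number of objects and one planner's time (seconds).
sizes = [10, 20, 30, 40, 50]
times = [0.2, 5.0, 0.4, 0.3, 1.1]
print(round(spearman_rho(sizes, times), 3))  # 0.3
```

A value this far from 1 would mean that ordering these instances by size says little about the order in which they become hard for the planner.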

In summary, the hypotheses explored in this section are:

- **Null Hypothesis:** The planners differ in their judgements about which individual problem instances are hard within a given domain/level combination.
- **Alternative Hypothesis:** The planners demonstrate significant agreement about the relative difficulties of the problem instances within any given domain/level combination.

This is a within-domain/level analysis: we ask whether planners agree on the relative difficulty of the problem instances inside each domain/level combination, rather than across combinations as in Section 6.
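A standard statistic for agreement among several judges ranking the same items is Kendall's coefficient of concordance, W. Here the planners play the role of judges and the problem instances within one domain/level combination are the items. This section does not name the statistic, so the sketch below is an assumed illustration rather than the analysis actually performed. With m planners and n instances, let R_i be the sum of ranks given to instance i; then W = 12 S / (m^2 (n^3 - n)), where S is the sum of squared deviations of the R_i from their mean m(n + 1)/2. Under the null hypothesis of no agreement, m(n - 1)W is approximately chi-square distributed with n - 1 degrees of freedom. The timings are hypothetical, every planner is assumed to solve every instance, and the usual correction for tied ranks is omitted.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(time_matrix):
    """Kendall's W for time_matrix[p][i] = time of planner p on instance i.

    Lower time is treated as an easier instance. Returns (W, chi_square, df),
    with no correction for ties.
    """
    m = len(time_matrix)
    n = len(time_matrix[0]) if m else 0
    if m < 2 or n < 2 or any(len(row) != n for row in time_matrix):
        raise ValueError("need >= 2 planners with equal-length rows of >= 2 times")
    rank_rows = [average_ranks(row) for row in time_matrix]
    # Rank sum R_i for each instance, across all planners.
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return w, m * (n - 1) * w, n - 1


# Hypothetical times (seconds): three planners, four instances of one level.
times = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 3.0, 2.0, 9.0],
    [2.0, 1.0, 5.0, 6.0],
]
w, chi2, df = kendall_w(times)
print(round(w, 3), round(chi2, 3), df)  # 0.778 7.0 3
```

W ranges from 0 (no agreement) to 1 (identical rankings). The chi-square statistic can be compared against a table or a routine such as `scipy.stats.chi2.sf` to decide whether to reject the null hypothesis; with only a handful of instances the approximation is rough.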

Derek Long 2003-11-06