Interpretation

The results allow us to reject the null hypothesis in some cases, but not in others. We are able to determine significant differences in the relative hardness of domains as determined by specific planners, but there is also evidence of lack of consistency between the judgements of different planners. For example, there are some domain/level combinations that are found hard by certain planners and not by others.

The tables in Figures 22 and 23 allow us to determine which domains presented the most interesting challenges to the planners participating in the competition. Although it is difficult to draw firm conclusions from data that is really only indicative, some interesting patterns do emerge. For example, the level-specific data in Figure 22 shows that none of the fully-automated planners found ZenoTravel problems, at any level, to be significantly hard by comparison with problems drawn from other domains at the same level. The Satellite STRIPS problems were significantly easy, by comparison with other STRIPS problems, for the majority of the participating planners, and not hard for any of them. On the other hand, the Satellite NUMERIC problems were found to be challenging relative to other NUMERIC problems. Figure 23 shows that the hand-coded planners found ZenoTravel problems easy at all levels, by comparison with problems at similar levels, and this remains true for the large problem instances. Depots problems were also easy for the hand-coded planners.

When we consider the level-independent picture in the right-hand halves of Figures 22 and 23, we can observe that ZenoTravel emerges as significantly easy for the fully-automated planners, across all levels, by comparison with other problems irrespective of level. This pattern is broken by only one fully-automated planner (LPG), which found these problems hard at the TIME problem level. The Satellite domain is similarly easy for the fully-automated planners, at all levels except NUMERIC. It can be noted that the number of planners finding the STRIPS problems easy in the level-independent comparisons is surprisingly high. The interpretation is that the problems in the population as a whole are much harder, so that the performance on STRIPS problems is pushed to the extremes of the performance on all problems. The hand-coded planners found the Depots and ZenoTravel problems to be uniformly easy at all levels.

Considering both the fully-automated and the hand-coded planners, the DriverLog, Rovers and Satellite domains present the most varied picture, suggesting that the problems in these domains presented the greatest challenges overall. All of the hand-coded planners found the SIMPLETIME Rovers problems significantly hard relative to other SIMPLETIME problems, but only one found these problems to be amongst the hardest that it had to solve overall. Interestingly, the perceived difficulty of the small Rovers problems does not persist into the large problems.

An interesting comparison can be made between the results of the analysis for STRIPS domains and the work of Hoffmann (hofftop), which analyses the topologies of STRIPS and ADL versions of the common planning benchmark domains. Hoffmann examines the behaviour of the $h^+$ function, which measures relaxed distances between states in the state spaces for these problems, in order to determine whether the function offers a reliable guide for navigating the state space in search of plans. According to Hoffmann's analysis, the STRIPS versions of Depots, DriverLog and Rovers have local minima in the $h^+$ function and can have arbitrarily wide plateaus (sequences of states with equal values under $h^+$). These features can make problem instances in these domains hard for planners relying on $h^+$ (or approximations of it) to guide their search. This includes most of the fully-automated planners in the competition. Interestingly, however, several of the fully-automated planners found one or more of these three domains to be easy at the STRIPS level (although in a few cases they were found to be hard). As Hoffmann points out, the potential hardness of a domain does not mean that all collections of problem instances from that domain are hard. Our observations seem to suggest that the competition collections posed instances that tended towards the easy end of the spectrum. This was unintentional and demonstrates that it can be difficult to obtain a good spread of challenges, particularly when generating problems automatically. The Satellite and ZenoTravel domains have, in contrast, constant-bounded plateaus, and therefore the $h^+$ function is a reliable guide in navigating the state space for these domains. Interestingly, in our analysis all fully-automated planners found these domains either easy or neither easy nor hard at the STRIPS level.
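
To make the notions of local minima and plateaus concrete, the following minimal Python sketch inspects a tiny, hand-made state space. Everything in it is an assumption made for illustration: the state names, the successor graph, the heuristic values and the helper functions `local_minima` and `plateaus` are hypothetical, the values are not computed relaxed distances, and the definitions are deliberately simplified relative to those used in Hoffmann's analysis. Here a local minimum is a non-goal state all of whose successors have strictly larger heuristic values, and a plateau is a maximal group of states linked by transitions between states of equal heuristic value.

```python
from collections import deque


def local_minima(successors, h):
    """Return states with h > 0 whose successors all have strictly larger h.

    Simplified definition, assumed for this sketch only.
    """
    return sorted(
        s for s, succ in successors.items()
        if h[s] > 0 and succ and all(h[t] > h[s] for t in succ)
    )


def plateaus(successors, h):
    """Return maximal groups (size >= 2) of states joined by equal-h transitions."""
    # Link two states whenever a transition connects states of equal h value.
    # Links are treated as undirected purely to group the states.
    linked = {}
    for s, succ in successors.items():
        linked.setdefault(s, set())
        for t in succ:
            linked.setdefault(t, set())
            if h[s] == h[t]:
                linked[s].add(t)
                linked[t].add(s)

    seen, groups = set(), []
    for start in sorted(linked):
        if start in seen or not linked[start]:
            continue
        # Breadth-first search collects one connected group of equal-h states.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            s = queue.popleft()
            group.append(s)
            for t in sorted(linked[s]):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        groups.append(sorted(group))
    return groups


# Hypothetical toy state space: successor lists and illustrative heuristic values.
successors = {
    "s0": ["s1", "s2"],
    "s1": ["s3"],
    "s2": ["s4"],
    "s3": ["s0"],
    "s4": ["s5"],
    "s5": ["goal"],
    "goal": [],
}
h = {"s0": 3, "s1": 2, "s2": 3, "s3": 4, "s4": 3, "s5": 1, "goal": 0}

print(local_minima(successors, h))
print(plateaus(successors, h))
```

In this toy example, a search that greedily follows decreasing heuristic values from `s0` is drawn into `s1`, from which every move increases the value, while the route that actually reaches the goal runs across the states `s0`, `s2` and `s4`, where the heuristic gives no gradient. The wider such a plateau can grow, the less guidance the heuristic provides, which is the sense in which arbitrarily wide plateaus can make instances hard.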

Derek Long 2003-11-06