The first event was held in conjunction with the fourth international Artificial Intelligence Planning and Scheduling conference (AIPS'98). It was organised by Drew McDermott, together with a committee of senior members of the research community [McDermott, 2000]. The event took some time to organise, with agreement evolving on the form of the event, the kinds of planning systems that should be compared, the basis of comparison and so on. The final event became a direct comparison of five STRIPS-based planners, with two of the planners also attempting an extended ADL-based language [McDermott, 2000; Long, 2000]. The systems included three Graphplan-based systems, one forward heuristic search system and one planning-as-satisfiability planner built on a SAT solver. A very important outcome of the first competition was the adoption of PDDL [McDermott & the AIPS'98 Planning Competition Committee, 1998] as a common representation language for planning.
Although competitors were offered the opportunity to hand-code control knowledge for their planners, in fact all of the planners were fully automated and ran on the problem instances without any priming. The entire event was staged at the conference over a period of some four days, involving intensive sessions of generating and checking solutions and attempting to evaluate the results. One idea that was tried, but that turned out to be problematic in practice, was to score the planners' performances using a function that attempted to take into account the time taken to generate a plan, the length of the plan and the relative performance of all of the competitors on the problems. For example, a planner that produced a plan faster than all of its competitors would be rewarded based on how much faster it was than the average for the problem. This attempt to score planners using a one-dimensional measure proved difficult, with counter-intuitive results in certain cases. In the end it was abandoned in favour of two separate dimensions: the length of the plan and the time taken to produce it. This decision indicates that, even for only five systems and a relatively small set of problems, it is impossible to make unequivocal decisions about which system is best. Nevertheless, the community can (and did) learn much from the data that were gathered, offering a variety of interpretations of the data, but ultimately being inspired to improve on these results in every way possible.
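The difficulty of ranking planners on two dimensions can be illustrated with a small sketch. The planner names and figures below are hypothetical, invented purely for illustration, and are not competition data; the dominance test is a standard Pareto comparison, not the scoring function used at AIPS'98. A planner is dominated only if another planner is at least as good on both plan length and time, and strictly better on one of them.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One planner's result on a single problem (hypothetical data)."""
    planner: str
    plan_length: int   # number of steps in the plan
    seconds: float     # time taken to produce the plan


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both measures and better on one."""
    no_worse = a.plan_length <= b.plan_length and a.seconds <= b.seconds
    better = a.plan_length < b.plan_length or a.seconds < b.seconds
    return no_worse and better


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the results that no other result dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Hypothetical figures, assumed only for this illustration.
results = [
    Result("planner_a", plan_length=30, seconds=12.0),  # shortest plan
    Result("planner_b", plan_length=42, seconds=0.8),   # fastest
    Result("planner_c", plan_length=45, seconds=5.0),   # worse than b on both
]

for r in pareto_front(results):
    print(r.planner, r.plan_length, r.seconds)
```

In this example planner_c is dominated by planner_b, but planner_a and planner_b are incomparable: one produces the shorter plan and the other is faster. Any single-valued score must impose a trade-off between the two measures in order to separate them.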
In the second competition, chaired by Fahiem Bacchus, 17 planners competed. The increase in participation and the ambitions for larger scale testing required that the event be spread over a much longer period. In fact, testing was spread over a couple of months, with only one final test being carried out at the conference site (AIPS'00 in Breckenridge). In the second competition there was a more formal split between systems, with a small number using hand-coded control knowledge and the others being fully automated. There was also a split between STRIPS-capable and ADL-capable systems. The larger number of competitors included a wider range of approaches: as well as Graphplan-based systems, forward heuristic search and a SAT-solver-based planner, there were several planners based on model-checking approaches using BDDs, and one using planning-by-rewriting. Again, it proved difficult to compare planners unequivocally, but several important observations could be made. The advantages of hand-coded control rules could be seen clearly in most domains (as would be expected), although there remained an important question about the difficulty of generating and writing the rules. Of the fully automated planners, the forward heuristic search approach proved to be particularly successful, dominating performance in most domains. Pure Graphplan-based planning seemed to have reached its zenith between the first two competitions and no longer appeared competitive.
The third competition (the most recent at the time of writing) was held in association with AIPS'02 at Toulouse. Fourteen planners participated. The primary objective of the competition was to help to push forward research into temporal and resource-intensive planning. Extensions were made to PDDL to support the modelling of temporal and numeric domain features. These resulted in the PDDL2.1 language [Fox & Long, 2003]. The extensive changes introduced in PDDL2.1 and the ambitious objectives of the competition help to account for the fact that fewer planners participated in 2002 than in 2000. Once again, the real testing and gathering of data took place over the two months prior to the conference. Although initial results were presented at the conference, no detailed analysis took place at the conference itself. The rest of this paper examines the objectives of the third competition, the results and some future challenges for the series.