The problem domains selected for the competitions have been, or have become, benchmark domains used by much of the community for empirical evaluation. The domains have often been chosen to probe some specific detail of performance. This has sometimes meant that they are not representative of general features of planning and are inappropriate for more widespread testing. A description of the domains used in all of the competitions so far can be found in Appendix A.
In the third competition, eight families of domains were used, broadly divided into transportation domains (Depots, DriverLog and ZenoTravel), applications-inspired domains (Rovers and Satellite) and a small collection of others (Settlers, Freecell and UM-Translog-2).
We briefly summarise the collection here and describe the individual domains in more detail in Appendix A.
We also reused the Freecell domain from the second competition. This domain presented a serious challenge to participants in 2000 and we were interested to see whether planning technology had surpassed this challenge in the intervening two years. Although the domain produced some interesting data we did not attempt to precisely measure the extent to which the 2002 performance surpassed that of 2000.
Each domain (other than Settlers, Freecell and UM-Translog-2) was presented to the competitors in at least the four levels previously identified: STRIPS, NUMERIC, SIMPLETIME and TIME. The problems presented at each of these levels comprised distinct tracks, and the competitors were able to choose the tracks in which they wished to compete. In addition to the four main tracks we included two further tracks, intended to explore particular ideas. These tracks did not require additional expressive power but simply allowed existing expressiveness to be combined to produce interesting planning challenges. For example, the HARDNUMERIC track consisted of problems from the Satellite domain that had very few logical goals. Plans were evaluated by a metric based on the amount of data recorded rather than by determining whether a specified logical goal had been achieved. The challenge was for planners to respond to the plan metric and include actions that would acquire data. The COMPLEX track consisted of problems that combined temporal and numeric features. The challenge was to reason about resource consumption in parallel with managing temporal constraints. In total, we developed 26 domains, with 20 problem instances in each domain (a few, unintentionally, ended up with 16 or 22 instances). In most domains there were an additional 20 instances of large problems intended for the hand-coded planners. In total there were nearly 1000 problem instances to be solved, of which about half were intended primarily for the fully-automated planners.
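The idea behind the HARDNUMERIC track, where a plan is judged by a metric rather than by its logical goals, can be illustrated with a minimal sketch. This is not the evaluation code used in the competition: the `Step` class, the `goals_met` and `data_metric` functions, the action names and the data amounts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One plan step (hypothetical, simplified representation)."""
    name: str
    adds: frozenset = frozenset()   # logical facts made true by the step
    data_recorded: float = 0.0      # data acquired by the step, if any


def goals_met(plan, initial_facts, goals):
    """Check whether all logical goals hold after executing the plan."""
    facts = set(initial_facts)
    for step in plan:
        facts |= step.adds
    return goals <= facts


def data_metric(plan):
    """Plan quality: total amount of data recorded (higher is better)."""
    return sum(step.data_recorded for step in plan)


# With no logical goals, the empty plan is already valid.
goals = set()
empty_plan = []

# A plan that chooses to acquire data, although no goal demands it.
busy_plan = [
    Step("turn_to satellite0 star1"),
    Step("take_image satellite0 star1", data_recorded=120.0),
    Step("take_image satellite0 star2", data_recorded=80.0),
]

for label, plan in (("empty", empty_plan), ("busy", busy_plan)):
    print(label, goals_met(plan, set(), goals), data_metric(plan))
```

Both plans are valid with respect to the (empty) goal set, so only the metric distinguishes them. A planner that ignores the metric has no reason to prefer the second plan over the first.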