Test Problems

Following standard practice, our experiments require planners to solve commonly available benchmark problems and the AIPS competition problems. In addition, to test our assumptions about the influence of domains (assumption PR1) and of problem representations (assumption PR2), we also included permuted benchmark problems and a set of application problems. This section describes the problems and domains in our study, focusing on their sources and composition.
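The permutation procedure is not spelled out here. As a minimal sketch, assume a problem is reduced to ordered tuples of initial-state and goal facts. The `Problem` class and the `permute_problem` function are hypothetical. Under that assumption, a permutation that changes a problem's representation but not its meaning might look like this:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    # Hypothetical minimal representation: facts are kept as ordered tuples
    # because their order is exactly what the permutation changes.
    init: tuple
    goal: tuple


def permute_problem(problem: Problem, seed: int) -> Problem:
    """Return a semantically equivalent problem whose initial-state and goal
    facts are listed in a different, seed-determined order."""
    rng = random.Random(seed)  # seeded so each permutation is reproducible
    init = list(problem.init)
    goal = list(problem.goal)
    rng.shuffle(init)
    rng.shuffle(goal)
    return Problem(tuple(init), tuple(goal))


# Usage: a small Blocksworld-style problem and one permuted variant.
p = Problem(
    init=(("on", "a", "b"), ("clear", "a"), ("ontable", "b"), ("handempty",)),
    goal=(("on", "b", "a"), ("ontable", "a")),
)
q = permute_problem(p, seed=1)
print(q)
```

Only the order of facts changes, so the set of solutions is identical. Any difference in a planner's behavior on `p` versus `q` would reflect sensitivity to representation, the kind of effect assumption PR2 concerns.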

The problems require only STRIPS capabilities (i.e., add and delete lists). We chose this least common denominator for two reasons. First, planners with richer capabilities can still handle STRIPS problems, so this choice maximized the number of planners that could be included in our experiment; also, not surprisingly, more problems of this type are available. Second, we are examining assumptions of evaluation, including the effect of required capabilities on performance. We do not intend to duplicate the competitions' effort of singling out planners for distinction; rather, our purpose is to determine which factors differentially affect planners.
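To make "add and delete lists" concrete, the following is a minimal sketch of standard STRIPS action semantics. An action is applicable when its preconditions hold in the current state. The successor state is the current state minus the delete list, plus the add list. The `Action` type, the `applicable` and `apply_action` helpers, and the Blocksworld-style facts are illustrative assumptions and are not taken from any planner in the study.

```python
from typing import FrozenSet, NamedTuple, Tuple

Fact = Tuple[str, ...]      # a ground atom, e.g. ("on", "a", "b")
State = FrozenSet[Fact]     # a state is the set of atoms that are true


class Action(NamedTuple):
    name: str
    pre: FrozenSet[Fact]     # preconditions that must hold
    add: FrozenSet[Fact]     # atoms made true
    delete: FrozenSet[Fact]  # atoms made false


def applicable(state: State, action: Action) -> bool:
    # STRIPS preconditions are a conjunction of positive atoms.
    return action.pre <= state


def apply_action(state: State, action: Action) -> State:
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable")
    # Remove the delete list first, then add the add list.
    return (state - action.delete) | action.add


# Usage: unstack block a from block b.
state = frozenset({("on", "a", "b"), ("clear", "a"), ("ontable", "b"), ("handempty",)})
unstack = Action(
    name="unstack-a-b",
    pre=frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",)}),
    add=frozenset({("holding", "a"), ("clear", "b")}),
    delete=frozenset({("on", "a", "b"), ("clear", "a"), ("handempty",)}),
)
after = apply_action(state, unstack)
print(sorted(after))
```

Nothing beyond this (no conditional effects, typing, or numeric fluents) is needed to solve the problems in the test set. This is why any planner that supports richer features can also run them.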

The bulk of the problems came from the AIPS 1998 and AIPS 2000 problem sets and the set of problems distributed with the PDDL specification. The remaining problems were solicited from several sources. The sources of the problems and the numbers of domains and problems from each are summarized in Table 2.

Table 2: Summary of the problems in our test set: the source of the problems, the number of domains, and the number of problems within those domains.
Source         # of Domains   # of Problems
Benchmarks               50             293
AIPS 1998                 6             202
AIPS 2000                 5             892
Developers                1              13
Application               3              72
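As a quick arithmetic check on Table 2, the per-source counts can be totaled and each source's share of the problem set computed. The dictionary below simply copies the table. The domain total is a plain sum, so it may count a domain more than once if that domain appears under several sources.

```python
# Per-source counts copied from Table 2: (number of domains, number of problems).
TABLE_2 = {
    "Benchmarks": (50, 293),
    "AIPS 1998": (6, 202),
    "AIPS 2000": (5, 892),
    "Developers": (1, 13),
    "Application": (3, 72),
}

total_domains = sum(domains for domains, _ in TABLE_2.values())
total_problems = sum(problems for _, problems in TABLE_2.values())

# Fraction of all problems contributed by each source.
for source, (_, problems) in TABLE_2.items():
    print(f"{source:12s} {problems / total_problems:6.1%}")

print(total_domains, total_problems)
```

The sums come to 1,472 problems and 65 domain entries. The AIPS 2000 set alone supplies roughly 60% of the problems.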

