
AIPS Competitions: 1998 and 2000

For the first AIPS competition, Drew McDermott solicited problems from the competitors and also constructed some of his own, such as the mystery domain, which had semantically useless names for objects and operators. Problems were generated automatically for each domain. The competition included 155 problems from six domains: robot movement in a grid, gripper (in which balls had to be moved between rooms by a robot with two grippers), logistics of transporting packages, organizing snacks for movie watching, and two mystery domains, which were disguised logistics problems.

In the first round of the 1998 competition, entrants ran 140 problems, 52 of which could not be solved by any planner. In the second round, the planners ran 15 new problems in three domains, one of which had not appeared in the first round.

The 2000 competition attracted 15 competitors in three tracks: STRIPS, ADL and hand-tailored. It required performance on problems in five domains: logistics, Blocksworld, parts machining, Freecell (a card game), and Miconic-10 elevator control. These domains were chosen by the organizing committee, chaired by Fahiem Bacchus, and represented a somewhat broader range than the 1998 set. We chose problems from the Untyped STRIPS track for our set.

From a scientific standpoint, one of the most interesting conclusions of both competitions was the trade-offs observed in performance: planners appeared to excel on different problems, either solving more from a set or finding solutions faster. In 1998, IPP solved more problems and found shorter plans in round two; STAN solved its problems the fastest; HSP solved the most problems in round one; and blackbox solved its problems the fastest in round one. In 2000, awards were given to two groups of distinguished planners across the different categories (STRIPS, ADL and hand-tailored) because, according to the judges, ``it was impossible to say that any one planner was the best'' [Bacchus2000]; TalPlanner and FF were in the highest distinguished group. The performance graphs do show differences in computation time across planners and as problems scale up. However, every planner failed to solve some problems, which makes these trends harder to interpret (the computation time graphs have gaps).

The purpose of these competitions was to showcase planner technology, at which they succeeded admirably. The planners solved much harder problems than would have been possible in years past. Because planners continue to handle increasingly difficult problems, the competition test sets may become primarily of historical interest, useful for tracking the field's progress.

