Journal of Artificial Intelligence Research
pp. 1-33. Submitted 8/01; published 7/02.
© 2002 AI Access Foundation
and Morgan Kaufmann Publishers.
All rights reserved.
A Critical Assessment of
Benchmark Comparison in Planning
Adele E. Howe HOWE@CS.COLOSTATE.EDU
Eric Dahlman DAHLMAN@CS.COLOSTATE.EDU
Computer Science Department, Colorado State University,
Fort Collins, CO 80523
Recent trends in planning research have led to empirical comparison
becoming commonplace. The field has started to settle into a
methodology for such comparisons, which for obvious practical
reasons requires running a subset of planners on a subset of
problems. In this paper, we characterize the methodology and
examine eight implicit assumptions about the problems, planners and
metrics used in many of these comparisons. The problem assumptions
are: PR1) the performance of a general-purpose planner should not be
penalized/biased if executed on a sampling of problems and domains,
PR2) minor syntactic differences in representation do not affect
performance, and PR3) problems should be solvable by STRIPS-capable
planners unless they require ADL. The planner assumptions are: PL1)
the latest version of a planner is the best one to use, PL2) default
parameter settings approximate good performance, and PL3) time
cut-offs do not unduly bias outcome. The metrics assumptions are:
M1) performance degrades similarly for each planner when run on
degraded runtime environments (e.g., machine platform) and M2) the
number of plan steps distinguishes performance. We find that most of
these assumptions are not supported empirically; in particular,
individual planners are affected by them in different ways. We conclude
with a call to the community to devote research resources to
improving the state of the practice and especially to enhancing the
available benchmark problems.
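To illustrate how a time cut-off alone can change a comparison (assumption PL3), the sketch below ranks two planners by the number of problems solved within a given cut-off. The planner names and runtimes are hypothetical, invented only for illustration and not taken from the paper's experiments. Using "problems solved" as the ranking metric is also an assumption of this sketch.

```python
# Hypothetical runtimes in seconds for eight problems; None means "never solved".
# These numbers are invented for illustration and are NOT results from the paper.
RUNTIMES = {
    "planner_A": [1, 2, 3, 4, 200, 300, None, None],
    "planner_B": [20, 25, 30, 35, 40, 45, 50, 60],
}


def solved_within(times, cutoff):
    """Count problems whose runtime is known and does not exceed the cut-off."""
    return sum(1 for t in times if t is not None and t <= cutoff)


def ranking(runtimes, cutoff):
    """Rank planners by problems solved (descending), breaking ties by name."""
    counts = {name: solved_within(times, cutoff) for name, times in runtimes.items()}
    order = sorted(counts, key=lambda name: (-counts[name], name))
    return order, counts


if __name__ == "__main__":
    # The same runtimes yield different "winners" depending on the cut-off chosen.
    for cutoff in (30, 60, 300):
        order, counts = ranking(RUNTIMES, cutoff)
        print(cutoff, order, counts)
```

With a 30-second cut-off, the hypothetical planner_A ranks first (4 problems solved against 3). With a 60-second cut-off, planner_B ranks first (8 against 4), even though neither planner's behavior has changed.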