To conduct our comparison, we implemented a set of flaw selection strategies in UCPOP v.4. The table below lists the strategies we implemented. Except for LCFR-DSep and DUnf-Gen, which are discussed later, all of the implemented strategies were described in Section .
Strategy | Flaw selection strategy
UCPOP | {n,s}LIFO / {o}LIFO
UCPOP-LC | {n,s}LIFO / {o}LC
DSep | {n}LIFO / {o}LIFO / {s}LIFO
DSep-LC | {n}LIFO / {o}LC / {s}LIFO
DUnf | {n,s}_{0}LIFO / {n,s}_{1}LIFO / {o}LIFO / {n,s}_{2-infinity}LIFO
DUnf-LC | {n,s}_{0}LIFO / {n,s}_{1}LIFO / {o}LC / {n,s}_{2-infinity}LIFO
DUnf-Gen | {n,s,o}_{0}LIFO / {n,s,o}_{1}LIFO / {n,s,o}LC
LCFR | {n,o,s}LC
LCFR-DSep | {n,o}LC / {s}LC
ZLIFO | {n}LIFO / {o}_{0}LIFO / {o}_{1}New / {o}_{2-infinity}LIFO / {s}LIFO
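To make the notation above concrete, the following is a minimal sketch (not the authors' implementation) of how a strategy written as an ordered sequence of {flaw-types}ordering groups selects the next flaw. The function names (`select_flaw`, `lifo`, `least_cost`) and the flaw representation are invented here for illustration; flaw classes 'n', 's', and 'o' are used as in the table.

```python
# Sketch only: a flaw selection strategy as an ordered list of
# (flaw-class set, ordering) groups, e.g. UCPOP-LC = {n,s}LIFO / {o}LC.
# All names here are hypothetical, not from the paper's code.

def lifo(flaws):
    """LIFO: pick the most recently added flaw (assumes creation order)."""
    return flaws[-1]

def least_cost(flaws, repair_count):
    """LC: pick the flaw with the fewest ways to repair it."""
    return min(flaws, key=repair_count)

def select_flaw(flaws, strategy, repair_count):
    """Walk the strategy's groups in order; select from the first group
    that matches any outstanding flaw.

    `flaws` is a list of (kind, flaw) pairs in creation order, where
    kind is one of the classes in the table ('n', 's', 'o').
    """
    for kinds, ordering in strategy:
        candidates = [f for f in flaws if f[0] in kinds]
        if candidates:
            if ordering == 'LIFO':
                return lifo(candidates)
            return least_cost(candidates, lambda f: repair_count(f[1]))
    return None

# UCPOP-LC from the table: {n,s}LIFO / {o}LC
UCPOP_LC = [({'n', 's'}, 'LIFO'), ({'o'}, 'LC')]
```

The subscripted groups in the table (e.g. {n,s}_{0}) would additionally filter candidates by repair count before applying the ordering; that refinement is omitted here for brevity.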
We tested all the strategies on three problem sets, also used in our earlier work [Joslin & Pollack, 1994] and in Gerevini and Schubert's (1996):
We ran each strategy on each problem twice. The first time, we imposed a node limit: 10,000 nodes for the basic problems and 100,000 nodes for the Trains and Tileworld problems. The second time, we imposed a time limit: 100 seconds for the basic problems and 1000 seconds for the Trains and Tileworld problems.
Gerevini and Schubert experimented with several different node selection strategies for the Trains and Tileworld domains, so to facilitate comparison we also used the same node selection strategies as they did. For the basic problems, we used S+OC.
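As a rough sketch of node selection (as distinct from flaw selection), the following assumes that S+OC ranks a partial plan by its number of steps plus its number of open conditions, with best-first search expanding the lowest-ranked plan next; the plan representation and function names are invented for illustration.

```python
import heapq

# Assumption: S+OC scores a partial plan by (number of steps) +
# (number of open conditions); lower scores are expanded first.
# The dict-based plan representation is hypothetical.

def s_plus_oc(plan):
    return plan['steps'] + plan['open_conditions']

def pop_best(frontier):
    """frontier: heap of (score, tiebreak, plan); returns the best plan."""
    return heapq.heappop(frontier)[-1]

frontier = []
plans = [{'steps': 3, 'open_conditions': 2},
         {'steps': 2, 'open_conditions': 2}]
for i, plan in enumerate(plans):
    heapq.heappush(frontier, (s_plus_oc(plan), i, plan))
best = pop_best(frontier)   # the plan with the lower S+OC score
```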
In reporting our results, we use not only raw counts of nodes generated and computation time in seconds, but also a measure of how badly a strategy performed on a given problem or set of problems. We call this measure %-overrun and compute it as follows. Let m be the minimum node count on a given problem over all the strategies we tested, and let c be the node count for a particular strategy S. Then
%-overrun(S) = [(c - m) / m] * 100
If a strategy hit the node limit, we set c to the relevant node limit (10,000 or 100,000) to compute its node-count %-overrun. Similarly, if a strategy hit the time limit, we used the relevant time limit (100 or 1000) to compute the computation-time %-overrun.
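The computation just described can be sketched as follows; the function names and the sample node counts are illustrative, not data from the paper.

```python
# Sketch of the %-overrun computation: runs that hit their cutoff are
# scored at the relevant limit before comparison.

def percent_overrun(count, minimum):
    """%-overrun = ((c - m) / m) * 100, with m the best count on the problem."""
    return (count - minimum) / minimum * 100.0

def capped(count, hit_limit, limit):
    """If a run hit its node (or time) limit, score it at the limit itself."""
    return limit if hit_limit else count

# Hypothetical node counts for three strategies on one basic problem;
# strategy C hit the 10,000-node limit.
counts = {'A': 500, 'B': 650, 'C': None}
scored = {s: capped(c, c is None, 10_000) for s, c in counts.items()}
m = min(scored.values())
overruns = {s: percent_overrun(c, m) for s, c in scored.items()}
# e.g. B's %-overrun is (650 - 500) / 500 * 100 = 30.0
```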
Online Appendix A provides the raw data--node counts and computation times--for all the experiments we conducted; it also includes the computed %-overruns.
In conducting experiments such as these, one must set either a node-limit or time-limit cutoff for each strategy/problem pair. There is always a danger that these cutoffs unfairly bias the data: strategies that fail under the limits might have succeeded had the limits been raised slightly. We have analyzed our data carefully to help rule out such a bias; details are given in Appendix A.