#### Planner Assumption 1: Is the Latest Version the Best?

In this study, we compared the performance of multiple versions of four planners (labeled in this section W, X, Y and Z; higher version numbers denote later versions). We considered two criteria for improvement: the outcome of planning and the computation time on solved problems. The outcome of planning is one of: solved, failed or timed out. For each criterion, we analyzed the data statistically for superior performance by one of the versions. The outcome results for all the planners are summarized in Table 7. As the table shows, a new version rarely solves more problems. Only Z solved more of our test problems in later versions (from 240 in version 1 to 421 in version 4), although not at every step.

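Table 7: Version performance: outcome counts for each version and change in the number of problems solved relative to the preceding version.

| Planner | Version | Solved | Failed | Timeout | Δ Solved |
|---|---|---|---|---|---|
| W | 1 | 286 | 664 | 533 | |
| W | 2 | 255 | 1082 | 147 | -31 |
| X | 1 | 502 | 973 | 3 | |
| X | 2 | 441 | 940 | 103 | -61 |
| Y | 1 | 387 | 750 | 339 | |
| Y | 2 | 382 | 771 | 329 | -5 |
| Z | 1 | 240 | 1043 | 201 | |
| Z | 2 | 276 | 959 | 248 | +36 |
| Z | 3 | 268 | 963 | 252 | -8 |
| Z | 4 | 421 | 878 | 184 | +153 |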

To check whether the differences in outcome are significant, we ran 2×3 χ² tests with planner version as the independent variable and outcome as the dependent variable. Table 8 summarizes the results. For Z, we compared each version only to its immediate successor. The differences are significant except for Y and for the transition from Z 2 to Z 3; the latter was expected because those two versions of Z were extremely similar.

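Table 8: χ² results comparing versions of the same planner.

| Planner | Old Version | New Version | χ² | p |
|---|---|---|---|---|
| W | 1 | 2 | 320.96 | < .0001 |
| X | 1 | 2 | 98.84 | < .0001 |
| Y | 1 | 2 | .46 | .79 |
| Z | 1 | 2 | 10.96 | .004 |
| Z | 2 | 3 | .158 | .924 |
| Z | 3 | 4 | 48.50 | < .0001 |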
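The χ² statistics in Table 8 can be recomputed from the outcome counts in Table 7. The sketch below assumes the 2×3 test is Pearson's χ² test of independence; that is our reading, and it reproduces, for example, the W, Y and Z 2→3 values. With two versions and three outcomes the test has (2−1)(3−1) = 2 degrees of freedom, for which the p-value has the closed form exp(−χ²/2), so no statistics library is needed. The names `TABLE_7` and `chi_square_2x3` are illustrative and not from the study.

```python
import math

# Outcome counts (solved, failed, timeout) copied from Table 7.
TABLE_7 = {
    ("W", 1): (286, 664, 533),
    ("W", 2): (255, 1082, 147),
    ("X", 1): (502, 973, 3),
    ("X", 2): (441, 940, 103),
    ("Y", 1): (387, 750, 339),
    ("Y", 2): (382, 771, 329),
    ("Z", 1): (240, 1043, 201),
    ("Z", 2): (276, 959, 248),
    ("Z", 3): (268, 963, 252),
    ("Z", 4): (421, 878, 184),
}


def chi_square_2x3(old, new):
    """Pearson chi-square statistic and p-value for a 2x3 contingency table.

    Rows are the two planner versions; columns are the outcome categories.
    """
    rows = (old, new)
    row_totals = [sum(row) for row in rows]
    col_totals = [a + b for a, b in zip(old, new)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            # Expected count under independence of version and outcome.
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed - expected) ** 2 / expected
    # (2 - 1) * (3 - 1) = 2 degrees of freedom; for df = 2 the chi-square
    # survival function is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Each version compared with its immediate successor, as in Table 8.
pairs = [("W", 1, 2), ("X", 1, 2), ("Y", 1, 2), ("Z", 1, 2), ("Z", 2, 3), ("Z", 3, 4)]
for planner, old_v, new_v in pairs:
    chi2, p = chi_square_2x3(TABLE_7[(planner, old_v)], TABLE_7[(planner, new_v)])
    print(f"{planner} {old_v}->{new_v}: chi2 = {chi2:.2f}, p = {p:.4g}")
```

The closed-form p-value is exact only because a 2×3 table has two degrees of freedom; for other table shapes, a library routine such as SciPy's `scipy.stats.chi2_contingency` would be the usual choice.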

The second criterion we evaluated was speed of solution. For this analysis, we limited the comparison to those problems that were solved by both versions of the planner. We then classified each problem by whether the later version solved it faster than, slower than, or in the same time as the preceding version. The results in Table 9 show that, for every planner, the later version was faster on more problems than it was slower, with the exception of Z's transition from version 1 to version 2 (84 problems faster versus 121 slower). However, Z 2 did solve more problems than Z 1 (276 versus 240).

Table 9: Changes in execution speed across versions, over the problems solved by both versions. Faster counts the problems the new version solved faster; Slower counts those it took longer to solve; Same counts those it solved in the same time; Total is the number of problems solved by both versions.
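| Planner | Old | New | Faster | Slower | Same | Total |
|---|---|---|---|---|---|---|
| W | 1 | 2 | 161 | 61 | 30 | 252 |
| X | 1 | 2 | 295 | 126 | 0 | 421 |
| Y | 1 | 2 | 222 | 82 | 53 | 357 |
| Z | 1 | 2 | 84 | 121 | 30 | 235 |
| Z | 2 | 3 | 131 | 84 | 53 | 268 |
| Z | 3 | 4 | 115 | 92 | 21 | 228 |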
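The faster/slower/same classification behind Table 9 can be sketched as follows, assuming each version's results are available as a mapping from problem name to computation time, with unsolved problems (failed or timed out) omitted. The function name `compare_speed`, the `tolerance` parameter for deciding when two times count as the same, and the example timings are all hypothetical; the study does not state how ties were determined.

```python
def compare_speed(old_times, new_times, tolerance=0.0):
    """Count problems that the new version solved faster, slower, or in the
    same time as the old version.

    old_times and new_times map a problem name to the computation time the
    version needed to solve it; problems a version failed on or timed out on
    are simply absent. `tolerance` (hypothetical) is the largest time
    difference still treated as "same".
    """
    # Only problems solved by both versions are compared.
    common = old_times.keys() & new_times.keys()
    counts = {"faster": 0, "slower": 0, "same": 0, "total": len(common)}
    for problem in common:
        delta = new_times[problem] - old_times[problem]
        if abs(delta) <= tolerance:
            counts["same"] += 1
        elif delta < 0:  # new version needed less time
            counts["faster"] += 1
        else:
            counts["slower"] += 1
    return counts


# Hypothetical timings (seconds); p4 and p5 are each solved by only one version.
old = {"p1": 2.0, "p2": 5.0, "p3": 1.0, "p4": 3.0}
new = {"p1": 1.5, "p2": 6.0, "p3": 1.0, "p5": 0.5}
print(compare_speed(old, new))  # {'faster': 1, 'slower': 1, 'same': 1, 'total': 3}
```

Restricting to the intersection of solved problems mirrors the analysis described above: a version's speed is only compared on problems where both versions produced a solution.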

