In this subsection we evaluate how SOLEP macros improve performance in the competition system. We compare the planner with implementation enhancements alone against the planner with both implementation enhancements and SOLEP macros.
For each of the seven test domains, we show the number of expanded nodes and the total CPU time, again on a logarithmic scale. A CPU time chart does not distinguish between a problem solved very quickly (in time close to 0) and a problem that could not be solved. To determine which is the case, check the corresponding node chart, where the absence of a data point always indicates no solution.
Figure 17 summarizes the results in Satellite, Promela Optical Telegraph, and Promela Dining Philosophers. In Satellite and Promela Optical Telegraph, macros greatly improve performance across the whole problem sets, allowing MACROFF to win these domain formulations in the competition. In Promela Optical Telegraph, macros led to solving 12 additional problems. The savings in Promela Dining Philosophers are limited, resulting in one more problem solved.
Figure 18 shows the results in the ADL version of Airport. The savings in terms of expanded nodes are significant, but they have little effect on the total running time. In this domain, the preprocessing costs dominate the total running time.
The complexity of preprocessing in Airport also limits the number of solved problems to 21. The planner solves more problems in the STRIPS version of Airport, but no macros could be generated for this domain version: STRIPS Airport contains a separate domain definition for each problem instance, whereas our learning method requires several training problems per domain definition.
Figure 19 contains the results in Pipesworld NonTemporal NoTankage, Pipesworld NonTemporal Tankage, and PSR. In Pipesworld NonTemporal NoTankage, macros often lead to a significant speedup. As a result, the system solves four new problems. On the other hand, the system with macros fails on three previously solved problems. The contribution of macros is less significant in Pipesworld NonTemporal Tankage, where the system with macros solves two new problems and fails on one previously solved instance. Of all seven benchmarks, PSR is the domain where macros have the smallest impact: both systems solve 29 problems using similar amounts of resources. In the official competition run, MACROFF solved 32 problems in this domain formulation.

Table 3 shows the number of training problems, the total training time, and the selected macros in each domain. The training phase uses 10 problems for each of Airport, Satellite, Pipesworld NonTemporal NoTankage, and PSR. We reduced the training set to 5 problems for Promela Optical Telegraph, 6 problems for Promela Dining Philosophers, and 5 problems for Pipesworld NonTemporal Tankage. In Promela Optical Telegraph, the planner with no macros solves 13 problems, and using most of them for training would leave little room for evaluating the learned macros. The situation is similar in Promela Dining Philosophers, where the planner with no macros solves 12 problems. In Pipesworld NonTemporal Tankage, the smaller number of training problems is caused by both the long training time and the structure of the competition problem set. The first 10 problems use only a subset of the domain operators, so we did not include them in the training set. Of the remaining problems, the planner with no macros solves 11 instances. The long training times in Pipesworld NonTemporal Tankage and PSR are caused by the increased difficulty of the training problems.