VHPOP allows for several flaw selection strategies to be used simultaneously in a round-robin scheme. This lets us exploit the strengths of different flaw selection strategies concurrently, which was essential for the success of VHPOP at IPC3 since we have yet to find a single superior flaw selection strategy that dominates all other flaw selection strategies in terms of the number of solved problems within a given time frame. The technique we use in VHPOP for supporting multiple flaw selection strategies is in essence the same as the technique proposed by Howe et al. (1999) for exploiting performance benefits of several planners at once in a meta-planner. Although the meta-planner is slower than the fastest planner on any single problem, it can solve more problems than any single planner.

We used four different flaw selection strategies at IPC3 (Table 6), preferring local flaw selection strategies as they tend to incur a lower overhead than global strategies such as LCFR and MW and often appear more effective than global strategies because of a maintained focused on subgoal achievement. The four strategies were selected after some initial experimentation with problems from a few of the competition domains.

Table 6: Flaw selection strategies used by VHPOP at IPC3.
Name Specification
MW-Loc {n, s}LR  /  {l}MWadd
MW-Loc-Conf {n, s}LR  /  {u}MWadd  /  {l}MWadd
LCFR-Loc {n, s, l}LR
LCFR-Loc-Conf {n, s, u}LR  /  {l}LR

The use of multiple flaw selection strategies can be thought of as running multiple concurrent instances of the planner, as a separate search queue is maintained for each flaw selection strategy that is used. Similar to HSP2.0 at the planning competition in 2000 (Bonet & Geffner, 2001a), we use a fixed control strategy to schedule these multiple instances of our planner. The first time a flaw selection strategy is used, it is allowed to generate up to 1000 search nodes. The second time the same flaw selection strategy is used, it can generate another 1000 search nodes, making it a total of 2000 search nodes. At each subsequent round i, each flaw selection strategy is permitted to generate up to 1000 . 2i-2 additional nodes. The maximum number of nodes generated using a specific flaw selection strategy is 1000 . 2i-1 after i rounds. An optional upper limit on the number of generated search nodes can be set for each flaw selection strategy. This is useful for flaw selection strategies that typically solve problems quickly, when they solve them at all within reasonable time. Table 7 shows the search limits used by VHPOP at IPC3. These limits were determined after some initial trials on the competition problems. Note that there was no set search limit for the last flaw selection strategy. Whenever the other three strategies all reached their search limits without having found a solution, LCFR-Loc-Conf was used until physical resource limits were reached.

Table 7: Execution order of flaw selection strategies used at IPC3, and also search limits used with each strategy on domains with and without durative actions.
Name Order STRIPS Limit Durative Limit
MW-Loc 1 10000 12000
MW-Loc-Conf 2 100000 100000
LCFR-Loc 3 200000 240000
LCFR-Loc-Conf 4 $ \infty$ $ \infty$

Table 8 shows the number of plans generated in the STRIPS Satellite domain before a solution is found for the four flaw selection strategies used at IPC3, and also the number of generated plans when combining the four strategies using the schedule in Table 7. To better understand how the round-robin scheduling works, we take a closer look at the numbers for problem 15. Table 9 shows how the total number of generated plans is divided between rounds and flaw selection strategies. Note that although MW-Loc is actually the best strategy for this problem, it is stopped already in round 5. The total number of generated plans does not exactly match the actual number of generated plans reported in Table 8. This is because we only consider suspending the use of a flaw selection strategy after all refinements of the last selected plan have been added, so the limit in a round can be exceeded slightly in practice. The numbers in Table 9 represent an idealized situation where flaw selection strategies are switched when the number of generated plans exactly matches the limit for the current round.

Table: Number of generated plans in the STRIPS Satellite domain for the four different flaw selection strategies used by VHPOP at IPC3. The rightmost column is the number of plans generated by VHPOP before finding a solution when using the schedule in Table 7. A dagger ($ \dagger$) means that the planner ran out of memory (800Mb) after generating at least the indicated number of plans.
Problem MW-Loc MW-Loc-Conf LCFR-Loc LCFR-Loc-Conf All
1 118 118 118 118 118
2 229 229 249 249 229
3 172 172 172 172 172
4 738 843 822 1797 738
5 448 723000$ \dagger$ 1018 706000$ \dagger$ 448
6 636000$ \dagger$ 629000$ \dagger$ 720 834 2727
7 571 745 620 688000$ \dagger$ 571
8 482000$ \dagger$ 874 1017 783 1874
10 1245 1178 1323 1275 4283
11 1172 1172 1172 1172 4172
12 3517 3733 525000$ \dagger$ 525000$ \dagger$ 9542
13 6241 382000$ \dagger$ 559000$ \dagger$ 544000$ \dagger$ 18265
14 2352 2352 2157 2157 8365
15 74738 444000$ \dagger$ 107375 465000$ \dagger$ 281387
16 533000$ \dagger$ 529000$ \dagger$ 3442 3571 13471
17 2975 2975 3438 3438 8981
18 1584 1584 1724 1724 4588

Table 9: A closer look at the round-robin scheduling for problem 15 in the STRIPS Satellite domain. Italic entries indicate that the search limit for a flaw selection strategy was reached in the round.
Round MW-Loc MW-Loc-Conf LCFR-Loc LCFR-Loc-Conf Total
1 1000 1000 1000 1000 4000
2 1000 1000 1000 1000 4000
3 2000 2000 2000 2000 8000
4 4000 4000 4000 4000 16000
5 2000 8000 8000 8000 26000
6 - 16000 16000 16000 48000
7 - 32000 32000 32000 96000
8 - 36000 43375   79375
Total 10000 100000 107375 64000 281375

VHPOP solved 122 problems out of 224 attempted at IPC3. The quality of the plans, in terms of number of steps, generated by VHPOP was generally very high. For plain STRIPS domains, VHPOP's plans were typically within 10 percent of the best plans found by any planner in the competition, with 28 of VHPOP's 68 STRIPS plans being at least as short as the best plans found and, being a POCL planner, VHPOP automatically exploits parallelism in planning domains, generating plans for STRIPS domains with low total plan execution time (Table 10). Table 11 shows that VHPOP also performed well in terms of number of solved problems in four of the six STRIPS domains, being competitive with top performers such as MIPS and LPG (particularly in the Rovers domain).

In domains with durative actions5, total execution time was given as an explicit plan metric, and the objective was to minimize this metric. The specification of an explicit plan metric is a feature of PDDL2.1 not present in earlier versions of PDDL. As VHPOP currently ignores this objective function and always tries to find plans with few steps, it should come as no surprise that the quality of VHPOP's plans for domains with durative actions was significantly worse than the quality of the best plans found (Table 12).6 VHPOP still produced plans with few steps, however, with over 60 percent of VHPOP's plans for domains with durative actions having the fewest steps. The plan selection heuristic that VHPOP uses is tuned for finding plans with few steps, and it would need to be modified in order to find plans with shorter total execution time. Table 13 shows that LPG solved by far the most problems in domains with durative actions, but that VHPOP was competitive with MIPS and clearly outperformed TP4 and TPSYS.

Table 10: Relative plan quality for the STRIPS domains where VHPOP solved more than half of the problems. There are two plan quality metrics. Number of steps is simply the total number of steps in a plan, while execution time is the total time required to execute a plan (counting parallel actions as one time step). The table shows the average ratio of VHPOP's plan quality and the quality of the best plan generated by any planner, and the number of problems in each domain where VHPOP found the best plan is also shown.
Domain # Solved # Steps # Best Execution Time # Best
DriverLog 14 1.09 5 1.15 4
ZenoTravel 13 1.04 7 1.20 5
Satellite 17 1.07 7 1.25 5
Rovers 20 1.08 7 1.08 13

Table 11: Number of problems solved by top performing fully automated planners in STRIPS domains.
Planner Depots DriverLog ZenoTravel Satellite Rovers FreeCell Total
FF 22 15 20 20 20 20 117
LPG 21 18 20 20 12 18 109
MIPS 10 15 16 14 12 19 86
SIMPLANNER 22 11 20 17 9 12 91
STELLA 4 10 18 14 4 0 50
VHPOP 3 14 13 17 20 1 68

Table 12: Same information as in Table 10, but for domains with durative actions.
Domain # Solved # Steps # Best Execution Time # Best
DriverLog 14 1.04 8 1.50 0
ZenoTravel 13 1.04 10 1.54 0
Satellite 17 1.04 9 2.51 0
Rovers 7 1.04 5 1.39 0

Table 13: Number of problems solved by fully automated planners in domains with durative actions.
Planner Depots DriverLog ZenoTravel Satellite Rovers Total
LPG 20 20 20 20 12 92
MIPS 11 15 14 9 9 58
TP4 1 2 5 3 4 15
TPSYS 0 2 2 2 4 10
VHPOP 3 14 13 17 7 54

While VHPOP was a top performer at IPC3 in terms of plan quality, it was far from the top in terms of planning time. VHPOP was typically orders of magnitude slower than the fastest planner. The high planning times for VHPOP can in part be attributed to implementation details. Improvements to the code (e.g. using pointer comparison instead of string comparison whenever possible) since the planning competition has resulted in 10 to 20 percent lower planning times when using ground actions and when using lifted actions the planner is more than twice as fast as before. The reachability analysis is still a bottleneck, however, and further improvements could definitely be made there. It is important to remember, though, that we basically run four planners at once by using four flaw selection strategies concurrently. Table 14 shows the average relative performance of VHPOP at IPC3 compared to the performance of VHPOP using only the best flaw selection strategy for each problem. VHPOP with the best flaw selection strategy is on average two to three times faster than VHPOP with four concurrent strategies. Using several flaw selection strategies simultaneously helps us solve more problems, but the price is reduced speed. By more intelligently scheduling the different flaw selection strategies depending on domain and problem features, and not just using a fixed schedule for all problems, we could potentially increase planner efficiency significantly.

Table 14: Each number in the table represents the average ratio of the planning time for VHPOP using all four flaw selection strategies concurrently and the planning time for VHPOP with only the fastest flaw selection strategy.
Domain STRIPS Durative
DriverLog 2.52 2.66
ZenoTravel 2.76 2.86
Satellite 1.78 2.01
Rovers 2.32 3.37

Håkan L. S. Younes