
Experimental Results

The results of the experiments are shown in Tables 1 and 2.
Table 1: Performance statistics in the $\theta_2 D^m S^1$ and Logistics Transportation domains (average solution length is shown in parentheses next to % Solved, for the logistics domain only)

                     $\theta_2 D^m S^1$             Logistics
Phase                Learning  Static  Scratch   Learning     Static       Scratch
(1) Two Goal
    % Solved         100%      100%    100%      100% (6.0)   100% (6.0)   100% (6.0)
    Nodes            90        240     300       1773         1773         2735
    Time (sec)       1         4       2         30           34           56
(2) Three Goal
    % Solved         100%      100%    100%      100% (8.2)   100% (8.2)   100% (8.2)
    Nodes            120       810     990       6924         13842        20677
    Time (sec)       2         15      8         146          290          402
(3) Four Goal
    % Solved         100%      100%    100%      100% (10.3)  100% (10.3)  100% (10.3)
    Nodes            150       2340    2533      290          38456        127237
    Time (sec)       3         41      21        32           916          2967

Each table entry represents cumulative results over the sequence of 30 problems that make up one phase of the experiment. The first row of each phase in Table 1 shows the percentage of problems correctly solved within the time limit of 550 seconds. For the logistics domain, the average solution length is shown in parentheses; it was omitted for $\theta_2 D^m S^1$ because all of the problems generated within a phase of that domain have the same solution length. The second and third rows of each phase give, respectively, the total number of search nodes visited over all 30 test problems and the total CPU time in seconds (including case retrieval time).
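The bookkeeping behind each Table 1 entry can be sketched as follows. This is a minimal sketch assuming a hypothetical per-problem record (`ProblemRun`) whose field names are illustrative; only the 550-second limit and the rule that nodes and CPU time are totals over all 30 problems come from the text, while averaging solution length over solved problems only is our own assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

TIME_LIMIT_SEC = 550  # time limit stated in the text

@dataclass
class ProblemRun:
    """Hypothetical per-problem record; the field names are illustrative only."""
    solved: bool                       # planner returned a correct plan
    cpu_sec: float                     # CPU time, including case retrieval time
    nodes: int                         # search nodes visited
    plan_length: Optional[int] = None  # length of the returned plan, if any

def phase_summary(runs: List[ProblemRun]) -> dict:
    """Cumulative statistics for one phase (30 problems per phase in the text)."""
    ok = [r for r in runs if r.solved and r.cpu_sec <= TIME_LIMIT_SEC]
    # Assumption: average solution length is taken over solved problems only.
    lengths = [r.plan_length for r in ok if r.plan_length is not None]
    return {
        "pct_solved": 100.0 * len(ok) / len(runs),
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
        # Nodes and CPU time are totals over all problems in the phase.
        "nodes": sum(r.nodes for r in runs),
        "time_sec": sum(r.cpu_sec for r in runs),
    }
```

A problem that returns a plan only after the limit counts as unsolved but still contributes its nodes and time to the totals.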

These results are also summarized in Figure 10.

Figure 10: Replay performance in the $\theta_2 D^m S^1$ and Logistics Transportation domains.

In learning mode, DERSNLP+EBL solved as many of the multi-goal problems as in the other two modes, and did so in substantially less time. Case retrieval based on case failure produced performance improvements that increased with problem size; in the four-goal phase of the logistics domain, for example, learning mode took 32 seconds in total, compared with 916 seconds in static mode and 2967 seconds when planning from scratch. Comparable improvements were not found when retrieval was based on the static similarity metric alone. This is not surprising, since the cases retrieved in this experiment were ones that had experienced at least one earlier failure. Testing was therefore done on cases that had some likelihood of failing when retrieval was based on the static metric.
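The claim that the improvement grows with problem size can be checked by dividing the total CPU time in static and scratch mode by the learning-mode time for the same phase. The sketch below does only that arithmetic on the Table 1 figures; the dictionary layout and key names are our own.

```python
# Total CPU seconds per phase as (learning, static, scratch), copied from Table 1.
TIMES = {
    "theta2DmS1": {"two": (1, 4, 2), "three": (2, 15, 8), "four": (3, 41, 21)},
    "logistics": {"two": (30, 34, 56), "three": (146, 290, 402), "four": (32, 916, 2967)},
}

def speedups(times):
    """Map (domain, phase) to (static/learning, scratch/learning) time ratios."""
    ratios = {}
    for domain, phases in times.items():
        for phase, (learn, static, scratch) in phases.items():
            ratios[(domain, phase)] = (static / learn, scratch / learn)
    return ratios

if __name__ == "__main__":
    for (domain, phase), (vs_static, vs_scratch) in speedups(TIMES).items():
        print(f"{domain:11s} {phase:5s} {vs_static:5.1f}x vs static {vs_scratch:5.1f}x vs scratch")
```

Relative to planning from scratch, the ratios come to 2.0, 4.0 and 7.0 in $\theta_2 D^m S^1$, and to 1.9, 2.8 and 92.7 in the logistics domain, for the two-, three- and four-goal phases.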

Table 2: Measures of effectiveness of replay.

                  $\theta_2 D^m S^1$      Logistics
Phase             Learning  Static      Learning  Static
Two Goal
    % Seq         100%      0%          53%       53%
    % Der         60%       0%          48%       48%
    % Rep         100%      0%          85%       85%
Three Goal
    % Seq         100%      0%          80%       47%
    % Der         70%       0%          63%       50%
    % Rep         100%      0%          89%       72%
Four Goal
    % Seq         100%      0%          100%      70%
    % Der         94%       0%          79%       62%
    % Rep         100%      0%          100%      81%


Table 2 records three measures that reflect the effectiveness of replay. The first is the percentage of sequenced replay (% Seq). Recall that replay of a trace is considered here to be sequenced if the skeletal plan is further refined to reach a solution to the new problem. The results point to the greater effectiveness of replay in learning mode. In the $\theta_2 D^m S^1$ domain, replay was sequenced in every case in this mode, whereas in static mode it was never sequenced. In the transportation domain, retrieval based on failure did not always result in sequenced replay, but it did so as often as static retrieval in the two-goal phase and more often in the three- and four-goal phases (80% versus 47%, and 100% versus 70%).
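Under the definition just recalled, whether a replay episode is sequenced can be decided by walking back from the solution toward the root of the search tree and checking whether the skeletal plan lies on that path. The sketch below assumes each episode is recorded as a parent map over plan identifiers plus the identifiers of the skeletal plan and the final solution; this representation is hypothetical and not taken from DERSNLP+EBL.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical episode record: (parent map, skeletal plan id, solution id).
Episode = Tuple[Dict[str, Optional[str]], str, str]

def is_sequenced(parent: Dict[str, Optional[str]], skeletal_id: str, solution_id: str) -> bool:
    """True if the solution was reached by further refining the skeletal plan,
    i.e. the skeletal plan is an ancestor of (or equal to) the solution."""
    node: Optional[str] = solution_id
    while node is not None:
        if node == skeletal_id:
            return True
        node = parent.get(node)  # the root maps to None (or is absent)
    return False

def pct_sequenced(episodes: Iterable[Episode]) -> float:
    """Percentage of replay episodes that were sequenced (% Seq)."""
    episodes = list(episodes)
    hits = sum(is_sequenced(p, s, g) for p, s, g in episodes)
    return 100.0 * hits / len(episodes)
```

For instance, with parents `{"root": None, "skel": "root", "a": "skel", "b": "root"}`, a solution at `"a"` counts as sequenced while one at `"b"` (reached after backtracking past the skeletal plan) does not.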

The greater effectiveness of replay in learning mode is also indicated by the other two measures in Table 2. These are, respectively, the percentage of plan refinements on the final derivation path that were formed through guidance from replay (% Der), and the percentage of the plans created through replay that remain on the final derivation path (% Rep). On both measures, the case-based planner in learning mode scored as high as or higher than in static mode, indicating the relative effectiveness of guiding retrieval through a learning component based on replay failures. These results suggest that DERSNLP+EBL's integration of CBP and EBL is a promising approach when extra interacting goals hinder the success of replay.
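Both of these measures can be computed from a record of the search. This minimal sketch assumes a hypothetical `PlanNode` that stores its parent and a flag saying whether the refinement that produced it was guided by replay; each non-root plan on the final derivation path is counted as one refinement. None of these names come from DERSNLP+EBL itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class PlanNode:
    parent: Optional[str]  # id of the plan this one refines; None for the null plan
    via_replay: bool       # True if this refinement was guided by the replayed case

def final_path(nodes: Dict[str, PlanNode], solution_id: str) -> List[str]:
    """Ids of the refinements on the final derivation path (root excluded)."""
    path = []
    node_id = solution_id
    while nodes[node_id].parent is not None:
        path.append(node_id)
        node_id = nodes[node_id].parent
    return path

def replay_measures(nodes: Dict[str, PlanNode], solution_id: str) -> Tuple[float, float]:
    """Return (% Der, % Rep) for one solved problem."""
    path = final_path(nodes, solution_id)
    replayed_on_path = sum(nodes[n].via_replay for n in path)
    replayed_total = sum(n.via_replay for n in nodes.values())
    # % Der: share of final-path refinements that were guided by replay.
    pct_der = 100.0 * replayed_on_path / len(path) if path else 0.0
    # % Rep: share of replay-created plans that survive on the final path.
    pct_rep = 100.0 * replayed_on_path / replayed_total if replayed_total else 0.0
    return pct_der, pct_rep
```

If replay creates three plans but only the first two survive on a four-refinement solution path (the third being abandoned before two from-scratch refinements finish the plan), this gives % Der = 50 and % Rep of about 67.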

In Section 4 we report on a more thorough evaluation of DERSNLP+EBL's learning component, conducted to investigate whether learning from case failure benefits a planner solving random problems in a complex domain. For this evaluation we implemented the full case-based planning system along with novel case storage and adaptation strategies. In the next section, we describe the storage strategy that was developed for this evaluation.
