next up previous
Next: Discussion Up: Solving the General NxN Previous: Experimenting with PARAMETRIC MICRO-HILLARY

Experimenting with PARAMETRIC MICRO-HILLARY in Other Scalable Domains

We tested PARAMETRIC MICRO-HILLARY in other parameterized domains. In the N-Cannibals and N-Stones domains, PARAMETRIC MICRO-HILLARY learned all the macros with the minimal value of the parameter (3). The test was performed using problems with a parameter of 20 in both domains. MICRO-HILLARY's performance indeed improved in both domains and problem solving proceeded without encountering local minima.

The N-Hanoi domain family is recursive in nature and we did not expect MICRO-HILLARY to find a complete macro set for these domains. The length of the macros should grow with the number of rings; therefore, MICRO-HILLARY should not reach quiescence in these domains. We were surprised to find that PARAMETRIC MICRO-HILLARY achieved quiescence after solving problems of 7 or 8 rings (7.13 on average). This error was caused by the domain-independent training-problems generator. The probability that the largest ring will be moved from its target location after a random sequence of moves is very low. Indeed, when we increased the length of the sequences used for generating training problems and increased the quiescence parameter, learning continued, but MICRO-HILLARY still reached quiescence after solving problems with 9 rings. In both cases, the macros learned were not sufficient for solving problems with a parameter that is larger than the values encountered during training. The test was performed with problems of 6 rings. For solving domain families such as N-Hanoi, we should extend MICRO-HILLARY and endow it with the capability of generating recursive macros. To avoid the problem of PARAMETRIC MICRO-HILLARY quitting prematurely, we can modify the problem generator of MICRO-HILLARY to use random sequences with lengths that are based on the domain parameter. Alternatively, we can use domain-specific problem generators. The results of this experiment are summarized in Tables 16, 17 and 18.

Table 16: Summary of the learning resources consumed in various domains. Each number represents the mean over 100 learning sessions.
  Operator applications CPU seconds Problems
Domain Mean Std. Mean Std. Mean Std.
N-Cannibals 331,183 59723 20.84 3.8 112 8.5
N-Stones 277,852 3384 11.53 2.7 1030 0.6
N-Hanoi 2,427,186 508,093 579.00 176.0 370 43.7

Table 17: Statistics of the macro sets generated in various domains. Each number represents the mean over 100 sets.
  Total number Mean Length Max Length
Domain Mean Std. Mean Std. Mean Std.
N-Cannibals 4.40 0.5 2.4 0.05 4.0 0.0
N-Stones 1.22 0.4 2.0 0.00 2.0 0.0
N-Hanoi 16.08 0.9 12.5 1.24 33.2 6.6

Table 18: The performance of MICRO-HILLARY before and after learning in various domains.
  Before learning After learning
Domain Mean Std. Mean Std.
N-Cannibals 150 85 105 10
N-Stones 3,671 605 2,009 155
N-Hanoi 171,956 168,338 2,913 16,530

Another related experiment involved transfer of knowledge between two similar domains (but not parameterized as the domains above). We generated a random grid, different from the one used for the experiments above, and performed a testing session with MICRO-HILLARY, using macros that were learned in the first grid. Using the macros improved MICRO-HILLARY's performance--from 1529 operator applications without macros down to 594 with macros. This ability of transferring skill from one grid to another arises from the similar shape of obstacles. A macro such as SSSSSWNNNNN can be helpful in getting around walls of various sizes in the original grid used for learning and in the grid used for testing.

next up previous
Next: Discussion Up: Solving the General NxN Previous: Experimenting with PARAMETRIC MICRO-HILLARY
Shaul Markovitch