
Experimenting with PARAMETRIC MICRO-HILLARY in Other Scalable Domains

We tested PARAMETRIC MICRO-HILLARY in other parameterized domains. In the N-Cannibals and N-Stones domains, PARAMETRIC MICRO-HILLARY learned all of its macros while the parameter was at its minimal value (3). Testing was performed on problems with a parameter of 20 in both domains. MICRO-HILLARY's performance indeed improved in both domains, and problem solving proceeded without encountering local minima.

The N-Hanoi domain family is recursive in nature, and we did not expect MICRO-HILLARY to find a complete macro set for these domains: the length of the macros must grow with the number of rings, so MICRO-HILLARY should never reach quiescence. We were therefore surprised to find that PARAMETRIC MICRO-HILLARY achieved quiescence after solving problems of 7 or 8 rings (7.13 on average). This premature quiescence was caused by the domain-independent training-problem generator: the probability that the largest ring will be moved from its target location by a short random sequence of moves is very low. Indeed, when we increased the length of the sequences used for generating training problems and raised the quiescence parameter, learning continued further, but MICRO-HILLARY still reached quiescence after solving problems with 9 rings. In both cases, the macros learned were not sufficient for solving problems with a parameter larger than the values encountered during training. The test was performed with problems of 6 rings. To solve domain families such as N-Hanoi, MICRO-HILLARY would have to be extended with the capability of generating recursive macros. To keep PARAMETRIC MICRO-HILLARY from quitting prematurely, we can modify its problem generator to use random sequences whose lengths depend on the domain parameter (a sketch of this idea follows), or we can use domain-specific problem generators. The results of this experiment are summarized in Tables 16, 17 and 18.
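The following is a minimal sketch of the proposed parameter-sensitive generator (in Python, used here only for illustration); the function names, the callable interface, and the 2^n walk length are our assumptions, not the system's actual implementation. The point is that the backward random walk from the goal state grows with the number of rings n, so that for N-Hanoi, whose shortest solutions contain 2^n - 1 moves, even the largest ring is likely to be displaced.

    import random

    def generate_training_problem(goal_state, legal_moves, apply_move, walk_length):
        # Backward random walk from the goal state. legal_moves(state)
        # returns the applicable moves and apply_move(state, move) the
        # resulting state (assumed interfaces). The returned state is the
        # initial state of a training problem whose goal is goal_state.
        state = goal_state
        for _ in range(walk_length):
            state = apply_move(state, random.choice(legal_moves(state)))
        return state

    def hanoi_walk_length(n):
        # Assumed scaling: comparable to the 2**n - 1 moves of the
        # shortest n-ring solution, so that even deep rings are disturbed.
        return 2 ** n

With a fixed, parameter-independent walk length, short walks almost never move the largest ring, which is exactly the failure mode described above.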


Table 16: Summary of the learning resources consumed in various domains. Each number represents the mean over 100 learning sessions.
                Operator applications     CPU seconds        Problems
Domain             Mean        Std.       Mean     Std.     Mean   Std.
N-Cannibals       331,183     59,723      20.84     3.8      112    8.5
N-Stones          277,852      3,384      11.53     2.7    1,030    0.6
N-Hanoi         2,427,186    508,093     579.00   176.0      370   43.7


Table 17: Statistics of the macro sets generated in various domains. Each number represents the mean over 100 sets.
                Total number        Mean length         Max length
Domain           Mean    Std.       Mean     Std.      Mean    Std.
N-Cannibals      4.40     0.5        2.4     0.05       4.0     0.0
N-Stones         1.22     0.4        2.0     0.00       2.0     0.0
N-Hanoi         16.08     0.9       12.5     1.24      33.2     6.6


Table 18: The performance of MICRO-HILLARY before and after learning in various domains, measured in operator applications.
                Before learning         After learning
Domain           Mean       Std.        Mean       Std.
N-Cannibals        150         85         105        10
N-Stones         3,671        605       2,009       155
N-Hanoi        171,956    168,338       2,913    16,530

Another related experiment involved the transfer of knowledge between two similar domains (though not parameterized like the domains above). We generated a random grid, different from the one used for the experiments above, and performed a testing session with MICRO-HILLARY using the macros that were learned on the first grid. Using the macros improved MICRO-HILLARY's performance from 1,529 operator applications without macros to 594 with macros. This ability to transfer skill from one grid to another arises from the similar shapes of the obstacles: a macro such as SSSSSWNNNNN can be helpful for getting around walls of various sizes both in the original grid used for learning and in the grid used for testing.
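As an illustrative sketch only (the move encoding and the grid representation are our assumptions, not the system's actual data structures), the following shows how such a macro string can be tested for applicability in a grid with obstacles:

    def apply_macro(position, macro, grid):
        # Apply a macro such as 'SSSSSWNNNNN' step by step. grid is a
        # list of strings in which '#' marks an obstacle. Returns the
        # final (row, col), or None if any step of the macro is blocked,
        # i.e., the macro is inapplicable in this state.
        deltas = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}
        row, col = position
        for move in macro:
            dr, dc = deltas[move]
            row, col = row + dr, col + dc
            if not (0 <= row < len(grid) and 0 <= col < len(grid[0])) \
                    or grid[row][col] == '#':
                return None
        return (row, col)

Because the macro encodes only the local shape of a detour around a wall, it remains applicable wherever a similar obstacle appears, regardless of which grid it was learned in.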

