Next: Learning Macros in Other Up: Experimenting with MICRO-HILLARY Previous: The Effect of the

## The Effect of the Heuristic Function on Learning

The MICRO-HILLARY algorithm is based upon the availability of a generally good'' heuristic function to start with. The heuristic function should be able to lead the search program towards the goal, except in a small number of local minima where an exhaustive search and learning take place. The next experiment tests the effect of the heuristic function on the performance of MICRO-HILLARY. We have conducted 100 learning sessions and testing sessions with each of the heuristic functions described in Section 4.1.2. Tables 8, 9 and 10 show the results obtained.

Table 8: The learning resources consumed while learning in the 15-puzzle domain using various heuristic functions. Each number represents the mean over 100 learning sessions.
 Operator applications CPU seconds Problems Heuristic Mean Std. Mean Std. Mean Std. RR 498,172 74,047 735 218 60 4.2 Row-by-row-2 4,391,468 1,805,776 2191 1,181 247 62.9 Reduction 552,463 128,769 878 334 63 9.1 Spiral 891,728 144,633 1,621 47 74 15.3 MD 228,487,090 44,106,048 98,659 2,341 1,917 210.3

Table 9: The statistics of the macro sets acquired with various heuristic functions. Each number represents the mean over 100 databases.
 Total number Mean Length Max Length Heuristic Mean Std. Mean Std. Mean Std. RR 14.16 0.74 8.38 0.20 18.0 0.0 Row-by-row-2 73.44 3.65 5.60 0.14 27.7 2.1 Reduction 16.59 0.76 6.08 0.08 13.0 0.0 Spiral 24.10 1.04 6.43 0.09 13.0 0.0 MD 730.00 40.09 16.37 0.23 48.2 1.0

Table 10: The utility of the learned macros with various heuristic functions. The resources required for solving the testing set compared with various non-learning problem solvers. The results are first averaged over the 100 problems in the test set and then averaged again over the 100 macro sets.
 Operator Expanded CPU Solution applications nodes seconds length Heuristic Mean Std. Mean Std. Mean Std. Mean Std. RR 688 21 47.4 1.3 0.30 0.01 149.5 0.19 Row-by-row-2 2,827 21 57.24 144.8 1.33 3.60 148.2 0.19 Reduction 738 19 42.67 0.4 0.29 0.01 125.5 1.60 Spiral 944 34 44.76 0.5 0.37 0.02 130.1 5.27 MD 11,520 1009 175.00 83.6 2.40 0.01 142.0 2.04

The row-by-row-2 function is a crippled version of the RR heuristic. It is therefore not surprising that the performance degraded by all measures when using it. It is more surprising to note that the performance also degraded with the powerful MD heuristic, which, according to Table 2, is much better than RR even for satisficing search. The lower performance is expressed in two main differences:
.
The learning resources with the MD heuristic are 460 times higher than with the RR heuristic.
.
The utility of the learned macro sets is lower--the number of operator applications is 16 times higher with the MD heuristic.

To understand the reasons for these differences we can look at the statistics of the acquired macros (Table 9). The total number of macros acquired with MD is 50 times larger than the number of macros acquired with RR. A larger number of macros implies a larger branching factor, which explains the lower efficiency of the problem solver using the MD macros. The high learning resources can be explained by three factors:

.
The problem solver is used during training. Therefore, lower efficiency of the problem solver implies slower learning.
.
The larger number of macros indicates a larger number of training problems before quiescence. Indeed, with the MD heuristic, MICRO-HILLARY solved 30 times more problems during training.
.
The mean length of the macros acquired with the MD heuristic is twice as large as the mean length of macros acquired with the RR heuristic. Even more significant is the difference in the maximal length of a macro. With the MD heuristic MICRO-HILLARY acquires macros that are 48 operators long, while the maximal length of a macro using the RR heuristic is only 18. This indicates that the maximal radius of the MD heuristic is much larger than that of the RR heuristic. Larger radius implies more resources invested in escaping from local minima.

The above explanation is only partially satisfactory, since it does not tell us the source of the problem. To understand why the MD heuristic has a large radius, look at the following puzzle:

The sum-of-Manhattan distances for this puzzle is only 4, yet a very long sequence of moves is required to reduce the distance. The larger radius implies longer (and less general) macros and more resources for the escape processes.

Table 11: Examples for macros acquired with the RR and the MD heuristics. Each example shows the state where the macro was acquired, the macro itself, and the state after the application of the acquired macro.
 MD RR Start Macro End Start Macro End r r u u u d l l u r r d l d r u u l d r d l u r r d l u l d u l u r d l l l u r r r d d l l u r u l l r d d r r l r u d l l l u r u l d d u l d l l u r d r u l l d r r u r d

It is more difficult to explain the larger number of macros. A larger number of macros means that during training MICRO-HILLARY encounters many states where none of the previously learned macros is useful, and new macros are acquired. This implies that the likelihood of a macro acquired with MD to be useful in a state is much lower than that of a macro acquired with RR. One reason for this difference is the longer macros acquired with MD. As we stated before, longer macros tend to be less general. There are, however, other reasons which are best understood by examples. Table 11 shows three macros acquired with each of the two heuristics, along with the states before and after the macro acquisition.

The MD macro at the first line, rruuu, is applicable only in states where the empty tile is in the last row at one of the two leftmost locations. Five tiles are moved and the Manhattan distance of each is either increased or decreased by one. The macro is useful only in states where the distance of at least three of the tiles is decreased. Compare this to the equally long RR macro dllur. It is applicable in states where the empty tile is in one of 6 locations (last two columns and last three rows), and is therefore much more likely to be applicable than the MD macro. It is useful in states where the blank is to the right of the next tile to be moved and the direction of the movement is to the left. Since the RR heuristic imposes a particular order on the placement of tiles, any attempt to solve a problem will get to a state where, for example, the first 9 tiles are in place and tile 10 should be placed next. At a certain point tile 10 will be at distance 1 from its target location, either below the target location or to the right of it. In all the states where the tile is to the right of its target location, and the blank tile is to its right, the macro dllur will be useful. Thus, this macro has high likelihood of being useful.

The MD macro in the second line is, again, useful for a very particular situation. The RR macro, however, is useful for at least half of the problems. While solving a problem with RR there must be a situation where the first 3 tiles are in their target location and the fourth tile is at distance one from its target. If the blank is to its left, then the macro is useful. The MD macro in the last row of the table is the most extreme case of low usefulness. It is useful only if the tiles are placed in a particular order, with the tiles in the first, third and fourth row placed before the tiles in the second row. Contrast this macro with the longest RR macro shown to its right. This macro is useful for most problems since there will always be a situation where the first three rows are ordered and tile 13 should be moved towards its goal.

To conclude, the difference between RR and MD that makes the RR macros more useful is the restriction that RR puts on the order of solving subproblems. This strict order makes the problem solver go through similar states regardless of the initial state. The freedom given by MD allows the problem solver pick any of the N2! possible orders for tile placement. Furthermore, it allows macros that move tiles away from their target location, provided that other tiles are moved towards their target. Such macros are useful only in very particular states.

The particular order used by RR reduces the maximal radius by making it possible to place tiles without the need to (even temporarily) displace too many already-placed tiles. The other two heuristics, Reduction and Spiral, also have this property. Consider, however, a heuristic similar to RR where the tiles are placed in order from tile 2 to tile 15 after which tile 1 is placed. Such a placement order increases the maximal radius significantly. In Section 6 we discuss the differences between the two heuristics when used for the NxN puzzle domain.

Next: Learning Macros in Other Up: Experimenting with MICRO-HILLARY Previous: The Effect of the
Shaul Markovitch
1998-07-21