Background: Selective Macro-Learning

Let *S* be a finite set of *states*. Let *O* be a finite
set of *operators*
where each operator
is a function
.
If
,
we say that *o*(*s*) is
undefined.
A *problem* is a pair of states
,
where
is called the *initial state* and
is called the *goal state*.^{2} A *solution* is a sequence
of operators
,
such
that
.
A problem
is *solvable*
if there exists a solution for *p* (specified
by
*solvable*(*s*_{i},*s*_{g})).

A *macro-operator* is a sequence of operators
.
A macro-operator *m* is applied to a state
by applying
its basic
operators in a sequence. If while applying the sequence of operators we encounter
an undefined application, the application of the macro is undefined.

A *problem solver*
is an algorithm that takes a problem
*p*
and a set of operators *O* and applies operators to states, searching
for a solution
to *p*.
Given a solvable problem *p*, a problem solver
and a set of operators
*O*, we define
,
the *cost* of solving *p* using
and
*O*, as the number of operator applications performed by
until a solution is found. Note that SearchCost is different from
the solution cost--the cost of the solution path according to some
cost function. In this work we are only interested in *satisficing* search--minimizing the
cost of the search regardless of the cost of the resulting solution.

The *utility* of a set of macros *M*, with respect to a problem
*p*,
a problem solver
and a set of operators *O*, is defined as

Thus, the utility of a set of macro operators with respect to a given problem and a given problem solver is the time saved by using these operators. When the problems are drawn from some fixed distribution

In general, the utility of knowledge depends on the criteria used for evaluating the performance of the problem solver [22]. Equation 2 assumes a

Most of the macro-learning systems perform
*learning by experimentation* [25]. The
program
solves training problems and acquires sequences of operators applied
during the search. Given a set of operators *O*,
a distribution of problems *D*,
and a problem solver ,
the goal of a macro-learning
system is to acquire a set
of macro-operators *M*, such that
is positive,
and as high as possible. It is quite possible, however, that
the utility of *M* will be negative, as the added macros
also increase the branching factor.

Markovitch and Scott [22] introduced the
*information
filtering* framework, which offers a systematic way of dealing with the
utility
problem by being more selective.
The framework identifies five logical types of selection
processes in learning systems: selective experience, selective
attention, selective acquisition, selective retention and selective
utilization. The framework views learning programs as information
processing systems where information flows from the experience space
through some attention mechanism, through an acquisition procedure to
the knowledge base and finally to the problem solver. The
five filters are defined with respect to their logical location within
the information flow.

The general architecture offered by the information-filtering framework, instantiated to off-line macro-learning, is shown in Figure 1. Markovitch and Scott argue that the art of building successful learning systems often lies in selecting the right combination of filters. A careful examination of existing macro-learning systems reveals that those containing sophisticated filtering mechanisms performed the most successful learning.