Windowing can be viewed as a special case of a wider variety of
optimization techniques that try to reduce the complexity of a
learning problem by identifying appropriate low-complexity versions of
the problem. The complexity of a learning problem mostly depends on
the number of training examples and the size of the searched
hypothesis space.^{17} Figure 14 shows an
abstraction of the windowing algorithm. It starts by initializing the
learning problem with a reduced learning problem (e.g., with a
subsample of the examples), then applies the learning algorithm to
this reduced problem and analyzes the resulting theory with respect to
the original problem. Unless some stopping criterion specifies that
the quality of the learned theory is already sufficient (e.g., if no
exceptions could be found on the complete data set), the reduced
learning problem will be expanded to incorporate more information
(e.g., by adding all misclassified examples) and a new theory is
induced.
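The loop described above can be sketched as follows. Here `learn`, `good_enough`, and `expand` are hypothetical placeholders for the underlying learning algorithm, the stopping criterion, and the expansion step, not parts of any particular system:

```python
# Sketch of the generalized windowing loop; `learn`, `good_enough`,
# and `expand` are hypothetical placeholders for the learning
# algorithm, the stopping criterion, and the expansion step.
import random

def generalized_windowing(examples, learn, good_enough, expand, init_size=100):
    # Initialize the reduced learning problem, e.g. with a random subsample.
    window = random.sample(examples, min(init_size, len(examples)))
    while True:
        # Apply the learner to the reduced problem ...
        theory = learn(window)
        # ... and analyze the result with respect to the original problem.
        if good_enough(theory, examples):
            return theory
        # Otherwise expand the reduced problem, e.g. by adding
        # misclassified examples, and iterate.
        window = expand(window, theory, examples)
```

Any concrete instantiation must supply the three placeholder operations; windowing itself corresponds to the choices sketched in the comments.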

Note that this abstract framework also describes other approaches for
reducing the complexity of a learning problem, such as hypothesis
space reduction. As an example, think of an algorithm that attempts to
learn a theory in a simple hypothesis space first and only switches to
more complex hypothesis spaces if the result in the simple space is
unsatisfactory. For example, many *Inductive Logic Programming*
systems provide explicit control over the complexity of their
hypothesis space, and this control could be exercised by an
instantiation of the generalized windowing algorithm. Such an approach has in fact been
realized in CLINT [17], which has a predefined set of
hypothesis spaces with increasing complexity, and is able to switch to
a more expressive hypothesis language if it is not able to find a
satisfactory theory in its current search space. Similar approaches
could also be imagined for other ILP algorithms. For example, in
FOIL [52], one could systematically vary certain parameters
that influence the complexity of the hypothesis space, like the number
of new variables that can be introduced in the body of a clause or the
maximum length of a clause, in order to define increasingly complex
hypothesis spaces. This procedure could be automated in a way
similar to the one described by [38]. The
crucial point is how to determine efficiently that no progress can be
made by shifting to a more complex hypothesis space. In the
propositional case, *forward selection* approaches to feature
subset selection, i.e., algorithms that select the best subset of
features by adding one feature at a time to an initially empty set of
features, can be viewed in this framework
[8,39].
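In this reading, forward selection starts from the simplest hypothesis space (no features) and expands it only while doing so still pays off. A minimal sketch, where `evaluate` is a hypothetical scoring function (e.g., the estimated accuracy of a learner restricted to the given subset), might look as follows:

```python
# Sketch of greedy forward feature-subset selection, viewed as
# expanding the hypothesis space one feature at a time.
# `evaluate(subset)` is a hypothetical scoring function.
def forward_selection(features, evaluate):
    selected, best_score = [], evaluate([])
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            return selected
        # Score every one-feature expansion of the current subset.
        score, best_f = max((evaluate(selected + [f]), f) for f in candidates)
        if score <= best_score:
            # No expansion improves the score: stop shifting the bias.
            return selected
        selected.append(best_f)
        best_score = score
```

The stopping test mirrors the crucial point raised above: the search only moves to a larger hypothesis space when the evaluation indicates that progress is still possible.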

All of the above-mentioned approaches may be viewed as a particular
type of *bias shift* operator
[58,18] focusing on
shifts to computationally more expensive inductive biases.
[57] investigates this in more detail and
suggests that -- in order to maximize accuracy -- one should start
with a weak bias and gradually shift to stronger biases. Our results
suggest the opposite strategy if efficiency is the main concern. This
is consistent with results from comparisons of forward and backward
feature subset selection [39].

We believe that thinking in this framework may lead to more general
approaches for reducing the complexity of a learning problem, which
aim at reducing both hypothesis and example space at the same time. As
an example consider the *peepholing* technique introduced by
[10], where sub-sampling is used to reliably
eliminate unpromising candidate conditions from the hypothesis space.
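As a rough illustration of the idea -- not Catlett's actual procedure -- conditions whose quality estimate on a subsample falls short of the current best by more than a confidence margin could be discarded before the full data set is ever consulted. The Hoeffding-style margin below is our own simplification:

```python
# Illustrative sub-sampling-based pruning in the spirit of peepholing;
# the margin computation is a simplified assumption, not the original
# statistics. `quality_on(c, sample)` is a hypothetical estimator.
import math

def prune_conditions(conditions, quality_on, sample, delta=0.05):
    # Quality estimates are assumed to lie in [0, 1]; a Hoeffding-style
    # bound then gives a margin that holds with probability 1 - delta.
    eps = math.sqrt(math.log(1.0 / delta) / (2 * len(sample)))
    estimate = {c: quality_on(c, sample) for c in conditions}
    best = max(estimate.values())
    # Keep only conditions whose estimate could still reach the best
    # one once both margins are taken into account.
    return [c for c in conditions if estimate[c] >= best - 2 * eps]
```

Only the surviving conditions would then be evaluated on the complete data, reducing the hypothesis space and the number of examples consulted at the same time.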