Complexity Reduction

Next: Future Work Up: Related Work Previous: Active Learning

Complexity Reduction

Windowing can be viewed as special case of a wider variety of optimization techniques that try to reduce the complexity of a learning problem by identifying appropriate low-complexity versions of the problem. The complexity of a learning problem mostly depends on the number of training examples and the size of the searched hypothesis space.¹⁷ Figure 14 shows an abstraction of the windowing algorithm. It starts by initializing the learning problem with a reduced learning problem (e.g., with a subsample of the examples), then applies the learning algorithm to this reduced problem and analyzes the resulting theory with respect to the original problem. Unless some stopping criterion specifies that the quality of learned theory is already sufficient (e.g., if no exceptions could be found on the complete data set), the reduced learning problem will be expanded to incorporate more information (e.g., by adding all misclassified examples) and a new theory is induced.

**Figure 14:** A general view of windowing.

Note that this abstract framework also describes other approaches for reducing the complexity of a learning problem, such as hypothesis space reduction. As an example think of an algorithm that attempts to learn a theory in a simple hypothesis space first and only switches to more complex hypothesis spaces if the result in the simple space in unsatisfactory. For example, many Inductive Logic Programming systems provide some explicit control over the complexity of their hypothesis space, which might be controlled with an instantiation of the generalized windowing algorithm. Such an approach has in fact been realized in CLINT [17], which has a predefined set of hypothesis spaces with increasing complexity, and is able to switch to a more expressive hypothesis language if it is not able to find a satisfactory theory in its current search space. Similar approaches could also be imagined for other ILP algorithms. For example, in FOIL [52], one could systematically vary certain parameters that influence the complexity of the hypothesis space, like the number of new variables that can be introduced in the body of a clause or the maximum length of a clause, in order to define increasingly complex hypothesis spaces. This procedure could be automatized in a way similar to the one described by [38]. The crucial point is how to efficiently evaluate that no progress can be made by shifting to a more complex hypothesis space. In the propositional case, forward selection approaches to feature subset selection, i.e., algorithms that select the best subset of features by adding one feature at a time to an initially empty set of features, can be viewed in this framework [8,39].

All of the above-mentioned approaches may be viewed as a particular type of bias shift operator [58,18] focussing on shifts to computationally more expensive inductive biases. [57] investigates this in more detail, but suggests that -- in order to maximize accuracy -- one should start with a weak bias and gradually shift to stronger biases. Our results suggest the opposite strategy if efficiency is the main concern. This is consistent with results in comparing forward and backward feature subset selection [39].

We believe that thinking in this framework may lead to more general approaches for reducing the complexity of a learning problem, which aim at reducing both hypothesis and example space at the same time. As an example consider the peepholing technique introduced by [10], where sub-sampling is used to reliably eliminate unpromising candidate conditions from the hypothesis space.

Next: Future Work Up: Related Work Previous: Active Learning