
## Implementation

To test this hypothesis, we implemented several algorithms in Common
LISP, building on Ray Mooney's publicly available Machine Learning
library. The implemented algorithms are:

- DOS, a basic separate-and-conquer rule learning algorithm that uses
  information gain as a search heuristic and no stopping criterion for
  noise handling;
- WIN-DOS-3.1, an implementation of windowing as shown in
  Figure 1, which wraps a windowing procedure around DOS; and
- WIN-DOS-95, an algorithm that integrates windowing into DOS, as
  shown in Figure 3.

All algorithms are limited to 2-class problems, i.e., they learn a
theory that discriminates positive from negative examples by
classifying all examples that are covered by the rules as positive,
and all examples that are not covered by the rules as negative. Note,
however, that this is not a fundamental limitation of the approach, as
there are several ways of solving multi-class problems with binary
rule learning algorithms of the type discussed in this paper
[12,1,19].
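
For concreteness, the following minimal sketch shows this 2-class
decision procedure in Common LISP. The predicate-based rule
representation and the names COVERS-P and CLASSIFY are our own
illustration, not the actual encoding used in the library.

```lisp
;; A minimal sketch of the 2-class decision procedure, assuming rules
;; are represented as predicate functions over examples. This encoding
;; is an assumption made for illustration only.
(defun covers-p (rule example)
  "True iff RULE covers EXAMPLE."
  (funcall rule example))

(defun classify (rules example)
  "An example is :POSITIVE iff at least one rule covers it."
  (if (some (lambda (rule) (covers-p rule example)) rules)
      :positive
      :negative))
```

Under this representation, for instance,
`(classify (list (lambda (ex) (> ex 0))) 3)` returns `:POSITIVE`,
while an example covered by no rule is classified as `:NEGATIVE`.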

In preliminary experiments, it turned out that one problem occurring
more frequently in integrative windowing than in regular windowing or
basic separate-and-conquer learning is that of over-specialized
rules. Often a consistent rule is found from a small number of
examples, but other rules found later cover all of the examples this
specialized rule covers. Note that this problem cannot be removed with
a syntactic generality test: consider, for example, the case where a
rule stating that a KRK position is illegal if the two kings are on
the same square is learned from a small subset of the data, and a more
general rule is discovered later, stating that all positions are
illegal in which the two kings occupy adjacent squares. Sometimes the
examples of the special case can also be covered by more than one of
the other rules.
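
To make the semantic-versus-syntactic distinction concrete, the
following hedged sketch encodes the two example rules under an assumed
representation of KRK positions (property lists of king coordinates)
in which "adjacent" includes a coordinate distance of zero; both the
function names and the encoding are ours, not the paper's.

```lisp
;; Assumed encoding: a position is a plist of king coordinates, e.g.
;; (:wk-file 3 :wk-rank 3 :bk-file 3 :bk-rank 3). "Adjacent" is taken
;; here to mean a file and rank distance of at most 1, which includes
;; the same-square case.
(defun kings-same-square-p (pos)
  "The specialized rule: both kings occupy the same square."
  (and (= (getf pos :wk-file) (getf pos :bk-file))
       (= (getf pos :wk-rank) (getf pos :bk-rank))))

(defun kings-adjacent-p (pos)
  "The more general rule: the kings occupy adjacent squares."
  (and (<= (abs (- (getf pos :wk-file) (getf pos :bk-file))) 1)
       (<= (abs (- (getf pos :wk-rank) (getf pos :bk-rank))) 1)))
```

Every position covered by KINGS-SAME-SQUARE-P is also covered by
KINGS-ADJACENT-P, so the specialized rule is semantically redundant;
yet neither condition is a syntactic specialization of the other,
which is why a purely syntactic generality test fails here.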

We have implemented a heuristic procedure for removing such redundant
rules: after the final theory has been learned by any of the
algorithms, each of its rules is tested on the complete training set,
and the rules are ordered according to the number of examples they
cover. Starting with the rule with the least coverage, each rule is
tested to determine whether the examples it covers are also covered by
the remaining rules. If so, the rule is removed. This procedure can be
implemented quite efficiently and is performed only once, at the end
of each of the three algorithms.
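
The following is a minimal sketch of this redundancy filter, reusing
the assumed predicate-based rule representation and the hypothetical
COVERS-P helper from the earlier sketch.

```lisp
;; A sketch of the redundancy filter described above, assuming the
;; predicate-based rule representation from the earlier sketch.
(defun remove-redundant-rules (rules training-set)
  "Drop each rule whose covered examples are all covered by other rules."
  ;; Order the rules by ascending coverage on the complete training
  ;; set, so that rules with the least coverage are tested first.
  (let ((kept (sort (copy-list rules) #'<
                    :key (lambda (rule)
                           (count-if (lambda (ex) (covers-p rule ex))
                                     training-set)))))
    (dolist (rule (copy-list kept) kept)
      (let ((others (remove rule kept)))
        ;; RULE is redundant if every example it covers is also
        ;; covered by at least one of the remaining rules.
        (when (every (lambda (ex)
                       (or (not (covers-p rule ex))
                           (some (lambda (r) (covers-p r ex)) others)))
                     training-set)
          (setf kept others))))))
```

Ordering the rules by ascending coverage before the elimination pass
ensures that specialized rules are considered for removal before the
more general rules that subsume them.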
