To test this hypothesis, we implemented several algorithms in Common LISP, building on Ray Mooney's publicly available Machine Learning library. The implemented algorithms are DOS, a basic separate-and-conquer rule learning algorithm using information gain as a search heuristic and no stopping criterion for noise handling, WIN-DOS-3.1, an implementation of windowing as shown in Figure 1 that wraps a windowing procedure around DOS, and WIN-DOS-95, an algorithm that integrates windowing into DOS as shown in Figure 3. All algorithms are limited to 2-class problems, i.e., they learn a theory that discriminates positive from negative examples by classifying all examples that are covered by the rules as positive, and all examples that are not covered by the rules as negative. Note, however, that this is not a principle limitation of the approach, as there are several ways for solving multi-class problems with binary rule learning algorithms of the type discussed in this paper [12,1,19].
In preliminary experiments it turned out that one problem that happens more frequently in integrative windowing than in regular windowing or basic separate-and-conquer learning is that of over-specialized rules. Often a consistent rule is found at a low example size, but other rules are found later that cover all of the examples this special rule covers. Note that this problem cannot be removed with a syntactic generality test: consider, for example, the case where a rule stating that a KRK position is illegal if the two kings are on the same square is learned from a small set of the data, and a more general rule is discovered later, which states that all positions are illegal in which the two kings occupy adjacent squares. Sometimes the examples of the special case can also be covered by more than one of the other rules.
We have implemented a heuristic procedure for removing such redundant rules: After the final theory has been learned by either of the algorithms, each of its rules is tested on the complete training set and the rules are ordered according to the number of examples they cover. Starting with the rule with the least coverage, each rule is tested whether the examples it covers are also covered by the remaining rules. If so, the rule is removed. This procedure can be implemented quite efficiently and will only be performed once at the end of each of the three algorithms.