A straightforward approach to the completeness problem, i.e., deciding when to stop learning more rules, would be to stop whenever the learner can find no more rules in the current window. This approach, however, might miss important rules that could be found if only more of the uncovered positive examples had been added to the window. A different approach is therefore to continue the windowing process until all remaining positive examples are in the learning window. This, on the other hand, may lead to many unnecessary iterations when no more meaningful rules can be found.
We aimed at a compromise here. The strategy we employ is to double the window size whenever the learner has not discovered any rules in the current window. This ensures fast convergence when no more rules can be found, but still tries to find rules with smaller window sizes first. Note that this approach relies on a truly noise-tolerant learning algorithm. If, for example, the learning algorithm discovers rules even in randomly classified training data (i.e., pure noise), this stopping criterion will never take effect. The windowing algorithm will then discover that these rules are insignificant and continue to add examples to the window. This goes on until all examples are in the window, at which point the rules found are retained.
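As an illustration, the stopping strategy can be sketched as follows. This is a minimal, hypothetical rendering, not the actual implementation: the function and parameter names (`windowed_learning`, `learn`, `init_size`, `max_new`) are placeholders, and the underlying rule learner is passed in as a black-box callback that is assumed to return only significant rules.

```python
def windowed_learning(examples, learn, init_size=100, max_new=50):
    """Sketch of the window-doubling stopping strategy (illustrative names).

    `learn` stands in for a noise-tolerant rule learner: it receives the
    current window and returns the (significant) rules it found there.
    If it returns no rules, the target window size is doubled instead of
    stopping outright; learning ends only once every example is in the
    window and still nothing new is found.
    """
    window, rest = examples[:init_size], examples[init_size:]
    target = init_size
    rules = []
    while True:
        new_rules = learn(window)
        if new_rules:
            rules.extend(new_rules)
            # Rules were found: add a batch of further examples
            # (ideally uncovered positives) to the window.
            take = min(max_new, len(rest))
        else:
            if not rest:
                return rules  # whole set in the window, nothing new found
            target *= 2  # no rules found: double the window before giving up
            take = min(target - len(window), len(rest))
        window, rest = window + rest[:take], rest[take:]
```

Because the window doubles on every unsuccessful iteration, a learner that never finds a rule (e.g., on pure noise) reaches the full training set after only logarithmically many iterations, rather than crawling toward it in small increments.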