We implemented the algorithm described in the last section in the same LISP environment as the noise-free algorithms. The noise-tolerant learning algorithm used was IRIP, a separate-and-conquer learner halfway between IREP [32,30] and RIPPER [14]. Like IREP, it learns single rules by greedily adding one condition at a time (using FOIL's information gain heuristic [52]) until the rule no longer makes incorrect predictions on the growing set, a randomly chosen set of 2/3 of the training examples. Thereafter, the learned rule is simplified by greedily deleting conditions as long as the performance of the rule does not decrease on the remaining set of examples (the pruning set). All examples covered by the resulting rule are then removed from the training set, and a new rule is learned in the same way until all positive examples are covered by at least one rule or the stopping criterion fires. Like RIPPER, IRIP evaluates the quality of a rule on the pruning set by computing the heuristic (p - n)/(p + n) (where p and n are the numbers of covered positive and negative examples, respectively) and stops adding rules to the theory whenever the fraction of positive examples that are covered by the best rule does not exceed 0.5. These choices have been shown to outperform IREP's original choices [14]. IRIP is quite similar to IREP*, which is also described in [14], but it retains IREP's method of considering all conditions for pruning (instead of considering only a final sequence of conditions).
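As a point of reference, the grow-and-prune cycle for a single rule can be sketched as follows. This is a minimal Python sketch under a data representation of our own choosing (examples as (attributes, label) pairs, conditions as predicates), not the actual LISP implementation; the function names are ours.

```python
import math
import random

def foil_gain(p1, n1, p0, n0):
    # FOIL's information gain: favours conditions that keep many positives
    # while increasing the fraction of positives among covered examples.
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def counts(rule, examples):
    # Positive/negative counts among the examples the rule covers.
    covered = [(x, y) for x, y in examples if all(c(x) for c in rule)]
    p = sum(1 for _, y in covered if y)
    return p, len(covered) - p

def rule_value(rule, pruning_set):
    # RIPPER-style pruning heuristic (p - n)/(p + n) on the pruning set.
    p, n = counts(rule, pruning_set)
    return (p - n) / (p + n) if p + n > 0 else -1.0

def learn_rule(examples, conditions):
    random.shuffle(examples)
    cut = 2 * len(examples) // 3          # 2/3 growing set, 1/3 pruning set
    grow, prune = examples[:cut], examples[cut:]
    rule = []
    # Grow: add the condition with the highest gain until the rule
    # makes no incorrect predictions on the growing set.
    while counts(rule, grow)[1] > 0:
        p0, n0 = counts(rule, grow)
        best = max(conditions,
                   key=lambda c: foil_gain(*counts(rule + [c], grow), p0, n0))
        if foil_gain(*counts(rule + [best], grow), p0, n0) <= 0:
            break
        rule.append(best)
    # Prune: like IREP, consider deleting *any* condition (not just a
    # final sequence) as long as the pruning-set value does not decrease.
    improved = True
    while improved:
        improved = False
        for i in range(len(rule)):
            cand = rule[:i] + rule[i + 1:]
            if rule_value(cand, prune) >= rule_value(rule, prune):
                rule, improved = cand, True
                break
    return rule
```

On a toy problem where one attribute perfectly predicts the class, the learner recovers a rule with that single condition.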
Around IRIP, we wrapped the windowing procedure of Figure 9. The resulting algorithm is referred to as WINRIPNT.^{15} Again, the implementations of both algorithms remove semantically redundant rules in a post-processing phase, as described in section 4.2.
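Figure 9 is not reproduced here, but the basic windowing loop that WINRIPNT generalizes can be sketched as follows. The names (`window_learn`, `learn`, `predict`) are ours, and this shows only plain windowing; the actual procedure of Figure 9 additionally integrates consistent rules into the final theory and handles noise.

```python
import random

def window_learn(examples, learn, predict, init_size=100, max_inc_size=50):
    """Basic windowing: train on a small subsample, then repeatedly add
    misclassified outside examples until the theory fits all data."""
    examples = examples[:]            # work on a copy
    random.shuffle(examples)
    window, rest = examples[:init_size], examples[init_size:]
    while True:
        theory = learn(window)
        # Indices of examples outside the window that are misclassified.
        wrong = [i for i, (x, y) in enumerate(rest) if predict(theory, x) != y]
        if not wrong:
            return theory             # consistent with all training data
        moved = set(wrong[:max_inc_size])
        window += [rest[i] for i in moved]
        rest = [ex for i, ex in enumerate(rest) if i not in moved]
```

With a learner that can always fit its window, the loop terminates because the window grows monotonically; a windowed threshold learner on separable data illustrates this.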
We studied the behavior of the algorithms in the KRK illegality domain with varying levels of artificial noise. A noise level of n% means that in about n% of the examples the class label has been replaced with a random class label. In each of the experiments described in this section, we report results averaged over 10 different subsets, in terms of the runtime of the algorithm and the accuracy measured on a separate noise-free test set. We did not evaluate the algorithms according to the total number of examples processed by the basic learning algorithm, because the way we compute the completeness check (doubling the size of the window if no rules can be learned from the current window, see section 5.2.2) makes these results less useful. All algorithms were run on identical data sets, but some random variation results from the fact that IRIP uses internal random splits of the training data. All experiments shown below were conducted with the same parameter setting of InitSize = 100 and MaxIncSize = 50, which performed well in the noise-free case. We have not yet made an attempt to evaluate their appropriateness for noisy domains.
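The noise model described above can be made precise with a short sketch (the function name and representation are ours). Note that because the replacement label is drawn at random, some affected examples keep their original label by chance, which is why only "about" n% of the labels change.

```python
import random

def add_class_noise(examples, noise_level, classes, rng=random):
    # Replace the class label of about `noise_level` of the examples with
    # a label drawn uniformly at random from `classes`; roughly half of
    # the affected examples in a two-class problem keep their old label.
    noisy = []
    for x, y in examples:
        if rng.random() < noise_level:
            y = rng.choice(classes)
        noisy.append((x, y))
    return noisy
```

For a 20% noise level in a two-class problem, about 10% of the labels actually flip.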
First, we aimed at making sure that in the noise-free setting (a parameter value of 0.0) the WINRIPNT algorithm still performs reasonably well, so that the noise-tolerant generalization of the INTEGRATIVEWINDOWING algorithm does not lose its efficiency in noise-free domains. Figure 10 shows the performance of IRIP, WINRIP3.1 (basic windowing using IRIP in its loop), WINRIP95 (integrative windowing using IRIP), and WINRIPNT0.0 (the noise-tolerant version of integrative windowing with the parameter set to 0.0). The graphs closely resemble those of Figure 5 for the KRK domain, except that IRIP occasionally overgeneralizes, so that the accuracy never reaches exactly 100%. WINRIPNT0.0 performs a little worse than WINRIP95, but it is still better than regular windowing. A similar graph for the Mushroom domain is shown in previous work [29]. We can thus conclude that the changes to the algorithm do some harm, but the algorithm still performs reasonably well in noise-free domains.

Figure 11 shows the performance of three of these algorithms (we did not test WINRIP95) on datasets with 5% artificial noise. Here, IRIP is clearly the fastest algorithm; a windowed version takes about twice as long. Although this is only a moderate level of noise, integrative windowing with WINRIPNT0.0 is already unacceptably slow, because it successively adds all examples to the window, as discussed in section 5.2.1.
The interesting question, of course, is what happens if we increase the noise-tolerance parameter, which should supposedly enable the algorithm to deal with the noise in the data. To this end, we performed a series of experiments with varying noise levels and varying parameter values. Our basic finding is that the algorithm is very sensitive to the choice of this parameter. The first thing to notice is that there is an inverse relationship between the parameter value and the runtime of the algorithm. This is not surprising, because higher values make it easier for rules to be added to the final theory, so the algorithm is likely to terminate earlier. On the other hand, we also notice that predictive accuracy decreases with higher parameter values. The reason is that lower values impose more stringent restrictions on the quality of the accepted rules, whereas higher values are likely to admit less accurate rules (see section 5.2.1).
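The trade-off just described can be illustrated with an acceptance test of the following form. This is a hypothetical sketch, not the paper's actual criterion: we simply assume that a rule is admitted into the final theory if its error rate on the examples it covers is at most the parameter value (here called `alpha`).

```python
def acceptable(rule_covers, examples, alpha):
    # Hypothetical noise-tolerant acceptance test: admit a rule if the
    # fraction of covered examples it misclassifies (covered negatives)
    # is at most alpha. With alpha = 0.0 this degenerates to requiring
    # consistency; larger alpha admits less accurate rules earlier.
    covered = [(x, y) for x, y in examples if rule_covers(x)]
    if not covered:
        return False
    errors = sum(1 for _, y in covered if not y)
    return errors / len(covered) <= alpha
```

Under this reading, raising the parameter lets more rules pass, so the theory is completed sooner (lower runtime) at the cost of admitting rules that cover more negative examples (lower accuracy).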
What we hope to find, therefore, is an ``optimal'' range for the parameter in which WINRIPNT outperforms IRIP in terms of runtime while maintaining about the same accuracy. For example, Figure 12 shows the results of IRIP and WINRIPNT with a suitable parameter setting in the KRK domain with 20% noise added. Both perform about the same in terms of accuracy, but there is some gain in runtime. Although the difference is not as remarkable as in noise-free domains, the growth functions clearly differ: IRIP's runtime grows faster with increasing sample sizes, while WINRIPNT's growth function appears to be sublinear. Unfortunately, the performance of the algorithm seems to be quite sensitive to the choice of this parameter, so that we cannot give a general recommendation for its setting.^{16} Figure 13 shows the graphs for three different parameter settings at varying levels of noise. In terms of runtime, it is obvious that higher parameter values produce lower runtimes. In terms of accuracy, one of the settings appears to be a consistently good choice, although, for example, at a noise level of 30% another setting is clearly preferable when the performance in both dimensions is considered. With further increasing noise levels, all algorithms become prohibitively expensive, but they are able to achieve surprisingly good results in terms of accuracy. From this and similar experiments, we conclude that there seems to be some correlation between an optimal choice of the parameter and the noise level in the data.

We have also tested WINRIPNT on a discretized, two-class version of Quinlan's 9172-example thyroid diseases database, where the windowing algorithms achieved improvements in terms of both runtime and accuracy. These experiments are discussed in previous work [29]. However, later experiments showed that this domain exhibits the same kind of behavior as the Shuttle domain shown in Figure 7: most versions of WINRIPNT consistently outperformed IRIP in terms of accuracy when learning the minority class, while the picture reversed when learning the majority class. Runtime gains could be observed in both cases. However, these results can only be regarded as preliminary. In order to answer the question of whether our results in the KRK domain with artificial class noise are predictive of the algorithm's performance on real-world problems, a more elaborate study must be performed.
In summary, we have discussed why noisy data appear to be a problem for windowing algorithms and have argued that three basic problems have to be addressed in order to adapt windowing to noisy domains. We have also shown that, with three particular choices for these procedures, it is in fact possible to achieve runtime gains without sacrificing accuracy. However, we do not claim that these three choices are optimal; indeed, we are convinced that better choices are possible and should lead to a more robust learning system.