Towards Noise-Tolerant Windowing

The integrative windowing algorithm described in Section 4, which is only applicable to noise-free domains, is based on the observation that rule learning algorithms re-discover good rules again and again in successive iterations of the windowing procedure. Such consistent rules add no examples to the current window (hence they are unlikely to change), yet plain windowing has to re-learn them in every iteration. Integrative windowing detects these rules early on, saves them, and removes all examples they cover from the window, thus gaining computational efficiency.
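
The loop just described can be illustrated with a minimal Python sketch for the noise-free case. It is not the implementation from section 4: the single-test rule format, the toy learner `learn_rules`, and the parameters `init_size` and `seed` are assumptions made only to keep the example self-contained. The sketch also removes covered examples from the data outside the window, and it assumes that no example occurs twice.

```python
import random

# Assumption: an example is (features, label) with a tuple of features;
# a rule is a single test (attribute_index, value). Both are illustrative.
def covers(rule, features):
    attr, value = rule
    return features[attr] == value

def learn_rules(window):
    """Toy separate-and-conquer learner, a stand-in for a real rule learner."""
    rules = []
    positives = [x for x, y in window if y]
    negatives = [x for x, y in window if not y]
    while positives:
        candidates = sorted({(i, v) for x in positives for i, v in enumerate(x)})
        # Keep only tests that cover no negative example in the window.
        consistent = [r for r in candidates
                      if not any(covers(r, n) for n in negatives)]
        if not consistent:
            break
        best = max(consistent,
                   key=lambda r: sum(covers(r, p) for p in positives))
        rules.append(best)
        positives = [p for p in positives if not covers(best, p)]
    return rules

def integrative_windowing(examples, init_size=4, seed=0):
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    window, rest = pool[:init_size], pool[init_size:]
    theory = []
    # Continue while some positive example is still unexplained.
    while any(y for _, y in window + rest):
        progress = False
        to_add = []
        for rule in learn_rules(window):
            false_pos = [e for e in rest if not e[1] and covers(rule, e[0])]
            if false_pos:
                # Inconsistent outside the window: its errors join the window.
                to_add += [e for e in false_pos if e not in to_add]
            else:
                # Consistent on all data: save the rule and drop what it covers.
                theory.append(rule)
                window = [e for e in window if not covers(rule, e[0])]
                rest = [e for e in rest if not covers(rule, e[0])]
                progress = True
        # Positives outside the window that no saved rule covers are added too.
        to_add += [e for e in rest if e[1]]
        window += to_add
        rest = [e for e in rest if e not in to_add]
        if not progress and not to_add:
            break  # the toy learner cannot explain the remaining positives
    return theory
```

A saved rule is never learned again, because the positive examples that would lead the learner back to it are no longer in the window. The test for `false_pos` is exactly the step that breaks down with noise: a single mislabeled negative is enough to reject an otherwise good rule.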

In order to adapt this procedure for noisy domains, three parts of the algorithm have to be modified:

Consistency-Check:

When is a rule learned from the window good enough?

In the noise-free case, all rules that do not cover any negative examples are added to the final theory. For noisy data, a criterion has to be found that allows rules to cover some noisy negative examples.

Completeness-Check:

When should we stop adding rules to a theory?

In the noise-free case, rules are added to the theory until all positive examples are covered by at least one rule. For noisy data, we have to find a criterion that estimates whether the remaining uncovered positive examples can be considered as noise or whether another rule should be added to the theory that explains (some of) them.

Resampling:

Which examples should be added to the current window?

In the noise-free case, all misclassified examples are candidates for being added to the window. However, we have seen above that this can lead to severe problems with windowing, because noisy examples will be misclassified even by a correct theory and would thus be added to the window.
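
To make the three decision points concrete, they can be viewed as three replaceable functions in the windowing loop. The Python sketch below only illustrates that interface. The purity threshold, the tolerated fraction of uncovered positives, and the random cap on added examples are placeholder assumptions, not the criteria used by the prototype, and all names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Coverage:
    pos: int  # positive examples covered by a rule
    neg: int  # negative examples covered by a rule

def consistent_enough(cov, min_purity=0.9):
    """Consistency-Check (placeholder): tolerate a few covered negatives."""
    total = cov.pos + cov.neg
    return total > 0 and cov.pos / total >= min_purity

def complete_enough(uncovered_pos, total_pos, max_uncovered=0.05):
    """Completeness-Check (placeholder): treat a small remainder as noise."""
    return total_pos == 0 or uncovered_pos / total_pos <= max_uncovered

def resample(misclassified, max_add, seed=0):
    """Resampling (placeholder): add only a capped random subset."""
    rng = random.Random(seed)
    return rng.sample(misclassified, min(max_add, len(misclassified)))
```

With `min_purity=1.0` and `max_uncovered=0.0` the first two functions reduce to the noise-free conditions, and with `max_add` at least as large as the number of misclassified examples `resample` selects all of them, which is the noise-free behaviour.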

We have built a prototype system that deals with these three questions in the way described in the following sections.
