## Noise is a Problem

An efficient adaptation of the basic windowing technique shown in
Figure 1 to noisy domains is a non-trivial
endeavor. In particular, simply using a noise-tolerant learning algorithm
inside the windowing loop cannot be expected to yield performance gains in
noisy domains. In our opinion, the main problem with windowing in a noisy
domain is that a good theory will misclassify most of the noisy examples,
which are consequently incorporated into the learning window for the next
iteration. At the same time, the window typically contains only a subset of
the original learning examples. Hence, after a few iterations, the
proportion of noisy examples in the learning window can be much higher
than the noise level in the entire data set. Naturally, this makes the
task for the learning module considerably more difficult.
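
The effect can be quantified with a rough back-of-the-envelope sketch
(assuming, purely for illustration, a data set of $N$ examples with a
uniform noise rate $\nu$, a randomly drawn window of $w$ examples, and a
theory that errs only on the noisy examples). After all misclassified
examples are added, the noise proportion of the window becomes

$$
\frac{\nu w + \nu (N - w)}{w + \nu (N - w)}
\;=\; \frac{\nu N}{w + \nu (N - w)}
\;>\; \nu
\qquad \text{for } w < N,\ \nu < 1,
$$

so a single iteration already pushes the proportion of noisy examples in
the window above the noise level of the full data set.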

Assume, for example, that your favorite noise-tolerant learner has
learned a correct theory from a randomly selected starting window of
size 1000 in an 11,000-example domain. Further assume that 10% of the
examples are labeled incorrectly. Therefore, the correct theory will
misclassify 1000 of the remaining 10,000 examples because they are
noisy. These examples will consequently be added to the window, thus
doubling its size. Assuming that the original window also contained
about 10% noise, more than half of the examples in the new window are
now erroneous, so that the class labels in the window are essentially
random. It can be expected that many more examples have to be added to the
window before the structure inherent in the data can be recovered. This
conjecture is consistent with the
experimental results of [59] and
[9], which showed that windowing is highly sensitive
to noise.
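
The arithmetic of this example can be checked with a minimal sketch (the
figures are those quoted above; the assumption that the theory errs only on
the noisy examples is an idealization):

```python
# Back-of-the-envelope check of the window's noise proportion after one
# windowing iteration, using the figures from the example in the text.

total_examples = 11_000   # size of the whole data set
window_size = 1_000       # randomly selected starting window
noise_rate = 0.10         # fraction of incorrectly labeled examples

# Noisy examples in the randomly drawn starting window (~10%).
noisy_in_window = round(noise_rate * window_size)      # 100

# A correct theory misclassifies exactly the noisy examples among the
# remaining data, and windowing adds all of them to the window.
remaining = total_examples - window_size               # 10,000
added = round(noise_rate * remaining)                  # 1,000, all of them noisy

new_window_size = window_size + added                  # 2,000
new_noisy = noisy_in_window + added                    # 1,100

print(f"window size after one iteration: {new_window_size}")
print(f"noisy examples in the window:    {new_noisy}")
print(f"noise proportion:                {new_noisy / new_window_size:.0%}")  # 55%
```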
