Noise Handling

**Figure 4:** Plots of the errors (top) and sizes (bottom) of WIDC(p), C4.5 and Bayes rule against various class noise levels (left) and attribute noise levels (right).

Noise handling is a crucial issue for boosting [Bauer Kohavi1999,Opitz Maclin1999], and is even considered [Bauer Kohavi1999] to be potentially its main problem. Experimental studies show that substantial noise levels can alter the vote to the point that its accuracy falls below that of a single one of its classifiers [Opitz Maclin1999]. Opitz $\&$ Maclin (1999) point to the reweighting scheme applied to the examples in boosting as a potential reason for this behavior. Though we do not use any reweighting scheme, we have chosen, for the sake of completeness, to study the behavior of WIDC(p) against noise, and to compare its results with what is perhaps the major induction algorithm sharing our ``top-down and prune'' induction scheme: C4.5 [Quinlan1994]. This study relies on the XD6 domain, in which we replace the original $10\%$ class noise [Buntine Niblett1992] by increasing amounts of class noise ranging from $0\%$ to $40\%$ by steps of $2\%$, or by increasing amounts of attribute noise in the same range. The XD6 domain has the advantage that its target concept is known, and it has been addressed in a substantial amount of previous experimental work. For each noise level, we have simulated a corresponding dataset of 512 examples. Each such dataset was processed by WIDC(p) and C4.5, using a 10-fold cross-validation procedure. Figure 4 depicts the results obtained for the errors and for the sizes of the classifiers. The size of a DC is its total number of literals, and that of a DT is its number of internal nodes.
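The data simulation described above can be sketched as follows. This is only an illustrative reconstruction, not the code used for the experiments. It assumes the usual description of XD6 (ten Boolean attributes, with target concept $(a_1 \wedge a_2 \wedge a_3) \vee (a_4 \wedge a_5 \wedge a_6) \vee (a_7 \wedge a_8 \wedge a_9)$ and an irrelevant tenth attribute), uniformly drawn examples, and an attribute noise model in which each attribute value is flipped independently; none of these details is stated in this section. The names `xd6_target`, `make_dataset` and `ten_fold_splits` are hypothetical.

```python
import random

N_ATTRIBUTES = 10   # assumption: 9 relevant attributes plus 1 irrelevant one
N_EXAMPLES = 512    # dataset size stated in the text
# Noise levels from 0% to 40% by steps of 2%, as stated in the text.
NOISE_LEVELS = [p / 100 for p in range(0, 41, 2)]


def xd6_target(x):
    """Assumed XD6 concept: (a1 a2 a3) or (a4 a5 a6) or (a7 a8 a9)."""
    return int(any(all(x[i:i + 3]) for i in (0, 3, 6)))


def make_dataset(class_noise=0.0, attr_noise=0.0, n=N_EXAMPLES, seed=0):
    """Draw n uniform examples, label them, then corrupt them."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(N_ATTRIBUTES)]
        y = xd6_target(x)  # label computed on the clean example
        # Attribute noise (assumed model): flip each bit independently.
        x = [b ^ 1 if rng.random() < attr_noise else b for b in x]
        # Class noise: flip the label with probability class_noise.
        if rng.random() < class_noise:
            y ^= 1
        data.append((x, y))
    return data


def ten_fold_splits(data, k=10, seed=0):
    """Yield (train, test) pairs for a k-fold cross-validation."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    for f in range(k):
        test = set(idx[f::k])  # every k-th shuffled index forms one fold
        train_part = [data[i] for i in idx if i not in test]
        test_part = [data[i] for i in idx if i in test]
        yield train_part, test_part
```

Under these assumptions, one dataset per entry of `NOISE_LEVELS` would be generated with either `class_noise` or `attr_noise` set to that level, and each learner would be run on every pair produced by `ten_fold_splits`.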
While resistance to noise seems to be fairly evenly shared between WIDC(p) and C4.5 (WIDC(p) seems to perform better for class noise, while C4.5 seems to perform better for attribute noise), a more interesting phenomenon emerges from the sizes of the induced formulas. First, the DCs show very small size fluctuations compared to the DTs: for class noise levels greater than $20\%$, the size of the DTs increases by a factor of 1.5 to 2. Second, note that the ratio between the number of nodes of the target DT and the number of literals of the target DC is 3. For a majority of class or attribute noise levels, the ratio between the sizes of the DTs built and the DCs built is

, with a pathological case for $10\%$ attribute noise, for which the ratio is

. These remarks, along with the fact that the DCs built have a very reasonable size compared to that of the target DC for any type and level of noise, tend to show that WIDC(p) handles noise well. Apart from these considerations, inspection of the DCs output by WIDC(p) shows that, even for large noise levels, it manages to find concepts syntactically close to the target DC. For example, one of the DCs output at $30\%$ class noise is exactly the target DC; moreover, it is only for class noise $\geq 12\%$ (and attribute noise $\geq 16\%$) that some of the DCs found no longer syntactically include the target DC.