Consistency-Check

The first choice we have to make is to find a criterion for determining which rules should be added to the final theory, and which rules should be further improved. A straight-forward idea is to use some sort of significance test in order to determine whether a rule that has been learned from the current window is significant on the entire set of training examples. We have experimented with a variety of criteria known from the literature, but found that they are insufficient for our purposes. For example, it turned out that, at higher training set sizes, CN2's likelihood ratio significance test [13], which accepts a rule when the distribution of positive and negative examples covered by the rule differs significantly from the class distribution in the entire set, will deem any rule learned from the window as significant.

Eventually, we have settled for the following criterion: For each rule
*r* learned from the current window we compute two accuracy estimates,
*AccWin*(*r*) which is determined using only examples from the current
window and *AccTot*(*r*) which is estimated on the entire training
set. The criterion we use for detecting good rules consists of two
parts:

- 1.
- The
*AccWin*(*r*) estimate has to be significantly above the default accuracy of the domain. This is ensured by requiring that

where

*DA*is the default accuracy, and is the standard error of classification [7]. - 2.
- The accuracy of the learned rule on the window has to be
within a certain interval of its overall
accuracy:
^{14}

is a user-settable parameter that can be used to adjust the width of this range.

The parameter
determines the degree to which the estimates
*AccWin*(*r*) and *AccTot*(*r*) have to correspond. A setting of
requires that
*AccWin*(*r*) = *AccTot*(*r*), which in general will only
be happen if *r* is consistent or has been learned from the entire
training set. This is the recommended setting in noise-free domains.
In noisy domains, values of
have to be used because the
rules returned from the learning algorithm will typically be
inconsistent on the training set. Note, however, that a setting of
in a noisy domain will *not* lead to over-fitting
and a decrease in predictive accuracy because over-fitting is caused
by too optimistic estimates of a rule's accuracy. The chance of
accepting such rules decreases with the value of .
Clearly, if
the chance of a rule being accepted decreases, the run-time of
algorithm increases because windowing has to perform more
iterations. The extreme case,
,
causes the window to grow
until it contains all examples. Then the rule is accepted by the
second criterion because the examples used for estimating *AccWin*(*r*)and *AccTot*(*r*) are basically the same. Typical settings in noisy
domains are
or
.
will
move all rules that have survived the first criterion into the final
rule set. In as much as the learner is sufficiently noise-tolerant and
the initial example size is sufficiently large, the algorithm
implements learning from a random subsample if
.