Several authors have discussed approaches that use sub-sampling
algorithms different from windowing. For decision tree algorithms it
has been proposed to use dynamic sub-sampling at each node in order to
determine the optimal test. This idea has been originally proposed,
but not evaluated by [7]. It was further explored in Catlett's
work on *peepholing* [11], which is a sophisticated
procedure for using sub-sampling to eliminate unpromising attributes
and thresholds from consideration.

[35] discuss a different approach that successively expands the learning window. ELC adds randomly selected examples to the window until an extrapolation of the learning curve (accuracy over window size) does no longer promise significant gains for further enlargements of the window. In terms of run-time, the authors note that this technique can in general only gain efficiency for incremental learning algorithms.

*Partitioning*, proposed by
[21,20], splits the example
space into segments of equal size and combines the rules learned on
each partition. This technique produced promising
results in noisy domains, but has substantially decreased learning
accuracy in non-noisy domains. Besides, the technique seems to be
tailored to a specific learning algorithm and not applicable to a
wider group of rule learning algorithms [20].

Sub-sampling techniques have recently been frequently used for
increasing the accuracy of a classifier. The basic idea is to train
classifiers on multiple subsamples of the data and combine their
predictions, usually by voting. Such approaches include *bagging* [5], where all examples are sub-sampled with
equal probabilities, and *boosting*
[22,24] -- also known as *arcing*
[4] -- where examples that have been misclassified in previous
iterations are more likely to be selected in the next
iterations. Interestingly, [5] has noted that these
techniques rely on unstable base learners, while we conjecture that
the better performance of windowing with rule learning algorithms is
due to more stability (section 3.3). While the main
focus of these works is to improve the accuracy of a given learning
algorithm, it would be interesting to evaluate bagging and boosting
techniques along their run-time and memory requirements as well. For
example, [6] discusses arcing algorithms for datasets
where limited memory requires the use of sub-sampling.

In Knowledge Discovery for Databases, sub-sampling techniques have been investigated for the discovery of association rules. [56] describes a straight-forward approach, where association rules are discovered from a subsample of the data and their validity is checked on the complete dataset. Kivinen and Mannila give theoretical bounds for the sample size that is required for establishing (with a given maximum error probability) the truth of association rules [36] or functional dependencies [37] that have been discovered from the sample.