Active Learning

Next: Complexity Reduction Up: Related Work Previous: Sub-Sampling Techniques

Active Learning

Windowing techniques are also closely related to the field of active learning. According to the term's original definition (within the field of Machine Learning), active learning includes ``any form of learning in which the learning program has some control over the inputs it trains on.'' [15]. While this definition would be broad enough to include windowing techniques, subsequent work in this area has mostly concentrated on the use of membership queries, i.e., on giving the learner the means to query the classification of examples of its own choice instead of providing it with a fixed set of labeled data.

Such approaches are based on a new motivation for studying sub-sampling techniques (in addition to the three motivations listed in the Introduction to this paper):

Expensive Labeling:: In many domains, it is very expensive to obtain pre-classified training examples, while unlabeled examples are cheaply available. Sub-sampling techniques can help to solve this problem by focussing on minimizing the number of examples that have to be labeled in order to learn a satisfying theory.

The prevalent example of such a domain is the World-Wide Web, which provides an abundance of training pages for text categorization problems. However, significant effort is required to assign semantic categories to these pages. Not surprisingly, much of the recent work in active learning has concentrated on text categorization problems.

Closely related to windowing is uncertainty sampling [41,40]. The difference is that the window is not adjusted on the basis of misclassified examples, but on the basis of the learner's confidence in its own predictions. The examples that are classified with the least confidence will be added to the training set in the next iteration (after obtaining their class labels).

However, not all learning algorithms are able to attach uncertainty estimates to their predictions. Besides, using uncertainty estimates from a single learning algorithm may be problematic in some cases [15]. Thus, it was suggested to use a committee of classifiers and measure the uncertainty in the predictions by the degree of disagreement among the classifiers. The selective sampling technique proposed by [15] is one such technique, which is based on a theoretical framework that uses the entire version space of consistent theories as a committee. Another version of this approach is the query by committee algorithm [55,25], which uses a probability distribution over hypotheses to randomly select two consistent hypotheses for classifying a new example. If their predictions differ, the algorithm asks for the true label of the example and adds it to the training set. Committee-based sampling [16] is an adaptation of this idea to probabilistic classifiers. [42] compare several ways of combining the predictions of a committee of Winnow-based learners on a text categorization task

Obviously, the above-mentioned approaches can also be applied to situations where a large amount of labeled data is available. An investigation of the suitability of these techniques for increasing the efficiency of learning algorithms and a comparison to our solutions for this problem is left for future work. However, it should be noted that these techniques are not susceptible to the problem of noise in the form we discussed in section 5.1, because they would simply ignore the class labels during the subsampling process.

It is less obvious that windowing techniques might also be useful in the presence of large amounts of unlabeled data. However, several authors have recently started to investigate the idea that, instead of submitting the predictions that the learner is most uncertain about to a teacher for labelling, the learner might autonomously label the predictions it is most certain about and add them to the training set. Proposals include the use of a committee of learners for co-training [3] or techniques based on the Expectation Maximization (EM) algorithm [48], possibly coupled with an active learning procedure [43]. If research in this direction is pushed further, the number of labeled examples available to a learning algorithm might be greatly enlarged at the expense of some incorrectly labeled examples. Noise-tolerant windowing algorithms may turn out to be an appropriate choice for such data sets.

Next: Complexity Reduction Up: Related Work Previous: Sub-Sampling Techniques