Combining Labeled and Unlabeled Data with Co-Training

We consider the problem of using a large unlabeled sample to boost the performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together so that inexpensive unlabeled data can augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and each algorithm's predictions on new unlabeled data are then used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can significantly improve learned hypotheses in practice.

For the most up-to-date copy of the paper: http://www.cs.cmu.edu/~avrim/Papers/cotrain.ps.gz
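
To make the strategy concrete, the following is a minimal sketch of such a co-training loop, not the algorithm or experiments from the paper itself: it substitutes synthetic data for the two web-page views, uses scikit-learn naive Bayes classifiers, and picks illustrative values for the pool sizes and number of rounds.

```python
# Hypothetical co-training sketch (illustrative, not the paper's algorithm):
# two classifiers, one per view, each adding its most confident prediction
# on unlabeled data to the shared labeled pool every round.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for two views of 400 examples (e.g. page words vs. link words).
y_all = rng.integers(0, 2, size=400)
view1 = y_all[:, None] + rng.normal(scale=1.0, size=(400, 5))
view2 = y_all[:, None] + rng.normal(scale=1.0, size=(400, 5))

# Small labeled pool (5 per class); everything else is treated as unlabeled.
labeled = [int(i) for i in np.where(y_all == 0)[0][:5]] + \
          [int(i) for i in np.where(y_all == 1)[0][:5]]
unlabeled = [i for i in range(400) if i not in labeled]
known = {i: int(y_all[i]) for i in labeled}

clf1, clf2 = GaussianNB(), GaussianNB()

for _ in range(20):                                  # co-training rounds
    idx = np.array(labeled)
    labels = np.array([known[i] for i in labeled])
    clf1.fit(view1[idx], labels)
    clf2.fit(view2[idx], labels)
    if not unlabeled:
        break
    u = np.array(unlabeled)
    # Each classifier nominates the unlabeled example it is most confident about;
    # its predicted label is added to the shared pool that also trains the other.
    for clf, view in ((clf1, view1), (clf2, view2)):
        proba = clf.predict_proba(view[u])
        best = int(np.argmax(proba.max(axis=1)))
        i = int(u[best])
        if i in unlabeled:                           # skip if taken this round
            known[i] = int(clf.classes_[proba[best].argmax()])
            labeled.append(i)
            unlabeled.remove(i)

# Combine the two views at prediction time by averaging class probabilities.
p = (clf1.predict_proba(view1) + clf2.predict_proba(view2)) / 2
preds = clf1.classes_[p.argmax(axis=1)]
print(f"accuracy on all examples: {(preds == y_all).mean():.2f}")
```

The key property this sketch relies on is the one stated above: each view's classifier can confidently label some unlabeled examples that the other view's classifier then benefits from, which is how the unlabeled data enlarges both training sets.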