CMU Text-Learning Group Data Archive

This file is /afs/cs/project/theo-3/www/index.html

Here you will find a collection of text data sets, ranging from newsgroup articles to collections of web pages. We are always looking for more data sets to include in the archive, so if you have any text-related data sets you would like to submit, please follow the link below on adding to the archive.

All of the data referenced by this page can be found under the directory /afs/cs.cmu.edu/project/theo-3/. There you will find one directory for data, one for results, one for training models, and one for data packages (tarred and gzipped bundles).

General Info
Info on adding to the archive

Data

Results

This section contains pointers to results obtained from the data listed above.

Learned Models

This section contains pointers to knowledge models obtained from the data listed above.

Dataset Packages

This section will (eventually) contain pointers to tarred and gzipped data sets that are publicly distributable.

jr6b@cs.cmu.edu / Last modified 3/7/97