CMU World Wide Knowledge Base (Web->KB) project


The goal of the project is to develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web. If successful, this will make text information on the Web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.


We are developing a system that can be trained to extract symbolic knowledge from hypertext, using a variety of machine learning methods.
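As a rough illustration of what "trained to extract symbolic knowledge from hypertext" can mean in practice, the sketch below trains a small multinomial naive Bayes classifier on labeled page text and turns each prediction into a symbolic fact. Everything in it is an assumption made for illustration: the choice of naive Bayes, the class names (course, faculty, student), the toy training pages, the example URL, and the helper names tokenize, train, classify, and to_fact. None of it is taken from the project's actual system.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def train(examples):
    # examples: list of (page_text, class_label) pairs.
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    # Pick the class with the highest log posterior, using add-one smoothing.
    class_counts, word_counts, vocab = model
    total_pages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_pages)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def to_fact(url, label):
    # Hypothetical symbolic form: (class, instance).
    return (label, url)


# Toy, hand-written training pages (hypothetical, not project data).
TRAINING_PAGES = [
    ("homework lecture syllabus exam instructor", "course"),
    ("lecture notes assignments grading syllabus", "course"),
    ("professor research publications teaching students", "faculty"),
    ("professor publications research interests awards", "faculty"),
    ("graduate student advisor thesis resume", "student"),
    ("student thesis advisor hobbies resume", "student"),
]

model = train(TRAINING_PAGES)
page_url = "http://example.edu/some-page.html"  # placeholder URL
fact = to_fact(page_url, classify(model, "syllabus and homework for this lecture"))
```

Here fact pairs the predicted class with the page it was predicted for, which is the kind of entry a knowledge base could store. A real system would also need to use hyperlink structure and to extract relations between pages, which this sketch does not attempt.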


The first experiments focused on extracting knowledge about computer science departments. We have assembled two data sets for this task:

Other Datasets used by the WebKB Group

Related research on machine learning and text:

See the other research on text learning by our research group.


Overview of Cora, a related project:

Automatic Corpus Construction from the Web



Project Alumni:

