CMU World Wide Knowledge Base (Web->KB) project


Our goal is to develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web. If successful, this will make text information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.


We are developing a system that can be trained to extract symbolic knowledge from hypertext, using a variety of machine learning methods.


Our first experiments focused on extracting knowledge about computer science departments. We have assembled two data sets for this task:

Automatic Corpus Construction from the Web



