Fall 2010
Information Extraction:
Machine Learning Approaches to Extracting Structured Information from Text

Instructor and Venue

Instructor: William Cohen, Machine Learning Dept and LTI
TA: Ni Lao
When/where: Mon/Wed 1:30-2:50, Gates 4101
Course Number: 10-707, cross-listed in LTI as 11-748



Information extraction is the task of finding names of entities in unstructured or partially structured text and determining the relationships that hold between those entities. More succinctly, information extraction is the problem of deriving structured factual information from text.

This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction.

Last modified: Tue Sep 07 10:40:59 Eastern Daylight Time 2010