Computer Science Thesis Presentation

Project Presentations
5th Year Scholars Masters Candidate
Computer Science Department
Carnegie Mellon University
Relation Extraction using Distant Supervision, SVMs, and Probabilistic First-Order Logic
Friday, May 9, 2014 - 2:00pm
Traffic21 Classroom 6501 
Gates&Hillman Centers
Abstract:

We are drowning in information and having difficulty finding knowledge, that is, useful and actionable information. Recent studies estimate that humanity has stored in excess of 295 exabytes of data (one exabyte is 10^18 bytes). Much of this data is stored as unstructured text, such as news articles, message boards and forums, texts, emails, status updates, tweets, and nearly a billion webpages.

In this thesis, we present a solution for extracting the knowledge contained in vast amounts of unstructured text. We define our problem as one of relation extraction: given a document, extract all instantiations of well-defined binary relations present in the text. To this end, we use distant supervision and a novel probabilistic first-order logic system, combined with co-reference resolution, to identify candidate relation instances. These candidates are then classified by a series of cost-augmented, binary one-vs-all Support Vector Machines (SVMs) to produce the final relation extractions. On a corpus of 5.7 million newswire articles and 29 different relations, the system achieves an F1 of 37.32%.
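
For readers unfamiliar with the terms above, the following is a minimal Python sketch of the general idea behind distant supervision and the F1 metric. It is not the system presented in the thesis: the toy knowledge base `KB`, the relation names, and the functions `distant_label` and `f1_score` are all hypothetical, and the sketch omits the probabilistic first-order logic, co-reference resolution, and SVM classification steps entirely. It assumes the simplest form of distant supervision, in which a sentence is treated as a candidate instance of a relation whenever both entities of a known fact appear in it.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical toy knowledge base: (entity1, entity2) -> relation name.
# A real system would draw these facts from a large existing database.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Larry Page"): "founded_by",
}


def distant_label(sentences: List[str]) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, entity1, entity2, relation) candidates.

    A sentence becomes a candidate for a relation when both entities of
    a known fact occur in it as substrings. The labels are noisy, since
    co-occurrence does not guarantee the sentence expresses the relation.
    """
    labeled = []
    for sentence in sentences:
        for (e1, e2), relation in KB.items():
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, e1, e2, relation))
    return labeled


def f1_score(predicted: Set[Hashable], gold: Set[Hashable]) -> float:
    """Harmonic mean of precision and recall over sets of extractions."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    corpus = [
        "Barack Obama was born in Honolulu.",
        "Larry Page co-founded Google.",
        "Honolulu is in Hawaii.",
    ]
    for candidate in distant_label(corpus):
        print(candidate)
```

In a pipeline like the one described in the abstract, candidates of this kind would serve as noisy training data for the per-relation classifiers, and F1 would be computed by comparing the final extractions against a gold-standard set.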

Thesis Committee:
William Cohen
Tom Mitchell

Keywords:
For More Information, Please Contact:

tracyf [atsymbol] cs ~replace-with-a-dot~ cmu ~replace-with-a-dot~ edu