
Language Technologies Thesis Defense

Thesis Orals
Ph.D. Student
Language Technologies Institute
Carnegie Mellon University
Lexical Semantic Analysis in Natural Language Text
Thursday, June 19, 2014 - 3:30pm
6115 
Gates & Hillman Centers
Abstract:

Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with engineering of statistical natural language processing tools. The framework does not require any lexical resource or syntactic parser, so it will be relatively simple to adapt to new languages and domains.

The contributions of this thesis are: a formal representation of lexical segments and coarse semantic classes; a well-tested linguistic annotation scheme with detailed guidelines for identifying multiword expressions and categorizing nouns, verbs, and prepositions; an English web corpus annotated with this scheme; and an open source NLP system that automates the analysis by statistical sequence tagging. Finally, we motivate the applicability of lexical semantic information to sentence-level language technologies (such as semantic parsing and machine translation) and to corpus-based linguistic inquiry.

Thesis Committee:
Noah Smith (Chair)
Lori Levin
Ed Hovy
Chris Dyer
Tim Baldwin (University of Melbourne)

For More Information, Please Contact:

staceyy [atsymbol] cs ~replace-with-a-dot~ cmu ~replace-with-a-dot~ edu