Language understanding is the problem of mapping natural language text to a semantic representation grounded in the real world. Many real-world applications depend on language understanding, including information extraction, robot command understanding, and natural language interfaces. In general, understanding natural language requires both background knowledge and awareness of the situated environmental context; background knowledge can be provided by a web-scale logical knowledge base, and environmental information can be provided by a camera or other sensors.
Obtaining labeled training data is a major challenge in language understanding. Web-scale knowledge bases contain hundreds or thousands of predicates, and we further expect their predicate vocabularies to grow over time. Systems incorporating environmental context must, for example, learn the name of every object in their environment. Manually annotating a corpus of thousands of semantic parses or object names is expensive. These applications therefore need procedures that train language understanding systems from naturally occurring data.
The thesis of the proposed work is that semantic parsers can be trained for large knowledge bases and situated environments using naturally occurring forms of supervision and existing NLP resources. I propose to combine semantic parsing with vector space semantics to maximize generalization over both words and phrase structures. I also propose to use statistics from large corpora (i.e., the web) to improve generalization by reducing sparsity. I present several preliminary applications of language understanding -- using both web-scale knowledge bases and environmental context -- that suggest the promise of this line of research.
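To make the semantic parsing task concrete, here is a minimal toy sketch (not the proposed system) of what it means to map a question to a logical form and execute it against a knowledge base. The predicate `cityIn`, the lexicon entries, and the knowledge-base contents below are all hypothetical examples invented for illustration.

```python
# Toy knowledge base: a hypothetical predicate mapping cities to states.
KB = {
    "cityIn": {("pittsburgh", "pennsylvania"), ("seattle", "washington")},
}

# Toy lexicon: maps a natural-language phrase to a KB predicate.
LEXICON = {
    "cities in": "cityIn",
}

def parse(question):
    """Map a question to a logical form (predicate, argument).

    A real semantic parser would produce compositional logical forms;
    this phrase-matching stand-in only illustrates the input/output types.
    """
    for phrase, predicate in LEXICON.items():
        if phrase in question:
            argument = question.split(phrase)[1].strip(" ?")
            return (predicate, argument)
    return None

def execute(logical_form):
    """Evaluate the logical form against the knowledge base."""
    predicate, argument = logical_form
    return sorted(x for (x, y) in KB[predicate] if y == argument)

lf = parse("what are the cities in washington?")
print(lf)           # ('cityIn', 'washington')
print(execute(lf))  # ['seattle']
```

The annotation bottleneck described above corresponds to the lexicon and logical forms in this sketch: labeling them by hand for thousands of predicates is exactly the cost that naturally occurring supervision is meant to avoid.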
Tom Mitchell (Chair)
Luke Zettlemoyer (University of Washington)
deb@cs.cmu.edu