Statistical Parsing Triptych: Jeopardy, Morphosyntax, and M-Estimation
Noah Smith, LTI, CMU


This talk covers three recent advances in statistical parsing: an application, an algorithmic solution, and a learning solution.

The first part of the talk presents our work using quasi-synchronous dependency grammars, an elegant model originally designed for machine translation, in question answering. By modeling loose answer-to-question transformations at the level of bare-bones dependency structure, we achieve notably high performance on a TREC-style answer-selection task (Wang, Smith, and Mitamura, EMNLP-CoNLL 2007).

The second part of the talk turns to parsing algorithms. While much research has been devoted to parsing algorithms for languages that have clear morpheme boundaries (e.g., English), it is not clear what to do when a language displays morphological ambiguity as well. We describe two efficient ways to apply models for morphological and syntactic disambiguation in tandem, giving significant gains on parsing the Hebrew Treebank (Cohen and Smith, EMNLP-CoNLL 2007).

The third part of the talk turns to a learning problem. Since log-linear ("maximum entropy") models were first applied to NLP at IBM in the 1990s, they have been widely used. Training them, however, is very expensive for models of sequences and trees. We present a novel, generative parameter estimation algorithm for log-linear structure models based on a generalization of maximum likelihood estimation called M-estimation. We compare this method to existing learning algorithms on a shallow parsing task (Smith, Vail, and Lafferty, ACL 2007).


Noah Smith is Assistant Professor of Language Technologies at Carnegie Mellon University. His research has spanned statistical machine translation, parallel corpus discovery, unsupervised statistical grammar induction, efficient morphological and syntactic processing algorithms, weighted logic programming, and the formal study of weighted grammars. He is a Hertz Fellow (2001-2006), the recipient of an IBM Faculty Award (2007), and a member of the DARPA Computer Science Study Panel (2007).

Venue, Date, and Time

Venue: NSH 1507

Date: Monday, November 26

Time: 12:00 noon