This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning students who want to know about current problems and solutions in text processing.
Students will, upon completion, understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models for problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and *ACL). A recurring theme will be the tradeoffs between computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout.
Each student will give a ~20-minute oral presentation on their literature review. A period of discussion will follow, in which we will aim to find connections between student topics. The driving questions will be: What can be borrowed from one area and applied to another? And what challenges are not being met by current methods?
A final written exam will be given to test basic competence with the technical material covered in the lectures.
|Dates (tentative)||Topic||Readings||Lecture Slides|
|8/29||Philosophy: the empirical way of thinking about language.||Pereira, 2000; Abney, 1996|
|8/31-9/5||Stochastic models for sequences: Markov models, hidden Markov models, and related algorithms||Manning & Schütze, 1999 (ch. 9); Smith, 2004 (works through an HMM example); Eisner, 2002 (MS Excel spreadsheet illustrating forward-backward)||pdf1, pdf2|
|9/7-9/14||Log-linear/exponential/maximum entropy models, conditional estimation, CRFs, regularization, and convex optimization||There's a lot of tutorial material on these kinds of models. Here are some starting points: Adam Berger's page, Adwait Ratnaparkhi's tutorial, a handout I made for another class. Research papers: Lafferty, McCallum, and Pereira, 2001; Chen and Rosenfeld, 1999; Khudanpur and Wu, 2000; Rosenfeld, Chen, and Zhu, 2000; Della Pietra, Della Pietra, and Lafferty, 1997||pdf1, pdf2, pdf3|
|9/19-9/21||Interspeech (no lecture)|
|9/26-9/28||Weighted finite-state technology||Eisner, 2002; Stolcke and Omohundro, 1993; Mehryar Mohri's list of references will be helpful if you want to know about algorithms for FSTs. Karttunen, 2001 will tell you all about two-level morphology using FSTs.||pdf1, pdf2|
|10/3-10/12||Stochastic grammars and statistical parsing||Johnson, 1998. Papers about some important parsers: Charniak, 1997; Charniak, 2000; Collins, 2003; Klein and Manning, 2003; McDonald, Pereira, Ribarov, and Hajic, 2005||pdf1, pdf2, pdf3, pdf4|
|10/17-10/19||Weighted dynamic programming||Goodman, 1999; Eisner, Goldlust, and Smith, 2005; if you're in love, Shieber, Schabes, and Pereira, 1995||pdf1, pdf2|
|10/24||Discriminative training: perceptron, boosting, maximum margin estimation||Collins, 2002; Taskar and Klein's tutorial at ACL 2005 on maximum margin methods|
|10/26||Information extraction (guest lecture: Vitor Carvalho)||Cohen and McCallum's tutorial at KDD 2003; Siefkes and Siniakov, 2005|
|10/31-11/2||Discriminative training (continued) and reranking; transformation-based learning||Collins, 2000; Collins and Duffy, 2002; Brill, 1992||pdf1, pdf2|
|11/7||Unsupervised learning: clustering and EM, clustering words||Brown et al., 1992; Pereira, Tishby, and Lee, 1993; Schütze, 1993|
|11/9-11/14||The EM algorithm for structured models, with hidden and partially hidden data; contrastive estimation||Merialdo, 1994; Pereira and Schabes, 1992; Klein and Manning, 2002; Smith and Eisner, 2005||pdf1, pdf2|
|11/16||Semisupervised learning: Yarowsky algorithms, co-training.||Yarowsky, 1995; Blum and Mitchell, 1998; Nigam and Ghani, 2000; Abney, 2004|
|11/21||Experimentation and hypothesis testing.|
|11/28||Final presentations||3:00 Mengqiu Wang: question answering; 3:30 Yitao Sun: syntactic language modeling|
|11/30||Final presentations||3:00 David Huggins-Daines: optimality theory|
|12/5||Final presentations||3:00 Kevin Gimpel: topic modeling; 3:30 Greg Hanneman: statistical syntactic machine translation (1)|
|12/7||Final presentations||3:00 Amr Ahmed: statistical syntactic machine translation (2); 3:30 Jaime Arguello: unsupervised morphology induction|