Language and Statistics II:
(More) Empirical Methods in Natural Language Processing

Fall 2006

Instructor: Prof. Noah Smith
Meeting time: Tuesday and Thursday 3-4:20 pm
Office hours: Tuesday and Thursday 4:30-5:30 pm, NSH 2602F (right after lecture)
Location: Doherty 1217
Prerequisite: Language and Statistics (11-761) or permission of instructor. Recommended: Algorithms for Natural Language Processing (11-711), Machine Learning (15-681, 15-781, or 11-746)

Syllabus (subject to change)

Course Description

This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning students who want to know about current problems and solutions in text processing.

Students will, upon completion, understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models for problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and *ACL). A recurring theme will be the tradeoffs between computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout.


Literature review (35%)

Tentative due dates (all by email, in PDF): September 12, 5pm (topics selected); November 10, 5pm (first draft); December 8, 5pm (final draft)
Instructions and list of suggested topics

Oral presentation and discussion (25%)

Tentative dates: weeks of November 27 and December 4

Each student will give a ~20-minute oral presentation on his/her literature review. A period of discussion will follow, in which we will aim to find connections between student topics. The driving questions will be: What can be borrowed from one area and applied to another? And what challenges are not being met by current methods?

Assignments (20%)

Small projects with a programming component will be assigned about every three weeks. Typically some or all of the software will be provided, and students will be expected to run experiments or extend the implementation. To encourage creative exploration, some projects may be graded competitively.

Final Exam (20%)

Tentative date: December 11, 1pm-4pm (location TBD)

A final written exam will be given to test basic competence with the technical material covered in the lectures.


Course Plan

Dates (tentative), topics, readings, and lecture slides:

8/29: Philosophy: the empirical way of thinking about language
  Readings: Pereira, 2000; Abney, 1996
  Slides: pdf

8/31-9/5: Stochastic models for sequences: Markov models, hidden Markov models, and related algorithms
  Readings: Manning & Schütze, 1999 (ch. 9); Smith, 2004 (works through an HMM example); Eisner, 2002 (MS Excel spreadsheet illustrating forward-backward)
  Slides: pdf1, pdf2

9/7-9/14: Log-linear/exponential/maximum entropy models, conditional estimation, CRFs, regularization, and convex optimization
  Tutorials: there is a lot of tutorial material on these kinds of models; starting points include Adam Berger's page, Adwait Ratnaparkhi's tutorial, and a handout I made for another class
  Research papers: Lafferty, McCallum, and Pereira, 2001; Chen and Rosenfeld, 1999; Khudanpur and Wu, 2000; Rosenfeld, Chen, and Zhu, 2000; Della Pietra, Della Pietra, and Lafferty, 1997
  Slides: pdf1, pdf2, pdf3

9/19-9/21: Interspeech (no lecture)

9/26-9/28: Weighted finite-state technology
  Readings: Eisner, 2002; Stolcke and Omohundro, 1993; Mehryar Mohri's list of references will be helpful if you want to know about algorithms for FSTs; Karttunen, 2001 will tell you all about two-level morphology using FSTs
  Tools: Xerox's FS group, AT&T FSM libraries, RWTH FSA toolkit
  Slides: pdf1, pdf2

10/3-10/12: Stochastic grammars and statistical parsing
  Readings: Johnson, 1998; papers about some important parsers: Charniak, 1997; Charniak, 2000; Collins, 2003; Klein and Manning, 2003; McDonald, Pereira, Ribarov, and Hajic, 2005
  Slides: pdf1, pdf2, pdf3, pdf4

10/17-10/19: Weighted dynamic programming
  Readings: Goodman, 1999; Eisner, Goldlust, and Smith, 2005; if you're in love, Shieber, Schabes, and Pereira, 1995
  Slides: pdf1, pdf2

10/24: Discriminative training: perceptron, boosting, maximum margin estimation
  Readings: Collins, 2002; Taskar and Klein's tutorial at ACL 2005 on maximum margin methods
  Slides: pdf

10/26: Information extraction (guest lecture: Vitor Carvalho)
  Readings: Cohen and McCallum's tutorial at KDD 2003; Siefkes and Siniakov, 2005

10/31-11/2: Discriminative training (continued) and reranking; transformation-based learning
  Readings: Collins, 2000; Collins and Duffy, 2002; Brill, 1992
  Slides: pdf1, pdf2

11/7: Unsupervised learning: clustering and EM, clustering words
  Readings: Brown et al., 1992; Pereira, Tishby, and Lee, 1993; Schütze, 1993
  Slides: pdf

11/9-11/14: The EM algorithm for structured models with hidden and partially hidden data; contrastive estimation
  Readings: Merialdo, 1994; Pereira and Schabes, 1992 (note corrected link); Klein and Manning, 2002; Smith and Eisner, 2005
  Slides: pdf1, pdf2

11/16: Semisupervised learning: Yarowsky algorithms, co-training
  Readings: Yarowsky, 1995; Blum and Mitchell, 1998; Nigam and Ghani, 2000; Abney, 2004
  Slides: pdf

11/21: Experimentation and hypothesis testing
  Slides: pdf

11/28: Final presentations
  3:00 Mengqiu Wang: question answering
  3:30 Yitao Sun: syntactic language modeling

11/30: Final presentations
  3:00 David Huggins-Daines: optimality theory

12/5: Final presentations
  3:00 Kevin Gimpel: topic modeling
  3:30 Greg Hanneman: statistical syntactic machine translation (1)

12/7: Final presentations
  3:00 Amr Ahmed: statistical syntactic machine translation (2)
  3:30 Jaime Arguello: unsupervised morphology induction

?: Course review

12/11: Final exam
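To give a flavor of the sequence-modeling material early in the plan (the lectures on hidden Markov models and related algorithms), here is a minimal sketch of the HMM forward algorithm in Python. The toy weather model and all numbers below are hypothetical illustrations, not taken from the course materials or readings.

```python
# Minimal sketch of the HMM forward algorithm: compute the total
# probability of an observation sequence by summing over all hidden
# state paths with dynamic programming.

def forward(obs, states, start_p, trans_p, emit_p):
    """Return p(obs) under the HMM (start, transition, emission probs)."""
    # alpha[s] = p(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

# Hypothetical two-state weather HMM, for illustration only.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

p = forward(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
```

The same recurrence with "sum" replaced by "max" gives the Viterbi algorithm for the best state path, one of the tradeoffs discussed in the sequence-models lectures.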


  1. Assignment, training data, test data (due 9/14)
  2. Assignment, training data, test data (due 10/3)
  3. Assignment, training data, test data (due 10/27 - note change!)
  4. Assignment (due 11/16)
  5. Assignment, training data (due 12/7)