Structured Prediction for Language and Other Discrete Data
(10-710, 11-763)

Instructors: Prof. Noah Smith (2011, 2013), Prof. William Cohen (2011), Prof. Chris Dyer (2013)
History: Taught in Fall 2011 and Fall 2013
Prerequisite: Machine Learning (10-601 or 10-701) or instructors' permission

Course Description

This course seeks to cover statistical modeling techniques for discrete, structured data such as text. It brings together content previously covered in Language and Statistics 2 (11-762) and Information Extraction (10-707 and 11-748), and aims to define a canonical set of models and techniques applicable to problems in natural language processing, information extraction, and other application areas. Upon completion, students will have a broad understanding of machine learning techniques for structured outputs, will be able to develop appropriate algorithms for use in new research, and will be able to critically read related literature. The course is organized around methods, with example tasks introduced throughout. We expect that the course will be of interest not only to LTI and MLD students, but also to students in the Lane Center, RI, and CSD.


Topics

Subject to change.
  1. Sequence models: HMMs and MEMMs for part-of-speech tagging, BIO tagging/chunking, and segmentation; CRFs; cyclic models and pseudolikelihood
  2. Large margin models: structured and ranking perceptrons; structured SVMs
  3. Inference: dynamic programming; search; integer linear programming; stacking and Searn
  4. Tree models: PCFGs and phrase-structure parsing; spanning trees and dependency parsing
  5. Kernels: kernels for inputs and relation extraction; kernels for outputs, reranking, and non-local features
  6. Alignment models: edit distances for text or genomics; weighted FSTs; non-monotonic alignment and machine translation
  7. Incomplete data: EM; latent-variable CRFs and SVMs; regularizers from unlabeled data; graph-based semi-supervised learning; associative Markov networks; Bayesian grammars
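As a concrete taste of the sequence models in topic 1 and the dynamic programming of topic 3, the following is a minimal Viterbi decoder for an HMM tagger. This is an illustrative sketch, not course material; the toy tag set and log-probability tables in the usage note below are invented for the example.

```python
import math

def viterbi(words, tags, start, trans, emit):
    """Return the highest-scoring tag sequence under an HMM.

    start, trans, and emit hold log-probabilities; missing entries
    are treated as log(0) = -inf.
    """
    # best[i][t]: log-prob of the best path tagging words[:i+1], ending in tag t
    best = [{t: start.get(t, -math.inf) + emit[t].get(words[0], -math.inf)
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Choose the best previous tag for each current tag.
            prev, score = max(
                ((p, best[i - 1][p] + trans[p].get(t, -math.inf)) for p in tags),
                key=lambda pair: pair[1])
            best[i][t] = score + emit[t].get(words[i], -math.inf)
            back[i][t] = prev
    # Recover the argmax path by following backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

With a toy determiner/noun model (e.g., `start = {"D": log 0.9, "N": log 0.1}`, emissions putting "the" under D and "dog" under N), `viterbi(["the", "dog"], ...)` recovers the tag sequence `["D", "N"]`.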

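Topic 2's structured perceptron can likewise be sketched in a few lines. The update below follows Collins-style training: decode with the current weights, then promote the gold sequence's features and demote the prediction's. The feature templates and the brute-force argmax are illustrative simplifications (a real decoder would use Viterbi), not the course's implementation.

```python
from collections import defaultdict
from itertools import product

def features(words, tags):
    """Emission and transition indicator features for a tagged sentence."""
    f = defaultdict(int)
    prev = "<s>"
    for w, t in zip(words, tags):
        f[("emit", t, w)] += 1
        f[("trans", prev, t)] += 1
        prev = t
    return f

def predict(w, words, tagset):
    """Brute-force argmax over all tag sequences (fine for toy inputs)."""
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: sum(w[k] * v
                                    for k, v in features(words, tags).items()))

def perceptron_train(data, tagset, epochs=5):
    """Structured perceptron: additive updates on decoding mistakes."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = predict(w, words, tagset)
            if list(pred) != list(gold):
                # Promote gold features, demote predicted features.
                for k, v in features(words, gold).items():
                    w[k] += v
                for k, v in features(words, pred).items():
                    w[k] -= v
    return w
```

Trained on a couple of determiner-noun sentences, the learned transition weights let the model tag an unseen word like "cat" correctly from context alone.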

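The simplest instance of topic 6's alignment models is Levenshtein edit distance, computed by the classic dynamic program over prefixes; a minimal sketch:

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    # dp[i][j]: minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i          # delete all of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j          # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = dp[i - 1][j - 1] + (a[i - 1] != b[j - 1])  # match or substitute
            dp[i][j] = min(sub,
                           dp[i - 1][j] + 1,   # delete a[i-1]
                           dp[i][j - 1] + 1)   # insert b[j-1]
    return dp[-1][-1]
```

Weighted FSTs generalize this picture: per-operation costs become arc weights, and the same dynamic program becomes shortest-path computation in the composed transducer.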
Many lectures will be accompanied by a reading from recent literature (typically machine learning or natural language processing publications from the past decade). Supplementary readings will be suggested from Prof. Smith's book, Linguistic Structure Prediction.