Structured Prediction for Language and Other Discrete Data
(10-710, 11-763)

Instructors: Prof. Noah Smith, Prof. William Cohen
History: Taught in Fall 2011
Prerequisite: Machine Learning (10-601 or 10-701) or instructors' permission
Current information about this course is available on the wiki.

Course Description

This course seeks to cover statistical modeling techniques for discrete, structured data such as text. It brings together content previously covered in Language and Statistics 2 (11-762) and Information Extraction (10-707 and 11-748), and aims to define a canonical set of models and techniques applicable to problems in natural language processing, information extraction, and other application areas. Upon completion, students will have a broad understanding of machine learning techniques for structured outputs, will be able to develop appropriate algorithms for use in new research, and will be able to critically read related literature. The course is organized around methods, with example tasks introduced throughout. We expect that the course will be of interest not only to LTI and MLD students, but also to students in the Lane Center, RI, and CSD.

Format

  1. Synthesis (45%): Students will be expected to make substantial contributions to a public wiki that synthesizes material covered in the course. The wiki environment allows students to leverage each other's contributions and illustrate connections across different research ideas. It allows the instructors to track each contributions. It also provides a resource to students and researchers at CMU and elsewhere; each year the wiki will evolve to cover more recent literature. Most of the synthesis requirement will be due before the middle of the semester.
  2. Practicum (45%): Students will work in small teams to complete projects on relevant topics of their choosing. Projects are expected to directly explore ideas covered in the course, and to have an experimental component. The use and extension of existing libraries and tools is encouraged. Project proposals must include milestones, and regular graded reports will discourage procrastination. The project will be a greater focus in the second half of the course.
  3. Oral presentation (10%): At the end of the course, each team will either give a brief oral presentation or present at a poster session, depending on course size.
No midterm or final exam is planned.

Topics

Subject to change. Parenthesized numbers are approximate numbers of lectures.
  1. Sequence models: HMMs and MEMMs for part-of-speech tagging, BIO tagging/chunking, and segmentation; CRFs; cyclic models and pseudolikelihood (3)
  2. Large margin models: structured and ranking perceptrons; structured SVMs (3)
  3. Inference: dynamic programming; search; integer linear programming; stacking and Searn (3)
  4. Tree models: PCFGs and phrase-structure parsing; spanning trees and dependency parsing (3)
  5. Kernels: kernels for inputs and relation extraction; kernels for outputs, reranking, and non-local features (2)
  6. Alignment models: edit distances for text or genomics; weighted FSTs; non-monotonic alignment and machine translation (3)
  7. Incomplete data: EM; latent-variable CRFs and SVMs; regularizers from unlabeled data; graph-based semisupervised learning; associative Markov networks; Bayesian grammars (6)

Readings

Many lectures will be accompanied by a reading from recent literature (typically machine learning or natural language processing publications from the past decade). Supplementary readings will be suggested from Prof. Smith's book, Linguistic Structure Prediction.