11-761/11-661 Fall 2017 Course Syllabus

Each topic below lists its dates, the required reading (due BEFORE class), and additional background; topic titles link to videos of relevant lectures from a past year.

Course Overview, Statistical Language Modeling, Computational Linguistics, Statistical Decision Making, and the Source-Channel Paradigm (video1, video2)
Dates: 8/28, 8/30
Required reading: [MS] 1.1-1.3, 2.1
Background: historical overview by Lafferty

All About Words: Types, Tokens and Vocabularies (video); Type-Token Data
Dates: 9/6
Required reading: [MS] 1.4
Background: [BCW] ch. 4; Zipf's law
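As a concrete companion to the types-vs-tokens distinction covered in this unit, here is a minimal sketch of counting both (the sample sentence is illustrative, not from the course materials):

```python
from collections import Counter

# "Tokens" are running words; "types" are distinct vocabulary items.
text = "to be or not to be".split()
num_tokens = len(text)           # 6 running words
num_types = len(Counter(text))   # 4 distinct words: to, be, or, not
print(num_tokens, num_types)     # 6 4
```

The token/type ratio grows with corpus size, which is exactly the behavior Zipf's law and [BCW] ch. 4 characterize.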

Unigrams: Statistical Estimation, Maximum Likelihood Estimates (video1, video2, video3)
Dates: 9/11, 9/13
Required reading: [MS] 6.2.1
Background: [mD] ch. 6, esp. 6.5
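As a reference point for this unit, a minimal sketch of maximum-likelihood unigram estimation, P(w) = count(w)/N (the toy corpus is illustrative only):

```python
from collections import Counter

def mle_unigram(tokens):
    """Maximum-likelihood unigram estimates: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

corpus = "the cat sat on the mat".split()
p = mle_unigram(corpus)
print(p["the"])  # 2/6, i.e. 0.333...
```

Note that any word unseen in training gets probability zero under the MLE, which motivates the smoothing unit that follows.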

Sparseness; Smoothing (video1, video2)
Dates: 9/18, 9/25
Required reading: [MS] 6.2.2
Background: Good-Turing
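As a pointer into the Good-Turing background reading, a toy sketch of the basic adjusted count r* = (r+1) N_{r+1} / N_r (illustrative only; practical implementations first smooth the N_r counts, which this sketch does not):

```python
from collections import Counter

def good_turing_adjusted_counts(tokens):
    """Basic Good-Turing: replace each raw count r with
    r* = (r + 1) * N_{r+1} / N_r, where N_r is the number of
    distinct types observed exactly r times."""
    counts = Counter(tokens)
    freq_of_freqs = Counter(counts.values())  # N_r
    adjusted = {}
    for word, r in counts.items():
        n_r = freq_of_freqs[r]
        n_r1 = freq_of_freqs.get(r + 1, 0)
        # When N_{r+1} is zero the estimate degenerates; fall back to r.
        adjusted[word] = (r + 1) * n_r1 / n_r if n_r1 else r
    return adjusted
```

For example, on the tokens ["a", "a", "b", "c"] the singletons "b" and "c" (N_1 = 2, N_2 = 1) get the discounted count 2 * 1/2 = 1.0, freeing probability mass for unseen events.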

N-grams: Linear Interpolation; Backoff (video1, video2)
Dates: 9/27, 10/4
Required reading: [MS] 6.1, 6.3; [sK]
Background: Chen & Goodman 98 (pp. 1-21)

Measuring Success: Information Theory, Entropy and Perplexity (video1, video2, video3)
Dates: 10/9, 10/11, 10/16, 11/8
Required reading: [MS] 2.2
Background: Elements of Information Theory; visual information theory; IT notes: Entropy of English
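For the evaluation unit, a sketch of computing perplexity, 2 to the power of the per-token cross-entropy in bits (the uniform model below is a sanity check, not a course assignment):

```python
import math

def perplexity(probs, tokens):
    """Perplexity = 2 ** (average negative log2-probability per token)."""
    log_sum = sum(math.log2(probs[w]) for w in tokens)
    return 2 ** (-log_sum / len(tokens))

# A uniform model over a 4-word vocabulary has perplexity exactly 4:
# the model is as uncertain as a fair 4-way choice at every position.
uniform = {w: 0.25 for w in ["a", "b", "c", "d"]}
print(perplexity(uniform, ["a", "b", "c", "d"]))  # 4.0
```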

Clustering (video1, video2)
Dates: 10/18
Required reading: Class LM
Background: [MS] 14.1; Lattice LM

Hidden Markov Models (video1, video2, video3)
Dates: 10/23, 10/25, 10/30
Required reading: [MS] ch. 9
Background: Larry Rabiner's classic HMM tutorial
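As a companion to the Rabiner tutorial, a minimal sketch of the forward algorithm (Rabiner's "Problem 1": the total probability of an observation sequence under an HMM); the model parameters below are illustrative:

```python
def forward(pi, A, B, obs):
    """Forward algorithm for an HMM.
    pi[s]   : initial probability of state s
    A[s][t] : transition probability s -> t
    B[s][o] : probability that state s emits observation o
    obs     : sequence of observation indices
    Returns P(obs) summed over all state paths."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(n)) * B[t][o]
                 for t in range(n)]
    return sum(alpha)
```

The same trellis, with max in place of sum, gives the Viterbi algorithm for the best single state path.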

Maximum Entropy Models, Whole-Sentence Models, Semantic Modeling (video1, video2, video3, video4)
Dates: 10/30, 11/1, 11/6, 11/13
Required reading: Adam Berger's online tutorial; Convexity, Maximum Likelihood, and All That
Background: [MS] 16.2; Noah Smith's tutorial; [BDD]; [rR]; [rR slides]; using MCMC with language

Latent Variable Models, EM Algorithm (video1, video2, video3)
Dates: 11/13
Required reading: [MS] 14.2; Notes by Guy Lebanon: Derivation of EM for Gaussian mixture; EM derivation shortcut for exponential family
Background: More advanced EM notes by John Lafferty
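As a small worked companion to the EM readings, a sketch of EM for a mixture of two biased coins with equal mixing weights (a standard textbook example, not taken from the course notes; data and initialization are illustrative):

```python
def em_two_coins(flips, theta=(0.6, 0.5), iters=20):
    """EM for a two-coin mixture: each row of `flips` is (heads, tails)
    drawn from one of two coins with unknown identity."""
    tA, tB = theta
    for _ in range(iters):
        # E-step: expected heads/tails attributed to each coin,
        # weighted by the posterior responsibility of coin A per row.
        sA_h = sA_t = sB_h = sB_t = 0.0
        for h, t in flips:
            lA = tA ** h * (1 - tA) ** t     # likelihood under coin A
            lB = tB ** h * (1 - tB) ** t     # likelihood under coin B
            rA = lA / (lA + lB)              # P(coin A | this row)
            sA_h += rA * h; sA_t += rA * t
            sB_h += (1 - rA) * h; sB_t += (1 - rA) * t
        # M-step: re-estimate each coin's heads probability.
        tA = sA_h / (sA_h + sA_t)
        tB = sB_h / (sB_h + sB_t)
    return tA, tB
```

Starting from nearly identical coins, the algorithm separates heads-heavy from tails-heavy rows; each iteration is guaranteed not to decrease the data likelihood, which is the core property the lectures derive.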

Review
Dates: 11/15

Exam
Date: 11/16, 7pm
Location: DH2315

Guest lecture by Prof. Bhiksha Raj: EM for sound separation
Date: 11/20

Probabilistic Context-Free Grammars (PCFG), the Inside-Outside Algorithm (video1, video2)
Dates: 11/27
Required reading: Notes on Probabilistic Context-Free Grammars
Background: [MS] 11.1-11.4

Semantic representation (embedding): Latent Semantic Analysis, Dimensionality Reduction (video); cf. Word2vec
Dates: 11/29
Required reading: Bellegarda 99; Indexing by latent semantic analysis
Background: Bellegarda 00; Yan Liu's slides; Hofmann 99; Gildea and Hofmann 99; Raux and Singh 04

Final project presentations (video1, video2)
Dates: 12/4, 12/6
Class attendance is mandatory.

Overflow material:

Syntactic Language Models (video)
Dates: 11/29
Required reading: Jelinek and Chelba 99
Background: Chelba Slides 98

Decision Tree Language Models (video1, video2, video3)
Required reading: [BBDM]
Background: [MS] 16.1

Abbreviations (in order of appearance):

[MS] Manning and Schütze, Foundations of Statistical Natural Language Processing.

[BCW] Bell, Cleary and Witten, Text Compression.

[mD] Morris DeGroot, Probability and Statistics, 2nd edition.

[sK] Slava M. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400-401, 1987.

[BBDM] L. Bahl, P. Brown, P. de Souza and R. Mercer, "A tree-based statistical language model for natural language speech recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 7, pp. 1001-1008, 1989.

[rR] Roni Rosenfeld, "A maximum entropy approach to adaptive statistical language modeling", Computer Speech and Language, vol. 10, pp. 187-228, 1996.