Topics (w/ videos of relevant lectures from a past year) | Dates | Required reading (due BEFORE class) | Additional Background
Course Overview, Statistical Language Modeling, Computational Linguistics, Statistical Decision Making, and the Source-Channel Paradigm (video1, video2) | 8/28, 8/30 | [MS] 1.1-1.3, 2.1 |
All About Words: Types, Tokens and Vocabularies (video); Type-Token Data | 9/6 | [MS] 1.4 | [BCW] ch. 4; Zipf's law
Unigrams: Statistical Estimation, Maximum Likelihood Estimates (video1, video2, video3) | 9/11, 9/13 | [MS] 6.2.1 | [mD] ch. 6, esp. 6.5
| 9/18, 9/25 | [MS] 6.2.2 |
| 9/27, 10/4 | [MS] 6.1, 6.3; [sK]; Chen & Goodman 98 (pp. 1-21) |
Measuring Success: Information Theory, Entropy and Perplexity (video1, video2, video3) | 10/9, 10/11, 10/16, 11/8 | [MS] 2.2 |
| 10/18 | [MS] 14.1; Lattice LM |
| 10/23, 10/25, 10/30 | [MS] ch. 9 |
Maximum Entropy Models, Whole-Sentence Models, Semantic Modeling (video1, video2, video3, video4) | 10/30, 11/1, 11/6, 11/13 | Adam Berger's online tutorial; Convexity, Maximum Likelihood, and All That | [MS] 16.2; Noah Smith's tutorial; [BBDM]; [rR]; [rR slides]; Using MCMC with language
Latent Variable Models, EM Algorithm (video1, video2, video3) | 11/13 | [MS] 14.2; Notes by Guy Lebanon: Derivation of EM for Gaussian mixture; EM derivation shortcut for exponential family |
Review | 11/15 | |
Exam | 11/16, 7pm, DH 2315 | |
Guest lecture by Prof. Bhiksha Raj: EM for sound separation | 11/20 | |
Probabilistic Context-Free Grammars (PCFG), the Inside-Outside Algorithm (video1, video2) | 11/27 | [MS] 11.1-11.4 |
Semantic representation (embedding): Latent Semantic Analysis, Dimensionality Reduction (video), cf. Word2vec | 11/29 | Bellegarda 00; Yan Liu's Slides; Hofmann 99; Gildea and Hofmann 99; Raux and Singh 04 |
| 12/4, 12/6 | Mandatory class attendance |
Overflow material:
Syntactic Language Models (video) | 11/29 | |
| | [MS] 16.1 |
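The "All About Words" session above centers on the type-token distinction and on Zipf's law ([BCW] ch. 4). A minimal sketch of both, using an illustrative toy text rather than any course dataset:

```python
from collections import Counter

def type_token_stats(tokens):
    """Return token count N, type count V, and the type/token ratio."""
    counts = Counter(tokens)
    return len(tokens), len(counts), len(counts) / len(tokens)

# Illustrative toy text, not course data.
text = ("to be or not to be that is the question "
        "whether tis nobler in the mind to suffer").split()
n_tokens, n_types, ttr = type_token_stats(text)

# Zipf's law: when types are ranked by frequency, frequency is roughly
# proportional to 1/rank, so rank * frequency stays roughly constant.
ranked = [c for _, c in Counter(text).most_common()]
zipf_products = [(rank + 1) * freq for rank, freq in enumerate(ranked)]
print(n_tokens, n_types, ttr)
print(zipf_products[:5])
```

On a text this small the rank-frequency products are noisy; the regularity only emerges clearly on corpora of the scale discussed in [MS] 1.4.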
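Two recurring themes in the schedule, maximum-likelihood unigram estimation ([MS] 6.2.1) and perplexity as an evaluation measure ([MS] 2.2), can be sketched together. The toy corpus and the choice of add-one smoothing here are illustrative assumptions, not course materials:

```python
import math
from collections import Counter

def unigram_mle(tokens):
    """Maximum-likelihood unigram estimates: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def unigram_add_one(tokens, vocab):
    """Add-one (Laplace) smoothing: P(w) = (count(w) + 1) / (N + |V|)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def perplexity(model, test_tokens):
    """Perplexity = 2 ** cross-entropy, in bits per token."""
    log_prob = sum(math.log2(model[w]) for w in test_tokens)
    cross_entropy = -log_prob / len(test_tokens)
    return 2 ** cross_entropy

# Illustrative toy corpus.
train = "the cat sat on the mat the cat ran".split()
vocab = set(train)
mle = unigram_mle(train)
smoothed = unigram_add_one(train, vocab)

# MLE assigns zero probability to unseen words; smoothing reserves mass
# for them, which is why smoothed perplexity on the training data is higher.
print(perplexity(mle, train), perplexity(smoothed, train))
```

On held-out data containing unseen words the comparison reverses: the MLE model's perplexity is infinite, which is the motivation for the smoothing readings ([MS] 6.2.2, [sK], Chen & Goodman 98).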
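The EM session points to Guy Lebanon's derivation of EM for a Gaussian mixture. A compact sketch of that algorithm for a two-component 1-D mixture; the synthetic data and the simple quartile initialization are illustrative assumptions:

```python
import math
import random

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Crude initialization: place the means at the data quartiles.
    xs = sorted(data)
    mu = [xs[len(xs) // 4], xs[3 * len(xs) // 4]]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [pi[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

# Synthetic data: two well-separated Gaussians (illustrative, not course data).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(5.0, 1.0) for _ in range(200)]
pi, mu, var = em_gmm_1d(data)
print(sorted(mu))  # the estimated means should land near 0 and 5
```

Each iteration provably increases the data likelihood; the exponential-family shortcut mentioned in the reading gives the same M-step updates more directly.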
Abbreviations (in order of appearance):
[MS] Manning and Schütze, Foundations of Statistical Natural Language Processing.
[BCW] Bell, Cleary and Witten, Text Compression.
[mD] Morris DeGroot, Probability and Statistics, 2nd edition.
[sK] Slava M. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400-401, 1987.
[BBDM] L. Bahl, P. Brown, P. de Souza and R. Mercer, "A tree-based statistical language model for natural language speech recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 7, pp. 1001-1008, 1989.
[rR] Roni Rosenfeld, "A maximum entropy approach to adaptive statistical language modeling", Computer Speech and Language, vol. 10, pp. 187-228, 1996.