10-702 CALD
15-802 Computer Science
36-712 Statistics
80-802 Philosophy
Statistical Approaches to Learning and Discovery

Spring 2003


Time: Mondays and Wednesdays 12:00 pm - 1:20 pm

Location:  Wean Hall 5409

Instructors:  John Lafferty   (lafferty@cs.cmu.edu), Tom Mitchell (tom.mitchell@cs.cmu.edu)  and Teddy Seidenfeld (teddy@stat.cmu.edu)

Teaching Assistant:  Tianjiao Chu (tchu@andrew.cmu.edu)

Recommended Texts: There is no required textbook. However, the following are recommended optional texts. Both will be placed on reserve in the CMU Engineering and Science Library:

Background Reference Material:  These books provide useful background on Bayesian Statistics.

Written Requirements:  There will be 5 homework assignments. Homeworks are worth full credit at the due date/time, half credit for the next 48 hours, and zero credit after that. You must turn in at least n-1 of the n assignments in order to pass the course. We will drop out your lowest homework score when calculating your final grade.

Exams: There will be a mid-term examination, and a final examination. We will not be able to reschedule exams for individuals, so be sure you are available on the final exam date - we will announce this date as soon as we receive it from the registrar. x

Grading: Grading will be based half on homeworks, half on exams.

Tentative Topic List and Schedule

Below is a schedule of class meetings and a tentative choice of topics to be covered up through the midterm exam.
Jan 13
  • Course overview, introductions.
Example: statistical learning and brain imaging
Jan 15
Basic concepts from statistics
and information theory
  • Fisher information
  • Cramer-Rao bounds
  • Maximum likelihood estimators
  • sufficiency
Lecture slides A (postscript, 4up)
Lecture slides B (postscript, 4up)
Jan 20
No class - Martin Luther King Day
Jan 22
Bayesian inference concepts
  • Asymptotics
  • Data reduction
  • Conjugate priors
  • Role of models
Lecture slides
Jan 22
Assignment 1 out; Due January 29 in class.
    Concepts: Suffuciency, MLEs, conjugacy, estimating a mixture model
Assignment 1
Jan 29, Feb 3, Feb 5
Decision theoretic concepts
  • Risk and admissibility
  • Examples of relevant loss structures
  • Bayes rules and the complete class theorem
Lecture slides A (postscript;postscript, 4up)
Lecture slides B (postscript, 4up)
Lecture slides C
Feb 10, 12, 17
Expectation Maximization
  • General theory
  • Missing data
  • Exponential family
  • Rate of convergence
  • Counterexamples
  • Aspect model
Lecture notes A
Lecture notes B
Feb 10
Assignment 2 out; Due February 19 in class.
    Concepts: EM, rates of convergence, exponential models
Assignment 2 (postscript)
February 19, 24, 26
Graphical models
  • directed and undirected
  • Hammersley-Clifford theorem
  • d-separation
  • Variational methods
  • Loopy belief propagation
Interactive tetrahedron, by Edoardo Airoldi
Lecture Notes on Graphical Models Feb 19
Jordan chapters passed out in class: elimination, sum-product, Markov properties, and junction tree
Lecture notes passed out in class on message passing algorithms and loopy BP.
Pointers to further details on the theoretical results discussed:
Understanding belief propagation and its generalizations, Yedidia, Freeman, and Weiss
On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs, Weiss and Freeman.
March 3
  MIDTERM EXAM IN CLASS. Open notes, not open book.
One question covering information theory and Bayesian inference, one on EM, one on Directed Graphical Models, one on Undirected Graphical Models
March 5,10
Causal inference
  • dependence and causality
  • Tetrad
Introduction to causal inference
Slides on causal inference
Causal inference and structural equation modeling
Mar 19
Assignment 3 out; Due April 2 in class.
    Concepts: Causality, D-separation equivalence, structural equation models, Tetrad.
Assignment 3
Data set for problem 4
Mar 17, 19
Generative and Discriminative Classifiers
  • Generative vs. discriminative models
  • Loss functions and MAP estimation
  • Naive Bayes vs. Logistic regression
Rubenstein, D. and Hastie, T. " Discriminative vs Informative Learning" KDD, 1997.

On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. Andrew Y. Ng and Michael Jordan. in NIPS 14, 2002.

Class slides
Mar 31, Apr 2
Learning from labeled and unlabeled data
  • Using unlabeled data to reweight labeled examples
  • EM with generative models
  • CoTraining
  • Using unlabeled data to detect overfitting
Lecture notes ( ppt ) ( pdf )

Text classification from labeled and unlabeled documents using EM, K. Nigam et al.

Combining labeled and unlabeled data with Co-Training, A. Blum and T. Mitchell

Metric-based methods for adaptive model selection and regularization, D. Schuurmans and F. Southey
Apr 7,9
Model selection
  • Cross validation
  • Bootstrap
Methods and criteria for model selection, J. Kadane and N. Lazar

April 14, 16
Kernel methods
  • Mercer's theorem & Mercer kernels
  • Reproducing kernel Hilbert spaces
  • Representer theorem
Assignment 4
April 16, 21
Sampling methods
  • Gibbs sampling
  • Gibbs as a generalized EM procedure
  • Accept/reject
  • Metropolis-Hastings
Lecture notes
Apr 23
Assignment 5 out; Due April 30 in class.
    Concepts: Bootstrap, delta method
Assignment 5
April 23
Variational methods
Guest lecture: Prof. Zoubin Ghahramani
  Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., and Saul L.K. (1999) An Introduction to Variational Methods for Graphical Models. Machine Learning, 37:183-233.

Beal, M.J. and Ghahramani, Z. (in press) The Variational Bayesian EM Algorithm for Incomplete Data with Application to Scoring Graphical Model Structures. To appear in Bayesian Statistics 7.

Lecture slides

April 28
Laplace's method, posterior approximations
  Accurate approximations for posterior moments and marginal densities, by L. Tierney and J. Kadane, JASA, Vol. 81, (March 1986).
April 30
Active learning
  Slides on sequential decisions
May 2
Final out; Due May 12, 5:00 p.m.
    Concepts: Decision theory, normal approximations, kernels, generative-discriminative models