10708 Probabilistic Graphical Models

Date	Lecture	Topics	Readings	Handouts
Module 1: Representation
Wed 9th Sept	Lecture 1 : Introduction Slides Annotated Slides	Introduction to and Examples of Graphical Models Logistics A broad overview The running example: hidden Markov model	Chpt. 1 An Introduction to Graphical Models
Mon 14th Sept	Lecture 2 : An Introduction to Bayesian Networks Slides Annotated Slides	Representation of Bayesian Networks Bayesian networks Factorization theorem Local structure and independencies I-MAPs I-equivalence, Minimal I-MAPs, Perfect MAP Examples : Gaussian models, HMM	Chpt. 3, 7.1
Wed 16th Sept	Lecture 3 : An Introduction to Undirected Graphical Models Slides Annotated Slides	Representation of Markov Random Fields Clique potentials Local and global Markov independencies Hammersley-Clifford theorem Soundness and completeness in Markov random fields Examples : Boltzmann machines, Ising models, Gaussian graphical models, conditional random fields	Chpt. 4, Optional : Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
Mon 21st Sept	Lecture 4 : A unified view of BN and MRF Slides Annotated Slides	A unified view of BN and MRF Graphical lasso for structure learning in Gaussian graphical models Minimal I-MAPs from BN to MRF : Markov blanket Minimal I-MAPs from MRF to BN : chordal graphs and triangulation	Chpt. 4.5, Pseudo-Likelihood Based Structure Estimation using Neighborhood Estimation: Neighborhood Selection in Gaussian Graphical Models Likelihood Based Structure Estimation of GGMs: Glasso
Module 2: Basic Inference and Learning Methods
Wed 23rd Sept	Lecture 5 : Learning one-node GM Slides Annotated Slides	Learning one-node GMs Parameter learning from IID models Maximum likelihood estimation Bayesian and frequentist parameter estimation Example : Bernoulli model, multinomial model, plate model Hierarchical Bayesian estimation for multinomials and Gaussians Multinomial model : Dirichlet versus Logistic Normal prior	Chpt. 17.1, 17.3 Pattern Recognition and Machine Learning : Chpt. 2.3	HW1 out
Mon 28th Sept	Lecture 6 : Learning two-node GM Slides Annotated Slides	Two-node graphical models Generative v/s discriminative classifiers. Optimal classification via Bayes classifier Maximum likelihood v/s Bayesian estimation of conditional Gaussians. Linear regression Least mean squares or Widrow-Hoff learning rule. Bayesian linear regression and regularized regression	Required : On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. Pattern Recognition and Machine Learning : Chpt. 1.2, 9.2, 9.3
Wed 30th Sept	Lecture 7 : Exponential Families Slides Annotated Slides	Generalized Linear Models (GLIM) Exponential Families Examples : linear regression, logistic regression, multivariate gaussian distribution, multinomial distribution Moment estimation MLE and batch learning for GLIMs Iteratively weighted least squares MLE for BNs : decomposable likelihood Relationship with KL divergence Supervised parameter estimation for HMMs	Chpt 17.2, 17.3, 17.4 Additional Readings : 1. Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions. 2. A Characterization of the Dirichlet Distribution Through Global and Local Parameter Independence
Mon 5th Oct	Lecture 8 : Variable Elimination Slides Annotated Slides	Inference via Elimination Probabilistic inference : likelihood, conditional probability, most probable assignment Marginalization and elimination Examples : Elimination on chains, hidden Markov models, and CRFs Sum-product operation Dealing with evidence variables	Chpt. 9.1, 9.2, 9.3, 9.4
Wed 7th Oct	Lecture 9 : Belief Propagation Slides Annotated Slides	Belief Propagation From elimination to message passing Message passing for trees Correctness of Belief Propagation Parallel synchronous versus sequential implementation Factor graphs Max-product algorithm	Chpt. 10.1, 10.2, 10.3 A useful tutorial is here.	HW1 due. HW2 out. Project Proposal due.
Mon 12th Oct	Lecture 10 : Junction Trees Slides Annotated Slides	Junction Trees From Elimination to Message Passing Junction Tree Algorithm Triangulation Case Study : Hidden Markov Model Forward Backward Algorithm Viterbi Algorithm	Chpts 10.1, 10.2, 10.3, 11.3, 10.4, Forward Backward Search Algorithm, Viterbi Algorithm
Wed 14th Oct	Lecture 11 : Expectation-maximization algorithm Slides Annotated Slides	Expectation-maximization algorithm Mixture Models Gaussian Mixture Models (GMMs) Expectation Maximization (EM) - learning from partially observed data Lower bounds and free energy EM for HMMs - Baum-Welch algorithm EM for general BNs, conditional mixture-of-experts model	Chpt. 19.1, 19.2.2, 19.2.3 A tutorial on HMMs Some interesting aspects of EM
Module 3 : Case Studies : Popular Graphical Models
Mon 19th Oct	Lecture 12 : HMM and CRF Slides Annotated Slides	Hidden Markov Models Forward Backward Algorithm (Junction Tree algorithm) Viterbi decoding Supervised and unsupervised learning (Baum-Welch) for HMMs Maximum entropy Markov models, label bias Conditional Random Fields CRF inference and learning	1. A tutorial on HMMs 2. CRF Tutorial by Hanna Wallach 3. The original CRF paper 4. Shallow Parsing with CRFs
Wed 21st Oct	Lecture 13 : Multivariate Gaussian models, Gaussian graphical models Slides Annotated Slides	Gaussian graphical models Covariance v/s precision matrix in GGMs Sparse covariance matrix v/s sparse precision matrix The Meinshausen-Buhlmann (MB) algorithm : graph regression L1-regularized maximum likelihood learning KELLER: Kernel Weighted L1-regularized Logistic Regression	1. Covariance Selection - the original GGM paper by Dempster 2. Meinshausen-Buhlmann algorithm 3. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian 4. Glasso	HW2 due. HW3 out.
Mon 26th Oct	Lecture 14 : State space models Slides Annotated Slides	Factor Analysis Constrained Covariance Gaussian Inference and EM for factor analysis Independent Components Analysis State Space Models (SSMs) Online v/s Offline Inference in SSMs Kalman Filter Rauch-Tung-Striebel smoother Nonlinear systems : extended Kalman filter	1. Chpt. 15.4 2. An introduction to the Kalman filter 3. Variational Learning for Switching State-Space Models 4. A discrete state-space model for linear image processing
Wed 28th Oct	Lecture 15 : Complex Graphical Models Slides Annotated Slides	Complex Dynamic Networks Dynamic Bayesian Networks (DBNs) Factorial HMM (fHMM), switching HMMs, Hidden Markov Decision Trees Switching SSMs Junction tree for coupled HMMs Latent Semantic Indexing, Topic Models, Admixture Models, Mixed Membership Models, Latent Dirichlet Allocation	1. Latent Semantic Indexing 2. Dynamic Bayesian Networks 3. Factorial Hidden Markov Models 4. Latent Dirichlet Allocation
Module 4: Approximate Inference
Mon 2nd Nov	Lecture 16 : Variational inference I Slides Annotated Slides	Energy Functional, KL Divergence Bethe Approximation to Gibbs Free Energy Bethe = BP on Factor Graphs Loopy Belief Propagation Region-based Approximations to the Gibbs Free Energy (Kikuchi) Generalized Belief Propagation	1. Chpt. 11.1, 11.2, 11.3 2. Stable fixed points of loopy belief propagation are minima of the Bethe free energy
Wed 4th Nov	Lecture 17 : Variational inference II Slides	Mean parametrization for exponential family GMs Variational inference Bethe variational inference, connection to sum-product Kikuchi approximation Mean Field and KL divergence	1. Chpt. 11 2. Bethe free energy, Kikuchi approximations, and belief propagation algorithms	HW3 due. HW4 out. Midway progress report due.
Mon 9th Nov	Lecture 18 : Monte Carlo 1 Slides Annotated Slides	Monte Carlo methods Direct sampling Rejection sampling, importance sampling Likelihood weighting Rao-Blackwellised sampling	Chpt. 12.1, 12.2
Wed 11th Nov	Lecture 19 : Monte Carlo 2 Slides Annotated Slides	Markov Chains Metropolis-Hastings Gibbs sampling	Chpt. 12.3, 12.4
Module 5 : Advanced learning methods
Mon 16th Nov	Lecture 20 : Applications 1 : Topic Models Slides Annotated Slides	Structured and semantic-driven browsing of dynamic, multi-modal information Examples : Ideological polarity, total scene understanding, machine translation, topic evolution Mixed membership models aka topic models Bayesian inference : variational inference, collapsed Gibbs sampling. Joint topic and perspective models Evolving social networks Mixed membership stochastic block model, generalized mean field.	1. Latent Dirichlet allocation : David M. Blei, Andrew Ng, Michael Jordan 2. A correlated topic model of Science : David M. Blei and John D. Lafferty 3. On Tight Approximate Inference of Logistic-Normal Admixture Model : Amr Ahmed and Eric P. Xing 4. An introduction to variational methods for graphical models : MI Jordan, Z Ghahramani, TS Jaakkola, LK … 5. Graph partition strategies for generalized mean field inference : E.P. Xing, M.I. Jordan and S. Russell 6. Finding scientific topics : Griffiths, Steyvers 7. A Joint Topic and Perspective Model for Ideological Discourse : W.-H. Lin, E.P. Xing, and A. Hauptmann 8. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework : L.-J. Li, R. Socher and L. Fei-Fei 9. HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation : B. Zhao and E.P. Xing 10. Mixed membership stochastic block models for relational data, with applications to protein-protein interactions : E.M. Airoldi, D.M. Blei, E.P. Xing and S.E. Fienberg
Wed 18th Nov	Lecture 21 : MLE of undirected graphical models Slides Annotated Slides	MLE for graphical models Conditions on clique marginals for MLE MLE for decomposable undirected models Iterative proportional fitting IPF minimizes KL divergence MLE of feature based models : Generalized Iterative Scaling (GIS) Maximum entropy formulation	1. Chpt. 20.1, 20.2, 20.3 2. Generalized iterative scaling for log-linear models
Mon 23rd Nov	Lecture 22 : Max-margin learning of graphical models Slides Annotated Slides	Conditional Random Fields (CRFs) Max-margin Markov Networks (M3Ns) Large Margin Estimation, Min-max Formulation Primal and Dual Problems of M3Ns Maximum Entropy Discrimination Markov Networks Gaussian/Laplacian MaxEnDNet Supervised Topic Models	1. Max-Margin Markov Networks 2. Laplace Maximum Margin Markov Networks 3. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification	HW4 due.
Wed 25th Nov	No Class
Mon 30th Nov	Lecture 23 : Nonparametric Bayesian Models Slides Annotated Slides	Model selection vs. posterior inference Relationship between Dirichlet process and infinite mixtures Dirichlet process, stick breaking, Chinese restaurant process Approximate inference via MCMC, variational inference, e.g. haplotype inference Hierarchical Dirichlet process and multi-task clustering Hidden Markov Dirichlet process, temporal DPM	1. Bayesian Haplotype Inference via the Dirichlet Process 2. Variational inference for Dirichlet process mixtures 3. Collapsed variational Dirichlet process mixture models 4. Hierarchical Dirichlet processes 5. Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space	Project Poster Session
Wed 2nd Dec	Lecture 24 : How to put things together Slides Annotated Slides	Representation, model semantics. Topic models, choice of priors LoNTAM variational inference Evaluation, testing inference Deterministic annealing Supervised LDA, medLDA		Final Project Report due.

Course Description