Graphical Models Reading Group

Tuesdays 4pm-5:30pm, NSH4513

Apr 6
Zoubin Ghahramani
Graphical models overview (slides)
Factor Graph Propagation (slides)
Whiteboard (jpg image)
Apr 13
Variational methods
Zoubin Ghahramani
An introduction to variational methods for graphical models
Michael Jordan, Zoubin Ghahramani, Tommi Jaakkola, and Lawrence Saul

Tutorial on variational approximation methods

Tommi Jaakkola

J. M. Winn. Variational Message Passing and its Applications.
Ph.D. Thesis, Department of Physics, University of Cambridge, 2003
Chapter 2 and Chapter 5

Wiegerinck, W. Variational approximations between mean field theory and the junction tree algorithm, UAI 2000.

M. J. Wainwright, and M. I. Jordan. Graphical models, exponential families, and variational inference. UC Berkeley, Dept. of Statistics, Technical Report 649. September, 2003.
Apr 20
Expectation Propagation
(Note: starts 4:30pm, half an hour later than usual)
Yuan Qi
Intro to EP
A family of algorithms for approximate Bayesian inference
Thomas Minka Thesis, Ch. 4

Tree EP
Tree-structured Approximations by Expectation Propagation,
Thomas Minka and Yuan Qi, NIPS 2003

EP for dynamic systems
Expectation Propagation for Signal Detection in Flat-fading Channels,
Yuan Qi and Thomas Minka,
in the proceedings of IEEE International Symposium on Information Theory,
June, 2003, Yokohama, Japan

Apr 27
Structure learning (1)
Ricardo Silva, Anna Goldenberg, Fan Li
'Learning Bayesian Networks' book by Richard Neapolitan

A tutorial on learning with Bayesian networks
David Heckerman

Scheines, R. (1997). "An Introduction to Causal Inference", in Causality in Crisis, ed. Steven Turner and Vaughan McKim, University of Notre Dame Press.
(a nice overview of representational issues that are very relevant for structure learning)

Learning Bayesian Networks with Local Structure
Nir Friedman and Moises Goldszmidt 
May 4
Bayesian Error-Bars for Belief Net Inference
(notice room change: NSH1507)
Russell Greiner
Russ' graphical model results

A Bayesian Belief Network (BN) models a joint distribution over a set of n
variables, using a DAG structure to represent the immediate dependencies
between the variables, and a set of parameters (aka "CPTables") to represent
the local conditional probabilities of a node, given each assignment to its
parents.  In many situations, these parameters are themselves random variables
--- this may reflect the uncertainty of the domain expert, or may come from a
training sample used to estimate the parameter values.  The distribution over
these "CPtable variables" induces a distribution over the response the BN
will return to any "What is Pr(Q=q | E=e)?" query.  This paper investigates
properties of this response: showing first that it is asymptotically normal,
then providing, in closed form, its mean and asymptotic variance.  We then
present an effective general algorithm for computing this variance, which has
the same complexity as simply computing (the mean value of) the response
itself --- i.e., O(n 2^w), where w is the effective tree width.  Finally, we
provide empirical evidence that a Beta approximation works much better than
the normal distribution, especially for small sample sizes, and that our
algorithm works effectively in practice, over a range of belief net
structures, sample sizes and queries.

This is joint work with Tim Van Allen, Ajit Singh and Peter Hooper.
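The setup the abstract describes can be pictured with a small Monte Carlo sketch. The network, its pseudo-counts, and the moment-matching step below are all invented for illustration; the paper itself derives the mean and asymptotic variance in closed form rather than sampling.

```python
import random

# Toy two-node network E -> Q with uncertain CPTs (all pseudo-counts
# below are invented for illustration; this is a sketch of the setup,
# not the paper's closed-form algorithm). Each CPT entry is a Beta
# random variable, so the query response Pr(Q=1) -- here computed by
# marginalizing over E -- is itself a random variable. We sample it,
# then moment-match a Beta, illustrating the Beta approximation that
# the paper reports works better than a normal for small samples.

random.seed(0)

def query_response():
    pe = random.betavariate(3, 7)   # Pr(E=1)
    t1 = random.betavariate(8, 2)   # Pr(Q=1 | E=1)
    t0 = random.betavariate(2, 8)   # Pr(Q=1 | E=0)
    return pe * t1 + (1 - pe) * t0  # Pr(Q=1)

n = 100_000
xs = [query_response() for _ in range(n)]
m = sum(xs) / n
v = sum((x - m) ** 2 for x in xs) / n

# Method-of-moments Beta fit: choose (alpha, beta) so the Beta's
# mean and variance match the sampled mean and variance.
k = m * (1 - m) / v - 1
alpha, beta = m * k, (1 - m) * k
print(f"mean={m:.3f}  var={v:.4f}  Beta({alpha:.1f}, {beta:.1f})")
```

The fitted Beta has the same support [0,1] as the query response, which is one intuition for why it can beat a normal approximation at small sample sizes.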

May 11
A Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers
Russell Greiner
Bayesian belief nets (BNs) are often used for classification tasks ---
typically to return the most likely class label for each specified instance.
Many BN-learners, however, attempt to find the BN that maximizes a different
objective function --- viz., likelihood, rather than classification accuracy
--- typically by first learning an appropriate graphical structure, then
finding the maximal likelihood parameters for that structure.  As these
parameters may not maximize the classification accuracy, ``discriminative
learners'' follow the alternative approach of seeking the parameters that
maximize *conditional likelihood* (CL), over the distribution of instances the
BN will have to classify.  This presentation first formally specifies this
task, and shows how it extends standard logistic regression.  After analyzing
its inherent sample and computational complexity, we present a general
algorithm for this task, ELR, that applies to arbitrary BN structures and
works effectively even when given incomplete training data.  We present
empirical evidence that ELR produces better classifiers than are produced by
the standard ``generative'' algorithms in a variety of situations, especially
in common situations where the given BN-structure is incorrect.

This is joint work with Wei Zhou, Xiaoyuan Su, and Bin Shen.
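The generative-versus-discriminative contrast in the abstract can be sketched on a one-feature toy problem (data and model below are invented for illustration; this is not the ELR algorithm itself). Generative fitting maximizes joint likelihood by counting; discriminative fitting ascends the conditional log-likelihood sum_i log Pr(c_i | x_i), which for this naive-Bayes-shaped model is exactly logistic regression, as the talk notes.

```python
import math

# Toy data: (x, c) pairs, both binary.
data = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 5 + [(0, 0)] * 55
n = len(data)

# Generative: maximum-likelihood CPT estimates from counts.
p_c = sum(c for _, c in data) / n                      # Pr(C=1)
p_x = {c: sum(1 for x, cc in data if cc == c and x == 1) /
          sum(1 for _, cc in data if cc == c)
       for c in (0, 1)}                                # Pr(X=1 | C=c)

def gen_posterior(x):
    # Bayes rule with the generative estimates.
    lik = lambda c: p_x[c] if x == 1 else 1 - p_x[c]
    num = p_c * lik(1)
    return num / (num + (1 - p_c) * lik(0))

# Discriminative: gradient ascent on the conditional log-likelihood
# of the logistic model Pr(C=1 | x) = sigmoid(w*x + b).
w, b = 0.0, 0.0
for _ in range(5000):
    gw = gb = 0.0
    for x, c in data:
        p = 1 / (1 + math.exp(-(w * x + b)))
        gw += (c - p) * x
        gb += (c - p)
    w += 0.5 * gw / n
    b += 0.5 * gb / n

disc_posterior = 1 / (1 + math.exp(-(w + b)))          # Pr(C=1 | X=1)
print(gen_posterior(1), disc_posterior)
```

With complete data and a correct (saturated) structure the two estimates coincide; the abstract's point is that they diverge under an incorrect BN structure or incomplete data, which is where discriminative learning pays off.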

May 18
Structure learning (2)
Ricardo Silva, Anna Goldenberg, Fan Li
Scheines, R. (1997). "An Introduction to Causal Inference", in Causality in Crisis, ed. Steven Turner and Vaughan McKim, University of Notre Dame Press.
(a nice overview of representational issues that are very relevant for structure learning)

Friedman, N. (1997). Learning belief networks in the presence of missing values and hidden variables. In Fourteenth International Conference on Machine Learning (ICML).

Friedman, N. and Koller, D. (2003). Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning, 50:95-126.

Elidan, G., Lotner, N., Friedman, N. and Koller, D. (2000). Discovering hidden variables: a structure-based approach. Proceedings of the Neural Information Processing Systems conference (NIPS).

Silva, R.; Scheines, R.; Glymour, C. and Spirtes, P. (2003). "Learning measurement models for unobserved variables". Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence.

N. Friedman, D. Pe'er, and I. Nachman. Learning Bayesian Network Structure from Massive Datasets: The ``Sparse Candidate'' Algorithm. UAI 15, 1999.

A. Moore and Weng-Keen Wong. Optimal Reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning. ICML 2003.

A. Goldenberg and A. Moore, Tractable Learning of Large Bayes Net
Structures from Sparse Data, ICML 2004
(do we want to cover this?)
The following readings cover the bulk of causality, namely estimating causal effects from a structure given in advance.

David Edwards (2000). "Causal Inference", Chapter 8 (pp. 219-243) of his book Introduction to Graphical Modelling (Springer, 2nd ed.).

Phil Dawid (2000). Causal inference without counterfactuals. J. Amer. Statist. Assoc. 95, 407-448. An earlier version (1997) is available as research report no. 188.

J. Pearl. "Statistics and Causal Inference: A Review". Test Journal, Vol. 12(2), pp. 281-345, December 2003 (with discussions).

J. Pearl. ``Simpson's paradox: An anatomy''. Extracted from Chapter 6 of Causality.
Possible future topics:
parameter learning,
active learning,
dynamic Bayes nets,
probabilistic relational models,
other types of graphs,
feature selection for maxent models
(suggestions welcome)

Kernel Conditional Random Fields
Jerry Zhu
Kernel Conditional Random Fields: Representation, Clique Selection, and Semi-Supervised Learning.  
John Lafferty, Yan Liu, Xiaojin Zhu   CMU tech report CMU-CS-04-115
Monte Carlo
Introduction to Monte Carlo Methods
David MacKay
Probabilistic Inference Using Markov Chain Monte Carlo Methods
Radford Neal

More links

Zoubin's 2003 unsupervised learning course website

Lise's GM reading group

Rutgers GM course

Learning in Graphical Models (ed. M. I. Jordan)

Kevin Murphy's reading list

Contact

Luo Si (lsi)
Jian Zhang (jian.zhang)
Jerry Zhu (zhuxj)

Last Modified 2004 Apr 16, Jerry Zhu