Schedule

10-701/15-781, Fall 2006

Eric Xing and Tom Mitchell
School of Computer Science, Carnegie Mellon University


Syllabus and Course Schedule

Module

Lectures, readings, online materials

Homeworks, Exams

Supervised Learning

Lecture 1: 9/12/06 (Mitchell)

Slides: Overview of Machine Learning, Decision tree learning, Overfitting, pruning

Reading:

Mitchell, Chapter 3 on Decision Trees

Bishop, Section 1.6

Optional: The Discipline of Machine Learning, Mitchell, CMU Machine Learning department technical report CMU-ML-06-108, 2006

Lecture 2: 9/14/06 (Xing)

Slides (annotated slides)

Tutorial on Basic Probability:

Multinomial and Gaussian distributions

Maximizing likelihood

Bayes Rule

MAP

Reading:

Bishop: chap. 1 and 2

Mitchell: chap. 5 and 6.

HW1 out (Decision tree learning, Probability, Linear regression)

Lecture 3: 9/19/06 (Xing)

Slides (annotated slides)

Discriminative algorithms:

Linear regression and its probabilistic interpretation

Reading:

Bishop: chap. 3

Mitchell: chap. 8.3

Lecture 4: 9/21/06 (Mitchell)

Generative and discriminative classifiers:

slides

Naive Bayes classifiers

with discrete and continuous (Gaussian) features

Example: text classification

Reading:

Required: Naive Bayes and Logistic Regression, Mitchell chapter draft

Optional: Bishop, Chapter 4.

~~HW1 due at beginning of class.~~

Lecture 5: 9/26/06 (Mitchell)

Generative and discriminative classifiers:

Slides

Logistic regression

Relationship to Naive Bayes

Generative or discriminative classifiers?

Reading:

Required: Naive Bayes and Logistic Regression, Mitchell chapter draft.

Optional: Bishop, Chapter 4

Optional: On Discriminative and Generative Classifiers, Ng and Jordan, NIPS, 2001.

HW1 due at the beginning of class.

HW2 out (Naive Bayes, Logistic regression, Neural networks, overfitting, learning as optimizing an objective function).

Lecture 6: 9/28/06 (Mitchell)

Neural networks and gradient descent:

Slides

Non-linear regression, classifiers

Artificial neural networks

Gradient descent

Discovered representations at hidden layers

Reading:

Bishop, Chapter 5

Mitchell, Chapter 4

Lecture 7: 10/3/06 (Xing)

Practical issues in learning:

Slides (annotated slides)

Instance-based learning

Overfitting

Decomposition of error into bias and variance

Cross-validation

Regularization

Feature Selection

Model selection

Reading:

Bishop, Chap 1 & 2

Mitchell, Chap 5 & 6

E.P. Xing, Feature Selection in Microarray Analysis, in D.P. Berrar, W. Dubitzky and M. Granzow (Eds.), A Practical Approach to Microarray Data Analysis, Kluwer Academic Publishers, 2003.

Model comparison and Occam's Razor, Chapter 28 from David MacKay's book

Model selection and Minimum Description Length principle, Mark Hansen and Bin Yu, J. Amer. Statist. Assoc. vol. 96, 746-774, 2001.

Lecture 8: 10/5/06 (Xing)

Slides (annotated slides)

Optimal margin classification, kernel methods, and convex optimization:

Support vector machine

Duality and convex optimization

Kernel methods

Reading:

Bishop, Chap 6 & 7

Burges tutorial

HW2 due

HW3 out (SVMs, Boosting, Learning theory)

Lecture 9: 10/10/06 (Xing)

Slides (annotated slides)

Boosting poor classifiers

Combination of classifiers

AdaBoost

Reading:
the boosting homepage

Lecture 10: 10/12/06 (Mitchell)
Review of learning methods, and
Learning theory I: Slides

Sample complexity

Hypothesis space

PAC learning theory

Lecture 11: 10/17/06 (Mitchell)
Learning theory II: Slides

VC dimension

Agnostic learning

Overfitting and PAC bounds

HW3 due.

MIDTERM

Midterm Exam (10/19/06): open book, open notes, no computers
Fall 2005 midterm, previous exams

Bayesian networks, Graphical models

Lecture 12: 10/24/06 (Xing)

Slides (annotated slides)
Graphical models I

Representation of joint probability distributions

conditional independence

Markov blanket

directed graphs

undirected graphs

Markov random fields

Reading:

Bishop, Chap 8

Graphical models. M. I. Jordan. Statistical Science (Special Issue on Bayesian Statistics), 19, 140-155, 2004.

Project proposals (1 page) due

Lecture 13: 10/26/06 (Xing)

Slides (annotated slides)
Graphical models II

Inference

elimination algorithm

sampling methods

Reading:

Bishop, Chap 8,9

HW4a out (graphical models and EM)

Lecture 14: 10/31/06 (Xing)

Slides (annotated slides)
Learning from full and partially observed data

MLE

EM

Use of EM for training Bayesian networks

Mixture of Gaussian clustering

Reading:

Bishop, Chap 9

Lecture 15: 11/2/06 (Xing)

Slides (annotated slides)
Hidden Markov Models

Representation: discrete and continuous observations

Inference: the forward-backward algorithm

Learning: the Baum-Welch (EM) algorithm

Example: Gene finding

Reading:

Bishop, Chap 13

Semi-supervised Learning

Lecture 16: 11/7/06 (Mitchell)

Learning with both labeled and unlabeled data: slides

EM and Naive Bayes classification

Reweighting the labeled data

Co-training

Unlabeled data for model selection

Reading:

Nigam et al., Text Classification from Labeled and Unlabeled Documents using EM

Blum and Mitchell, Combining Labeled and Unlabeled Data with Co-Training

HW4a due

Lecture 17: 11/9/06 (Xing)

Slides (annotated slides)
Graph-theoretic clustering algorithms:

normalized cut

spectral clustering

kernel methods revisited

Reading:

On Spectral Clustering: Analysis and an algorithm, Andrew Y. Ng, Michael Jordan, and Yair Weiss. In NIPS 14, 2002. [ps, pdf]

Normalized Cuts and Image Segmentation, Jianbo Shi and Jitendra Malik, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905, August 2000. [pdf]

Project midway reports due

HW4b out (HMM, dimensionality reduction)

Applications I

Lecture 18: 11/14/06 (Mitchell)

Dimensionality reduction I (slides)

Feature selection

Principal Components Analysis

Singular value decomposition

Fisher LDA

Reading:

required: "A Tutorial on PCA," J. Shlens

optional: "SVD and PCA," M. Wall, et al.

Dimensionality Reduction

Lecture 19: 11/16/06 (Mitchell)

Finish up Dimensionality reduction (Fisher Linear Discriminant, Neural nets)
Hot topics: Machine learning and natural language analysis
lecture slides

Reading:

optional: "Latent Dirichlet Allocation," Blei et al., JMLR 2003

optional: "Topic and Role Discovery in Social Networks," McCallum et al., IJCAI 2005.

Lecture 20: 11/21/06 (Xing)

Slides (annotated slides)
Dimensionality reduction II:

Probabilistic PCA

Factor Analysis

Metric Learning

Independent Components Analysis

Reading:

Tipping and Bishop, Probabilistic Principal Component Analysis

E.P. Xing, A.Y. Ng, M.I. Jordan and S. Russell, Distance Metric Learning, with application to Clustering with side-information (NIPS 2002).

HW4b due

Applications II
Lecture 21: 11/28/06 (Xing)

Slides (annotated slides)
Hot topics: Machine learning in computational biology

Reading:

E.P. Xing, K. Sohn, M.I. Jordan and Y.W. Teh, Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture, Proceedings of the 23rd International Conference on Machine Learning (ICML 2006).

Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. J. D. McAuliffe, L. Pachter, and M. I. Jordan. Bioinformatics, 20, 1850-1860, 2004.

Learning control strategies
Lecture 22: 11/30/06 (Mitchell)

Reinforcement learning I: (annotated slides for 11/30 and 12/5 lecture)

Markov decision processes

Learning control strategies when next-state function is known

Value iteration

Reading:

Mitchell, Chapter 13

Kaelbling, et al., Reinforcement Learning: A Survey

Project final reports due Nov 29.

Project poster presentations Nov 30.

Lecture 23: 12/5/06 (Mitchell)

Reinforcement learning II: (slides: see previous lecture)

Learning when next-state function is unknown

Q-Learning

Temporal difference learning in primates

No lecture on 12/7/06

Final Exam
12/15/06, 5:30-8:30pm, Wean 7500



Recitation Schedule


Date	Time	Place	Topic
Thursday, 9/14/2006	5pm	NSH 1305	Probability review
Tuesday, 9/19/2006	5pm	NSH 1305	MATLAB review
Thursday, 9/21/2006	5pm	NSH 1305	MLE, Bayesian estimation, homework 1
Thursday, 9/26/2006	5pm	NSH 1305	Homework 2, regularization, neural networks, cross-validation
Tuesday, 10/3/2006	5pm	NSH 1305	Common mistakes with homework 1, homework 2, naive Bayes, logistic regression
Thursday, 10/12/2006	5pm	NSH 1305	Homework 3, common mistakes with homework 2, learning theory, SVM
Tuesday, 10/17/2006	5pm	NSH 1305	Midterm review
Thursday, 10/26/2006	5pm	NSH 1305	Midterm solution, d-separation
Tuesday, 10/31/2006	5pm	NSH 1305	Variable elimination, EM in Gaussian mixture models
Thursday, 11/9/2006	5pm	NSH 1305	Hidden Markov models, Viterbi algorithm, forward-backward algorithm
Tuesday, 11/14/2006	5pm	NSH 1305	Baum-Welch algorithm, linear algebra review, PCA
Tuesday, 12/5/2006	5pm	NSH 1305	Reinforcement learning
Tuesday, 12/12/2006	5pm	NSH 1305	Final review 1
Thursday, 12/14/2006	10am	NSH 1305	Final review 2


Additional Readings:
