
Machine Learning

10-701/15-781, Fall 2006


Eric Xing and Tom Mitchell
School of Computer Science, Carnegie Mellon University


Syllabus and Course Schedule



Module

Lectures, readings, online materials 

Homeworks, Exams

Supervised Learning

 Lecture 1: 9/12/06  (Mitchell) 
    • Slides: Overview of Machine Learning, Decision tree learning, Overfitting, pruning

  Reading:
  • Mitchell, Chapter 3 on Decision Trees
  • Bishop, Section 1.6
  • Optional: The Discipline of Machine Learning, Mitchell, CMU Machine Learning Department Technical Report CMU-ML-06-108, 2006

 Lecture 2: 9/14/06  (Xing)

Slides (annotated slides)

Tutorial on Basic Probability:
  • Multinomial and Gaussian distributions
  • Maximum likelihood estimation
  • Bayes' rule
  • MAP estimation

Reading:
  • Bishop, Chapters 1 and 2
  • Mitchell, Chapters 5 and 6

HW1 out (decision tree learning, probability, linear regression)
 Lecture 3: 9/19/06  (Xing)

Slides (annotated slides)

Discriminative algorithms:
  • Linear regression and its probabilistic interpretation

Reading:
  • Bishop: chap. 3
  • Mitchell, Section 8.3


 Lecture 4: 9/21/06  (Mitchell)
Generative and discriminative classifiers:
  • Slides
  • Naive Bayes classifiers with discrete and continuous (Gaussian) features
  • Example: text classification

Reading:
HW1 due at the beginning of class.
 Lecture 5: 9/26/06 (Mitchell)
Generative and discriminative classifiers:
  • Slides
  • Logistic regression
  • Relationship to Naive Bayes
  • Generative or discriminative classifiers?

Reading:
HW1 due at the beginning of class.

HW2 out (Naive Bayes, logistic regression, neural networks, overfitting, learning as optimizing an objective function).
 Lecture 6: 9/28/06 (Mitchell)
Neural networks and gradient descent:
    • Slides
    • Non-linear regression, classifiers
    • Artificial neural networks
    • Gradient descent 
    • Discovered representations at hidden layers

Reading:
    • Bishop, Chapter 5
    • Mitchell, Chapter 4

 Lecture 7: 10/3/06 (Xing)
Practical issues in learning:

Slides (annotated slides)
  • Instance-based learning
  • Overfitting
  • Decomposition of error into bias and variance
  • Cross-validation
  • Regularization
  • Feature selection
  • Model selection

Reading:


 
 Lecture 8: 10/5/06 (Xing)

Slides (annotated slides)

Optimal margin classification, kernel methods, and convex optimization:
  • Support vector machine
  • Duality and convex optimization
  • Kernel methods

Reading:

HW2 due

HW3 out (SVMs, boosting, learning theory)
 Lecture 9: 10/10/06 (Xing)

Boosting poor classifiers
  • Combination of classifiers
  • AdaBoost


 Lecture 10: 10/12/06 (Mitchell)
Review of learning methods, and
Learning theory I: Slides
  • Sample complexity
  • Hypothesis space
  • PAC learning theory


 Lecture 11: 10/17/06 (Mitchell)
Learning theory II: Slides
  • VC dimension
  • Agnostic learning
  • Overfitting and PAC bounds

HW3 due.

 MIDTERM

Midterm Exam (10/19/06): open book, open notes, no computers
Fall 2005 midterm, previous exams

Bayesian networks, Graphical models

 Lecture 12: 10/24/06 (Xing)
 
Slides (annotated slides)
Graphical models I
    • Representation of joint probability distributions
      • conditional independence
      • Markov blanket
      • directed graphs
      • undirected graphs
      • Markov random fields

Reading:
  • Bishop, Chapter 8
  • Graphical models. M. I. Jordan. Statistical Science (Special Issue on Bayesian Statistics), 19, 140-155, 2004.

Project proposals (1 page) due
 Lecture 13: 10/26/06 (Xing)
 
Slides (annotated slides)
Graphical models II
    • Inference
      • elimination algorithm
      • sampling methods

Reading:
  • Bishop, Chapters 8 and 9

HW4a out (graphical models and EM)
 Lecture 14: 10/31/06 (Xing)

Slides (annotated slides)
Learning from fully and partially observed data
    • MLE
    • EM
    • Use of EM for training Bayesian networks
    • Clustering with mixtures of Gaussians

Reading:
  • Bishop, Chapter 9


 Lecture 15: 11/2/06 (Xing)

Slides (annotated slides)
Hidden Markov Models
    • Representation: discrete and continuous observations
    • Inference: the forward-backward algorithm
    • Learning: the Baum-Welch (EM) algorithm
    • Example: Gene finding

Reading:
  • Bishop, Chapter 13


Semi-supervised Learning
 Lecture 16: 11/7/06 (Mitchell)
 Learning with both labeled and unlabeled data: slides
    • EM and Naive Bayes classification
    • Reweighting the labeled data
    • Co-training
    • Unlabeled data for model selection

HW4a due

 Lecture 17: 11/9/06 (Xing)
 
Slides (annotated slides)
Graph-theoretic clustering algorithms:
    • normalized cut
    • spectral clustering
    • kernel methods revisited

Reading:
  • On Spectral Clustering: Analysis and an algorithm, Andrew Y. Ng, Michael Jordan, and Yair Weiss. In NIPS 14, 2002.
  • Normalized Cuts and Image Segmentation, Jianbo Shi and Jitendra Malik, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905, August 2000.

Project midway reports due

HW4b out (HMM, dimensionality reduction)
Applications I
 Lecture 18: 11/14/06 (Mitchell)

 Dimensionality reduction I (slides)
    • Feature selection
    • Principal Components Analysis
    • Singular value decomposition
    • Fisher LDA

Reading:


Dimensionality Reduction
 Lecture 19: 11/16/06 (Mitchell)

Finish up dimensionality reduction (Fisher linear discriminant, neural nets)
Hot topics: Machine learning and natural language analysis
Lecture slides

Reading:


 Lecture 20: 11/21/06 (Xing)
 
Slides (annotated slides)
Dimensionality reduction II:
    • Probabilistic PCA
    • Factor Analysis
    • Metric Learning
    • Independent Component Analysis

Reading:

HW4b due
 Applications II
 Lecture 21: 11/28/06 (Xing)
 

Slides (annotated slides)
Hot topics: Machine learning in computational biology

Reading:


Learning control strategies
 Lecture 22: 11/30/06 (Mitchell)
 Reinforcement learning I: (annotated slides for the 11/30 and 12/5 lectures)
  • Markov decision processes
  • Learning control strategies when the next-state function is known
  • Value iteration

Reading:

Project final reports due Nov 29.

Project poster presentations Nov 30.
 Lecture 23: 12/5/06 (Mitchell)
 Reinforcement learning II: (slides: see previous lecture)
  • Learning when the next-state function is unknown
  • Q-Learning
  • Temporal difference learning in primates


 NO Lecture on 12/7/06 


Final Exam
 12/15/06, 5:30-8:30pm, Wean 7500



Recitation Schedule

Date Time Place Topic
Thursday, 9/14/2006 5pm NSH 1305 Probability review
Tuesday, 9/19/2006 5pm NSH 1305 MATLAB review
Thursday, 9/21/2006 5pm NSH 1305 MLE, Bayesian estimation, homework 1
Thursday, 9/26/2006 5pm NSH 1305 Homework 2, regularization, neural networks, cross-validation
Tuesday, 10/3/2006 5pm NSH 1305 Common mistakes with homework 1, homework 2, naive Bayes, logistic regression
Thursday, 10/12/2006 5pm NSH 1305 Homework 3, common mistakes with homework 2, learning theory, SVM
Tuesday, 10/17/2006 5pm NSH 1305 Midterm review
Thursday, 10/26/2006 5pm NSH 1305 Midterm solution, d-separation
Tuesday, 10/31/2006 5pm NSH 1305 Variable elimination, EM in Gaussian mixture models
Thursday, 11/9/2006 5pm NSH 1305 Hidden Markov models, Viterbi algorithm, forward-backward algorithm
Tuesday, 11/14/2006 5pm NSH 1305 Baum-Welch algorithm, linear algebra review, PCA
Tuesday, 12/5/2006 5pm NSH 1305 Reinforcement learning
Tuesday, 12/12/2006 5pm NSH 1305 Final review 1
Thursday, 12/14/2006 10am NSH 1305 Final review 2


Additional Readings: