Machine Learning, 10-701 and 15-781

Prof. Carlos Guestrin
School of Computer Science, Carnegie Mellon University

Spring 2006

Class lectures: Mondays & Wednesdays from 10:30-11:50 in Wean Hall 7500

Review sessions: Thursdays 5:00-6:30 in Wean Hall 5409

It is hard to imagine anything more fascinating than automated systems that improve their own performance. The study of learning from data is commercially and scientifically important. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in learning and data mining or who may need to apply learning or data mining techniques to a target problem. The topics of the course draw from classical statistics, from machine learning, from data mining, from Bayesian statistics and from statistical algorithmics.

Students entering the class should have a pre-existing working knowledge of probability, statistics and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate.

Teaching Assistants


The first point of contact for questions about homework problems is the TA assigned to each problem, according to the schedule below. Questions may also be emailed to

Homework #1
Out: Jan. 23
In: Feb. 8 (new date!!!)
Assignment: [PDF]
Solutions: [PDF]
Prob. #1: Stano
Prob. #2: Jure (Chess dataset [zip])
Prob. #3: Andreas
Prob. #4: Anton

Homework #2
Out: Feb. 9
In: Feb. 20
Updated assignment with more hints: [PDF]
Solutions: [PDF]
Consistency checker for Q1.2: [code]
Voting dataset [zip]
Prob. #1: Anton
Probs. #2, 3: Stano

Homework #3
Out: Feb. 20
In: Mar. 1
Assignment: [PDF]
Solutions: [PDF]
libsvm download [link]
Matlab and data files [zip]
TAs: Andreas, Jure

Homework #4
Out: Mar. 22
In: Apr. 5
Assignment: [PDF]
Solutions: [PDF]
Matlab and data files [zip]

Homework #5
Out: Apr. 5
In: Apr. 19
Assignment: [PDF]
Solutions: [PDF]
Matlab and data files [zip]

Course Assistant

Administrative Assistant


Announcement Emails

Course Website (this page)


Collaboration policy

Homeworks will be done individually: each student must hand in their own answers. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. We assume that, as participants in a graduate course, you will take responsibility for making sure you personally understand the solution to any work arising from such collaboration. You must also indicate on each homework with whom you collaborated. The final project may be completed by small teams.

Late homework policy

Homework regrades policy

If you feel that we have made an error in grading your homework, please turn in your homework with a written explanation to Sharon, and we will consider your request. Please note that regrading of a homework may cause your grade to go up or down.

Final project

For the project milestone, roughly half of the project work should be completed. A short, graded write-up will be required, and we will provide feedback.

Lecture schedule


Material covered

Class details, online material, and homework

Module 1: Basics
(1 Lecture)
  • What is learning?
    • Version spaces
    • Sample complexity
    • Training set/test set split
  • Point estimation
    • Loss functions
    • MLE
    • Bayesian
    • MAP (see the sketch below)
    • Bias-Variance tradeoff
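To make the point-estimation ideas concrete, here is a minimal Matlab/Octave sketch (not part of the official course materials; the coin-flip data and the Beta(2,2) prior are invented for illustration). For n Bernoulli trials with h successes, the MLE is h/n, while the MAP estimate under a Beta(a,b) prior is (h+a-1)/(n+a+b-2).

  % MLE vs. MAP for a Bernoulli (coin-flip) parameter.
  n = 10; h = 7;                               % hypothetical data: 7 heads in 10 flips
  theta_mle = h / n;                           % maximum likelihood estimate
  a = 2; b = 2;                                % assumed Beta(2,2) prior
  theta_map = (h + a - 1) / (n + a + b - 2);   % posterior mode (MAP)
  fprintf('MLE = %.3f, MAP = %.3f\n', theta_mle, theta_map);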
Mon., Jan 16:
** No class. MLK Birthday **

Wed., Jan 18:
Module 2: Linear models
(3 Lectures)
  • Linear regression [Applet] (see the sketch below)
  • Bias-Variance tradeoff
  • Overfitting
  • Bayes optimal classifier
  • Naive Bayes
  • Logistic regression
  • Discriminative vs. Generative models
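As a concrete example of the linear-regression material, ordinary least squares fits in a few lines of Matlab/Octave (a sketch on synthetic data, not the course's assignment code):

  % Fit y = w1*x + w0 by least squares on synthetic data.
  x = (0:0.5:5)';                       % inputs
  y = 2*x + 1 + 0.3*randn(size(x));     % noisy linear targets (true slope 2, intercept 1)
  X = [x, ones(size(x))];               % design matrix with a bias column
  w = X \ y;                            % least-squares solution
  fprintf('estimated slope %.2f, intercept %.2f\n', w(1), w(2));

The backslash operator solves the least-squares problem via a QR factorization, which is numerically preferable to forming the normal equations explicitly.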
Mon., Jan. 23:
  Homework #1 out
Assignment: [PDF]
Prob. #2 Chess dataset [zip]

Wed., Jan 25:

Mon., Jan 30:
Module 3: Non-linear models
Model selection
(5 Lectures)
  • Decision trees
  • Overfitting, again
  • Regularization
  • MDL
  • Cross-validation
  • Boosting [Adaboost Applet]
  • Instance-based learning [Applet]
    • K-nearest neighbors (see the sketch below)
    • Kernels
  • Neural nets [CMU Course]
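For instance, a 1-nearest-neighbor classifier needs only a distance computation and a minimum; here is a Matlab/Octave sketch on invented toy data (extending to K neighbors means taking a majority vote over the K smallest distances):

  % 1-nearest-neighbor classification on toy 2-D data.
  Xtrain = [0 0; 0 1; 5 5; 5 6];            % training points
  ytrain = [1; 1; 2; 2];                    % their class labels
  xq = [4.5 5.2];                           % query point
  d = sum((Xtrain - repmat(xq, size(Xtrain,1), 1)).^2, 2);  % squared distances
  [dmin, i] = min(d);                       % nearest training point
  fprintf('predicted class: %d\n', ytrain(i));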
Wed., Feb. 1:

Mon., Feb 6:

Wed., Feb. 8:
  • Lecture: Boosting, Cross Validation, Simple Model Selection, Regularization, MDL
    [Slides] [Annotated]
  • EXTENSION: Homework #1 due
    (beginning of class)
    Homework #2 out
    Updated assignment with more hints: [PDF]
    Voting dataset [zip]

Mon., Feb. 13:
  • Lecture: Cross Validation, Simple Model Selection, Regularization, MDL, Neural Nets
    [Slides] [Annotated]

Wed., Feb. 15:
  • Lecture: Neural Nets, Instance-based Learning
    [Slides] [Annotated]

Module 4: Margin-based approaches
(2 Lectures)
  • SVMs [Applets]
  • Kernel trick (see the sketch below)
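To illustrate the kernel trick's central object: an SVM only ever touches the data through the Gram matrix K(i,j) = k(x_i, x_j), so nonlinear feature spaces never have to be built explicitly. A Matlab/Octave sketch with a Gaussian (RBF) kernel (the data and bandwidth are arbitrary choices for illustration):

  % Gaussian (RBF) kernel matrix for a small dataset.
  X = [0 0; 1 0; 0 1; 3 3];             % four 2-D points
  sigma = 1.0;                          % kernel bandwidth (arbitrary here)
  n = size(X, 1);
  D = zeros(n);                         % pairwise squared distances
  for i = 1:n
    for j = 1:n
      D(i,j) = sum((X(i,:) - X(j,:)).^2);
    end
  end
  K = exp(-D / (2*sigma^2));            % Gram matrix: K(i,j) = k(x_i, x_j)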
Mon., Feb 20:
  • Lecture: Instance-based Learning, SVMs
    [Slides] [Annotated]
  • Homework #2 due
    (beginning of class)
    Homework #3 out

Wed., Feb. 22:
  • Lecture: SVMs
    [Slides] [Annotated]
  • Reading: Hearst 1998: High Level Presentation
  • Reading: Burges 1998: Detailed Tutorial

Module 5: Learning theory
(3 Lectures)
  • Sample complexity (see the worked bound below)
  • PAC learning [Applets]
  • Error bounds
  • VC-dimension
  • Margin-based bounds
  • Large-deviation bounds
    • Hoeffding's inequality, Chernoff bound
  • Mistake bounds
  • No Free Lunch theorem
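As a worked sample-complexity example (the epsilon and delta values are arbitrary): Hoeffding's inequality gives P(|error_train - error_true| >= epsilon) <= 2 exp(-2 m epsilon^2) for a single fixed hypothesis, so m >= ln(2/delta) / (2 epsilon^2) samples suffice for accuracy epsilon with probability at least 1 - delta. In Matlab/Octave:

  % Sample size from Hoeffding's inequality (single hypothesis).
  epsilon = 0.05;                     % desired accuracy
  delta = 0.05;                       % allowed failure probability
  m = log(2/delta) / (2*epsilon^2);   % solve 2*exp(-2*m*eps^2) <= delta for m
  fprintf('m >= %d samples suffice\n', ceil(m));   % prints 738

For a finite hypothesis class H, a union bound replaces 2/delta with 2|H|/delta in the same formula.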
Mon., Feb. 27:
  • Lecture: SVMs - The Kernel Trick
    [Slides] [Annotated]

Wed., Mar. 1:
  • Lecture: SVMs - The Kernel Trick, Learning Theory
    [Slides] [Annotated]
  • Homework #3 due
    (beginning of class)
    Project out

Mon., Mar. 6:
  • Lecture: Learning Theory, Midterm review
    [Slides] [Annotated]

Mid-term Exam

All material thus far
Wed., Mar 8:
Mid-term exam (in class)

Spring break

Mon., Mar. 13:
** No class **

Wed., Mar. 15:
** No class **
Module 6: Structured models
(4 Lectures)
  • HMMs
    • Forwards-Backwards
    • Viterbi (see the sketch below)
    • Supervised learning
  • Graphical Models
    • Representation
    • Inference
    • Learning
    • BIC
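For example, Viterbi decoding finds the most probable hidden-state sequence by dynamic programming. A Matlab/Octave sketch on an invented two-state HMM (in practice one would work in log space to avoid underflow):

  % Viterbi decoding for a toy 2-state HMM.
  A = [0.7 0.3; 0.4 0.6];       % A(i,j) = P(s_t = j | s_{t-1} = i)
  B = [0.9 0.1; 0.2 0.8];       % B(i,k) = P(o_t = k | s_t = i)
  p0 = [0.5; 0.5];              % initial state distribution
  obs = [1 1 2 2];              % observed symbol sequence
  T = length(obs); N = 2;
  V = zeros(N, T); ptr = zeros(N, T);
  V(:,1) = p0 .* B(:, obs(1));
  for t = 2:T
    for j = 1:N
      [V(j,t), ptr(j,t)] = max(V(:,t-1) .* A(:,j));   % best predecessor of state j
      V(j,t) = V(j,t) * B(j, obs(t));
    end
  end
  [pbest, s] = max(V(:,T));
  path = zeros(1, T); path(T) = s;
  for t = T-1:-1:1
    path(t) = ptr(path(t+1), t+1);   % backtrack
  end
  disp(path)                         % most likely state sequence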
Mon., Mar. 20:
  • Lecture: Bayes nets - Representation
    [Slides] [Annotated]

Wed., Mar. 22:
  • Lecture: Bayes nets - Representation (cont.), Inference
    [Slides] [Annotated]
  • Homework #4 out
    Project Proposal due
    (beginning of class)

Mon., Mar. 27:
  • Lecture: Bayes nets - Inference (cont.)
    [Slides] [Annotated]
  • Reading: Rabiner's Detailed HMMs Tutorial

Wed., Mar. 29:
  • Lecture: HMMs,
    Bayes nets - Structure Learning
    [Slides] [Annotated]
  • Additional Reading: Heckerman BN Learning Tutorial
  • Additional Reading: Tree-Augmented Naive Bayes paper
Module 7: Unsupervised and semi-supervised learning
(4 Lectures)
  • K-means (see the sketch below)
  • Expectation Maximization (EM)
  • Combining labeled and unlabeled data
    • EM
    • Reweighting labeled data
    • Co-training
    • Unlabeled data and model selection
  • Dimensionality reduction
  • Feature selection
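As a concrete example of the clustering material, K-means alternates between assigning each point to its nearest center and moving each center to the mean of its points. A Matlab/Octave sketch on invented 2-D data with k = 2:

  % A few iterations of K-means with k = 2 on toy 2-D data.
  X = [0 0; 0 1; 1 0; 8 8; 8 9; 9 8];      % six points, two obvious clusters
  C = X([1 4], :);                         % initialize centers at two data points
  for iter = 1:10
    % assignment step: nearest center for each point
    d1 = sum((X - repmat(C(1,:), 6, 1)).^2, 2);
    d2 = sum((X - repmat(C(2,:), 6, 1)).^2, 2);
    z = (d2 < d1) + 1;                     % cluster label: 1 or 2
    % update step: each center moves to the mean of its points
    C(1,:) = mean(X(z == 1, :), 1);
    C(2,:) = mean(X(z == 2, :), 1);
  end
  disp(C)                                  % final cluster centers

EM for a Gaussian mixture follows the same alternating pattern, with soft (probabilistic) assignments in place of the hard ones above.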
Mon., Apr. 3:
  • Lecture: Bayes nets - Structure Learning,
    Clustering - K-means & Gaussian mixture models
    [Slides] [Annotated]

Wed., Apr. 5:
  • Lecture: Clustering - K-means & Gaussian mixture models
    [Slides] [Annotated]
  • Reading: Neal and Hinton EM paper
  • Homework #4 due
    (beginning of class)
    Homework #5 out

Mon., Apr. 10:
  • Lecture: EM,
    Baum-Welch (EM for HMMs)
    [Slides] [Annotated]

Wed., Apr. 12:
  • Lecture: Baum-Welch (EM for HMMs),
    EM for Bayes Nets
    [Slides] [Annotated]
  • Reading: Ghahramani, "An introduction to HMMs and Bayesian Networks"
  • Project milestone due
    (beginning of class)
Module 8: Learning to make decisions
(3 Lectures)
  • Markov decision processes (see the sketch below)
  • Reinforcement learning
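For instance, when the model of a Markov decision process is known, value iteration repeats the Bellman backup V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ] until convergence. A Matlab/Octave sketch on an invented 2-state, 2-action MDP (reinforcement learning addresses the harder case where P and R must be learned from experience):

  % Value iteration on a toy 2-state, 2-action MDP.
  % P(:,:,a) is the transition matrix for action a; R(s,a) is the reward.
  P(:,:,1) = [0.9 0.1; 0.1 0.9];
  P(:,:,2) = [0.5 0.5; 0.5 0.5];
  R = [1 0; 0 2];
  gamma = 0.9;
  V = zeros(2,1);
  for iter = 1:1000
    Q = zeros(2,2);
    for a = 1:2
      Q(:,a) = R(:,a) + gamma * P(:,:,a) * V;   % Bellman backup for action a
    end
    Vnew = max(Q, [], 2);                       % greedy over actions
    if max(abs(Vnew - V)) < 1e-6, break; end
    V = Vnew;
  end
  disp(V')                                      % optimal state values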
Mon., Apr. 17:
  • Lecture: EM for Bayes Nets,
    Co-Training for semi-supervised learning
    [Slides] [Annotated]
  • Reading: Blum and Mitchell co-training paper
  • Optional reading: Joachims Transductive SVMs paper

Wed., Apr. 19:
  • Lecture: Learning from text data
    (lecture by Tom Mitchell)
  • Homework #5 due
    (beginning of class)

Mon., Apr. 24:
  • Lecture: Semi-supervised learning in SVMs,
    Principal Component Analysis (PCA) (see the sketch below)
    [Slides] [Annotated]
  • Reading: Shlens' PCA tutorial
  • Optional reading: Wall et al. 2003 - PCA for gene expression data
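As a small PCA illustration (a sketch on synthetic data, not the gene-expression example from the optional reading): center the data, take the SVD, and the leading right singular vector is the first principal direction.

  % PCA via SVD on synthetic, correlated 2-D data.
  n = 200;
  t = randn(n, 1);
  X = [t, 0.5*t] + 0.1*randn(n, 2);    % points scattered near the line y = x/2
  Xc = X - repmat(mean(X, 1), n, 1);   % center each column
  [U, S, V] = svd(Xc, 0);              % economy-size SVD
  pc1 = V(:,1);                        % first principal direction
  proj = Xc * pc1;                     % 1-D projection onto that direction
  fprintf('first PC: (%.2f, %.2f)\n', pc1(1), pc1(2));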
Module 9: Advanced topics
(3 Lectures)
  • Text data
  • Hierarchical Bayesian models
  • Tackling very large datasets
  • Active learning
  • Overview of follow-up classes
Wed., Apr. 26:
  • Lecture: Principal Component Analysis (PCA) (cont.),
    Markov Decision Processes
    [Slides] [Annotated]
  • Reading: Kaelbling et al. Reinforcement Learning tutorial

Mon., May 1:
  • Lecture: Markov Decision Processes,
    Reinforcement Learning
    [Slides] [Annotated]
  • Reading: Brafman and Tennenholtz: Rmax paper

Wed., May 3:
  • Lecture: Reinforcement Learning,
    Big Picture
    [Slides] [Annotated]

Project Poster Session

Fri., May 5:
Newell-Simon Hall Atrium

Project Paper

Mon., May 8:
Project paper due

Final Exam

All material thus far
Friday, May 12th, 1-4 p.m.
Location TBD


Recitation schedule

All recitations are Thursdays, 5:00-6:30, Wean Hall 5409, unless otherwise noted.

Jan. 19 - Stano - Review of Probability; Distributions; Bayes Rule
Jan. 25 - Jure - Introduction to Matlab [FILES Directory] (NSH 3305)
Jan. 26 - Anton - Linear regression, overfitting, examples. [Slides] [Matlab code]
Feb. 2 - Andreas
Feb. 9 - Jure - Information Gain, Decision Trees and Boosting. [Slides]
Feb. 16 - Stano - Cross-validation, Neural networks.
Feb. 23 - Anton - Nearest neighbors, weighted linear regression, SVMs
Mar. 2 - Andreas
Mar. 6 - All - Midterm review session, Monday 6th, 5-7 pm, in NSH 3305. Please bring questions.
Mar. 9 - ** No recitation **
Mar. 23 - Jure - Review of the midterm exam.
Mar. 30 - Stano - Bayesian networks. [Slides]
Apr. 6 - Anton - Gaussians, Gaussian Mixtures, EM
Apr. 13 - Andreas - EM for Hidden Markov Models
Apr. 20 - ** No recitation -- University Closed **
Apr. 27 - Jure - Dimensionality reduction: PCA, SVD, Graphs, PageRank, Power iteration [Slides]
May 4
May 8 - Andreas - MDPs and Reinforcement Learning, Monday 8th, 5 pm, in NSH 3305.
May 10 - Stano - Final exam review session, Wednesday 10th, 5:30 pm, in NSH 1305.

Exam Schedule

Additional Resources

Here are some example questions for studying for the midterm and final. Note that these are exams from earlier years: they contain some topics that will not appear in this year's exams, and some topics covered this year do not appear in these examples.

Note to people outside CMU

Feel free to use the slides and materials available online here. Please email the instructors with any corrections or improvements. Additional slides and software are available at the Machine Learning textbook homepage and at Andrew Moore's tutorials page.