Machine Learning

10-701/15-781, Fall 2010

Aarti Singh

Date | Lecture | Topics | Readings and useful links | Handouts
Sept 8 Intro to ML
  • ML applications
  • What constitutes an ML algorithm?
  • Learning paradigms, Loss functions
    • Supervised learning (classification, regression)
    • Unsupervised learning (density estimation, clustering, dimensionality reduction)
  • Bayes Optimal Learning Rule
Bishop: Sec 2.1, Appendix B
Mitchell: Ch 1
Sept 13 Learning distributions
  • Learning parametric distributions
    • Maximum Likelihood Estimation (MLE)
    • Maximum A Posteriori (MAP) Estimation
Andrew Moore's Basic Probability Tutorial
Bishop: Sec 2.2, 2.3 (up to 2.3.6)
HW1 is out
Sept 15 Optimal Classifier Slides
  • MLE vs. MAP
  • Bayes Optimal Classifier
Bishop: Sec 1.5
Sept 20 Naive Bayes
  • Conditional Independence
  • Naive Bayes Classifier
    • Discrete Features
    • Continuous Features
Mitchell's Chapter Draft
Sept 22 Logistic regression
  • Generative vs. Discriminative Classifiers
  • Logistic regression
Mitchell's Chapter Draft
Bishop: Sec 4.1-4.3
On Discriminative and Generative Classifiers, Ng and Jordan, NIPS, 2001 (pdf)
On gradient descent and Newton's method: Boyd's slides and Chapter 9 of Convex Optimization.
Sept 27 Regression
  • Linear Regression
  • Polynomial Regression
Least Squares Applet
Tutorial on regression by Andrew Moore
Bishop: Sec 3.1
HW1 due
Sept 29 Nonparametric methods
  • Histogram, Kernel Density Estimation
  • K-NN Classifier
  • Kernel Regression
Bishop: Sec 2.5, 6.3
Mitchell: Ch 8
Tutorial on Instance-based Learning by Andrew Moore
HW2 is out
Oct 4 Model Selection
  • Overfitting
  • Bias-Variance Tradeoff
  • Model Selection
    • Cross-validation
    • Structural Risk Minimization
    • Complexity Regularization
    • Information Criteria (AIC, BIC, MDL)
Bishop: Sec 1.3, 3.1.4
Hastie: Ch 7 (recommended)
A study of CV and Bootstrap (optional)
MDL website (optional)
Model Selection and MDL principle paper by M. Hansen and B. Yu (optional)
Oct 6 Decision Trees
  • Decision Tree Representation
  • Entropy, Information gain
  • Overfitting, Pre- and Post-pruning, MDL
Mitchell: Ch 3
Decision Tree Applet
Oct 11 Boosting
  • Combining weak classifiers
  • AdaBoost algorithm
  • Comparison with logistic regression and bagging
Bishop: Sec 14.3
Boosting homepage
Schapire: Boosting Tutorial, Video
Adaboost Applet
Project Proposal due
Oct 13 Support Vector Machines
  • Maximizing margin
  • SVM formulation
  • Slack variables, Hinge loss
  • Multi-class SVM
Bishop: Sec 7.1, Sec 4.1.1, 4.1.2, Appendix E
Stephen Boyd's book: Ch 5 (optional)
HW2 due
HW3 is out
Oct 18 Support Vector Machines
  • Constrained Optimization
  • Dual SVM
  • Kernel Trick
  • Comparison with Kernel regression and Logistic Regression
Bishop: Sec 6.1, 6.2
Tutorials on SVMs and Kernels
Additional resource: SVM website
Oct 20
Midterm Exam
Score distribution, Exam
Oct 25 Clustering
  • What is clustering?
  • Hierarchical Clustering
    • Single linkage
    • Complete linkage
    • Average linkage
  • Partition-based Clustering
    • K-means algorithm
Bishop: Sec 9.1
Oct 27 EM Algorithm
  • Gaussian Mixture Model
  • Expectation Maximization Algorithm
Bishop: Ch 9
Nov 1 Learning Theory I
Annotated Slides
  • Sample complexity
  • Haussler bound
  • PAC Learning
  • Hoeffding's bound
Mitchell: Ch 7
HW3 due
HW4 is out
Nov 3 Learning Theory II
  • VC dimension
  • Mistake Bounds
Mitchell: Ch 7
Nov 8 HMM
  • HMM Representation
  • Forward Algorithm
  • Forward-Backward Algorithm
  • Viterbi Algorithm
  • Baum-Welch Algorithm
Bishop: Ch 13
HMM and EM Tutorial
Midterm project report due
Nov 10 Graphical Models I
Representation - Directed models
  • Factorization of joint distribution
  • Local Markov Assumption
  • D-separation
  • Representation Theorem
Bishop: Ch 8
Graphical Models tutorial by M. Jordan
Intro to Graphical Models by K. Murphy
Nov 15 Graphical Models II
Representation - Undirected models
  • Factorization of joint distribution
  • Graph separation
  • Hammersley-Clifford Theorem
  • Variable Elimination
Bishop: Ch 8
Graphical Models tutorial by M. Jordan
Intro to Graphical Models by K. Murphy
HW4 due
Nov 17 Graphical Models III
Dimensionality Reduction
Learning - Graphical Models
  • Learning CPTs
  • Learning structure - Chow-Liu Algorithm
Dimensionality Reduction
  • Feature Selection
  • PCA (Principal Components Analysis)
HW5 is out
Nov 22 Nonlinear Dimensionality Reduction
Spectral Clustering
  • Laplacian Eigenmaps
  • Spectral Clustering
Belkin-Niyogi Paper on Laplacian Eigenmaps
Spectral Clustering tutorial by Ulrike von Luxburg
Spectral Clustering demo
Nov 29 Neural Networks
  • Prediction - Forward Propagation
  • Training - Backpropagation
Derivation of Backpropagation (pdf)
Dec 1 Semi-Supervised Learning
Dec 2
Project Poster Presentation (3-6 pm NSH Atrium)
Dec 7
Final Project report due (by 10:30 am)
HW5 due (by 10:30 am)
Both project report and HW5 are due by 10:30 am in Michelle's office (GHC 8001)
Dec 14
Final Exam (1-4 pm), DH 2210