Date 
Lecture 
Topics 
Readings and useful links 
Handouts 
Sept 8 
Intro to ML
Slides 
 ML applications
 What constitutes an ML algorithm?
 Learning paradigms, Loss functions
 Supervised learning (classification, regression)
 Unsupervised learning (density estimation, clustering, dimensionality reduction)
 Bayes Optimal Learning Rule

Bishop: Sec 2.1, Appendix B
Mitchell: Ch 1

Sept 13 
Learning distributions
Slides 
 Learning parametric distributions
 Maximum Likelihood Estimation (MLE)
 Maximum A Posteriori (MAP) Estimation

Andrew Moore's Basic Probability Tutorial
Bishop: Sec 2.2, 2.3 (up to 2.3.6) 
HW1 is out 
Sept 15 
Optimal Classifier
Slides 
 MLE vs. MAP
 Bayes Optimal Classifier

Bishop: Sec 1.5
 
Sept 20 
Naive Bayes
Slides 
 Conditional Independence
 Naive Bayes Classifier
 Discrete Features
 Continuous Features

Mitchell's Chapter Draft


Sept 22 
Logistic regression
Slides 
 Generative vs. Discriminative Classifiers
 Logistic regression

Mitchell's Chapter Draft
Bishop: Sec 4.1-4.3
On Discriminative and Generative Classifiers, Ng and Jordan, NIPS, 2001 (pdf)
On gradient descent and Newton's method: Boyd's slides and Chapter 9 of Convex Optimization.


Sept 27 
Regression
Slides 
 Linear Regression
 Polynomial Regression

Least Squares Applet
Tutorial on regression by Andrew Moore
Bishop: Sec 3.1

HW1 due

Sept 29 
Nonparametric methods
Slides 
 Histogram, Kernel Density Estimation
 KNN Classifier
 Kernel Regression

Bishop: Sec 2.5, 6.3
Mitchell: Ch 8
Tutorial on Instance-based Learning by Andrew Moore

HW2 is out 
Oct 4 
Model Selection
Slides

 Overfitting
 Bias-Variance Tradeoff
 Model Selection
 Cross-validation
 Structural Risk Minimization
 Complexity Regularization
 Information Criteria (AIC, BIC, MDL)

Bishop: Sec 1.3, 3.1.4
Hastie: Ch 7 (recommended)
A study of CV and Bootstrap (optional)
MDL website (optional)
Model Selection and MDL principle paper by M. Hansen and B. Yu (optional)


Oct 6 
Decision Trees
Slides

 Decision Tree Representation
 Entropy, Information gain
 Overfitting, Pre- and Post-pruning, MDL

Mitchell: Ch 3
Decision Tree Applet


Oct 11 
Boosting
Slides

 Combining weak classifiers
 AdaBoost algorithm
 Comparison with logistic regression and bagging

Bishop: Sec 14.3
Boosting homepage
Schapire: Boosting Tutorial, Video
AdaBoost Applet

Project Proposal due

Oct 13 
Support Vector Machines
Slides

 Maximizing margin
 SVM formulation
 Slack variables, Hinge loss
 Multiclass SVM

Bishop: Sec 7.1, Sec 4.1.1, 4.1.2, Appendix E
Stephen Boyd's book: Ch 5 (optional)

HW2 due
HW3 is out

Oct 18 
Support Vector Machines
Slides

 Constrained Optimization
 Dual SVM
 Kernel Trick
 Comparison with Kernel regression and Logistic Regression

Bishop: Sec 6.1, 6.2
Tutorials on SVMs and Kernels
Additional resource: SVM website


Oct 20 

Midterm Exam

Score distribution

Exam
Solution

Oct 25 
Clustering
Slides

 What is clustering?
 Hierarchical Clustering
 Single linkage
 Complete linkage
 Average linkage
 Partition-based Clustering

Bishop: Sec 9.1


Oct 27 
EM Algorithm
Slides

 Gaussian Mixture Model
 Expectation Maximization Algorithm

Bishop: Ch 9


Nov 1 
Learning Theory I
Slides
Annotated Slides

 Sample complexity
 Haussler bound
 PAC Learning
 Hoeffding's bound

Mitchell: Ch 7

HW3 due
HW4 is out

Nov 3 
Learning Theory II
Slides

 VC dimension
 Mistake Bounds

Mitchell: Ch 7


Nov 8 
HMM
Slides

 HMM Representation
 Forward Algorithm
 Forward-Backward Algorithm
 Viterbi Algorithm
 Baum-Welch Algorithm

Bishop: Ch 13
HMM and EM Tutorial

Midterm project report due

Nov 10 
Graphical Models I
Slides

Representation - Directed models
 Factorization of joint distribution
 Local Markov Assumption
 D-separation
 Representation Theorem

Bishop: Ch 8
Graphical Models tutorial by M. Jordan
Intro to Graphical Models by K. Murphy


Nov 15 
Graphical Models II
Slides

Representation - Undirected models
 Factorization of joint distribution
 Graph separation
 Hammersley-Clifford Theorem
Inference

Bishop: Ch 8
Graphical Models tutorial by M. Jordan
Intro to Graphical Models by K. Murphy

HW4 due

Nov 17 
Graphical Models III
Dimensionality Reduction
Slides

Learning - Graphical Models
 Learning CPTs
 Learning structure - Chow-Liu Algorithm
Dimensionality Reduction
 Feature Selection
 PCA (Principal Components Analysis)


HW5 is out

Nov 22 
Nonlinear Dimensionality Reduction
Slides
Spectral Clustering
Slides

 Laplacian Eigenmaps
 Spectral Clustering

Belkin-Niyogi paper on Laplacian Eigenmaps
Spectral Clustering tutorial by Ulrike von Luxburg
Spectral Clustering demo


Nov 29 
Neural Networks
Slides

Neural Networks
 Prediction - Forward Propagation
 Training - Backpropagation

Derivation of Backpropagation (pdf)


Dec 1 
Semi-Supervised Learning
Slides




Dec 2 

Project Poster Presentation (3-6 pm, NSH Atrium)



Dec 7 

Final Project report due (by 10:30 am)

Both project report and HW5 are due by 10:30 am in Michelle's office (GHC 8001)

HW5 due (by 10:30 am)

Dec 14 

Final Exam (1-4 pm), DH 2210


