Date 
Lecture 
Topics 
Readings and useful links 
Handouts 
Sep 4 
Intro to ML
Decision Trees
Slides

 Machine learning examples
 Well defined machine learning problem
 Decision tree learning

Mitchell: Ch 3
Bishop: Ch 14.4
The Discipline of Machine Learning

HW1 out 
Sep 6 
Decision Tree learning
Review of Probability
Slides

 The big picture
 Overfitting
 Random variables, probabilities

Andrew Moore's Basic Probability Tutorial
Bishop: Ch. 1 thru 1.2.3
Bishop: Ch 2 thru 2.2


Sep 11 


Andrew Moore's Basic Probability Tutorial
Bishop: Ch. 1 thru 1.2.3
Bishop: Ch 2 thru 2.2
Tom's VIDEO: Probability and Estimation


Sep 13 

 Conditional independence
 Naive Bayes

Mitchell: Naive Bayes and Logistic Regression
Tom's VIDEO: Naive Bayes


Sep 18 

 Gaussian Bayes classifiers
 Document classification
 Brain image classification
 Form of decision surfaces

Mitchell: Naive Bayes and Logistic Regression
Tom's VIDEO: Gaussian Bayes


Sep 20 

 Naive Bayes: the big picture
 Logistic Regression: Maximizing conditional likelihood
 Gradient ascent as a general learning/optimization method

Mitchell: Naive Bayes and Logistic Regression
Optional: Ng & Jordan: On Discriminative and Generative Classifiers, NIPS, 2001.
Tom's VIDEO: Logistic regression


Sep 25 

 Generative/Discriminative models
 minimizing squared error and maximizing data likelihood
 regularization
 bias-variance decomposition

Bishop: Ch. 1 thru 1.2.5, Ch. 3 thru 3.2
Optional: Mitchell: Ch. 6.4
Tom's VIDEO: Linear regression


Sep 27 

 Nonlinear regression
 Gradient descent
 Learning of representations
 Deep Belief Networks

Mitchell: Ch. 4, or Bishop: Ch. 5
Optional: Le et al., 2012
Tom's VIDEO: Neural networks


Oct 2 

 Bayes nets
 representing joint distributions with conditional independence assumptions



Oct 4 

 Inference
 Learning from fully observed data
 Learning from partially observed data

Intro. to Graphical Models, K. Murphy
Tom's VIDEO: Graphical models 2


Oct 9 

 EM
 Semi-supervised learning
 Mixture of Gaussian clustering
 K-Means clustering

Bishop: Ch. 9 through 9.2
Optional: EM and HMM tutorial, J. Bilmes (sec. 1-3)
Tom's VIDEO: Graphical models 3
Tom's VIDEO: Graphical models 4


Oct 11 

 Computational Learning Theory
 Probably approximately correct learning

Mitchell: Ch. 7
Tom's VIDEO: Learning theory 1


Oct 16 

 VC Dimension
 Agnostic learning models
 Mistake bound models

Mitchell: Ch. 7
Tom's VIDEO: Learning theory 2


Oct 18 

Midterm



Oct 23 
Hierarchical Clustering
Slides

 Distance functions
 Hierarchical clustering
 Number of clusters

Bishop: 9-9.2
Optional: Tutorial on clustering
Hierarchical clustering app


Oct 25 
Semi-Supervised Learning
Slides

 Semi-supervised learning
 Reweighting labeled examples
 Co-Training
 Detecting overfitting

Optional: Advanced tutorial

HW 4 out HW 4 data 
Oct 30 

 Graphical models
 Constructing a BN
 Inference in BNs
 Why it's hard
 Variable elimination
 Stochastic inference

Chap 8.1 and 8.2.2 (Bishop)
Optional: Tutorials: 1, 2


Nov 1 
Inference in Bayesian Networks
Slides

 Why it's hard
 Variable elimination
 Stochastic inference
 Introduction to HMMs

Chap 8.1 and 8.2.2 (Bishop)
Optional: Tutorials: 1, 2


Nov 6 
Inference in Hidden Markov models
Slides

 Formal definition of HMMs
 Inference in HMMs
 With no observations
 With observations
 The Viterbi algorithm

Bishop: 13-13.2.1 (inclusive)
Tutorial Tutorial


Nov 8 

 Learning parameters when states can be observed
 Fully unsupervised
 Forward backward algorithm
 EM for HMM learning
 Introduction to Markov decision processes (MDPs)

Bishop: 13.2.1-13.2.2
Tutorial Tutorial


Nov 13 
Markov Decision Processes (MDP)
Slides

 Formal definition of MDPs
 Inference in MDPs
 With no actions
 With actions

Tutorial
Demo


Nov 15 




Nov 20 
Dimensionality reduction (PCA)
Notes


Chapter 4.1.4-4.1.6 in Bishop
Tutorial for PCA including MATLAB code.


Nov 27 
Support Vector Machine (SVM)
Slides

 Max margin
 Support vectors
 Quadratic programming
 Linear separation



Nov 29 
Support Vector Machine (SVM)
Slides

 Lagrange multipliers
 Dual formulation of SVM
 Transformation of the input vector
 The kernel trick

Software
Optional reading


Dec 4 

 Weak classifiers
 AdaBoost
 Boosting and logistic regression



Dec 6 
Model and feature selection
Slides

 Cross validation
 Regularization
 Information theoretical selection methods
 Feature selection


