Statistical Machine Learning is a second graduate level course in
machine learning, assuming students have taken Machine Learning
(10-701) and Intermediate Statistics (36-705). The term
"statistical" in the title reflects the emphasis on statistical
analysis and methodology, which is the predominant approach in modern
machine learning.
The course combines methodology with theoretical foundations and
computational aspects. It treats both the "art" of designing good
learning algorithms and the "science" of analyzing an
algorithm's statistical properties and performance
guarantees. Theorems are presented together with practical aspects of
methodology and intuition to help students develop tools for selecting
appropriate methods and approaches to problems in their own research.
The course includes topics in statistical theory that are
now becoming important for researchers in machine learning, including
consistency, minimax estimation, and concentration of measure. It
also presents topics in computation including elements of convex
optimization, variational methods, randomized projection algorithms,
and techniques for handling large data sets.
Topics will be chosen from the following basic outline, which is
subject to change.
- Statistical theory:
Maximum Likelihood, Bayes, Minimax,
Parametric versus Nonparametric Methods,
Bayesian versus Non-Bayesian Approaches,
Classification, Regression, Density Estimation.
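As a small taste of the maximum likelihood material, here is a minimal sketch (toy data, NumPy only) of the closed-form Gaussian MLE; the distribution, sample size, and parameter values are illustrative choices, not part of the course outline.

```python
import numpy as np

# Maximum likelihood for a Gaussian N(mu, sigma^2): the MLE has the
# closed form mu_hat = sample mean, sigma2_hat = (1/n) sum (x_i - mu_hat)^2.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # true sigma^2 = 2.25

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()   # note the 1/n (biased) normalizer

print(mu_hat, sigma2_hat)  # close to 2.0 and 2.25 for a sample this large
```

The 1/n normalizer (rather than 1/(n-1)) is what maximum likelihood actually gives; the resulting bias vanishes as n grows.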
- Convexity and optimization:
Convexity, conjugate functions, unconstrained and constrained
optimization, KKT conditions.
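To make the KKT conditions concrete, a minimal worked example for a one-dimensional toy problem (the problem and its solution are illustrative, not from the course materials):

```python
# KKT conditions for:  minimize f(x) = (x - 2)^2  subject to  g(x) = x - 1 <= 0.
# The constraint is active at the optimum x* = 1, and stationarity
# f'(x*) + lam * g'(x*) = 0 gives the multiplier lam = 2.

x_star, lam = 1.0, 2.0

grad_f = 2 * (x_star - 2)    # f'(x*) = -2
grad_g = 1.0                 # g'(x*) = 1

stationarity = grad_f + lam * grad_g    # should be exactly 0
primal_feasible = x_star - 1 <= 0       # g(x*) <= 0
dual_feasible = lam >= 0
comp_slackness = lam * (x_star - 1)     # lam * g(x*) should be 0

print(stationarity, primal_feasible, dual_feasible, comp_slackness)
```

All four conditions hold simultaneously, which is exactly what certifies x* = 1 as the constrained minimizer of this convex problem.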
- Parametric methods:
Linear Regression, Model Selection, Generalized Linear Models,
Mixture Models, Classification (linear, logistic, support vector machines),
Structured Prediction, Hidden Markov Models.
- Sparsity:
High Dimensional Data and Sparsity,
Basis Pursuit and the Lasso Revisited,
Sparsistency, Consistency, Persistency,
Greedy Algorithms for Sparse Linear Regression,
Sparsity in Nonparametric Regression.
Sparsity in Graphical Models, Compressed Sensing.
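As a preview of the lasso material, a minimal coordinate-descent sketch on synthetic data (the objective scaling, step sizes, and data are illustrative assumptions, not the course's reference implementation):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent:
    minimize (1/2n) ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta_true = np.zeros(10)
beta_true[:2] = [3.0, -2.0]            # sparse truth: 2 of 10 nonzero
y = X @ beta_true + 0.1 * rng.normal(size=200)

beta_hat = lasso_cd(X, y, lam=0.1)
print(beta_hat)  # first two coordinates near (3, -2), the rest near 0
```

The estimate recovers the support of the true coefficient vector while shrinking the nonzero entries slightly toward zero, the characteristic bias of the l1 penalty.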
- Nonparametric methods:
Nonparametric Regression and Density Estimation,
Clustering and Dimension Reduction, PCA,
Manifold Methods, Principal Curves, Spectral Methods,
The Bootstrap and Subsampling.
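The bootstrap idea can be sketched in a few lines: resample the data with replacement to approximate the sampling distribution of a statistic (here the median of an exponential sample; the data and interval level are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=500)   # true median = ln 2 ~ 0.693

# Nonparametric bootstrap: B resamples, recomputing the statistic each time
B = 2000
boot_medians = np.array([
    np.median(rng.choice(x, size=x.size, replace=True)) for _ in range(B)
])

# Percentile confidence interval for the median
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(lo, hi)
```

The interval is obtained without any formula for the sampling variance of the median, which is the appeal of the method.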
- Advanced theory:
Concentration of Measure, Covering Numbers,
Tsybakov Noise, Minimax Rates for Classification and Regression,
Surrogate Loss Functions, Boosting, Sparsistency, Minimax Theory.
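Concentration of measure can be previewed numerically: Hoeffding's inequality bounds the tail probability of a bounded sample mean, and a quick Monte Carlo check (parameters chosen for illustration) shows the empirical tail sitting below the bound.

```python
import numpy as np

# Hoeffding's inequality for [0,1]-valued variables:
#   P(|X_bar - p| >= t) <= 2 * exp(-2 * n * t^2)
rng = np.random.default_rng(0)
n, p, t = 100, 0.5, 0.1
trials = 20_000

means = rng.binomial(n, p, size=trials) / n        # sample means of n coin flips
empirical = np.mean(np.abs(means - p) >= t)        # Monte Carlo tail probability
bound = 2 * np.exp(-2 * n * t ** 2)                # Hoeffding's bound ~ 0.271

print(empirical, bound)  # the empirical tail does not exceed the bound
```

The bound is loose here (the true tail is roughly 0.06), which is typical: concentration inequalities trade sharpness for generality across all bounded distributions.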
- Kernel methods:
Reproducing Kernel Hilbert Spaces,
Relationship to Nonparametric Statistics,
Kernel Classification, Kernel PCA, Kernel Tests of Independence.
- Computation:
The EM Algorithm, Simulation,
Variational Methods, Regularization Path Algorithms.
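The EM algorithm can be sketched for the simplest interesting case, a two-component one-dimensional Gaussian mixture with known unit variances (the data, initialization, and iteration count are illustrative assumptions):

```python
import numpy as np

# Simulate a mixture: 40% from N(-2, 1), 60% from N(2, 1)
rng = np.random.default_rng(0)
z = rng.random(1000) < 0.4
x = np.where(z, rng.normal(-2.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000))

mu = np.array([-1.0, 1.0])   # initial component means
pi = 0.5                     # initial weight on component 0

for _ in range(50):
    # E-step: posterior responsibility of component 0 for each point
    d0 = pi * np.exp(-0.5 * (x - mu[0]) ** 2)
    d1 = (1 - pi) * np.exp(-0.5 * (x - mu[1]) ** 2)
    r = d0 / (d0 + d1)
    # M-step: responsibility-weighted updates of the means and weight
    mu = np.array([(r * x).sum() / r.sum(),
                   ((1 - r) * x).sum() / (1 - r).sum()])
    pi = r.mean()

print(mu, pi)  # roughly (-2, 2) and 0.4
```

Each iteration increases the observed-data likelihood, and the E-step/M-step alternation is exactly the structure generalized by the variational methods listed above.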
- Other learning methods:
Functional Data, Semi-Supervised Learning, Reinforcement Learning,
Minimum Description Length, Online Learning, The PAC Model, Active Learning.
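Online learning can be previewed with the perceptron, which processes one example at a time and updates only on mistakes; on linearly separable data with margin gamma and radius R it makes at most (R/gamma)^2 mistakes. A minimal sketch on synthetic data (the separator, margin filter, and pass limit are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0])
X = rng.normal(size=(800, 2))
keep = np.abs(X @ w_true) > 1.0          # keep only points with a clear margin
X, y = X[keep], np.sign(X[keep] @ w_true)

w = np.zeros(2)
mistakes = 0
for _ in range(200):                     # passes over the data stream
    errors_this_pass = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:           # mistake: perceptron update
            w += yi * xi
            mistakes += 1
            errors_this_pass += 1
    if errors_this_pass == 0:            # a clean pass: data is separated
        break

train_err = np.mean(np.sign(X @ w) != y)
print(mistakes, train_err)
```

Because the filtered data has margin at least 1/sqrt(2), the mistake bound guarantees convergence well within the pass limit, after which the training error is zero.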