Georgia Tech, CS 7545 Machine Learning Theory, Fall 2013

MACHINE LEARNING THEORY

Maria Florina Balcan

Time: Mon/Wed 3:05-4:25, Place: ES&T L1175.
Course description: Machine learning studies automatic methods for learning to make accurate predictions or useful decisions based on past observations and experience, and it has become a highly successful discipline with applications in many areas such as natural language processing, speech recognition, computer vision, or gene discovery. This course on the design and theoretical analysis of machine learning methods will cover a broad range of important problems studied in theoretical machine learning. It will provide a basic arsenal of powerful mathematical tools for their analysis, focusing on both statistical and computational aspects. We will examine questions such as "What guarantees can we prove on the performance of learning algorithms? " and "What can we say about the inherent ease or difficulty of learning problems?". In addressing these and related questions we will make connections to statistics, algorithms, complexity theory, information theory, game theory, and empirical machine learning research. You can find more info here.

Recommended (but not required) textbooks:

An Introduction to Computational Learning Theory by M. Kearns and U. Vazirani
A Probabilistic Theory of Pattern Recognition by L. Devroye, L. Györfi, G. Lugosi.

Additionally, we will use a number of survery articles and tutorials.

Homeworks

Homework 1 [pdf]
Homework 2 [pdf]
Homework 2.5 [pdf]
Homework 3 [pdf]
Homework 4 [pdf]

Project ideas

Lecture Notes & Handouts

08/19: Introduction. The consistency model.
See also Chapter 1 in the Kearns-Vazirani book.
08/21: The PAC model for passive supervised learning.
See also Chapter 1 in the Kearns-Vazirani book.
08/26: The PAC model for passive supervised learning (cont'd).
See also Chapter 1 in the Kearns-Vazirani book.
08/28: The Perceptron algorithm for learning linear separators.
09/05: The mistake bound model.
See also: An Online Learning Survey by Avrim Blum (Section 3 in particular).
09/09: The Winnow algorithm.
See also: The original Winnow paper. See also: Kakade and Tewari lecture notes.
09/11: Tail Inequalities. Simple sample complexity results for the agnostic case.
09/16: The Vapnik Chervonenkis dimension. Sauer's lemma .
See also Chapter 3 in the Kearns-Vazirani book.
09/18: Sample complexity results for infinite hypothesis spaces (cont'd)..
See also Chapter 3 in the Kearns-Vazirani book.
09/23: Sample complexity lower bounds for passive supervised learning.
See also Chapter 3.6 in the Kearns-Vazirani book.
09/25: Sample complexity results for infinite hypothesis spaces (cont'd).
09/30: Boosting: Weak Learning and Strong Learning. See Avrim Blum's lecture notes.
See also Chapter 4 in the Kearns-Vazirani book.
10/02: Adaboost. See Rob Schapire's lecture notes.
See also: A Short Introduction to Boosting, by Yoav Freund and Rob Schapire.
10/07: Adaboost. Generalization error bounds: naive and margins-based.
10/09: Kernels methods.
10/16: Kernels methods (cont'd). Properties of Legal Kernel Functions. More Examples of Kernels.
See also: Kernels as Features: On Kernels, Margins, and Low-dimensional Mappings, Machine Learning Journal 2006.
10/21: Support Vector Machines. See Andrew Ng's notes. See also Steve Boyd's lectures on Convex Optimization.
10/23: Generalization bounds based on the Rademacher Complexity.
10/28: Generalization bounds based on the Rademacher Complexity (cont'd).
See Introduction to Statistical Learning Theory. by O. Bousquet, S. Boucheron, and G. Lugosi.
10/30: Semi-Supervised Learning.
See also The Encyclopedia of Machine Learning entry on Semi-Supervised Learning by X. Zhu.
11/04: Theoretical Foundations for Semi-Supervised Learning.
See also: A Discriminative Model for Semi-Supervised Learning, JACM 2010.
11/06: Active Learning.
For the CAL and A^2 algorithms, as well as the disagreement coefficient analysis see Shai Shalev-Shwartz's notes.
11/11:Computationally efficient with nearly optimal label complexity active learning.
See also: Active and Passive Learning of Linear Separators under Log-concave Distributions, COLT 2013.
11/13: Online Learning, Combining Experts Advice, and Regret Minimization. The Weighted Majority Algorithm.
See also The original Weighted Majority paper by N. Littlestone and M. Warmuth.

Additional Readings & More Information

Books and tutorials:

PASCAL video lectures.
Kernel Methods for Pattern Analysis by N. Cristianini and J. Shawe-Taylor. Cambridge University Press, 2004.
An Introduction to Support Vector Machines (and other kernel-based learning methods) by N. Cristianini and J. Shawe-Taylor. Cambridge University Press, 2000.
Learning in Neural Networks: Theoretical Foundations by M. Anthony and P. Bartlett. Cambridge University Press, 1999.
Statistical Learning Theory by V. Vapnik. Wiley, 1998.
Foundations of machine learning by M. Mohri, A. Rostamizadeh, and A. Talwalkar. MIT Press, 2012
Learning theory by A. Tewari and P. L. Bartlett. E-Reference Signal Processing, 2013