15-884 Theoretical and Empirical Foundations of Modern Machine Learning

Welcome to Theoretical and Empirical Foundations of Modern Machine Learning (15-884), Fall 2022!

Instructor: Aditi Raghunathan (raditi at cmu dot edu)

TA: Christina Baek (kbaek at cs dot cmu dot edu)

Lectures: Tuesday, Thursday 4:40-6:00pm at GHC 4102

Overview:
In this advanced machine learning seminar, we tackle a common struggle in using powerful deep learning machinery: what works, and why? We build a conceptual understanding of deep learning from several angles: standard in-distribution generalization, out-of-distribution generalization, self-supervised learning, scaling laws, memorization, and more. We will read papers that contain a mix of theoretical and empirical insights, with a focus on making connections to classic ideas, identifying recurring themes, and discussing avenues for future developments. The class aims to equip students to reason critically about current advances and to build a more principled understanding of them, which will hopefully spark their own research.

Format:
This course combines lectures with student paper presentations, encouraging both the acquisition of fundamental knowledge and open-ended discussion of new research directions. The lectures will briefly introduce the main concepts, summarize a few key papers, and connect them to classical ideas where applicable.

The paper discussions will follow a role-playing student seminar format inspired by Alec Jacobson and Colin Raffel. We will adopt the following roles.

Positive reviewer: who advocates for the paper to be accepted at a conference (e.g., NeurIPS)
Negative reviewer: who advocates for the paper to be rejected at a conference (e.g., NeurIPS)
Archaeologist: who determines where this paper sits in the context of previous and subsequent work. They must find and report on at least one older paper cited within the current paper that substantially influenced it, and at least one newer paper that cites it. They should also keep an eye out for follow-up work that contradicts the current paper's takeaways.
Academic researcher: who proposes potential follow-up projects that build on the current paper and that are only possible because of its existence and success.
Visitor from the past: who is a researcher from the early 2000s. They must discuss how they interpret the paper's results, what they like or dislike about the settings and benchmarks considered, and what surprises them most about the results presented.

Prerequisites: There are no official prerequisites, but familiarity with probability, linear algebra, and machine learning is expected.

Course requirements:

Regular participation (25%): Written summaries of the assigned readings, submitted before each class, plus participation in online discussions
Paper presentation (40%): Each student must give 1-2 paper presentations over the semester. Each paper is presented by 2 students; each student takes on the role of either the positive or the negative reviewer, plus one other role from the list above
Class participation during lectures and paper discussions (10%)
Final project (25%), if taking the course for a letter grade

Important dates:

Project proposal: Oct 13 2022
Midway project check: Nov 15 2022
Project reports (in the style of a NeurIPS paper): Dec 12 2022
Final project presentations: in lectures starting Dec 15 2022

Topics (tentative):

Generalization in deep learning (uniform convergence, NTK, …)
Implicit biases (algorithmic regularization, simplicity bias, …)
Brittleness and robust training (min-max robustness, spurious correlations, domain invariance, …)
ML with unlabeled data (semi-supervised learning, self-supervised learning, …)
Adaptation (fine-tuning, few-shot learning, continual learning, …)
Large language models (transformers, in-context learning, prompt tuning, scaling laws, …)
Implications for security/privacy, fairness, and ethics

Schedule:

Date	Topic	Content	Presenter
08/30/2022	[Lecture 1] Introduction	Why does this course exist? Course logistics. Overview of the course.	Aditi Raghunathan
09/01/2022	[Lecture 2] The generalization puzzle	Uniform convergence, implicit regularization	Aditi Raghunathan
09/06/2022	[Paper discussion 1] Generalization	The tradeoffs of large scale learning; The lottery ticket hypothesis: finding sparse, trainable neural networks
09/08/2022	[Paper discussion 2] Generalization	Exploring generalization in deep learning; The implicit bias of gradient descent on separable data
09/13/2022	[Guest Lecture]	Limitations of uniform convergence	Vaishnavh Nagarajan
09/15/2022	[Lecture 3] Phenomena captured by simpler models	Double descent, bias-variance tradeoff, kernel methods
09/20/2022	[Paper discussion 3]	Neural Tangent Kernel: convergence and generalization in Neural Networks; Benign overfitting in linear regression
09/22/2022	[Lecture 4] Robustness of deep networks	Out-of-distribution generalization, adversarial examples, spurious correlations, shortcut learning and simplicity bias	Aditi Raghunathan
09/27/2022	[Paper discussion 4] Why are models brittle? (I)	A universal law of robustness via isoperimetry; Adversarial examples are not bugs, they are features
09/29/2022	[Lecture 5] Robust training of deep networks	Robust optimization, accuracy tradeoff, effect of overparameterization	Aditi Raghunathan
10/04/2022	[Paper discussion 5] Why are models brittle? (II)	Understanding the failure modes of out-of-distribution generalization; Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization
10/06/2022	[Lecture 6] Data poisoning, causality	Discussion of data poisoning, intro to causality	Aditi Raghunathan
10/11/2022	[Paper discussion 6] Causality	On causal and anti-causal learning; Invariant risk minimization
10/13/2022	[Lecture 7] Unlabeled data-I	A brief history	Aditi Raghunathan
10/18/2022	[Fall Break]
10/20/2022	[Fall Break]
10/25/2022	[Paper discussion 7] Learning from unlabeled data	Realistic evaluation of deep semi-supervised learning algorithms; Masked autoencoders are scalable vision learners
10/27/2022	[Guest Lecture]	Self-supervised learning	Alexei A. Efros
11/01/2022	[Paper discussion 8] Learning from unlabeled data	Emerging properties in self-supervised vision transformers; Provable guarantees for self-supervised deep learning with spectral contrastive loss
11/03/2022	[Lecture 8] Unlabeled data-II	Analysis of self-training, self-supervision and domain adaptation methods
11/08/2022	[Paper discussion 9] Distribution shifts with access to unlabeled data	Domain-adversarial training of neural networks; Test-time training with self-supervision for generalization under distribution shifts
11/10/2022	[Lecture 9] Foundation models	Transfer learning, analysis of fine-tuning methods, in-context learning
11/15/2022	[Guest Lecture]		Graham Neubig, Maarten Sap
11/17/2022	[Paper discussion 10]	Model-agnostic meta-learning for fast adaptation of deep networks; The power of scale for parameter-efficient prompt tuning
11/22/2022	[Paper discussion 11]	Scaling laws for neural language models; What can transformers learn in-context? A case study of simple function classes
11/24/2022	[Thanksgiving break]
11/29/2022	[NeurIPS break]
12/01/2022	[NeurIPS break]
12/06/2022	[Guest Lecture]	Privacy and fairness in modern machine learning	Nicholas Carlini
12/08/2022	[Guest Lecture]	Benchmarking large language models	Rishi Bommasani
12/13/2022	[Paper discussion 12]	Beyond the imitation game: quantifying and extrapolating the capabilities of language models; On the dangers of stochastic parrots: can language models be too big?
12/15/2022	[Project presentations]
12/20/2022	[Project presentations]