Welcome to Theoretical and Empirical Foundations of Modern Machine Learning
15-789, Fall 2024!

Instructor: Aditi Raghunathan (raditi at cmu dot edu) OH: Fri 10-11am

TA: Jacob Springer (jspringe at andrew dot cmu dot edu) OH: Tue 5-6pm

Lectures: Monday, Wednesday 3:30-5:00pm at TEP 1403

Overview


The Fall 2024 iteration of this class will focus on foundation models. Foundation models have heralded a new era in modern machine learning. They are trained on massive raw data at scale and work well on a wide range of tasks with little to no fine-tuning. In this advanced machine learning seminar class, we will build a principled understanding of when and why they work or fail, and explore avenues for improving their reliability and trustworthiness. The class aims to equip students with the ability to critically reason about and build a more principled understanding of current advances, which will hopefully spark their own research.

The Fall 2022 offering can be accessed here.

Format


This course combines lectures with paper presentations by the students, encouraging fundamental knowledge acquisition as well as open-ended discussions and new research directions. The lectures will briefly introduce the main concepts, summarize a few key papers, and connect to classical ideas where applicable.

The paper discussions will involve role-playing student seminars inspired by Alec Jacobson and Colin Raffel. We will be adopting the following roles.

  • Positive reviewer: who advocates for the paper to be accepted at a conference (e.g., NeurIPS)
  • Negative reviewer: who advocates for the paper to be rejected at a conference (e.g., NeurIPS)
  • Archaeologist: who determines where this paper sits in the context of previous and subsequent work. They must find and report on at least one older paper cited within the current paper that substantially influenced it, and at least one newer paper that cites the current paper. Keep an eye out for follow-up work that contradicts the takeaways of the current paper
  • Academic researcher: who proposes potential follow-up projects that are not just based on the current paper but are only possible due to its existence and success
  • Visitor from the past: who is a researcher from the early 2000s. They must discuss how they comprehend the results of the paper, what they like or dislike about the settings and benchmarks considered, and what surprises them the most about the presented results

Prerequisites

There are no official prerequisites, but a comfortable grasp of probability, linear algebra, and machine learning is expected. This is an advanced class that is fast-paced and research-focused.

Course Requirements

  • Regular participation (25%): written summaries of assigned readings must be submitted before each class, along with participation in the online discussion
  • Paper presentation (40%): each student will give 1-2 paper presentations over the course of the semester. Each paper is presented by two students, where each student takes on the role of either the positive or negative reviewer plus one other role from the list above
  • Class participation during lectures and paper discussions (10%)
  • Final project (25%), if taking the course for a letter grade

Important Dates

  • Project proposal: due Oct 20, 2024. See guidelines here
  • Midway project check: Nov 15, 2024
  • Project reports (in the style of a NeurIPS paper): due Dec 9, 2024
  • Final project presentations: Dec 2 and Dec 4, 2024 (during class hours)

Schedule



08/26/2024 Lecture 1: Introduction (Aditi Raghunathan)
  • Why does this course exist?
  • Course logistics
  • Overview of the course: scientific method to validate hypotheses, analysis of stylized models
08/28/2024 Lecture 2: From "classical" ML to foundation models (Aditi Raghunathan)
  • Supervised learning
  • Semi-supervised learning
  • Transfer learning
  • Self-supervised learning
09/02/2024 Labor Day (No class)
09/04/2024 Guest lecture: The pitfalls of next-token prediction (Vaishnavh Nagarajan)
09/09/2024 Lecture moved to Friday 09/13/2024
09/11/2024 Paper discussion 1
09/13/2024 Lecture 3: The generalization puzzle (Aditi Raghunathan)
  • Bias-variance tradeoff and double descent
  • Uniform convergence
  • Implicit regularization
09/16/2024 Paper discussion 2
09/18/2024 Lecture 4: Scaling laws (Aditi Raghunathan)
09/23/2024 Paper discussion 3
09/25/2024 Lecture 5: Downstream capabilities of pretrained models (Aditi Raghunathan)
  • Representation learning
  • Few-shot learning
  • In-context learning
09/30/2024 Paper discussion 4
10/02/2024 Lecture 6: A critical look at capabilities (Aditi Raghunathan)
  • Out-of-distribution generalization
  • Shortcuts and simplicity bias
  • Causality
10/07/2024 Paper discussion 5
10/09/2024 Paper discussion 6
10/14/2024 Fall break (No class)
10/16/2024 Fall break (No class)
10/21/2024 Lecture 7: A critical look at capabilities - part 2 (Aditi Raghunathan)
  • Out-of-distribution generalization
  • Shortcuts and simplicity bias
  • Causality
10/23/2024 Lecture 8: Post-training (Aditi Raghunathan)
  • Fine-tuning
  • Preference optimization
  • Alignment, safety, and robustness
10/28/2024 Paper discussion 7
10/30/2024 Paper discussion 8
11/04/2024 Lecture 9 (Aditi Raghunathan)
  • Weak-to-strong generalization
  • Synthetic data: strengths and limits
  • Connections to classical ideas of self-training and domain adaptation
11/06/2024 Paper discussion 9
11/11/2024 Paper discussion 10
11/13/2024 Guest lecture (Tim Dettmers)
11/18/2024 Guest lecture: Inference-time methods (Sean Welleck)
11/20/2024 Paper discussion 11
11/25/2024 Paper discussion 12
12/02/2024 Project presentations
12/04/2024 Project presentations