Welcome to Theoretical and Empirical Foundations of Modern Machine Learning
15-789, Fall 2024!
Instructor: Aditi Raghunathan (raditi at cmu dot edu) OH: Fri 10-11am
TA: Jacob Springer (jspringe at andrew dot cmu dot edu) OH: Tue 5-6pm
Lectures: Monday, Wednesday 3:30-5:00pm at TEP 1403
Overview
The Fall 2024 iteration of this class will focus on
foundation models. Foundation models have heralded a new era in modern machine learning. They are trained on massive raw data at scale and work well on a wide range of tasks with little to no fine-tuning. In this advanced machine learning seminar class, we will build a
principled understanding of when and why they work or fail, and explore avenues for improving their
reliability and trustworthiness. The class aims to equip students with the ability to critically reason about current advances and build a more principled understanding of them, which will hopefully spark their own research.
The Fall 2022 offering can be accessed here.
Format
This course combines lectures with paper presentations
by the students, encouraging both fundamental knowledge
acquisition as well as open-ended discussions and new research directions.
The lectures will briefly introduce the main concepts, summarize a few key papers
and connect to classical ideas if applicable.
The paper discussions will involve role-playing student seminars inspired by Alec Jacobson and Colin Raffel.
We will adopt the following roles:
- Positive reviewer: who advocates for the paper to be accepted at a conference (e.g., NeurIPS)
- Negative reviewer: who advocates for the paper to be rejected at a conference (e.g., NeurIPS)
- Archaeologist: who determines where this paper sits in the context of previous and subsequent work. They must find and report on at least one older paper cited within the current paper that substantially influenced it, and at least one newer paper that cites it. Keep an eye out for follow-up work that contradicts the takeaways of the current paper
- Academic researcher: who proposes potential follow-up projects that not only build on the current paper but are also only possible because of its existence and success
- Visitor from the past: who is a researcher from the early 2000s. They must discuss how they interpret the results of the paper, what they like or dislike about the settings and benchmarks considered, and what surprises them most about the presented results
Prerequisites:
There are no official prerequisites, but a comfortable grasp of probability, linear algebra, and machine learning is expected. This is an advanced class that is fast-paced and research-focused.
Course Requirements
- Regular participation (25%): Written summaries of assigned readings must be submitted before each class, along with participation in the online discussion
- Paper presentation (40%): Each student will give 1-2 paper presentations over the course of the class. Each paper will be presented by 2 students, with each student taking on the role of either a positive or negative reviewer plus one other role from the list above
- Class participation during lectures and paper discussions (10%)
- Final project (25%), if taking the class for a letter grade
Important Dates
- Project proposal: due Oct 20 2024. See guidelines here
- Midway project check: Nov 15 2024
- Project reports, in the style of a NeurIPS paper: due Dec 9 2024
- Final project presentations: Dec 2 2024, Dec 4 2024 (during class hours)
Schedule
Date | Topic | Content | Presenter
--- | --- | --- | ---
08/26/2024 | Lecture 1: Introduction | Why does this course exist?; course logistics; overview of the course: scientific method to validate hypotheses, analysis of stylized models | Aditi Raghunathan
08/28/2024 | Lecture 2: From "classical" ML to foundation models | Supervised learning; semi-supervised learning; transfer learning; self-supervised learning | Aditi Raghunathan
09/02/2024 | Labor Day (No class) | |
09/04/2024 | Guest lecture | The pitfalls of next-token prediction | Vaishnavh Nagarajan
09/09/2024 | Lecture moved to Friday 09/13/2024 | |
09/11/2024 | Paper discussion 1 | |
09/13/2024 | Lecture 3: The generalization puzzle | Bias-variance tradeoff and double descent; uniform convergence; implicit regularization | Aditi Raghunathan
09/16/2024 | Paper discussion 2 | |
09/18/2024 | Lecture 4: Scaling laws | | Aditi Raghunathan
09/23/2024 | Paper discussion 3 | |
09/25/2024 | Lecture 5: Downstream capabilities of pretrained models | Representation learning; few-shot learning; in-context learning | Aditi Raghunathan
09/30/2024 | Paper discussion 4 | |
10/02/2024 | Lecture 6: A critical look at capabilities | Out-of-distribution generalization; shortcuts and simplicity bias; causality | Aditi Raghunathan
10/07/2024 | Paper discussion 5 | |
10/09/2024 | Paper discussion 6 | |
10/14/2024 | Fall break (No class) | |
10/16/2024 | Fall break (No class) | |
10/21/2024 | Lecture 7: A critical look at capabilities - part 2 | Out-of-distribution generalization; shortcuts and simplicity bias; causality | Aditi Raghunathan
10/23/2024 | Lecture 8: Post-training | Fine-tuning; preference optimization; alignment, safety, and robustness | Aditi Raghunathan
10/28/2024 | Paper discussion 7 | |
10/30/2024 | Paper discussion 8 | |
11/04/2024 | Lecture 9 | Weak-to-strong generalization; synthetic data: strengths and limits; connections to classical ideas of self-training and domain adaptation | Aditi Raghunathan
11/06/2024 | Paper discussion 9 | |
11/11/2024 | Paper discussion 10 | |
11/13/2024 | Guest lecture | | Tim Dettmers
11/18/2024 | Guest lecture: Inference-time methods | | Sean Welleck
11/20/2024 | Paper discussion 11 | |
11/25/2024 | Paper discussion 12 | |
12/02/2024 | Project presentations | |
12/04/2024 | Project presentations | |