Welcome to Theoretical and Empirical Foundations of Modern Machine Learning
15-789, Fall 2024!

Instructor: Aditi Raghunathan (raditi at cmu dot edu) OH: Fri 10-11am

TA: Jacob Springer (jspringe at andrew dot cmu dot edu) OH: Tue 5-6pm

Lectures: Monday, Wednesday 3:30-5:00pm at TEP 1403

Overview


The Fall 2024 iteration of this class will focus on foundation models. Foundation models have heralded a new era in modern machine learning. They are trained on massive raw data at scale and work well on a wide range of tasks with little to no fine-tuning. In this advanced machine learning seminar class, we will build a principled understanding of when and why they work or fail, and explore avenues for improving their reliability and trustworthiness. The class aims to equip students with the ability to critically reason about and build a more principled understanding of current advances, which will hopefully spark their own research.

The Fall 2022 offering can be accessed here.

Format


This course combines lectures with paper presentations by the students, encouraging fundamental knowledge acquisition as well as open-ended discussions and new research directions. The lectures will briefly introduce the main concepts, summarize a few key papers, and connect to classical ideas where applicable.

The paper discussions will involve role-playing student seminars inspired by Alec Jacobson and Colin Raffel. We will be adopting the following roles.

  • Positive reviewer: who advocates for the paper to be accepted at a conference (e.g., NeurIPS)
  • Negative reviewer: who advocates for the paper to be rejected at a conference (e.g., NeurIPS)
  • Archaeologist: who determines where this paper sits in the context of previous and subsequent work. They must find and report on at least one older paper cited within the current paper that substantially influenced it, and at least one newer paper that cites the current paper. Keep an eye out for follow-up work that contradicts the takeaways of the current paper
  • Academic researcher: who proposes potential follow-up projects that are not just based on the current paper but are only possible due to its existence and success
  • Visitor from the past: who is a researcher from the early 2000s. They must discuss how they comprehend the results of the paper, what they like or dislike about the settings and benchmarks considered, and what surprises them the most about the presented results

Prerequisites

There are no official prerequisites, but a comfortable grasp of probability, linear algebra, and machine learning is expected. This is an advanced class that is fast-paced and research-focused.

Course Requirements

  • Regular participation (25%): written summaries of assigned readings must be submitted before each class, along with participation in the online discussion
  • Paper presentation (40%): each student will give 1-2 paper presentations over the course of the semester. Each paper is presented by two students, where each student takes on the role of either the positive or negative reviewer plus one other role from the list above
  • Class participation during lectures and paper discussions (10%)
  • Final project (25%), if taking the course for a letter grade

Important Dates

  • Project proposal: due Oct 20, 2024. See guidelines here
  • Midway project check: Nov 15, 2024
  • Project reports (in the style of a NeurIPS paper): due Dec 9, 2024
  • Final project presentations: Dec 2 and Dec 4, 2024 (during class hours)

Schedule



08/26/2024 Lecture 1: Introduction (Aditi Raghunathan)
  • Why does this course exist?
  • Course logistics
  • Overview of the course: scientific method to validate hypotheses, analysis of stylized models
08/28/2024 Lecture 2: From "classical" ML to foundation models (Aditi Raghunathan)
  • Supervised learning
  • Semi-supervised learning
  • Transfer learning
  • Self-supervised learning
09/02/2024 Labor Day (No class)
09/04/2024 Guest lecture: The pitfalls of next-token prediction (Vaishnavh Nagarajan)
09/09/2024 Lecture moved to Friday 09/13/2024
09/11/2024 Paper discussion 1
09/13/2024 Lecture 3: The generalization puzzle (Aditi Raghunathan)
  • Bias-variance tradeoff and double descent
  • Uniform convergence
  • Implicit regularization
09/16/2024 Paper discussion 2
09/18/2024 Lecture 4: Scaling laws (Aditi Raghunathan)
09/23/2024 Paper discussion 3
09/25/2024 Lecture 5: Downstream capabilities of pretrained models (Aditi Raghunathan)
  • Representation learning
  • Few-shot learning
  • In-context learning
09/30/2024 Paper discussion 4
10/02/2024 Lecture 6: A critical look at capabilities (Aditi Raghunathan)
  • Out-of-distribution generalization
  • Shortcuts and simplicity bias
  • Causality
10/07/2024 Paper discussion 5
10/09/2024 Paper discussion 6
10/14/2024 Fall break (No class)
10/16/2024 Fall break (No class)
10/21/2024 Lecture 7: A critical look at capabilities - part 2 (Aditi Raghunathan)
  • Out-of-distribution generalization
  • Shortcuts and simplicity bias
  • Causality
10/23/2024 Lecture 8: Post-training (Aditi Raghunathan)
  • Fine-tuning
  • Preference optimization
  • Alignment, safety, and robustness
10/28/2024 Paper discussion 7
10/30/2024 Paper discussion 8
11/04/2024 Lecture 9 (Aditi Raghunathan)
  • Weak-to-strong generalization
  • Synthetic data: strengths and limits
  • Connections to classical ideas of self-training and domain adaptation
11/06/2024 Paper discussion 9
11/11/2024 Paper discussion 10
11/13/2024 Guest lecture (Tim Dettmers)
11/18/2024 Guest lecture: Inference-time methods (Sean Welleck)
11/20/2024 Paper discussion 11
11/25/2024 Paper discussion 12
12/02/2024 Project presentations
12/04/2024 Project presentations