16-745: Optimal Control and Reinforcement Learning
Spring 2026, MW 3:30-4:50 HOA 160
Instructor: Chris Atkeson, cga@cmu.edu
TAs: Krishna Suresh, ksuresh2@andrew
Anoushka Alavilli, apalavil@andrew
Mohit Javale, mjavale@andrew
Office hours TBA.
Zoom (Passcode: 019479)
Sign up for Piazza
Events of Interest
TBA
Items of Interest
TBA
Last year's course
Last year's lectures
-
See last year's course and the last time Atkeson taught this course
for possible topics.
-
Jan 12: Introduction to the course.
Goal: Introduce course.
Slides: 16-745-Course-Introduction.pdf
Zoom recording failed.
-
Jan 14: Introduction to Gradient-Based Function Optimization.
Goal: Introduce gradient-based function optimization.
First 3 Zac lectures.
Zoom recording (Passcode: U1F#N$#W)
-
Jan 21: Who are the TAs?
Zoom recording Passcode: Z?=2hrn5
-
Jan 26: Class is on ZOOM (link is above).
2nd order methods: Role of Hessian Slides
Line Search (Zac lecture 4). Subspace search Slides
Non-gradient function optimization Slides
Constrained optimization,
Zac lectures 4 - equality constraints, and 5 - inequality constraints,
CGA slides
Zoom recording part 1 Passcode: ex2@?04.
Zoom recording part 2 Passcode: !r3VSg?s
-
Jan 28: Stuff
Linear (LP) and quadratic programs (QP).
see last slide
States etc.
Slides
Dynamic Programming
Zoom recording Passcode: BOh@6^uC
-
Feb 2:
Deriving dynamics (use ChatGPT or equivalent AI):
Intro slides,
Actuated inverted pendulum,
My derivation of TWIP dynamics,
ChatGPT's derivation of TWIP dynamics and linearizing TWIP dynamics,
older slides: CGA Lecture 10
Linearizing dynamics
Linear control: A = dF/dx, B = dF/du
LQR, DDP
Zoom recording Passcode: EBN=Mr9u
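A minimal sketch of the linearization listed above (A = dF/dx, B = dF/du), using central finite differences on a simple actuated inverted pendulum. The pendulum model, its parameter values, and the function names (pendulum_dynamics, jacobian, linearize) are assumptions made for illustration; they are not taken from the course slides.

```python
import math

# Assumed parameters for illustration only (not from the course materials).
GRAVITY = 9.81   # m/s^2
LENGTH = 1.0     # m
MASS = 1.0       # kg


def pendulum_dynamics(x, u):
    """Continuous-time dynamics xdot = F(x, u) of a point-mass pendulum.

    x = [theta, omega], with theta measured from the upright position.
    u = [torque] applied at the pivot.
    """
    theta, omega = x
    torque = u[0]
    omega_dot = (GRAVITY / LENGTH) * math.sin(theta) + torque / (MASS * LENGTH ** 2)
    return [omega, omega_dot]


def jacobian(f, z, eps=1e-6):
    """Central-difference Jacobian of f at z (rows: outputs, columns: inputs)."""
    n_out = len(f(z))
    jac = [[0.0] * len(z) for _ in range(n_out)]
    for j in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus = f(z_plus)
        f_minus = f(z_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def linearize(F, x0, u0, eps=1e-6):
    """Return A = dF/dx and B = dF/du evaluated at the point (x0, u0)."""
    A = jacobian(lambda x: F(x, u0), x0, eps)
    B = jacobian(lambda u: F(x0, u), u0, eps)
    return A, B


# Linearize about the upright equilibrium (theta = 0, omega = 0, torque = 0).
# For this model the analytic result is A = [[0, 1], [g/l, 0]], B = [[0], [1/(m l^2)]].
A, B = linearize(pendulum_dynamics, [0.0, 0.0], [0.0])
```

The resulting A and B are what an LQR design would consume. Finite differences are used here only because they need no symbolic derivation; an analytic or automatically differentiated Jacobian would serve the same role.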
-
Feb 4:
next assignment
Dynamic Programming
Zoom recording Passcode: sH?vC@9v
-
Feb 9:
Trajectory optimization: Zac lectures 10 and 13
Feasible trajectory approaches: first-order gradient descent; DDP is a second-order (Newton-like) method.
Dynamics as a constraint
Temporally breaking trajectory up, multiple shooting
SQP
A*
David Vos's Robot Unicycle at MIT
Unicycle dynamics v2
Zoom recording Passcode: 3Q*#vBB#
-
Feb 11:
How can parallelism help?
Zoom recording Passcode: ms9X1?^p
-
Feb 16:
State estimation and KF
Controlling uncertainty
Dual control (continuous state case)
Discrete state case.
Zoom recording Passcode: 7dH*4N&!
-
Feb 18:
RL w/ neural networks
Zoom recording Passcode: ?v8RdU.0
-
Feb 23:
Policy Gradient (REINFORCE)
PPO
Zoom recording Passcode: qt+!T9uT
-
Feb 25:
Comparing model-free and model-based optimal control/RL
Are model-based approaches more biased than model-free approaches?
Zoom recording Passcode: O@TzD&7%
-
Mar 9:
Project presentations.
Presentation signup list
Max 10 presentations per class.
-
Mar 11:
Project presentations.
-
Mar 16:
Project presentations.
-
Mar 18:
-
Mar 23:
-
Mar 25:
-
Mar 30:
-
Apr 1:
-
Apr 6:
-
Apr 8:
-
Apr 13:
-
Apr 15:
-
April 20 & 22: Project presentations
-
May 5 - Graduating students have to have turned everything in.
-
May 10 - All students have to have turned everything in.
Assignments
-
Assignment 0 (Due Jan. 20): Send CGA and TAs email:
Who are you?
Why are you here?
What research do you do?
Describe any optimization you have done (point me to papers or
web pages if they exist).
Any project ideas?
What topics would you especially like the course to cover?
Be sure your name is obvious in the email, and that you mention the course
name or number. I teach more than one course, and a random email from
robotlover@cs.cmu.edu is hard for me to process.
-
Assignment 1: Due Feb 3.
-
Assignment 2: Due March 15.
Project
The project will involve performing a substantial dynamic optimization,
and writing a paper about it. The writeup is as important as the programming
(if not more so) and will be in the format of a conference paper
(more on that later). Those of you who already have a dynamic optimization
problem you are working on for your research should work on that (subject
to the Professor's approval). The Professor and TAs can also work with you
to find topics of interest.