16-745: Optimal Control and Reinforcement Learning
Spring 2020, TT 4:30-5:50 GHC 4303
Instructor: Chris Atkeson, cga@cmu.edu
TA: Ramkumar Natarajan, rnataraj@cs.cmu.edu, office hours Thursdays 6-7, Robolounge NSH 1513
Events of Interest
TBA
Items of Interest
DeepMind researchers introduce hybrid solution to robot control problems
Ubisoft Builds New AI Algorithm that Uses Reinforcement Learning to Teach Driving to Itself,
another article.
Ancestry turned to AI to bring down cloud costs
Last year's course

Jan 14: Introduction to the course.
Goal: Introduce course.
This year's emphasis is TBD.

Jan 16: AlphaZero/MuZero
Goal: Introduce you to an impressive example of reinforcement learning (its biggest success). If AI had a Nobel Prize, this work would get it.
Read MuZero: The triumph of the model-based approach, and the reconciliation of engineering and machine learning approaches to optimal control and reinforcement learning.

For Jan 30: Read a nice reinforcement learning paper:
Goal: Introduce you to another impressive example of reinforcement learning.
Read this article, Learning agile and dynamic motor skills for legged robots.
Summary video,
Just robot abuse video.
Running fast video (note asymmetry/"limp").
I have some comments in this article.
More comments from me.

Jan 21: Function Optimization Example
Goal: Introduce you to a useful tool, MATLAB
and its optimization subroutines, and show you how to use them on an example.
Robotics: redundant inverse kinematics.
Using Matlab's fminsearch and fminunc.
Using Matlab's fminsearch and fminunc, with
desired posture.
Using Matlab's fmincon.
Relationship of Jacobian approach to gradient descent.
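A minimal sketch of the link between the Jacobian approach and gradient descent, assuming a planar arm with revolute joints (the function names, link lengths, and step size are illustrative, not taken from the course's Matlab examples). The gradient of 0.5*||e||^2, where e is the end-effector position error, is J^T e, so the Jacobian-transpose update is plain gradient descent on the squared error.

```python
import math

def forward_kinematics(q, lengths):
    """Planar arm: return the (x, y) position of the end effector."""
    x = y = angle = 0.0
    for qi, li in zip(q, lengths):
        angle += qi                      # joint angles are relative
        x += li * math.cos(angle)
        y += li * math.sin(angle)
    return x, y

def jacobian(q, lengths):
    """Rows of the 2 x n Jacobian d(x, y)/dq, returned as (jx, jy)."""
    n = len(q)
    cum, angle = [], 0.0
    for qi in q:
        angle += qi
        cum.append(angle)
    jx, jy = [0.0] * n, [0.0] * n
    for j in range(n):
        for k in range(j, n):            # joint j moves links j..n-1
            jx[j] -= lengths[k] * math.sin(cum[k])
            jy[j] += lengths[k] * math.cos(cum[k])
    return jx, jy

def solve_ik(target, q0, lengths, step=0.05, iters=5000):
    """Gradient descent on 0.5*||fk(q) - target||^2."""
    q = list(q0)
    for _ in range(iters):
        x, y = forward_kinematics(q, lengths)
        ex, ey = x - target[0], y - target[1]
        jx, jy = jacobian(q, lengths)
        # Gradient is J^T e: one entry per joint.
        q = [qi - step * (jx[i] * ex + jy[i] * ey) for i, qi in enumerate(q)]
    return q
```

With three joints and a two-dimensional target the arm is redundant, so the posture the iteration settles on depends on the starting guess q0.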

Jan 23: Handling 3D Orientation
Goal: Enable you to do 3D robotics using optimization (and do the inverse kinematics assignment).
Rotation matrices,
Euler angles, and
Quaternions.
Metrics for how close two orientations are:
Metrics for 3D Rotations: Comparison and Analysis,
Rigid-Body Attitude Control: Using Rotation Matrices for Continuous, Singularity-Free Control Laws,
Closed-Loop Manipulator Control Using Quaternion Feedback
Rotation matrix for small rotations
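One common orientation metric can be sketched as follows, assuming unit quaternions stored as (w, x, y, z) tuples (the function name is illustrative). It returns the angle of the rotation that takes one orientation to the other; the absolute value accounts for q and -q representing the same rotation.

```python
import math

def quat_angle(q1, q2):
    """Rotation angle in radians between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)                  # guard against rounding above 1
    return 2.0 * math.acos(dot)
```

The result lies in [0, pi], which makes it usable directly as a cost term in an optimization over orientations.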

Jan 28: Function optimization using first and second order gradient methods
Goal: Review gradient descent approaches.
A nice chapter on function optimization techniques:
Numerical Recipes in C, chapter 10
(2nd or 3rd edition, 2nd edition is electronically available for free
under Obsolete Versions):
Minimization or Maximization of Functions,
This material from any other numerical methods book is also fine.
Resources:
Matlab fminunc,
Numerical Recipes,
GSL,
AMPL,
NEOS,
software list 1,
Useful
software guide,
gradient method,
line search,
conjugate gradient,
conjugate gradient v2,
quasi-Newton/variable metric methods,
Newton's method,
Levenberg-Marquardt,
Reduced dimensionality second order methods.
Other lectures:
Stanford MS&E 311;
U. Stuttgart: Toussaint
Papers:
Optimization Methods for Large-Scale Machine Learning;
Identifying and attacking the saddle point problem in
high-dimensional non-convex optimization
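A small sketch contrasting a first order and a second order step, assuming the badly scaled quadratic f(x, y) = 0.5*(x^2 + 100*y^2) as a test function (the function and names are illustrative). Gradient descent must use a step small enough for the steep direction and so crawls along the shallow one; Newton's method rescales by the Hessian and reaches the minimum of a quadratic in one step.

```python
def grad(p):
    """Gradient of f(x, y) = 0.5 * (x**2 + 100 * y**2)."""
    x, y = p
    return [x, 100.0 * y]

def hessian(p):
    """Hessian of f; constant because f is quadratic."""
    return [[1.0, 0.0], [0.0, 100.0]]

def gradient_step(p, step):
    g = grad(p)
    return [p[0] - step * g[0], p[1] - step * g[1]]

def newton_step(p):
    g = grad(p)
    (a, b), (c, d) = hessian(p)
    det = a * d - b * c
    # Solve H * delta = g for the 2 x 2 case.
    dx = (d * g[0] - b * g[1]) / det
    dy = (-c * g[0] + a * g[1]) / det
    return [p[0] - dx, p[1] - dy]

def run_gradient_descent(p, step, iters):
    for _ in range(iters):
        p = gradient_step(p, step)
    return p
```

Starting from (1, 1) with step 0.01 (the reciprocal of the largest curvature), gradient descent still has an x error of roughly 0.99^100 after 100 iterations, while a single Newton step lands on the minimum.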

Talk about Covariant rollout.

Talk about robot example:
Learning agile and dynamic motor skills for legged robots.

Jan 30: Non-gradient ("derivative-free") function optimization methods:
Goal: Review non-gradient approaches.
hill climbing
(including
local search,
local unimodal sampling,
pattern search,
random search,
random optimization),
Nelder-Mead/Simplex/Amoeba method,
Matlab fminsearch,
simulated annealing,
fit surfaces (for example
Response Surface Methodology (RSM),
Memory-based Stochastic Optimization, and
Q2),
evolutionary algorithms,
genetic algorithms,
and ...
Paper:
Derivative-free optimization: A review of algorithms and comparison of software implementations by Luis Miguel Rios and Nikolaos V. Sahinidis,
Book: Introduction to Derivative-Free Optimization
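A minimal sketch of one method from the list above, pattern search in its simplest coordinate ("compass") form. It is an illustration under simple assumptions (fixed halving factor, no constraints), not the implementation behind any of the linked software; the function name and defaults are made up for this example.

```python
def pattern_search(f, x0, step=1.0, tol=1e-8, max_iters=10000):
    """Compass search: try +/- step along each axis, halve step on failure."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iters):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                ft = f(trial)
                if ft < fx:              # accept the first improving move
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= 0.5                  # no axis move helped: refine the mesh
    return x, fx
```

Only function values are used, so the same loop works when derivatives are unavailable or unreliable, at the price of many more function evaluations than a gradient method.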

Jan 30:
Covariance Matrix Adaptation Evolution Strategy.
Goal: Understand a currently popular state-of-the-art method.
See also Hansen web page.
Example1,
Ex2,
Ex3,
Ex4.

Feb 4: Constraints.
Goal: Understand how to handle constraints.
Soft/hard constraints, penalty functions,
Barrier functions,
Lagrange Multipliers,
Karush-Kuhn-Tucker conditions,
Slack variables,
Augmented Lagrangian method,
Interior point methods vs. Simplex methods vs. soft constraint methods.
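A sketch of a soft constraint via a quadratic penalty, assuming the toy problem "minimize x^2 + y^2 subject to x + y = 1" (the problem, names, and step size rule are illustrative). The penalized minimizer is x = y = mu/(1 + 2*mu), which approaches the constrained answer (0.5, 0.5) only as the weight mu grows; that gap is the usual argument for hard constraint methods or the augmented Lagrangian.

```python
def penalty_minimize(mu, iters=200):
    """Minimize x^2 + y^2 + mu * (x + y - 1)^2 by gradient descent from 0."""
    x = y = 0.0
    # Reciprocal of the largest Hessian eigenvalue, 2 + 4 * mu.
    step = 1.0 / (2.0 + 4.0 * mu)
    for _ in range(iters):
        c = x + y - 1.0                  # constraint violation
        gx = 2.0 * x + 2.0 * mu * c
        gy = 2.0 * y + 2.0 * mu * c
        x -= step * gx
        y -= step * gy
    return x, y
```

Note that the safe step size shrinks as mu grows: large penalty weights make the problem badly scaled, which ties in with the scaling issues of gradient methods.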

Feb 4:
Quadratic Programming and
Sequential quadratic programming,
Goal: Understand QP components used in state of the art robot control.
Matlab fmincon.
SNOPT,
CVXGEN,
IPOPT

Feb 4: Dynamics and Numerical Integration
Goal: Review "mental practice".
Continuous time dynamics, discrete time dynamics. Euler integration, Forward and inverse dynamics. Linearization.
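A minimal sketch of these pieces for a single pendulum, assuming a point mass on a massless rod with the angle measured from hanging straight down (parameters and names are illustrative). Forward dynamics maps torque to acceleration, inverse dynamics maps a desired acceleration to torque, and an Euler step turns the continuous time dynamics into discrete time dynamics.

```python
import math

G, LENGTH, MASS = 9.81, 1.0, 1.0        # assumed pendulum parameters

def pendulum_dynamics(state, torque):
    """Forward dynamics: return (theta_dot, omega_dot)."""
    theta, omega = state
    alpha = (torque - MASS * G * LENGTH * math.sin(theta)) / (MASS * LENGTH ** 2)
    return (omega, alpha)

def inverse_dynamics(theta, alpha):
    """Torque needed to produce angular acceleration alpha at angle theta."""
    return MASS * LENGTH ** 2 * alpha + MASS * G * LENGTH * math.sin(theta)

def euler_step(state, torque, dt):
    """Discrete time dynamics: x[k+1] = x[k] + dt * f(x[k], u[k])."""
    dtheta, domega = pendulum_dynamics(state, torque)
    return (state[0] + dt * dtheta, state[1] + dt * domega)

def simulate(state, torques, dt):
    """Roll the state forward through a sequence of torques."""
    trajectory = [state]
    for u in torques:
        state = euler_step(state, u, dt)
        trajectory.append(state)
    return trajectory
```

Euler integration is first order accurate, so energy drifts unless dt is small; it is used here because it is the simplest map from continuous to discrete time.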

Feb 6: Formulating trajectory optimization as function optimization.
Goal: Use the tools we have so far to do trajectory optimization.
Examples of formulating a trajectory optimization problem
as a function optimization problem:
Case Studies In Trajectory Optimization: Trains, Planes, And Other
Pastimes,
Robert J. Vanderbei
Example use of AMPL
A free trial version of AMPL is available from here.
AMPL is also available for remote use through the Neos Server.
Click on SNOPT/[AMPL Input] under Nonlinearly Constrained Optimization.
Example use of Matlab: pend1xu,
pend1u,
pend1x
Spacetime Optimization: Witkin paper text
Witkin paper figures

Feb 6:
Use of splines in trajectory optimization.
Goal: Force smooth solutions.
Cubic Hermite spline.
Quintic Hermite interpolation.
Collocation,
Pseudospectral X.
Wavelets
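A sketch of one cubic Hermite segment, assuming knot positions and velocities as the optimization variables (the function name and argument order are illustrative). Because neighboring segments share knot values, the trajectory is continuous in position and velocity by construction.

```python
def cubic_hermite(p0, v0, p1, v1, duration, t):
    """Position at time t in [0, duration] on a cubic Hermite segment.

    p0, v0: position and velocity at the start knot.
    p1, v1: position and velocity at the end knot.
    """
    s = t / duration                     # normalized time in [0, 1]
    h00 = 2 * s ** 3 - 3 * s ** 2 + 1    # Hermite basis functions
    h10 = s ** 3 - 2 * s ** 2 + s
    h01 = -2 * s ** 3 + 3 * s ** 2
    h11 = s ** 3 - s ** 2
    # Velocities are scaled by duration because the basis is in s, not t.
    return h00 * p0 + h10 * duration * v0 + h01 * p1 + h11 * duration * v1
```

Optimizing over a handful of knots instead of every time step shrinks the number of parameters and forces smooth solutions.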

Feb 11: Policy optimization I: Use function optimization.
Goal: Optimize feedback.
What is a policy?
Known in machine learning/reinforcement learning as policy search or refinement, ...
slides
See examples in CMA-ES section for policy optimization.

Feb 11: Ways to robustify function optimization:
Goal: Tricks of the trade.
Problems: How to choose a method? (more of an art than a science), local minima, bad answers, discontinuities, redundant/rank-deficient constraints,
bad scaling, no formulas for derivatives, you are lazy, computational cost.
Techniques: Levenberg-Marquardt,
Trust regions,
line search,
scaling and preconditioning, regularize parameters, soft constraints,
sparse methods,
Continuation Methods,
Paper on continuation methods,
Hand of God, allow constraint violations, add extra constraints, use multiple starts, use multiple methods, optimize metaparameters (learn to learn)
Matlab recommendations for optimization,
more,
more,
global optimization,
more

Feb 13:
Dynamic Programming.
Goal: Use of value function is what makes optimal control special.
Bellman equation,
slides
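A minimal sketch of the Bellman equation as value iteration, assuming a toy deterministic problem: states on a line, actions "move left" or "move right", a cost of 1 per step, and a zero-cost goal state (the problem and names are illustrative). Each sweep applies V(s) = min over actions of [cost + V(next state)].

```python
def value_iteration(n_states, goal, sweeps=100):
    """Return the cost-to-go for each state of a 1D chain."""
    V = [0.0] * n_states
    for _ in range(sweeps):
        new_V = []
        for s in range(n_states):
            if s == goal:
                new_V.append(0.0)        # goal is absorbing with no cost
                continue
            # Bellman backup: one-step cost plus value of the successor.
            best = min(
                1.0 + V[min(max(s + a, 0), n_states - 1)]
                for a in (-1, +1)
            )
            new_V.append(best)
        V = new_V
    return V
```

For six states with the goal at the right end, the value function converges to the number of steps to the goal, and the optimal policy can be read off by picking the action that achieves the minimum.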

Feb 18:
Linear Quadratic Regulator,
Goal: An important special case.
Riccati Equation,
Differential Dynamic Programming
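A sketch of the discrete time Riccati recursion in the scalar case, assuming dynamics x[k+1] = a*x[k] + b*u[k] and cost sum of q*x^2 + r*u^2 (names are illustrative; the matrix case replaces the divisions with linear solves). Iterating the recursion to a fixed point gives the quadratic value function V(x) = p*x^2 and the feedback law u = -k*x.

```python
def scalar_lqr(a, b, q, r, iters=200):
    """Iterate the scalar Riccati recursion; return (gain k, value weight p)."""
    p = q
    for _ in range(iters):
        k = a * b * p / (r + b * b * p)  # optimal gain for the current p
        p = q + a * a * p - a * b * p * k
    k = a * b * p / (r + b * b * p)
    return k, p
```

For a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, so p is the golden ratio and the closed-loop pole a - b*k is about 0.38.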

Feb 20: Ways to reduce the curse of dimensionality
Goal: Tricks of the trade.
slides

Policy Optimization II: Optimization using model-based gradients
Goal: The Chain Rule Is Powerful.
slides

Robustness
Goal: How To Handle Bad Models.
Robustness to random disturbances, varying initial conditions, parametric
model error, structural modeling error such as
high frequency unmodelled dynamics,
and model jumps (touchdown and liftoff during walking, for example).
Monte Carlo trajectory/policy optimization.
Monte Carlo financial planning.

Robustness using Linear Matrix Inequalities
Goal: Handling Parametric Uncertainty.
Robustness to parametric uncertainty in the linear(ized) model.
Tutorial on LMIs,
Slides: Continuous time stability slide 47, Discrete time stability slide 51

Receding Horizon Control
(a.k.a. Model Predictive Control (MPC))
Goal: Online Optimization.

Robustness: Policy Optimization with Multiple Models.
Goal: A powerful tool to handle all kinds of uncertainty.
Monte Carlo, DP, and DDP approaches to Multiple Models.

Bayesian Filters
Goal: Explicitly model uncertainty.
State Estimation,
Uncertainty Propagation:
Gaussian Propagation (like Kalman Filter),
Unscented (like Unscented Filter), Second Order Kalman Filter (See Kendrick below).
Review of Gaussians slides
State estimation slides
Matlab Kalman filter example
and
minimum jerk trajectory subroutine.
Example mobile robot Kalman filter slides
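A minimal sketch of Gaussian propagation in a scalar Kalman filter, assuming the linear model x[k+1] = a*x[k] + b*u[k] + process noise (variance q) and measurement z = c*x + noise (variance r). The names and default values are illustrative and are not taken from the Matlab example linked above.

```python
def kalman_step(mean, var, u, z, a=1.0, b=1.0, c=1.0, q=0.01, r=0.1):
    """One predict/update cycle for a scalar state; returns (mean, var)."""
    # Predict: push the Gaussian through the linear dynamics.
    mean_pred = a * mean + b * u
    var_pred = a * a * var + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_new = mean_pred + gain * (z - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new
```

With equal prior and measurement variance the gain is 0.5, so the estimate lands halfway between prediction and measurement and the variance is halved.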

Robustness and state estimation:
Goal: How to combine state estimation and control.
Linear-quadratic-Gaussian control (LQG),
Separation principle, Certainty equivalence,
Example of bad interactions, Loop Transfer Recovery (LTR),
A paper on the topic,
Policy optimization approaches.

Dual Control.
Simple example.
Information state DP.

Local Approaches to Dual Control/Stochastic DDP
Information state trajectory optimization.
Stochastic Control for Economic Models,
David Kendrick, Second Edition 2002.

A*-like algorithms: R*

Avoiding obstacles using sampling-based methods: RRT,
slides
Projected RRT,
RRT*
slides
video 1
video 2
LQR-RRT*
Random Sampling DP

Avoiding obstacles using gradient methods: CHOMP
STOMP

Learning From Demonstration

Reinforcement Learning: Model-free policy gradient. Use trajectories to
determine outcomes.
Kober, J.; Peters, J. (2011). Policy Search for Motor Primitives in Robotics, Machine Learning, 84(1-2), pp. 171-203
NIPS Tutorial 2016: Deep Reinforcement Learning Through Policy Optimization
10-703 lecture notes I
Proximal Policy Optimization

Reinforcement Learning: Model-free actor-critic: Model Q function to determine outcomes.
10-703 lecture notes II
Continuous control with deep reinforcement learning

What's new (2018 version)?

Comparison of various RL methods
Freek Stulp and Olivier Sigaud. Path Integral Policy Improvement with Covariance Matrix Adaptation. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
Linear policies work: Towards Generalization and Simplicity in Continuous Control
Simple random search provides a competitive approach to reinforcement learning
Simple Nearest Neighbor Policy Method for Continuous Control Tasks, reddit commentary
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks
Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Inverse Reinforcement Learning.
Abbeel slides
Finn slides
10-703 lecture

What's new (2017 version)?

Combine trajectory optimization (model-based) and policy learning (model-free).
I did some work on this 20+ years ago. Now it is coming back.
Robot Learning From Demonstration, ICML '97, (postscript),
Learning tasks from a single demonstration, ICRA '97,
Nonparametric Model-Based Reinforcement Learning, NIPS '97,
Using Local Trajectory Optimizers To Speed Up Global Optimization in Dynamic Programming, NIPS '93
Random Sampling of States in Dynamic Programming, Trans SMC, 2008
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Create primitives and learn to combine them. (Libin Liu).
Akihiko Yamaguchi
Google semantic event chains
Freek Stulp
Karinne Ramirez-Amaro
another Cheng paper

What did the Berkeley folks say?
See second
half of Sergey Levine's lecture
Finn's
lecture on Transfer, Learning to Learn
See last slide of Abbeel's lecture

Review of Traditional Approaches
Trajectory optimization based on integrating the dynamics:
calculus of variations,
Euler-Lagrange equation,
Discrete time Pontryagin's minimum principle,
Pontryagin's minimum principle,
Hamilton-Jacobi-Bellman equation,
costate equations,
shooting methods,
multiple shooting methods,
Karush-Kuhn-Tucker conditions
Continuation Methods,
Metaoptimization,
Learning during optimization

Automatic differentiation
Goal: Learn how taking derivatives is much easier than you thought.
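A minimal sketch of forward-mode automatic differentiation using dual numbers, assuming only addition, multiplication, and sine are needed (the class and function names are illustrative, not from any particular library). Each number carries its value and its derivative, and every operation applies the chain rule as it goes.

```python
import math

class Dual:
    """Number of the form value + deriv * eps, with eps**2 == 0."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants (zero derivative)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def sin(x):
    x = Dual.lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

def derivative(f, x):
    """Derivative of a one-argument function f at x."""
    return f(Dual(x, 1.0)).deriv
```

For example, derivative(lambda x: x * x * x + 2 * x, 3.0) evaluates 3*x^2 + 2 at x = 3 without any symbolic algebra or finite differencing.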

Gaussian Process Optimization.
Goal: The role of knowledge in optimization.
When solving the same kind of problem many times:
Learn about the function: remember previous answers, basins of attraction,
features like saddle points (zero gradients), optimization paths, ...
Learn about which optimization method works best: Metaoptimization.
Assume or learn a structure for the function (kernel in GP is an example).

Finding Better Ways To Do a Task
Goal: Think about an important current research problem.

...

April 28 & 30: Project presentations

May 17 (May 12 for graduating students): Final project writeup (web page) due.
Assignments

Assignment 0 (Due Jan. 20): Send CGA and TA email:
Who are you?
Why are you here?
What research do you do?
Describe any optimization you have done (point me to papers or
web pages if they exist).
Any project ideas?
What topics would you especially like the course to cover?
Be sure your name is obvious in the email, and you mention the course
name or number. I teach more than one course, and a random email from
robotlover@cs.cmu.edu is hard for me to process.

Assignment 1 (Due Jan 27): Using Optimization
to do Inverse Kinematics
Project
The project will involve performing a substantial dynamic optimization,
and writing a paper about it. The writeup is as important as the programming
(if not more so) and will be in the format of a conference paper
(more on that later). Those of you who already have a dynamic optimization
problem you are working on for your research should work on that (subject
to the Professor's approval). Another option is to work on the problem
or algorithm
described below (subject
to the Professor's approval).