16-745: Dynamic Optimization
Instructor: Chris Atkeson, cga at cmu
TT 3-4:20 NSH 3002
Items of Interest
AI can beat us at games, but sometimes,
that's by cheating - MIT Technology Review
A computer was trained to play Qbert and immediately broke the game in a way no human ever has
Learning by playing | DeepMind
Welcoming the Era of Deep Neuroevolution | Uber Engineering Blog
Talk by Abbeel
Can increasing depth serve to accelerate optimization?
Are more parameters better?
This is the talk I recommended.
Interesting blog about Deep (and Model-Free) Reinforcement Learning
The successes mentioned apply to Deep RL. The criticisms actually
apply to all model-free RL approaches.
Last year's course
Jan 16: Introduction to the course.
Goal: Introduce course.
This year's emphasis is DEEP RL.
Jan 18: AlphaGo example
Jan 23: Function Optimization Example
Goal: Introduce you to a useful tool, MATLAB
and its optimization subroutines, and show you how to use them on an example.
Robotics: redundant inverse kinematics.
Using Matlab's fminsearch and fminunc.
Using Matlab's fmincon.
Relationship of Jacobian approach to gradient descent.
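A minimal Matlab sketch of IK as function optimization (my own illustrative example, not course code; the planar 3-link arm, target, and posture weight are made up):

  % Redundant inverse kinematics as function optimization, planar 3-link arm.
  L = [1 1 1];                      % link lengths (made up)
  target = [1.5; 1.0];              % desired end-effector position
  fk = @(q) [L(1)*cos(q(1)) + L(2)*cos(q(1)+q(2)) + L(3)*cos(q(1)+q(2)+q(3));
             L(1)*sin(q(1)) + L(2)*sin(q(1)+q(2)) + L(3)*sin(q(1)+q(2)+q(3))];
  % Cost: squared task error plus a small posture penalty to resolve redundancy.
  cost = @(q) sum((fk(q) - target).^2) + 1e-3*sum(q.^2);
  q0 = [0.1; 0.1; 0.1];             % initial guess
  qs = fminsearch(cost, q0);        % derivative-free simplex search
  qu = fminunc(cost, q0);           % quasi-Newton with numerical gradients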
Jan 25: Handling 3D Orientation
Goal: Enable you to do 3D robotics using optimization (and do the inverse kinematics assignment).
Euler angles, quaternions, and rotation matrices.
Metrics for how close two orientations are:
Metrics for 3D Rotations: Comparison and Analysis,
Rigid-Body Attitude Control: Using Rotation Matrices for Continuous, Singularity-Free Control Laws,
Closed-Loop Manipulator Control Using Quaternion Feedback
Rotation matrix for small rotations
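A quick Matlab check of the small-rotation approximation (illustrative values):

  % For a small rotation vector w, expm(skew(w)) is approximately I + skew(w).
  w = [0.01; -0.02; 0.005];                 % small axis-angle vector (made up)
  skew = @(v) [  0    -v(3)  v(2);
                v(3)   0    -v(1);
               -v(2)  v(1)   0  ];
  R_exact  = expm(skew(w));                 % exact rotation via matrix exponential
  R_approx = eye(3) + skew(w);              % first-order approximation
  norm(R_exact - R_approx)                  % error is O(|w|^2)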
Function optimization using first order gradient methods
Goal: Review gradient descent approaches.
A nice chapter on function optimization techniques:
Numerical Recipes in C, Chapter 10: Minimization or Maximization of Functions
(2nd or 3rd edition; the 2nd edition is electronically available for free
under Obsolete Versions).
This material from any other numerical methods book is also fine.
software list 1,
conjugate gradient v2,
quasi-Newton/variable metric methods,
Reduced dimensionality second order methods.
Stanford MSandE 311;
U. Stuttgart: Toussaint
Optimization Methods for Large-Scale Machine Learning;
Identifying and attacking the saddle point problem in
high-dimensional non-convex optimization
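For concreteness, a bare-bones steepest descent sketch on the Rosenbrock function (step size and iteration count are arbitrary; it converges, but slowly, which is why the fancier methods above exist):

  f = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;      % Rosenbrock function
  g = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2); % its gradient
             200*(x(2) - x(1)^2)];
  x = [-1.2; 1];  alpha = 1e-3;                       % hand-picked step size
  for k = 1:20000
      x = x - alpha*g(x);                             % steepest descent step
  end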
A Biased History of Artificial Neural Networks
Goal: Make gradient descent and the chain rule more interesting.
Rectifier units (ReLU),
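A tiny sketch of the chain rule (backpropagation) through one ReLU layer, with made-up data and random weights:

  x = [1; 2];  y = 1;                       % one training example (made up)
  W1 = randn(3,2);  W2 = randn(1,3);        % random weights
  h  = max(W1*x, 0);                        % ReLU hidden layer
  e  = W2*h - y;                            % output error, loss = e^2
  dW2 = 2*e*h';                             % chain rule: dloss/dW2
  dW1 = 2*e*(W2' .* (h > 0))*x';            % chain rule through the ReLU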
Feb 1: Non-gradient ("derivative-free") function optimization methods:
Goal: Review non-gradient approaches.
local unimodal sampling,
Nelder-Mead/Simplex/Amoeba method,
fit surfaces (for example, Response Surface Methodology (RSM)),
Memory-based Stochastic Optimization, and
Derivative-free optimization: A review of algorithms and comparison of software implementations by Luis Miguel Rios and Nikolaos V. Sahinidis,
Book: Introduction to Derivative-Free Optimization
Covariance Matrix Adaptation Evolution Strategy.
Goal: Understand currently popular state of the art method.
See also Hansen web page.
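A heavily simplified evolution-strategy sketch in the spirit of CMA-ES (real CMA-ES adapts the full covariance and step size; use Hansen's reference code for real work):

  f = @(x) sum(x.^2);                       % objective (illustrative)
  n = 5;  m = zeros(n,1);  sigma = 1;  lambda = 20;  mu = 5;
  for gen = 1:100
      X = m + sigma*randn(n, lambda);       % sample a population
      [~, idx] = sort(arrayfun(@(i) f(X(:,i)), 1:lambda));
      m = mean(X(:, idx(1:mu)), 2);         % recombine the best mu samples
      sigma = 0.95*sigma;                   % crude step-size schedule
  end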
Feb 6: Gaussian Process Optimization.
Goal: The role of knowledge in optimization.
When solving the same kind of problem many times:
Learn about the function: remember previous answers, bases of attraction,
features like saddle points (zero gradients), optimization paths, ...
Learn about which optimization method works best: Meta-optimization.
Assume or learn a structure for the function (kernel in GP is an example).
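A minimal 1-D Gaussian process regression sketch, the model underlying GP optimization (squared-exponential kernel; the length scale, data, and jitter are made up):

  k = @(a,b) exp(-0.5*(a(:) - b(:)').^2 / 0.3^2);   % squared-exponential kernel
  X = [-1; 0; 1.5];  y = sin(3*X);          % observed points (illustrative)
  xs = linspace(-2, 2, 101)';               % test inputs
  Kxx = k(X, X) + 1e-6*eye(numel(X));       % kernel matrix with jitter
  mu  = k(xs, X) * (Kxx \ y);               % GP posterior mean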
Feb 6: Constraints.
Goal: Understand how to best handle constraints.
Soft/hard constraints, penalty functions,
Augmented Lagrangian method,
Interior point methods vs. Simplex methods vs. soft constraint methods,
Quadratic Programming and
Sequential quadratic programming,
Goal: Understand the QP components used in state-of-the-art robot control.
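A small QP of the kind that shows up in such controllers, solved with Matlab's quadprog (requires the Optimization Toolbox; all numbers are illustrative):

  % min 0.5*x'*H*x + f'*x   subject to   A*x <= b
  H = [2 0; 0 2];  f = [-2; -5];
  A = [1 2; -1 2; -1 -2];  b = [6; 2; 2];
  x = quadprog(H, f, A, b);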
Automatic differentiation.
Goal: Learn how taking derivatives is much easier than you thought.
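One easy trick in this spirit (my choice of example, not necessarily the lecture's) is the complex-step derivative, which gives near machine-precision derivatives without hand-coded gradients or finite-difference cancellation:

  f = @(x) exp(x) / sqrt(sin(x)^3 + cos(x)^3);   % example function
  h = 1e-20;                                     % tiny imaginary perturbation
  dfdx = imag(f(2.0 + 1i*h)) / h;                % derivative of f at x = 2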
Dynamics and Numerical Integration
Goal: Review "mental simulation".
Continuous time, discrete time. Euler integration. Forward and inverse dynamics. Linearization.
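A forward (Euler) integration sketch for a damped pendulum, the kind of "mental simulation" meant here (all constants made up):

  dt = 0.01;  T = 5;  g = 9.81;  l = 1;  b = 0.1;
  x = [pi/4; 0];                            % state: [angle; angular velocity]
  for k = 1:round(T/dt)
      xdot = [x(2); -(g/l)*sin(x(1)) - b*x(2)];   % pendulum dynamics
      x = x + dt*xdot;                            % forward Euler step
  end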
Formulating trajectory optimization as function optimization.
Goal: Use the tools we have so far to do trajectory optimization.
Examples of formulating a trajectory optimization problem
as a function optimization problem:
Case Studies in Trajectory Optimization: Trains, Planes, and Other Pastimes,
Robert J. Vanderbei
Example use of AMPL
A free trial version of AMPL is available from here.
AMPL is also available for remote use through the Neos Server.
Click on SNOPT/[AMPL Input] under Nonlinearly Constrained Optimization.
Example use of Matlab: pend1-x-u,
Spacetime Optimization: Witkin paper text
Witkin paper figures
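A sketch of the formulation in the spirit of the pendulum examples above (my own simplified version, not the linked code): the control sequence is the decision vector, and simulation defines the cost. Save as swingup_sketch.m; the constants are made up, and fminunc will find a local optimum at best.

  function swingup_sketch
      u0 = zeros(40, 1);                     % one control per time step
      u  = fminunc(@swingup_cost, u0);       % optimize the whole control tape
  end
  function c = swingup_cost(u)
      dt = 0.05;  x = [pi; 0];               % angle from upright, velocity
      c = 0;
      for k = 1:numel(u)
          xdot = [x(2); 9.81*sin(x(1)) + u(k)];   % pendulum dynamics
          x = x + dt*xdot;                        % Euler integration
          c = c + dt*0.01*u(k)^2;                 % running control cost
      end
      c = c + 10*(x'*x);                     % terminal cost: upright, at rest
  end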
Use of splines in trajectory optimization.
Goal: Force smooth solutions.
Cubic Hermite spline.
Quintic Hermite interpolation.
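A cubic Hermite interpolation sketch between two endpoint positions and velocities (illustrative values):

  p0 = 0;  v0 = 0;  p1 = 1;  v1 = 0;        % endpoint positions and velocities
  s  = linspace(0, 1, 50);                  % normalized time
  h00 = 2*s.^3 - 3*s.^2 + 1;  h10 = s.^3 - 2*s.^2 + s;   % Hermite basis
  h01 = -2*s.^3 + 3*s.^2;     h11 = s.^3 - s.^2;
  p = h00*p0 + h10*v0 + h01*p1 + h11*v1;    % smooth trajectory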
Policy optimization I: Use function optimization.
Goal: Optimize feedback.
What is a policy?
Known in machine learning/reinforcement learning as policy search or refinement, ...
See examples in CMA-ES section for policy optimization.
Ways to robustify function optimization:
Goal: Tricks of the trade.
Problems: how to choose a method (more of an art than a science), local minima, bad answers, discontinuities, redundant/rank-deficient constraints,
bad scaling, no formulas for derivatives, you are lazy, computational cost.
Techniques: Levenberg-Marquardt (see the sketch below),
scaling and preconditioning, regularize parameters, soft constraints,
Paper on continuation methods,
Hand of God, allow constraint violations, add extra constraints,
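The Levenberg-Marquardt sketch promised above, for a made-up nonlinear least-squares problem (real implementations adapt the damping lambda; here it is fixed):

  r = @(x) [x(1)^2 + x(2) - 3; x(1) + x(2)^2 - 5];   % residuals (made up)
  J = @(x) [2*x(1) 1; 1 2*x(2)];                     % their Jacobian
  x = [1; 1];  lambda = 1e-2;                        % fixed damping
  for k = 1:50
      Jx = J(x);
      dx = -(Jx'*Jx + lambda*eye(2)) \ (Jx'*r(x));   % damped normal equations
      x = x + dx;                                    % converges to (1, 2)
  end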
Dynamic Programming.
Goal: This is what makes dynamic optimization special.
Linear Quadratic Regulator,
Goal: An important special case.
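A discrete-time LQR sketch for a double integrator using Matlab's dlqr (Control System Toolbox; the weights are illustrative):

  dt = 0.1;
  A = [1 dt; 0 1];  B = [0.5*dt^2; dt];     % double-integrator dynamics
  Q = eye(2);  R = 0.1;                     % state and control costs
  K = dlqr(A, B, Q, R);                     % optimal feedback gain: u = -K*x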
Differential Dynamic Programming
Ways to reduce the curse of dimensionality
Goal: Tricks of the trade.
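A tabular value iteration sketch on a tiny 1-D grid (illustrative); the table, and hence the cost, grows exponentially with state dimension, which is the curse in question:

  nS = 21;  goal = 11;  gamma = 0.95;       % grid size, goal cell, discount
  V = zeros(nS, 1);                         % cost-to-go table
  for sweep = 1:200
      for s = 1:nS
          if s == goal, V(s) = 0; continue; end
          left = max(s-1, 1);  right = min(s+1, nS);
          V(s) = 1 + gamma*min(V(left), V(right));   % Bellman backup
      end
  end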
Policy Optimization II: Optimization using model-based gradients
Goal: The Chain Rule Is Powerful.
Goal: How To Handle Bad Models.
Robustness to random disturbances, varying initial conditions, parametric
model error, structural modeling error such as
high frequency unmodelled dynamics,
and model jumps (touchdown and liftoff during walking, for example).
Monte Carlo trajectory/policy optimization.
Monte Carlo financial planning.
Robustness using Linear Matrix Inequalities
Goal: Handling Parametric Uncertainty.
Robustness to parametric uncertainty in the linear(ized) model.
Tutorial on LMIs,
Slides: Continuous time stability slide 47, Discrete time stability slide 51
Receding Horizon Control
(a.k.a. Model Predictive Control (MPC))
Goal: Online Optimization.
Robustness: Policy Optimization with Multiple Models.
Goal: A powerful tool to handle all kinds of uncertainty.
Monte-Carlo, DP, and DDP approaches to Multiple Models.
Finding Better Ways To Do a Task
Goal: Think about an important current research problem.
Goal: Explicitly model uncertainty.
Gaussian Propagation (like Kalman Filter),
Unscented (like Unscented Filter), Second Order Kalman Filter (See Kendrick below).
Review of Gaussians slides
State estimation slides
Matlab Kalman filter example
minimum jerk trajectory subroutine.
Example mobile robot Kalman filter slides
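A minimal scalar Kalman filter sketch (random-walk state model; the noise levels and measurement stream are made up):

  Q = 0.01;  R = 1;  xhat = 0;  P = 1;      % process/measurement noise, prior
  z = randn(1, 100);                        % fake measurement stream
  for k = 1:numel(z)
      P = P + Q;                            % predict (random-walk state)
      K = P / (P + R);                      % Kalman gain
      xhat = xhat + K*(z(k) - xhat);        % measurement update
      P = (1 - K)*P;                        % covariance update
  end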
Robustness and state estimation:
Goal: How to combine state estimation and control.
Linear-quadratic-Gaussian control (LQG),
Separation principle, Certainty equivalence,
Example of bad interactions, Loop Transfer Recovery (LTR),
A paper on the topic,
Policy optimization approaches.
Information state DP.
Local Approaches to Dual Control/Stochastic DDP
Information state trajectory optimization.
Stochastic Control for Economic Models,
David Kendrick, Second Edition 2002.
A*-like algorithms: R*
Avoiding obstacles using sampling-based methods: RRT,
Random Sampling DP
Avoiding obstacles using gradient methods: CHOMP
Learning From Demonstration
Reinforcement Learning: Model free policy optimization.
Kober, J.; Peters, J. (2011). Policy Search for Motor Primitives in Robotics, Machine Learning, 84, 1-2, pp.171-203
Comparison of various RL methods: CMA-ES, CEM, PI2.
Freek Stulp and Olivier Sigaud. Path Integral Policy Improvement with Covariance Matrix Adaptation. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
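For scale, here is the simplest relative of the methods compared above: model-free policy search with finite-difference gradient estimates (everything here, including the noisy return, is made up):

  theta = zeros(2, 1);                       % policy parameters
  J = @(th) -((th(1)-1)^2 + (th(2)+0.5)^2) + 0.1*randn;   % noisy return
  for it = 1:500
      g = zeros(2, 1);  du = 0.1;            % perturbation size
      for i = 1:2
          e = zeros(2, 1);  e(i) = du;
          g(i) = (J(theta + e) - J(theta - e)) / (2*du);  % FD estimate
      end
      theta = theta + 0.05*g;                % ascend the estimated gradient
  end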
Inverse Reinforcement Learning.
Combine trajectory optimization (model-based) and policy learning (model-free).
I did some work on this 20+ years ago. Now it is coming back.
Robot Learning From Demonstration, ICML '97, (postscript),
Learning tasks from a single demonstration, ICRA '97,
Nonparametric Model-Based Reinforcement Learning, NIPS '97,
Using Local Trajectory Optimizers To Speed Up Global Optimization in Dynamic Programming, NIPS '93
Random Sampling of States in Dynamic Programming, Trans. SMC, 2008
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Create primitives and learn to combine them. (Libin Liu).
Google semantic event chains
another Cheng paper
What did the Berkeley folks say?
half of Sergey Levine's lecture
lecture on Transfer, Learning to Learn
See last slide of Abbeel's lecture
Review of Traditional Approaches
Trajectory optimization based on integrating the dynamics:
calculus of variations,
Discrete time Pontryagin's minimum principle,
Pontryagin's minimum principle,
multiple shooting methods,
Learning during optimization
May 1: Project presentations
May 3: Project presentations
May ?: Project Writeups Due
Assignment 0 (Due Jan. 20): Send CGA email:
Who are you?
Why are you here?
What research do you do?
Describe any optimization you have done (point me to papers or
web pages if they exist).
Any project ideas?
What topics would you especially like the course to cover?
Be sure your name is obvious in the email, and that you mention the course
name or number. I teach more than one course, and a random email from an
unfamiliar address is hard for me to process.
Assignment 1 (Due Jan. 31): Using Optimization
to do Inverse Kinematics
Assignment 2 (Due Mar. 18): Using Optimization
to do Policy Optimization
Other relevant classes