Research Directions for Walking

I get a lot of email from potential students who have obviously not looked at my web pages. I throw that email away. To let me know you have read at least this far, here is a secret code to include in your email: 7592

I am looking for students to develop controllers for humanoid robots, especially for walking. Garth Zeglin has built a series of smaller electric bipeds for this research, and we have a larger hydraulic biped from Sarcos. We are exploring several different approaches (see below). These approaches look like optimal control approaches, but become learning approaches when a model must be learned as the robot attempts the task. Learning also plays a role in speeding up this type of planning and in producing better plans.

Plan A: Policy Search

Policy search is currently one of the "hot" approaches to robot learning. A controller is manually defined, and then automatic algorithms are used to find good parameters for the controller. One big question is how to avoid manually defining the structure and parameters of the controller. Another is finding good optimization procedures that can handle very expensive evaluations, discontinuities and plateaus in the criterion, and the need to not destroy the robot. Some papers to read on this approach:

Inverted autonomous helicopter flight via reinforcement learning, Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger and Eric Liang. In International Symposium on Experimental Robotics, 2004.
Rhex. Look in IEEEXplore for Weingarten papers. There should be an ICRA03 paper which is better than the ICRA04 paper listed here.
German Aibo Robo Soccer
CMU Aibo Robo Soccer
Texas Aibo Robo Soccer
English Aibo Robo Soccer
Automatic Gait Optimisation for Quadruped Robots, M. Kim and W. Uther, Proceedings of the 2003 Australasian Conference on Robotics and Automation.
Hornby's stuff

Numerical Recipes is online (free) and has a nice explanation of function optimization techniques. Matlab has an optimization toolbox that is useful as well. The GNU Scientific Library has optimization code. Genetic algorithms code is widely available on the Web.

Here is a biped simulator written in C, with graphics for the X Window System (so it is easy to run under Unix/Linux, but needs some conversion to run under Windows): Zip format
Look in biped/biped/notes for instructions. The simulator uses the Numerical Recipes in C library, which you must provide.
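To make the policy-search loop concrete, here is a toy sketch in Python. The function `rollout_cost` is a hypothetical stand-in for an expensive, noisy evaluation on a simulator or robot (here just a noisy quadratic); a real walking controller would have far more parameters. The improve-only acceptance rule and the shrinking step size are one simple way to cope with noise, plateaus, and the need to not destroy the robot.

```python
import random

def rollout_cost(params):
    """Stand-in for an expensive, noisy rollout evaluation.
    Here: a noisy quadratic with optimum at (1.0, -0.5)."""
    x, y = params
    noise = random.gauss(0.0, 0.01)
    return (x - 1.0) ** 2 + (y + 0.5) ** 2 + noise

def hill_climb(cost_fn, params, step=0.5, iters=200, seed=0):
    """Simple stochastic hill climbing over controller parameters."""
    random.seed(seed)
    best = list(params)
    best_cost = cost_fn(best)
    for _ in range(iters):
        cand = [p + random.gauss(0.0, step) for p in best]
        c = cost_fn(cand)
        if c < best_cost:      # accept only improvements: safer on hardware
            best, best_cost = cand, c
        else:
            step *= 0.98       # shrink the search on failure (plateaus)
    return best, best_cost
```

Real policy-search methods (policy gradient, CMA-ES, and so on) are much smarter about using each expensive evaluation, but the basic loop is this one.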

Plan B: Dynamic Programming

Remi Munos has developed some nice algorithms for large scale dynamic programming.
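For background, dynamic programming computes a value function (cost-to-go) over the whole state space and reads the policy off it. A minimal tabular value-iteration sketch on a toy one-dimensional chain is below; the point of Munos's work is precisely scaling beyond what tables like this can handle.

```python
def value_iteration(n_states, actions, step_cost, goal, gamma=1.0, iters=100):
    """Tabular value iteration on a 1-D chain of states.
    Actions move left/right (clipped at the ends); reach `goal` at min cost."""
    V = [0.0 if s == goal else float("inf") for s in range(n_states)]
    for _ in range(iters):
        newV = list(V)
        for s in range(n_states):
            if s == goal:
                continue  # absorbing goal state: cost-to-go stays zero
            newV[s] = min(step_cost + gamma * V[max(0, min(n_states - 1, s + a))]
                          for a in actions)
        V = newV
    return V
```

On the chain, the converged value function is just (distance to goal) times the step cost; a real biped state space is high dimensional, which is why discretization and function approximation dominate this plan.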

Plan C: Reduced Dimensional Dynamic Programming

Mike Stilman developed a planning approach that used simplified models (ICRA 2005). Eric Whitman developed an approach based on decomposing the problem (Humanoids 2009). These approaches need to be tested more thoroughly, and then applied to actual bipeds.
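Those papers use their own specific simplified models; as a generic illustration of the genre, here is the Linear Inverted Pendulum Model (LIPM), a standard reduced model for walking that keeps the center of mass at constant height so the within-step dynamics have a closed form.

```python
import math

def lipm_state(x0, v0, t, z=0.9, g=9.81):
    """Analytic CoM motion of the Linear Inverted Pendulum Model.
    x0: CoM offset from the stance foot [m], v0: CoM velocity [m/s],
    z: constant CoM height [m].  Dynamics: xddot = (g/z) * x."""
    Tc = math.sqrt(z / g)  # pendulum time constant
    x = x0 * math.cosh(t / Tc) + Tc * v0 * math.sinh(t / Tc)
    v = (x0 / Tc) * math.sinh(t / Tc) + v0 * math.cosh(t / Tc)
    return x, v
```

Planning with a model like this means choosing footholds and step times in a two-dimensional state space instead of the full robot state, then mapping the result back onto the real biped.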

Plan D: Dynamic Programming Using Key Trajectories

We use second order local plan optimization to generate locally optimal plans and local models of the value function and its derivatives. We combine many local plans to build more global plans. We maintain global consistency of the local models of the value function, guaranteeing that our locally optimal plans are actually globally optimal, up to the resolution of the search procedures.

Random Sampling of States in Dynamic Programming, Christopher Atkeson and Benjamin Stephens, NIPS 2007.

Multiple Balance Strategies From One Optimization Criterion, Christopher Atkeson and Benjamin Stephens, Humanoids 2007.

Review: Random Sampling of States in Dynamic Programming, Christopher Atkeson and Benjamin Stephens, IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 38, No. 4, pp. 924-929, 2008.

Prior work: [NIPS 1993 (pdf) (ps.gz)] [NIPS 2002] Morimoto and Atkeson have developed robust versions of the local trajectory planner. [IROS 2003]
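One ingredient of this plan can be illustrated in one dimension: each sampled state stores a local quadratic model of the value function, and a globally consistent estimate at any query state is the minimum over the local predictions. The sketch below is only that ingredient; the actual method also stores locally optimal trajectories, their second-order expansions, and enforces consistency during planning.

```python
def local_value(model, x):
    """Evaluate one local quadratic model of the value function (1-D state).
    model = (x0, v0, g, h): expansion point, value, gradient, curvature."""
    x0, v0, g, h = model
    dx = x - x0
    return v0 + g * dx + 0.5 * h * dx * dx

def global_value(models, x):
    """Globally consistent estimate: the lowest local prediction wins."""
    return min(local_value(m, x) for m in models)
```

If two local models both happen to describe V(x) = x^2 exactly (expansions at x = 1 and x = -2, say), the minimum reproduces x^2 everywhere; in general each model is only trusted near its own trajectory, and taking the minimum keeps overoptimistic distant models from being believed.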

Plan E: Poincare Dynamic Programming

Nonparametric representation of an approximated Poincare map for learning biped locomotion, J. Morimoto and C. G. Atkeson, Autonomous Robots 27(2):131-144, 2009.

Learning Biped Locomotion: Application of Poincare-map-based reinforcement learning, Jun Morimoto and Christopher G. Atkeson, IEEE Robotics & Automation Magazine, 14(2):41-51 June 2007.

Controlling Velocity In Bipedal Walking: A Dynamic Programming Approach, Thijs Mandersloot, Martijn Wisse, and Christopher G. Atkeson, Humanoids 2006.
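The underlying idea in these papers is to study walking through its Poincare map: sample the state once per step (for example at apex), learn the step-to-step map from data, and analyze or control the gait through that map. Here is a toy one-dimensional sketch, nothing like the nonparametric reinforcement-learning machinery above: fit an affine return map by least squares and inspect its fixed point, which corresponds to a steady gait, stable when the slope has magnitude below one.

```python
def fit_affine_map(pairs):
    """Fit x_next ~ a*x + b from (x, x_next) pairs (simple linear regression)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx
    b = my - a * mx
    return a, b

def fixed_point(a, b):
    """Fixed point of the map, and whether it is locally stable (|a| < 1)."""
    xstar = b / (1.0 - a)
    return xstar, abs(a) < 1.0
```

For example, data generated by x_next = 0.5*x + 1 yields the fixed point x* = 2 with slope 0.5, a stable period-one "gait"; walking gaits need nonparametric maps and control inputs, which is what the papers above address.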

Plan F: Trajectory Optimization

We will try applying several trajectory-based optimization approaches. DIRCOL has been used to optimize biped walking.

A group associated with the University of Heidelberg is applying optimization to biped walking: Katja Mombaur, Moritz Diehl (see software on web page), Hans Georg Bock, and Richard Longman (Columbia).

This paper used differential dynamic programming (DDP), a trajectory optimizer, as a component in learning from demonstration: "Robot Learning From Demonstration", C. G. Atkeson and S. Schaal, Machine Learning: Proceedings of the Fourteenth International Conference (ICML '97), edited by Douglas H. Fisher, Jr., pp. 12-20, Morgan Kaufmann, San Francisco, CA, 1997. gzipped postscript

A Matlab trajectory optimization package (RIOTS).

Another Matlab trajectory optimization package (Miser).

Dynamic Optimization by Arthur Bryson, Jr. includes a disk that contains 40 gradient and shooting Matlab codes, as well as codes that solve the time-varying Riccati equation (the DYNOPT Toolbox).

DYNOPT: Fikar and Cizniar
NTG (Nonlinear Trajectory Generation) is a software library for solving online optimal control problems. It uses a collocation technique to compute optimal trajectories very quickly, allowing online implementation of optimization-based control. Richard Murray, Caltech.
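As a baseline against these packages, even a crude single-shooting method is easy to write: parameterize the control sequence, roll out the dynamics, and do finite-difference gradient descent on the total cost. A sketch on a double integrator (reach position 1 and stop), far simpler than the collocation and multiple-shooting methods above:

```python
def rollout(u_seq, dt=0.1):
    """Integrate a double integrator (pos, vel) under a control sequence."""
    x, v = 0.0, 0.0
    for u in u_seq:
        x += v * dt
        v += u * dt
    return x, v

def cost(u_seq, target=1.0, w=1e-4):
    """Terminal position/velocity error plus a small control-effort penalty."""
    x, v = rollout(u_seq)
    return (x - target) ** 2 + v ** 2 + w * sum(u * u for u in u_seq)

def shoot(n=10, iters=300, lr=2.0, eps=1e-4):
    """Single shooting: finite-difference gradient descent on the controls."""
    u = [0.0] * n
    for _ in range(iters):
        base = cost(u)
        grad = []
        for i in range(n):
            u[i] += eps
            grad.append((cost(u) - base) / eps)
            u[i] -= eps
        u = [ui - lr * gi for ui, gi in zip(u, grad)]
    return u, cost(u)
```

Shooting scales poorly and is sensitive to the initial guess, which is exactly why the packages above use collocation or multiple shooting, but for a ten-knot double integrator it converges fine.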

Plan G: Reduced Dimensional Trajectory Optimization

An interesting example of reduced dimensional trajectory optimization.

Plan H: Robot Motion Planning Techniques

We are exploring using robot motion planning techniques to generate "good" trajectories, to use as is, or to be improved by other optimizers.
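A representative technique of this flavor is the Rapidly-exploring Random Tree (RRT). The toy sketch below (a 2-D point robot in a 10x10 workspace; `is_free` is a user-supplied collision check) grows a tree toward random samples and backtracks a path once it gets near the goal; a walking version would sample in the robot's state space and respect its dynamics.

```python
import math
import random

def rrt(start, goal, is_free, step=0.5, iters=3000, goal_tol=0.7, seed=1):
    """Minimal 2-D RRT over the square [0,10] x [0,10]."""
    random.seed(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        sample = (random.uniform(0.0, 10.0), random.uniform(0.0, 10.0))
        # extend the nearest tree node a bounded step toward the sample
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        d = math.dist(nodes[i], sample)
        if d == 0.0:
            continue
        s = min(step, d) / d
        new = (nodes[i][0] + s * (sample[0] - nodes[i][0]),
               nodes[i][1] + s * (sample[1] - nodes[i][1]))
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:
            path, k = [], len(nodes) - 1   # backtrack from the new node
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

Paths found this way are jagged, which is why we would feed them to the optimizers in the earlier plans for improvement.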

Plan I: Trajectory Libraries

This approach is similar to Plan D, Dynamic Programming Using Key Trajectories, in that collections of optimized trajectories are used, but in this case a global estimate of the value function is not maintained.

Standing Balance Control Using a Trajectory Library, Chenggang Liu and Christopher G. Atkeson, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2009, pp. 3031-3036.
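The basic lookup can be sketched very simply: store trajectories indexed by the states where they start, and answer a query by nearest neighbor. This toy version stores arbitrary action sequences; the library in the paper stores optimized balance trajectories and uses more careful state metrics.

```python
import math

class TrajectoryLibrary:
    """Store (initial_state, action_sequence) pairs; answer queries by
    nearest neighbor over initial states (a toy stand-in for the real thing)."""

    def __init__(self):
        self.entries = []  # list of (state, actions)

    def add(self, state, actions):
        self.entries.append((state, actions))

    def lookup(self, query):
        # return the action sequence whose start state is closest to the query
        return min(self.entries,
                   key=lambda e: math.dist(e[0], query))[1]
```

The appeal over Plan D is that no global value function needs to be kept consistent; the cost is that nothing guarantees the retrieved trajectory is near-optimal far from its start state.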