Chris Atkeson's Research: Selected Papers, Proposals, Videos, and Talks


My Google Scholar page

This lists all of my publications.

Videos on YouTube

This lists all of my videos.

What follows is a selection of research with some commentary.



Talks

Talks that don't fit well into the sections below:

Slides from a biographical talk

(My "Don't build another robot" talk): The Future Of Health Care

Why Build A Personal Health Care Companion?

Is (Deep) Reinforcement Learning Barking Up The Wrong Tree?

Dynamic Walking 2016: Dynamic Walking 10 Years In: Time For A Name Change



Perception: Robot Skin

I have a longstanding interest in perceptive and aware environments (Classroom 2000, Aware Home, CareMedia, see the biographical slides mentioned above for more information).

Now I am applying what I have learned from building those systems to robot skin. The key idea is to cover the robot with eyeballs (cameras), and have transparent skin. The current prototype (FingerVision) is described in "Implementing Tactile Behaviors Using FingerVision", A. Yamaguchi and C. G. Atkeson, Humanoids 2017. More papers and videos on FingerVision, slides, and an NSF proposal.



Safe Robots: Soft Robotics

I am interested in robots that are inherently safe, even when control computers crash. It helps to make robots lightweight, and one way to do that is to use inflatable structural elements. An important aspect of this work is the goal of building human-scale soft robots that can interact with humans and operate in human environments; most soft robotics focuses on much smaller robots. This work led to a high-impact outreach effort: our work in conjunction with the Disney movie Big Hero 6.



Robot Reasoning: Using Abstraction To Find Better Task Strategies

I see work on reasoning with abstractions about task strategies as the most important work we can be doing right now in robotics. Some thoughts on abstraction and thinking about task strategies, and some slides.

Temporal decomposition and abstraction

Akihiko Yamaguchi has led work on how to decompose and abstract complex learning problems (for example, "Differential dynamic programming for graph-structured dynamical systems: Generalization of pouring behavior with different skills", A. Yamaguchi and C. G. Atkeson, Humanoids 2016). More work in this vein.

Finding better task strategies

Here are some older case studies:

"Swing leg retraction helps biped walking stability", M. Wisse, C. G. Atkeson, and D. K. Kloimwieder, 5th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 295-300, 2005.

"Open Loop Stable Control Strategies for Robot Juggling", Schaal, S. and C. G. Atkeson, In: IEEE International Conference on Robotics and Automation, Vol.3, pp.913-918, Atlanta, Georgia, 1993.
A look at humans doing the task: "One-handed Juggling: Dynamical Approaches to a Rhythmic Movement Task", Schaal, S., D. Sternad and C. G. Atkeson, Journal of Motor Behavior, 28(2):165-183, 1996.



Muscle: Two Actuators In One

This is work on rethinking how we model muscle. Muscle is much more efficient when the muscle (not including the tendon) is staying at the same length (isometric) or lengthening under load (doing negative work). I propose a hypothesis as to why. Talk on YouTube, slides, writeup with BONUS material.



Planning, Learning, and Control

A major theme of my work is the role of optimization in planning, learning, and control. (I teach a course on this). I have long advocated the use of optimization for behavior planning, including planning feedback gains and error responses, rather than just finding feasible motions. I also take the position that learning is optimization, and can be done more effectively if models are learned and refined as part of the optimization process. This is especially true of reinforcement learning.

Below are some efforts in this area.

Application: humanoid control

We developed an approach to hierarchical optimization which was used to control humanoid walking in the DARPA Robotics Challenge. Here is a talk on research issues in humanoid legged locomotion. Work on walking was led by a number of students: Siyuan Feng, Xinjilefu (Ben), Eric Whitman, and Ben Stephens.

Model-free optimization (aka reinforcement learning)

Optimizing wearable robot behavior involves humans in the loop, whose responses are poorly modeled. Here is an example of online model-free optimization of the behavior of an ankle exoskeleton: "Human-in-the-loop optimization of exoskeleton assistance during walking", Zhang, J., Fiers, P., Witte, K. A., Jackson, R. W., Poggensee, K. L., Atkeson, C. G., Collins, S. H. (2017) Science, 356:1280-1284. Slides.
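A toy sketch of the model-free idea, not the method from the paper: treat the assistance parameters as a black box, measure a cost, and hill-climb with a simple (1+1) evolution strategy. The parameter names and the quadratic stand-in cost are invented for illustration; the real experiment estimates metabolic cost from respiratory measurements.

```python
import random

def measured_cost(params):
    # Hypothetical stand-in for a measured human metabolic cost,
    # minimized at peak_torque=0.6, peak_time=0.5 (arbitrary values).
    peak_torque, peak_time = params
    return (peak_torque - 0.6) ** 2 + (peak_time - 0.5) ** 2

def optimize_assistance(n_iters=300, step=0.1, seed=1):
    # (1+1) evolution strategy: perturb the current best, keep if better.
    rng = random.Random(seed)
    best = [0.2, 0.2]                    # initial assistance parameters
    best_cost = measured_cost(best)
    for _ in range(n_iters):
        cand = [p + rng.gauss(0.0, step) for p in best]
        cand_cost = measured_cost(cand)
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost
```

Note that no model of the human appears anywhere; only cost evaluations are used, which is what makes the approach applicable when the plant (here, a person) is poorly modeled.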

Robust model-based optimization (aka reinforcement learning)

Optimizing a policy using multiple models is one way to achieve robustness to modelling errors: "Efficient robust policy optimization", C. G. Atkeson, American Control Conference (ACC), 5220-5227, 2012. Longer version.
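A minimal sketch of the multiple-model idea, with all numbers invented: evaluate each candidate feedback gain on an ensemble of plausible scalar models and keep the gain with the best worst-case cost.

```python
def rollout_cost(k, a, b, x0=1.0, steps=50):
    # Quadratic cost of the linear policy u = -k*x on the
    # scalar model x_next = a*x + b*u.
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x * x + 0.1 * u * u
        x = a * x + b * u
    return cost

def robust_gain(models, gains):
    # Pick the gain minimizing the worst-case cost over the ensemble.
    return min(gains, key=lambda k: max(rollout_cost(k, a, b)
                                        for a, b in models))

models = [(1.1, 1.0), (0.9, 0.8), (1.0, 1.2)]  # plausible (a, b) pairs
gains = [i * 0.05 for i in range(31)]          # candidate gains 0.0 .. 1.5
k_star = robust_gain(models, gains)
```

A gain tuned to any single model can perform badly on the others; scoring against the whole ensemble trades a little nominal performance for robustness to model error.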

Learning From Mental Practice

A major issue in reinforcement learning is how to most effectively transfer what is learned from simulation (mental practice) to further learning on an actual robot (actual practice). Akshara Rai is leading work on how to most effectively transfer policies learned in simulation to actual robots (for example, "Deep Kernels for Optimizing Locomotion Controllers", R. Antonova, A. Rai, and C. G. Atkeson, PMLR, Volume 78: Conference on Robot Learning, 2017).


Trajectory Libraries: Trajectory-Based Optimization (a form of Model-Based Reinforcement Learning)

Being able to learn or plan how to do challenging dynamic tasks on high dimensional humanoid robots is a major challenge. I have emphasized the close links between model-based reinforcement learning, optimization, planning, and control for dynamic tasks. Dynamic programming provides a methodology for developing planners and controllers for nonlinear systems, and highlights the importance of the value function, which represents the expected lifetime cost starting from any given state. However, general dynamic programming is currently computationally intractable. I introduced differential dynamic programming (DDP) from optimal control to the field of reinforcement learning. DDP is a local trajectory-based optimization method that produces local models of the policy and value function, as well as an optimal trajectory. I showed how, using sets of optimized trajectories, a more global optimal policy (or at least one less vulnerable to bad local minima) can be found by having neighboring trajectories share local policy and/or value function information. This work naturally leads to the notion of trajectory libraries. These ideas are also seen in iLQR and LQR-Trees.
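The trajectory-library idea can be caricatured in a few lines: store state-action pairs from optimized trajectories, and answer queries by nearest-neighbor lookup. The stored trajectories below are hand-written placeholders, not outputs of a real optimizer.

```python
import math

# Hand-written stand-ins for optimized trajectories; a real library
# would store the output of a trajectory optimizer such as DDP.
library = [
    {"states": [(1.0, 0.0), (0.5, -0.5), (0.1, -0.2)],
     "actions": [-1.0, -0.4, -0.1]},
    {"states": [(-1.0, 0.0), (-0.5, 0.5), (-0.1, 0.2)],
     "actions": [1.0, 0.4, 0.1]},
]

def library_policy(state):
    # Act like the nearest stored state along any library trajectory.
    best_action, best_dist = None, math.inf
    for traj in library:
        for s, a in zip(traj["states"], traj["actions"]):
            d = math.dist(s, state)
            if d < best_dist:
                best_action, best_dist = a, d
    return best_action
```

DDP additionally produces local linear feedback and a local value model around each trajectory, so a real library can blend neighboring trajectories rather than snapping to the single nearest sample.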

"Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming", C. G. Atkeson, Proceedings, Neural Information Processing Systems, Denver, Colorado, December, 1993, In: Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro, and J. Alspector, eds. Morgan Kaufmann, 1994. Citeseer entry.

Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach, C. G. Atkeson, and J. Morimoto, In: Neural Information Processing Systems 15, MIT Press, 2003. Citeseer entry.

Morimoto and Atkeson have developed robust versions of the local trajectory planner. [IROS 2003]

Policies Based on Trajectory Libraries, M. Stolle and C. G. Atkeson, IEEE International Conference on Robotics and Automation, 3344-3349, 2006.

Transfer of policies based on trajectory libraries, M. Stolle, H. Tappeiner, J. Chestnutt, and C. G. Atkeson, IEEE/RSJ International Conference on Intelligent Robots and Systems, 4234-4240, 2007.

Random Sampling of States in Dynamic Programming, C. G. Atkeson and B. Stephens, in IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 38, No. 4, pp. 924-929, 2008.

Standing balance control using a trajectory library, C. Liu and C. G. Atkeson, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 3031-3036, 2009.

Finding and transferring policies using stored behaviors, M. Stolle and C. G. Atkeson, Autonomous Robots, 29(2): 169-200, 2010.

Learning Control in Robotics, S. Schaal and C. G. Atkeson, IEEE Robotics & Automation Magazine, 17, 20-29, 2010.

Trajectory-Based Dynamic Programming, C. G. Atkeson and C. Liu, in Modeling, Simulation and Optimization of Bipedal Walking Cognitive Systems Monographs Volume 18, 2013, pp 1-15.



Learning From Demonstration

An important area of robot learning is learning from a human (or another robot's) demonstration. In this work I focused on learning from a single demonstration followed by a small number (fewer than 10) of practice trials.

A first attempt at learning from demonstration, with parametric model learning: Learning Tasks From A Single Demonstration, C. G. Atkeson and S. Schaal, IEEE International Conference on Robotics and Automation, 1706-1712, 1997.

Using model-free learning to compensate for the limitations of model-based learning: Robot Learning From Demonstration, C. G. Atkeson and S. Schaal, Machine Learning: Proceedings of the Fourteenth International Conference (ICML '97).

Using regularization to make nonparametric model-based reinforcement learning work: Nonparametric Model-Based Reinforcement Learning, C. G. Atkeson, In: Neural Information Processing Systems 10, MIT Press, 1998.

Work by Bentivegna explored learning from demonstration by learning which task primitives to select in any situation. For example: Learning Similar Tasks From Observation and Practice, D. C. Bentivegna, C. G. Atkeson, and G. Cheng, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2677-2683, 2006.


Non-Parametric Local Learning

In the last century I worked on memory-based learning, a non-parametric approach to learning in which functions were approximated by storing samples of the function, and delaying any generalization or model formation until a query was available. The generalization or local model formed is tuned to the query, and then thrown away after the query is answered. The main advantage of this approach is that it avoids the interference found in parametric models (such as neural networks) when the training distribution changes. I expect this work to come back into favor as we move to distributing computational power (CPU/GPUs) with memory (think of CPU/GPUs with much larger caches or local memory, or memory chips with on-chip CPU/GPUs).
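A minimal sketch of locally weighted regression in this spirit: store raw samples, and at query time fit a linear model weighted by distance to the query, answer, and discard the fit. The bandwidth value and the sine test function are illustrative choices.

```python
import numpy as np

def lwr_predict(X, y, query, bandwidth=0.2):
    # Gaussian distance weights centered on the query.
    w = np.exp(-((X - query) ** 2) / (2 * bandwidth ** 2))
    A = np.stack([np.ones_like(X), X], axis=1)       # design matrix [1, x]
    WA = A * w[:, None]
    beta = np.linalg.solve(A.T @ WA, A.T @ (w * y))  # weighted least squares
    return beta[0] + beta[1] * query                 # use local model, discard

X = np.linspace(0.0, 2.0 * np.pi, 50)   # stored training samples
y = np.sin(X)
pred = lwr_predict(X, y, np.pi / 2)     # close to sin(pi/2) = 1
```

Because nothing is fit globally, adding new samples in one region of the input space cannot interfere with predictions elsewhere, which is exactly the property the paragraph above contrasts with parametric models.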

An overview of work on local learning algorithms is given by:
Atkeson, C. G., Moore, A. W., & Schaal, S.
"Locally Weighted Learning." Artificial Intelligence Review, 11:11-73, 1997.
gzipped postscript

An overview of local learning applied to robots is given by:
Atkeson, C. G., Moore, A. W., & Schaal, S.
"Locally Weighted Learning for Control." Artificial Intelligence Review, 11:75-113, 1997.
gzipped postscript

Looking at local learning from a neural network point of view:
Atkeson, C. G., and S. Schaal,
Memory-Based Neural Networks For Robot Learning, Neurocomputing, 9(3):243-69, 1995.
gzipped postscript

A mixture of experts approach to local learning is presented in:
Schaal, S., & Atkeson, C. G.
From Isolation to Cooperation: An Alternative View of a System of Experts In: D.S. Touretzky, and M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press. 1996.

Stefan Schaal, Atkeson, and colleagues have explored new approaches to nonparametric learning, Receptive Field Weighted Regression (RFWR) and Locally Weighted Projection Regression (LWPR), in which receptive fields representing local models are created and maintained during learning. These approaches provide an interesting alternative perspective on locally weighted learning. Unlike the original version of locally weighted learning, these approaches maintain local intermediate data structures such as receptive fields. [Applied Intelligence 2002] [ICRA 2000] [Neural Computation 1998] [NIPS 1997]

Applying local learning to robot learning:
Schaal, S., and C. G. Atkeson,
Robot Juggling: An Implementation of Memory-based Learning, Control Systems Magazine, 14(1):57-71, 1994.


Parametric Learning and Trajectory Learning From Practice

My thesis explored parametric learning of rigid body dynamic models of robot loads and the robots themselves, and also how to learn to follow a desired trajectory with practice (often referred to by the misleading and overly general term "learning control"). Key results were that rigid body dynamics is linear in a particular formulation of the unknown inertial parameters (mass, mass*location of the center of mass, and moments of inertia), and that the best operator mapping trajectory following errors to command corrections is an inverse model of the plant.
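The linearity result can be illustrated on a one-degree-of-freedom analogue, with all numbers invented: in f = m*a + b*v the unknowns theta = [m, b] enter linearly, so a regressor matrix and ordinary least squares recover them from motion data, just as tau = Y(q, qdot, qddot) * theta does for full rigid-body dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
m_true, b_true = 2.0, 0.5            # invented mass and viscous friction
a = rng.uniform(-1.0, 1.0, 100)      # measured accelerations
v = rng.uniform(-1.0, 1.0, 100)      # measured velocities
f = m_true * a + b_true * v + 0.01 * rng.standard_normal(100)  # forces

# Stack the known motion quantities into the regressor matrix Y so
# that f = Y @ theta, then solve for theta by least squares.
Y = np.stack([a, v], axis=1)
theta, *_ = np.linalg.lstsq(Y, f, rcond=None)
```

The same structure holds for a full manipulator, with Y built from joint positions, velocities, and accelerations and theta collecting the ten inertial parameters per link.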

Model-Based Control of a Robot Manipulator, C. H. An, C. G. Atkeson, and J. M. Hollerbach, MIT Press, Cambridge, Massachusetts, 1988.



Humor

Do humans prefer correct robot form or motion?

Getting ready for the DARPA Robotics Challenge

Atkeson Baymax screen test (Big Hero 6 audition)

Cognitive Capture

ALS Ice Bucket Challenge