Robot learning references

This is a partial annotated bibliography for learning and planning with teams of robots using POMDPs and related approaches.

Special issues, workshops, and other collections
Multirobot coordination
Function approximation in MDPs
Hierarchy in POMDPs
Behavior-based robotics
Particle filters
EM and variational methods

Special issues, workshops, and other collections

Machine Learning Journal published a special issue on robot learning, edited by some people we know well.  Take a look at the TOC online to see what's there.  MLJ also has a special issue on reinforcement learning, but that was back in 1992, so it isn't online.  (But the library has a copy.)

There was a special issue of Autonomous Robots on multi-robot interaction.  It was vol 8, no 3; here's the TOC:  Unfortunately I don't think CMU subscribes to the online version of the journal, and the paper version isn't in the library.  But maybe there are electronic versions of single articles online somewhere...  (In fact, there's one below under the particle filters section.)

There was a workshop on hierarchy and memory in reinforcement learning at the last ICML.  The workshop page is  That page has links to notes on what was presented as well as to individual papers.  (Unfortunately the invited talks, which were quite interesting, don't seem to show up on the site.)

Readings for a course on multi-agent systems are here.  For models of teamwork see

Sebastian wrote a survey article on probabilistic robotics.  He also has a book on the topic, and is giving a NIPS tutorial this year (need links).

Multirobot coordination

As someone pointed out, RoboCup is a good place to look for multirobot ideas.  CMU has been active in RoboCup, so there are a bunch of local people who've done interesting research on it.  (Rosemary is one of them.)  I found a good summary page from an ex-CMU-er, Peter Stone, at  Peter also has a bunch of interesting publications at, for example learning how to choose between "keep" and "pass" behaviors and applying RL to an interactive agent in LambdaMOO.

Carlos Guestrin has done some stuff on multiagent planning in (PO)MDPs.  Here is his publications page:  Also, here is a link to his NIPS paper on multiagent planning.  (Thanks to Rosemary for this pointer.)

Another interesting angle is Jeff Schneider's work on sharing value functions as a way of communicating between robots.  (Rosemary has done some experimentation with these algorithms.)  See Jeff's paper at and also (no online version):

M. Riedmiller, A. Moore, J. Schneider.  Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids.  In Balancing Reactivity and Social Deliberation in Multi-agent Systems, edited by M. Hannebauer, J. Wendler, E. Pagello.  Springer, 2001.

The FIRE project is doing work on multi-robot planning by designing an economy which encourages collaboration.  See Dias and Stentz's paper on the topic.

Barry Brumitt has used a version of Tony Stentz's D* search algorithm for multi-agent planning (need refs).  See the planning section below for cites about D* and related algorithms.

See also Lynne Parker's work below.


POMDPs

Michael Littman maintains a page on POMDP-solving algorithms at, including a paper that has a good description of value iteration in POMDPs:
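
As a reminder of the fully observable case that POMDP value iteration generalizes (the POMDP version does backups over belief space instead of states), here is tabular value iteration on a made-up two-state, two-action MDP; the model and numbers are invented purely for illustration:

```python
import numpy as np

# Tabular value iteration on a tiny invented MDP.
n_states, n_actions, gamma = 2, 2, 0.9
T = np.array([[[0.8, 0.2], [0.1, 0.9]],   # T[s, a, s']: transition probabilities
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                  # R[s, a]: immediate rewards
              [0.0, 2.0]])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') V(s')
    Q = R + gamma * np.einsum('sap,p->sa', T, V)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:     # max-norm convergence test
        break
    V = V_new
```

The max-norm convergence test is no accident: value iteration is a gamma-contraction in max norm, which is exactly the property the function-approximation section below worries about preserving.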

Tony Cassandra has a (PO)MDP tutorial which you can get to from

See also Joelle's work below.

Function approximation in MDPs

For part of my thesis I had to write a survey of methods for solving MDPs and related models approximately.  Here's a link to that survey:  Here's a link with some more detail about the control theory methods which are mentioned in the survey; it is missing a definition of the Laplace transform but it has a lot of worked examples.

Carlos Guestrin has done some interesting work on approximate value iteration using Bayes-net-based representations of MDPs.  His IJCAI paper describes how to combine value iteration with max-norm projection to get a convergent algorithm.

To understand Carlos' work better, it might help to read my paper on how to include function approximation in reinforcement learning algorithms, from ICML-95:  This describes how to use approximators which are max-norm nonexpansions with value iteration (note that max-norm projection is not necessarily a max-norm nonexpansion).
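
A quick numeric illustration of the nonexpansion condition (this toy check is mine, not the paper's construction): an "averager", i.e. an approximator whose fitted values are fixed convex combinations of its training targets, satisfies ||Av - Aw||_inf <= ||v - w||_inf, which is what lets it be composed safely with value iteration:

```python
import numpy as np

# An averager can be written as a matrix A with nonnegative rows summing
# to 1 (each fitted value is a fixed convex combination of the targets).
rng = np.random.default_rng(0)
A = rng.random((5, 5))
A /= A.sum(axis=1, keepdims=True)   # rows are convex combinations

# Check the max-norm nonexpansion property on random target vectors.
for _ in range(100):
    v, w = rng.normal(size=5), rng.normal(size=5)
    lhs = np.abs(A @ v - A @ w).max()
    rhs = np.abs(v - w).max()
    assert lhs <= rhs + 1e-12
```

The proof is one line (each row of A is a weighted average, so it can't move farther than the largest componentwise difference), but the numeric check makes the contrast with max-norm projection concrete: a projection can overshoot, an averager can't.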

Here's a paper comparing various methods of function approximation:

Milos Hauskrecht.  Value-Function Approximations for Partially Observable Markov Decision Processes.  Journal of Artificial Intelligence Research (JAIR) 13:33--94 (2000).


Hierarchy in MDPs

David Andre has done some interesting work on hierarchical representations for learning (including designing a variant of LISP that includes choices for learning algorithms to make).  Look under hierarchical reinforcement learning at his publications page, and in particular see his NIPS-01 paper.

David's work is partly an extension of Tom Dietterich's MAXQ algorithm; see (an overview paper) and (a description of Tom's work on RL).

Andrew Moore did some interesting stuff on hierarchical representations of spatial MDPs.  Here's a link:

Hierarchy in POMDPs

The MICA proposal was partly inspired by Joelle's work on decomposing POMDPs.  She has a paper at and slides at

Some of Sridhar Mahadevan's students have done work on hierarchy in POMDPs.  They've explored at least two approaches, one based on Andrew McCallum's utile distinction memory and the other with a Baum-Welch-style algorithm.  Here are links:

Natalia Hernandez and Sridhar Mahadevan.  Hierarchical Memory-based Reinforcement Learning.  Fourteenth International Conference on Neural Information Processing Systems, Denver, November 27 to December 2, 2000.
Georgios Theocharous, Khashayar Rohanimanesh, and Sridhar Mahadevan.  Learning Hierarchical Partially Observable Markov Decision Processes for Robot Navigation.  IEEE International Conference on Robotics and Automation (ICRA), Seoul, South Korea, 2001.

Behavior-based robotics

Rosemary pointed me to a paper by Lynne Parker on tracking moving targets with multiple robots:  See also Lynne's "Cooperating Autonomous Robots" page at

The ALLIANCE architecture was designed in part as a response to (and improved version of) the subsumption architecture, which was one of the first proposed ways of combining simple reactive behaviors to produce more complicated emergent behavior.  A summary I found on the web of the reasoning behind the subsumption architecture is at  A common criticism of pure subsumption architectures is that their emergent behaviors may not be sufficiently goal-directed, since they lack an explicit model of the state of the world and of the robot's goals.  Behavior-based architectures try to address this criticism while still avoiding central planning.

Maja Mataric is a good multi-robot hacker, and maintains a page on multi-robot coordination at  For example, see her paper on Designing and Understanding Adaptive Group Behavior.

Particle filters

One of the basic tasks in POMDP-based planning is tracking a robot's belief state.  The belief state is a probability distribution over possible states of the world; it usually includes at a minimum the robot's own pose, and may also include a map of stationary surroundings as well as poses of other moving objects.
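
To make "tracking" concrete, here is the exact discrete belief update (prediction through the transition model, then reweighting by the observation likelihood); the two-state model and all numbers are invented for illustration:

```python
import numpy as np

# Exact belief update for a discrete POMDP: after taking action a and
# seeing observation o,  b'(s') is proportional to O(o|s') * sum_s T(s'|s,a) b(s).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[s, a, s']
              [[0.4, 0.6], [0.1, 0.9]]])
O = np.array([[0.7, 0.3],                  # O[s', o]
              [0.2, 0.8]])

def belief_update(b, a, o):
    predicted = b @ T[:, a, :]             # prediction (motion) step
    posterior = predicted * O[:, o]        # correction (observation) step
    return posterior / posterior.sum()     # renormalize

b = np.array([0.5, 0.5])                   # uniform prior
b = belief_update(b, a=1, o=0)
```

Particle filters, Kalman filters, and evidence grids are all ways of representing and updating this same posterior when the state space is too large for the exact tabular computation.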

There's been a bunch of work recently on using particle filters to track a robot's belief state. (Other approaches to tracking belief state include Kalman filters and evidence grids.)

Here's a paper on Monte Carlo localization, which is another name for using particle filters to track a robot's pose.
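
A minimal sketch of the idea in one dimension (the Gaussian motion and range-sensor models, the corridor, and all numbers are invented here, not taken from the paper): propagate particles through a noisy motion model, weight them by the measurement likelihood, then resample.

```python
import numpy as np

# Minimal 1-D particle filter in the Monte Carlo localization style.
rng = np.random.default_rng(1)
n = 1000
particles = rng.uniform(0.0, 10.0, size=n)   # uniform prior over a corridor

def pf_step(particles, control, measurement, motion_std=0.1, meas_std=0.5):
    # Prediction: apply the control input with motion noise.
    particles = particles + control + rng.normal(0.0, motion_std, size=n)
    # Correction: weight each particle by the measurement likelihood.
    w = np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    w /= w.sum()
    # Resampling: draw a new, equally weighted particle set.
    return rng.choice(particles, size=n, p=w)

true_pos = 3.0
for _ in range(10):
    true_pos += 0.5
    particles = pf_step(particles, control=0.5,
                        measurement=true_pos + rng.normal(0.0, 0.5))
est = particles.mean()
```

The same three steps generalize directly to robot pose (x, y, heading); the hard engineering is in the motion and sensor models, not the filter loop.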

Dieter and friends' paper in the AR special issue mentioned above is about tracking the poses of multiple robots.

Mike Montemerlo's recent paper (need link) covers people-tracking, i.e., representing the positions of multiple moving objects within a known map.  To handle dependencies between the uncertainty in a robot's position and the positions of the tracked people, this paper proposes a conditional version of particle filters: every particle in the sample for the robot's position corresponds to a whole separate filter for each person's location.

Kevin Murphy has done some work on learning whole maps rather than just a few object locations.  (These maps are smaller and simpler than the ones we need to use, but they still have on the order of hundreds of unknown bits to estimate.)  To accomplish this, he uses Rao-Blackwellized particle filters, which are a combination of particle filters and exact inference.
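
In generic notation (symbols mine, not Murphy's): with robot trajectory x_{1:t}, map m, and observations z_{1:t}, Rao-Blackwellization exploits the factorization

```latex
p(x_{1:t}, m \mid z_{1:t})
  = \underbrace{p(m \mid x_{1:t}, z_{1:t})}_{\text{exact, per particle}}
    \;\underbrace{p(x_{1:t} \mid z_{1:t})}_{\text{particle filter}}
```

so each particle carries a trajectory plus a closed-form map posterior conditioned on that trajectory (e.g., independent per-cell occupancy probabilities), and sampling is only over the low-dimensional trajectory rather than the huge joint space.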

Sebastian has written a paper on how to plan in general POMDPs using a particle filter representation of the belief state.

Dirk Ormoneit et al. have written a paper on using quasi-Monte-Carlo methods (placing particles not completely randomly but dependent on each other in a space-filling way) to reduce the variance of particle filters.

See also A. Doucet, N. de Freitas, and N. Gordon, editors.  Sequential Monte Carlo Methods in Practice.  Springer, 2001.

More on Kalman filters: the basic Kalman filter assumes all distributions involved are Gaussian and all state updates are linear.  The extended Kalman filter allows for nonlinearities by linearizing the state updates around the current mean; it can handle moderate departures from linearity.  The unscented Kalman filter attacks the same problem as the EKF, but with a different (often better) update rule.  (Rather than linearizing the state update, it applies the true update to a bunch of particles then uses the results to estimate the posterior mean and covariance.)  The original paper on the UKF is by Julier and Uhlmann (need link, but see next paragraph).
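
For reference, here is one predict/correct cycle of the basic linear-Gaussian filter that the EKF and UKF generalize; the scalar random-walk model and all the numbers are invented for illustration:

```python
import numpy as np

# One predict/correct cycle of the basic (linear-Gaussian) Kalman filter.
F = np.array([[1.0]])        # state transition (random walk)
Q = np.array([[0.01]])       # process noise covariance
H = np.array([[1.0]])        # observation model
R = np.array([[0.25]])       # measurement noise covariance

x = np.array([[0.0]])        # state mean
P = np.array([[1.0]])        # state covariance

def kf_step(x, P, z):
    # Predict: push the mean and covariance through the linear dynamics.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct: fold in the measurement z.
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(P)) - K @ H) @ P
    return x, P

for z in [0.9, 1.1, 1.0]:
    x, P = kf_step(x, P, np.array([[z]]))
```

The EKF replaces F and H with Jacobians of nonlinear update functions evaluated at the current mean; the UKF instead pushes a small deterministic set of sigma points through the true nonlinear updates and refits the Gaussian.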

For handling highly nonlinear or non-Gaussian problems, we need a better state representation than a single Gaussian.  An obvious candidate is a mixture of Gaussians.  There have been several generalizations of Kalman filters to handle mixtures of Gaussians: the most obvious is to update each one separately using the EKF or UKF.  Another is the mixture Kalman filter.  Yet another is the unscented particle filter (this paper describes UKFs in some detail as well, and also has a list of common tricks to make particle filters work better).  For UKFs, see also here.


Planning

Nick Roy's paper on coastal navigation.  This is a method for incorporating a rough estimate of uncertainty into a path plan, in a way which allows us to decide separately how bad uncertainty is at each location in state space.

Craig Boutilier and friends wrote a (very long) paper on the relationship between POMDP-based planning and classical planning.  Here's the citation:

Craig Boutilier, Thomas Dean and Steve Hanks.  Decision-Theoretic Planning: Structural Assumptions and Computational Leverage.  Journal of AI Research (JAIR) 11:1--94 (1999).

Here's the cite for Tony Stentz's D* search algorithm.  Sven Koenig's recent seminar discussed (among other things) a relative of D* (need refs).

EM and variational methods

Neal and Hinton's paper on EM as free energy minimization.  This was the first paper I know of to point out the relationship between EM and free energy minimization (the latter being an example of a variational technique) and to suggest the idea of a "partial E-step."
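
In the usual notation (observed data x, latent variables z, parameters theta, and an arbitrary distribution q over z), the free energy Neal and Hinton work with can be written

```latex
F(q, \theta)
  = \mathbb{E}_{q(z)}\!\left[\log p(x, z \mid \theta)\right] + H(q)
  = \log p(x \mid \theta) - \mathrm{KL}\!\left(q(z) \,\middle\|\, p(z \mid x, \theta)\right)
```

A full E-step maximizes F over q by setting q(z) = p(z | x, theta), driving the KL term to zero; a "partial E-step" merely increases F.  The M-step maximizes F over theta.  Since both steps increase the same objective, EM still converges even with inexact E-steps.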

David MacKay's book.  Chapter 31 is on variational methods in the draft I have, and chapter 45 applies variational techniques to the problem of model complexity control in neural network inference.  Chapter 42 is on Boltzmann machines, which are a good example of variational methods.  (All of the other chapters are also excellent.)

Tommi Jaakkola's paper on variational inference in the QMR-DT database.  This work is also mentioned in the following tutorial by Jordan, Ghahramani, Jaakkola, and Saul (which Daniel Nikovski presented at a recent ML lunch).  Jaakkola also has a tutorial of his own.

Tom Minka's tutorial papers on lower-bounding integrals, reversing EM, and the EM algorithm.  He also has many other good papers at his tutorials page.

Zoubin Ghahramani's NIPS tutorial has some slides about EM, variational bounds, and variational Bayes inference (among plenty of other slides on general learning topics).


Software

Those who aren't already IPC hackers might want to take a look at the online docs at

Tony Cassandra maintains software for solving POMDPs.  You can get to it from his POMDP page.

Written by Geoff Gordon, with help from the other members of the robot learning lab.  Last modified Nov 29, 2001.