Special issues, workshops, and other collections
Function approximation in MDPs
Hierarchy in POMDPs
EM and variational techniques
There was a special issue of Autonomous Robots on multi-robot interaction. It was vol 8, no 3; here's the TOC: http://www.wkap.nl/issuetoc.htm/0929-5593+8+3+2000. Unfortunately I don't think CMU subscribes to the online version of the journal, and the paper version isn't in the library. But maybe there are electronic versions of single articles online somewhere... (In fact, there's one below under the particle filters section.)
There was a workshop on hierarchy and memory in reinforcement learning at the last ICML. The workshop page is http://www-anw.cs.umass.edu/~ajonsson/icml/. That page has links to notes on what was presented as well as to individual papers. (Unfortunately the invited talks, which were quite interesting, don't seem to show up on the site.)
Readings for a course on multi-agent systems are here. For models of teamwork see
Sebastian wrote a survey article on probabilistic robotics. He also has a book on the topic, and is giving a NIPS tutorial this year (need links).
Carlos Guestrin has done some stuff on multiagent planning in (PO)MDPs. Here is his publications page: http://robotics.stanford.edu/~guestrin/publications.html. Also, here is a link to his NIPS paper on multiagent planning. (Thanks to Rosemary for this pointer.)
Another interesting angle is Jeff Schneider's work on sharing value functions as a way of communicating between robots. (Rosemary has done some experimentation with these algorithms.) See Jeff's paper at http://www.cs.cmu.edu/afs/cs.cmu.edu/project/learn-2/schneide/psfiles/ml99.ps and also (no online version):
M. Riedmiller, A. Moore, J. Schneider. Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids. In Balancing Reactivity and Social Deliberation in Multi-agent Systems, edited by M. Hannebauer, J. Wendler, E. Pagello. Springer, 2001.The FIRE project is doing work on multi-robot planning by designing an economy which encourages collaboration. See Dias and Stentz's paper on the topic.
Barry Brummit has used a version of Tony Stentz's D* search algorithm for multi-agent planning (need refs). See the planning section below for cites about D* and related algorithms.
See also Lynne Parker's work below.
Tony Cassandra has a (PO)MDP tutorial which you can get to from http://www.cs.brown.edu/research/ai/pomdp/.
See also Joelle's work below.
To understand Carlos' work better, it might help to read my paper on how to include function approximation in reinforcement learning algorithms, from ICML-95: http://www.cs.cmu.edu/~ggordon/ml95-stable-dp.ps.Z. This describes how to use approximators which are max-norm nonexpansions with value iteration (note that max-norm projection is not necessarily a max-norm nonexpansion).
Here's a paper comparing various methods of function approximation:
Milos Hauskrecht. Value-Function Approximations for Partially Observable Markov Decision Processes. Journal of Artificial Intelligence Research (2000).
David's work is partly an extension of Tom Dietterich's MAXQ algorithm, see ftp://ftp.cs.orst.edu/pub/tgd/papers/sara2000-maxq.ps.gz (an overview paper) and http://www.cs.orst.edu/~tgd/projects/rl.html (a description of Tom's work on RL).
Andrew Moore did some interesting stuff on hierarchical representations of spatial MDPs. Here's a link: http://www.autonlab.org/papers/airports-ijcai.ps.
Some of Sridhar Mahadevan's students have done work on hierarchy in POMDPs. They've explored at least two approaches, one based on Andrew McCallum's utile distinction memory and the other with a Baum-Welch-style algorithm. Here are links:
Natalia Hernandez and Sridhar Mahadevan, Hierarchical Memory-based Reinforcement Learning, Fifteenth International Conference on Neural Information Processing Systems, November 27 to December 2nd, Denver, 2000.
Georgios Theocharous, Khashayar Rohanimanesh, and Sridhar Mahadevan Learning Hierarchical Partially Observable Markov Decision Processes for Robot Navigation, IEEE Conference on Robotics and Automation (ICRA), 2001, Seoul, South Korea.
The ALLIANCE architecture was designed in part as a response to (and improved version of) the subsumption architecture, which was one of the first ways proposed of combining simple reactive behaviors to produce more complicated emergent behavior. A summary I found on the web of the reasoning behind the subsumption architecture is at http://ai.eecs.umich.edu/cogarch0/subsump/. A common criticism of pure subsumption architectures is that their emergent behaviors may not be sufficiently goal-directed since they lack an explicit model of the state of the world and what the robot's goals are. Behavior-based architectures try to address this criticism while still avoiding central planning.
Maja Mataric is a good multi-robot hacker, and maintains a page on multi-robot coordination at http://www-robotics.usc.edu/~maja/group.html. For example, see her paper on Designing and Understanding Adaptive Group Behavior.
There's been a bunch of work recently on using particle filters to track a robot's belief state. (Other approaches to tracking belief state include Kalman filters and evidence grids.)
Here's a paper on Monte Carlo localization, which is another name for using particle filters to track a robot's pose.
Dieter and friends' paper in the AR special issue mentioned above is about tracking the poses of multiple robots.
Mike Montemerlo's recent paper (need link) covers people-tracking, i.e., representing the positions of multiple moving objects within a known map. To handle dependencies between the uncertainty in a robot's position and the positions of the tracked people, this paper proposes a conditional version of particle filters: every particle in the sample for the robot's position corresponds to a whole separate filter for each person's location.
Kevin Murphy has done some work on learning whole maps rather than just a few object locations. (These maps are smaller and simpler than the ones we need to use, but they still have on the order of hundreds of unknown bits to estimate.) To accomplish this, he uses Rao-Blackwellized particle filters, which are a combination between particle filters and exact inference.
Sebastian has written a paper on how to plan in general POMDPs using a particle filter representation of the belief state.
Dirk Ormoneit et al have written a paper on using quasi-Monte-Carlo methods (placing particles not completely randomly but dependent on each other in a space-filling way) to reduce the variance of particle filters.
See also Doucet, de Freitas, and Gordon. Sequential Monte Carlo methods in practice. 2000.
More on Kalman filters: the basic Kalman filter assumes all distributions involved are Gaussian and all state updates are linear. The extended Kalman filter allows for nonlinearities by linearizing the state updates around the current mean; it can handle moderate departures from linearity. The unscented Kalman filter attacks the same problem as the EKF, but with a different (often better) update rule. (Rather than linearizing the state update, it applies the true update to a bunch of particles then uses the results to estimate the posterior mean and covariance.) The original paper on the UKF is by Juilier and Uhlmann (need link, but see next paragraph).
For handling highly nonlinear or non-Gaussian problems, we need a better state representation than a single Gaussian. An obvious candidate is a mixture of Gaussians. There have been several generalizations of Kalman filters to handle mixtures of Gaussians: the most obvious is to update each one separately using the EKF or UKF. Another is the mixture Kalman filter. Yet another is the unscented particle filter (this paper describes UKFs in some detail as well, and also has a list of common tricks to make particle filters work better). For UKFs, see also here.
Craig Boutilier and friends wrote a (very long) paper on the relationship between POMDP-based planning and classical planning. Here's the citation:
Craig Boutilier, Thomas Dean and Steve Hanks. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. Journal of AI Research (JAIR) 11:1--94 (1999).Here's the cite for Tony Stentz's D* search algorithm. Sven Koenig's recent seminar discussed (among other things) a relative of D* (need refs).
David MacKay's book. Chapter 31 is on variational methods in the draft I have, and chapter 45 applies variational techniques to the problem of model complexity control in neural network inference. Chapter 42 is on Boltzmann machines, which are a good example of variational methods. (All of the other chapters are also excellent.)
Tommi Jaakkola's paper on variational inference in the QMR-DT database. This work is also mentioned in the following tutorial by Jordan, Ghahramani, Jaakkola, and Saul (which Daniel Nikovski presented at a recent ML lunch). Jaakkola also has a tutorial of his own.
Tom Minka's tutorial papers on lower-bounding integrals, reversing EM, and the EM algorithm. He also has many other good papers at his tutorials page.
Zoubin Ghahramani's NIPS tutorial has some slides about EM, variational bounds, and variational Bayes inference (among plenty of other slides on general learning topics).
Tony Cassandra maintains software for solving POMDPs. You can get to it from his POMDP page.
Written by Geoff Gordon, with help from the other members of the robot learning lab. Last modified Nov 29, 2001.