Decision and Control

Baird / Boyan / Davies / Moore / Munos / Schneider

How does a system exploit what it has learned in order to make good decisions? And how can it perform well when long-term effects (delayed rewards) are important?

Our research in this area covers several optimization and optimal control approaches, but is mainly centered around reinforcement learning.
A survey of Reinforcement Learning
Reinforcement learning has, on paper, amazingly attractive properties for learning to make decisions in a highly stochastic world. But the computational challenges are devastating because of the "curse of dimensionality", in which costs increase exponentially with the number of state variables. We are one of many research groups battling to beat, delay, or dodge this curse. We are applying reinforcement learning to problems such as packaging ($$$ savings), production scheduling ($$$ savings), autonomous aircraft path planning, radio-therapy treatment planning, and active target detection.
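As a rough back-of-the-envelope illustration (the resolution of 20 bins per state variable is just an assumed figure, not one taken from our applications), the cost of a naive tabular approach explodes as follows:

    # Hypothetical illustration of the curse of dimensionality: at a fixed
    # resolution per state variable, the number of grid cells (and hence the
    # cost of tabular dynamic programming) grows exponentially with dimension.
    bins_per_dim = 20                      # assumed discretization resolution
    for num_state_vars in (2, 4, 6, 10):
        cells = bins_per_dim ** num_state_vars
        print(f"{num_state_vars} state variables -> {cells:,} grid cells")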
Basic MDP and Reinforcement Learning theory
There are still several fascinating questions lurking in the apparently simple problem of learning optimal control of large discrete Markov Decision Processes. We are enthusiastic about approaches that learn models of the state transition probabilities on-line, and are especially interested in ways to learn the models and simultaneously solve them. Hunter Payne is working on new related strategies.
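As a minimal sketch of the certainty-equivalence flavour of this idea (build an empirical model from experience, then solve that model; the function and variable names below are ours, not those of any particular project):

    import numpy as np

    # Build empirical transition probabilities and mean rewards from observed
    # (s, a, r, s') tuples, then solve the learned model by value iteration.
    def estimate_model(transitions, n_states, n_actions):
        counts = np.zeros((n_states, n_actions, n_states))
        reward_sum = np.zeros((n_states, n_actions))
        for s, a, r, s_next in transitions:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r
        visits = counts.sum(axis=2, keepdims=True)
        P = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
        R = reward_sum / np.maximum(visits[:, :, 0], 1)
        return P, R

    def value_iteration(P, R, gamma=0.95, iters=200):
        n_states, n_actions, _ = P.shape
        V = np.zeros(n_states)
        for _ in range(iters):
            Q = R + gamma * (P @ V)        # Q[s,a] = R[s,a] + gamma * sum_s' P[s,a,s'] V[s']
            V = Q.max(axis=1)
        return V, Q.argmax(axis=1)         # value function and greedy policy

In an on-line setting the model estimate and the value-iteration sweeps would be interleaved with acting, rather than run once at the end.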
Geometric Reinforcement Learning
In one project we are combining A-star search, reinforcement learning, and robot motion planning algorithms to discover trajectories for dynamic systems. The method finds trajectories on a 20^4 discretization for problems that standard finite-element dynamic programming could not solve even with a 30^4 discretization. This research is currently being extended by a team of four CMU undergraduates (Arnt, Halstead, Hsiung, Lanza) who are working on learning versions, with possible application to autonomous aircraft control. A web page overflowing with results is forthcoming.
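The following is an illustrative sketch, not the project's actual code, of the search component: A-star over a discretized state space, where successors come from simulating the dynamics under each discretized action and a heuristic (for example, distance to the goal) guides the search.

    import heapq, itertools

    def a_star(start, goal, successors, heuristic):
        """successors(state) -> iterable of (next_state, action, step_cost)."""
        tie = itertools.count()            # tie-breaker so the heap never compares states
        frontier = [(heuristic(start, goal), next(tie), 0.0, start, [])]
        best_cost = {start: 0.0}
        while frontier:
            _, _, cost, state, plan = heapq.heappop(frontier)
            if state == goal:
                return plan                # sequence of actions reaching the goal
            if cost > best_cost.get(state, float("inf")):
                continue                   # stale queue entry
            for next_state, action, step_cost in successors(state):
                new_cost = cost + step_cost
                if new_cost < best_cost.get(next_state, float("inf")):
                    best_cost[next_state] = new_cost
                    heapq.heappush(frontier,
                                   (new_cost + heuristic(next_state, goal),
                                    next(tie), new_cost, next_state, plan + [action]))
        return None                        # no trajectory found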
A paper on Geometric Approaches to finding trajectories
Variable Resolution Reinforcement Learning
We are investigating how variable resolution representations, including triangulations and multilinear interpolation, can combat the curse of dimensionality. The Partigame algorithm and its extensions look promising. For example, it can solve a nine-dimensional robot motion learning problem within two minutes, for which regular quantization approaches would have been essentially intractable. The Variable Resolution Discretizations page uses adaptive resolution techniques to solve stochastic optimal control problems such as the Acrobot, the Inverted Pendulum, and Airplane and Space-Shuttle controllers. We developed adaptive refinement methods that take into account the global effect of splitting, refining only the areas of the state space that have an "influence" on the switching boundaries of the optimal control (see the IJCAI99 and CDC99 papers).
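A minimal sketch of the multilinear interpolation step (a value function stored at the vertices of a regular grid, queried at a continuous point by blending the 2^d surrounding corners; the adaptive splitting machinery is not shown):

    import itertools
    import numpy as np

    def interpolate(V, point):
        # V: d-dimensional array of values at integer grid vertices.
        # point: continuous coordinates inside the grid, in index units.
        point = np.asarray(point, dtype=float)
        lower = np.minimum(np.floor(point).astype(int), np.array(V.shape) - 2)
        frac = point - lower
        value = 0.0
        for corner in itertools.product((0, 1), repeat=len(point)):
            weight = np.prod([f if c else 1.0 - f for c, f in zip(corner, frac)])
            value += weight * V[tuple(lower + np.array(corner))]
        return value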
Variable Resolution Discretizations

The Partigame algorithm

An earlier, more memory-intensive but more general approach

Value Function Approximation
We are investigating ways for function approximation to represent value functions. Leemon Baird is leading an investigation into new, robust ways for neural networks to learn value functions. For practical problem solving on relatively low-dimensional systems we like to use multilinear interpolation and are keen on triangulations. We have extended this kind of value function approximation to cope with the problem of safely learning optimal control: how should data be gathered in a risk-minimizing way?
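As a simplified sketch of the residual-gradient idea behind such robust value-function learning (gradient descent on the squared Bellman residual, differentiating through both V(s) and V(s'); the full residual algorithms blend this with the direct TD update, and only a linear approximator is shown here):

    import numpy as np

    def residual_gradient_step(w, phi_s, phi_s_next, r, gamma=0.95, lr=0.01):
        # Linear value function V(s) = w . phi(s); one gradient step on
        # 0.5 * (r + gamma*V(s') - V(s))^2 with respect to the weights w.
        delta = r + gamma * np.dot(w, phi_s_next) - np.dot(w, phi_s)
        grad = delta * (gamma * phi_s_next - phi_s)
        return w - lr * grad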

We have also examined some of the dangers of value function approximation, and proposed two new algorithms, ROUT and Grow Support, for dealing with these problems.

Remi Munos is investigating powerful numerical methods for solving stochastic optimal control in continuous time.
The approach is to solve the Hamilton-Jacobi-Bellman equations directly. This is a difficult problem because these equations are non-linear partial differential equations. The method is nonetheless very general: it handles highly non-linear dynamics and arbitrary cost functions. Efficient numerical methods need to combine recent techniques developed in several different fields.
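In one standard formulation (notation ours, for a controlled diffusion dx = f(x,u) dt + sigma(x,u) dW with instantaneous cost l(x,u) and discount rate gamma; the papers' exact setting may differ), the equation being solved is:

    \[
      \gamma V(x) \;=\; \min_{u}\Big[\, \ell(x,u)
        \;+\; \nabla V(x)^{\top} f(x,u)
        \;+\; \tfrac{1}{2}\,\operatorname{tr}\!\big(\sigma(x,u)\,\sigma(x,u)^{\top}\,\nabla^{2} V(x)\big) \Big]
    \]

The non-linearity comes from the minimization over controls inside the equation, which is what makes standard linear PDE solvers inapplicable.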

Reinforcement Learning for optimization
Justin Boyan is completing his thesis (check out his thesisometer) on STAGE: a practical theory of how reinforcement learning can increase the effectiveness of combinatorial optimizers.
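A rough sketch of the STAGE idea (the interfaces and names below are ours, not the thesis'): learn a regressor that predicts, from features of a state, the objective value the base local search will eventually reach from that state, then hillclimb on that prediction to choose a promising restart point.

    import numpy as np

    def stage(start, objective, features, local_search, neighbors, rounds=10):
        X, y = [], []
        state = start
        for _ in range(rounds):
            # 1. Run the base optimizer and record where its trajectory ends up.
            trajectory, final_value = local_search(state, objective)
            X.extend(features(s) for s in trajectory)
            y.extend(final_value for _ in trajectory)

            # 2. Fit a predictor of the eventual outcome (plain linear regression).
            A = np.c_[np.array(X), np.ones(len(X))]
            w, *_ = np.linalg.lstsq(A, np.array(y), rcond=None)
            v_hat = lambda s: float(np.dot(w[:-1], features(s)) + w[-1])

            # 3. Hillclimb on the learned predictor to pick the next start state.
            for _ in range(50):
                best = min(neighbors(state), key=v_hat, default=state)
                if v_hat(best) >= v_hat(state):
                    break
                state = best
        return state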
Logistics and Production Scheduling
In recent research, in collaboration with a major U.S. food manufacturer, we have produced a new suite of algorithms for multi-product production scheduling problems that are too difficult to represent as constraint graphs but must be optimized to minimize costs predicted by non-linear simulations. This includes a novel way of applying reinforcement learning to produce a policy that correctly takes into account uncertainty in the production capabilities of the plant. Previous researchers had dealt with the uncertainties by frequent replanning, supplemented by techniques to make replanning computationally cheap; we showed that, independently of computational issues, this can lead to grossly suboptimal operation. By computing an approximate value function and policy, we produce an approximate solution to the problem of optimizing expected operation, and empirically it significantly beats industrial-strength replanning-based methods.
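A tiny illustrative sketch (not the deployed system) of the difference in spirit: rather than replanning against a single forecast of the plant, the policy acts greedily with respect to an approximate value function, averaging over samples from a stochastic plant simulator so that uncertainty is priced into each decision. All function names here are hypothetical.

    def greedy_action(state, actions, simulate, value_fn, n_samples=30, gamma=0.99):
        # simulate(state, action) -> (cost, next_state), drawn stochastically.
        def expected_cost(action):
            total = 0.0
            for _ in range(n_samples):
                cost, next_state = simulate(state, action)
                total += cost + gamma * value_fn(next_state)
            return total / n_samples
        return min(actions, key=expected_cost)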
Paper on a reinforcement-learning aspect of the production scheduling work.