NIPS 2003 Conference Review Session


Policy Search by Dynamic Programming
Drew Bagnell

I'll present work combining policy search with dynamic programming for POMDPs. The Principle of Optimality is generalized to allow techniques from supervised learning to be applied to the solution of control and decision problems. The algorithm comes with non-trivial performance guarantees as well as good computational and sample complexity. Finally, I'll show applications to interesting benchmark problems.
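
Below is a minimal Python sketch of the backward-in-time idea behind Policy Search by Dynamic Programming. It assumes a generative model (simulate_step), per-step baseline state distributions (sample_state), and a small finite policy class; these names and the Monte Carlo setup are illustrative, not from the talk.

    def psdp(sample_state, policy_class, simulate_step, reward, T, n_samples=100):
        """Sketch of Policy Search by Dynamic Programming (PSDP).

        sample_state(t)     -> state drawn from a baseline distribution mu_t
        policy_class        -> finite list of candidate policies (state -> action)
        simulate_step(s, a) -> successor state from a generative model
        reward(s, a)        -> immediate reward
        Returns a nonstationary policy as a list [pi_0, ..., pi_{T-1}].
        """
        pis = [None] * T

        def value_to_go(s, t):
            # Reward of following the already-fixed policies
            # pi_t, ..., pi_{T-1} starting from state s.
            total = 0.0
            for k in range(t, T):
                a = pis[k](s)
                total += reward(s, a)
                s = simulate_step(s, a)
            return total

        # Work backward in time: at each step t, greedily pick the policy
        # that does best against the future policies already chosen.
        for t in reversed(range(T)):
            best_pi, best_val = None, float("-inf")
            for pi in policy_class:
                total = 0.0
                for _ in range(n_samples):
                    s = sample_state(t)          # s ~ mu_t
                    a = pi(s)
                    total += reward(s, a)
                    if t + 1 < T:
                        total += value_to_go(simulate_step(s, a), t + 1)
                if total / n_samples > best_val:
                    best_pi, best_val = pi, total / n_samples
            pis[t] = best_pi
        return pis

The paper's performance guarantee is stated relative to how well the baseline distributions mu_t cover the states a good policy visits; the sketch above simply estimates each candidate's value-to-go by Monte Carlo rollouts.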


Planning in the Real World: The Promises and Challenges of Dealing with Uncertainty
Joelle Pineau, Drew Bagnell

As autonomous systems move towards increasingly complex real-world domains, new challenges arise for planning and control. Perhaps the most critical aspect is the problem of integrating uncertainty about state into the control loop. The overwhelming intractability of obtaining optimal solutions to these problems has been a major barrier to the deployment of planning systems that reason about uncertainty. In recent years, however, more scalable approaches have been proposed and some have been applied to practical problems. In light of these novel approaches, the goal of this workshop is to discuss important questions and open issues:

  1. What are currently the best techniques for planning under uncertainty?
  2. What representations of uncertainty are most appropriate for real-world problems?
  3. What are the remaining frontiers, in terms of algorithmic challenges, for planning under uncertainty?
  4. In which real-world problems can we gain traction by applying these techniques for managing uncertainty?
  5. What role does learning have to play with regard to both efficient solution and representation of uncertainty?


ARA*: Anytime A* with Provable Bounds on Sub-Optimality
Maxim Likhachev

In real world planning problems, time for deliberation is often limited. Anytime planners are well suited for these problems: they find a feasible solution quickly and then continually work on improving it until time runs out. In this paper we propose an anytime heuristic search, ARA*, which tunes its performance bound based on available search time. It starts by finding a suboptimal solution quickly using a loose bound, then tightens the bound progressively as time allows. Given enough time it finds a provably optimal solution. While improving its bound, ARA* reuses previous search efforts and, as a result, is significantly more efficient than other anytime search methods. In addition to our theoretical analysis, we demonstrate the practical utility of ARA* with experiments on a simulated robot kinematic arm and a dynamic path planning problem for an outdoor rover.
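
As a rough illustration of the mechanics, here is a Python sketch of the ARA* loop, assuming a consistent heuristic h with h(goal) == 0 and positive edge costs. It omits path reconstruction and the paper's tighter per-iteration bound (g(goal) divided by the minimum of g + h over OPEN and INCONS), simply reporting the inflation factor eps instead; the function names are illustrative.

    import heapq

    def ara_star(start, goal, neighbors, cost, h, eps0=3.0, eps_step=0.5):
        """Sketch of ARA* on a discrete graph.

        neighbors(s) -> iterable of successor states
        cost(s, s2)  -> positive edge cost
        h(s)         -> consistent heuristic, h(goal) == 0
        Yields (eps, g(goal)) after each search pass; the current solution
        cost is within a factor eps of optimal.
        """
        INF = float("inf")
        g = {start: 0.0}
        eps = eps0
        open_set = {start}
        incons = set()

        def fval(s):
            return g[s] + eps * h(s)

        def improve_path():
            closed = set()
            heap = [(fval(s), s) for s in open_set]
            heapq.heapify(heap)
            while heap:
                f, s = heapq.heappop(heap)
                if s not in open_set or f > fval(s):
                    continue                  # stale heap entry
                if g.get(goal, INF) <= f:
                    break                     # path is at worst eps-suboptimal
                open_set.discard(s)
                closed.add(s)
                for s2 in neighbors(s):
                    g2 = g[s] + cost(s, s2)
                    if g2 < g.get(s2, INF):
                        g[s2] = g2
                        if s2 in closed:
                            incons.add(s2)    # revisit when eps decreases
                        else:
                            open_set.add(s2)
                            heapq.heappush(heap, (fval(s2), s2))

        improve_path()
        yield eps, g.get(goal, INF)
        while eps > 1.0:
            eps = max(1.0, eps - eps_step)    # tighten the bound
            open_set |= incons                # reuse earlier search effort
            incons.clear()
            improve_path()
            yield eps, g.get(goal, INF)

Each yielded pair is an anytime result: the caller can stop whenever deliberation time runs out, with the current eps certifying the sub-optimality of the best solution found so far.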

