Game-Theoretic Control for Robot Teams

Rosemary Emery-Montemerlo

12th August, 2005

CMU-RI-TR-05-36

Thesis Committee:
Jeff Schneider (co-chair)
Sebastian Thrun (co-chair)
Geoff Gordon
Michael Littman, Rutger, The State University of New Jersey

Planning for a decentralized team of robots is a fundamentally different problem from that of centralized control. During decision making, robots must take into account not only their own observations of world state, but also the possible observations and actions of teammates. While the interconnectedness of such a reasoning process seems to require an infinite recursion of beliefs to be modelled by each member of the team, game theory provides an alternative approach. Partially observable stochastic games (POSGs) generalize notions of single-stage games and Markov decision processes to both multiple agents and partially observable worlds. Even if there is only limited communication between teammates, POSGs allow robots to come up with policies that still take into account possible teammate experiences without the need to explicitly model any recursive beliefs about those experiences.

While a powerful model of decentralized teams, POSGs are computationally intractable for all but the smallest problems. This dissertation proposes a Bayesian game approximation to POSGs in which game-theoretic reasoning about action selection is retained, but robots reason only a limited time ahead about uncertainty in world state and the experiences of their teammates. Planning and execution are interleaved to further reduce computational burdens: at each timestep, robots perform a step of full game-theoretic reasoning about their current action selection given any possible history of observations and a heuristic evaluation of the expected future value of those decisions.

The Bayesian game approximation algorithm (BaGA) is able to find solutions to much larger problems than previously solved. Further computational savings are gained by reasoning about groups of similar observation histories rather than single histories. Finally, efficiency and performance are also improved through the use of run-time communication policies that trade off expected gains in performance with the costs of using bandwidth. In this dissertation, the performance of BaGA is compared to policies generated for full POSGs as well as heuristics. BaGA is also used to develop real-time robot controllers for a series of simulated and physical robotic tag problems that gradually increase in realism.

full text