Approximate Solutions to Partially Observable Stochastic Games with Common Payoffs

Rosemary Emery-Montemerlo, Geoff Gordon, Jeff Schneider and Sebastian Thrun

Stochastic games are a framework for decision making in multi-agent systems. They generalize both general-sum repeated games and single-agent Markov decision processes. As with its single-agent counterpart, the stochastic game framework assumes full observability of state (i.e. the current stage game is known to all players) and is therefore limited in its applicability to real-world problems. Finding optimal solutions to partially observable stochastic games, however, is intractable, as such problems generally have state and action spaces that are exponential in the number of agents. We therefore propose a QMDP-like algorithm for finding approximate solutions to partially observable stochastic games with common payoffs. This subset of partially observable stochastic games covers problems that require a high degree of coordination among a team of agents receiving the same reward signal. We also assume that communication is unavailable or limited, so that agents cannot freely share information with the team. A key issue in constructing solutions to these games, either optimally or approximately, is to ensure that agents correctly reason not only about their own uncertainty but also about that of their teammates.

A Bayesian game, or game of incomplete information, is a game-theoretic model that allows players to reason about uncertainties in a single-stage game. These uncertainties can include the number of players in the game, the actions available to each player, or the game payoffs, and in general can be represented by each player being privately assigned a 'type' by nature. If agents hold a common prior over the distribution of these types, they can find policies that maximize each agent's expected payoff given its own type and a probability distribution over the types of all other players. Communication between agents can take place if the expected value of the game to an agent is much greater when its type is common knowledge than when it is not.

In our algorithm we use the notion of types to construct a Bayesian game at each time step, using the value function of a fully observable version of the problem as the payoff function. Each type represents a possible observation history for an agent, and each agent maintains a probability distribution over all possible joint observation histories given game play to date. Clearly, an issue with this approach is that the number of individual and joint observation histories grows exponentially. In many problems, however, the number of observation histories that are possible (or that have probability greater than some epsilon) is much smaller. When a communication act takes place, agents can further reduce the number of possible joint observations based on the new common knowledge. Our algorithm allows agents to perform a one-step lookahead in uncertainty while reasoning about the belief states of their teammates in a game-theoretically sound manner. The policies themselves are found by using alternating maximization in conjunction with the sequence form representation. We will present results applying this algorithm to a multi-agent version of the lady and the tiger problem as well as to a set of increasingly realistic herding problems with tens of thousands of joint states.
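As a rough sketch of the construction described above (the notation u, Q_MDP, and P(s | theta) is ours, introduced here for illustration; the full paper may state it differently), the payoff of the one-step Bayesian game for a joint type theta = (theta_1, ..., theta_n), i.e. a joint observation history, and a joint action a = (a_1, ..., a_n) can be taken as the fully observable value function averaged over the belief that theta induces:

\[
u(\theta, a) \;=\; \sum_{s \in S} P(s \mid \theta)\, Q_{\mathrm{MDP}}(s, a),
\]

where Q_MDP is the state-action value function of the underlying fully observable problem. Each agent i then chooses a policy pi_i mapping its own type theta_i to an action so as to maximize

\[
\sum_{\theta} P(\theta)\, u\bigl(\theta, (\pi_1(\theta_1), \ldots, \pi_n(\theta_n))\bigr),
\]

which alternating maximization approximates by holding all but one agent's policy fixed and improving the remaining one until no single agent can improve.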