Approximate Solutions to Partially Observable Stochastic Games with Common Payoffs

Rosemary Emery-Montemerlo, Geoff Gordon, Jeff Schneider and Sebastian Thrun

Stochastic games are a framework for decision making in multi-agent systems. They generalize both general-sum repeated games and single-agent Markov decision processes. As with its single-agent counterpart, the stochastic game framework assumes full observability of state (i.e. the current stage game is known to all players) and is therefore limited in its applicability to real-world problems. Finding optimal solutions to partially observable stochastic games, however, is intractable, as such problems generally have state and action spaces that are exponential in the number of agents. We therefore propose a QMDP-like algorithm for finding approximate solutions to partially observable stochastic games with common payoffs. This subset of partially observable stochastic games covers problems that require a high degree of coordination among a team of agents receiving the same reward signal. We also assume that communication is unavailable or limited, so that agents cannot freely share information with the team. A key issue in constructing solutions to these games, either optimally or approximately, is to ensure that agents correctly reason not only about their own uncertainty but also about that of their teammates.

A Bayesian game, or game of incomplete information, is a game-theoretic model that allows players to reason about uncertainties in a single-stage game. These uncertainties can include the number of players in the game, the actions available to each player, or the game payoffs, and in general can be represented by each player being privately assigned a 'type' by nature. If agents hold a common prior over the distribution of these types, they can find policies that maximize each agent's expected payoff given its own type and a probability distribution over the types of all other players. Communication between agents can take place if the expected value of the game to an agent is much greater when its type is common knowledge than when it is not.

In our algorithm we use the notion of types to construct a Bayesian game at each time step, using the value function of a fully observable version of the problem as the payoff function. Each type represents a possible observation history for an agent, and each agent maintains a probability distribution over all possible joint observation histories given game play to date. Clearly, an issue with this approach is that the number of individual and joint observation histories grows exponentially. In many problems, however, the number of observation histories that are possible (or that have probability greater than some epsilon) is much smaller. When a communication act takes place, agents can further reduce the number of possible joint observations based on the new common knowledge. Our algorithm allows agents to perform a one-step lookahead in uncertainty while reasoning about the belief states of their teammates in a game-theoretically sound manner. The policies themselves are found by using alternating maximization in conjunction with the sequence form representation. We will present results applying this algorithm to a multi-agent version of the lady and the tiger problem as well as to a set of increasingly realistic herding problems with tens of thousands of joint states.
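As a rough sketch of the construction described above (the notation u, Q_MDP, and P(s | theta) is ours, introduced here for illustration; the full paper may state it differently), the payoff of the one-step Bayesian game for a joint type theta = (theta_1, ..., theta_n), i.e. a joint observation history, and a joint action a = (a_1, ..., a_n) can be taken as the fully observable value function averaged over the belief that theta induces:

\[
u(\theta, a) \;=\; \sum_{s \in S} P(s \mid \theta)\, Q_{\mathrm{MDP}}(s, a),
\]

where Q_MDP is the state-action value function of the underlying fully observable problem. Each agent i then chooses a policy pi_i mapping its own type theta_i to an action so as to maximize

\[
\sum_{\theta} P(\theta)\, u\bigl(\theta, (\pi_1(\theta_1), \ldots, \pi_n(\theta_n))\bigr),
\]

which alternating maximization approximates by holding all but one agent's policy fixed and improving the remaining one until no single agent can improve.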