Regret minimization in bounded memory games

Oct 20, 2010

We consider the problem of performing regret minimization in 'bounded memory' games. Bounded memory games are a subclass of stochastic games and a superclass of repeated games. Each round of a two-player bounded memory-m game is played as follows:

1. Player A and Player B each play an action (denoted a,b respectively)

2. Player A (resp. B) receives a reward. This reward may depend on the actions taken by both players in the last m rounds as well as the current actions (a,b).
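The round structure above can be sketched in a few lines of Python. This is an illustrative toy, not code from the talk: the reward function here is a hypothetical memory-1 'matching' game invented for the example, in which A is rewarded for repeating B's previous action.

```python
from collections import deque

def play_round(history, reward_fn, action_a, action_b):
    """One round of a bounded memory-m game: rewards may depend on the
    last m joint actions (history) as well as the current pair (a, b)."""
    r_a, r_b = reward_fn(tuple(history), action_a, action_b)
    history.append((action_a, action_b))  # deque(maxlen=m) keeps only last m
    return r_a, r_b

# Hypothetical memory-1 reward function (an assumption for illustration):
# A earns 1 for matching B's action from the previous round; B earns 1
# for mismatching A's current action. With m = 0 the history is always
# empty and the game reduces to a repeated game.
def reward_fn(history, a, b):
    prev_b = history[-1][1] if history else None
    return (1 if a == prev_b else 0, 1 if b != a else 0)

m = 1
history = deque(maxlen=m)  # bounded memory: only the last m rounds persist
play_round(history, reward_fn, "H", "T")           # history is now [("H","T")]
r_a, r_b = play_round(history, reward_fn, "T", "H")  # A matched B's previous "T"
```

Note that because `history` has `maxlen=m`, older rounds are discarded automatically, which is exactly the bounded-memory restriction that distinguishes these games from general stochastic games.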

For example, a repeated game is a bounded memory game with memory m = 0, because the rewards in each round of a repeated game depend only on the actions taken during that round. The traditional notion of regret is very natural in repeated games; in a bounded memory-m game, defining an appropriate notion of regret is more complicated. We explore several alternative notions of regret in a memory-m game, focusing on a natural notion we call 'k-adaptive regret'. On the positive side, we demonstrate that k-adaptive regret minimizing algorithms do exist. On the negative side, we demonstrate that any k-adaptive regret minimizing algorithm must be inefficient unless NP=RP.

Joint work with Nicolas Christin, Anupam Datta, and Arunesh Sinha.
