# Approachability in Two-Player Games


## 1. Introduction

*Blackwell’s Approachability Game*. The game is formally represented by a 4-tuple $(\mathcal{X}, \mathcal{Y}, f, S)$, where:

- $\mathcal{X}$ denotes the convex and compact action space of the first player, who is identified with $\mathcal{X}$ itself with an abuse of notation;
- $\mathcal{Y}$ similarly denotes the convex and compact action space of the second player, identified with $\mathcal{Y}$;
- $f : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}^d$ is the biaffine payoff function of the game. We assume $f$ is bounded in norm by $\gamma > 0$, that is,
  \[\left\|f(x,y)\right\| < \gamma\qquad\forall (x, y) \in \mathcal{X}\times\mathcal{Y}.\]
  Notice that we can always rescale $f$, so we can safely assume $\gamma = 1$, meaning that $f(x,y)\in B(0,1)$, the unit ball centered at the origin;
- $S \subseteq B(0, 1)$ is a closed convex *target set*.

A *strategy* for player $\mathcal{X}$ is a sequence of functions mapping each history of play to the next action,
\[\sigma_{\mathcal{X}} \doteq \{\sigma_{\mathcal{X}}^t : (\mathcal{X}\times\mathcal{Y})^{t-1} \to \mathcal{X}\}_{t = 1}^{\infty},\]
and symmetrically for player $\mathcal{Y}$.

After $t$ rounds of play $(x_1, y_1), \ldots, (x_t, y_t)$, the *average payoff* is
\[\phi_t \doteq \frac{1}{t} \sum_{i = 1}^t f(x_i, y_i).\]
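To make the protocol concrete, here is a minimal sketch of the repeated interaction and the running average $\phi_t$, assuming a toy biaffine payoff $f(x,y) = (x - y,\ x + y - 1)$ on $\mathcal{X} = \mathcal{Y} = [0,1]$ (an illustrative choice, not a payoff taken from the text):

```python
import random

# Toy biaffine vector payoff f : [0,1] x [0,1] -> R^2 (an assumption
# for illustration only).
def f(x, y):
    return (x - y, x + y - 1.0)

history = []          # the history ((x_i, y_i))_{i=1}^t
phi = (0.0, 0.0)      # running average payoff phi_t

random.seed(0)
for t in range(1, 101):
    x_t = random.random()           # stand-ins for the players' strategies
    y_t = random.random()
    history.append((x_t, y_t))
    p = f(x_t, y_t)
    # incremental update: phi_t = ((t-1)/t) phi_{t-1} + (1/t) f(x_t, y_t)
    phi = tuple((t - 1) / t * a + b / t for a, b in zip(phi, p))

# phi now equals the average of f over the whole history
avg = tuple(sum(f(x, y)[i] for x, y in history) / len(history) for i in range(2))
assert all(abs(a - b) < 1e-9 for a, b in zip(phi, avg))
```

The incremental update is exactly the recursion used later in the proof of Blackwell's theorem.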

## 2. Approachable and forceable sets

A halfspace $H \subseteq \mathbb{R}^d$ is *forceable* if there exists an action $\bar{x}$ for player $\mathcal{X}$ such that, whatever the action $y$ used by the opponent, $f(\bar{x}, y) \in H$.

The target set $S$ is *approachable* if there exists a strategy $\sigma_\mathcal{X}$ for player $\mathcal{X}$ such that, whatever the strategy $\sigma_\mathcal{Y}$ used by the opponent, the distance between $\phi_t$ and $S$ converges uniformly to 0.

### 2.1. The scalar case

**Theorem 1** (Sion's theorem).

Given compact convex sets $\mathcal{X}, \mathcal{Y}$, and a biaffine function
$f: \mathcal{X}\times\mathcal{Y}\to\mathbb{R}$, we have
\[
\nu = \max_{x\in\mathcal{X}}\min_{y\in\mathcal{Y}} f(x,y) =
\min_{y\in\mathcal{Y}}\max_{x\in\mathcal{X}} f(x,y).
\]

In particular, there exists a maximin action $\bar{x}\in\mathcal{X}$ guaranteeing
\[f(\bar{x}, y) \ge \nu\qquad \forall y\in\mathcal{Y},\]
and a minimax action $\bar{y}\in\mathcal{Y}$ guaranteeing
\[f(x, \bar{y}) \le \nu\qquad \forall x\in\mathcal{X}.\]

On the other hand, the set $[\lambda, +\infty)$ is *not* approachable nor forceable for any value of $\lambda > \nu$: by Theorem 1, the opponent can repeatedly play $\bar{y}$ and keep the average payoff at most $\nu < \lambda$. We can then state the following lemma.

**Lemma 1** (Scalar approachability condition).

In a scalar game, the following conditions are equivalent:

- $[\lambda, +\infty)$ is approachable;
- $[\lambda, +\infty)$ is forceable;
- $\lambda \le \nu$.
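Lemma 1 can be checked numerically. The sketch below, assuming a hypothetical $2\times 2$ matrix game over mixed strategies whose exact value is $\nu = 1/7$, approximates $\nu$ on a grid and verifies that the maximin action forces $[\lambda, +\infty)$ for $\lambda \le \nu$, while no action forces it for $\lambda > \nu$:

```python
# Scalar game on mixed strategies over two actions each: a hypothetical
# 2x2 payoff matrix A, with f(x, y) = x^T A y for probability vectors x, y.
A = [[1.0, -1.0],
     [-2.0, 3.0]]

def f(p, q):  # p, q = probability of playing the first action
    x, y = (p, 1 - p), (q, 1 - q)
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

grid = [i / 200 for i in range(201)]
# nu = max_x min_y f(x, y), approximated on the grid
nu = max(min(f(p, q) for q in grid) for p in grid)
p_star = max(grid, key=lambda p: min(f(p, q) for q in grid))

# The maximin action p_star forces [lam, +inf) for lam <= nu ...
lam = nu - 0.01
assert all(f(p_star, q) >= lam for q in grid)
# ... while for lam > nu no action forces it: every p admits a bad q.
lam = nu + 0.05
assert all(any(f(p, q) < lam for q in grid) for p in grid)
```

The grid search is only an approximation of the exact value $\nu = 1/7 \approx 0.143$, which is why the margins $0.01$ and $0.05$ are used.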

### 2.2. Approachable halfspaces

Before turning our attention to general closed convex sets (Section 3), we focus on an important class of target sets: halfspaces. The following lemma is central in our discussion.

**Lemma 2**.

A halfspace $H(\lambda, q) = \{h \in \mathbb{R}^d : \lambda^\top h \ge q\}$
with $\left\|\lambda\right\| = 1$ is approachable if and only if it is forceable.

**Proof.** Consider a new two-player game $\hat{\Gamma}$ whose payoff function $\hat{f}: \mathcal{X}\times\mathcal{Y} \to \mathbb{R}$ is defined as \[ \hat{f}(x, y) \doteq \lambda^\top f(x,y). \] This is a scalar game ($\hat{f}$ maps to $\mathbb{R}$) with a biaffine payoff function. The key observation is that $H(\lambda, q)$ is approachable if and only if $\hat{H}(q) = [q, +\infty)$ is approachable. Indeed, notice that for any $u\in\mathbb{R}^d$:

\[\rho(u, H(\lambda, q)) = \rho(\lambda^\top u, \hat{H}(q)). \tag{1}\]

In particular, applying Equation 1 to the average payoff,

\[\begin{array}{rl} \rho\left(\phi_t, H(\lambda, q)\right) &= \rho\left(\lambda^\top\left(\frac{1}{t}\sum_{i=1}^t f(x_i, y_i)\right), \hat{H}(q)\right)\\\\ &= \rho\left(\frac{1}{t}\sum_{i=1}^t \hat{f}(x_i, y_i), \hat{H}(q)\right)\\\\ &= \rho(\hat{\phi}_t, \hat{H}(q)),\end{array}\]

so the two approachability conditions coincide. Moreover, forceability also transfers between the two games:

\[\exists \bar{x}\in\mathcal{X} : \forall y\ f(\bar{x}, y) \in H(\lambda, q) \Longleftrightarrow \exists \bar{x}\in\mathcal{X} : \forall y\ \hat{f}(\bar{x}, y) \ge q,\]

that is, $H(\lambda, q)$ is forceable in the original game if and only if $\hat{H}(q)$ is forceable in $\hat{\Gamma}$. The claim then follows from Lemma 1, which states that approachability and forceability of $\hat{H}(q)$ are equivalent in the scalar game.
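Equation 1 can be verified numerically: the projection of $u$ onto the halfspace $H(\lambda, q)$ has the closed form $u + \max\{q - \lambda^\top u,\, 0\}\,\lambda$ when $\|\lambda\| = 1$, so both distances equal $\max\{q - \lambda^\top u,\, 0\}$. A quick randomized check (illustrative, not from the text):

```python
import math
import random

# Check of Equation 1: for a unit normal lam, the Euclidean distance
# from u to H(lam, q) = {h : lam^T h >= q} equals the distance from
# the scalar lam^T u to the interval [q, +inf).
random.seed(1)
d = 3
for _ in range(1000):
    lam = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in lam))
    lam = [v / norm for v in lam]              # ||lam|| = 1
    u = [random.uniform(-2, 2) for _ in range(d)]
    q = random.uniform(-1, 1)
    s = sum(l * ui for l, ui in zip(lam, u))   # lam^T u
    gap = max(q - s, 0.0)
    # closed-form projection of u onto the halfspace
    proj = [ui + gap * l for ui, l in zip(u, lam)]
    dist_halfspace = math.sqrt(sum((a - b) ** 2 for a, b in zip(proj, u)))
    dist_interval = max(q - s, 0.0)            # rho(lam^T u, [q, +inf))
    assert abs(dist_halfspace - dist_interval) < 1e-9
    # the projection indeed lands inside H(lam, q)
    assert sum(l * p for l, p in zip(lam, proj)) >= q - 1e-9
```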

## 3. Blackwell’s Theorem

**Theorem 2**.

A closed convex set $S$ is approachable if and only if every halfspace
$H \supseteq S$ is forceable.

**Proof.** ($\Longrightarrow$) This direction is trivial: if $S$ is approachable then any set $Q \supseteq S$ is approachable, including any halfspace $H \supseteq S$. On the other hand, we know from Lemma 2 that approachability and forceability are equivalent when dealing with halfspaces.

($\Longleftarrow$) Suppose now that every halfspace containing $S$ is forceable. Fix a time $t$ with $\phi_t \notin S$ (when $\phi_t \in S$, any action works), let $\psi_t$ be the Euclidean projection of $\phi_t$ onto $S$, and consider the halfspace $H(\hat{\lambda}, \hat{q})$ defined by
\[\hat{\lambda} = \frac{\psi_t - \phi_t}{\left\|\psi_t - \phi_t\right\|},\qquad \hat{q} = \hat{\lambda}^\top \psi_t.\]

This halfspace contains $S$ and is therefore forceable by assumption: there exists $\bar{x}\in\mathcal{X}$ such that, for every $y\in\mathcal{Y}$,
\[\hat{\lambda}^\top f(\bar{x}, y) \ge \hat{q}.\]

Playing $\bar{x}$ at time $t + 1$, the average payoff updates as
\[\phi_{t + 1} = \frac{1}{t + 1}\sum_{i = 1}^{t + 1} f(x_i, y_i) = \frac{t}{t + 1} \phi_t + \frac{1}{t + 1} f(\bar{x}, y_{t + 1}).\]

We can then bound the squared distance of $\phi_{t+1}$ from $S$:
\[\begin{array}{rl} \rho(\phi_{t + 1}, S)^2 &= \min_{s\in S} \left\|\phi_{t + 1} - s\right\|^2 \\ \\ &\le \left\|\phi_{t + 1} - \psi_t\right\|^2 \\ \\ &= \left\|\frac{t}{t + 1} \phi_t + \frac{1}{t + 1} f(\bar{x}, y_{t + 1}) - \psi_t\right\|^2 \\ \\ &= \frac{1}{(t + 1)^2}\left\|t \left(\phi_t - \psi_t\right) + \left(f(\bar{x}, y_{t + 1}) - \psi_t\right)\right\|^2. \end{array}\]

Expanding the square, and writing $f_{t+1} \doteq f(\bar{x}, y_{t+1})$ for brevity,
\[\rho(\phi_{t + 1}, S)^2 \le \frac{1}{(t + 1)^2} \left( t^2 \left\| \phi_t - \psi_t\right\|^2 + \left\|f_{t + 1} - \psi_t\right\|^2 + 2t (\phi_t - \psi_t)^\top (f_{t+1} - \psi_t ) \right). \tag{2}\]

The cross term is non-positive: indeed, $\phi_t - \psi_t = -\left\|\psi_t - \phi_t\right\|\hat{\lambda}$, while $\hat{\lambda}^\top (f_{t+1} - \psi_t) = \hat{\lambda}^\top f_{t+1} - \hat{q} \ge 0$ by forceability, so
\[(\phi_t - \psi_t)^\top (f(\bar{x}, y_{t + 1}) - \psi_t) \le 0,\]

and Equation 2 simplifies to
\[\rho(\phi_{t + 1}, S)^2 \le \frac{1}{(t+1)^2} \left(t^2 \|\phi_t - \psi_t\|^2 + \|f_{t+1}-\psi_t\|^2\right).\]

Since $\|\phi_t - \psi_t\| = \rho(\phi_t, S)$, multiplying both sides by $(t+1)^2$ and unrolling the resulting recursion down to $t = 1$ yields
\[\rho(\phi_{t + 1}, S)^2 \le \frac{1}{(t+1)^2}\sum_{i=1}^t \|f_{i+1}-\psi_i\|^2. \tag{3}\]

Finally, since $f_{i+1}$ and $\psi_i$ both lie in $B(0,1)$, each summand is at most $4$, so that
\[\rho(\phi_{t + 1}, S)^2 \le \frac{4t}{(t + 1)^2} \le \frac{4}{t + 1}, \tag{4}\]
which shows that $\rho(\phi_t, S) \le 2/\sqrt{t} \to 0$ uniformly in the opponent's strategy.
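The constructive strategy in the proof can be exercised on a toy instance. Below we assume the illustrative game $\mathcal{X} = \mathcal{Y} = [0,1]$, $f(x,y) = (x - y,\ y - x)$, target $S = \{(0,0)\}$ (every halfspace containing $S$ is forceable here), and check that the average payoff approaches $S$ within the bound of Equation 4:

```python
import math
import random

# Toy approachability instance (an assumption for illustration):
# f(x, y) = (x - y, y - x), target S = {(0, 0)}.
def f(x, y):
    return (x - y, y - x)

random.seed(2)
phi = (0.0, 0.0)
T = 2000
for t in range(1, T + 1):
    if phi == (0.0, 0.0):
        x_t = random.random()             # phi_t in S: play anything
    else:
        # direction psi_t - phi_t with psi_t = 0 (unnormalized; only
        # the direction matters when picking the forcing action)
        lam = (-phi[0], -phi[1])
        # forcing action for the tangent halfspace H(lam, 0):
        # lam^T f(x, y) = (lam_1 - lam_2)(x - y), maximized at x = 1 or 0
        x_t = 1.0 if lam[0] >= lam[1] else 0.0
    y_t = random.random()                 # arbitrary opponent
    p = f(x_t, y_t)
    phi = tuple((t - 1) / t * a + b / t for a, b in zip(phi, p))

dist = math.hypot(phi[0], phi[1])         # rho(phi_T, S)
assert dist <= 2 / math.sqrt(T)           # the bound of Equation 4
```

The forcing action makes the cross term of Equation 2 non-positive at every step, so the final assertion holds deterministically, whatever the opponent plays.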

## 4. Approximate Nash equilibria

Here we make the further assumption that $\mathcal{X}$ and $\mathcal{Y}$ are (finite-dimensional) polytopes, and that the game is zero-sum. We let $u : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}$ represent the bilinear payoff function of the game. We start by recalling two classical definitions in game theory.

**Definition 1** (Regret).

The (external) *regret* of player $\mathcal{X}$ against action $\hat{x}\in\mathcal{X}$ after history $\mathcal{H}_{t} = ((x_i, y_i))_{i=1}^{t}$ is defined as
\[R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \doteq \sum_{i = 1}^{t} \left(u(\hat{x}, y_i) - u(x_i, y_i)\right),\]
where $u$ is the (bilinear) utility function of the game.

**Definition 2**(Regret-minimizing strategy).

A strategy $\sigma_\mathcal{X}$ for player $\mathcal{X}$ is *regret-minimizing*, or *Hannan consistent*, if
\[\limsup_{t\to+\infty} \frac{1}{t}R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \le 0 \tag{5}\]
for all histories $\mathcal{H}_{t}$ and all $\hat{x}\in\mathcal{X}$.

**Definition 3**(Approximate Nash equilibrium).

A strategy pair $(x, y) \in \mathcal{X}\times\mathcal{Y}$ is an $\epsilon$-Nash equilibrium if
\[u(x,y) + \epsilon \ge u(\hat{x}, y)\]
for all $\hat{x}\in\mathcal{X}$, and
\[-u(x, y) + \epsilon \ge -u(x, \hat{y})\]
for all $\hat{y}\in\mathcal{Y}$.

**Theorem 3**.

In a zero-sum game, if at time $t$ the players' average regrets are such that
\[\frac{1}{t}R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \le \epsilon_\mathcal{X}, \qquad \frac{1}{t}R_\mathcal{Y}(\mathcal{H}_{t}, \hat{y}) \le \epsilon_\mathcal{Y}\]
for all histories $\mathcal{H}_{t} = ((x_i, y_i))_{i=1}^t$ and actions $\hat{x}\in\mathcal{X}, \hat{y}\in\mathcal{Y}$, then the strategy pair of average actions
\[(\bar{x}_t, \bar{y}_t) \doteq \left(\frac{1}{t}\sum_{i=1}^t x_i, \frac{1}{t}\sum_{i=1}^t y_i\right)\in\mathcal{X}\times\mathcal{Y}\]
is a $(\epsilon_\mathcal{X} + \epsilon_\mathcal{Y})$-Nash equilibrium.

**Proof**. We have by hypothesis that for all $\hat{x},\hat{y}$,

\[\begin{array}{c} \frac{1}{t} \sum_{i=1}^t u(\hat{x}, y_i) - u(x_i, y_i) \le \epsilon_\mathcal{X}, \\ \\ \frac{1}{t} \sum_{i=1}^t -u(x_i, \hat{y}) + u(x_i, y_i) \le \epsilon_\mathcal{Y}. \end{array}\]

Summing the two inequalities, the terms $u(x_i, y_i)$ cancel, and by bilinearity of $u$,
\[\frac{1}{t} \sum_{i=1}^t \left(u(\hat{x}, y_i) - u(x_i, \hat{y})\right) = u(\hat{x},\bar{y}_t) - u(\bar{x}_t, \hat{y}) \le \epsilon_\mathcal{X} + \epsilon_\mathcal{Y}.\]

Choosing $\hat{y} = \bar{y}_t$ gives
\[u(\bar{x}_t, \bar{y}_t) + (\epsilon_\mathcal{X} + \epsilon_\mathcal{Y}) \ge u(\hat{x}, \bar{y}_t),\quad \forall\ \hat{x}\in\mathcal{X},\]

while choosing $\hat{x} = \bar{x}_t$ gives
\[-u(\bar{x}_t,\bar{y}_t) + (\epsilon_\mathcal{X} + \epsilon_\mathcal{Y}) \ge -u(\bar{x}_t,\hat{y}),\quad \forall\ \hat{y}\in\mathcal{Y},\]
which are exactly the two conditions of Definition 3.
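Theorem 3 can be checked empirically: for any history whatsoever, the average strategies form an $(\epsilon_\mathcal{X}+\epsilon_\mathcal{Y})$-Nash equilibrium for the realized average regrets. A sketch assuming a hypothetical random $3\times 3$ zero-sum matrix game:

```python
import random

# Hypothetical zero-sum matrix game u(x, y) = x^T A y on the simplex.
random.seed(3)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def u(x, y):
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def rand_simplex():
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # vertices
hist = [(rand_simplex(), rand_simplex()) for _ in range(50)]        # any history
t = len(hist)
avg_payoff = sum(u(x, y) for x, y in hist) / t

# realized average regrets; u is bilinear, so the maximum over all
# deviations is attained at a vertex of the simplex
eps_x = max(sum(u(e, y) for _, y in hist) / t - avg_payoff for e in E)
eps_y = max(avg_payoff - sum(u(x, e) for x, _ in hist) / t for e in E)

xbar = [sum(x[i] for x, _ in hist) / t for i in range(n)]
ybar = [sum(y[i] for _, y in hist) / t for i in range(n)]

# the (eps_x + eps_y)-Nash conditions of Definition 3 at (xbar, ybar)
eps = eps_x + eps_y + 1e-9
assert all(u(xbar, ybar) + eps >= u(e, ybar) for e in E)
assert all(-u(xbar, ybar) + eps >= -u(xbar, e) for e in E)
```

Note that the history here is arbitrary: the theorem converts whatever regrets were realized into an equilibrium guarantee of matching quality.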

### 4.1. From regret-minimization to approachability

We introduce the *instantaneous regret* $r_{\hat{x}}(x_t, y_t)$, representing the regret accumulated by player $\mathcal{X}$ against action $\hat{x}$ when the two players play $x_t\in\mathcal{X}$ and $y_t\in\mathcal{Y}$, respectively:

\[r_{\hat{x}}(x, y) \doteq u(\hat{x}, y) - u(x, y).\]

With this notation, the average regret is simply the average of the instantaneous regrets:

\[\frac{1}{t} R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) = \frac{1}{t} \sum_{i=1}^{t} r_{\hat{x}}(x_i, y_i).\]

Regret minimization can thus be phrased in terms of the vector-valued function collecting the instantaneous regrets against every action,
\[r: (x, y) \mapsto (r_{\hat{x}}(x, y))_{\hat{x}\in\mathcal{X}}. \tag{6}\]

**Lemma 3**.

A strategy $\sigma_\mathcal{X}$ for player $\mathcal{X}$ is regret-minimizing if and only if
\[\limsup_{t\to+\infty} \frac{1}{t} R_\mathcal{X}(\mathcal{H}_{t}, \hat{x})\le 0\]
for all histories $\mathcal{H}_{t}$ and all $\hat{x} \in \{b_1, \ldots, b_n\}$, where $\{b_1,\ldots,b_n\}$ is a convex basis for $\mathcal{X}$.

**Proof**. ($\Longrightarrow$) This direction is trivial, as clearly $b_1, \ldots, b_n \in \mathcal{X}$.

($\Longleftarrow$) Write $\hat{x} = \sum_{j=1}^n \lambda_j(\hat{x})\, b_j$ as a convex combination of the basis elements, with $\lambda_j(\hat{x}) \ge 0$ and $\sum_j \lambda_j(\hat{x}) = 1$. Then, by bilinearity of $u$,

\[\begin{array}{rl} \frac{1}{t}R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) &= \frac{1}{t}\sum_{i=1}^{t} \left(u(\hat{x}, y_i) - u(x_i, y_i)\right) \\ \\ &= \frac{1}{t} \sum_{i=1}^{t} \left(\sum_{j=1}^n \lambda_j(\hat{x}) u(b_j, y_i)\right) - \left(\sum_{j=1}^n \lambda_j(\hat{x}) u(x_i, y_i)\right)\\ \\ &= \sum_{j=1}^n \lambda_j(\hat{x})\left(\frac{1}{t}\sum_{i=1}^{t} \left(u(b_j, y_i) - u(x_i,y_i)\right)\right) \\ \\ &= \sum_{j=1}^n \lambda_j(\hat{x})\,\frac{1}{t} R_\mathcal{X}(\mathcal{H}_{t}, b_j). \end{array}\]

Taking the limit superior, since $\lambda_j(\hat{x}) \ge 0$,
\[\limsup_{t\to+\infty} \frac{1}{t}R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \le \sum_{j=1}^n \lambda_j(\hat{x}) \limsup_{t\to+\infty} \frac{1}{t} R_\mathcal{X}(\mathcal{H}_{t}, b_j) \le 0.\]
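The convex-combination identity at the heart of the proof is easy to verify numerically. The sketch below, assuming mixed strategies over three pure actions and a hypothetical random payoff matrix, checks that the regret against a mixed $\hat{x}$ equals the corresponding combination of the regrets against the vertices:

```python
import random

# Hypothetical bilinear game on the simplex over n pure actions.
random.seed(5)
n, t = 3, 20
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def u(x, y):
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def rand_simplex():
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # basis b_j
hist = [(rand_simplex(), rand_simplex()) for _ in range(t)]

def regret(xhat):
    return sum(u(xhat, y) - u(x, y) for x, y in hist)

lam = [0.2, 0.5, 0.3]   # coefficients of hat{x} in the convex basis
xhat = [sum(lam[j] * E[j][i] for j in range(n)) for i in range(n)]
combo = sum(lam[j] * regret(E[j]) for j in range(n))
assert abs(regret(xhat) - combo) < 1e-9
```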

### 4.2. Building an explicit strategy

We instantiate Blackwell's framework with the vector payoff collecting the instantaneous regrets against the basis elements,
\[f : (x, y) \mapsto (r_{b_1}(x, y), \ldots, r_{b_n}(x,y)),\]
and with the non-positive orthant $S = \{s\in\mathbb{R}^n : s_j \le 0\ \ \forall j\}$ as target set: by Lemma 3, any strategy approaching $S$ is regret-minimizing. Blackwell's construction then proceeds as follows.

- We start by projecting $\phi_t$ onto $S$. The particular structure of $S$ (non-positive orthant) makes it very easy:
  \[\psi_t = [\phi_t]^-,\]
  where $[x]^-$ denotes the vector whose $i$-th component is equal to $\min\{x_i, 0\}$.
- All halfspaces tangent to $S$ and containing $S$ are of the form
  \[H(\lambda) = \{h : \lambda^\top h \ge 0\}, \quad \|\lambda\| \neq 0,\ \ \lambda_j \le 0 \ \ \forall j.\]
  By definition, $H(\lambda)$ is forceable if there exists an action $\bar{x}\in\mathcal{X}$ such that $\lambda^\top f(\bar{x}, y) \ge 0$ for all $y\in\mathcal{Y}$. Expanding the definition of $f$,
  \[\lambda^\top f(\bar{x}, y) = \sum_{j=1}^n \lambda_j (u(b_j, y) - u(\bar{x}, y)),\]
  so that the forceability condition becomes
  \[u(\bar{x},y) \sum_{j=1}^n \lambda_j \le \sum_{j=1}^n \lambda_j u(b_j,y).\]
  Exploiting the bilinearity of $u$, we can rewrite this as
  \[u\left(\left(\sum_{j=1}^n \lambda_j\right) \bar{x}, y\right) \le u\left(\sum_{j=1}^n \lambda_j b_j, y\right).\]
  Luckily, we can simply choose $\bar{x}$ so that the first argument on the left-hand side matches *exactly* the first argument on the right-hand side:
  \[\bar{x} = \sum_{j=1}^n \frac{\lambda_j}{\Lambda} b_j, \tag{7}\]
  where $\Lambda\doteq\sum_{j=1}^n \lambda_j < 0$ conveniently acts as a normalization constant. Equation 7 not only gives us a candidate $\bar{x}$; it also guarantees that such $\bar{x}$ belongs to the polytope $\mathcal{X}$ by expressing it as a convex combination of the basis vectors $b_1,\ldots,b_n$.

- We start by playing a random $x_1\in\mathcal{X}$, and observe the opponent's move $y_1$.
- At time $t$, if $\phi_t \in S$, we play at random. Otherwise, we play the action $x_t$ defined by Equation 7. In both cases, we then make our move and wait for the opponent's move $y_{t}$. Notice that since $\lambda$ is parallel to $\psi_t - \phi_t = -[\phi_t]^+$, we can rewrite $x_t$ as
  \[x_t = \sum_{j=1}^n \frac{[\phi_t]_j^+}{\Lambda} b_j, \qquad \Lambda = \sum_{j=1}^n\,[\phi_t]_j^+.\]

Running this strategy (and its symmetric counterpart for player $\mathcal{Y}$), Equation 4 guarantees that both players' average regrets vanish at rate
\[\frac{1}{t} R_\mathcal{X}(\mathcal{H}_t, \hat{x}) \le \frac{2}{\sqrt{t}},\qquad \frac{1}{t} R_\mathcal{Y}(\mathcal{H}_t, \hat{y}) \le \frac{2}{\sqrt{t}}.\]
By Theorem 3, the average strategy pair $(\bar{x}_t, \bar{y}_t)$ is then a $\frac{4}{\sqrt{t}}$-Nash equilibrium.
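The explicit strategy above amounts to mixing the basis actions proportionally to the positive parts of the average regret vector. A minimal sketch follows, assuming rock-paper-scissors with the pure strategies as convex basis, played against an arbitrary opponent. Since this payoff is not rescaled into $B(0,1)$ (here $\|f\| \le 2\sqrt{3}$), the final check uses the correspondingly adjusted constant $\sqrt{48/t}$ in place of $2/\sqrt{t}$:

```python
import math
import random

# Rock-paper-scissors payoff of player X; pure strategies are the
# convex basis b_1, b_2, b_3 of the strategy simplex.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
n = 3

def u(x, y):
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def next_action(phi):
    """Equation 7: mix the basis actions proportionally to [phi_t]^+."""
    pos = [max(p, 0.0) for p in phi]
    total = sum(pos)
    if total == 0.0:                  # phi_t in S: any action works
        return [1.0 / n] * n
    return [p / total for p in pos]

random.seed(6)
phi_x = [0.0] * n                     # average regret vector of player X
T = 10000
for t in range(1, T + 1):
    x = next_action(phi_x)
    w = [random.random() for _ in range(n)]
    y = [v / sum(w) for v in w]       # arbitrary opponent
    ux = u(x, y)
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]  # u(b_i, y)
    rx = [Ay[i] - ux for i in range(n)]          # instantaneous regrets
    phi_x = [(t - 1) / t * a + b / t for a, b in zip(phi_x, rx)]

# Equation 4 with the adjusted constant for this unnormalized payoff:
# the maximum average regret is at most sqrt(48/t), whatever the opponent.
assert max(phi_x) <= math.sqrt(48 / T)
```

The guarantee in the assertion is deterministic: it follows from the same telescoping argument as in Theorem 2, with the constant $4$ replaced by $\left(4\sqrt{3}\right)^2 = 48$ to account for the larger payoff norm.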