


15-888 Computational Game Solving (Fall 2021) Thu, September 16th 2021
Lecture 4
Blackwell approachability and regret minimization on simplex domains
Instructor: Gabriele Farina
In Lecture 3 we saw how no-external-regret dynamics converge to several solution concepts of interest, including Nash equilibria in two-player zero-sum games, normal-form coarse-correlated equilibria in multiplayer general-sum games, and, more generally, solutions to convex-concave saddle point problems. Furthermore, we saw that no-external-regret dynamics are a fundamental building block for constructing no-Φ-regret dynamics for general sets Φ of deviation functions.
In this lecture, we start exploring how no-external-regret dynamics have historically been constructed for the action spaces of one-shot decision making problems (that is, probability simplexes). The historical construction we present today is not only very elegant, but it also leads to an algorithm that is still extremely popular in practice today, called regret matching.
We will see other, more modern, techniques to construct no-external-regret dynamics for general convex
and compact domains later on in this course.
1 No-external-regret dynamics for probability simplexes
In this lecture we focus on the task of constructing no-external-regret dynamics for one-shot decision making problems, that is, on minimizing regret over a simplex domain
$$\Delta^n := \{(x_1, \dots, x_n) \in \mathbb{R}^n_{\geq 0} : x_1 + \cdots + x_n = 1\}.$$
As we will see in the next lecture, any regret minimizer for one-shot decision making can later be “upgraded” to a regret minimizer for tree-form decision making.
Remember that constructing a regret minimizer for $\Delta^n$ means that we need to build a mathematical object that supports two operations:
• NextStrategy() has the effect that the regret minimizer will output an element $x^t \in \Delta^n$;
• ObserveUtility($\ell^t$) provides the environment's feedback to the regret minimizer, in the form of a linear utility vector $\ell^t \in \mathbb{R}^n$ that evaluates how good the last-output point $x^t$ was.
Our goal will be to make sure that the (external) regret
$$R^T := \max_{\hat{x} \in \Delta^n} \left\{ \sum_{t=1}^{T} \left( \langle \ell^t, \hat{x} \rangle - \langle \ell^t, x^t \rangle \right) \right\}$$
grows sublinearly in $T$, no matter the utility vectors $\ell^t$ chosen by the environment.
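To make the interface concrete, here is a minimal Python sketch of the abstraction just described; the class and helper names are illustrative choices, not part of the lecture's formal development.

```python
import numpy as np

class RegretMinimizer:
    """Illustrative interface for a regret minimizer over the simplex."""

    def next_strategy(self) -> np.ndarray:
        """Return the next point x^t in the simplex Delta^n."""
        raise NotImplementedError

    def observe_utility(self, ell: np.ndarray) -> None:
        """Receive the environment's linear utility vector ell^t."""
        raise NotImplementedError

def external_regret(ells, xs) -> float:
    """R^T = max over the simplex of sum_t (<ell^t, xhat> - <ell^t, x^t>).
    The max of a linear function over the simplex is attained at a vertex,
    so it equals the utility of the best fixed action in hindsight."""
    cum = np.sum(ells, axis=0)                          # sum_t ell^t
    realized = sum(ell @ x for ell, x in zip(ells, xs)) # sum_t <ell^t, x^t>
    return float(np.max(cum) - realized)
```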
We will construct a regret minimizer for $\Delta^n$ by means of an ancient (and very elegant!) construction,
called Blackwell approachability. Blackwell approachability is a precursor of the theory of regret minimization,
and played a fundamental role in the historical development of several efficient online optimization methods.
In particular, as we will show in a minute, the problem of minimizing regret on a simplex can be rewritten as
a Blackwell approachability game. The solution of the Blackwell approachability game is a regret-minimizing
dynamic called regret matching (RM). As of today, regret matching and its variants remain among the most practical algorithms for learning in games.
1.1 Blackwell approachability game
Blackwell approachability generalizes the problem of playing a repeated two-player game to games whose utilities are vectors instead of scalars.

Definition 1.1. A Blackwell approachability game is a tuple $(\mathcal{X}, \mathcal{Y}, u, S)$, where $\mathcal{X}, \mathcal{Y}$ are closed convex sets, $u : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$ is a biaffine function, and $S \subseteq \mathbb{R}^d$ is a closed and convex target set. A Blackwell approachability game represents a vector-valued repeated game between two players. At each time $t$, the two players interact in this order:
• first, Player 1 selects an action $x^t \in \mathcal{X}$;
• then, Player 2 selects an action $y^t \in \mathcal{Y}$, which can depend adversarially on all the $x^t$ output so far;
• finally, Player 1 incurs the vector-valued payoff $u(x^t, y^t) \in \mathbb{R}^d$.
Player 1's objective is to guarantee that the average payoff converges to the target set $S$. Formally, given the target set $S \subseteq \mathbb{R}^d$, Player 1's goal is to pick actions $x^1, x^2, \ldots \in \mathcal{X}$ such that, no matter the actions $y^1, y^2, \ldots \in \mathcal{Y}$ played by Player 2,
$$\min_{\hat{s} \in S} \left\| \hat{s} - \frac{1}{T} \sum_{t=1}^{T} u(x^t, y^t) \right\|_2 \to 0 \quad \text{as } T \to \infty. \qquad (1)$$
1.2 External regret minimization on the simplex via Blackwell approachability
Hart and Mas-Colell [2000] noted that the construction of a regret minimizer for a simplex domain $\Delta^n$ can be reduced to constructing an algorithm for a particular Blackwell approachability game $\Gamma := (\Delta^n, \mathbb{R}^n, u, \mathbb{R}^n_{\leq 0})$, which we now describe. For all $i \in \{1, \dots, n\}$, the $i$-th component of the vector-valued payoff function $u$ measures the change in regret incurred at time $t$, compared to always playing the $i$-th vertex $e_i$ of the simplex. Formally, $u : \Delta^n \times \mathbb{R}^n \to \mathbb{R}^n$ is defined as
$$u(x^t, \ell^t) = \ell^t - \langle \ell^t, x^t \rangle \mathbf{1}, \qquad (2)$$
where $\mathbf{1}$ is the $n$-dimensional vector whose components are all $1$.
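In code, the payoff function (2) is a one-liner; here is a hedged numpy version (the function name is mine):

```python
import numpy as np

def blackwell_payoff(x: np.ndarray, ell: np.ndarray) -> np.ndarray:
    """u(x, ell) = ell - <ell, x> * 1, as in Eq. (2).
    Component i is the instantaneous regret against vertex e_i."""
    return ell - np.dot(ell, x) * np.ones_like(ell)
```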
The following lemma establishes an important link between Blackwell approachability on $\Gamma$ and external regret minimization on the simplex $\Delta^n$.

Lemma 1.1. The regret $R^T = \max_{\hat{x} \in \Delta^n} \sum_{t=1}^{T} \langle \ell^t, \hat{x} - x^t \rangle$ cumulated up to any time $T$ by any sequence of decisions $x^1, \dots, x^T \in \Delta^n$ is related to the distance of the average Blackwell payoff from the target cone $\mathbb{R}^n_{\leq 0}$ as
$$\frac{R^T}{T} \leq \min_{\hat{s} \in \mathbb{R}^n_{\leq 0}} \left\| \hat{s} - \frac{1}{T} \sum_{t=1}^{T} u(x^t, \ell^t) \right\|_2. \qquad (3)$$
So, any strategy that attains goal (1) in the Blackwell approachability game $\Gamma$ is a regret-minimizing strategy for the simplex domain $\Delta^n$.
Proof. For any $\hat{x} \in \Delta^n$, the regret cumulated compared to always playing $\hat{x}$ satisfies
$$\begin{aligned}
\frac{1}{T} R^T(\hat{x}) :={} & \frac{1}{T} \sum_{t=1}^{T} \left( \langle \ell^t, \hat{x} \rangle - \langle \ell^t, x^t \rangle \right) = \frac{1}{T} \sum_{t=1}^{T} \left( \langle \ell^t, \hat{x} \rangle - \langle \ell^t, x^t \rangle \langle \mathbf{1}, \hat{x} \rangle \right) \\
={} & \left\langle \frac{1}{T} \sum_{t=1}^{T} \ell^t - \langle \ell^t, x^t \rangle \mathbf{1},\; \hat{x} \right\rangle = \left\langle \frac{1}{T} \sum_{t=1}^{T} u(x^t, \ell^t),\; \hat{x} \right\rangle \\
={} & \min_{\hat{s} \in \mathbb{R}^n_{\leq 0}} \left\langle -\hat{s} + \frac{1}{T} \sum_{t=1}^{T} u(x^t, \ell^t),\; \hat{x} \right\rangle, \qquad (4)
\end{aligned}$$
where we used the fact that $\langle \mathbf{1}, \hat{x} \rangle = 1$ for $\hat{x} \in \Delta^n$ in the second equality, and the fact that $\min_{\hat{s} \in \mathbb{R}^n_{\leq 0}} \langle -\hat{s}, \hat{x} \rangle = 0$ (attained at $\hat{s} = 0$) since $\hat{x} \geq 0$ in the last.
Applying the Cauchy-Schwarz inequality to the right-hand side of (4), we obtain
$$\frac{1}{T} R^T(\hat{x}) \leq \min_{\hat{s} \in \mathbb{R}^n_{\leq 0}} \left\| \hat{s} - \frac{1}{T} \sum_{t=1}^{T} u(x^t, \ell^t) \right\|_2 \, \| \hat{x} \|_2.$$
So, using the fact that $\| \hat{x} \|_2 \leq 1$ for any $\hat{x} \in \Delta^n$,
$$\frac{1}{T} R^T(\hat{x}) \leq \min_{\hat{s} \in \mathbb{R}^n_{\leq 0}} \left\| \hat{s} - \frac{1}{T} \sum_{t=1}^{T} u(x^t, \ell^t) \right\|_2.$$
Taking a maximum over $\hat{x} \in \Delta^n$ yields the statement. $\square$
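Lemma 1.1 is easy to sanity-check numerically. The snippet below (illustrative, with random data) verifies inequality (3) for an arbitrary sequence of utility vectors and simplex points, using the fact that the distance from a point to $\mathbb{R}^n_{\leq 0}$ is the Euclidean norm of its positive part.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 200
ells = rng.normal(size=(T, n))              # arbitrary utility vectors
xs = rng.dirichlet(np.ones(n), size=T)      # arbitrary points in the simplex

payoffs = ells - np.sum(ells * xs, axis=1, keepdims=True)  # rows u(x^t, ell^t)
avg = payoffs.mean(axis=0)

regret = np.max(np.sum(ells, axis=0)) - np.sum(ells * xs)  # R^T
dist = np.linalg.norm(np.maximum(avg, 0.0))  # distance of avg to R^n_{<=0}

assert regret / T <= dist + 1e-9             # inequality (3)
```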
1.3 Solving Blackwell games: Blackwell’s algorithm
A central concept in the theory of Blackwell approachability is the following.
Definition 1.2 (Forceable halfspace). Let $(\mathcal{X}, \mathcal{Y}, u, S)$ be a Blackwell approachability game and let $\mathcal{H} \subseteq \mathbb{R}^d$ be a halfspace, that is, a set of the form $\mathcal{H} = \{x \in \mathbb{R}^d : a^\top x \leq b\}$ for some $a \in \mathbb{R}^d$, $b \in \mathbb{R}$. The halfspace $\mathcal{H}$ is said to be forceable if there exists a strategy of Player 1 that guarantees that the payoff is in $\mathcal{H}$ no matter the actions played by Player 2, that is, if there exists $x \in \mathcal{X}$ such that
$$u(x, y) \in \mathcal{H} \qquad \forall\, y \in \mathcal{Y}.$$
When that is the case, we call the action $x$ a forcing action for $\mathcal{H}$.
Blackwell’s approachability theorem [Blackwell, 1956] states the following.
Theorem 1.1 (Blackwell’s theorem). Goal (1) can be attained if and only if every halfspace H S is
forceable.
We constructively prove the direction showing how forceability translates into a sequence of strategies that guarantees that goal (1) is attained. Let $(\mathcal{X}, \mathcal{Y}, u, S)$ be the Blackwell game. The method is pretty simple: at each time step $t = 1, 2, \ldots$, operate as follows (a code sketch is given after Figure 1):
1. Compute the average payoff received so far, that is, $\phi^t := \frac{1}{t-1} \sum_{\tau=1}^{t-1} u(x^\tau, y^\tau)$ (with the convention $\phi^1 := 0$).
2. Compute the Euclidean projection $\psi^t$ of $\phi^t$ onto the target set $S$.
3. If $\phi^t \in S$ (that is, goal (1) has already been met), pick and play any $x^t \in \mathcal{X}$, observe the opponent's action $y^t$, and return.
4. Else, consider the halfspace $\mathcal{H}^t$ tangent to $S$ at the projection point $\psi^t$ that contains $S$. In symbols,
$$\mathcal{H}^t := \{z \in \mathbb{R}^d : (\phi^t - \psi^t)^\top z \leq (\phi^t - \psi^t)^\top \psi^t\}.$$
5. By hypothesis, $\mathcal{H}^t$ is forceable. Pick $x^t$ to be a forcing action for $\mathcal{H}^t$, observe the opponent's action $y^t$, and return.
The above method is summarized in Figure 1.
[Figure 1: Construction of the approachability strategy described in Section 1.3. The figure shows the target set $S$, the average payoff $\phi^t$, its projection $\psi^t$ onto $S$, the forced halfspace $\mathcal{H}^t$, the new payoff $u(x^t, y^t)$, and the updated average $\phi^{t+1}$.]
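In Python, the loop looks as follows. This is a minimal sketch: the projection oracle, the forcing oracle, and the environment callback are all assumed to be supplied by the caller, since the abstract game does not specify them.

```python
import numpy as np

def blackwell_play(project_onto_S, forcing_action, default_action, play_round, T):
    """Play T rounds of a Blackwell approachability game.

    project_onto_S(phi)  -> Euclidean projection of phi onto S
    forcing_action(a, b) -> forcing action for the halfspace {z : a @ z <= b}
    default_action       -> any fixed action in X (used when phi is in S)
    play_round(x)        -> lets the opponent respond; returns u(x, y^t)
    """
    payoff_sum = None
    for t in range(1, T + 1):
        if payoff_sum is None:
            x = default_action                   # t = 1: no payoff yet, play anything
        else:
            phi = payoff_sum / (t - 1)           # step 1: average payoff so far
            psi = project_onto_S(phi)            # step 2: projection onto S
            a = phi - psi
            if np.allclose(a, 0.0):              # step 3: phi already in S
                x = default_action
            else:
                x = forcing_action(a, a @ psi)   # steps 4-5: force the halfspace H^t
        u = play_round(x)                        # opponent moves; payoff observed
        payoff_sum = u.copy() if payoff_sum is None else payoff_sum + u
    return payoff_sum / T                        # average payoff after T rounds
```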
Let's see how the average payoff $\phi^t$ changes when we play as described above. Clearly,
$$\phi^{t+1} = \frac{1}{t} \sum_{\tau=1}^{t} u(x^\tau, y^\tau) = \frac{t-1}{t} \phi^t + \frac{1}{t} u(x^t, y^t).$$
Hence, denoting by $\rho^t$ the squared Euclidean distance between $\phi^t$ and the target set $S$, that is,
$$\rho^t := \min_{\hat{s} \in S} \left\| \hat{s} - \phi^t \right\|_2^2,$$
we have (using $\psi^t \in S$ in the first inequality, and $\|\psi^t - \phi^t\|_2^2 = \rho^t$ in the last equality)
$$\begin{aligned}
\rho^{t+1} \leq \left\| \psi^t - \phi^{t+1} \right\|_2^2 &= \left\| \psi^t - \frac{t-1}{t} \phi^t - \frac{1}{t} u(x^t, y^t) \right\|_2^2 \\
&= \left\| \frac{t-1}{t} \left( \psi^t - \phi^t \right) + \frac{1}{t} \left( \psi^t - u(x^t, y^t) \right) \right\|_2^2 \\
&= \frac{(t-1)^2}{t^2} \rho^t + \frac{1}{t^2} \left\| \psi^t - u(x^t, y^t) \right\|_2^2 + \frac{2(t-1)}{t^2} \left\langle \psi^t - \phi^t,\; \psi^t - u(x^t, y^t) \right\rangle. \qquad (5)
\end{aligned}$$
The proof so far does not use any particular assumption about how $x^t$ is picked; this is where forceability enters the picture. If $\phi^t \in S$, then $\psi^t = \phi^t$ and therefore the last inner product is equal to $0$. Otherwise, $\psi^t - \phi^t \neq 0$. In that case, $x^t$ is constructed by forcing the halfspace $\mathcal{H}^t$, and therefore, no matter how $y^t$ is picked by the opponent, we have
$$u(x^t, y^t) \in \mathcal{H}^t \iff (\phi^t - \psi^t)^\top u(x^t, y^t) \leq (\phi^t - \psi^t)^\top \psi^t \iff \left\langle \psi^t - \phi^t,\; \psi^t - u(x^t, y^t) \right\rangle \leq 0.$$
Plugging the last inequality into (5) and bounding $\left\| \psi^t - u(x^t, y^t) \right\|_2^2 \leq \Omega^2$, where $\Omega$ is a diameter parameter of the game (which only depends on $u$ and $S$), we obtain
$$\rho^{t+1} \leq \frac{(t-1)^2}{t^2} \rho^t + \frac{\Omega^2}{t^2} \;\implies\; t^2 \rho^{t+1} - (t-1)^2 \rho^t \leq \Omega^2 \qquad \forall\, t = 1, 2, \ldots.$$
Summing the inequality above for $t = 1, \ldots, T$, the left-hand sides telescope and we obtain
$$T^2 \rho^{T+1} \leq \Omega^2 T \;\implies\; \rho^{T+1} \leq \frac{\Omega^2}{T} \;\implies\; \min_{\hat{s} \in S} \left\| \hat{s} - \frac{1}{T} \sum_{t=1}^{T} u(x^t, y^t) \right\|_2 \leq \frac{\Omega}{\sqrt{T}}, \qquad (6)$$
which implies that the average payoff in the Blackwell game converges to $S$ at a rate of $O(1/\sqrt{T})$.
1.4 The regret matching (RM) algorithm
At this point we have all the ingredients necessary to construct the first regret minimizer for simplex domains
of this course: the regret matching (RM) algorithm.
First, recall from Section 1.2 that external regret minimization on the simplex can be solved via the Blackwell game $\Gamma := (\Delta^n, \mathbb{R}^n, u, \mathbb{R}^n_{\leq 0})$, where $u : \Delta^n \times \mathbb{R}^n \to \mathbb{R}^n$ is defined as
$$u(x^t, \ell^t) = \ell^t - \langle \ell^t, x^t \rangle \mathbf{1}, \qquad (7)$$
where $\mathbf{1}$ is the $n$-dimensional vector whose components are all $1$. We will solve this Blackwell approachability game using the strategy explained in Section 1.3.
Computation of $\psi^t$ (Step 2). Let's start by looking at how to compute the projection $\psi^t$ of $\phi^t$ onto $S = \mathbb{R}^n_{\leq 0}$. Projection onto the nonpositive orthant amounts to a component-wise minimum with $0$, that is, $\psi^t = [\phi^t]^-$, where $[\,\cdot\,]^-$ (resp. $[\,\cdot\,]^+$) denotes the component-wise negative (resp. positive) part. Hence,
$$\phi^t - \psi^t = [\phi^t]^+ \;\implies\; (\phi^t - \psi^t)^\top \psi^t = 0,$$
where the second identity holds because $[\phi^t]^+$ and $[\phi^t]^-$ have disjoint supports.
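As a quick numerical check of these identities (illustrative values):

```python
import numpy as np

phi = np.array([0.7, -1.2, 0.3])
psi = np.minimum(phi, 0.0)    # projection of phi onto the nonpositive orthant
assert np.allclose(phi - psi, np.maximum(phi, 0.0))   # phi - psi = [phi]^+
assert np.isclose((phi - psi) @ psi, 0.0)             # (phi - psi)^T psi = 0
```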
Halfspace to be forced (Step 4). Following on with Blackwell's algorithm, when $[\phi^t]^+ \neq 0$, the halfspace to be forced at time $t$ is
$$\mathcal{H}^t := \{z \in \mathbb{R}^n : \langle [\phi^t]^+, z \rangle \leq 0\}.$$
Forcing action for $\mathcal{H}^t$ (Step 5). We now show that a forcing action for $\mathcal{H}^t$ indeed exists. Remember that, by definition, that is an action $x \in \Delta^n$ such that $u(x, \ell) \in \mathcal{H}^t$ no matter the $\ell \in \mathbb{R}^n$. Expanding the definitions of $\mathcal{H}^t$ and $u$, we are looking for an $x \in \Delta^n$ such that
$$\begin{aligned}
\left\langle [\phi^t]^+, \ell - \langle \ell, x \rangle \mathbf{1} \right\rangle \leq 0 \;\;\forall\, \ell \in \mathbb{R}^n
&\iff \left\langle [\phi^t]^+, \ell \right\rangle - \langle \ell, x \rangle \left\langle [\phi^t]^+, \mathbf{1} \right\rangle \leq 0 \;\;\forall\, \ell \in \mathbb{R}^n \\
&\iff \left\langle [\phi^t]^+, \ell \right\rangle - \langle \ell, x \rangle \left\| [\phi^t]^+ \right\|_1 \leq 0 \;\;\forall\, \ell \in \mathbb{R}^n \\
&\iff \left\langle \ell, \frac{[\phi^t]^+}{\left\| [\phi^t]^+ \right\|_1} \right\rangle - \langle \ell, x \rangle \leq 0 \;\;\forall\, \ell \in \mathbb{R}^n \\
&\iff \left\langle \ell, \frac{[\phi^t]^+}{\left\| [\phi^t]^+ \right\|_1} - x \right\rangle \leq 0 \;\;\forall\, \ell \in \mathbb{R}^n,
\end{aligned}$$
where the second step uses $\langle [\phi^t]^+, \mathbf{1} \rangle = \| [\phi^t]^+ \|_1$, valid since $[\phi^t]^+ \geq 0$. Note that we are lucky: $[\phi^t]^+ / \| [\phi^t]^+ \|_1$ is a nonnegative vector whose entries sum to $1$. So, the above inequality can be satisfied (with equality, in fact) by choosing
$$x = \frac{[\phi^t]^+}{\left\| [\phi^t]^+ \right\|_1} \in \Delta^n.$$
In other words, Blackwell's algorithm in this case picks
$$x^{t+1} = \frac{[\phi^{t+1}]^+}{\left\| [\phi^{t+1}]^+ \right\|_1} \in \Delta^n \;\iff\; x^{t+1} \propto [\phi^{t+1}]^+ \propto [r^t]^+, \quad \text{where } r^t := \sum_{\tau=1}^{t} \left( \ell^\tau - \langle \ell^\tau, x^\tau \rangle \mathbf{1} \right).$$
Remark 1.1. The vector $r^t$ contains the regret cumulated compared to always playing each of the available actions with probability $1$. Since $x^{t+1}$ is proportional to $[r^t]^+$, the algorithm effectively picks a distribution over actions proportional to the (positive part of the) regret cumulated against each action. This justifies why this algorithm is known under the name “regret matching”.
We also remark the following useful property of regret matching.
Remark 1.2. One very appealing property of the regret matching algorithm is its lack of hyperparameters.
It just works “out of the box”.
The resulting algorithm is summarized in Algorithm 1 (Algorithm 2, regret matching⁺, is shown alongside for comparison and discussed in Section 1.5).
Algorithm 1: Regret matching
  1  $r^0 \leftarrow 0 \in \mathbb{R}^n$, $x^0 \leftarrow \frac{1}{n}\mathbf{1} \in \Delta^n$
  2  function NextStrategy()
  3      $\theta^t \leftarrow [r^{t-1}]^+$
  4      if $\theta^t \neq 0$: return $x^t \leftarrow \theta^t / \|\theta^t\|_1$
  5      else: return $x^t \leftarrow$ any point in $\Delta^n$
  6  function ObserveUtility($\ell^t$)
  7      $r^t \leftarrow r^{t-1} + \ell^t - \langle \ell^t, x^t \rangle \mathbf{1}$

Algorithm 2: Regret matching⁺
  1  $z^0 \leftarrow 0 \in \mathbb{R}^n$, $x^0 \leftarrow \frac{1}{n}\mathbf{1} \in \Delta^n$
  2  function NextStrategy()
  3      $\theta^t \leftarrow [z^{t-1}]^+$
  4      if $\theta^t \neq 0$: return $x^t \leftarrow \theta^t / \|\theta^t\|_1$
  5      else: return $x^t \leftarrow$ any point in $\Delta^n$
  6  function ObserveUtility($\ell^t$)
  7      $z^t \leftarrow [z^{t-1} + \ell^t - \langle \ell^t, x^t \rangle \mathbf{1}]^+$
By combining the analysis of the Blackwell approachability strategy (6) together with Lemma 1.1, we obtain the following result.

Theorem 1.2. The regret matching algorithm (Algorithm 1) is an external regret minimizer, and satisfies the regret bound $R^T \leq \Omega \sqrt{T}$, where $\Omega$ is the maximum norm $\| \ell^t - \langle \ell^t, x^t \rangle \mathbf{1} \|_2$ up to time $T$.
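For reference, here is a minimal Python implementation of Algorithm 1. It is a sketch: the class name and the uniform fallback strategy (any point of the simplex is allowed when all regrets are nonpositive) are my choices.

```python
import numpy as np

class RegretMatching:
    """Regret matching (Algorithm 1) on the n-dimensional simplex."""

    def __init__(self, n: int):
        self.n = n
        self.r = np.zeros(n)            # cumulated regret vector r^t
        self.x = np.full(n, 1.0 / n)    # last-output strategy x^t

    def next_strategy(self) -> np.ndarray:
        theta = np.maximum(self.r, 0.0)         # theta^t = [r^{t-1}]^+
        s = theta.sum()
        # Normalize the positive regrets; fall back to uniform if theta = 0.
        self.x = theta / s if s > 0 else np.full(self.n, 1.0 / self.n)
        return self.x

    def observe_utility(self, ell: np.ndarray) -> None:
        # r^t = r^{t-1} + ell^t - <ell^t, x^t> 1
        self.r += ell - np.dot(ell, self.x)
```

Alternating next_strategy and observe_utility against an arbitrary stream of utility vectors keeps the regret within the $\Omega\sqrt{T}$ bound of Theorem 1.2.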
1.5 The regret matching+ (RM+) algorithm
The regret matching+ algorithm was introduced by Tammelin [2014], Tammelin et al. [2015], and is given
in Algorithm 2. It differs from RM only on the last line, where a further thresholding is added. That small
change has the effect that actions with negative cumulated regret (that is, “bad” actions) are treated as actions
with 0 regret. Hence, intuitively, if a bad action were to become good over time, it would take less time for
RM+ to notice and act on that change. Because of that, regret matching+ has stronger practical performance
and is often preferred over regret matching in the game solving literature.
With a simple modification to the analysis in Section 1.3, the same bound as RM can be proven.
Theorem 1.3. The regret matching⁺ algorithm (Algorithm 2) is an external regret minimizer, and satisfies the regret bound $R^T \leq \Omega \sqrt{T}$, where $\Omega$ is the maximum norm $\| \ell^t - \langle \ell^t, x^t \rangle \mathbf{1} \|_2$ up to time $T$.
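The one-line difference from RM translates directly into code; a sketch, reusing the RegretMatching class from the previous snippet:

```python
import numpy as np

class RegretMatchingPlus(RegretMatching):
    """Regret matching+ (Algorithm 2): threshold the accumulator each update."""

    def observe_utility(self, ell: np.ndarray) -> None:
        # z^t = [z^{t-1} + ell^t - <ell^t, x^t> 1]^+ : negative entries reset to 0
        self.r = np.maximum(self.r + ell - np.dot(ell, self.x), 0.0)
```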
We will not get into further details about RM⁺ in this lecture, but we might touch on it again later in the course, when discussing the connection between RM⁺ and online mirror descent, probably the most well-studied algorithm in the field of online learning.
References
Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127–1150, 2000.
David Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1–8, 1956.
Oskari Tammelin. Solving large imperfect information games using CFR+, 2014.
Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas hold’em.
In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.
