15-888 Computational Game Solving (Fall 2021) Tue, September 21st 2021
Lecture 5
Regret circuits and the Counterfactual Regret Minimization (CFR) paradigm
Instructor: Gabriele Farina
In Lecture 4 we studied online optimization for simplex domains. In this lecture we show that it is possible to construct an efficient regret minimizer that outputs sequence-form strategies by combining arbitrary local regret minimizers, one at each decision point of the tree-form sequential decision making problem, including the RM and RM+ algorithms discussed before. The resulting algorithm has been known since the seminal work by Zinkevich et al. [2007] under the name counterfactual regret minimization (CFR).
The analysis we present is based on the formalism of regret circuits [Farina et al., 2019].
1 Regret circuits
Regret circuits are a constructive answer to the question: “given regret minimizers for two sets $\mathcal{X}$, $\mathcal{Y}$, can we compose these regret minimizers for various operations performed on the sets (e.g., intersection, convex hull, Cartesian product), in order to arrive at a regret minimizer for the resulting composite set?”. In particular, in the following subsections, we will show that once regret minimizers for two sets $\mathcal{X}$ and $\mathcal{Y}$ are known, they can always be efficiently combined to come up with composite regret minimizers for the Cartesian product $\mathcal{X} \times \mathcal{Y}$ and the convex hull $\operatorname{co}\{\mathcal{X}, \mathcal{Y}\}$.
Since, as we saw in Lecture 2, any sequence-form strategy space $\mathcal{Q}$ can be defined inductively starting from convex hulls and Cartesian products, we can keep combining regret circuits until we construct an algorithm for $\mathcal{Q}$. The resulting family of algorithms is known as counterfactual regret minimization (CFR).
1.1 Regret circuit for Cartesian product
Let $\mathcal{X} \subseteq \mathbb{R}^m$ and $\mathcal{Y} \subseteq \mathbb{R}^n$ be two sets, and let $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ be regret minimizers for $\mathcal{X}$ and $\mathcal{Y}$, respectively. We can combine $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ to form a regret minimizer for the Cartesian product $\mathcal{X} \times \mathcal{Y} \subseteq \mathbb{R}^{m+n}$ by doing the following:
• To output the next strategy, we ask $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ for their next strategies—call them $x^t$ and $y^t$—and return the pair $(x^t, y^t) \in \mathcal{X} \times \mathcal{Y}$.
• Any utility vector $\ell \in \mathbb{R}^{m+n}$ can be written as $\ell = (\ell_{\mathcal{X}}, \ell_{\mathcal{Y}}) \in \mathbb{R}^m \times \mathbb{R}^n$. When at time $t$ we receive the utility vector $\ell^t = (\ell^t_{\mathcal{X}}, \ell^t_{\mathcal{Y}})$, we forward $\ell^t_{\mathcal{X}}$ to $\mathcal{R}_{\mathcal{X}}$ and $\ell^t_{\mathcal{Y}}$ to $\mathcal{R}_{\mathcal{Y}}$.
The algorithm we just described is summarized in Algorithm 1 and given pictorially in Figure 1.
Algorithm 1: Regret minimizer for $\mathcal{X} \times \mathcal{Y}$
1 function NextStrategy()
2     $x^t \leftarrow \mathcal{R}_{\mathcal{X}}$.NextStrategy()
3     $y^t \leftarrow \mathcal{R}_{\mathcal{Y}}$.NextStrategy()
4     return $(x^t, y^t) \in \mathcal{X} \times \mathcal{Y}$
5 function ObserveUtility($\ell^t = (\ell^t_{\mathcal{X}}, \ell^t_{\mathcal{Y}}) \in \mathbb{R}^m \times \mathbb{R}^n$)
6     $\mathcal{R}_{\mathcal{X}}$.ObserveUtility($\ell^t_{\mathcal{X}}$)
7     $\mathcal{R}_{\mathcal{Y}}$.ObserveUtility($\ell^t_{\mathcal{Y}}$)
[Figure: the circuit splits the incoming utility $\ell^t$ into $\ell^t_{\mathcal{X}}$ for $\mathcal{R}_{\mathcal{X}}$ and $\ell^t_{\mathcal{Y}}$ for $\mathcal{R}_{\mathcal{Y}}$, and pairs their outputs $x^t, y^t$ into $(x^t, y^t)$.]
Figure 1: Regret circuit for the Cartesian product $\mathcal{X} \times \mathcal{Y}$.
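To make the construction concrete, here is a minimal Python sketch (class and method names are illustrative, not from the lecture) that wires two regret-matching minimizers, one per simplex, into the Cartesian-product circuit of Algorithm 1, then checks empirically that each component's regret stays within RM's theoretical bound $\Delta\sqrt{nT}$:

```python
import numpy as np

class RegretMatching:
    """Regret matching (RM) over the probability simplex with n vertices."""
    def __init__(self, n):
        self.n = n
        self.cum_regret = np.zeros(n)
        self.last = np.full(n, 1.0 / n)

    def next_strategy(self):
        pos = np.maximum(self.cum_regret, 0.0)
        s = pos.sum()
        # Play proportionally to positive cumulated regrets; uniform as fallback.
        self.last = pos / s if s > 0 else np.full(self.n, 1.0 / self.n)
        return self.last

    def observe_utility(self, u):
        # Cumulate the regret of each vertex against the strategy just played.
        self.cum_regret += u - (u @ self.last)

class CartesianProduct:
    """Regret circuit for X x Y (Algorithm 1): pair strategies, split utilities."""
    def __init__(self, rx, ry, m):
        self.rx, self.ry, self.m = rx, ry, m  # m = ambient dimension of X

    def next_strategy(self):
        return np.concatenate([self.rx.next_strategy(), self.ry.next_strategy()])

    def observe_utility(self, u):
        self.rx.observe_utility(u[:self.m])
        self.ry.observe_utility(u[self.m:])

# Usage: X is the 3-simplex, Y the 2-simplex, so X x Y sits in R^5.
rng = np.random.default_rng(0)
circ = CartesianProduct(RegretMatching(3), RegretMatching(2), m=3)
T = 200
plays, utils = [], []
for _ in range(T):
    plays.append(circ.next_strategy())
    utils.append(rng.uniform(-1.0, 1.0, size=5))
    circ.observe_utility(utils[-1])
Z, U = np.array(plays), np.array(utils)
# Regret of each component against the best fixed vertex in hindsight.
reg_x = U[:, :3].sum(axis=0).max() - (U[:, :3] * Z[:, :3]).sum()
reg_y = U[:, 3:].sum(axis=0).max() - (U[:, 3:] * Z[:, 3:]).sum()
```

Because the utilities lie in $[-1, 1]$ (range $\Delta = 2$), RM guarantees each component regret is at most $2\sqrt{nT}$, so by the analysis below the product regret is sublinear as well.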
Analysis. The regret minimizers $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ output strategies $x^t, y^t$ and observe utilities $\ell^t_{\mathcal{X}}, \ell^t_{\mathcal{Y}}$ at all times $t$. Hence, the regret cumulated by $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ up to time $T$ is given by
\[
R^T_{\mathcal{X}} = \max_{\hat{x} \in \mathcal{X}} \left\{ \sum_{t=1}^T (\ell^t_{\mathcal{X}})^\top \hat{x} - (\ell^t_{\mathcal{X}})^\top x^t \right\}, \qquad
R^T_{\mathcal{Y}} = \max_{\hat{y} \in \mathcal{Y}} \left\{ \sum_{t=1}^T (\ell^t_{\mathcal{Y}})^\top \hat{y} - (\ell^t_{\mathcal{Y}})^\top y^t \right\}.
\]
In order to verify that Algorithm 1 indeed yields a regret minimizer we study the regret cumulated by
Algorithm 1 up to any time T . Starting from the definition of regret, we have
\[
\begin{aligned}
R^T_{\mathcal{X} \times \mathcal{Y}} &= \max_{(\hat{x}, \hat{y}) \in \mathcal{X} \times \mathcal{Y}} \left\{ \sum_{t=1}^T (\ell^t_{\mathcal{X}}, \ell^t_{\mathcal{Y}})^\top (\hat{x}, \hat{y}) - (\ell^t_{\mathcal{X}}, \ell^t_{\mathcal{Y}})^\top (x^t, y^t) \right\} \\
&= \max_{\hat{x} \in \mathcal{X}} \left\{ \sum_{t=1}^T (\ell^t_{\mathcal{X}})^\top \hat{x} - \sum_{t=1}^T (\ell^t_{\mathcal{X}})^\top x^t \right\} + \max_{\hat{y} \in \mathcal{Y}} \left\{ \sum_{t=1}^T (\ell^t_{\mathcal{Y}})^\top \hat{y} - \sum_{t=1}^T (\ell^t_{\mathcal{Y}})^\top y^t \right\} \\
&= R^T_{\mathcal{X}} + R^T_{\mathcal{Y}}.
\end{aligned}
\]
So, if $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ guarantee sublinear regret, then so does Algorithm 1. In other words, it is possible to minimize regret on $\mathcal{X} \times \mathcal{Y}$ by simply minimizing it on $\mathcal{X}$ and $\mathcal{Y}$ independently and then combining the decisions.
In summary, we have proven the following result.
Theorem 1.1. The regret circuit of Figure 1 provides a regret minimizer for the Cartesian product $\mathcal{X} \times \mathcal{Y}$ that satisfies the regret bound
\[
R^T_{\mathcal{X} \times \mathcal{Y}} = R^T_{\mathcal{X}} + R^T_{\mathcal{Y}}.
\]
The extension to the Cartesian product of more than two sets is direct.
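As a sketch of that extension (hypothetical Python; `parts` can be any objects exposing `next_strategy`/`observe_utility` methods), the $k$-ary circuit simply concatenates the $k$ local strategies and splits each utility vector into its $k$ coordinate blocks:

```python
import numpy as np

class CartesianProductN:
    """k-ary Cartesian-product circuit: concatenate the k local strategies,
    and split each incoming utility vector into its k blocks."""
    def __init__(self, parts, dims):
        self.parts = parts
        # Start offset of each block inside the concatenated vector.
        self.offsets = [0]
        for d in dims:
            self.offsets.append(self.offsets[-1] + d)

    def next_strategy(self):
        return np.concatenate([p.next_strategy() for p in self.parts])

    def observe_utility(self, u):
        for p, a, b in zip(self.parts, self.offsets[:-1], self.offsets[1:]):
            p.observe_utility(u[a:b])

# A trivial stand-in local minimizer, just to exercise the plumbing.
class FixedStrategy:
    def __init__(self, v):
        self.v = np.asarray(v, dtype=float)
        self.seen = []
    def next_strategy(self):
        return self.v
    def observe_utility(self, u):
        self.seen.append(u)

parts = [FixedStrategy([1.0]), FixedStrategy([0.5, 0.5]),
         FixedStrategy([0.2, 0.3, 0.5])]
circ = CartesianProductN(parts, dims=[1, 2, 3])
z = circ.next_strategy()              # concatenated vector in R^6
circ.observe_utility(np.arange(6.0))  # each part sees only its own block
```

The regret bound extends accordingly: the regret of the composite minimizer is the sum of the regrets of the $k$ components.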
1.2 Regret circuit for convex hull
We now show how to combine $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ for two sets $\mathcal{X} \subseteq \mathbb{R}^n$ and $\mathcal{Y} \subseteq \mathbb{R}^n$ to construct a regret minimizer for the convex hull $\operatorname{co}\{\mathcal{X}, \mathcal{Y}\}$. The construction is harder than in the case of Cartesian products, but some of the ideas carry over. We will use a third regret minimizer $\mathcal{R}_\lambda$ for the 2-simplex $\Delta^2$ to decide how to “mix” $x^t$ and $y^t$. More precisely, we will do the following.
• To output the next strategy, we ask $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ for their next strategies—call them $x^t$ and $y^t$. Then, we ask $\mathcal{R}_\lambda$ for its next strategy $\lambda^t = (\lambda^t_1, \lambda^t_2) \in \Delta^2$, and return the convex combination $\lambda^t_1 x^t + \lambda^t_2 y^t \in \operatorname{co}\{\mathcal{X}, \mathcal{Y}\}$.
• When at time $t$ we receive the utility vector $\ell^t \in \mathbb{R}^n$, we forward $\ell^t$ to both $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$. Then, we forward the utility vector
\[
\ell^t_\lambda := \begin{pmatrix} (\ell^t)^\top x^t \\ (\ell^t)^\top y^t \end{pmatrix} \tag{1}
\]
to $\mathcal{R}_\lambda$.
The algorithm we just described is summarized in Algorithm 2 and given pictorially in Figure 2.
Algorithm 2: Regret minimizer for $\operatorname{co}\{\mathcal{X}, \mathcal{Y}\}$
1 function NextStrategy()
2     $x^t \leftarrow \mathcal{R}_{\mathcal{X}}$.NextStrategy()
3     $y^t \leftarrow \mathcal{R}_{\mathcal{Y}}$.NextStrategy()
4     $\lambda^t = (\lambda^t_1, \lambda^t_2) \in \Delta^2 \leftarrow \mathcal{R}_\lambda$.NextStrategy()
5     return $\lambda^t_1 x^t + \lambda^t_2 y^t \in \operatorname{co}\{\mathcal{X}, \mathcal{Y}\}$
6 function ObserveUtility($\ell^t$)
7     $\mathcal{R}_{\mathcal{X}}$.ObserveUtility($\ell^t$)
8     $\mathcal{R}_{\mathcal{Y}}$.ObserveUtility($\ell^t$)
9     $\mathcal{R}_\lambda$.ObserveUtility($\ell^t_\lambda$), with $\ell^t_\lambda$ as in Equation (1)
[Figure: the circuit forwards $\ell^t$ to both $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$, forwards $\ell^t_\lambda$ to $\mathcal{R}_\lambda$, and mixes the outputs $x^t$ and $y^t$ into $\lambda^t_1 x^t + \lambda^t_2 y^t$.]
Figure 2: Regret circuit for the convex hull $\operatorname{co}\{\mathcal{X}, \mathcal{Y}\}$. The utility vector $\ell^t_\lambda$ is defined in Equation (1).
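A self-contained Python sketch of Algorithm 2 (illustrative names; regret matching is used for all three local minimizers, including $\mathcal{R}_\lambda$ on the 2-simplex) might look as follows. The final check relies on the standard guarantee for this circuit, $R^T \le R^T_\lambda + \max(R^T_{\mathcal{X}}, R^T_{\mathcal{Y}})$, combined with RM's bound $\Delta\sqrt{nT}$:

```python
import numpy as np

class RegretMatching:
    """Regret matching over the probability simplex with n vertices."""
    def __init__(self, n):
        self.n = n
        self.cum_regret = np.zeros(n)
        self.last = np.full(n, 1.0 / n)

    def next_strategy(self):
        pos = np.maximum(self.cum_regret, 0.0)
        s = pos.sum()
        self.last = pos / s if s > 0 else np.full(self.n, 1.0 / self.n)
        return self.last

    def observe_utility(self, u):
        self.cum_regret += u - (u @ self.last)

class ConvexHullCircuit:
    """Regret circuit for co{X, Y} (Algorithm 2)."""
    def __init__(self, rx, ry):
        self.rx, self.ry = rx, ry
        self.rl = RegretMatching(2)  # R_lambda over the 2-simplex

    def next_strategy(self):
        self.x = self.rx.next_strategy()
        self.y = self.ry.next_strategy()
        self.lam = self.rl.next_strategy()
        return self.lam[0] * self.x + self.lam[1] * self.y

    def observe_utility(self, u):
        # Forward the raw utility to both inner minimizers...
        self.rx.observe_utility(u)
        self.ry.observe_utility(u)
        # ...and the two scalar utilities of Equation (1) to R_lambda.
        self.rl.observe_utility(np.array([u @ self.x, u @ self.y]))

# Usage: here X = Y = 3-simplex, purely to exercise the wiring.
rng = np.random.default_rng(1)
circ = ConvexHullCircuit(RegretMatching(3), RegretMatching(3))
T = 200
plays, utils = [], []
for _ in range(T):
    plays.append(circ.next_strategy())
    utils.append(rng.uniform(-1.0, 1.0, size=3))
    circ.observe_utility(utils[-1])
Z, U = np.array(plays), np.array(utils)
# Regret over co{X, Y} against the best fixed vertex in hindsight.
regret = U.sum(axis=0).max() - (U * Z).sum()
```

Note that $\mathcal{R}_\lambda$ never sees the raw utility vector: it only sees the two scalars $(\ell^t)^\top x^t$ and $(\ell^t)^\top y^t$, i.e., how well each inner minimizer's suggestion would have performed.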
Analysis. The regret minimizers $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ output strategies $x^t, y^t$ and observe utilities $\ell^t$ at all times $t$. Hence, the regret cumulated by $\mathcal{R}_{\mathcal{X}}$ and $\mathcal{R}_{\mathcal{Y}}$ up to time $T$ is given by
\[
R^T_{\mathcal{X}} = \max_{\hat{x} \in \mathcal{X}} \left\{ \sum_{t=1}^T (\ell^t)^\top \hat{x} - (\ell^t)^\top x^t \right\}, \tag{2}
\]
\[
R^T_{\mathcal{Y}} = \max_{\hat{y} \in \mathcal{Y}} \left\{ \sum_{t=1}^T (\ell^t)^\top \hat{y} - (\ell^t)^\top y^t \right\}. \tag{3}
\]
The regret minimizer $\mathcal{R}_\lambda$ outputs strategies $\lambda^t = (\lambda^t_1, \lambda^t_2)$ at all times $t$ and observes utility vectors $\ell^t_\lambda$ at all times $t$. So, the regret cumulated by $\mathcal{R}_\lambda$ is given by
\[
R^T_\lambda := \max_{\hat{\lambda} \in \Delta^2} \left\{ \sum_{t=1}^T \hat{\lambda}_1 (\ell^t)^\top x^t + \hat{\lambda}_2 (\ell^t)^\top y^t \right\} - \left( \sum_{t=1}^T \lambda^t_1 (\ell^t)^\top x^t + \lambda^t_2 (\ell^t)^\top y^t \right).
\]
In order to verify that Algorithm 2 indeed yields a regret minimizer we study the regret it cumulates up to
any time T . Starting from the definition of regret, we have
\[
\begin{aligned}
R^T &= \max_{\substack{\hat{\lambda} \in \Delta^2 \\ \hat{x} \in \mathcal{X},\, \hat{y} \in \mathcal{Y}}} \left\{ \sum_{t=1}^T (\ell^t)^\top (\hat{\lambda}_1 \hat{x} + \hat{\lambda}_2 \hat{y}) - (\ell^t)^\top (\lambda^t_1 x^t + \lambda^t_2 y^t) \right\} \\
&= \max_{\substack{\hat{\lambda} \in \Delta^2 \\ \hat{x} \in \mathcal{X},\, \hat{y} \in \mathcal{Y}}} \left\{ \hat{\lambda}_1 \sum_{t=1}^T (\ell^t)^\top \hat{x} + \hat{\lambda}_2 \sum_{t=1}^T (\ell^t)^\top \hat{y} \right\} - \left( \sum_{t=1}^T \lambda^t_1 (\ell^t)^\top x^t + \lambda^t_2 (\ell^t)^\top y^t \right). \tag{4}
\end{aligned}
\]
Now, we make two important observations. First, since maximizing a weighted sum with nonnegative weights decouples across the two terms,
\[
\max_{\substack{\hat{\lambda} \in \Delta^2 \\ \hat{x} \in \mathcal{X},\, \hat{y} \in \mathcal{Y}}} \left\{ \hat{\lambda}_1 \sum_{t=1}^T (\ell^t)^\top \hat{x} + \hat{\lambda}_2 \sum_{t=1}^T (\ell^t)^\top \hat{y} \right\}
= \max_{\hat{\lambda} \in \Delta^2} \left\{ \hat{\lambda}_1 \max_{\hat{x} \in \mathcal{X}} \sum_{t=1}^T (\ell^t)^\top \hat{x} + \hat{\lambda}_2 \max_{\hat{y} \in \mathcal{Y}} \sum_{t=1}^T (\ell^t)^\top \hat{y} \right\}
\]