15-888 Computational Game Solving (Fall 2021) Tue, September 21st 2021

Lecture 5

Regret circuits and the Counterfactual Regret Minimization (CFR) paradigm

Instructor: Gabriele Farina

In Lecture 4 we studied online optimization for simplex domains. In this lecture we show that it is possible to construct an efficient regret minimizer that outputs sequence-form strategies, by combining arbitrary local regret minimizers at each decision point of a tree-form sequential decision-making problem, including the RM and RM+ algorithms discussed before. The resulting algorithm has been known since the seminal work by Zinkevich et al. [2007] under the name counterfactual regret minimization (CFR).

The analysis we present is based on the formalism of regret circuits [Farina et al., 2019].

1 Regret circuits

Regret circuits are a constructive answer to the question: “given regret minimizers for two sets X , Y, can we

compose these regret minimizers for various operations performed on the sets (e.g., intersection, convex hull,

Cartesian product), in order to arrive at a regret minimizer for the resulting composite set?”. In particular, in

the following subsections, we will show that once regret minimizers for two sets X and Y are known, they can

always be efficiently combined to come up with composite regret minimizers for the Cartesian product X × Y

and the convex hull co{X , Y}.

Since, as we saw in Lecture 2, any sequence-form strategy space Q can be defined inductively starting from

convex hulls and Cartesian products, we can keep combining regret circuits until we construct an algorithm

for Q. The resulting family of algorithms is known as counterfactual regret minimization (CFR).

1.1 Regret circuit for Cartesian product

Let X ⊆ ℝ^m and Y ⊆ ℝ^n be two sets, and let R_X and R_Y be regret minimizers for X and Y respectively. We can combine R_X and R_Y to form a regret minimizer for the Cartesian product X × Y ⊆ ℝ^{m+n} by doing the following:

• To output the next strategy, we ask R_X and R_Y for their next strategies—call them x^t and y^t—and return the pair (x^t, y^t) ∈ X × Y.

• Any utility vector ℓ ∈ ℝ^{m+n} can be written as ℓ = (ℓ_X, ℓ_Y) ∈ ℝ^m × ℝ^n. When at time t we receive the utility vector ℓ^t = (ℓ^t_X, ℓ^t_Y), we forward ℓ^t_X to R_X and ℓ^t_Y to R_Y.

The algorithm we just described is summarized in Algorithm 1 and given pictorially in Figure 1.

Algorithm 1: Regret minimizer for X × Y

1 function NextStrategy()
2     x^t ← R_X.NextStrategy()
3     y^t ← R_Y.NextStrategy()
4     return (x^t, y^t) ∈ X × Y

5 function ObserveUtility(ℓ^t = (ℓ^t_X, ℓ^t_Y) ∈ ℝ^m × ℝ^n)
6     R_X.ObserveUtility(ℓ^t_X)
7     R_Y.ObserveUtility(ℓ^t_Y)

Figure 1: Regret circuit for the Cartesian product X × Y.

Analysis. The regret minimizers R_X and R_Y output strategies x^t, y^t and observe utilities ℓ^t_X, ℓ^t_Y at all times t. Hence, the regret cumulated by R_X and R_Y up to time T is given by

R^T_X = max_{x̂ ∈ X} { ∑_{t=1}^T (ℓ^t_X)^⊤ x̂ − (ℓ^t_X)^⊤ x^t },    R^T_Y = max_{ŷ ∈ Y} { ∑_{t=1}^T (ℓ^t_Y)^⊤ ŷ − (ℓ^t_Y)^⊤ y^t }.

In order to verify that Algorithm 1 indeed yields a regret minimizer, we study the regret cumulated by Algorithm 1 up to any time T. Starting from the definition of regret, we have

R^T_{X×Y} = max_{(x̂, ŷ) ∈ X×Y} { ∑_{t=1}^T (ℓ^t_X, ℓ^t_Y)^⊤ (x̂, ŷ) − (ℓ^t_X, ℓ^t_Y)^⊤ (x^t, y^t) }

          = max_{x̂ ∈ X} { ∑_{t=1}^T (ℓ^t_X)^⊤ x̂ − ∑_{t=1}^T (ℓ^t_X)^⊤ x^t } + max_{ŷ ∈ Y} { ∑_{t=1}^T (ℓ^t_Y)^⊤ ŷ − ∑_{t=1}^T (ℓ^t_Y)^⊤ y^t }

          = R^T_X + R^T_Y,

where the second equality follows because the objective decomposes into independent terms in x̂ and ŷ.

So, if R_X and R_Y guarantee sublinear regret, then so does Algorithm 1. In other words, it is possible

to minimize regret on X × Y by simply minimizing it on X and Y independently and then combining the

decisions.

In summary, we have proven the following result.

Theorem 1.1. The regret circuit of Figure 1 provides a regret minimizer for the Cartesian product X × Y that satisfies the regret bound

R^T_{X×Y} = R^T_X + R^T_Y.

The extension to the Cartesian product of more than two sets is direct.
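As a concrete illustration, here is a minimal Python sketch of Algorithm 1 (the class and method names here are ours, not from any library), using regret matching (RM) from Lecture 4 as each local regret minimizer, with X and Y taken to be probability simplexes:

```python
import numpy as np

class RegretMatching:
    """Regret matching (RM) over the n-dimensional probability simplex."""
    def __init__(self, n):
        self.n = n
        self.cum_regret = np.zeros(n)
        self.last = np.full(n, 1.0 / n)

    def next_strategy(self):
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        # Play proportionally to positive cumulated regrets (uniform if all <= 0).
        self.last = pos / total if total > 0 else np.full(self.n, 1.0 / self.n)
        return self.last

    def observe_utility(self, u):
        # Accumulate instantaneous regrets against the strategy just played.
        self.cum_regret += u - (u @ self.last)

class CartesianProductCircuit:
    """Regret circuit for X x Y (Algorithm 1): concatenate the local strategies,
    and split each utility vector between the two local regret minimizers."""
    def __init__(self, rx, ry, m):
        self.rx, self.ry, self.m = rx, ry, m   # m = dimension of X

    def next_strategy(self):
        return np.concatenate([self.rx.next_strategy(), self.ry.next_strategy()])

    def observe_utility(self, u):
        self.rx.observe_utility(u[:self.m])
        self.ry.observe_utility(u[self.m:])
```

Per Theorem 1.1, the regret of the product circuit equals R^T_X + R^T_Y exactly, which can be checked numerically on random utility sequences.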

1.2 Regret circuit for convex hull

We now show how to combine R_X and R_Y for two sets X ⊆ ℝ^n and Y ⊆ ℝ^n to construct a regret minimizer for the convex hull co{X, Y}. The construction is harder than in the case of Cartesian products, but some of the ideas carry over. In addition to R_X and R_Y, we will use a third regret minimizer R_∆ for the 2-simplex ∆² to decide how to “mix” x^t and y^t. More precisely, we will do the following.

• To output the next strategy, we ask R_X and R_Y for their next strategies—call them x^t and y^t. Then, we ask R_∆ for its next strategy λ^t = (λ^t_1, λ^t_2) ∈ ∆², and return the convex combination λ^t_1 x^t + λ^t_2 y^t ∈ co{X, Y}.

• When at time t we receive the utility vector ℓ^t ∈ ℝ^n, we forward ℓ^t to both R_X and R_Y. Then, we forward to R_∆ the utility vector

    ℓ^t_∆ := ( (ℓ^t)^⊤ x^t , (ℓ^t)^⊤ y^t ).    (1)

The algorithm we just described is summarized in Algorithm 2 and given pictorially in Figure 2.

Algorithm 2: Regret minimizer for co{X, Y}

1 function NextStrategy()
2     x^t ← R_X.NextStrategy()
3     y^t ← R_Y.NextStrategy()
4     λ^t = (λ^t_1, λ^t_2) ∈ ∆² ← R_∆.NextStrategy()
5     return λ^t_1 x^t + λ^t_2 y^t ∈ co{X, Y}

6 function ObserveUtility(ℓ^t)
7     R_X.ObserveUtility(ℓ^t)
8     R_Y.ObserveUtility(ℓ^t)
9     R_∆.ObserveUtility(ℓ^t_∆ := ((ℓ^t)^⊤ x^t, (ℓ^t)^⊤ y^t))
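A matching Python sketch of Algorithm 2 (same caveats as before: the names are ours, and regret matching stands in for each of R_X, R_Y, and R_∆, with R_∆ simply a regret minimizer on the 2-simplex):

```python
import numpy as np

class RegretMatching:
    """Regret matching (RM) over the n-dimensional probability simplex."""
    def __init__(self, n):
        self.n = n
        self.cum_regret = np.zeros(n)
        self.last = np.full(n, 1.0 / n)

    def next_strategy(self):
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        self.last = pos / total if total > 0 else np.full(self.n, 1.0 / self.n)
        return self.last

    def observe_utility(self, u):
        self.cum_regret += u - (u @ self.last)

class ConvexHullCircuit:
    """Regret circuit for co{X, Y} (Algorithm 2): mix the two local strategies
    with weights lambda^t produced by a regret minimizer on Delta^2."""
    def __init__(self, rx, ry, rdelta):
        self.rx, self.ry, self.rdelta = rx, ry, rdelta

    def next_strategy(self):
        self.xt = self.rx.next_strategy()
        self.yt = self.ry.next_strategy()
        self.lam = self.rdelta.next_strategy()   # (lambda_1, lambda_2) in Delta^2
        return self.lam[0] * self.xt + self.lam[1] * self.yt

    def observe_utility(self, u):
        # Forward the raw utility vector to R_X and R_Y ...
        self.rx.observe_utility(u)
        self.ry.observe_utility(u)
        # ... and the two realized utilities of Equation (1) to R_Delta.
        self.rdelta.observe_utility(np.array([u @ self.xt, u @ self.yt]))
```

Note that R_∆ only ever observes the two scalars (ℓ^t)^⊤ x^t and (ℓ^t)^⊤ y^t, exactly as prescribed by Equation (1).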

Figure 2: Regret circuit for the convex hull co{X, Y}. The utility vector ℓ^t_∆ is defined in Equation (1).

Analysis. The regret minimizers R_X and R_Y output strategies x^t, y^t and observe utilities ℓ^t at all times t. Hence, the regret cumulated by R_X and R_Y up to time T is given by

R^T_X = max_{x̂ ∈ X} { ∑_{t=1}^T (ℓ^t)^⊤ x̂ − (ℓ^t)^⊤ x^t }    (2)

R^T_Y = max_{ŷ ∈ Y} { ∑_{t=1}^T (ℓ^t)^⊤ ŷ − (ℓ^t)^⊤ y^t }.    (3)

The regret minimizer R_∆ outputs strategies λ^t = (λ^t_1, λ^t_2) and observes utility vectors ℓ^t_∆ at all times t. So, the regret cumulated by R_∆ is given by

R^T_∆ := max_{λ̂ ∈ ∆²} { ∑_{t=1}^T ( λ̂_1 (ℓ^t)^⊤ x^t + λ̂_2 (ℓ^t)^⊤ y^t ) } − ∑_{t=1}^T ( λ^t_1 (ℓ^t)^⊤ x^t + λ^t_2 (ℓ^t)^⊤ y^t ).

In order to verify that Algorithm 2 indeed yields a regret minimizer, we study the regret it cumulates up to any time T. Starting from the definition of regret, we have

R^T = max_{λ̂ ∈ ∆², x̂ ∈ X, ŷ ∈ Y} { ∑_{t=1}^T (ℓ^t)^⊤ (λ̂_1 x̂ + λ̂_2 ŷ) − (ℓ^t)^⊤ (λ^t_1 x^t + λ^t_2 y^t) }

    = max_{λ̂ ∈ ∆², x̂ ∈ X, ŷ ∈ Y} { λ̂_1 ∑_{t=1}^T (ℓ^t)^⊤ x̂ + λ̂_2 ∑_{t=1}^T (ℓ^t)^⊤ ŷ } − ∑_{t=1}^T ( λ^t_1 (ℓ^t)^⊤ x^t + λ^t_2 (ℓ^t)^⊤ y^t ).    (4)

Now, we make two important observations. First,

max_{λ̂ ∈ ∆², x̂ ∈ X, ŷ ∈ Y} { λ̂_1 ∑_{t=1}^T (ℓ^t)^⊤ x̂ + λ̂_2 ∑_{t=1}^T (ℓ^t)^⊤ ŷ } = max_{λ̂ ∈ ∆²} { λ̂_1 max_{x̂ ∈ X} ∑_{t=1}^T (ℓ^t)^⊤ x̂ + λ̂_2 max_{ŷ ∈ Y} ∑_{t=1}^T (ℓ^t)^⊤ ŷ },

since for any fixed λ̂ ∈ ∆² the maximizations over x̂ and ŷ are independent.
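From here the analysis can be concluded along the following lines (a sketch of the standard regret-circuits argument; see Farina et al. [2019] for the full statement and proof):

```latex
% By (2) and (3), the inner maxima equal the realized utilities plus the local regrets:
\max_{\hat\lambda\in\Delta^2}\Big\{\hat\lambda_1 \max_{\hat x\in\mathcal{X}}\sum_{t=1}^T (\ell^t)^\top \hat x
  + \hat\lambda_2 \max_{\hat y\in\mathcal{Y}}\sum_{t=1}^T (\ell^t)^\top \hat y\Big\}
= \max_{\hat\lambda\in\Delta^2}\Big\{\hat\lambda_1\Big(\sum_{t=1}^T (\ell^t)^\top x^t + R^T_{\mathcal{X}}\Big)
  + \hat\lambda_2\Big(\sum_{t=1}^T (\ell^t)^\top y^t + R^T_{\mathcal{Y}}\Big)\Big\}
\le \max_{\hat\lambda\in\Delta^2}\Big\{\hat\lambda_1\sum_{t=1}^T (\ell^t)^\top x^t
  + \hat\lambda_2\sum_{t=1}^T (\ell^t)^\top y^t\Big\} + \max\{R^T_{\mathcal{X}},\, R^T_{\mathcal{Y}}\}.
% Substituting into (4) and recognizing the definition of R^T_\Delta gives the bound
R^T \;\le\; R^T_\Delta + \max\{R^T_{\mathcal{X}},\, R^T_{\mathcal{Y}}\},
% so the convex-hull circuit minimizes regret whenever its three components do.
```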
