Hidden convexity, Dines’ Theorem

In this post, I will prove Dines’ TheoremFun fact: Lloyd L. Dines was the chair of the math department at Carnegie Institute of Technology in the 30s–40s, prior to it merging with the Mellon Institute of Industrial Research to form CMU. (a neat fact about the convexity of a quadratic image of Rn{\mathbb{R}}^n) as well as a generalization to Cn{\mathbb{C}}^n. These facts will be useful when I eventually get around to writing a post about hidden convexity and the S-lemma.

new My friend Kevin Shu recently wrote a blog post giving a (relatively) elementary proof of Brickman’s Theorem, which is very related to this theorem.


Dines’ Theorem

Theorem 1 (Dines (1941)). Let A1A_1, A2SnA_2\in{\mathbb{S}}^n be n×nn\times n real symmetric matrices. Then the image of Rn{\mathbb{R}}^n under the quadratic map Q:RnR2Q:{\mathbb{R}}^n\to{\mathbb{R}}^2, given by Q:x(xA1xxA2x),\begin{aligned} Q:x\mapsto \begin{pmatrix} x^\intercal A_1 x\\ x^\intercal A_2 x \end{pmatrix},\end{aligned} is convex.

This is pretty surprising in my opinion: There are obvious counterexamples if one attempts to generalize this result. For example if one allows inhomogeneity, then the following set {(xx2):xR}\begin{aligned} \left\{\begin{pmatrix} x\\ x^2 \end{pmatrix}:\, x\in{\mathbb{R}}\right\}\end{aligned} is the graph of a parabola and is nonconvex. Similarly, if one were to allow an additional quadratic form A3SnA_3\in{\mathbb{S}}^n, then {(x12x1x2x22):xR2}\begin{aligned} \left\{\begin{pmatrix} x_1^2\\ x_1x_2\\ x_2^2 \end{pmatrix}:\, x\in{\mathbb{R}}^2\right\}\end{aligned} is (isomorphic to) the set of positive semidefinite rank-one matrices in S2{\mathbb{S}}^2, a nonconvex set (plotted below).

image

Clearly, both homogeneity and the fact that we are working in R2{\mathbb{R}}^2 will play roles in any proof of Dines’ Theorem.


Proof. Let Q(x)Q(x), Q(y)Q(Rn)Q(y)\in Q({\mathbb{R}}^n) and for notational convenience, let Q(x,y)(xA1yxA2y).\begin{aligned} Q(x,y) \coloneqq \begin{pmatrix} x^\intercal A_1 y\\ x^\intercal A_2 y \end{pmatrix}.\end{aligned}

First, suppose Q(x)Q(x) and Q(y)Q(y) are collinear in R2{\mathbb{R}}^2. Then, as Q(Rn)Q({\mathbb{R}}^n) is a cone, i.e., αqQ(Rn)\alpha q \in Q({\mathbb{R}}^n) for every α0\alpha\geq 0 and qQ(Rn)q\in Q({\mathbb{R}}^n), we have that the interval [Q(x),Q(y)][Q(x),Q(y)] is contained in Q(Rn)Q({\mathbb{R}}^n).

Next, suppose Q(x)Q(x) and Q(y)Q(y) are not collinear. Then, there exists a vR2v\in{\mathbb{R}}^2 such that v,Q(x),v,Q(y)ϵ>0\left\langle v, Q(x) \right\rangle, \left\langle v,Q(y) \right\rangle \geq \epsilon >0 (i.e., Q(x)Q(x) and Q(y)Q(y) are both bounded away from the origin in some direction vv).

Now, let z=(1μ)x+μ(±y)z = (1- \mu) x + \mu (\pm y) where we will fix the sign ±\pm later. In words, zz parameterizes the line containing xx and ±y\pm y. Note that when μ=0\mu = 0 we have Q(z)=Q(x)Q(z) = Q(x), and when μ=1\mu = 1 we have Q(z)=Q(±y)=Q(y)Q(z) = Q(\pm y) = Q(y). We will pick the sign ±\pm so that Q(z)Q(z) is also bounded away from the origin in the direction vv for all μ[0,1]\mu\in[0,1]: Note that v,Q(z)=μ2v,Q(x)+(1μ)2v,Q(y)±2μ(1μ)v,Q(x,y).\begin{aligned} \left\langle v,Q(z) \right\rangle &= \mu^2 \left\langle v,Q(x) \right\rangle + (1-\mu)^2 \left\langle v,Q(y) \right\rangle\\ &\qquad\pm 2\mu(1-\mu)\left\langle v, Q(x,y) \right\rangle.\end{aligned} In particular, by picking the sign ±\pm correctly, the last term above is nonnegative for all μ[0,1]\mu\in[0,1]. Consequently, v,Q(z)ϵ/2>0\left\langle v,Q(z) \right\rangle \geq \epsilon/2>0 for all μ[0,1]\mu\in[0,1]. Finally, leveraging the geometry of R2{\mathbb{R}}^2 and the fact that Q(Rn)Q({\mathbb{R}}^n) is a cone, we deduce that the interval [Q(x),Q(y)][Q(x), Q(y)] is contained in Q(Rn)Q({\mathbb{R}}^n). ◻


What breaks for three quadratic forms?

If we look over the above proof line by line to see what breaks when we go from two quadratic forms to three quadratic forms, we will see that not too much breaks. Obviously, something does break because of the counterexample we saw earlier, but in fact the entirety of the proof remains intact save the very last sentence.

Specifically, in the case that Q(x)Q(x) and Q(y)R3Q(y)\in{\mathbb{R}}^3 are not collinear, we can still find a path between Q(x)Q(x) and Q(y)Q(y) contained in the image Q(Rn)Q({\mathbb{R}}^n) and bounded away from zero in the direction vv. Unfortunately, we may no longer conclude that the interval [Q(x),Q(y)][Q(x), Q(y)] is contained in Q(Rn)Q({\mathbb{R}}^n), however, as this path may loop, so that it is not contained in the same plane as Q(x)Q(x) and Q(y)Q(y). See for example, the indicated points in the plotted figure above.


The power of Cn{\mathbb{C}}^n

Curiously, the three quadratic forms generalization of Dines’ Theorem is true when we switch from Rn{\mathbb{R}}^n to Cn{\mathbb{C}}^n.

Theorem 2 (???I remember reading about this result in some paper a while ago but don’t remember the correct reference at the moment. Will fix later.). Let A1,A2,A3HnA_1,A_2,A_3\in{\mathbb{H}}^n be n×nn\times n Hermitian matrices. Then the image of Cn{\mathbb{C}}^n under the quadratic map Q:CnR3Q:{\mathbb{C}}^n\to{\mathbb{R}}^3, given by Q:x(xA1xxA2xxA3x),\begin{aligned} Q: x\mapsto \begin{pmatrix} x^* A_1 x\\ x^* A_2 x\\ x^* A_3 x \end{pmatrix},\end{aligned} is convex.

The main difference between Rn{\mathbb{R}}^n and Cn{\mathbb{C}}^n, in this context, is that instead of simply being allowed to pick between a sign ±\pm for yy, we now have the freedom to pick eiθe^{i\theta} for any θ[0,2π)\theta\in[0,2\pi). Or, perhaps more suggestively, we get to pick ±eiη\pm e^{i\eta} for both some sign and some η[0,π)\eta\in [0,\pi). This extra freedom, the choice of η\eta, will allow us to force the path that we construct between Q(x)Q(x) and Q(y)Q(y) onto the plane containing Q(x)Q(x) and Q(y)Q(y). The rest of the proof follows analogously to Dines’ Theorem.

Dines, L. L. 1941. “On the Mapping of Quadratic Forms.” Bull. Amer. Math. Soc. 47 (6): 494–98.

Last updated Oct 31, 2021