List-Decodable Linear Regression
October 23, 2019 (GHC 6115)

Suppose we are given $n$ noisy linear equations over the reals with $\alpha n$ inliers and $(1-\alpha) n$ outliers. The inliers are chosen as follows: fix an unknown vector $L \in \mathbb{R}^d$, choose $x_1, x_2, \ldots, x_{\alpha n}$ i.i.d. from a distribution $D$ on $\mathbb{R}^d$, and let $y_i = \langle L, x_i \rangle$ for each $i$. The outliers are completely arbitrary and may be chosen adversarially with respect to the inliers.
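For concreteness, here is a minimal Python sketch of this generative model. The dimension, sample size, choice of $D$ (a standard Gaussian), and the particular outlier strategy are all illustrative assumptions; a real adversary may place the outliers arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, alpha = 10, 1000, 0.2        # illustrative parameters, not from the talk
L = rng.standard_normal(d)          # the unknown regressor, hidden from the algorithm
m = int(alpha * n)                  # number of inliers

# Inliers: x_i ~ D (here D = N(0, I_d)), with y_i = <L, x_i> exactly.
X_in = rng.standard_normal((m, d))
y_in = X_in @ L

# Outliers: arbitrary; as one stand-in, equations consistent with a decoy vector.
L_bad = rng.standard_normal(d)
X_out = rng.standard_normal((n - m, d))
y_out = X_out @ L_bad

# The algorithm sees all n equations together, unlabeled and shuffled.
X = np.vstack([X_in, X_out])
y = np.concatenate([y_in, y_out])
perm = rng.permutation(n)
X, y = X[perm], y[perm]
```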

Here $\alpha$, the fraction of inliers, can be $\ll 1/2$, so only a minority of the given equations may be inliers. Our goal is to find a list of $O(1/\alpha)$ candidate vectors such that $L$ is close in $\ell_2$ distance to some member of the list.
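In other words, a candidate list is judged by its best member. A hypothetical helper capturing this success criterion might look like the following.

```python
import numpy as np

def list_decoding_error(candidates, L):
    """Distance from the true regressor L to the closest vector in the list."""
    return min(np.linalg.norm(Lhat - L) for Lhat in candidates)
```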

In this talk, I'll describe an algorithm running in time $O((d/\alpha)^{1/\alpha^8})$ that solves this problem whenever $D$ satisfies a condition called certifiable anti-concentration. This condition is satisfied by the standard Gaussian and, more generally, by any spherically symmetric distribution.
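Certifiable anti-concentration additionally demands a low-degree sum-of-squares proof of the anti-concentration inequality, which is beyond a short snippet. The sketch below, with hypothetical names and parameters, only Monte Carlo estimates the plain inequality $\Pr_{x \sim D}[\,|\langle v, x \rangle| \le \delta \|v\|_2\,] \lesssim \delta$ for the standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

def projection_mass_near_zero(sample_fn, v, delta, trials=200_000):
    """Monte Carlo estimate of Pr[|<v/||v||, x>| <= delta] for x ~ D."""
    x = sample_fn(trials)
    proj = x @ (v / np.linalg.norm(v))
    return np.mean(np.abs(proj) <= delta)

d, delta = 20, 0.1
v = rng.standard_normal(d)
gaussian = lambda t: rng.standard_normal((t, d))

# For N(0, I), the projection onto any unit direction is a standard normal,
# so the mass in [-delta, delta] is about 2 * delta / sqrt(2 * pi) ~ 0.08.
print(projection_mass_near_zero(gaussian, v, delta))
```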

On the other hand, I'll also show why this problem is not "identifiable", i.e., it is unsolvable even given unbounded time and samples, whenever the distribution $D$ of the inlier $x_i$'s fails to satisfy a certain anti-concentration property. Surprisingly, even "nice" distributions such as the uniform distribution on the Boolean hypercube are not anti-concentrated.
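To see this concretely with a standard example: for $x$ uniform on $\{-1,+1\}^d$ and $v = (e_1 - e_2)/\sqrt{2}$, the projection $\langle v, x \rangle = (x_1 - x_2)/\sqrt{2}$ is exactly $0$ with probability $1/2$, so equations generated by $L$ are also exactly consistent with $L + tv$ for every $t$ on about half the inliers, and no finite list can single out $L$. The check below verifies this numerically under the same illustrative setup as above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Uniform on the Boolean hypercube is not anti-concentrated: the projection
# onto v = (e1 - e2)/sqrt(2) is exactly zero whenever x1 == x2.
d, trials = 20, 200_000
x = rng.choice([-1.0, 1.0], size=(trials, d))
v = np.zeros(d)
v[0], v[1] = 1 / np.sqrt(2), -1 / np.sqrt(2)

print(np.mean(x @ v == 0.0))  # ~0.5, versus ~delta for a Gaussian projection
```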

The algorithm is based on a new framework for list-decodable estimation using the sum-of-squares method.

Based on joint work with Sushrut Karmalkar and Adam Klivans.