Efficient Learning Algorithms under (Heavy) Contamination
October 8, 2025 (GHC 8102)

In this talk, I will present a series of new results on supervised learning from contaminated datasets, based on a general outlier removal algorithm inspired by recent work on learning with distribution shift (an illustrative sketch of the outlier removal idea appears at the end of this abstract). Specifically:

We will show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This resolves a longstanding gap between the complexity of agnostic learning and that of learning with contamination: it was widely believed that low-degree approximators implied tolerance only to label noise, not to corruptions of the examples themselves.
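To fix ideas, here is one standard way to formalize these two hypotheses (the talk's exact definitions may differ in constants and norms): a class $\mathcal{C}$ is $\varepsilon$-approximated by degree-$d$ polynomials under a distribution $D$ if for every $f \in \mathcal{C}$ there is a polynomial $p$ of degree at most $d$ with

\[
\mathbb{E}_{x \sim D}\!\left[(f(x) - p(x))^2\right] \le \varepsilon,
\]

and $D$ is hypercontractive (in the $(4,2)$ sense) for degree-$d$ polynomials if every such $p$ satisfies

\[
\mathbb{E}_{x \sim D}\!\left[p(x)^4\right]^{1/4} \le C^{d}\, \mathbb{E}_{x \sim D}\!\left[p(x)^2\right]^{1/2}
\]

for some constant $C$; the uniform distribution on the hypercube and the standard Gaussian are the canonical examples.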

For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even under heavy additive contamination, where the adversarially added points may make up far more than half of the training set. Prior work in this direction held only for regression, and only in the list-decodable setting.
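For reference, a standard definition (again, possibly differing in details from the one used in the talk): $f$ admits degree-$d$ $\varepsilon$-sandwiching approximators under $D$ if there are polynomials $p_{\mathrm{up}}, p_{\mathrm{down}}$ of degree at most $d$ with

\[
p_{\mathrm{down}}(x) \le f(x) \le p_{\mathrm{up}}(x) \ \text{ for all } x, \qquad \mathbb{E}_{x \sim D}\!\left[p_{\mathrm{up}}(x) - p_{\mathrm{down}}(x)\right] \le \varepsilon.
\]

Intuitively, the pointwise inequality is what remains usable when an adversary adds arbitrary points, since it holds everywhere rather than merely on average over $D$.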

These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart. As a notable application, our framework yields the first quasipolynomial-time algorithm for learning constant-depth circuits (AC⁰) under bounded contamination, extending the seminal result of Linial, Mansour, and Nisan on learning AC⁰ under adversarial label noise.
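The abstract only names the outlier-removal approach, so purely as an illustration (this is my sketch, not the algorithm from the talk; the reference distribution, thresholds, and function names are all hypothetical choices), here is what moment-based outlier removal typically looks like: compare empirical low-degree moments of the sample against their known values under the clean distribution, and trim the points most responsible for any violation.

```python
import itertools
import numpy as np

def remove_outliers(X, slack=1.5, trim=0.01, max_rounds=50):
    """Return a boolean keep-mask for the rows of X (shape m x n).

    Hypothetical sketch: assumes the clean marginal is N(0, I), under which
    every multilinear monomial of degree <= 2 has second moment exactly 1
    (E[x_i^2] = 1 and E[(x_i x_j)^2] = 1 for i != j). Repeatedly find a
    monomial whose empirical second moment exceeds `slack` times that clean
    value, and delete the `trim` fraction of points contributing the most.
    """
    m, n = X.shape
    keep = np.ones(m, dtype=bool)
    monomials = [(i,) for i in range(n)] + list(itertools.combinations(range(n), 2))
    for _ in range(max_rounds):
        Xk = X[keep]
        worst_val, worst_scores = -np.inf, None
        for mono in monomials:
            # Per-point contribution to the empirical second moment of this monomial.
            scores = np.prod(Xk[:, list(mono)], axis=1) ** 2
            if scores.mean() > worst_val:
                worst_val, worst_scores = scores.mean(), scores
        if worst_val <= slack:      # every moment is close to its clean value
            return keep
        idx = np.flatnonzero(keep)  # positions of currently kept points
        cutoff = np.quantile(worst_scores, 1 - trim)
        keep[idx[worst_scores >= cutoff]] = False
    return keep
```

On clean Gaussian data the loop typically exits on the first round; if a few adversarial points with one enormous coordinate are planted, the corresponding $x_i^2$ moment blows up and exactly those points are trimmed first.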