Data driven algorithm design for combinatorial problems is an important aspect of modern data science and algorithm design. Rather than using off the shelf algorithms that only have worst case performance guarantees, practitioners typically optimize over large families of parametrized algorithms and tune the parameters of these algorithms using a training set of problem instances from their domain to determine a configuration with high expected performance over future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems, including partitioning and subset selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm's behavior, so the algorithm's performance is a discontinuous function of its parameters.
In this talk, I will present new work that helps put data driven combinatorial algorithm selection on firm foundations. This includes strong computational and statistical performance guarantees, both for the batch and online scenarios where a collection of typical problem instances from the given application are presented either all at once or in an online fashion, respectively. I will describe both specific examples (for clustering, partitioning, and subset selection problems) and general principles that emerge in this context.
The Machine Learning Seminar is generously supported by Duolingo
This talk will be recorded and posted on the ML website