Computer Science Thesis Oral
- Gates Hillman Centers
- Traffic21 Classroom 6501
- COLIN WHITE
- Ph.D. Student
- Computer Science Department
- Carnegie Mellon University
New Aspects of Beyond Worst-Case Analysis
Traditionally, the theory of algorithms has focused on the analysis of worst-case instances. While this has led to a thorough understanding of a wide variety of problems, there are many problems for which worst-case analysis is not useful for empirical or real-world instances. A rapidly developing line of research, the so-called beyond worst-case analysis of algorithms (BWCA), considers the design and analysis of problems using more realistic models or using natural structural properties. In this thesis, we continue the line of work of BWCA for clustering and distributed learning by making contributions in several areas. Specifically, we focus on three main problems and models.
i) Clustering has benefited greatly from BWCA. The perturbation resilience assumption proposed by Bilu and Linial (2011) states that the optimal clustering does not change under small perturbations to the input distances. We design efficient algorithms that output optimal or near-optimal clusterings for the canonical k-center objective (in both symmetric and asymmetric forms) under perturbation resilience.
ii) An algorithm’s performance may vary drastically between two applications. We consider a data-driven approach, in which a clustering application is represented by a distribution over problem instances, and the goal is to find the best algorithm for the (unknown) distribution. We define rich, infinite classes of linkage algorithms with dynamic programming, as well as local search algorithms with initialization, generalizing the k-means algorithm. We bound the pseudo-dimension of these classes, which leads to computational- and sample-efficient meta-algorithms for some of the algorithm classes we study.
iii) We give a data-dependent dispatching algorithm for distributed machine learning, which takes advantage of the fact that similar data points often belong to the same or similar classes. We provide new dispatching algorithms which cast the problem as clustering with important balance and fault-tolerance conditions. Finally, we design general and robust distributed clustering algorithms.
Maria-Florina Balcan (Chair)
Avrim Blum (Toyota Technological Institute at Chicago)
Yury Makarychev (Toyota Technological Institute at Chicago)