CMU Theory Lunch

Parallel Approximation Algorithms for Metric Facility Location

Or Sheffet

Nov 03, 2010

Abstract: We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces. In 2006 Ostrovsky et al. proposed an interesting condition under which one can achieve better k-means approximations in time polynomial in n and k. They consider k-means instances where the optimal k-clustering has cost noticeably smaller than the cost of any (k-1)-clustering, motivated by the idea that ``if a near-optimal k-clustering can be achieved by a partition into fewer than k clusters, then that smaller value of k should be used to cluster the data''. Under the assumption that the ratio of the cost of the optimal (k-1)-means clustering to the cost of the optimal k-means clustering is at least max{100,1/\epsilon2}, Ostrovsky et al. show that one can obtain a (1+f(\epsilon))-approximation for k-means in time polynomial in n and k, by using a variant of Lloyd's algorithm. In this work we improve their result by giving a PTAS for the k-means problem under the assumption that the ratio of the cost of the optimal (k-1)-means clustering to the cost of the optimal k-means clustering is at least (1+alpha), for some constant alpha >= 0. This result also extends to the k-median clustering problem. In addition our technique also obtains a PTAS under the assumption of Balcan et al. that all (1+\alpha) approximations are delta-close to a desired target clustering, in the case that all target clusters have size greater than delta*n and alpha > 0 is constant. Our results are based on a new notion of clustering stability, which extends both stability assumptions mentioned above. This is joint work with Pranjal Awasthi and Avrim Blum.