Stability Yields a PTAS for k-Median and k-Means

Pranjal Awasthi, Avrim Blum and Or Sheffet
[Paper]

Abstract

Ostrovsky et al. [ORSS06] show that given n points in Euclidean space such that the optimal (k-1)-means clustering is a factor 1/epsilon^2 more expensive than the best k-means clustering, one can get a (1+f(epsilon))-approximation to k-means in time poly(n,k) by using a variant of Lloyd's algorithm. In this work we show we can replace the "1/epsilon^2" with just "1+alpha" for any constant alpha>0 and obtain a PTAS. In particular, under this assumption, for any epsilon>0 we can achieve a 1+epsilon approximation for k-means in time polynomial in n and k, and exponential in 1/epsilon and 1/alpha (our running time is n^O(1) * (k log n)^poly(1/epsilon,1/alpha).). We thus decouple the strength of the assumption from the quality of the approximation ratio. We give a PTAS for k-median in finite metrics under the analogous assumption.
We also show we can obtain a PTAS under the assumption of Balcan-Blum-Gupta09 (see below) that all 1+alpha approximations are delta-close to a desired target clustering, in the case that all target clusters have size greater than 2delta n and alpha is constant. Note that the point of BBG09 is that the true goal in clustering is usually to get close to the target rather than to achieve a good objective value. From this perspective, our advance is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(delta) to delta when all target clusters are large, and for k-median we improve the "largeness" condition in BBG09 needed to get exactly delta-close from O(delta*n) to delta*n. Our results are based on a new notion of clustering stability.