Stability Yields a PTAS for k-Median and k-Means

Pranjal Awasthi, Avrim Blum and Or Sheffet
[Paper]

Abstract

Ostrovsky et al. [ORSS06] show that given n points in Euclidean space such that the optimal (k-1)-means clustering is a factor 1/epsilon^2 more expensive than the best k-means clustering, one can get a (1+f(epsilon))-approximation to k-means in time poly(n,k) by using a variant of Lloyd's algorithm. In this work we show we can replace the "1/epsilon^2" with just "1+alpha" for any constant alpha>0 and obtain a PTAS. In particular, under this assumption, for any epsilon>0 we can achieve a 1+epsilon approximation for k-means in time polynomial in n and k, and exponential in 1/epsilon and 1/alpha (our running time is n^O(1) * (k log n)^poly(1/epsilon,1/alpha).). We thus decouple the strength of the assumption from the quality of the approximation ratio. We give a PTAS for k-median in finite metrics under the analogous assumption.
We also show we can obtain a PTAS under the assumption of Balcan-Blum-Gupta09 (see below) that all 1+alpha approximations are delta-close to a desired target clustering, in the case that all target clusters have size greater than 2delta n and alpha is constant. Note that the point of BBG09 is that the true goal in clustering is usually to get close to the target rather than to achieve a good objective value. From this perspective, our advance is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(delta) to delta when all target clusters are large, and for k-median we improve the "largeness" condition in BBG09 needed to get exactly delta-close from O(delta*n) to delta*n. Our results are based on a new notion of clustering stability.


Personal Comments

A motivation that is not well-detailed in the paper is to take the BBG-assumption to the extreme. Balcan, Blum and Gupta work under the premise that for some alpha we have the guarantee that any (1+\alpha) approximation for the k-Median objective must cluster the majority of points correctly. One can show that our notion of Weak-Deletion Stability comes from the premise that for some alpha, all (1+\alpha) approximations for the k-Median objective must indeed use k clusters and cannot have only (k-1) clusters. See presentation-slides for more details.