Approximate Clustering
without the Approximation. With Avrim Blum and Anupam Gupta. SODA
2009.
Approximation algorithms for clustering points in metric spaces are a flourishing area of research, with much effort spent on better understanding the approximation guarantees possible for objective functions such as k-median, k-means, and min-sum clustering.
This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also yield better clusterings. For example, for many problems, such as clustering proteins by function or clustering images by subject, there is some unknown "correct" target clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth.
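To make the distance notion concrete: closeness in symmetric difference can be measured as the fraction of points clustered differently under the best matching of cluster labels. The helper below is a hypothetical illustration of that measure (brute-force over label matchings, so only suitable for small k), not code from the paper:

```python
from itertools import permutations

def clustering_distance(C, T):
    """Fraction of points on which clustering C disagrees with target T,
    minimized over all matchings of cluster labels.
    Labels are assumed to be integers 0..k-1."""
    k = max(max(C), max(T)) + 1
    n = len(C)
    best = n
    for perm in permutations(range(k)):
        # Relabel C via perm and count points placed differently than in T.
        mismatches = sum(1 for c, t in zip(C, T) if perm[c] != t)
        best = min(best, mismatches)
    return best / n

# Two clusterings of 6 points: after the best relabeling, exactly one
# point (index 3) is clustered differently, so the distance is 1/6.
C = [0, 0, 1, 1, 2, 2]
T = [1, 1, 0, 2, 2, 2]
print(clustering_distance(C, T))
```

A c-approximation being "ε-close to the target" then means this distance is at most ε.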
In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target), then we can produce clusterings that are O(ε)-close to the target, even for values of c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2.
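For readers unfamiliar with the objectives named above, the following is a minimal sketch (my own illustration, using 1-D points and hypothetical function names) of how the k-median and k-means costs differ: k-median sums distances to the nearest center, while k-means sums squared distances.

```python
def k_median_cost(points, centers):
    # k-median objective: sum of distances from each point to its nearest center.
    return sum(min(abs(p - c) for c in centers) for p in points)

def k_means_cost(points, centers):
    # k-means objective: sum of squared distances to the nearest center.
    return sum(min(abs(p - c) for c in centers) ** 2 for p in points)

pts = [0.0, 1.0, 10.0, 11.0]
print(k_median_cost(pts, [0.5, 10.5]))  # 2.0: each point is 0.5 from a center
print(k_means_cost(pts, [0.5, 10.5]))   # 1.0: four squared distances of 0.25
```

An algorithm is a c-approximation if the cost of the clustering it returns is at most c times the optimal cost of the chosen objective.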
Our results also highlight a somewhat surprising conceptual difference
between assuming that the optimal
solution to, say, the k-median objective is ε-close to the target, and
assuming that any approximately
optimal solution is ε-close to the target, even for an approximation factor of, say, c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet in the latter case we have an efficient algorithm.