15-451 Algorithms
Fall 2014
D. Sleator
Closest Pairs November 17, 2014
------------------------------------------------------------------
We'll give two algorithms for the following proglem:
Given n points in the plane, find the pair of points that is the
closest together.
The first algorithm is a deterministic divide and conquer and runs in
O(n log n). The second one is random incremental and runs in expected
time O(n).
Preliminaries.
==============
We assume the points are presented as real number pairs (x,y).
We assume arithmetic on reals is accurate and runs in O(n) time.
We will assume that we can take the floor function of a real.
We also assume that hashing is O(1) time.
These assumptions (in this context) are resonable, because the
algorithms will not abuse this power.
O(n log n) Algorithm
====================
First we present the algorithm. Then we prove that it works.
Then we analyze its running time.
Initiation: sort the points by x, and also separately by y.
Closest_Pair (p1, p2, ... pn):
if n <= 3 then solve and return the answer.
let m=floor(n/2)
let delta = min (Closest_Pair (p1...pm), Closest Pair (pm+1...pn))
form a list of the points (sorted by increasing y) that are
within delta of the x coordinate of pm. Call these points q1, q2... qk
Let d = the min over i=2 to k of
min distance from qi to q{i-7}, q{i-6}...q{i-1}.
Return min (d,delta)
The divide and conquer approach is obvious. The closest pair is
either within the left half, within the right half, or it has one
endpoint in the left half and one in the right half.
The only tricky part of proving that this works is the 2nd to last
part where we look at the 7 points below qi. Why is this sufficient?
See figure 5.7 of Kleinberg and Tardos.
Note that each box of size delta/2 can contain at most one point.
Assume WLOG that qi is in the left side. Then there are at most 3
points on the left that are within delta of qi, and at most 4 on the
right that are within delta of qi. Since we're only looking for a
point that is closer than delta to qi, we only have to look at those
seven points. This establishes the correctness of the algorithm.
As for running time, the initiation phase sorts by x and also by y.
This is O(n log n). Subsequent phases just filter these sorted lists,
and no further sorting is done.
The algorithm partitions the points (O(n)), does two recursive calls
of n/2 in size, scans the points again to form the list q (O(n)),
then scans the list q looking for the closest side crossing pair
(O(n)).
So we get the classic divide and conquer recurrence:
T(n) = 2 T(n/2) + n
which solves to O(n log n).
Sariel Har-Peled's Randomized O(n) Algorithm for closest pair
=============================================================
For any set of points P, let CP(P) be the closest pair
distance in P.
We're going to define a "grid" data structure, and an API for it. The
grid (denoted G) stores a set of points, (we'll call P) and also
stores the closest pair distance r for those points. The number r is
called "the grid size of G". Here's the API:
MakeGrid(p,q): Make and return a new grid using r=dist(p,q)
as the initial grid size.
Lookup(G,p): p is a point, G is a grid. This returns
two types of answers. Let r' be the closest
distance from p to a point in P. If r' < r
then return r'. If r'>r return "Not Closest".
Note that lookup() does not need to compute
r' if r' >= r.
Insert(G,p): G is a grid. p is a new point not in the
grid. This inserts p into the grid. It returns
the current grid size.
Here's how we implement these. First define a function
Boxify(x,y,r). This returns the integer point ((floor(x/r),floor(y/r)).
The data structure maintains a hash table whose keys are integer
pairs. The values in the hash table are lists of points from P. So
the key (i,j) (also called a box) in the hash table stores all the
points of P whose Boxify() value is (i,j). Also, it is inductively
maintained that the grid size r is always equal to CP(P), the
closest pair distance for the set of points being stored.
MakeGrid(p,q) is trivial. Just insert p and q into a new table
with r = dist(p,q).
Lookup(G,p) computes Boxify(p,r). It then looks in that box, and the
8 surrounding ones, and computes the distance between p and the
closest one of these. Call this number r'. If r'r then return "Not Closest". This works because we know
that if there is a point closer to p than r, it must be in one of the
9 boxes that are searched by this function. Also note that the
running time of this is O(1) because it does 9 lookups in the hash
table, and the total number of points it has to consider is at most
36. (A box contains at most 4 points.)
Insert(G,p) works as follows. It first does a Lookup(p,G). If the
result is "Not Closest" it just inserts p into the data structure
into box Boxify(p). This is O(1) time. On the other hand if the
Lookup() returns r' [] in
let add_to_h h i box =
Hashtbl.replace h box (i::(getbox h box))
in
let make_grid i r =
(* put points p.(0) ... p.(i) into a new grid of size r.
it has already been established that the closest pair
in that point set has distance r *)
let h = Hashtbl.create 10 in
for j=0 to i do
add_to_h h j (boxify p.(j) r)
done;
h
in
Random.self_init ();
let swap i j =
let (pi,pj) = (p.(i),p.(j)) in
p.(i) <- pj;
p.(j) <- pi
in
for i=0 to n-2 do
let r = Random.int (n-i) in
swap i (i+r)
done;
let rec loop h i r =
(* already built the table for points 0...i, and they have dist r *)
if i=n-1 then r else
let i = i+1 in
let (ix,iy) = boxify p.(i) r in
let li = ref [] in
for x = ix-1 to ix+1 do
for y = iy-1 to iy+1 do
li := (getbox h (x,y)) @ !li
done
done;
let r' = List.fold_left (
fun ac j -> min (dist i j) ac
) max_float !li in
if r' < r then (
loop (make_grid i r') i r'
) else (
add_to_h h i (ix,iy);
loop h i r
)
in
let r0 = dist 0 1 in
loop (make_grid 1 r0) 1 r0
let () =
let p = [|(0.0,0.0); (3.0,4.0); (20.0,15.0); (15.0,20.0)|] in
let answer = closest_pair p in
Printf.printf "%f\n" answer;