Computational Geometry Introduction
15-750 CMU Spring 2009
Gary Miller
The Convex Hull Problem
The sorting problem of CG.
A point set A subset R^d is convex if it is closed under convex
combinations. That is, if we take any convex combination of any
two points in A, the result is a point in A.
DEF: ConvexClosure(A) = CC(A) = smallest convex set containing A
There are two different definition of the convex hull. The first one
is the one we will use. The second is used by some books.
DEF1: ConvexHull(A) = boundary of CC(A)
DEF2: ConvexHull(A) = CC(A)
A computer representation of a convex hull must include the
combinatorial structure. In two dimensions, this just means a simple
polygon in, say counter clockwise order. In many of our algorithms
we'll need to be able to traverse the hull in either direction
starting form a vertex or an edge. We'll also need to be able to add
and delete points and edges.
We can characterize when a pair is on the convex hull as follows:
(a,b) is on CH(A) iff a,b in A and forall a' in A a' is either left or on the line (a,b)
-------------------------------------------------------------------
2D Convex Hulls by divide and conquer
Let the points be P1...Pn, where Pi = (xi, yi)
Preprocess the points by sorting all of them by
their x coordinates. (This step happens just once. In all
recursive calls, we can assume the points are sorted.)
Now recursively compute the convex hull
of {P1...Pn/2}, call it LH, and and {Pn/2+1...Pn}
call it RH.
(Picture here)
Now the problem is to stitch the left and right halves together to
form a hull for all the points. Here's how to do this:
Let a be the rightmost point in LH and b be the leftmost point in
RH. Start with the line segment [a,b]. Let a'(b') be the next
point clockwise around the LH(RH) from a(b). Let a"(b") be the
next point counter-clockwise around the LH(RH) from a(b). Consider
the three points [a',a,a] if a' is right of then set a = a' and
continue. If it's on or left, then stop. Now do the analogous
walk around RH starting from b. If at some point neither walk can
proceed, the segment [a,b] is the new segment to be added to the CH
of all the points on the bottom.
We give pseudo code:
Repeat the following:
While a' is right of (a,b) do a=a'
While b" is right of (a,b) do b=b"
An analogous algorithm suffices to compute the new edge to be added
on top of the hull. Once these new edges are found, the new convex
hull is stitched together form the relevant parts.
Preprocessing takes O(n log n) to sort. After that it's a standard
divide and conquer algorithm with the following recurrence:
T(n) = 2T(n/2) + n
This solves to O(n log n).
-----------------------------------------------------------------
Lower bound for computing the convex hull
Suppose the input to a sorting problem are the numbers X1, ..., Xn
Consider the convex hull problem:
(X1,X1^2), ..., (Xn,Xn^2)
The convex hull of these points must will be the points in sorted order.
Thus:
Any comparison based CH algorithm must make \Omega( n log n )
comparisons.
-------------------------------------------------------------------
QuickSort and Backwards Analysis
Before we present and analyze a simple randomized CH algorithm, let's
go back and look at another way to analyze QuickSort called Backwards
Analysis.
We assume that the keys are distinct
Recall randomized QuickSort:
QS(M)
1) pick random b in M
2) split M into S and L with S < b < L
3) return [QS(S), b, QS(L)]
The cost of the call QS(M) (Not counting recursive calls) is the size
of the array M minus 1. (This is how many comparisons are done.)
------------------------------------
To do the backwards analysis we propose a simple game with a cost
function that has the same behavior.
Consider a dart board with n empty squares.
+-----------------------+
| | | | | | | | |
+-----------------------+
All n squares start out empty.
The dart game proceeds as follows:
While there exists a empty square do:
Pick an empty square at random. Put a dart into it. The cost
of the step is the number empty squares to the left and right
of the new dart.
The expected cost of this game is exactly the same as the expected
cost of randomized quicksort. Why? Let's prove this by induction.
In the base case, n=0 or n=1. In this case QS(M) costs zero and the
dart game costs zero. In the general case, the cost of the first dart
is n-1, and the cost of the top level call to QS(M) is just n-1 (not
counting the costs of the recursive calls). We know that the dart
game eventually fills every cell with a dart. And that the darts to
the right of the first dart have no influence on what happens to the
left of it. So we can rearrange the sequence of darts to put those
that are in the left half first, then those in the right half. This
does not change the cost of the dart game. Now by induction the cost
of the dart game in the left half is the same as quicksorting the left
half. The right half is the same.
The dart game can be played backwards. Here's how to do it.
Layout a dart board with n squares each with a dart.
While there exists a dart on the board do:
Randomly pick a dart. The cost is the number of empty squares
to the left and right of the new dart. Remove the dart.
The only random parameter in the forward dart game is the permutation
that is chosen to add the darts. Similarly the only random parameter
in the reverse dart game is the permutation that is chosen to remove
the darts. Thus if we made a movie of the dart game and watched it
backwards, it would be totally indistinguishable (drawn from the same
distribution) from a movie of the backwards dart game.
Thus, if we can analyze the expected cost of the backwards dart game,
it will tell us the expected cost of the forward dart game (and thus
quicksort).
-------------------------------------------------------------------------
Let's analyze the backwards dart game.
Suppose there are i darts left. Define Ti as follows:
Ti = Expected cost to remove a random dart from a
board of i darts.
One way of computing this cost is to take the total cost of each of
the possibilities (the dart to remove) and divide by the number of
darts (i). The total cost of all possibilities can be computed by
allocating the costs to the empty cells. So when considering the
option of removing a dart d, put a token into each of the empty cells
in the blocks of empty cells to the left and right of d. The number
of tokens distributed is the cost of the option of removing dart d.
The sum of these costs over all darts is the number of tokens on all
the empty cells. It's easy to see that each empty cell can get at
most two tokens, one from each of the two darts at the boundaries of
that block of empty cells.
Total cost of all options at step i <= 2*(# empty cells) = 2(n-i)
Ti = Average cost of all options at step i <= 2(n-i)/i = 2n/i - 2
Let E_n be the expected cost of the backwards dart game on a
board of size n.
E_n = Tn + Tn-1 + .. T1
= [2n SUM 1/i] - 2n
i = 1...n
= 2nHn - 2n = O(n log n)
This is the same as the cost of the dart game running in the forward
direction, and the expected cost of quicksort.
--------------------------------------------------
2D Random Incremental Convex Hull
Input: P = { P1, ..., Pn}, where Pi is in R^2.
These are called the "sites". No three sites are
collinear.
Output: The convex hull of P.
Recall that the representation of the convex hull is a doubly linked
data structure of the sites and the edges between them. This makes is
possible in O(1) time to remove a site or add a site to the convex
hull. This representation is used internally during the running of
the algorithm.
Procedure: RandomIncrementalCH(P)
0) Pick three sites from P, say P1, P2, and P3. Construct our
standard convex hull representation of the triangle consisting of
these three points. Also choose a point C interior to T.
1) Construct the ray from C to each other site in P, and continuing
to infinity. This ray must cross one of the edges of the
triangle.
2) Partition the sites in P by which edge of T their ray crosses.
For each edge of T we keep a bit more information in the data
structure. We keep the set of sites whose rays cross this edge of
T. This will be maintained as the algorithm proceeds below.
3) Compute a random permutation of the remaining sites, P4...Pn.
Now process each of the sites Pi in this order. (The invariant
that we maintain is that we have at any point in time the convex
hull of the sites that have been processed.) Here is how site Pi
is processed:
Looking at the ray from C-->Pi and the edge e of the current
convex hull that it crosses, determine if Pi is inside or
outside of the current convex hull. If it's inside continue
with the next site.
Now, if Pi is outside, then update the data structure with
BuildTent(Pi,e).
4) The convex hull we have at the end is the convex hull of all the
sites.
Procedure: BuildTent(p,e)
Let CH denote the current convex hull (convex hull of all the
processed sites). The goal of this step is to processes one site.
1) Find all edges visible to p on CH by searching out from e in both
directions. These are called the "visible edges".
2) Replace visible edges with two new edges (a "tent") in the CH.
The new edges will be from p to the left and right end points of
the visible region.
3) For each site in the site sets maintained for each visible edge:
assign it to whichever of the two new edges its ray crosses.
---------------------------------------------------------------------
Note about optimizing this: In an actual implementation you would not
keep around unprocessed sites who are inside the convex hull. So the
sets of sites maintained at the edges would always be outside the
current convex hull. This only makes the algorithm more efficient.
However, we'll analyze the algorithm as stated above.
---------------------------------------------------------------------
Running time analysis.
It's obvious that all work other than that done by BuildTent()
takes O(n) time. What about BuildTent()?
Let's partition the cost of BuildTent() running over the entire
computation, into two parts. The "Edge Cost" is the cost of parts 1
and 2. It's proportional to the number of visible edges. The "Ray
Cost" is the cost of part 3, and it's proportional to the number of
sites in the site sets of visible edges processed there.
Let's analyze these costs separately. It's easy to see that the total
Edge Costs are O(n). Just keep a token on each edge of the current
CH. When we run BuildTent() we remove some visible edges and add two
new edges. Use the tokens on the visible edges that are removed to
pay for their removal. Then put a new token on each of the new edges.
Thus only two new tokens are needed each time, for a total cost of 2n
tokens.
What about the Ray Costs? Here's where we use backward analysis.
Let's write down the sequence of sites (after the random permutation
of P4...Pn).
P1 P2 P3 | P4 P5 P6 ... Pi Pi+1 Pi_2 ... Pn
\_____________/ \______________/
processed unprocessed
In a backward running of the algorithm we pick a random processed
point and unprocess it. There are i-3 points to choose from (P1, P2
and P3 are not part of the randomization). The number of rays stored
in all the edges of the current CH is one for each unprocessed point,
or (n-i). What is the expected Ray Cost of the step? There are i-3
unprocessed sites to choose from to remove. The ones that are
interior to the current CH have zero ray cost. The ones on the CH
incur a ray cost equal to the number of rays crossing the edge to the
left of this site plus the number of rays crossing the edge to the
right of this site. (This situation is exactly analogous to the
situation we analyzed in removing darts from the quicksort dart board.
The role of an empty cell on the board is played by a ray to an
unprocessed site.) So among all the options of processed sites to
unprocess, each ray is counted twice. (Actually it may be less
because of the possibility that some of P1, P2 or P3 are still on the
CH, and we're not allowed to choose these.) So the expected ray cost
is:
R_i = Expected ray cost of unstep i <= 2(n-i) / (i-3)
<= 2n/(i-3)
E_n = Expected total Ray Cost <= SUM [ 2n/(i-3) ]
i = 4...n
<= 2nH(n-3) = O(n log n)
(Here H(k) is the kth harmonic number).
------------------------------------------------------