Computational Geometry Introduction 15-750 CMU Spring 2009 Gary Miller The Convex Hull Problem The sorting problem of CG. A point set A subset R^d is convex if it is closed under convex combinations. That is, if we take any convex combination of any two points in A, the result is a point in A. DEF: ConvexClosure(A) = CC(A) = smallest convex set containing A There are two different definition of the convex hull. The first one is the one we will use. The second is used by some books. DEF1: ConvexHull(A) = boundary of CC(A) DEF2: ConvexHull(A) = CC(A) A computer representation of a convex hull must include the combinatorial structure. In two dimensions, this just means a simple polygon in, say counter clockwise order. In many of our algorithms we'll need to be able to traverse the hull in either direction starting form a vertex or an edge. We'll also need to be able to add and delete points and edges. We can characterize when a pair is on the convex hull as follows: (a,b) is on CH(A) iff a,b in A and forall a' in A a' is either left or on the line (a,b) ------------------------------------------------------------------- 2D Convex Hulls by divide and conquer Let the points be P1...Pn, where Pi = (xi, yi) Preprocess the points by sorting all of them by their x coordinates. (This step happens just once. In all recursive calls, we can assume the points are sorted.) Now recursively compute the convex hull of {P1...Pn/2}, call it LH, and and {Pn/2+1...Pn} call it RH. (Picture here) Now the problem is to stitch the left and right halves together to form a hull for all the points. Here's how to do this: Let a be the rightmost point in LH and b be the leftmost point in RH. Start with the line segment [a,b]. Let a'(b') be the next point clockwise around the LH(RH) from a(b). Let a"(b") be the next point counter-clockwise around the LH(RH) from a(b). Consider the three points [a',a,a] if a' is right of then set a = a' and continue. If it's on or left, then stop. Now do the analogous walk around RH starting from b. If at some point neither walk can proceed, the segment [a,b] is the new segment to be added to the CH of all the points on the bottom. We give pseudo code: Repeat the following: While a' is right of (a,b) do a=a' While b" is right of (a,b) do b=b" An analogous algorithm suffices to compute the new edge to be added on top of the hull. Once these new edges are found, the new convex hull is stitched together form the relevant parts. Preprocessing takes O(n log n) to sort. After that it's a standard divide and conquer algorithm with the following recurrence: T(n) = 2T(n/2) + n This solves to O(n log n). ----------------------------------------------------------------- Lower bound for computing the convex hull Suppose the input to a sorting problem are the numbers X1, ..., Xn Consider the convex hull problem: (X1,X1^2), ..., (Xn,Xn^2) The convex hull of these points must will be the points in sorted order. Thus: Any comparison based CH algorithm must make \Omega( n log n ) comparisons. ------------------------------------------------------------------- QuickSort and Backwards Analysis Before we present and analyze a simple randomized CH algorithm, let's go back and look at another way to analyze QuickSort called Backwards Analysis. We assume that the keys are distinct Recall randomized QuickSort: QS(M) 1) pick random b in M 2) split M into S and L with S < b < L 3) return [QS(S), b, QS(L)] The cost of the call QS(M) (Not counting recursive calls) is the size of the array M minus 1. (This is how many comparisons are done.) ------------------------------------ To do the backwards analysis we propose a simple game with a cost function that has the same behavior. Consider a dart board with n empty squares. +-----------------------+ | | | | | | | | | +-----------------------+ All n squares start out empty. The dart game proceeds as follows: While there exists a empty square do: Pick an empty square at random. Put a dart into it. The cost of the step is the number empty squares to the left and right of the new dart. The expected cost of this game is exactly the same as the expected cost of randomized quicksort. Why? Let's prove this by induction. In the base case, n=0 or n=1. In this case QS(M) costs zero and the dart game costs zero. In the general case, the cost of the first dart is n-1, and the cost of the top level call to QS(M) is just n-1 (not counting the costs of the recursive calls). We know that the dart game eventually fills every cell with a dart. And that the darts to the right of the first dart have no influence on what happens to the left of it. So we can rearrange the sequence of darts to put those that are in the left half first, then those in the right half. This does not change the cost of the dart game. Now by induction the cost of the dart game in the left half is the same as quicksorting the left half. The right half is the same. The dart game can be played backwards. Here's how to do it. Layout a dart board with n squares each with a dart. While there exists a dart on the board do: Randomly pick a dart. The cost is the number of empty squares to the left and right of the new dart. Remove the dart. The only random parameter in the forward dart game is the permutation that is chosen to add the darts. Similarly the only random parameter in the reverse dart game is the permutation that is chosen to remove the darts. Thus if we made a movie of the dart game and watched it backwards, it would be totally indistinguishable (drawn from the same distribution) from a movie of the backwards dart game. Thus, if we can analyze the expected cost of the backwards dart game, it will tell us the expected cost of the forward dart game (and thus quicksort). ------------------------------------------------------------------------- Let's analyze the backwards dart game. Suppose there are i darts left. Define Ti as follows: Ti = Expected cost to remove a random dart from a board of i darts. One way of computing this cost is to take the total cost of each of the possibilities (the dart to remove) and divide by the number of darts (i). The total cost of all possibilities can be computed by allocating the costs to the empty cells. So when considering the option of removing a dart d, put a token into each of the empty cells in the blocks of empty cells to the left and right of d. The number of tokens distributed is the cost of the option of removing dart d. The sum of these costs over all darts is the number of tokens on all the empty cells. It's easy to see that each empty cell can get at most two tokens, one from each of the two darts at the boundaries of that block of empty cells. Total cost of all options at step i <= 2*(# empty cells) = 2(n-i) Ti = Average cost of all options at step i <= 2(n-i)/i = 2n/i - 2 Let E_n be the expected cost of the backwards dart game on a board of size n. E_n = Tn + Tn-1 + .. T1 = [2n SUM 1/i] - 2n i = 1...n = 2nHn - 2n = O(n log n) This is the same as the cost of the dart game running in the forward direction, and the expected cost of quicksort. -------------------------------------------------- 2D Random Incremental Convex Hull Input: P = { P1, ..., Pn}, where Pi is in R^2. These are called the "sites". No three sites are collinear. Output: The convex hull of P. Recall that the representation of the convex hull is a doubly linked data structure of the sites and the edges between them. This makes is possible in O(1) time to remove a site or add a site to the convex hull. This representation is used internally during the running of the algorithm. Procedure: RandomIncrementalCH(P) 0) Pick three sites from P, say P1, P2, and P3. Construct our standard convex hull representation of the triangle consisting of these three points. Also choose a point C interior to T. 1) Construct the ray from C to each other site in P, and continuing to infinity. This ray must cross one of the edges of the triangle. 2) Partition the sites in P by which edge of T their ray crosses. For each edge of T we keep a bit more information in the data structure. We keep the set of sites whose rays cross this edge of T. This will be maintained as the algorithm proceeds below. 3) Compute a random permutation of the remaining sites, P4...Pn. Now process each of the sites Pi in this order. (The invariant that we maintain is that we have at any point in time the convex hull of the sites that have been processed.) Here is how site Pi is processed: Looking at the ray from C-->Pi and the edge e of the current convex hull that it crosses, determine if Pi is inside or outside of the current convex hull. If it's inside continue with the next site. Now, if Pi is outside, then update the data structure with BuildTent(Pi,e). 4) The convex hull we have at the end is the convex hull of all the sites. Procedure: BuildTent(p,e) Let CH denote the current convex hull (convex hull of all the processed sites). The goal of this step is to processes one site. 1) Find all edges visible to p on CH by searching out from e in both directions. These are called the "visible edges". 2) Replace visible edges with two new edges (a "tent") in the CH. The new edges will be from p to the left and right end points of the visible region. 3) For each site in the site sets maintained for each visible edge: assign it to whichever of the two new edges its ray crosses. --------------------------------------------------------------------- Note about optimizing this: In an actual implementation you would not keep around unprocessed sites who are inside the convex hull. So the sets of sites maintained at the edges would always be outside the current convex hull. This only makes the algorithm more efficient. However, we'll analyze the algorithm as stated above. --------------------------------------------------------------------- Running time analysis. It's obvious that all work other than that done by BuildTent() takes O(n) time. What about BuildTent()? Let's partition the cost of BuildTent() running over the entire computation, into two parts. The "Edge Cost" is the cost of parts 1 and 2. It's proportional to the number of visible edges. The "Ray Cost" is the cost of part 3, and it's proportional to the number of sites in the site sets of visible edges processed there. Let's analyze these costs separately. It's easy to see that the total Edge Costs are O(n). Just keep a token on each edge of the current CH. When we run BuildTent() we remove some visible edges and add two new edges. Use the tokens on the visible edges that are removed to pay for their removal. Then put a new token on each of the new edges. Thus only two new tokens are needed each time, for a total cost of 2n tokens. What about the Ray Costs? Here's where we use backward analysis. Let's write down the sequence of sites (after the random permutation of P4...Pn). P1 P2 P3 | P4 P5 P6 ... Pi Pi+1 Pi_2 ... Pn \_____________/ \______________/ processed unprocessed In a backward running of the algorithm we pick a random processed point and unprocess it. There are i-3 points to choose from (P1, P2 and P3 are not part of the randomization). The number of rays stored in all the edges of the current CH is one for each unprocessed point, or (n-i). What is the expected Ray Cost of the step? There are i-3 unprocessed sites to choose from to remove. The ones that are interior to the current CH have zero ray cost. The ones on the CH incur a ray cost equal to the number of rays crossing the edge to the left of this site plus the number of rays crossing the edge to the right of this site. (This situation is exactly analogous to the situation we analyzed in removing darts from the quicksort dart board. The role of an empty cell on the board is played by a ray to an unprocessed site.) So among all the options of processed sites to unprocess, each ray is counted twice. (Actually it may be less because of the possibility that some of P1, P2 or P3 are still on the CH, and we're not allowed to choose these.) So the expected ray cost is: R_i = Expected ray cost of unstep i <= 2(n-i) / (i-3) <= 2n/(i-3) E_n = Expected total Ray Cost <= SUM [ 2n/(i-3) ] i = 4...n <= 2nH(n-3) = O(n log n) (Here H(k) is the kth harmonic number). ------------------------------------------------------