Sweeping:
=========

The idea is to think of the x axis as being time. As time progresses, we
want to keep a data structure that allows us to quickly answer certain
queries ("is there a collision?") and update the data structure.

The key to making this actually implementable is that while time can be
thought of as continuous, interesting events only occur at certain times;
the rest of the time we can simply skip over, since nothing really changes.

Events:
=======

In particular, in our application of detecting segment intersections, the
events we have are:
- the left endpoint ("start") of a segment
- the right endpoint ("end") of a segment

To get the next event quickly, when we learn of an event, store it on a
priority queue. In this application we never learn of any new events, so
we could just sort the events.

Dealing with events:
====================

The next thing to figure out is what to do when an event occurs.

When we reach the start of a segment, we want to check whether that
segment intersects any segment we're currently keeping track of. We only
care about those, because if we intersect a segment we haven't reached
yet, we'll detect the intersection when we get to that segment.

When we reach the end of a segment, we remove it from the set of segments
we care about.

Now we have a complete algorithm. But how do we check for an intersection?
Testing the new segment against every segment in the set is very slow if
there are a lot of segments. So we need something better.

Quickly checking for intersections:
===================================

Notice that if two segments (say a and b) intersect, then just before the
intersection, a is the next segment above or below b. [Barring multiple
segments intersecting at the same point.] This means that to check for
intersection, we only need to check the next segment above and below.
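The O(1) pairwise test between two segments can be done with orientation
(line-side) tests. A minimal sketch in Python; the function names and the
pair-of-endpoints representation are my choices:

```python
def orient(p, q, r):
    # Sign of the cross product (q - p) x (r - p):
    # +1 if p->q->r turns left, -1 if right, 0 if collinear.
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_intersect(a, b):
    # True iff the closed segments a and b share a point.
    p1, p2 = a
    p3, p4 = b
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:      # proper crossing
        return True

    def on_segment(p, q, r):
        # r is collinear with pq: is it inside pq's bounding box?
        return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0]) and
                min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

    # Touching / collinear-overlap cases.
    return ((d1 == 0 and on_segment(p3, p4, p1)) or
            (d2 == 0 and on_segment(p3, p4, p2)) or
            (d3 == 0 and on_segment(p1, p2, p3)) or
            (d4 == 0 and on_segment(p1, p2, p4)))
```

This is the constant-time test the sweep will apply to a segment and its
neighbor in the structure.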
Checking only when a segment starts isn't quite right: consider

               /
     --------X---------------
      ----- /
           /
          /

So really, we need to check the next segment above and below when we add
a segment; and also whenever the neighbor relationships change (which
only happens when we remove a segment). In other words:

- start of segment: check this segment against the next segment above &
  below; add it to the set
- end of segment: remove it from the set; check the next-above against
  the next-below

Always, we stop when we discover an intersection.

Finally, we need to find the next-above and next-below quickly:
- use a balanced search tree: AVL, red-black, splay, treap, ...

The "tricky" thing: trees need some way to order the segments. How?
Answer: do line-side tests. That is, when we compare the new segment to
an old one, see which side of the old segment the start point of the new
segment lies on. This provides a total ordering, up until an
intersection. But we know we'll detect intersections, so the order would
only get messed up after we stopped the algorithm anyway.

Analysis:
=========

Now we have a working algorithm:
- we know what the events are
- we know how to handle them (quickly)

What remains is the analysis. This comes down to: how long does each
event take to process, and how many events are there?

- start events: We insert into a balanced tree, which could have all the
  segments in it, so this takes O(lg n) time. We also need to get the
  segments before and after; these take O(lg n) time as well. Then, the
  segment-intersection tests take O(1) time.
- end events: We remove from the tree, get successor and predecessor,
  and do 1 segment-intersection test. O(lg n).
- finding the next event: Each segment has a start and an end, so there
  are 2n events total. This means the event queue has size 2n, so
  removing an event takes O(lg n) time.
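The line-side comparison that orders segments in the tree can be sketched
as follows; the name `is_below` and the pair-of-endpoints representation
are my choices:

```python
def is_below(new_seg, old_seg):
    # Line-side test: does the start (left) point of new_seg lie strictly
    # below the line through old_seg?  We take the sign of the cross
    # product of old_seg's direction with (start point - old left end).
    (x1, y1), (x2, y2) = old_seg
    px, py = new_seg[0]
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0
```

A balanced tree would use this as its comparator when inserting a new
segment; as noted above, the ordering is only consistent up to the first
intersection, which is exactly as long as we need it.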
Each event takes O(lg n) time to process, and there are O(n) events, so
the total time is O(n lg n).

Finding all the intersections
=============================

It's a simple extension to find _all_ the intersections. However,
there's a catch: there can be Theta(n^2) intersections, since every
segment can intersect every other segment. So we won't be able to
guarantee O(n lg n) if there are a lot of intersections; but if there
aren't many, we can still do well. This means the running time will be
output-sensitive.

When we noticed an intersection in the algorithm above, we just quit.
Now we'll want to output it and keep going. But remember that the
ordering of segments in the tree changes when we go through an
intersection; thus we need an event during which we'll fix that. We
handle the new event as follows:

- intersection: Say a and b are intersecting, and a is currently the
  next-above of b. Then a simple thing to do is remove segment a, and
  put in a new segment a' that starts at the intersection point but has
  the same end point. When we compare it to b, hack the line-side tests
  to make sure the new segment a' is put below b. Now, check b against
  its new next-above, and a' against its new next-below. Ignore the
  intersection of a' and b.

And we fix the previous events to do:

- start, end: If we detect an intersection, solve for the intersection
  point and put an "intersection" event on the event queue.

Running time:
-------------

Assume there are N total events. Processing an intersection event takes
O(lg n) time (remove, insert, next-above, next-below, and some
segment-intersection tests) plus the time to insert new intersection
events: O(lg N). Processing a start/end event is the same as before --
O(lg n) -- plus the time to insert a new event: O(lg N). Finding the
next event takes O(lg N) time.

Total: O(N lg N)   [assuming N >= n]

What's N? There are 2n start and end events, but now there may be up to
n^2 intersection events! Since N <= 2n + n^2, the O(lg N) term is still
O(lg n).
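Solving for the intersection point (to schedule an intersection event)
is a small linear system; a sketch, assuming the two segments are
already known to cross, with names of my choosing:

```python
def intersection_point(a, b):
    # Parameterize a as a0 + t*(a1 - a0), intersect with the line
    # through b, and solve the 2x2 system for t.  Assumes the segments
    # are known to cross properly, so the determinant d is nonzero.
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    d = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
```

The returned x coordinate is what the new event gets keyed on in the
event queue.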
So the total could be as bad as O(n^2 lg n). But if, at the end, we have
only k intersections, then N = 2n + k. So having run the algorithm -- or
if someone told us how many intersections to expect -- we know that the
running time is no worse than O((n+k) lg n).

Slightly better:
----------------

If we notice we're getting a lot of intersections, we might want to drop
everything and just do the naive algorithm: for each segment, test every
other segment. That algorithm costs O(n^2).

How do we do this? Count how many intersections we've found. If k
reaches n^2/lg n, then we know that so far we've spent
O((n^2/lg n) lg n) = O(n^2) time, but the total could still grow to
O(n^2 lg n) if we keep going. So stop and run the naive algorithm
instead.

Total time: min(n^2, (n+k) lg n)
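Putting the pieces of the first (detect-one-intersection) algorithm
together, here is a minimal Python sketch. For brevity the "balanced
tree" is a plain list kept in sweep order (so status updates cost O(n)
rather than O(lg n)), the intersection test only handles proper
crossings, and general position is assumed (no vertical segments, no
shared endpoints). All names are mine.

```python
import heapq

def first_intersection(segments):
    # segments: list of ((x, y), (x, y)) pairs.  Returns the indices of
    # one intersecting pair, or None if no two segments cross.
    segs = [tuple(sorted(s)) for s in segments]  # orient left-to-right

    events = []
    for i, (p, q) in enumerate(segs):
        heapq.heappush(events, (p[0], 0, i))  # 0 = start event
        heapq.heappush(events, (q[0], 1, i))  # 1 = end event

    def cross(o, a, b):
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])

    def crosses(s, t):  # proper crossing only (general position)
        p1, p2 = s
        p3, p4 = t
        return (cross(p3, p4, p1) * cross(p3, p4, p2) < 0 and
                cross(p1, p2, p3) * cross(p1, p2, p4) < 0)

    def y_at(i, x):  # height of segment i at sweep position x
        (x1, y1), (x2, y2) = segs[i]
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

    status = []  # alive segments, ordered bottom-to-top at the sweep line
    while events:
        x, kind, i = heapq.heappop(events)
        if kind == 0:
            # Start: insert in order, test against next-below / next-above.
            k = 0
            while k < len(status) and y_at(status[k], x) < y_at(i, x):
                k += 1
            status.insert(k, i)
            for j in (k - 1, k + 1):
                if 0 <= j < len(status) and crosses(segs[i], segs[status[j]]):
                    return (min(i, status[j]), max(i, status[j]))
        else:
            # End: remove, test the former neighbors against each other.
            k = status.index(i)
            status.pop(k)
            if 0 < k < len(status) and crosses(segs[status[k-1]], segs[status[k]]):
                return (min(status[k-1], status[k]),
                        max(status[k-1], status[k]))
    return None
```

Replacing the list with a balanced search tree keyed by the line-side
comparison gives the O(n lg n) bound derived above, and extending the
event loop with intersection events gives the output-sensitive version.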