Outline for Recitations, Oct 20, 1997 o Recurrences o Heaps Recurrences Let's do some more examples. A lot of times, we'll see recurrences like this: T(n) = a*T(n/b) + cn T(1) = 1 (Here a,b, and c, are constants > 0.) Example: mergesort. a = 2, b = 2. (in general this corresponds to taking O(n) work to split into a pieces of size n/b) When you expand this out, you get: T(n) = a*(a*T(n/b^2)+cn/b)+cn = a*(a*(a*T(n/b^3)+cn/b^2)+cn/b)+cn = cn + c(a/b)n + c(a/b)^2 n + a^3T(n/b^3) = cn[1 + (a/b) + (a/b)^2 + (a/b)^3 + ...] What if a convergent series. Get cn*/(1 - a/b) = Theta(n) What if a=b? --> all terms equal. So get cn*(# of terms) = cn * log_b(n) = Theta(n log n). What if a>b? --> again this is a geometric series, but now the last term is largest T(n) = cn[1+ (a/b) + (a/b)^2 + ... + (a/b)^(log_b n)] = Theta(n*(a/b)^(log_b n)) To get any farther, we'll have to use some rules about logarithms: (You may choose not to do this.) = Theta(a^(log_b n)) because (1/b)^(log_b n) = 1/n = Theta(a^[(log_a n)*(log_b a)]) = Theta(n^(log_b a)) weird, but this is the right answer! HEAPS ----- -> go over array representation of trees. The root is numbered 0, its value is in A[0]. If a node is numbered i, its value is in A[i], its left child is numbered 2*i+1 its right child is numbered 2*i+2 its parent is numbered (i-1)/2 Example: 4 / \ / \ / \ 2 1 / \ / \ / \ / \ 3 6 5 8 A: {4, 2, 1, 3, 6, 5, 8} Given this, an equivalent defn of a HEAP to what was presented in lecture is: Array such that: for all i=0,1,...,len-1 have: if 2i+1 < len then A[i] >= A[2i+1] (if you have a left child, it's smaller) if 2i+2 < len then also have A[i] >= A[2i+2]. Worth knowing: heapsort really works as an algorithm for sorting numbers in an array. The "tree" is just in our head. In lecture we talked about important subroutine "downheap" or "siftUp": Given a subtree that is "almost a heap" in that the *only* problem is that the root is too small (some child is larger than it), it swaps the root with the larger child, and then keeps going down the tree until we either get to a leaf or to a point where the current node is >= its children. Heapsort algorithm: (1) build heap by doing this bottom up (start at rightmost non-leaf and go backwards in array). (2) swap root with rightmost leaf -- now viewing that leaf as no longer part of the heap -- and apply downheap/siftUp at the root to fix up the heap (and repeat until the heap is gone). Example of creating a heap: Here's a tree that is not a heap. 4 / \ / \ / \ 2 1 / \ / \ / \ / \ 3 6 5 8 We'll call downheap/siftUp three times to build a heap. First we call on the 1. 4 / \ / \ / \ 2 8 / \ / \ / \ / \ 3 6 5 1 Then we call on the 2. 4 / \ / \ / \ 6 8 / \ / \ / \ / \ 3 2 5 1 Now we call on the 4. 8 / \ / \ / \ 6 5 / \ / \ / \ / \ 3 2 4 1 Here is code in case you want it: // call with i as index of thing to downheap. Given this as a // parameter since will be helpful in creating heap // len is len of array A void downheap(int A[], int len, int i) { int temp, child; while ((child = 2*i+1) < len) { // there is a child if (child +1 < len && A[child] < A[child+1]) ++child; // larger if (A[i] >= A[child]) return; // we're done. temp=A[i]; A[i] = A[child]; A[child] = temp; // swap them i = child; // go to child } } void heapsort(int A[], int len) { int i,temp; /* make heap. Start at rightmost non-leaf, go backwards */ for(i = len/2 - 1; i >=0; --i) downheap(A,len,i); /* make into sorted array */ for(i=len-1; i > 0; --i) { temp = A[i]; A[i] = A[0]; A[0] = temp; // SWAP downheap(A, i, 0); // heap of size i } } Digression: Suppose you have n records, and you want to know the top k (k < n). (e.g, searching through 10,000 colleges and you want to output the top 50) How much time as a function of n and k? Could sort in time O(n log n) Or, build heap in time O(n) [In lecture we said it was O(n) but did we go through the proof?] and k times do "remove root, replacing it with the rightmost leaf and then downheap/siftUp" to remove top k. -> O(n + k*log(n)) ==> faster than O(n log n). Why is the time to build a heap O(n)? Here's the proof: -> show for complete binary tree. (Implies true even for tree where last level is only partially filled.) Run downheap on each non-leaf node (about n/2 or more precisely, (n-1)/2, of these). Cost is at worst proportional to distance of node to leaf (might have to swap all the way down). Say H(i) = distance of node i to leaf. Then cost is O( Sum_i H(i) ). So what is this? Idea: distance of node i to leaf = 1 + distance to leaf in tree that's one level shallower (and has (n-1)/2 nodes). T(n) = (n-1)/2 + T((n-1)/2) < n/2 + T(n/2) ==> O(n).