Recitation 10/15/97 Three natural topics for today: * midterm - Suggest students work out similar problems when studying for final. * Assign4: Basic idea of data compression. E.g., why does compressor have to be a 1-1 function? (I.e., why can't compress(f1) == compress(f2) if f1 is different from f2?) Go through how some string like "ababababababab" would be compressed and what the tree would look like. Go through uncompression. E.g., "ababababababab" would compress to: 0 a 0 b 1 b 3 a 2 a 5 b 3 and the tree looks like 0 / \ a / \ b 1 2 | | b| |a 3 5 | | a| |b 4 6 * Another example of a divide and conquer algorithm (see below). ============================================================================ Say you don't need to sort, just want to find largest. How would you do it? What if want to find 3rd largest? What if you want to find 17th largest? What if wanted to find median? Could sort and get the middle element. Or, here's a faster way. QUICKSELECT ----------- Goal is to find the kth smallest (k = 1,2,...,n) QUICKSELECT(int arr[], int len, int k) * pick some element (say first one) * split into LESS, EQUAL, GREATER piles, of lengths L1, L2, len - (L1+L2) Say array now looks like this: <--- L1 ---><-- L2 --><-- (len-(L1+L2)) --> -> if k <= L1, where is the thing we're looking for? ---> recurse on (arr, L1, k) otherwise, if k <= L1 + L2, then what? ---> output partition element. otherwise, then what? ---> recurse on GREATER pile (arr+(L1+L2), l3, k - L1 - L2). Idea: split into pieces, takes n work to do it, but then only recurse on one piece. If lucky, split in half. IF each time we split exactly in half, get recurrence: T(n) = n + T(n/2). -> what's solution? n + n/2 + n/4 + n/8 + ... = O(n). Claim: if we pick a RANDOM pivot then we're O(n) time on average. Informal Argument: Let's say we're unlucky and the thing we're looking for is always in the bigger piece. What is the average size of the bigger piece, if we split at random? Imagine I have a candy bar, I split it at random, and keep the bigger piece: what is the expected size of the bigger piece? ---> 3n/4. Reason: Could be n/2, n/2+1, ..., n: all equally likely. So, even if always recurse on bigger piece, still on average get to throw out 1/4. So, intuitively, cost is at most: T(n) = n + T(3n/4) = n + n*(3/4) + n*(3/4)^2 + ..., which is a geometric series -> n*(1/(1 - 3/4)) = 4n. --------------------------------------------