15-451 Algorithms 9/08/04 RECITATION NOTES ======================================================================= Questions on mini or recurrences? Can do one or more examples, depending on how comfortable people are. E.g., T(n) = 3T(n/3) + n. Go through recursion tree -- see how same amount of work is done at each level. Can also do T(n) = 2T(n/3)+n. ======================================================================= Problem 1: Analysis of insertion sort in terms of number of inversions (related to hwk). Insertion-sort works by sorting the first i-1 elements and then inserting the ith element where it belongs among the first i-1. Here is some C code: (think of an array like [4, 2, 5, 3]) void insertionsort(int a[], int n) { int i,j; for (i=1; i=0; j--) { if (a[j+1] >= a[j]) break; swap(a,j,j+1); // swap these elements } } } Let's look at running time in terms of the number of comparisons between elements. (This is the number of times the code "a[j+1] >= a[j]" is executed.) (a) If the initial array consists of n elements in sorted order, how many comparisons are made? Answer: If the array is already sorted, then we just make one comparison for each i (breaking out of the loop). So, the total is n-1 comparisons. (b) What if the array is in descending order? Answer: In this case, the ith element will be compared to everything smaller than it (the inner for-loop never breaks), for a total of i comparisons. So the total is n(n-1)/2. (c) More generally, call a pair of indices (i,j) an inversion if i a_j. Show that the number of comparisons is always between I and I + n-1. Answer: There are a couple of ways to see this. One is: each swap reduces the number of inversions by *exactly* 1 [think about it]. For each i, the number of comparisons is equal to, or one greater than, the number of swaps (depending on whether we break out of the loop or not). So, the number of comparisons is between I and I + n-1. Problem 2: BACKWARDS-ANALYSIS OF QUICKSORT. Here's a kind of bizarre way of analyzing quicksort: look at the algorithm backwards. (This is in the lecture notes). Actually, to do this analysis, it is better to think of a version of Quicksort that instead of being recursive, at each step it picks a random element of all the non-pivots so far, and then breaks up whatever bucket that element happens to be in. This is easiest to think of if you imagine doing the sorting "in place" in the array. If you think about it, this way of thinking is just rearranging the order in which the work happens, but it doesn't change the number of comparisons done (I mean, you probably wouldn't want to implement the algorithm this way because you'd have to keep track of which elements are previous pivots etc). Maybe do an example so that people are comfortable with this version of the algorithm and that it really is doing the same work but in a different order. The reason this version is nice is that if you imagine watching the pivots get chosen and where they would be on a sorted array, they are coming in a completely random permutation. Looking at the algorithm run backwards, at a generic point in time, we have $k$ pivots (producing $k+1$ buckets) and we ``undo'' one of our pivot choices at random, merging the two adjoining buckets. (Remember, the pivots are coming in a completely random order, so, viewed backwards, the most recent pivot is equally likely to be any of the k.) Now, the cost for an undo operation is the sum of the sizes of the two buckets joined (since this was the number of comparisons needed to split them). Notice that for each undo operation, if you sum the costs over all of the $k$ possible pivot choices, you count each bucket twice (or just once if it is the leftmost or rightmost bucket) and get a total of $< 2n$. Since we are picking one of these $k$ possibilities at random, the {\em expected} cost for this undo operation is at most $2n/k$. So, adding up over all undo operations, we get $\sum_k 2n/k = 2n H_n$. This is pretty slick, and not the sort of thing you'd think of trying off the top of your head, but it turns out to be useful in analyzing related algorithms in computational geometry.