15-451 Algorithms 8/29/01 RECITATION NOTES ======================================================================= The plan for today is to engage the class in solving a couple of homework-style problems. I've given two problems here. You can use these, or others. - Balance factor (this is pretty easy, but a good warmup) Given a binary tree, the "balance factor" at a node is the height of the left subtree minus the height of the right subtree. E.g., 0 o / \ 2 o o -2 / \ 1 o o -1 / \ 0 o o 0 Say we want to find the balance factors at all the nodes (this will tell us how "well-balanced" the tree is at various places). How can we do this quickly? Idea: in addition to computing the balance factor, let's also compute the height. We can compute the pair (height, balance-factor) for a node x by recursively computing it for the left child and the right child and then using the results to construct the pair for x. Another natural algorithm is to first compute the heights of all the nodes and then make a second scan to compute the balance factors given the heights. Running time is O(n), where n is the number of nodes. Trick here was to compute *more* information than we were asked to, in order to make our computation easier. This will come up later when we talk about Dynamic Programming. - More on balance factor: There's a balanced search tree data structure called "AVL trees" that keeps all balance factors between -1 and +1. We're not going to talk about this particular method in class, but the high level idea is that if you required the tree to be *perfectly* balanced, then each insert might force you to make all sorts of changes in the tree to maintain your invariant; but, by allowing this slight amount of slop, you can do the updates with only a constant factor extra work. [We'll see this as a general kind of technique: getting a fast algorithm by relaxing the requirements a little]. But, the question I want to focus on now is: what does having all balance factors between -1 a and +1 say about the *height* of the tree? Can we guarantee a height of O(log n) for an n-node tree? Easier: instead of looking at h(n), let's define n(h) to be the *minimum* number of nodes in a tree of height h. E.g., if we can show that n(h) > 2^{h/2}, then that means a tree of n nodes can have height at most 2*lg(n). Ideas? If tree has height h, then one of subtrees of root has height h-1 (by defn of height) and the other has height at least h-2 (by the balance property). So, n(h) > n(h-1) + n(h-2). This is Fibonacci..., but as a crude bound, we have n(h) > 2*n(h-2). Every time h goes down by 2, we multiply by 2. So this gives us n(h) >= 2^{h/2} if we define a singlroot with no children as having height 0 (to get the base case). - Insertion sort. (This is from last year's hwk1) Insertion-sort works by sorting the first $i-1$ elements and then inserting the $i$th element where it belongs among the first $i-1$. Here is some C code: (think of an array like [4, 2, 5, 3]) void insertionsort(int a[], int n) { int i,j,v; for (i=1; i=0; j--) { if (v >= a[j]) break; a[j+1] = a[j]; } a[j+1] = v; } } Let's look at running time in terms of the number of comparisons between elements. (This is the number of times the code "v >= a[j]" is executed. (a) If the initial array consists of $n$ distinct elements in decreasing order, how many comparisons are done? Solution: We can see from the code that each time the counter {\tt i} is incremented, {\tt a[i]} will need to move to the beginning of the list (why?). Thus the comparion {\tt (v >= a[j])} will need to be executed $i$ times for $i = 0, 1, 2, \ldots n-1$, for a total of $$\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2}$$ comparisons. (b) If the initial array consists of $n$ distinct elements in increasing (sorted) order, how many comparisons are done? Solution: If the array is already sorted, then each time {\tt i} is incremented, we will perform a single comparison and break out of the inner loop. Since {\tt i} is incremented exactly $n - 1$ times, we perform $n - 1$ comparisons. (c) Call a pair of indices (i,j) an INVERSION in an array A if i < j but a_i > a_j. Can we bound the number of comparisons made by insertion sort in terms of the number of inversions in the array? The claim is that the number of comparisons C satisfies I <= C <= n+I-1, where I is the number of inversions. Proof 1: It will help for this answer to think of the code in a slightly different arrangement. Rather than using an extra variable to hold {\tt a[i]} and walking back through the array moving the {\tt a[j]} forward, we could just swap it with the element before it (i.e., executing {\tt swap(a, j+1, j);} in place of {\tt a[j+1] = a[j];} in the code) until {\tt (v >= a[j])} is {\tt TRUE}. Then each swap reduces the number of inversions by one; each comparison is {\tt TRUE} at most once per value of {\tt i} and results in a swap iff it is {\tt FALSE}. Thus the total number of comparisons is at least $I$ and at most $I + n - 1$, establishing the required bound. Proof 2: Here is an alternative way to do it: associate each inversion with the rightmost of the two elements in the original ordering. In each iteration through the outer for-loop, we remove exactly those inversions associated with $a[i]$. The number of comparisons made in this iteration is either equal to, or one greater than, the number of elements to the left of $a[i]$ that were larger than $a[i]$ --- which is exactly the number of inversions associated with $a[i]$.