Let's do some more examples. Frequently, a recurrence looks something like the following:
T(n) = a*T(n/b) + cn,    T(1) = 1

(Here a, b, and c are positive constants.) In merge sort, for example, a and b would be two, and c would be one.
When you expand this out, you get:
T(n) = a*(a*T(n/b^2) + cn/b) + cn
     = a*(a*(a*T(n/b^3) + cn/b^2) + cn/b) + cn
     = cn + c(a/b)n + c(a/b)^2 n + a^3*T(n/b^3)
     = cn[1 + (a/b) + (a/b)^2 + (a/b)^3 + ...]

Now there are three cases, depending on whether a < b, a = b, or a > b. If a < b, the series converges to a constant, so T(n) = Theta(n). If a = b, every term is 1 and there are about log_b n of them, so T(n) = Theta(n log n). If a > b, the last term dominates:
T(n) = cn[1+ (a/b) + (a/b)^2 + ... + (a/b)^(log_b n)]
= Theta(n*(a/b)^(log_b n))
To proceed with simplifying this, we need to use some rules about logarithms. Note that a^(log_b n) = 2^(log a * log n / log b) = n^(log a / log b).
T(n) = Theta(a^(log_b n))               because b^(log_b n) = n
     = Theta(a^((log_a n)/(log_a b)))   because log_b n = (log_a n)/(log_a b)
     = Theta(a^((log_a n)(log_b a)))    because 1/(log_a b) = log_b a
     = Theta(a^(log_a (n^(log_b a))))   because x log y = log(y^x)
     = Theta(n^(log_b a))               weird-looking, but this is the right answer!
Let's go over the array representation of binary trees once more. The root is numbered 0, and its value is in A[0]. If a node is numbered i, its value is in A[i], its left child is numbered 2*i+1, its right child is numbered 2*i+2, and its parent is numbered (i-1)/2 (rounding down).
Here's an example (but it's not a heap):
4
/ \
2 1
/ \ / \
3 6 5 8
This would be represented in an array as
A: {4, 2, 1, 3, 6, 5, 8}
In this array representation, a definition of a heap equivalent to the one presented in lecture is:

A heap is an array such that for all i = 0, 1, ..., len-1:
    if 2i+1 < len then A[i] >= A[2i+1] (if you have a left child, it's no larger than you)
    if 2i+2 < len then A[i] >= A[2i+2] (likewise for a right child)
In lecture we talked about an important subroutine called downheap. Let's do an example of this working.
Here's a tree that is not a heap.
4
/ \
2 1
/ \ / \
3 6 5 8
We'll call downheap three times to build a heap.
After the first call it looks like the following:
4
/ \
2 8
/ \ / \
3 6 5 1
Then, after the second call:
4
/ \
6 8
/ \ / \
3 2 5 1
And after the third call:
8
/ \
6 5
/ \ / \
3 2 4 1
Now we have a heap!
We will want to run downheap starting at some node in the middle of the tree, so that we can use it to build a heap. Here is the code:
// Call with i as the index of the node to downheap. i is given as a
// parameter since that will be helpful in creating a heap.
// len is the length of the array A.
void
downheap(int A[], int len, int i) {
int temp, child;
while((child = 2*i+1) < len) { // there is a child
if(child + 1 < len && A[child] < A[child+1]) ++child; // use the larger child
if(A[i] >= A[child]) return; // we're done.
temp = A[i], A[i] = A[child], A[child] = temp; // swap them
i = child; // go to child
}
}
Suppose you have n records and you want to know the top k (k < n). (E.g., searching through 10,000 colleges and outputting the top 100.)
How much time does this take as a function of n and k? We could sort in time O(n log n). Or we could build a heap in time O(n) (we stated but didn't prove this in lecture) and then do k remove-max operations, each of which is one downheap costing O(log n). This is O(n + k log n), which is better than O(n log n).
Why is the time to build a heap O(n)? Here's the proof:
Let's assume that the tree has n leaves, where n is a power of 2. This means that the tree has a total of 2n-1 nodes.
In our example, n = 4:
8
/ \
6 5
/ \ / \
3 2 4 1
(If the number of leaves is not a power of 2, we can add leaves until
the number is a power of 2, and that will at most double the number of
nodes in the tree.)
Let's define the height of a node to be its distance from the leaves. (Leaves have height 0, root has height log n.) When we call downheap on a node at height i, it does O(i) work.
Thus, buildheap does
O(1*n/2 + 2*n/4 + 3*n/8 + 4*n/16 + 5*n/32 + 6*n/64 + 7*n/128 + 8*n/256 + 9*n/512 + ...)
  = O(n(1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64 + 7/128 + 8/256 + 9/512 + ...))

We can show that this series is convergent in several ways. The easiest, if you know it from calculus, is the ratio test. Another alternative is to group terms:
O(n(1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64 + 7/128 + 8/256 + 9/512 + ...))
  = O(n(1/2 + 2/4 + 4/8 + 4/16 + 8/32 + 8/64 + 8/128 + 8/256 + 16/512 + ... + 16/2^16 + ...))   (round each numerator up to the next power of 2)
  = O(n(1/2 + 1/2 + 12/16 + 120/256 + 4080/2^16 + ...))   (add up runs of terms with equal numerators)
  = O(n(1/2 + 1/2 + 1 + 1/2 + 1/16 + ...))
  = O(n)

Or you could observe the following:
For sufficiently large n we have n < (4/3)^n, so n*3^n < 4^n = 2^n*2^n, so n/2^n < (2/3)^n. Therefore O(n * sum_i i/2^i) = O(n * sum_i (2/3)^i) = O(3n) = O(n).
See the heapsort demo in:
/afs/andrew/scs/cs/15-211/handouts/heap_demo.dec