- prove worst case lower bound of sorting is O(n logn) - prove upper and lower bound for max and min of n elements when determining upper and lower bounds we don't have exact an algorithm in mind, but rather are trying to determine what we should aim for. Today we look at lower bound for sorting and upper and lower bound for a search problem. We know that sorting is going to take at least Omega(n) time, because the algorithm is going to have to look at all the elements in order to be able to sort them. To get beyond that trivial bound, we have to restrict the ground rules that the hypothetical sorting algorithms are going to use. One way to make progress is to limit the class of algorithms to _comparison based_ algorithms which are only allowed to evaluate the elements to be sorted by doing comparisons among them. This is as opposed to more general _address computation_ methods, that are allowed to make full use of the data being sorted in the computation. (We'll see examples of both of these methods later.) The unix qsort is comparison based. It knows nothing about the elements being sorted, except that you've given it a function to compare two elements. Likewise, selection and insertion sort are comparison based. Now we can ask the question: is there a lower bound (larger than n-1) of the number of comparisons that are required to sort? The answer is YES. Before proving lower bound need a small lemma: If binary tree has l leaves, then depth is >= log_2(l). proof: by induction base case easy, l=1. depth=1. assume true for l leaves prove for l+1. two cases: log_2(l+1) == log_2(l) and log_2(l+1) > log_2(l). now prove lower bound. Theorem: The worst-case number of comparisons done by any comparison-based sorting algorithm on an array of n elements is at least log_2(n!). Note: log_2(n!) is (by Stirling's approximation) (n + .5) log_2 (n) + O(1) To prove this we need to construct a decision tree. Lets assume an input of n items (x_1, x_2, ..., x_n) and each of the items is unique. The binary (binary, since items are distinct) decision tree describes the comparisons made by the sorting algorithm. So, a path from the root to a leaf has all the comparisons carried out by our sorting algorithm and thus defines the actions carried out to sort a particular input. Tree is defined inductively. if node is i:j, then - x_i is compared with x_j. - root of left subtree comparison when x_i < x_j - root of right is when x_j < x_i. Example of binary decision tree for 3 elements: 1:2 /\ / \ / \ / \ / \ 2:3 1:3 /\ /\ / \ / \ / \ b,a,c 2:3 a,b,c 1:3 /\ /\ / \ / \ / \ a,c,b c,a,b b,c,a c,b,a where a is x_1, b is x_2, and c is x_3. Proof: The decision tree is a binary tree. Every different permutation of the inputs must lead to a different leaf of the decision tree, because if two different permutations lead to the same leaf, one of them would have to be incorrectly sorted by the algorithm. So the decision tree has at least n! leaves. Why? (permutation) Thus, depth of decision tree is >= log_2(n!) and therefore in the worst case it must do at least log_2(n!) comparisons. [Note: why is log(n!) = Theta(n log n)? if you don't want to use Stirling's approx, here is an argument from first principles: The easy direction is log(n!) < log(n^n) = n*log(n). But, why is log(n!) = Omega(n log n)? Here's one answer: use the fact that log(a*b) = log(a) + log(b) So, log(n!) = log(n) + log(n-1) + log(n-2) + .... + log(2) + log(1) > log(n) + log(n-1) + ... + log(n/2) [i.e. just look at top n/2 terms] > n/2 * log(n/2) = n/2 * [log(n) - 1] = Omega(n log(n)) ] ---------------------------------------------------------------- Lets look at how many comparisons it takes to find both min and max of n elements. easy algorithm (first max, then min) takes 2n-3 comparisons. Can we do better? how about comparing in pairs, then comparing winners (maxs) and losers (mins)? upper bound: assume n is even, n=2k. Represent state of the algorithm by quad-tuple (A,B,C,D). A = never compared B = winners (maxes) C = losers D = winners and losers start of algorithm all elements are in A. 1 - do k comparisons, then 0 in A, k elements in B & C, 0 in D. 2 - max is in B (k-1 comparisons) 3 - min is in C (k-1 comparisons) -> max & min in k+2(k-1) = 3k-2 = intup(3n/2)-2. similarly for n is odd. lower bound: each comparison, x= 2n-2 & X <= intdown(n/2). given the constraints we want to make X as big as possible. So set X=intdown(n/2). (lets assume n is even to make notation easier). X=n/2 -> Y >= 2n-2-2X -> Y >= 2n-2-n-> Y >= n-2 -> X+Y >= n/2 + n - 2. QED. ----------------------------------------------------------------