# Analysis of Algorithms

## Introduction

In this lecture we will study various ways to analyze the performance of algorithms. Performance concerns the amount of resources that an algorithm uses to solve a problem of a certain size: typically, we will speak about solving a problem using an array of size N or a linked list with N nodes in it. We will mostly be concerned with "worst-case" performance; the other possibilities are "best-case" (simple to analyze, but not much useful information) and "average-case" (too complicated mathematically to pursue in an introductory course). Finally, we will mostly be concerned with the speed of algorithms (time as a resource), although we will sometimes discuss the amount of storage they require (space as a resource; here too we can speak of worst, best, and average case).

We want to be able to analyze algorithms, not just the methods that implement them. That means we should be able to say something interesting about the performance of an algorithm without having a version of it (written in some programming language) that a machine can execute. Once we examine machine-executable versions, there are many technology details to deal with: the language we write the code in, the compiler we use for that language, the speed of the processor we run it on, how fast memory is (and even how much caching is involved). While this information is important for predicting running times, it is not fundamental to analyzing the algorithms themselves. So, we will analyze algorithms independent of technology, making this subject more scientific. We will analyze an algorithm by predicting how many steps it takes, and then go through a series of simplifications leading to characterizing the algorithm by its complexity class.
Although it may initially seem that we have thrown out useful information, we will learn how to predict the running time of a method on an actual machine by combining its algorithm's complexity class with a timing measurement made on the machine it will be run on.

## Analyzing Algorithms: From Machine Language to Big O Notation

In this section we will start with a very concrete, technological approach to analyzing algorithms (looking at how Java compiles code to machine language) and then generalize to a science that is independent of such technology. First, suppose that we invent a mathematical function Iaw(N) that computes the number of machine code instructions executed by algorithm a when run on the worst-case problem of size N. Such a function takes an integer as a parameter (N, the problem size) and returns an integer as a result (the number of machine language instructions executed).
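The discussion below refers to basic blocks (A through F) in the compiled form of a loop that finds the maximum value in an array. As a reference point, here is a Java sketch of that loop (the wrapper class `MaxFinder` is mine; the course code may differ slightly), with the blocks marked in comments:

```java
public class MaxFinder {
    // Find the largest value in a; the loop body executes once per element,
    // so the worst case executes blocks C, D, E, F a total of N = a.length times.
    public static int max(int[] a) {
        int max = Integer.MIN_VALUE;        // Block A: initialize max
        for (int i = 0; i < a.length; i++)  // Blocks B, E, F: init i, increment, test
            if (a[i] > max)                 // Block C: compare a[i] to max
                max = a[i];                 // Block D: update max
        return max;
    }
}
```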
Block A initializes max. Block B initializes i in the for loop (not the branch leading to the termination test, which is near the bottom). Block C compares a[i] to max, either falling through to Block D, which updates max, or skipping it. Block E increments i. Block F tests whether the loop should terminate or execute the body (again).

We can compute the exact number of instructions executed for any input. For simplicity, assume all the array values are bigger than the smallest integer (which is used to initialize max).

- If the array contains 0 values, 9 instructions are executed: blocks A, B, F.
- If the array contains 1 value, 23 instructions are executed: blocks A, B, F, C, D, E, F.
- If the array contains 2 values, either 33 instructions are executed (the first value is bigger than the second: blocks A, B, F, C, D, E, F, C, E, F) or 37 instructions are executed (the second value is bigger than the first, which is the worst case: A, B, F, C, D, E, F, C, D, E, F).
- Assuming the worst case from now on, if the array contains 3 values, 51 instructions are executed.

Thus, for this example (the code to compute the maximum value in an array) we can write the formula Iaw(N) = 14N + 9. At most 14 instructions are executed during each loop iteration (blocks C, D, E, F); the housekeeping to initialize max, initialize i, and check the first loop iteration requires 9 instructions (blocks A, B, F). In fact, Iab (the number of steps in the best case, where the if test is true only on the first iteration) is Iab(N) = 14N + 9 - 4(N-1) = 10N + 13, because the 4 instructions updating max (block D) are never executed after the first update; this formula works only when N > 0. Thus, the actual number of instructions has a lower bound of 10N + 13 and an upper bound of 14N + 9 (when N > 0).
Here N is a.length (the number of values stored in the array), and the worst-case run is on an array of strictly increasing values: the if test executed during each iteration of the loop is always true, so the machine instructions that copy the current value into max (block D) are always executed. Determining the average case is a problem in discrete math: given a random distribution of values (there are many; say the values are distributed uniformly), how many times on average do we expect to execute block D, that is, how often is the next value bigger than all the prior ones? This is not a simple problem, but we can write programs to help understand it.

Let's return our focus to Iaw(N) = 14N + 9. Although this formula is simple, we want to make it even simpler if we can. Note that as N gets large (and most algorithmic analysis is asymptotic: it is concerned with what happens as the problem size N gets very large), the lower-order term (9) can be dropped from this function to simplify it (less precision) without losing much accuracy. For example, if N is 100, Iaw(N) = 1,409; if we drop the 9, the simplified answer is 1,400, which is 99.3% of the correct answer. If we increase N to 1,000, Iaw(N) = 14,009; dropping the 9 gives 14,000, which is 99.94% of the correct answer. If we increase N to 10,000, Iaw(N) = 140,009; dropping the 9 gives 140,000, which is 99.994% of the correct answer. Thus as N gets large (and 10,000 is not even a very large problem for computers) the lower-order term is not significant, so we drop it to simplify the formula to Iaw(N) = 14N. Mathematically, if Td(N) is the dominant term (here 14N), we can drop any term T(N) if T(N)/Td(N) -> 0 as N -> infinity: note that 9/14N -> 0 as N -> infinity, so that term can be dropped.
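The percentages above are easy to verify mechanically. This small sketch (the class name `DropLowerTerms` is mine) computes what fraction of the exact count Iaw(N) = 14N + 9 remains after the lower-order term is dropped:

```java
public class DropLowerTerms {
    // Fraction of the exact instruction count Iaw(N) = 14N + 9 that the
    // simplified dominant term 14N retains; approaches 1 as N grows.
    public static double accuracy(long n) {
        double exact = 14.0 * n + 9;   // full formula
        double simplified = 14.0 * n;  // dominant term only
        return simplified / exact;
    }
}
```

The accuracy climbs toward 100% as N grows, which is why asymptotic analysis can afford to discard the constant term.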
### Sort

For another example, think about sorting an array. We can use the following simple-to-code but inefficient algorithm.

```java
for (int base = 0; base < a.length; base++)
    for (int check = base + 1; check < a.length; check++)
        if (a[base] > a[check]) {
            int temp = a[base]; a[base] = a[check]; a[check] = temp;
        }
```

The code for this example leads to the following basic blocks.
Assume that for the worst-case input, every time two values in the array are compared, they are found to be in the wrong order and must be swapped. The following right side of an EBNF rule models the correct order of execution of basic blocks: A H { B F { C D E F } G H }, with the restriction that the inner repetition happens one fewer time than the outer repetition.

- If the array contains 0 values, 7 instructions are executed: blocks A, H.
- If the array contains 1 value, 21 instructions are executed: blocks A, H, B, F, G, H.
- If the array contains 2 values, 63 instructions are executed: blocks A, H, B, F, C, D, E, F, G, H, B, F, G, H.
- If the array contains 3 values, 133 instructions are executed.

The resulting instruction-counting function is Iaw(N) = 28N(N-1)/2 + 14N + 7 = 14N² + 7. For example, if N is 100, Iaw(N) = 140,007; if we drop the 7, the simplified answer is 140,000, which is 99.995% of the correct answer. Thus as N gets large (and 100 is a tiny problem for computers) the lower-order term is not significant, so we drop it to simplify the formula to Iaw(N) = 14N². Recall our rule: a term T(N) can be dropped if T(N)/Td(N) -> 0 as N -> infinity; note that 7/14N² -> 0 as N -> infinity, so that term can be dropped.

Now, let's look at the constant in front of the dominant term; its value doesn't really matter, for three important reasons, and by getting rid of it we can simplify the formula again. First, when computing the run time we are just going to multiply this constant by another constant relating to the machine, so we can discard this constant if we just use a different constant for the machine. Let Taw(N) denote the time taken by algorithm a when run on the worst-case problem of size N. If our machine executes 2 billion instructions/second, then the formula in the second example is Taw(N) = 14N²/(2x10⁹), or Taw(N) = .000000007N².
Second, a major question that we want answered is how much extra work an algorithm does if the size of a problem is doubled. Note that for the second simplified version of Iaw (the one with only the dominant term multiplied by a constant), Iaw(2N)/Iaw(N) = 14(2N)²/14N² = 4 (meaning doubling the problem size quadruples the number of instructions that this algorithm executes), so the constant is irrelevant to the computation of this ratio. Third, another major question is how the speeds of two algorithms compare: specifically, we want to know whether Iaw(N)/Ibw(N) -> 0 as N -> infinity, which would mean that algorithm a gets faster and faster compared to algorithm b. Again, the constant is irrelevant to this calculation.
## Big O Notation

By ignoring all the lower-order terms and constants, we say that algorithm a is O(N²), which means that the growth rate of the work performed by algorithm a (the number of instructions it executes) is on the order of N². This is called big O notation, and we use it to specify the complexity class of an algorithm. Big O notation doesn't tell us everything that we need to know about the running time of an algorithm. For example, if two algorithms are both O(N²), we don't know which will eventually become faster; and if one algorithm is O(N) and another is O(N²), we don't know which will be faster for small N. But it does economically tell us quite a bit about the performance of an algorithm (see the three important questions above).

We can compute the complexity class of an algorithm by the process shown above, or by doing something much simpler: determining how often its most frequently executed statement is executed as a function of N. Returning to our first example,

```java
int max = Integer.MIN_VALUE;
for (int i = 0; i < a.length; i++)
    if (a[i] > max)
        max = a[i];
```

the if statement is executed N times, where N is the length of the array: a.length. Returning to our second example,

```java
for (int base = 0; base < a.length; base++)
    for (int check = base + 1; check < a.length; check++)
        if (a[base] > a[check]) {
            int temp = a[base]; a[base] = a[check]; a[check] = temp;
        }
```

the if statement is executed about N² times, where N is the length of the array. It is actually executed exactly N(N-1)/2 times: N-1 times for the first outer loop iteration, N-2 for the second, ..., and 0 for the last. Knowing that 1+2+3+...+N = N(N+1)/2, we have 1+2+3+...+(N-1) = N(N-1)/2 = N²/2 - N/2; dropping the lower-order term and the constant yields N².

Finally, note that comparing algorithms by their complexity classes is useful only for large N. We cannot state authoritatively whether an O(N) algorithm or an O(N²) algorithm is faster for small N; but we can state that once we pass some threshold for N (call it N0), the O(N) algorithm will always be faster than the O(N²) algorithm.
This ignorance is illustrated by the picture below.
In this example, the O(N) algorithm takes more time for small N. Of course, by adjusting constants and lower-order terms, it could also be the case that the O(N) algorithm is always faster; we cannot tell solely from the complexity classes. Technically, an algorithm a is O(f(N)) if and only if there exist constants M and N0 such that Iaw(N) <= M·f(N) for all N > N0. This means, for example, that any O(N) algorithm is also O(N²) (and O(f(N)) for any f(N) that grows faster than linearly). Technically Θ (big Theta) is the symbol to use when you know a tight bound on both sides: if there exist M1, M2, and N0 such that M1·f(N) <= Iaw(N) <= M2·f(N) for all N > N0, we say that algorithm a is Θ(f(N)). We will use just big O notation, often pretending it is Θ. See the Big O Notation article in Wikipedia for more details.
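The claim above that the quadratic sort's if statement executes exactly N(N-1)/2 times can be checked by instrumenting the sort with a counter. This is a sketch (the class and method names are mine):

```java
public class SwapSortCount {
    // The quadratic swap sort from above, instrumented to count how many
    // times its most frequently executed statement (the if test) runs.
    public static long sortCountingComparisons(int[] a) {
        long comparisons = 0;
        for (int base = 0; base < a.length; base++)
            for (int check = base + 1; check < a.length; check++) {
                comparisons++;               // the if test below runs once per pair
                if (a[base] > a[check]) {
                    int temp = a[base]; a[base] = a[check]; a[check] = temp;
                }
            }
        return comparisons;                  // should equal N(N-1)/2
    }
}
```

For N = 5 the counter reports 10, which is exactly 5·4/2.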

## Complexity Classes

Using big O notation, we can broadly categorize algorithms by their complexity classes. This categorization supplies one kind of excellent information: given the time it takes a method (implementing an algorithm) to solve a problem of size N, we can easily compute how long it would take to solve a problem of size 2N.

For example, if a method implementing a certain sorting algorithm is in the complexity class O(N²), and it takes about 1 second to sort 10,000 values, it will take about 4 seconds to sort 20,000 values. That is, for complexity class O(N²), doubling the size of the problem quadruples the time taken executing a method. The algebra to prove this fact is simple. Assuming Taw(N) = cN² (where c is some technology constant related to the compiler used, the speed of the computer and its memory, etc.), the ratio of the time taken to solve a problem of size 2N to the time taken to solve a problem of size N is:

Taw(2N)/Taw(N) = c(2N)²/cN² = 4cN²/cN² = 4
As we saw before, the constants are irrelevant: they all disappear no matter what the complexity class.

Likewise, using this method to sort 1,000,000 values (100 times more data) would take about 2.8 hours (that is, 10,000 times longer, which is 100²).
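This kind of extrapolation is pure arithmetic, because the technology constant c cancels in the ratio. A sketch (the class and method names are mine):

```java
public class QuadraticScaling {
    // Predict the time an O(N^2) method takes on a problem of size m,
    // given that it takes measuredSeconds on a problem of size n.
    // Since T(N) = c*N^2, T(m)/T(n) = (m/n)^2 and c cancels out.
    public static double predictSeconds(double measuredSeconds, long n, long m) {
        double ratio = (double) m / n;
        return measuredSeconds * ratio * ratio;
    }
}
```

With the figures from the text: 1 second at 10,000 values predicts 4 seconds at 20,000, and 10,000 seconds (about 2.8 hours) at 1,000,000.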

Here is a short characterization of some common complexity classes (there are many others: any expression that is a formula using N). We will discuss some of these algorithms in more detail later in this handout, and use these complexity classes to characterize many methods throughout the semester.

| Complexity Class | Class Name | Example | T(2N) |
|---|---|---|---|
| O(1) | Constant | Insertion at the rear of an array; insertion at the front of a linked list; parameter passing | T(N) |
| O(log₂N) | Logarithmic | Binary search | T(N) + constant |
| O(N) | Linear | Linear search (arrays or linked lists) | 2T(N) |
| O(N log₂N) | Log-linear or linearithmic | Fast sorting | 2T(N) + constant |
| O(N²) | Quadratic | Simple sorting | 4T(N) |
| O(N³) | Cubic | N x N matrix multiplication | 8T(N) |
| O(N^C) | Polynomial or geometric | | 2^C T(N) |
| ... | ... | ... | ... |
| O(C^N) | Exponential | Proving boolean equivalences of N variables | C^N T(N) |

We can compute log₂N = (ln N)/(ln 2) = 1.4427 ln N. Since log base 2 and log base e are linearly related, it makes no difference which we use inside big O notation, because only the constants (which we ignore) differ. You should also memorize that log₂1000 is about 10 (actually log₂1024 is exactly 10), and that log₂(Nᵃ) = a·log₂N. From these facts we can easily compute log₂1,000,000 as log₂(1000²), which is 2·log₂1000, which is about 20. Do this for log₂1,000,000,000 (one billion).
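These logarithm facts are easy to confirm numerically (a sketch; the class name `LogFacts` is mine):

```java
public class LogFacts {
    // Java's Math library has no base-2 logarithm, so we use the
    // change-of-base identity: log2 N = ln N / ln 2.
    public static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }
}
```

For example, log₂1024 is exactly 10, log₂1,000,000 equals 2·log₂1000 (about 20), and log₂1,000,000,000 comes out to about 30 (3·log₂1000).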

Again, we should understand that these simple formulas work only when N gets large; this is the core of asymptotic algorithmic analysis. Note that complexity classes before (and including) log-linear are considered "fast": their running time does not increase much faster than the size of the problem increases. The later complexity classes O(N²), O(N³), etc. are slower but "tractable". The final complexity class, O(2^N), grows so fast that it is called "intractable": only small problems in this complexity class can ever be solved.

For example, assume that Ia1w(N) = 10 (constant), Ia2w(N) = 10log₂N (logarithmic), Ia3w(N) = 10N (linear), etc. Assume further that we are running code on a machine executing 1 billion (10⁹) operations per second. Then the following table gives an intuitive idea of how running times for algorithms in different complexity classes change with problem size.
| Complexity Class | N = 10 | N = 100 | N = 1,000 | ... | N = 1,000,000 |
|---|---|---|---|---|---|
| O(1) | 1x10⁻⁸ seconds | 1x10⁻⁸ seconds | 1x10⁻⁸ seconds | ... | 1x10⁻⁸ seconds |
| O(log₂N) | 3.3x10⁻⁸ seconds | 6.6x10⁻⁸ seconds | 1.0x10⁻⁷ seconds | ... | 2.0x10⁻⁷ seconds |
| O(N) | 1x10⁻⁷ seconds | 1x10⁻⁶ seconds | 1x10⁻⁵ seconds | ... | 1x10⁻³ seconds |
| O(N log₂N) | 3.3x10⁻⁷ seconds | 6.6x10⁻⁶ seconds | 1.0x10⁻⁴ seconds | ... | 2.0x10⁻¹ seconds |
| O(N²) | 1x10⁻⁶ seconds | 1x10⁻⁴ seconds | 1x10⁻² seconds | ... | 2.7 hours |
| O(N³) | 1x10⁻⁵ seconds | 1x10⁻² seconds | 10 seconds | ... | about 3x10² years |
| O(2^N) | 1x10⁻⁵ seconds | about 4x10¹² centuries | | ... | |

## Time Estimation Based on Complexity Class

Up until this point we have continually simplified information about algorithms to make our analysis of them easier. Have we strayed so far from reality that our information is useless? No! In this section we will learn how we can easily and accurately (say, within 10%) predict how long it will take a method to solve a large problem, if we know the complexity class of the method and have measured how long it takes to execute for some large problem size. Notice that both the measured and predicted problem sizes must be reasonably large; otherwise the simplifications used to compute the complexity class will not be accurate: the lower-order terms will have a real effect on the answer.

For a first example, we will measure, and then predict, the running time of a simple, quadratic sorting method. We will use a driver program (discussed below, in the Sorting section) to repeatedly sort an array containing 1,000 random values, and then predict how long it will take this method to sort an array containing 10,000 random values (and compare this prediction to the measured running time for that problem size). Because this sorting method is in the O(N²) complexity class, we assume that we can write T(N) = cN², where we do not know the value of c yet. We run the sorting method five times on an array containing 1,000 random values and measure the average running time: it is .022 seconds. Now we solve for c. Using N = 1,000 we have

T(1000) = c·1000²
.022 = c·10⁶
c = .022/10⁶ = 2.2x10⁻⁸

Thus for large N, T(N) = 2.2x10⁻⁸N² seconds. Using this formula, we can predict that using this method to sort an array of 10,000 random values would take about 2.2 seconds. The actual amount of time is about 2.6 seconds, so the prediction is 100[1 - (2.6-2.2)/2.6], or 85% accurate (we barely missed our goal of 90% accuracy).
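The computation above (solve for c from one measurement, then extrapolate) can be packaged as a small helper. This is a sketch with names of my choosing, checked against the measurements quoted above:

```java
public class TimePredictor {
    // Fit T(N) = c*N^2 to a single measurement of an O(N^2) method,
    // then predict the running time for a different problem size.
    public static double predictQuadratic(double measuredSeconds,
                                          long measuredN, long predictN) {
        double c = measuredSeconds / ((double) measuredN * measuredN);
        return c * (double) predictN * predictN;
    }
}
```

Plugging in the text's numbers (.022 seconds at N = 1,000) predicts about 2.2 seconds for N = 10,000, matching the hand calculation.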
It would be more accurate to measure this sort on a 10,000-value array and predict the time to sort a 100,000-value array.

For a second example, we will measure, and then predict, the running time of a more complicated log-linear sorting method (this algorithm is in the lowest complexity class of all those that accomplish sorting). We will use a driver program to repeatedly sort an array containing 100,000 random values, and then predict how long it will take this method to sort an array containing 1,000,000 random values (and compare this prediction to the measured running time for that problem size, which is small enough to measure). Because this sorting method is in the O(N log₂N) complexity class, we assume that we can write T(N) = c(N log₂N), where we do not know the value of c yet. We run the sorting method five times on an array containing 100,000 random values and measure the average running time: it is .15 seconds (notice that this method sorts 10 times as many values over 10 times faster than the simple quadratic sorting method). Now we solve for c. Using N = 100,000 we have

T(100,000) = c(100,000 log₂100,000)
.15 = c·1,660,964
c = .15/1,660,964 = 9.0x10⁻⁸

Thus for large N, T(N) = 9.0x10⁻⁸(N log₂N) seconds. Using this formula, we can predict that using this method to sort an array of 1,000,000 random values would take about 1.8 seconds. The actual amount of time is about 1.8 seconds as well, so here the prediction is quite accurate.

Here is a final word on the accuracy of our predictions. If we sort the exact same array a few times (the sort-testing driver easily does this) we will see variations of 10%-20%; likewise we get a slightly greater spread if we sort different arrays of the same size. Our model predicts that these runs would all take the same amount of time.
So all kinds of things (the operating system, what other programs it is running, what network connections are open, etc.) influence the actual amount of time taken to sort an array. In this light, the accuracy of our "naive" predictions is actually quite good.

## Determining Complexity Classes Empirically

We have seen that it is fairly simple, given an algorithm, to determine its complexity class: determine how often its most frequently executed statement is executed as a function of N. But what if even that is too hard because the algorithm is too big or convoluted? Well, if we have a method implementing the algorithm, we can time it on a few different-sized problems and infer the complexity class from the data. First, be aware that the standard timer in Java is accurate to only .001 second (1 millisecond); call this one tick. So, to get any kind of accuracy, you should run the method on data large enough to take tens to hundreds of ticks. Run the method on data of size N (enough for the required number of ticks), then of size 2N, then 4N, then 8N. For algorithms in simple complexity classes, you should be able to recognize a pattern (approximate, not exact). If the sequence of times is 1.0 seconds, 2.03 seconds, 3.98 seconds, and 8.2 seconds, the method seems O(N): each doubling of the problem size approximately doubled the running time. If the sequence of times is 1.0 seconds, 3.8 seconds, 17.3 seconds, and 70.3 seconds, the method seems O(N²): each doubling approximately quadrupled the running time. Of course, things get a bit subtle for a complexity class like O(N log₂N): you will see that it is always a bit worse than linear, but nowhere near quadratic. And O(N log₂²N) would behave similarly, so you must apply this process with some skepticism about computing perfect answers.
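The doubling experiment can be summarized numerically: average the ratios between successive times and compare the result to 2 (linear) or 4 (quadratic). A sketch (the class name is mine), checked against the sample sequences above:

```java
public class GrowthSniffer {
    // Given times measured at problem sizes N, 2N, 4N, 8N, return the
    // average ratio between successive times: a value near 2 suggests O(N),
    // a value near 4 suggests O(N^2).
    public static double averageDoublingRatio(double[] times) {
        double sum = 0;
        for (int i = 1; i < times.length; i++)
            sum += times[i] / times[i - 1];
        return sum / (times.length - 1);
    }
}
```

Applied to the sequences in the text, 1.0/2.03/3.98/8.2 averages very near 2, while 1.0/3.8/17.3/70.3 averages very near 4.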

## Searching: O(N) and O(log₂N) Algorithms

Linear searching, whether in an array or in a linked list, is O(N); in the worst case (where the value being searched for is not in the data structure), every value in the data structure must be examined (the if statement is executed N times).

```java
public static int linearSearch (int[] a, int value) {
    for (int i = 0; i < a.length; i++)
        if (a[i] == value)
            return i;        // found the value: return its index
    return -1;               // examined every value: not in the array
}
```

If the array is sorted, we can instead use binary search, which is O(log₂N): each iteration of the loop discards half of the remaining candidate indexes.

```java
public static int binarySearch (int[] a, int value) {
    int low  = 0;
    int high = a.length - 1;
    for (;;) {
        if (low > high)           // low/high bounds inverted, so
            return -1;            //   the value is not in the array
        int mid = (low+high)/2;   // find the middle of the array
        if (a[mid] == value)      // found the value looking for, so
            return mid;           //   return its index; otherwise
        else if (value < a[mid])  //   determine which half of the
            high = mid - 1;       //   array potentially stores the
        else                      //   value and continue searching
            low  = mid + 1;       //   only that part of the array
    }
}
```

The following illustration shows how this method executes in a situation where it finds the value it is searching for. Notice how it converges on those indexes in the array that might store the searched-for value.
 The following illustration shows how this method executes in a situation where it does not find the value it is searching for.
Again, each iteration of the loop reduces the part of the array being examined by a factor of two. How many times can we halve a size-N array before we are left with a single value? log₂N times (the same number of times we can double the size of an array from 1 value to N). Finally, note that we cannot perform binary searching efficiently on linked lists, because we cannot quickly find the middle of a linked list. Another self-referential data structure, the tree, can be used to perform efficient searches.
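We can confirm the log₂N bound by counting probes. This sketch (the class and method names are mine) uses the same halving logic as the binary search above, but returns how many loop iterations ran instead of an index:

```java
public class BinarySearchSteps {
    // Count how many times binary search examines a middle element before
    // returning; for a miss on a sorted array of N values this is about log2 N.
    public static int probes(int[] a, int value) {
        int low = 0, high = a.length - 1, probes = 0;
        while (low <= high) {
            probes++;                         // one more halving step
            int mid = (low + high) / 2;
            if (a[mid] == value) return probes;
            else if (value < a[mid]) high = mid - 1;
            else low = mid + 1;
        }
        return probes;                        // value absent: all halvings used
    }
}
```

On a sorted array of 8 values, a miss takes 3 probes (log₂8 = 3), while a lucky hit on the middle element takes just 1.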

## Sorting: O(N²) and O(N log₂N) Algorithms

Sorting is one of the most common operations performed on arrays of data. We saw in the previous section how sorting an array allows it to be searched much more efficiently. Sorting algorithms are often divided into two complexity classes: simple-to-understand algorithms whose complexity class is O(N²), and more complicated algorithms whose complexity class is O(N log₂N). The latter are much faster than the former for large arrays (see the Time Estimation section, which discussed two such sorting algorithms, for an example). The fast one was the Arrays.sort method, which sorts any array of objects efficiently: it implements an O(N log₂N) algorithm with a small constant.

Here is a brief description of three O(N²) sorting algorithms. In bubble sort, a next position to fill is compared with all later positions, swapping out-of-order values. In selection sort, the smallest value in the remaining positions is found and swapped with the value in the next position. In insertion sort, the next value is moved backward (swapped with the value in the previous position in the region of sorted values) until it reaches its correct position. These algorithms are listed in simplest-to-most-complicated order, as well as slowest-to-fastest order for large N.

Here is a brief description of three O(N log₂N) sorting algorithms. In merge sort, pairs of small, adjacent ordered arrays (the smallest are 1-member arrays) are repeatedly merged into larger ordered arrays until the result is one ordered array containing all the values. In heap sort, values are added to and then removed from a special kind of tree data structure called a heap (which we will study later); its add and remove operations are both O(log₂N), so adding and then removing N values is N·O(log₂N) + N·O(log₂N) = O(N log₂N) total.
In quick sort, a pivot value is chosen and the array is partitioned into three regions: on the left the values less than the pivot, in the middle those equal to the pivot, and on the right those greater than the pivot; this process is then repeated on the left and right regions (if they contain more than one value). Heap sort is slower than merge sort, but it takes no extra space (merge sort requires another array as big as the array being sorted). Technically, quick sort is O(N²): on most arrays it is the fastest (and requires no extra space), but on pathologically bad arrays, which are rare, it can take much longer to execute than the other methods.

All these sorting algorithms are defined as static methods in the Sort class. All methods have exactly the same prototype (so they can be easily interchanged):

```java
public static void bubble (Object[] a, int size, Comparator c)
```

which includes an array of Object references to be sorted; an int specifying how many references are stored in the array (this can be a.length if the array is filled); and an object from a class implementing Comparator, which decides which objects belong before which others in the sorted array. A driver for testing the performance of these sorting methods is in the Sorting Demo application. This application includes the source code for testing these sorting methods, as well as Arrays.sort, which actually runs slower than my fastest method, quicksort; I didn't expect that finely tuned system code to be that slow. You can examine the source code for this method and compare it to my fast sorting methods for clarity and performance (I'm going to).

Finally, it has been proven that when using comparisons to sort values, all algorithms require at least O(N log₂N) comparisons. Thus, there are no general sorting algorithms in any complexity class smaller than log-linear (although better algorithms, ones with smaller constants, may exist).
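As an illustration of the prototype just described, here is a sketch of insertion sort written in that style (this is my re-creation for illustration, not the course's actual Sort class):

```java
import java.util.Comparator;

public class SortSketch {
    // Insertion sort using the common prototype: sort the first size
    // references in a, ordering them with the supplied Comparator.
    public static void insertion(Object[] a, int size, Comparator c) {
        for (int next = 1; next < size; next++)
            // Move a[next] backward, one swap at a time, until it
            // reaches its correct position in the sorted region.
            for (int i = next; i > 0 && c.compare(a[i-1], a[i]) > 0; i--) {
                Object temp = a[i]; a[i] = a[i-1]; a[i-1] = temp;
            }
    }
}
```

Because the method takes a Comparator, the same code can sort any kind of object: for Integers we can pass a natural-ordering comparator, and for other classes we pass a comparator encoding whatever order we want.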