Lecture 9/16/97 o Analyzing and understanding 1 handout. programs - old quiz 1 o loop invariants Reminder: Quiz Thurs. open book. 1 8.5 x 11 sheet of notes. Readings so far: Ch 2, 3, 4.1, 4.2, 4.6, 5, 7. ============================================================================= Lecture Topic: Techniques for analyzing and understanding programs. (Material from chapter 5 of the book.) Quote from Jon Bentley, "Progamming Pearls", page 35, 36 (slightly re-worded to use C syntax): Given a sorted array A[0] <= A[1] <=...<= A[n-1], we want to determine if a given element T is in the array. Binary search solves the problem by keeping track of a range within the array in which T must be if it is anywhere in the array. Initially the range is the entire array. The range is shrunk by comparing its middle element to T, and then discarding half the range. The process continues until T is found, or until the range in which it must lie is known to be empty. In an n-element table, the search uses roughly log2(n) comparisons. I've assigned this problem [binary search] in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert the above description [of binary search] into a program in the language of their choice.....at the end of the period, most programmers reported that they had written correct code for the task. We would then take 30 minutes to examine their code....In several cases, and with over 100 programmers, the results varied little. 90 % of the programmers found bugs in their programs. I was amazed: given ample time, only about 10 percent of professional programmers were able to get this small program right. But they aren't the only ones....Knuth points out that while the first binary search was published in 1946, the first published binary search without bugs did not appear until 1962. What's going on here: Everyone gets the high-level picture right, but the problem is getting the details correct. Today: * how to analyze code so it works right the first time * techniques for testing code to make sure. Start with easier problem: linear search. (1) First part is to state the ASSUMPTIONS we're making on our input, and the GUARANTEES we're giving on our output. (Good thing to put in a comment). (2) In a complicated loop, do the same thing but replace "starting assumptions" with LOOP INVARIANT. an INVARIANT is an assertion that doesn't change in executing a piece of code. A LOOP INVARIANT is something that holds at the beginning of a loop, and continues to hold after each iteration. (3) Also, we want a PROGRESS GUARANTEE that each iteration through the loop we get closer to finishing. Makes sure we don't have an infinite loop. (Also useful when analyzing running time). [in this case, it's pretty obvious] // Input: A is array of length n, key is item we're looking for // Output: location in A of key, or -1 if key does not appear in array. int linSearch(int A[], int n, int key) { for(int i=0; i key for all j > top. // progress: (new #elements in range) <= 1/2*(old #elements in range) // if (bottom > top) return -1; mid = (bottom + top)/2; if (key == A[mid]) return mid; if (key < A[mid]) top = mid-1; else bottom = mid+1; } } Let's do this systematically. Use an example for reference: searching for 6 in [1 3 5 7 9 11] What is the invariant we want to hold? * If key is in array, then it is between bottom and top, inclusive. Even stronger: A[j] < key for j < bottom and A[j] > key for j > top. Does it still hold? * as far as invariant goes, could set top=mid and bottom=mid in last two lines, but we really want top=mid-1 and bottom=mid+1 there. Why? How/where would it fail? --> try on array [0 1]. This seems to catch all the binary-search bugs. * we need our progress condition. Do larger example. progress condition is (new # elements) <= 1/2 * (old # elements) if did top=mid or bottom=mid, would just have the condition (new # elements) <= 1/2*(old # elements) + 1. Not good enough. Progress condition also tells us our running time is about log_2(n). To recap: The basic technique for analyzing a loop is the following: (1) State a loop invariant & show it holds at the beginning of the loop. (2) Prove, assuming the invariant at the previous iteration, that the invariant holds in the next iteration. (3) Prove that after the loop exits the desired property holds. (4) Prove termination with a progress guarantee. Another way to do binary search (actually easier to get it right this way) is to do it recursively. ANALYZING RECURSIVE PROGRAMS ---------------------------- * Analog to loop invariant: base case and general case. Look at correctness of general case under assumption that recursive calls turn out OK. * progress condition: make sure we're getting closer to base case on each recursive call. // recursive binary search: initial call would have bottom=0 and // top = n-1. int rec_search(int A[], int bottom, int top, int key) { int mid; if (bottom > top) return -1; mid = (top+bottom)/2; if (key == A[mid]) return mid; else if (key < A[mid]) return rec_search(A, bottom, mid-1, key); else return rec_search(A, mid+1, top, key); } int bsearch(int A[], int n, int key) { return rec_search(A,0,n-1,key);} Here inductive assumption is same as the earlier loop invariant: if key is in the array at all, it should be between bottom and top, inclusive. * Base case: easy. * general case: not too hard either. * progress: can see that always cutting down range by at least half. (We'll talk about testing next time)