Lecture 9/16/97
o Analyzing and understanding programs          1 handout:
o Loop invariants                                 - old quiz 1

Reminder: Quiz Thurs.  open book. 1 8.5 x 11 sheet of notes.

Readings so far: Ch 2, 3, 4.1, 4.2, 4.6, 5, 7.


=============================================================================

  Lecture Topic: Techniques for analyzing and understanding programs.
  (Material from chapter 5 of the book.)

Quote from Jon Bentley, "Programming Pearls", pages 35-36 (slightly
re-worded to use C syntax):

  Given a sorted array A[0] <= A[1] <=...<= A[n-1], we want to determine
  if a given element T is in the array.  Binary search solves the
  problem by keeping track of a range within the array in which T must
  be if it is anywhere in the array.  Initially the range is the entire
  array.  The range is shrunk by comparing its middle element to T, and
  then discarding half the range.  The process continues until T is
  found, or until the range in which it must lie is known to be empty.
  In an n-element table, the search uses roughly log2(n) comparisons.
  
  I've assigned this problem [binary search] in courses at Bell Labs and
  IBM.  Professional programmers had a couple of hours to convert the
  above description [of binary search] into a program in the language of
  their choice.....at the end of the period, most programmers reported
  that they had written correct code for the task.  We would then take
  30 minutes to examine their code....In several classes, and with over
  100 programmers, the results varied little.  90% of the programmers
  found bugs in their programs.

  I was amazed: given ample time, only about 10 percent of professional
  programmers were able to get this small program right.  But they
  aren't the only ones....Knuth points out that while the first binary
  search was published in 1946, the first published binary search
  without bugs did not appear until 1962.


What's going on here:  Everyone gets the high-level picture right, but
the problem is getting the details correct.

Today: * how to analyze code so it works right the first time
       * techniques for testing code to make sure.

Start with easier problem: linear search.

(1) First part is to state the ASSUMPTIONS we're making on our input, and
    the GUARANTEES we're giving on our output.  (Good thing to put in a
    comment).

(2) In a complicated loop, do the same thing but replace "starting
    assumptions" with LOOP INVARIANT.


    An INVARIANT is an assertion that remains true throughout the
    execution of a piece of code.  A LOOP INVARIANT is an assertion that
    holds at the beginning of a loop and continues to hold after each
    iteration.

(3) Also, we want a PROGRESS GUARANTEE that each iteration through the
    loop gets us closer to finishing.  This makes sure we don't have an
    infinite loop.  (It is also useful when analyzing running time.)
    [For linear search, it's pretty obvious.]

// Input: A is array of length n, key is item we're looking for
// Output: location in A of key, or -1 if key does not appear in array.
int linSearch(int A[], int n, int key)
{
  for(int i=0; i<n; ++i) {   // invariant: A[j] != key for all j < i
    if (A[i] == key) return i;
  }
  return -1;                 // key is not in A (this follows from invariant)
}
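
One way to make the invariant concrete is to check it at run time.  The
sketch below is only an illustration: the name lin_search_checked is made
up for these notes, and the inner checking loop makes the search take
about n^2 steps, so it is for debugging rather than real use.

```c
#include <assert.h>

/* Hypothetical checked variant of linSearch (name invented for this sketch).
 * Input:  A is array of length n, key is the item we're looking for.
 * Output: location in A of key, or -1 if key does not appear in A. */
int lin_search_checked(const int A[], int n, int key)
{
    for (int i = 0; i < n; ++i) {
        /* Check the loop invariant: A[j] != key for all j < i. */
        for (int j = 0; j < i; ++j)
            assert(A[j] != key);
        if (A[i] == key) return i;
    }
    /* Here i == n, so the invariant says key is nowhere in A[0..n-1]. */
    return -1;
}
```

If the program is compiled with NDEBUG defined, the assert calls turn into
no-ops and only the wasted inner loop remains.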

Now, let's try binary search.  Here is one way to write it.

// Input: A is array of n ints, A[0] <= A[1] <=...<= A[n-1]. Looking for key.
// Output: location in A of key, or -1 if key does not appear in array.

int bsearch(int A[], int n, int key) 
{
  int bottom = 0, top = n-1, mid;
  while (1) {
    // invariant: A[j] < key for all j < bottom.
    //            A[j] > key for all j > top.
    // progress: (new #elements in range) <= 1/2*(old #elements in range)
    //
    if (bottom > top) return -1;
    mid = (bottom + top)/2;
    if (key == A[mid]) return mid;
    if (key < A[mid]) top = mid-1;
    else              bottom = mid+1;
  }
}
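
The invariant and progress comments in the loop can also be turned into
run-time checks.  This is a sketch, not the version to hand in: the name
bsearch_checked is made up here, and checking the invariant costs about n
steps per iteration, which defeats the point of binary search.

```c
#include <assert.h>

/* Hypothetical checked variant of the iterative binary search.
 * Input:  A is array of n ints, A[0] <= A[1] <= ... <= A[n-1].
 * Output: location in A of key, or -1 if key does not appear in A. */
int bsearch_checked(const int A[], int n, int key)
{
    int bottom = 0, top = n - 1;
    while (1) {
        /* invariant: everything left of bottom is too small,
         *            everything right of top is too big. */
        for (int j = 0; j < bottom; ++j)  assert(A[j] < key);
        for (int j = top + 1; j < n; ++j) assert(A[j] > key);

        if (bottom > top) return -1;
        int old_size = top - bottom + 1;
        int mid = (bottom + top) / 2;
        if (key == A[mid]) return mid;
        if (key < A[mid]) top = mid - 1;
        else              bottom = mid + 1;

        /* progress: the new range is at most half the old one. */
        int new_size = top - bottom + 1;
        assert(2 * new_size <= old_size);
    }
}
```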

Let's do this systematically.

Use an example for reference: searching for 6 in [1 3 5 7 9 11]

What is the invariant we want to hold?  

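* If key is in array, then it is between bottom and top, inclusive.
  Even stronger:  A[j] < key for j < bottom and A[j] > key for j > top.

  It holds initially: with bottom=0 and top=n-1 there is no j < bottom
  and no j > top, so there is nothing to check.

  Does it still hold after an iteration?  Yes.  If key < A[mid], then
  because A is sorted, A[j] >= A[mid] > key for all j >= mid, so setting
  top=mid-1 keeps the second half true.  The case key > A[mid] is
  symmetric for bottom=mid+1.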

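* As far as the invariant goes, we could set top=mid and bottom=mid in
  the last two lines, but we really want top=mid-1 and bottom=mid+1
  there.  Why?  How/where would it fail?

  --> Try it on the array [0 1], searching for 1: bottom=0, top=1 gives
      mid=0, A[0] < 1, and bottom=mid leaves bottom=0, so the loop never
      ends.  This small test seems to catch all the binary-search bugs.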
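
To watch the [0 1] failure without hanging the machine, here is a sketch
of the top=mid / bottom=mid variant with a cap on the number of
iterations.  The function name, the max_iters parameter, and the -2
"gave up" return value are all made up for this illustration.

```c
#include <assert.h>

/* Deliberately BROKEN binary search: uses top=mid and bottom=mid.
 * Returns the location of key, -1 if the range becomes empty, or
 * -2 if max_iters iterations pass without the loop finishing. */
int bsearch_no_shrink(const int A[], int n, int key, int max_iters)
{
    int bottom = 0, top = n - 1, mid;
    for (int iter = 0; iter < max_iters; ++iter) {
        if (bottom > top) return -1;
        mid = (bottom + top) / 2;
        if (key == A[mid]) return mid;
        if (key < A[mid]) top = mid;      /* should be mid-1 */
        else              bottom = mid;   /* should be mid+1 */
    }
    return -2;                            /* stuck: no progress */
}
```

On the array [0 1] with key 1, bottom stays 0 and top stays 1 forever, so
the call returns -2 no matter how large max_iters is.  Searching for 0
still works, since it is found on the first probe.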

* We need our progress condition.  Do a larger example.
  The progress condition is (new # elements) <= 1/2 * (old # elements).
  
  If we did top=mid or bottom=mid, we would only have the condition
  (new # elements) <= 1/2*(old # elements) + 1.  Not good enough: when
  2 elements remain, this bound allows 2 elements to remain again.

  The progress condition also tells us our running time is about log_2(n).
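
The link between halving and log_2(n) can be checked by counting.  In the
sketch below (both function names are made up), bsearch_probes counts how
many times the search looks at A[mid], and halving_bound counts how many
times n can be halved, rounding down, before it reaches 0.  For n >= 1
that count is floor(log_2(n)) + 1, and the progress condition says the
number of probes can never exceed it.

```c
#include <assert.h>

/* Number of times the binary search examines A[mid] for this key. */
int bsearch_probes(const int A[], int n, int key)
{
    int bottom = 0, top = n - 1, probes = 0;
    while (bottom <= top) {
        int mid = (bottom + top) / 2;
        ++probes;
        if (key == A[mid]) break;
        if (key < A[mid]) top = mid - 1;
        else              bottom = mid + 1;
    }
    return probes;
}

/* Number of times n can be halved (rounding down) before hitting 0. */
int halving_bound(int n)
{
    int steps = 0;
    while (n > 0) {
        n /= 2;
        ++steps;
    }
    return steps;
}
```

For the reference example, searching for 6 in [1 3 5 7 9 11] takes 3
probes, and halving_bound(6) is also 3 (6 -> 3 -> 1 -> 0).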

To recap:

The basic technique for analyzing a loop is the following:

  (1) State a loop invariant & show it holds at the beginning of the loop.

  (2) Prove that if the invariant holds at the start of one iteration,
      it still holds at the start of the next iteration.

  (3) Prove that after the loop exits the desired property holds.

  (4) Prove termination with a progress guarantee.
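
As a small illustration of the four steps on a loop other than search,
here is a sketch that finds the largest element of an array.  The
function array_max is not from the lecture; it is only here to show
where each step goes in the comments.

```c
#include <assert.h>

/* Input:  A is array of n ints, n >= 1 (assumption of this sketch).
 * Output: the largest value in A. */
int array_max(const int A[], int n)
{
    assert(n >= 1);
    int m = A[0];
    int i = 1;
    /* (1) invariant: m is the largest of A[0..i-1].
     *     Holds at the start because i == 1 and m == A[0]. */
    while (i < n) {
        if (A[i] > m) m = A[i];
        ++i;
        /* (2) m is again the largest of A[0..i-1] for the new i.
         * (4) progress: n - i went down by 1, so the loop must end. */
    }
    /* (3) on exit i == n, so m is the largest of A[0..n-1]. */
    return m;
}
```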


Another way to do binary search (actually easier to get it right this
way) is to do it recursively.

ANALYZING RECURSIVE PROGRAMS
----------------------------
* Analog to loop invariant: base case and general case.
  Look at correctness of general case under assumption that recursive
  calls turn out OK.

* progress condition: make sure we're getting closer
  to base case on each recursive call.

// recursive binary search: initial call would have bottom=0 and 
// top = n-1.
int rec_search(int A[], int bottom, int top, int key) 
{
    int mid;
    if (bottom > top) return -1;
    mid = (top+bottom)/2;
    if (key == A[mid]) return mid;
    else if (key < A[mid]) return rec_search(A, bottom, mid-1, key);
    else                   return rec_search(A, mid+1, top, key);
}

int bsearch(int A[], int n, int key) { return rec_search(A,0,n-1,key);}

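Here the inductive assumption is the same as the earlier loop invariant:
if key is in the array at all, it is between bottom and top, inclusive.

* Base case: easy.  If bottom > top the range is empty, so key is not
  in the array and returning -1 is correct.
* General case: not too hard either.  If key == A[mid] we are done.
  Otherwise, since A is sorted, key can only be in the half we recurse
  on, and we assume the recursive call returns the right answer.

* Progress: we can see that each call cuts the range down by at least half.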
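
The progress claim for the recursive version can be checked the same way
as for a loop.  In this sketch (the name rec_search_checked is made up),
each recursive call is preceded by an assert that the new range is at
most half the size of the current one.

```c
#include <assert.h>

/* Hypothetical checked variant of rec_search.
 * Initial call would have bottom=0 and top=n-1. */
int rec_search_checked(const int A[], int bottom, int top, int key)
{
    if (bottom > top) return -1;          /* base case: empty range */
    int size = top - bottom + 1;
    int mid = (top + bottom) / 2;
    if (key == A[mid]) return mid;
    if (key < A[mid]) {
        /* progress: range bottom..mid-1 has mid-bottom elements */
        assert(2 * (mid - bottom) <= size);
        return rec_search_checked(A, bottom, mid - 1, key);
    }
    /* progress: range mid+1..top has top-mid elements */
    assert(2 * (top - mid) <= size);
    return rec_search_checked(A, mid + 1, top, key);
}
```

Trying it on the array [0 1] with keys -1, 0, 1 and 2 exercises both
recursive branches and the base case.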

(We'll talk about testing next time)