\documentclass[11pt]{article}
% \documentclass[11pt,twoside]{article}

\usepackage{lecnotes}
\usepackage{graphicx}
\usepackage{comment}
% \input{fp-macros}

\newcommand{\lecdate}{May 22, 2014} % e.g. January 25, 2011
\newcommand{\lecnum}{4}           % e.g. 1
\newcommand{\lectitle}{Linear Search}         % e.g. Judgments and Propositions
\newcommand{\lecturer}{Frank Pfenning}         % e.g. Frank Pfenning

\begin{document}

\maketitle

\section{Introduction}

One of the fundamental and recurring problems in computer science is
to find elements in collections, such as elements in sets.  An
important algorithm for this problem is \emph{binary search}.  We use
binary search for an integer in a sorted array to exemplify it.  As a
preliminary study in this lecture we analyze \emph{linear search},
which is simpler, but not nearly as efficient.  Still it is often
used when the requirements for binary search are not satisfied,
for example, when we do not have the elements we have to search
arranged in a sorted array.

In term of our learning goals, we address the following:
\begin{description}
\item[Computational Thinking:] Developing contracts (preconditions,
  postconditions, assertions, and loop invariants) that establish the
  safety and correctness of imperative programs.

  Evaluating the use of \emph{order}
  (sorted data) as a problem-solving tool.

  Identifying the difference between \emph{specification} and
  \emph{implementation}.

\item[Algorithms and Data Structures:]
  Describing linear search.

\item[Programming:] We will practice \emph{deliberate programming}
  together in lectures.

  Identifying, describing, and effectively using short-circuiting
  Boolean operators.

\end{description}

\section{Linear Search in an Unsorted Array}

If we are given an array of integers $A$ without any further
information and have to decide if an element $x$ is in $A$, we just
have to search through it, element by element.  We return \verb'true'
as soon as we find an element that equals $x$, \verb'false' if no
such element can be found.
\begin{verbatim}
bool is_in(int x, int[] A, int lower, int upper)
//@requires 0 <= lower && lower <= upper && upper <= \length(A);
{
  for (int i = lower; i < upper; i++)
    //@loop_invariant lower <= i && i <= upper;
    {
      if (A[i] == x) return true;
    }
  return false;
}
\end{verbatim}
We used the statement \verb'i++' which is equivalent
to \verb'i = i+1' to step through the array, element by element.

The precondition is very common when working with arrays. We pass an
array, and we also pass bounds -- typically we will let ${\it lower}$
be 0 and ${\it upper}$ be the length of the array. The added
flexibility of allowing ${\it lower}$ and ${\it upper}$ to take other
values will be useful if we want to limit search to the first $n$
elements of an array and do not care about the others. It will also be
useful later to express invariants such as \emph{$x$ is not among the
  first $k$ elements of $A$}, which we will write in code as
\verb'!is_in(x, A, 0, k)' and which we will write in mathematical
notation as $x \notin A[0,k)$.

The loop invariant is also typical for loops over an array.  We
examine every element ($i$ ranges from ${\it lower}$ to ${\it
  upper}-1$).  But we will have $i = {\it upper}$ after the last
iteration, so the loop invariant which is checked \emph{just before
  the exit condition} must allow for this case.

Could we strengthen the loop invariant, or write a postcondition?
% For the loop invariant, we might want to express that $x$
% is not in $A[0] \ldots A[i-1]$.  We would have to express
We could try something like
\begin{verbatim}
//@loop_invariant !is_in(x, A, lower, i);
\end{verbatim}
where \verb'!b' is the negation of $b$.  However, it is difficult to
make sense of this use of recursion in a contract or loop invariant
so we will avoid it.

This is small illustration of the general observation that some
functions are basic specifications and are themselves not subject to
further specification.  Because such basic specifications are
generally very inefficient, they are mostly used in other
specifications (that is, pre- or post-conditions, loop invariants,
general assertions) rather than in code intended to be executed.

\section{Sorted Arrays}

A number of algorithms on arrays would like to assume that they are
sorted.  Such algorithms would return a correct result only if they are
actually running on a sorted array. Thus, the first thing we need to
figure out is how to specify sortedness in function specifications.
The specification
function \verb'is_sorted(A,lower,upper)' traverses the array $A$ from
left to right, starting at ${\it lower}$ and stopping just before reaching
${\it upper}$, checking that each element is smaller or equal to its
right neighbor.  We need to be careful about the loop invariant to
guarantee that there will be no attempt to access a memory element out
of bounds.

\begin{verbatim}
bool is_sorted(int[] A, int lower, int upper)
//@requires 0 <= lower && lower <= upper && upper <= \length(A);
{
  for (int i = lower; i < upper-1; i++)
    //@loop_invariant lower <= i;
    if (!(A[i] <= A[i+1])) return false;
  return true;
}
\end{verbatim}
The loop invariant here does not have an upper bound on $i$.
Fortunately, when we are inside the loop, we know the loop condition
is true so we know $i < {\it upper}-1$.  That together with ${\it
  lower} \leq i$ guarantees that \emph{both} accesses are in bounds.

We could also try $i \leq {\it upper}-1$ as a loop invariant, but this turns out
to be false.  It is instructive to think about why.  If you cannot
think of a good reason, try to prove it carefully.  Your proof should
fail somewhere.

%\clearpage

Actually, the attempted proof already fails at the initial
step.  If ${\it lower} = {\it upper} = 0$
(which is permitted by the precondition)
then it is \emph{not} true that
$0 = {\it lower} = i \leq {\it upper}-1 = 0-1 = -1$.
We could say $i \leq {\it upper}$, but that wouldn't seem to serve
any particular purpose here since the array accesses are
already safe.

Let's reason through that.  Why is the acccess $A[i]$ safe?  By the
loop invariant ${\it lower} \leq i$ and the precondition $0 \leq {\it
  lower}$ we have $0 \leq i$, which is the first part of safety.
%
Secondly, we
have $i < {\it upper}-1$ (by the loop condition, since we are in the body of the
loop) and ${\it upper} \leq \m{length}(A)$ (by the precondition), so $i$ will be
in bounds.  In fact, even $i+1$ will be in bounds, since $0 \leq {\it lower}
\leq i <
i+1$ (since $i$ is bounded from above) and
$i+1 < ({\it upper}-1)+1 = {\it upper} \leq
\m{length}(A)$.

Whenever you see an array access, you must have a very good reason why
the access must be in bounds.  You should develop a coding instinct
where you \emph{deliberately pause} every time you access an array
in your code and verify that it should be safe according to your
knowledge at that point in the program.  This knowledge can be
embedded in preconditions, loop invariants, or assertions that
you have verified.

% Why is it necessary?  If we ask if a zero-length array is sorted, then
% before the check the exit condition of the loop the first time we have
% $i = 0$ and $n = 0$, so it is not the case that $i \leq n-1$.
% Therefore the second part of the loop invariant does not hold.  We
% account for that explicitly by allowing $n$ to be $0$.

% For an example of reasoning with loop invariants, we verify in
% some detail why this is a valid loop invariant.
% \begin{description}
% \item[Initially:] Upon loop entry, $i = 0$.  We distinguish two
%   cases.  If $n = 0$, then the left disjunction \verb'n == 0' holds.
%   If $n \not= 0$ then $n > 0$ because the precondition of the function
%   requires $n \geq 0$.  But if $n > 0$ and $i = 0$ then $i \leq n-1$.
%   We also have $0 \leq i$ so \verb'0 <= i && i <= n-1' holds.

% \item[Preservation:] Assume the loop invariant holds before the
%   test, so either $n = 0$ or $0 \leq i \leq n-1$.  Because we do not
%   exit the loop, we also have $i < n-1$.  The step statement in the
%   loop increments $i$ so we have $i' = i+1$.

%   Since $i' = i+1$ and $0 \leq i$ we have $0 \leq i'$.  Also, since $i
%   < n-1$ and $i' = i+1$ we have $i'-1 < n-1$ and so $i' < n$.
%   Therefore $i' \leq n-1$.

%   So $0 \leq i' \leq n-1$ and the loop invariant is still satisfied
%   because the right disjunct is true for the new value $i'$ of $i$.
% \end{description}

% One pedantic point (and we \emph{do} want to be pedantic in this class
% when assessing function correctness, just like the machine is): from
% $0 \leq i$ and $i' = i+1$ we inferred $0 \leq i'$.  This is only
% justified in modular arithmetic if we know that $i+1$ does not
% overflow.  Fortunately, we also know $i < n-1$, so $i+1\leq n$ and $i$ is
% bounded from above by a positive integer (and from below by 0).  Therefore incrementing $i$
% cannot overflow.


% We generally do not verify loop invariants in this amount of detail,
% but it is important for you do know how to reason attentively through
% loop invariants to uncover errors, be they in the program or in the
% loop invariant itself.

\section{Linear Search in a Sorted Array}

Next, we want to search for an element $x$ in an array $A$ which we
know is sorted in ascending order.  We want to return $-1$ if
$x$ is not in the array and the index of the element if it is.

The pre- and postcondition as well as a first version of the function
itself are relatively easy to write.
\begin{verbatim}
int search(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A,0,n);
/*@ensures (\result == -1 && !is_in(x, A, 0, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  for (int i = 0; i < n; i++)
    //@loop_invariant 0 <= i && i <= n;
   if (A[i] == x) return i;
  return -1;
}
\end{verbatim}

This does not exploit that the array is sorted.  We would like
to exit the loop and return $-1$ as soon as we find that
$A[i] > x$.  If we haven't found $x$ already, we will not find
it subsequently since all elements to the right of $i$ will be
greater or equal to $A[i]$ and therefore strictly greater than
$x$.  But we have to be careful: the following program has a bug.

\clearpage
\begin{verbatim}
int search(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A,0,n);
/*@ensures (-1 == \result && !is_in(x, A, 0, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  for (int i = 0; A[i] <= x && i < n; i++)
    //@loop_invariant 0 <= i && i <= n;
    if (A[i] == x) return i;
  return -1;
}
\end{verbatim}

Can you spot the problem?  If you cannot spot it immediately,
reason through the loop invariant.  Read on if you are confident
in your answer.

\clearpage
The problem is that the loop invariant only guarantees
that $0 \leq i \leq n$ before the exit condition is tested.
So it is possible that $i = n$ and the test \verb'A[i] <= x'
will try access an array element out of bounds: the $n$
elements of $A$ are numbered from $0$ to $n-1$.

We can solve this problem by taking advantage of the so-called
\emph{short-circuiting evaluation} of the boolean operators of
conjunction (``and'') \verb'&&' and disjunction (``or'') \verb'||'.
If we have condition \verb'e1 && e2' ($e_1$ \emph{and} $e_2$) then we
do not attempt to evaluate $e_2$ if $e_1$ is \verb'false'.  This is
because a conjunction will always be false when the first conjunct is
false, so the work would be redundant.

Similarly, in a disjunction \verb'e1 || e2' ($e_1$ \emph{or} $e_2$)
we do not evaluate $e_2$ if $e_1$ is \verb'true'.  This is because
a disjunction will always be true when the first disjunct it
true, so the work would be redundant.

In our linear search program, we just swap the two conjuncts in
the exit test.
\begin{verbatim}
int search(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A,0,n);
/*@ensures (-1 == \result && !is_in(x, A, 0, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  for (int i = 0; i < n && A[i] <= x; i++)
    //@loop_invariant 0 <= i && i <= n;
   if (A[i] == x) return i;
  return -1;
}
\end{verbatim}
Now \verb'A[i] <= x' will only be evaluated if $i < n$ and
the access will be in bounds since we also know $0 \leq i$
from the loop invariant.

Alternatively, and perhaps easier to read, we can move
the test into the loop body.
\clearpage
\begin{verbatim}
int search(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A,0,n);
/*@ensures (-1 == \result && !is_in(x, A, 0, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  for (int i = 0; i < n; i++)
    //@loop_invariant 0 <= i && i <= n;
    {
      if (A[i] == x) return i;
      else if (A[i] > x) return -1;
    }
  return -1;
}

\end{verbatim}

This program is not yet satisfactory, because the loop
invariant does not have enough information to prove the
postcondition.  We \emph{do} know that if we return directly from
inside the loop, that $A[i] = x$ and so \verb'A[\result] == x'
holds.  But we cannot deduce that \verb'!is_in(x, A, 0, n)'
if we return $-1$.

Before you read on, consider which loop invariant you might
add to guarantee that.  Try to reason why the fact that
the exit condition must be false and the loop invariant
true is enough information to know that \verb'!is_in(x, A, 0, n)'
holds.

\clearpage
Did you try to exploit that the array is sorted?  If not,
then your invariant is most likely too weak, because the
function is incorrect if the array is not sorted!

What we want to say is that \emph{all elements in $A$
to the left of index $i$ are smaller than $x$}.
Just saying \verb'A[i-1] < x' isn't quite right, because
when the loop is entered the first time we have $i = 0$
and we would try to access $A[-1]$.  We again exploit
shirt-circuiting evaluation, this time for disjunction.
\begin{verbatim}
int search(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A,0,n);
/*@ensures (-1 == \result && !is_in(x, A, 0, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  for (int i = 0; i < n; i++)
    //@loop_invariant 0 <= i && i <= n;
    //@loop_invariant i == 0 || A[i-1] < x;
    {
      if (A[i] == x) return i;
      else if (A[i] > x) return -1;
      //@assert A[i] < x;
    }
  return -1;
}
\end{verbatim}
It is easy to see that this invariant is preserved.  Upon loop entry,
$i = 0$.  Before we test the exit condition, we just incremented $i$.
We did not return while inside the loop, so $A[i-1] \neq x$ and
also $A[i-1] \leq x$.  From these two together we have $A[i-1] < x$.
We have added a corresponding assertion to the program to highlight
the importance of that fact.

Why does the loop invariant imply the postcondition of the function?
If we exit the loop normally, then the loop condition must be false.
So $i \geq n$.  We know $A[n-1] = A[i-1] < x$.  Since the array is
sorted, all elements from $0$ to $n-1$ are less or equal to $A[n-1]$
and so also strictly less than $x$ and $x$ can not be in the array.

If we exit from the loop because $A[i] > x$, we also know that $A[i-1]
< x$ so $x$ cannot be in the array since it is sorted.

\section{Analyzing the Number of Operations}

In the worst case, linear search goes around the loop $n$ times, where
$n$ is the given bound.  On each iteration except the last, we perform
three comparisons: $i < n$, $A[i] = x$ and $A[i] > x$.  Therefore,
the number of comparisons is almost exactly $3 * n$ in the worst case.
We can express this by saying that the running time is \emph{linear}
in the size of the input ($n$).  This allows us to predict the running
time pretty accurately.  We run it for some reasonably large $n$ and
measure its time.  Doubling the size of the input $n' = 2*n$ mean that
now we perform $3*n' = 3*2*n = 2*(3*n)$ operations, twice as many as
for $n$ inputs.

We will introduce more abstract measurements for the running times
in the lecture after next.

% \clearpage
% \section*{Exercises}

% \clearpage
% \bibliographystyle{alpha}
% \bibliography{modal}

% \cleardoublepage
\end{document}
