CS 15-451 ALGORITHMS September 14,16 2004
Lecture #5&6 Manuel & Avrim Blum
In this lecture, we will see some simple models of computation, each with a
precise definition of what counts as a step of computation.
EXAMPLE 1: Multiply/divide 2 numbers using the standard grade school
multiplication algorithm. Count each digit printed on paper as 1 step. All
other computation is free.
QUESTION: How many steps does it take to multiply 2 n-bit numbers?
ANSWER: n^2 + 2n steps:
1011
x 1101
----
1011 n print steps
0000 n
1011 n
1011 n
-------
10001111 2n print steps
Total = n^2 + 2n print steps
QUESTION: How many (print) steps does the grade school division algorithm
take to divide a 2n bit number by an n bit number to get an n bit quotient
and an n bit remainder?
EXAMPLE 2: Exchange sort. Given a shelf containing n unordered books to be
arranged alphabetically. The only allowed step is to exchange two books.
Such an exchange counts 1 step. All other (planning) work is free (doesn't
count any steps).
QUESTION: How many exchanges are necessary (lower bound) & sufficient
(upper bound) to sort the n books?
UPPER BOUND: n-1 is clearly sufficient. (Why?)
But is n-1 necessary? If n is even, and no book is in its correct location,
then n/2 exchanges are necessary to "touch" all books. But can one argue
that for some arrangements of the books, n-1 steps is necessary to sort the
books?
LOWER BOUND: n-1 is necessary: Create a graph in which a directed edge i
-> j means that that the book in location i must end up at location j.
Note that this is a special kind of directed graph:
It is a permutation; a set of cycles!
(Why? Every book points to SOME location, perhaps its own location. In
addition, no two books point to the SAME location.)
1. What is the effect of exchanging any two elements (books) that are in
the same cycle?
2. What is the effect of exchanging any two elements that are in different
cycles?
In each case, explain what happens to the number of cycles in the permutation.
3. How many cycles are in the final sorted array?
Now put 1,2,3 together to give an arrangement that requires n-1 exchanges.
EXAMPLE 3: Comparison tree model for sorting n distinct elements:
Each node of the tree compares two (distinct) elements:
ai < aj ?
YES/ \NO
/ \
If elements are not distinct,
comparisons may be taken to be ai < = > aj ?
< rather than < , or ternary: / | \
- / | \
Each such comparison counts one step.
Example: Here is a partial tree for sorting 3 distinct elements a1,a2, a3:
a1.
This is NOT obvious. In fact, an algorithm to find the Median of n
elements does NOT have to find : if
it did, then finding the median would require at least
log ((n choose n/2)*(n/2)!) = OMEGA(n log n) comparisons (why?), and we
know from Lecture 4 that the median can be found with just O(n) comparisons.
Nevertheless, we have the theorem:
THEOREM: Any algorithm to find the second largest of n elements must
actually find the largest, ie must find the ordered pair of those n elements.
This means that the leaves of any tree to find the second largest can
actually be labelled with the ordered pair .
PROOF: Suppose to the contrary that the largest element is not found but
the second largest is found. SOME element (the Largest) necessarily won all
comparisons. That element is not unique (else the Largest would be
determined), so there must be at least two elements that never lost any
comparisons. Both are candidates for Largest (why?), and therefore both are
candidates for Second Largest (Why?). It follows that the second largest
has not yet been identified, contrary to our assumption. -><-
A BETTER INFORMATION THEORETIC LOWER BOUND: There are n(n-1) possible
solution leaves, (NOT n(n-1)/2. Why?). The comparison tree is binary. Hence
the minimum depth of any tree to solve this problem -- and therefore the
miniminum number of comparisons to solve it -- must be at least log n(n-1)
, which is approximately = 2(log n).
A BETTER LOWER BOUND: n-1. Why?
A BETTER UPPER BOUND: n-1 + (log n). Why? Trees are a nice theoretical
concept for formalizing the meaning of a comparison step and the general
form of a comparison tree algorithm, and for developing lower bounds. For
upper bounds, however, it is generally better to think of algorithms in the
normal way that they are constructed, and then to count the number of
comparisons that algorithm takes, bearing in mind that for any specific n,
one could in principle lay out the comparison tree. For the problem at
hand, to find the Maximum element using the fewest number of comparisons,
consider a tennis style tournament to find the best player. This is
represented by a DIFFERENT kind of tree, the tennis tournament tree:
6 4 2 1 8 7 3 5 First round
\ / \ / \ / \ /
6 2 8 5 Second round
\ / \ /
\ / \ /
\ / \/
6 8 Third round
\ /
\ /
\ /
\ /
\ /
\ /
8 Winner!
It is tempting to say that the last player (6) to lose to the winner (8) is
the second best player, but this is NOT necessarily true! Why?
It is true, however, that the 2nd best player necessarily lost to the best
player -- as first pointed out by Lewis Carroll! (Of course, this assumes
that the ith best player always beats the i+1st best player, and that
"beating" is transitive: if i beats j and j beats k then i beats k.)
In the above example, players 7,5,6 lost to 8. The 2nd best player can be
found by a tournament among these log n players:
7 5 6
\ / |
7 6
\ /
\ /
7
TIGHTER BOUNDS? At this point, our lower bound on finding the second
largest is n-1, while our upper bound is n-1 + (log n). What is the true
state of affairs? Must the lower bound be raised? Or can the upper bound
be lowered?
Lower Bounds to Find the Median:
* log n This information theoretic lower bound requires the observation
that any one of the n elements can be the median.
* n-1 If less than n-1 comparisons are made, then there are at least 2
candidates for median. Why?
* 3n/2 - O(1) To find the median, every element must be compared either
directly or indirectly to the median. Why?
To get the 3n/2 lower bound for the median, it helps to have a little
terminology about the various kinds of comparisons possible between two
elements, say 3 and 7:
3 3
Direct comparison -> / \
2 7 Indirect comparisons 5
/ \ <- Not a between 3 and 7: /
3 7 comparison of any 7
kind between 3 and 7.
In other words, these 2 comparisons
give no information whatsoever about
the relation between 3 and 7.
An adversary can make sure that at least (n-1)/2 comparisons are between
elements that are on opposite sides of the median. How?
In addition, to find the median, every element must be compared either
directly or indirectly to the median. (Why? This takes some thought and,
maybe, a cup of coffee.) This latter ensures that there must be at least
n-1 additional comparisons. The total number of comparisons therefore comes
to 3n/2 - O(1).
Historical note: the best known lower bound is 2n by Bent and John 1985.
The best known upper bound is 3n by Schonhage, Paterson and Pippenger 1976.
PROBLEM: The of n elements.
Give tight upper and lower bounds on the number of comparisons required to
find the largest and smallest of n distinct integers.
Problem: How many comparisons are necessary and sufficient to determine if
n real numbers
a1 .. an are all distinct?
LOWER BOUND: (necessary) number of steps = n-1.
UPPER BOUND: (sufficient) number of steps = number of comparisons
sufficient to sort + number of comparisons to check successive elts in the
sorted array. O(n log n)
A BETTER (ADVERSARY) LOWER BOUND: sorting is NECESSARY! If two adjacent
elements of the sorted array are not compared, then there is no way to tell
(in the comparison tree model) whether or not they are equal. Therefore
log2(n!) is necessary (and sufficient).
The IBM selectric typewriter ribbon problem.
CS 15-451 ALGORITHMS 16 September 2003
Lecture #6 Manuel & Avrim Blum
In this lecture, we look in great depth at a particular problem: Binary
Search in a 2-Dimensional Sorted Array.
While the problem is quite nice in itself, the main point of this lecture
is to see how to tighten the upper and lower bounds (thumbscrews) on a
problem. When you can do it, tightening the bounds is a great way to get a
new and possibly optimal algorithm for a problem.
Problem NAME: Lookup In a Sorted 2-dimensional Array (LISA).
INPUT: 1. M = an nxn matrix of distinct integers.
(M* denotes the entry in row i column j). Rows and columns of M are
both in sorted order. Specifically: rows are sorted: M** < M** for
j < M** for i such that x = M** if such ** exists in M.
"No such element" otherwise.
Definition of a COMPUTATION STEP: each lookup in the array counts one step.
Think of the array as having its elements covered by the gray paint used in
lotteries. Each entry that has its gray paint scraped off counts one step.
Problem: Give an algorithm to solve LISA.
Count the number of steps that your algorithm takes in the worst case (on
the worst possible example). This is an UPPER BOUND on the number of steps
to solve this problem. (Equivalently, this is the number of steps
SUFFICIENT to solve this problem -- in the worst case.)
An OPTIMAL algorithm uses the few possible number of lookups in the worst case.
Give a LOWER BOUND on the number of steps that any algorithm for this
problem must take (in the worst case). (Equivalently, this is the number of
steps NECESSARY to solve this problem.)
Try to make your upper and lower bounds as tight as you can make them. The
tightest bounds get the most points.
SOLUTIONS:
An UPPER BOUND: Compare the given x to all elements of the given M. This
suffices to solve the problem in n^2 steps.
A BETTER UPPER BOUND: Do a binary search for x in each column of M. This
takes log n steps per column, or n log n steps altogether.
A LOWER BOUND: Ok, so its not such a great (lower) bound, but it is a
bound: if you don't lookup any elements in M, then there is no way to tell
if x is in M. So a lower bound is 1.
A BETTER LOWER BOUND: log n. Suppose a birdie (correctly) tells you that
all elements of M OUTSIDE of column 1 are bigger than x. This additional
information cannot force you or any algorithm to take even more steps than
would otherwise be taken. (Why?) But it reduces the problem to a binary
search in column 1, which necessarily requires at least log n lookups.
An INFORMATION THEORETIC LOWER BOUND: There are n^2 + 1 possible answers.
The depth of any ternary (three-way branching) tree to solve the problem is
necessarily at least log3(n^2 + 1) -- in the worst case.
:
: Tight upper and lower bounds!
:
:
QUESTION:
Let M denote an mxn matrix of integers. Suppose all the COLUMNS of M are
sorted first, each in increasing order, to get M1:
a1 b1 c1 d1
M1 = a2 b2 c2 d2
a3 b3 c3 d3
a4 b4 c4 d4
Suppose then that all the ROWS of M1 are sorted in increasing order to get
M2. Since the rows of M2 were sorted last, they are clearly in sorted
order. The question is whether the columns of M2 remain sorted or must be
sorted again.
ANSWER:
I think the answer is YES: columns of M2 do not have to be sorted again.
They will all remain in sorted order. Does this prove it?
WHY? Either confirm the following proof, fix it, or give a counterexample.
To simplify, assume all elts of M are distinct.
Since rows of M1 are sorted to create M2, only columns (not rows) of M2 can
be out of order in M2.
Let M** denote the element in location ** (row i column j) of M.
Note that M2<1,1> is the minimum element of M, since sorting the columns
of M puts the minimum elt of M in row 1 of M1, then sorting the rows of M1
puts it in column 1 of M2, so it ends up in location <1,1> of M2.
Note that the first column of M2 must be sorted, else there is an element
<1,i+1> in M2 that is smaller (out of order) than <1,i> in M2. But element
<1,i+1> in M2 came from in M1, which was greater than in M1,
which in turn is greater or equal to <1,i> in M2. -><-.
In general, suppose to the contrary that column j of M2 is out of
order. Specifically, M2** > M2** .
But M2** is the jth smallest elt of M1 row i, while M2** is the
jth smallest elt of M1 row i+1. The latter is necessarily larger than j-1
other elements in its row, i+1, therefore larger than j elements in row i,
therefore at least as large as M2**. -><-
*