15-750 Graduate Algorithms
* Admin
* Quickly through some basic alg topics:
- Karatsuba, Strassen, recurrences
- RVs, linearity of expectation
- randomized quicksort
* 1 handout: 15-451 lects 1-5
==============================================================================
Admin:
- Web page: see link on my homepage, or use:
/afs/cs/academic/class/15750-s01/www/index.html.
Schedule of topics & dates isn't up there yet but will be soon.
- Using Kozen book.
- Expect to have hwks every 2 weeks, a take-home midterm, and an
in-class final. 50%/20%/30% (see web page).
- Hwks will be a combination of problems to do by yourself and problems
you can work on in groups.
- Hwks will be graded by teams of students. Each student is expected
to do this once.
We are assuming everyone has had some kind of undergrad theory/algs
exposure. Divide & conquer, dynamic programming, asymptotic notation,
NP-completeness. Will do a quick refresher today. Handing out the 1st 5
lectures of 15-451, which have some of the basics -- you're responsible
for this material even though we are only going to skim it very quickly.
======================================
Let's start with a problem we are all familiar with: multiplying two
numbers. Say we want to multiply two n-bit numbers. 41 x 42
(101001 x 101010). Let's call them X and Y.
Really slow method: add X to itself Y times. How many additions does
this take as a function of the *size* of the input? O(2^n). Not
good. Point to make here: one might naively say this is linear time
because it is linear in Y, but what we care about is running time as a
fn of the *size* of the input, and it only takes log(Y) bits to write
Y down.
What about the standard method -- how long does that take?
        101001   (= 41)
      x 101010   (= 42)
   -----------
       1010010
     101001000
   10100100000
   -----------
   11010111010 = 1722
Ans: O(n^2).
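The standard method can be viewed as shift-and-add on the bits of Y: for
each 1 bit of Y, add a shifted copy of X. A minimal sketch (the function
name is mine, not from the notes):

```python
def gradeschool_multiply(x, y):
    """Standard shift-and-add multiplication of nonnegative integers.
    For n-bit inputs: up to n additions of O(n)-bit numbers,
    i.e. O(n^2) bit operations."""
    result = 0
    shift = 0
    while y > 0:
        if y & 1:                     # this bit of y contributes x << shift
            result += x << shift
        y >>= 1
        shift += 1
    return result

print(gradeschool_multiply(41, 42))   # 1722, matching the example above
```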
Can we do better? Let's try to do some kind of divide & conquer. Break X
and Y up into their 1st and 2nd halves:
                        +---------+---------+
X = A*2^{n/2} + B       |    A    |    B    |
                        +---------+---------+
Y = C*2^{n/2} + D       |    C    |    D    |
                        +---------+---------+
We can write X*Y as
AC*2^n + BC*2^{n/2} + AD*2^{n/2} + BD
If we use this directly as a recursive program to multiply two
numbers, we get the recurrence: T(n) = 4*T(n/2) + const*n
This solves to O(n^2). So, no gain. But, we can rewrite
the computation like this:
   AC(2^n + 2^{n/2}) - 2^{n/2}(A - B)*(C - D) + (2^{n/2} + 1)BD
Expanding the middle term gives -2^{n/2}(AC - AD - BC + BD); the
AC*2^{n/2} and BD*2^{n/2} pieces cancel against the first and last
terms, leaving
   AC*2^n + (AD + BC)*2^{n/2} + BD
which is exactly X*Y.
So we have 3 multiplies of size n/2, 2 adds of size n/2, 3 adds of
size n, and a couple of shifts. This gives us the recurrence:
T(n) = 3*T(n/2) + const*n.
which solves to O(n^{log_2(3)}) = O(n^1.585)
(first discovered by Anatoli Karatsuba, in Russia, 1962.)
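The three-multiply recursion above can be sketched as follows (the
function name and the base-case cutoff of 16 are my choices; any small
constant cutoff works):

```python
def karatsuba(x, y):
    """Multiply integers using 3 half-size recursive multiplies:
    T(n) = 3*T(n/2) + O(n), which solves to O(n^{log_2 3}) = O(n^1.585)."""
    if x < 0:
        return -karatsuba(-x, y)
    if y < 0:
        return -karatsuba(x, -y)
    if x < 16 or y < 16:                      # small base case
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    a, b = x >> half, x & ((1 << half) - 1)   # X = A*2^{n/2} + B
    c, d = y >> half, y & ((1 << half) - 1)   # Y = C*2^{n/2} + D
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # (A-B)(C-D) = AC - AD - BC + BD, so AD + BC = AC + BD - (A-B)(C-D)
    ad_plus_bc = ac + bd - karatsuba(a - b, c - d)
    return (ac << (2 * half)) + (ad_plus_bc << half) + bd

print(karatsuba(41, 42))   # 1722
```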
Is this asymptotically best possible? Turns out can use FFT to do
faster. Karp: O(n log^2 n), then improved to O(n log n loglog n) by
Schonhage and Strassen. This is best known.
How many have seen Strassen's alg for matrix multiplication? Strassen's
is the natural generalization of this idea to the problem of multiplying
two matrices.
[Will now assume all entries are "small" so adding or multiplying any
given pair of entries takes constant time.]
To multiply two nxn matrices in the usual way takes time O(n^3). If
we think of each nxn matrix as consisting of 4 n/2 by n/2 submatrices,
then the standard method can be thought of as performing 8 n/2 by n/2
multiplications and 4 additions. Strassen cleverly rearranges to use
only 7 multiplications (and 14 additions). This gives a recurrence of
T(n) = 7T(n/2) + cn^2 which solves to O(n^{2.81}). Important because
do this a lot in scientific computation. Strassen's has more overhead
than standard method, but on Cray, switchover is at 64x64 I think.
Asymptotically, best matrix multiply alg known is by Coppersmith and
Winograd and has time O(n^{2.376}), but not practical. Nobody knows
if can do better --- FFT approach doesn't seem to carry over.
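The notes don't list Strassen's seven products, but here are the
standard identities for the 2x2 case; the same formulas apply blockwise
to n/2 x n/2 submatrices, which is what gives T(n) = 7T(n/2) + cn^2
(the function name is mine):

```python
def strassen_2x2(A, B):
    """Strassen's seven multiplications for a 2x2 matrix product.
    A = [[a,b],[c,d]], B = [[e,f],[g,h]]; only 7 multiplies instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```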
===========
Basic probability. Say I have a deck of 52 cards in order, and
shuffle randomly. How many cards do I expect to end in the same place
as they started? Ans 1.
A Random Variable is a function from outcomes (of some probabilistic
experiment) to reals. E.g., "number of cards that end in correct
position" is an RV. We'll only care about discrete RVs. A couple of
ways of looking at *expected value* of an RV:
E[X] = sum_a a*Pr(X=a). [The a's are different values X can take on]
or
E[X] = sum_e Pr(e)*X(e). [The e's are the outcomes (elementary events)]
A REALLY USEFUL FACT is linearity of expectation. If X and Y are
two RVs, then even if they are highly correlated, E[X+Y] = E[X]+E[Y].
Proof: direct from definition #2. If we define Z(e) = X(e)+Y(e), then
it's immediate from the definition that E[Z] = E[X] + E[Y].
We can use this in the cards example by defining RVs like this:
X_i = 1 if the ith card ends in the same place that *it*
started, and X_i = 0 otherwise.
X = # cards that end in same place as they start.
E[X_i] = Pr(ith card ends in same place as it starts) = 1/52.
E[X] = E[X_1 + ... + X_52] = E[X_1] +...+E[X_52] = 1.
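A quick empirical sanity check of this calculation (the function name
and trial count are my choices):

```python
import random

def expected_fixed_points(n=52, trials=20000, seed=1):
    """Estimate E[# cards ending where they started] over random shuffles.
    Linearity of expectation says the answer is n * (1/n) = 1 for any n."""
    rng = random.Random(seed)
    deck = list(range(n))
    total = 0
    for _ in range(trials):
        rng.shuffle(deck)                      # uniformly random permutation
        total += sum(1 for i, c in enumerate(deck) if i == c)
    return total / trials

print(expected_fixed_points())   # close to 1.0
```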
Now, let's use this to analyze randomized quicksort.
Quicksort:
1) pick an element as pivot^*
2) split array into LESS, EQUAL, and GREATER by comparing each
element to the pivot.
3) recursively sort LESS and GREATER
*version 1: use leftmost element as pivot.
What is worst-case time? Omega(n^2) if array is already sorted.
That's because everything goes into GREATER and we get a recurrence
like T(n) = n + T(n-1).
What if we make the algorithm itself be probabilistic?
RANDOMIZED-QUICKSORT: pick a *random* element as pivot.
We'll prove: for any initial ordering of the input, expected time to
sort is O(n log n).
This is sometimes called a Worst-case Expected-Time bound. This
implies that the deterministic algorithm gets O(n log n) *on average*
(meaning, on a randomly ordered input). But, having a good running
time "on average" is really not as satisfying since it depends on what
your inputs end up looking like. It's a little funny: making the algorithm
probabilistic gives us *more* control. A lot like why you use
randomization in a 2-player matrix game like rock-paper-scissors.
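The three steps above can be sketched directly (the function name is
mine; this builds new lists rather than splitting in place, which keeps
the LESS/EQUAL/GREATER structure explicit):

```python
import random

def rand_quicksort(arr):
    """Randomized quicksort: pick a *random* pivot, split into
    LESS / EQUAL / GREATER, recursively sort LESS and GREATER."""
    if len(arr) <= 1:
        return list(arr)
    pivot = random.choice(arr)
    less    = [x for x in arr if x < pivot]
    equal   = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return rand_quicksort(less) + equal + rand_quicksort(greater)
```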
There's a neat way to analyze randomized quicksort that is a lot like
what we did with the card shuffling.
THEOREM: E[# comparisons] <= 2n*ln(n)
PROOF: First of all, let's assume all elements in array are different
since it's the worst case and will make our notation simpler. What
we'll analyze is the expected number of comparisons made by the
algorithm, since that determines the running time. Just like above,
the trick is writing that as a sum of simple Random Variables, and
using Linearity of Expectation.
Define X_ij = 1 if we *did* directly compare the ith smallest and jth smallest
elements in the sorting process
= 0 if we *didn't*.
* So, how can we write X?  X = SUM_{i=1}^{n-1} SUM_{j=i+1}^{n} X_ij.
  So, by linearity of expectation,
  E[X] = SUM_{i=1}^{n-1} SUM_{j=i+1}^{n} E[X_ij].
- denote the ith smallest element by e_i, the jth smallest by e_j, and
  conceptually imagine lining the elements up in sorted order. If the
  pivot we choose is between e_i and e_j, then these two end up in
  different buckets and we *don't* compare them. If we pick e_i or e_j
  as the pivot, then we *do* compare them. If we pick a pivot < e_i or
  > e_j, then both end up in the same bucket and we repeat. So, we can
  think of it as a dart game: we throw a dart at random, and if it lands
  outside the range we repeat, until we hit e_i or e_j or something
  in between, at which point we stop. Pr(we end up hitting e_i or e_j)
  = 2/(j-i+1). This is E[X_ij].
- So, we have
     E[X] = SUM_{i=1}^{n-1} SUM_{j=i+1}^{n} 2/(j-i+1)
          = 2 * SUM_{i=1}^{n-1} SUM_{d=2}^{n-i+1} 1/d    [substituting d = j-i+1]
          < 2 * SUM_{i=1}^{n-1} SUM_{d=2}^{n} 1/d.
- Now, 1 + 1/2 + 1/3 + ... + 1/n is called the "nth harmonic number".
Denoted H_n. H_n is in the range [ln(n), ln(n)+1], which you can
see by thinking of it as discrete version of the integral of 1/x.
- So, our total is < 2 * SUM_{i=1}^{n-1} (H_n - 1) = 2*(n-1)*(H_n - 1)
  < 2n*ln(n).
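A quick empirical check of the theorem: count pivot comparisons over
many runs and compare the average against the 2n*ln(n) bound (the
function name and trial counts are my choices):

```python
import math
import random

def quicksort_comparisons(arr, rng):
    """Count element-vs-pivot comparisons made by randomized quicksort.
    Each non-pivot element is compared to the pivot once per call,
    matching the X_ij accounting in the proof (distinct elements)."""
    if len(arr) <= 1:
        return 0
    pivot = rng.choice(arr)
    less = [x for x in arr if x < pivot]
    greater = [x for x in arr if x > pivot]
    return (len(arr) - 1) + quicksort_comparisons(less, rng) \
                          + quicksort_comparisons(greater, rng)

rng = random.Random(0)
n = 1000
avg = sum(quicksort_comparisons(list(range(n)), rng) for _ in range(100)) / 100
print(avg, "vs. bound", 2 * n * math.log(n))
```

The observed average stays below 2n*ln(n), as the theorem promises.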