RECITATION NOTES 15-451 Algorithms 09/15/04
Plan:
* questions on basics? questions on sorting lower bound?
* more practice with analyzing probabilistic processes
- coupon collector's problem
- coin flipping
- H_n
=======================================================================
questions on basics:
- Any questions on basics like order notation? Here's one: why can't we have
f(n)=Theta(g(n)) *and* have f(n) = o(g(n))?
questions on sorting lower bound?
- Basic idea: If we have n distinct objects in an array, they could
initially be in any of n! different permutations. Any
comparison-based sorting algorithm is asking a series of YES/NO
questions and so needs at least lg(n!) questions to isolate which
permutation the input really was in. (Like a game of 20-questions).
The last step of the argument is that the algorithm can't stop while
there exist two different permutations that are both consistent with
all the answers it has received so far: the output is just going to be
a single permutation of the input, so it can't be correct on both of
them.
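To make the lg(n!) count concrete, here is a quick sketch in Python (the
helper name is ours, not from the notes) that computes ceil(lg(n!)) for a
few values of n:

```python
import math

def comparison_lower_bound(n):
    """ceil(lg(n!)): the minimum number of YES/NO questions needed
    to distinguish all n! possible input permutations."""
    return math.ceil(math.log2(math.factorial(n)))

for n in [4, 8, 16, 32]:
    # The bound grows like n lg n (by Stirling's approximation).
    print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```

For n=4 this gives ceil(lg 24) = 5, i.e. any comparison sort needs at
least 5 comparisons in the worst case on 4 distinct elements.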
=======================================================================
The coupon-collector's problem is a nice one for probabilistic
analysis using the tools we've discussed so far.
The problem is: suppose we have n boxes (let's label them 1,...,n)
and we repeatedly throw balls into the boxes at random. What is the
expected number of throws until every box has at least one ball in it?
Answer: nH_n
(where H_n = 1+1/2+1/3+...+1/n)
The proof is as follows. Imagine a counter (starting at 0) that tells
us how many boxes have at least one ball in them. Let X_1 denote the
number of throws until the counter reaches 1 (so X_1 = 1). Let X_2
denote the number of throws from that point until the counter reaches
2. In general, let X_k denote the number of throws made from the time
the counter hit k-1 up until the counter reaches k.
So, the total number of throws is X_1 + ... + X_n, and by linearity of
expectation, what we are looking for is E[X_1] + ... + E[X_n].
How to evaluate E[X_k]? Suppose the counter is currently at k-1.
Each time we throw a ball, the probability it is something new is
(n-(k-1))/n. So, another way to think about this question is as follows:
Coin flipping: we have a coin that has probability p of coming up
heads (in our case, p = (n-(k-1))/n). What is the expected number
of flips until we get a heads?
It turns out that the "intuitively obvious answer", 1/p, is correct.
But why? Here is one way to see it: if the first flip is heads,
then we are done; if not, then we are back where we started, except
we've already paid for one flip. So the expected number of flips E
satisfies: E = p*1 + (1-p)*(1 + E). You can then solve for E = 1/p.
The above argument is a little tricky, so if you want, you can draw
a picture of a tree like this, where what we are looking for is the
sum, over all leaves, of the depth of the leaf times the probability
of reaching that leaf.
        *
     p / \ (1-p)
      *   *
       p / \ (1-p)
        *   *
            ...
And if we let T denote that tree, the argument is saying just that one
simple way of analyzing T is to view T as:
        *
     p / \ (1-p)
      *   T
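As a sanity check on E = 1/p, here is a small simulation sketch (the
seed, trial count, and p = 1/4 are arbitrary choices for the demo):

```python
import random

def flips_until_heads(p, rng):
    """Flip a coin with heads-probability p until it comes up heads;
    return the number of flips used (always at least 1)."""
    flips = 1
    while rng.random() >= p:   # rng.random() < p counts as heads
        flips += 1
    return flips

p = 0.25
rng = random.Random(451)
trials = 100_000
avg = sum(flips_until_heads(p, rng) for _ in range(trials)) / trials
print(avg)   # empirical average; should come out close to 1/p = 4
```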
Putting this all together, let CC(n) be the expected number of
throws until we have filled all the boxes. We then have:
CC(n) = E[X_1] + ... + E[X_n]
= n/n + n/(n-1) + n/(n-2) + ... + n/1
= n(1/n + 1/(n-1) + ... + 1/1)
= n H_n.
QED.
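The nH_n answer is also easy to check empirically. A rough simulation
sketch (n = 20 and the trial count are arbitrary):

```python
import random

def throws_to_fill(n, rng):
    """Throw balls into n boxes uniformly at random; return the number
    of throws until every box has at least one ball."""
    filled = set()
    throws = 0
    while len(filled) < n:
        filled.add(rng.randrange(n))
        throws += 1
    return throws

n = 20
H_n = sum(1.0 / k for k in range(1, n + 1))
rng = random.Random(451)
trials = 20_000
avg = sum(throws_to_fill(n, rng) for _ in range(trials)) / trials
print(avg, n * H_n)   # empirical average vs. n*H_20 ~= 71.95
```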
---- only if there is time left (which there probably won't be) ----
BTW, in class we said that H_n is about ln n. If you want, in class
you could go through a formal argument. Here are some of Danny
Sleator's notes from a while back:
By drawing a diagram with the curves 1/x, 1/(x+1), and some boxes,
it's easy to see that:
    integral from 1 to n of 1/(x+1) dx  <=  H_n - 1  <=  integral from 1 to n of 1/x dx
The first integral evaluates to: ln(n+1)-ln(2)
the second integral evaluates to: ln(n)
Thus:
ln(n+1)-ln(2)+1 <= H_n <= ln(n)+1
A tighter upper bound can be obtained from the midpoint rule: since
1/x is convex, 1/k <= integral of 1/x over [k-1/2, k+1/2]. Summing
over k = 2,...,n, the bound is:
    H_n <= 1 + integral from 1 to n of 1/(x+.5) dx
Integrating this gives:
H_n <= 1 + ln(n+1/2) - ln(3/2)
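Both bounds are easy to verify numerically. A quick sketch:

```python
import math

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, summed directly."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in [10, 100, 1000]:
    lower = math.log(n + 1) - math.log(2) + 1
    upper = 1 + math.log(n + 0.5) - math.log(1.5)
    # The two bounds from the notes should bracket H_n.
    assert lower <= harmonic(n) <= upper
    print(n, lower, harmonic(n), upper)
```

For n = 10 this brackets H_10 ~= 2.929 between about 2.705 and 2.946,
showing how tight the midpoint-rule upper bound is.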