RECITATION NOTES 15-451 Algorithms 09/12/01 Plan: * coupon collectors problem * coin flipping * H_n ======================================================================= The coupon-collector's problem is a nice one for probabilistic analysis using the tools we've discussed so far. The problem is: a random number generator uniformly generates a random number in [1,n]. What is the expected number of calls until all of the numbers 1...n have been generated? Answer: nH_n (where H_n = 1+1/2+1/3+...+1/n) The proof is as follows. Let X_1 denote the number of calls until we see our first number (so X_1 = 1). Let X_2 denote the number of calls from that point until we see our second new number. In general, let X_k denote the number of calls made from the time we've seen k-1 different numbers up until we see k different numbers. So, the total number of calls is X_1 + ... + X_n, and by linearity of expectation, what we are looking for is E[X_1] + ... + E[X_n]. How to evaluate E[X_k]? Each time we pick a random number, the probability it is something new is (n-(k-1))/n. So, another way to think about this question is as follows: Coin flipping: we have a coin that has probability p of coming up heads (in our case, p = (n-(k-1))/n). What is the expected number of flips until we get a heads? It turns out that the "intuitively obvious answer", 1/p, is correct. But why? Here is one way to see it: if the first flip is heads, then we are done; if not, then we are back where we started. So the expected number of flips E satisfies: E = p*1 + (1-p)*(1 + E). You can then solve for E = 1/p. Putting this all together, let CC(n) be the expected number of calls until we see all of 1...n. We then have: CC(n) = E[X_1] + ... + E[X_n] = n/n + n/(n-1) + n/(n-2) + ... + n/1 = n(1/n + 1/(n-1) + ... + 1/1) = n H_n. QED. BTW, in class we said that H_n is about ln n. If you want, in class you could go through a formal argument. Here are some of Danny Sleator's notes from last year: By drawing a diagram with the curves 1/x, 1/(x+1), and some boxes, it's easy to see that: integral 1/(x+1) dx <= (H_n)-1 <= integral 1/x dx 1 to n 1 to n The first integral evaluates to: ln(n+1)-ln(2) the second integral evaluates to: ln(n) Thus: ln(n+1)-ln(2)+1 <= H_n <= ln(n)+1 A tighter upper bound can be obtained by using the trapezoid rule. In this case the bound is: H_n <= 1 + integral 1/(x+.5) dx 1 to n Integrating this gives: H_n <= 1 + ln(n+1/2) - ln(3/2)