Notes for 9/17/98 15-859(D) Randomized Algorithms Yesterday: * importance of setting up the right quantities (random variables) * (started on) k-wise independence. -- Computing average while preserving secrecry Today: * k-wise independence. * linearity of expectation -- Gambler's odds problem -- analyzing randomized quicksort Handout: Hwk1 Readings: Today and next time doing Ch1 and parts of Ch2. Then next thurs we'll be looking at probabilistic inequalities in Ch 3.2, 4.1, and then using them in analysis of some algorithms. Office hours: Mon 10-12 but I'm away next monday. ============================================================================= One thing: if in class you find me making fuzzy plausible-sounding but not completely clear probabilistic statements, feel free to call me on them. Especially important in this area to be precise and make sure every step is legitimate. E.g., in coloring alg, made statement that expected running time of alg is <= expected time for random walk to reach one of the endpoints, and it's worth at home sitting down and really concincing yourself mathematically about that, and if you're not convinced come talk with me. Wanted to start today by giving more precisely the argument for the protocol we talked about last time K-wise independence ------------------- From last time: * The problem: A group of N friends wants to determine their average score on a test, but nobody wants to reveal their own score. Even if several people collaborate, they shouldn't be able to find any more information than the sum of the other ones (which they can figure out anyway from the answer). The protocol: We'll compute the sum mod M. M is some sufficiently large number known to be greater than the sum. E.g., M=101*N. Say player i has secret S_i. Player i chooses N-1 numbers X_i1, X_i2, X_{i,i-1}, X_{i,i+1}, ..., X_{iN} independently at random in the range 0,...,M-1 and writes them each on a piece of paper. Then he chooses X_{i,i} in 0,...,M-1 so that the sum of all the X_ij is equal to S_i mod M. Player i then distributes X_{i,j} to player j. Then, player i takes the number he kept, plus all of the others he received, and adds them up (mod M), and announces the result. Finally, all players add up the announced results (mod M), which is the desired sum. Analysis: Can see that the sum is right by writing out the matrix of values. But, how do we show that this is secure? First some definitions: Two events A and B are INDEPENDENT if Pr[A and B] = Pr[A] * Pr[B]. Easier way to think of it: Pr[A | B] = Pr[A]. Similarly, random variables X, Y are independent if for each x,y we have events X=x and Y=y independent. A collection of random variables X_1, ..., X_N are k-WISE INDEPENDENT if for any subset X_i1, X_i2, ..., X_ik, Pr[X_i1 = x_i1 | X_i2 = x_i2,...,X_ik = x_ik] = Pr[X_i1 = x_i1]. I.e., knowing the values of k-1 of them gives no information above the values of a kth. In our case, these slips of paper of player i are N random variables that are each uniformly distributed in the set {0, 1, ..., M-1} and furthermore are (N-1)-wise independent. Why: given the values of N_2 of them, (and given the sum S_i), let's say the two remaining are X and Y. One of them (say X) was actually picked at random so it's value is still uniform and unknown. The other (Y) might depend on all the previous, but each value of X corresponds to a different value of Y. So, all the possible values of Y are still equally likely. Notice this means there's nothing special about X_ii. If we changed the protocol so that a different X_ij was the last one, for a given sum S_i we'd have the *exact same* probability distribution on N-tuples of values. Claim: protocol is secure in following sense: for any set of t honest players, only information that the rest of players have, even if they cheat, is the sum of the t players' values. Formally what does this mean? What we mean is that if you modify the values for these players keeping the sum the same, then the probability distribution on the sequence of outputs produced by those players doesn't change. Proof: consider the t all sent their X_ii value to the same honest player. (Doesn't change protocol). For any fixed sequence of coin tosses, all sets of t values with the same sum lead to exactly the same information revealed to the other players. So, all sets with the same sum have the same distribution. GAMBLER'S ODDS PROBLEM ---------------------- Say you go into casino. Say has fair games: at any time t can bet some amount and for even money you have 50/50 chance of winning. Say you have some strategy - determines how much to bet and when and when to leave, but must leave by end of the day. What's best strategy for highest expected amount by the end of the day? Answer: it doesn't matter. All strategies result in same expected amount. * Defn of expectation: E[X] = SUM_a[a * Pr(X=a)]. * Let's define the random variable X_i to be the gain at time i. If X is the random variable denoting the amount we win by the end of the day, then X = X_1 + X_2 + ... + X_N, where N=#seconds in a day. Note that these X_i's could be horribly dependent (e.g., depending on whether you won or lost in your last bet, you might bet more or less). Nonetheless, if we just look at one of the X_i, it's easy to see that E[X_i] = 0. For instance: let p_ij be the apriori probability that you will bet j dollars at time at time i (which might be horrible to calculate, but that's ok, we don't need to). So, E[X_i] = SUM_j[j * p_ij / 2 + (-j) * p_ij / 2] = 0. Now, use linearity of expectation: E[X] = E[X_1 + ... + X_N] = E[X_1] + ... + E[X_N] = 0. (Lesson: simplify by setting up simple random variables and using linearity of expectation). Similar problem: Say there is a stock that's at some price x. Each day, it goes up or down by \$1, indep of past, unless it's at 0 in which case it stays there. You have some amount of money m. Each day you can by or sell as much as you want, until when you die your money is converted back into cash. Is there a strategy such that the expected amout of money you will have at death is more than what you started with? No. LINEARITY OF EXPECTATION ------------------------ Want to show that for 2 random variables X & Y, E[X+Y] = E[X] + E[Y] even if X and Y are dependent. Let's assume X and Y take on discrete values so that can use summations instead of integrals. E[X+Y] = SUM_a [ a * Pr(X+Y = a) ] = SUM_(b,c) [ (b+c) * Pr(X = b and Y = c) ] (just splitting the event that the sum is "a" into all the (disjoint) ways that this could occur) = SUM_b SUM_c [ b * Pr(X=b and Y=c)] + SUM_c SUM_b [ c * Pr(X=b and Y=c)] = SUM_b [b * Pr(X=b)] + SUM_c [c * Pr(Y=c)] RANDOMIZED QUICKSORT -------------------- --> describe algorithm. (assume no equal values for simplicity) --> what is the expected number of comparisons? Define random variable let e_1 be the smallest element, e_n be the largest. Define random variable X_ij = 1 if e_i is compared with e_j in running the algorithm. So, expected number of comparisons is just SUM_{i1, then with prob 2/3, Y is X/3 and with prob 1/3, Y is 3X). So the expected gain of "flip" is 2/3 * (-2) + 1/3 * 6 = 2/3 > 0. So you should flip. But, we are saying that no matter what you see, you should flip. So, might as well flip before observing anything. But, this is just like not flipping since you were shown a random one of the two sides.... So, what gives.....? -----------------------------------------------------------------------