Notes for 9/17/98     15-859(D) Randomized Algorithms

Yesterday:
 * importance of setting up the right quantities (random variables)
 * (started on) k-wise independence
   -- computing an average while preserving secrecy

Today:
 * k-wise independence
 * linearity of expectation
   -- gambler's odds problem
   -- analyzing randomized quicksort

Handout: Hwk1

Readings: Today and next time we're doing Ch 1 and parts of Ch 2. Then next
Thurs we'll be looking at probabilistic inequalities in Ch 3.2 and 4.1, and
then using them in the analysis of some algorithms.

Office hours: Mon 10-12, but I'm away next Monday.

=============================================================================
One thing: if in class you find me making fuzzy, plausible-sounding but not
completely clear probabilistic statements, feel free to call me on them. It's
especially important in this area to be precise and to make sure every step
is legitimate. E.g., in the coloring alg, I made the statement that the
expected running time of the alg is <= the expected time for the random walk
to reach one of the endpoints. It's worth sitting down at home and really
convincing yourself mathematically of that, and if you're not convinced,
come talk with me.

I wanted to start today by giving more precisely the argument for the
protocol we talked about last time.

K-wise independence
-------------------
From last time:

* The problem: A group of N friends wants to determine their average score
  on a test, but nobody wants to reveal their own score. Even if several
  people collaborate, they shouldn't be able to find out any more information
  than the sum of the other players' scores (which they can figure out anyway
  from the answer).

The protocol: We'll compute the sum mod M, where M is some sufficiently large
number known to be greater than the sum, e.g., M = 101*N.

Say player i has secret S_i. Player i chooses N-1 numbers
X_{i,1}, X_{i,2}, ..., X_{i,i-1}, X_{i,i+1}, ..., X_{i,N} independently at
random in the range 0,...,M-1 and writes them each on a piece of paper. Then
he chooses X_{i,i} in 0,...,M-1 so that the sum of all the X_{i,j} is equal
to S_i mod M. Player i then distributes X_{i,j} to player j. Then, player i
takes the number he kept, plus all of the others he received, adds them up
(mod M), and announces the result. Finally, all players add up the announced
results (mod M), which is the desired sum. (A small code sketch of the
protocol appears below, after the independence discussion.)

Analysis: You can see that the sum is right by writing out the matrix of
values X_{i,j}: row i sums to S_i (mod M), and player j announces the sum of
column j, so the announced results add up to the sum of all the entries,
which is SUM_i S_i (mod M). But how do we show that this is secure? First
some definitions:

Two events A and B are INDEPENDENT if Pr[A and B] = Pr[A] * Pr[B]. An easier
way to think of it: Pr[A | B] = Pr[A]. Similarly, random variables X and Y
are independent if for each x,y the events X=x and Y=y are independent.

A collection of random variables X_1, ..., X_N is k-WISE INDEPENDENT if for
any subset X_{i1}, X_{i2}, ..., X_{ik} of size k,

  Pr[X_{i1} = x_{i1} | X_{i2} = x_{i2}, ..., X_{ik} = x_{ik}] = Pr[X_{i1} = x_{i1}].

I.e., knowing the values of k-1 of them gives no information about the value
of a kth.

In our case, the N slips of paper of player i are N random variables that are
each uniformly distributed in the set {0, 1, ..., M-1} and furthermore are
(N-1)-wise independent. Why: given the values of N-2 of them (and given the
sum S_i), let's say the two remaining ones are X and Y. One of them (say X)
was actually picked at random, so its value is still uniform and unknown. The
other (Y) might depend on all the previous ones, but each value of X
corresponds to a different value of Y. So, all the possible values of Y are
still equally likely. Notice this means there's nothing special about
X_{i,i}: if we changed the protocol so that a different X_{i,j} was the last
one chosen, then for a given sum S_i we'd have the *exact same* probability
distribution on N-tuples of values.
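Here's the code sketch promised above: a rough Python version of the protocol,
just to make the bookkeeping concrete. The function name, the example scores,
and the final check are made up for illustration.

    import random

    def secret_sum(scores, M):
        # shares[i][j] = X_{i,j}: the slip that player i hands to player j
        N = len(scores)
        shares = [[0] * N for _ in range(N)]
        for i, s in enumerate(scores):
            for j in range(N):
                if j != i:
                    shares[i][j] = random.randrange(M)   # uniform in 0,...,M-1
            # choose X_{i,i} so that row i sums to S_i (mod M)
            shares[i][i] = (s - sum(shares[i])) % M
        # player j announces the sum (mod M) of the slips in column j
        announced = [sum(shares[i][j] for i in range(N)) % M for j in range(N)]
        return sum(announced) % M

    scores = [93, 72, 85, 60]        # hypothetical test scores in 0..100
    M = 101 * len(scores)            # M is bigger than any possible sum
    print(secret_sum(scores, M), "should equal", sum(scores))

Dividing the result by N then gives the average.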
Claim: The protocol is secure in the following sense: for any set of t honest
players, the only information that the rest of the players can get, even if
they cheat and pool what they see, is the sum of those t players' values.

Formally, what does this mean? What we mean is that if you modify the values
of these t players while keeping their sum the same, then the probability
distribution on everything the other players see (the slips sent to them and
the announced results) doesn't change.

Proof: By the observation above, we can equivalently imagine that each of the
t honest players treats the slip it sends to one fixed honest player (rather
than the slip X_{i,i} it keeps) as the "last" one -- the one determined by
S_i -- and picks all of its other slips at random. This doesn't change the
protocol's distribution. Now, for any fixed sequence of coin tosses, if we
change the t honest players' values keeping their sum the same, the only
slips that change are the ones sent to that fixed honest player (which the
others never see), and that player's announced result doesn't change either,
since the changes cancel out. So all sets of t values with the same sum lead
to exactly the same information revealed to the other players, and hence to
the same distribution on what they see.

GAMBLER'S ODDS PROBLEM
----------------------
Say you go into a casino, and say the casino has fair games: at any time t
you can bet some amount, and for even money you have a 50/50 chance of
winning. Say you have some strategy -- it determines how much to bet, and
when, and when to leave -- but you must leave by the end of the day. What's
the best strategy for the highest expected amount of money by the end of the
day?

Answer: it doesn't matter. All strategies result in the same expected amount.

* Defn of expectation: E[X] = SUM_a [ a * Pr(X=a) ].

* Let's define the random variable X_i to be your gain at time i. If X is the
  random variable denoting the amount you win by the end of the day, then
  X = X_1 + X_2 + ... + X_N, where N = #seconds in a day. Note that these
  X_i's could be horribly dependent (e.g., depending on whether you won or
  lost your last bet, you might bet more or less). Nonetheless, if we just
  look at one of the X_i, it's easy to see that E[X_i] = 0. For instance: let
  p_ij be the a priori probability that you will bet j dollars at time i
  (which might be horrible to calculate, but that's OK, we don't need to).
  Since each bet is won or lost with probability 1/2,

    E[X_i] = SUM_j [ j * p_ij / 2 + (-j) * p_ij / 2 ] = 0.

Now, use linearity of expectation:

    E[X] = E[X_1 + ... + X_N] = E[X_1] + ... + E[X_N] = 0.

(Lesson: simplify by setting up simple random variables and using linearity
of expectation. A little simulation of the gambler's claim appears after the
next section.)

Similar problem: Say there is a stock that's at some price x. Each day, it
goes up or down by $1, independently of the past, unless it's at 0, in which
case it stays there. You have some amount of money m. Each day you can buy or
sell as much as you want, and when you die your stock is converted back into
cash. Is there a strategy such that the expected amount of money you will
have at death is more than what you started with? No.

LINEARITY OF EXPECTATION
------------------------
We want to show that for two random variables X and Y, E[X+Y] = E[X] + E[Y],
even if X and Y are dependent. Let's assume X and Y take on discrete values
so that we can use summations instead of integrals.

  E[X+Y] = SUM_a [ a * Pr(X+Y = a) ]

         = SUM_(b,c) [ (b+c) * Pr(X = b and Y = c) ]

           (just splitting the event that the sum is "a" into all the
            (disjoint) ways that this could occur)

         = SUM_b SUM_c [ b * Pr(X=b and Y=c) ] + SUM_c SUM_b [ c * Pr(X=b and Y=c) ]

         = SUM_b [ b * Pr(X=b) ] + SUM_c [ c * Pr(Y=c) ]

           (summing out the other variable in each term)

         = E[X] + E[Y].
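Here's the simulation sketch promised above: a rough Python check that in a
fair game the expected gain is 0 no matter how convoluted the strategy is.
The particular strategy below (doubling after losses, quitting when ahead) is
just something made up for illustration.

    import random

    def one_day(num_bets=200, max_bet=100):
        # A deliberately convoluted strategy for a fair game: double the bet
        # after each loss (capped at max_bet), reset after a win, and walk
        # away early if we're ever up by 50.  E[gain] is still exactly 0.
        gain, bet = 0, 1
        for _ in range(num_bets):
            if gain >= 50:
                break
            if random.random() < 0.5:          # win the even-money bet
                gain += bet
                bet = 1
            else:                              # lose it
                gain -= bet
                bet = min(2 * bet, max_bet)
        return gain

    trials = 20000
    avg = sum(one_day() for _ in range(trials)) / trials
    print("average gain per day over", trials, "trials:", avg)  # hovers near 0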
RANDOMIZED QUICKSORT
--------------------
--> The algorithm: pick a pivot element uniformly at random, compare every
    other element to it (splitting the list into the elements smaller than
    the pivot and the elements larger than it), and recursively sort the two
    sublists. (Assume no equal values, for simplicity.)

--> Question: what is the expected number of comparisons?

Let e_1 be the smallest element, e_2 the next smallest, ..., and e_n the
largest. Define the random variable X_ij = 1 if e_i is compared with e_j at
some point in running the algorithm, and 0 otherwise. So, by linearity of
expectation, the expected number of comparisons is just SUM_{i<j} Pr[X_ij=1].

Claim: Pr[X_ij = 1] = 2/(j-i+1). Why: look at the set {e_i, e_{i+1}, ..., e_j}.
These elements all stay together in the same sublist until the first time one
of them is picked as a pivot. If that first pivot is e_i or e_j, then e_i and
e_j get compared; if it's one of the j-i-1 elements in between, then e_i and
e_j end up in different sublists and are never compared. Each of the j-i+1
elements is equally likely to be the first one picked, so the probability is
2/(j-i+1).

So the expected number of comparisons is

  SUM_{i<j} 2/(j-i+1)  <=  n * 2*(1/2 + 1/3 + ... + 1/n)  <=  2n*ln(n),

i.e., O(n log n) in expectation, no matter what the input looks like. (A
quick numerical check appears at the end of these notes.)

Finally, a puzzle to think about. I have a card with a number written on each
side: the two sides say 3^k and 3^(k+1), where k in {0,1,2,...} was chosen
with probability (1/2)^(k+1). I show you a uniformly random one of the two
sides. Say the side you see shows X, and let Y be the number on the hidden
side. Should you keep X, or "flip" and take Y instead? If X=1, you should
certainly flip (Y must be 3). If X>1, then with prob 2/3, Y is X/3 and with
prob 1/3, Y is 3X. So, e.g., if X=3, the expected gain of "flip" is
2/3 * (-2) + 1/3 * 6 = 2/3 > 0, and it's positive for every X. So you should
flip. But we are saying that no matter what you see, you should flip. So you
might as well flip before observing anything. But this is just like not
flipping, since you were shown a random one of the two sides.... So, what
gives.....?
-----------------------------------------------------------------------
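P.S. Here's the numerical check on the quicksort analysis mentioned above --
a rough Python sketch (the choices n=1000 and 200 trials are arbitrary); the
average should come out below the 2n ln n bound.

    import math
    import random

    def count_comparisons(lst):
        # Randomized quicksort, returning only the number of comparisons
        # (one comparison against the pivot per non-pivot element).
        if len(lst) <= 1:
            return 0
        pivot = random.choice(lst)
        less = [x for x in lst if x < pivot]
        greater = [x for x in lst if x > pivot]
        return (len(lst) - 1) + count_comparisons(less) + count_comparisons(greater)

    n, trials = 1000, 200
    avg = sum(count_comparisons(list(range(n))) for _ in range(trials)) / trials
    print("average comparisons:", avg, "   2n ln n:", 2 * n * math.log(n))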