Notes for 02/12 -- 15-852 Randomized Algorithms

Topics for the next couple of lectures:
 * How to construct expanders
 * Expanders and eigenvalues
 * Eigenvalues and rapid mixing
 * Uses of rapid mixing

For today: mention an explicit construction, and in preparation for this
week's Theory Seminar, go into one application: using random walks to
derandomize algorithms, or at least to reduce the amount of randomness
needed in a randomized algorithm.

Say we have a BPP alg that uses r random bits to get (two-sided) error
1/100, and we want to reduce the error to (1/100)^k.  How many random bits
do we need?
 -> We could re-run the alg k times independently.  Uses r*k random bits.
 -> Thm: can use random walks on expanders to get by with r + O(k) bits.

Defn: An (N, d, c) expander is a bipartite graph on X union Y with
|X| = |Y| = N, where each node in X has degree d, such that for any subset
S of X,
    |N(S)| >= (1 + c(1 - |S|/N)) * |S|.
E.g., if |S| = N/2, then we want |N(S)| >= |S|(1 + c/2).
(Note: sometimes this is only required to hold for |S| <= N/2
(e.g., [Alon][LPS]).)
(Note: the book uses |X| = |Y| = N/2, so there's a factor-of-2 difference
in the defn.)

Gabber-Galil construction
-------------------------
|X| = |Y| = N = m^2.  Each vertex in X is labeled by a pair (a,b) with
a,b in Z_m; same for the vertices in Y.  Degree 5.  The neighborhood of
(x,y) is:
    (x+y+1, y), (x+y, y), (x,y), (x, x+y), (x, x+y+1)   (arithmetic mod m)

Theorem: this is a c-expander for c = (2-sqrt(3))/4 ~ 0.067.

Eigenvalues
-----------
For purposes of rapid mixing, and today's related application, what we
want formally is a graph with a gap between the first and second largest
eigenvalues.

Actually, there are several related matrices here.  One is the standard
adjacency matrix A(G), with A_ij = 1 if (i,j) in E.  (For a d-regular
graph, the largest eigenvalue is d, with eigenvector (1,...,1).)

We can also look at the matrix for a random walk.  E.g., say at each step
we flip a coin: with prob 1/2 we stay put, and with prob 1/2 we move to a
random neighbor (the staying is to make the walk aperiodic).  Then we have
the matrix
    M(G) = I/2 + A/(2d),
whose largest eigenvalue is 1.

[Alon]: a graph is an expander iff it has an eigenvalue gap.

Amplifying random bits [Impagliazzo-Zuckerman]
----------------------------------------------
Going to use the following facts:
 * We can construct expanders deterministically.
 * We can do so in such a way that we can perform a random walk without
   explicitly creating the graph, taking only O(log n) time per step --
   i.e., we can walk in polynomial time on an exponential-size graph.
   The Gabber-Galil construction does this for us: just an addition and a
   mod calculation per step.
 * M(G) has lambda_1 = 1 and lambda_2 = 1 - epsilon for some constant
   epsilon > 0.

The goal: we're given a BPP alg A that uses r random bits to get error at
most 1/100.  We want to reduce the error to 1/2^k using r + O(k) random
bits.

Idea: think of writing down one node for every string of r bits, so
N = 2^r nodes.  Color a node blue if it is "good" in the sense that it
causes A to output the correct answer, and red if it is "bad" in that it
causes A to output the wrong answer.  So 99% of the nodes are good.

The standard alg samples these nodes independently at random.  [IZ]
instead: set up an expander graph among them.  The first sample is taken
at random, but from then on we do a random walk on the graph.  Each step
takes a constant number of random bits, since the graph has constant
degree.  We take a sample every constant number of steps of this walk.

Note: to get a fully random new point, we can't avoid O(log N) new random
bits -- e.g., we would need to walk O(log N) steps.  But we don't need a
fully random new point, just one that is likely to be blue.  Intuitively,
here's where expansion comes in: the red set has size N/100, so it
expands.  So, if we start at a random red point, pretty soon we're in some
much larger set.

Let beta be such that for M(G), (lambda_2)^beta < 1/10.  Note beta is a
constant.  We're going to sample every beta steps of our walk.

Alg: begin at a random node.  Do the walk, sampling (i.e., running A on
the current node's string) every beta steps, for 7k samples (7k*beta
steps) total.  So the total number of random bits used is
    r + O(7k*beta) = r + O(k).
Then take the majority vote over these 7k runs of A.  (Note: we're still
running A O(k) times; we're just not using fresh random strings.)
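To make this concrete, here is a minimal Python sketch of the walk-and-vote
procedure, using the Gabber-Galil maps as the step function.  This is only
illustrative, not the exact [IZ] construction: it identifies X with Y and
ignores the bipartite orientation and the laziness of the walk, it assumes
r is even so that N = 2^r = m^2, and randrange(5) stands in for the O(1)
fresh random bits per step.  The names (gg_step, amplify, A) are made up
for the sketch.

    import random

    def gg_step(x, y, m, c):
        # Apply the c-th Gabber-Galil map (c in 0..4) to node (x,y), mod m.
        # Per the construction above, (x,y) itself is one of its 5 neighbors.
        if c == 0: return ((x + y + 1) % m, y)
        if c == 1: return ((x + y) % m, y)
        if c == 2: return (x, y)
        if c == 3: return (x, (x + y) % m)
        return (x, (x + y + 1) % m)

    def amplify(A, r, k, beta):
        # A: the BPP alg, taking an r-bit integer (its random string) and
        # returning its 0/1 answer.  Assumes r is even, so 2^r = m^2.
        m = 1 << (r // 2)
        x, y = random.randrange(m), random.randrange(m)  # r truly random bits
        votes = []
        for _ in range(7 * k):                # 7k samples...
            for _ in range(beta):             # ...one every beta walk steps
                x, y = gg_step(x, y, m, random.randrange(5))  # O(1) bits/step
            votes.append(A((x << (r // 2)) | y))  # run A on the current node
        return int(2 * sum(votes) > 7 * k)    # majority vote

The accounting matches the claim above: r bits for the start node, plus
O(1) bits for each of the 7k*beta = O(k) walk steps.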
Analysis: Let X_i = 0 if the ith sampled point is blue, and X_i = 1 if it
is red.  What we're going to show is that for any string x in {0,1}^{7k}
with t ones, the probability that X_i = x_i for all i is at most (1/5)^t.
This means the probability that there are >= 7k/2 ones (i.e., that the
majority vote fails) is at most
    2^{7k} * (1/5)^{7k/2} = (2^7/sqrt{5}^7)^k < 1/2^k
(since 2^7/sqrt{5}^7 = 128/5^{3.5} ~ 0.46 < 1/2), and we'll be done.

So, we just need to figure out the probability that X = x, where
X = (X_1, ..., X_{7k}).  To do this:
 * Let p_0 = (1/N, ..., 1/N) = the starting probability vector.
 * Let B = M^beta (the matrix for taking beta steps).
 * Let W (the "witness" matrix) be the NxN identity matrix with the
   entries for bad points zeroed out: W_ii = 1 iff i is blue, and
   W_ij = 0 for i != j.  E.g., Pr(start at a good point) = |p_0 W|,
   where |.| denotes the sum of the entries.
 * Let \bar{W} = I - W.  E.g., Pr(start at a bad point) = |p_0 \bar{W}|.
 * Let S_i = W if x_i = 0, and S_i = \bar{W} if x_i = 1.

Claim: Pr[X = x] = |p_0 S_1 B S_2 B S_3 ... B S_{7k}|.
 -> For every sequence of steps satisfying X = x, we are counting the
    probability of that sequence, and we're zeroing out everything else.

Now, it turns out to be easier to analyze the L_2 length of the RHS above
(its length as a vector).  What we'll show is that for any vector p, if
you hit p with BW, then its length does not go up:
    (1)  ||pBW|| <= ||p||,
and if you hit p with B\bar{W}, then its length goes down by at least a
factor of 5:
    (2)  ||pB\bar{W}|| <= (1/5)||p||.
Assuming this, the L_2 length of the RHS above is at most
    ||p_0|| * (1/5)^t = (1/sqrt{N}) * (1/5)^t.
(Actually, if S_1 = \bar{W} it's even better: that first step alone cuts
||p_0|| down by a factor of 10.)  For any vector v, |v| <= sqrt{N}*||v||,
so this says Pr[X = x] <= (1/5)^t, as desired.

So, all we need to do is prove (1) and (2).

(1) follows from the fact that all eigenvalues of M (hence of B) have
absolute value <= 1, and that zeroing out entries can't increase length.

(2) uses the fact that (lambda_2)^beta < 1/10.  Let e_1 = the eigenvector
of the max eigenvalue, pointing in the uniform direction (1/N, ..., 1/N),
and let e_2, e_3, ... be the rest; all are orthogonal.

Idea: p has a component along e_1 and components along the rest.  The
component along e_1 gets chopped by \bar{W}; the rest get chopped by
multiplying by the lambdas.  Write p = x + y, where x = c_1*e_1 and
y = c_2*e_2 + ... + c_N*e_N.  Then:

    ||x B \bar{W}|| = ||x \bar{W}|| <= (1/10)||x||
      (xB = x since x is uniform, and \bar{W} keeps only the 1% of
       coordinates that are bad: sqrt{1/100} = 1/10)

    ||y B \bar{W}|| <= ||yB|| = ||c_2*lambda_2^beta*e_2 + ... +
                                 c_N*lambda_N^beta*e_N|| <= (1/10)||y||
      (all eigenvalues of M lie in [0,1], so |lambda_i^beta| <=
       (lambda_2)^beta < 1/10 for i >= 2)

So, ||pB\bar{W}|| = ||xB\bar{W} + yB\bar{W}||
                 <= (1/10)||x|| + (1/10)||y|| <= (1/5)||p||.
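The matrix argument is easy to sanity-check numerically.  Below is a small
Python sketch (a toy setup of my own, not from lecture) using the complete
graph K_200 in place of a real expander: its lazy walk matrix
M = I/2 + A/(2d) has lambda_2 = 1/2 - 1/(2(n-1)), so beta = 4 gives
(lambda_2)^beta < 1/10, and coloring exactly 1% of the nodes red makes the
sqrt{1/100} = 1/10 factor in the proof tight.  The spectral norms bound
||pBW||/||p|| and ||pB\bar{W}||/||p|| over all p, checking (1) and (2),
and pr_pattern evaluates the Claim's formula for Pr[X = x].

    import numpy as np

    n = 200
    A = np.ones((n, n)) - np.eye(n)        # adjacency of K_n: toy d-regular graph
    M = np.eye(n) / 2 + A / (2 * (n - 1))  # lazy walk matrix M = I/2 + A/(2d)
    beta = 4                               # (lambda_2)^beta ~ (1/2)^4 < 1/10
    B = np.linalg.matrix_power(M, beta)

    blue = np.ones(n)
    blue[:n // 100] = 0                    # 1% of the nodes are red ("bad")
    W, Wbar = np.diag(blue), np.diag(1 - blue)

    # (1) and (2): the spectral norm is the worst case of ||pBW||/||p||, etc.
    print(np.linalg.norm(B @ W, 2))        # <= 1, as claimed in (1)
    print(np.linalg.norm(B @ Wbar, 2))     # <= 1/5, as claimed in (2)

    def pr_pattern(xs):
        # |p0 S_1 B S_2 ... B S_L|: prob. that sample i is red iff xs[i] = 1.
        S = lambda b: Wbar if b else W
        p = np.full(n, 1.0 / n) @ S(xs[0])
        for b in xs[1:]:
            p = p @ B @ S(b)
        return p.sum()

    xs = [0, 1, 0, 0, 1, 0]                # a pattern with t = 2 red samples
    print(pr_pattern(xs), (1/5) ** 2)      # comes out well below (1/5)^t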