Notes for 10/20/98

Last time: relation between the eigenvalue gap and rapid mixing.  The
L_2 distance between p^(t) and the stationary distribution pi is at
most |lambda_2|^t.  Also mentioned the relation between expansion and
the eigenvalue gap.  Specifically, for an n-node, d-regular graph G,
let lambda_2 be the second-largest eigenvalue of the adjacency matrix
A(G).  Then (we'll look at these next time):

Theorem: For all |S| <= n/2, |N(S) - S| >= |S|*(d - lambda_2)/(2d).

Theorem: If for all |S| <= n/2, |N(S) - S| >= c|S| for some c > 0,
then lambda_2 <= d - c^2/(4 + 2c^2).

Today: building expanders and some of their uses.

RANDOM LOW-DEGREE GRAPHS ARE EXPANDERS
--------------------------------------
[Here is a fixed proof for the degree 3 case]

Claim: Say we create an (n,n) bipartite graph as follows: for each node
on the left, we pick a random subset of d=3 nodes on the right to be
its neighbors.  Then there exists a constant c such that w.h.p., for
all sets S on the left with |S| <= n/c, we have |N(S)| >= 1.9*|S|.
[Equivalently, we can view this as creating an n-node directed graph by
giving each node 3 random out-edges.]

Proof: For a given value of s, what is the probability that there
*exists* a set S of s vertices on the left with <= r neighbors on the
right?  We can upper bound this by fixing a set S of s nodes on the
left and a set R of r nodes on the right, calculating the probability
that all neighbors of S lie inside R, and then multiplying by
(n choose s)*(n choose r):

  Pr(exists such an S) <= (n choose s)(n choose r)(r choose d)^s / (n choose d)^s

        [now, using (a/b)^b <= (a choose b) <= (ae/b)^b]

     <= (ne/s)^s (ne/r)^r (re/d)^{ds} / (n/d)^{ds}

     <= e^{s + r + ds} * r^{ds - r} / ( n^{ds - s - r} s^s )

        [now, plugging in d=3, r=1.9s]

     <= e^{6s} r^{1.1s} / ( n^{0.1s} s^s )

      = (1.9 e^6 r^{0.1} / n^{0.1})^s.

We now need to sum this over all values of s.  For s=1 this looks like
const/n^{0.1}; for s=2 it looks like const/n^{0.2}.  The quantity keeps
dropping until s starts getting large.  We're OK so long as the
numerator 1.9 e^6 r^{0.1} stays sufficiently smaller than the
denominator n^{0.1} (say, at most half of it); since r = 1.9s <=
1.9n/c, this holds for sufficiently large c, so the terms keep decaying
geometrically even once s gets large and the whole sum stays small.

Application #1
--------------
Lots of applications to things like routing without congestion,
fault-tolerant networks, etc.  Here is something Prabhakar and I
thought of after the business meeting at the STOC conference one year.

STOC nowadays has parallel sessions: t talks in t/2 time slots.
Problem: this forces attendees to take sides.  Also, if you want to see
a random subset of k talks, you expect to get a conflict at about
k = sqrt(t).  But people don't want to switch back to single sessions,
because then there are fewer talks and the conference becomes too
exclusive.  How to solve?  How can the conference last for only t/2
time slots but still allow attendees to see all their desired talks
(assuming each attendee wants to see only c*t of the talks, for some
constant fraction c)?

Answer: Have each person give their talk twice.  Actually, let's have
each talk given 3 times.  We can view this as a bipartite graph: talks
on the left, time slots on the right; t nodes on the left, t/2 on the
right.  Given a set of desired talks S, we can figure out how to see
those talks by finding a matching of S into the time slots on the
right.  If we do the talk assignments randomly, then the same
calculation as above shows that w.h.p. all sets S of < ct nodes on the
left have |N(S)| >= |S|.  [Can even do this while keeping the in-degree
on the right equal to 6, so each slot has 6 parallel sessions.]  By
Hall's theorem, this means that every set S with |S| < c*t on the left
has a matching that saturates it, i.e., every desired talk gets its own
slot.  (A small sketch of this computation follows below.)
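[Aside: a minimal Python sketch of the scheduling step just described.
assign_talks builds the random talks-to-slots bipartite graph (3 random
slots per talk), and schedule finds the matching promised by Hall's
theorem via simple augmenting paths.  The function names and the
particular numbers (200 talks, 30 desired) are just for illustration.]

import random

def assign_talks(t, copies=3, seed=0):
    # Assign each of t talks to `copies` distinct random time slots out
    # of t//2 slots -- the random bipartite graph from the claim above.
    rng = random.Random(seed)
    slots = t // 2
    return {talk: rng.sample(range(slots), copies) for talk in range(t)}

def schedule(desired, assignment):
    # Try to give each desired talk its own time slot, via augmenting-path
    # bipartite matching (the constructive side of Hall's theorem).
    match = {}                          # slot -> talk currently placed there

    def try_place(talk, seen):
        for slot in assignment[talk]:
            if slot in seen:
                continue
            seen.add(slot)
            if slot not in match or try_place(match[slot], seen):
                match[slot] = talk
                return True
        return False

    for talk in desired:
        if not try_place(talk, set()):
            return None                 # some subset of desired talks fails Hall's condition
    return {talk: slot for slot, talk in match.items()}

# e.g.: 200 talks in 100 slots, an attendee who wants to see 30 of them
assignment = assign_talks(200)
wanted = random.Random(1).sample(range(200), 30)
print(schedule(wanted, assignment))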
[In the paper being handed out, we looked at the case of each talk
given twice.  That graph is not an expander, but "most sets" expand -->
for any collection of a polynomial number of attendees, with high
probability they are all happy.]

Application #2: amplifying the success probability of a randomized
algorithm without using a lot more random bits.

For this application, we'll need the fact that there are deterministic
constructions of constant-degree expanders with the property that even
if there are 2^n nodes, we can perform a walk in polynomial time per
step.  (Given the name of a node, we can quickly compute who its
neighbors are.)

Gabber-Galil construction
-------------------------
|X| = |Y| = n = m^2.  Each vertex in X is labeled by a pair (x,y) with
x,y in Z_m; same for the vertices in Y.  Degree 5.  The neighborhood of
(x,y) is (arithmetic mod m):

   (x+y+1, y)   (x+y, y)   (x, y)   (x, x+y)   (x, x+y+1)

Specifically, here is the result of [Impagliazzo & Zuckerman]:
--------------------------------------------------------------

* We have a BPP algorithm using r random bits, with failure probability
  1/100.  Claim: we can decrease the failure probability to (1/100)^k
  by using only r + O(k) random bits (in contrast to the r*k bits
  needed to run the algorithm k times independently).

* Idea: set up an implicit expander graph with one node for each string
  of length r.  Start at a random initial position and then do a random
  walk; we only need a constant number of random bits per step.  Sample
  every beta steps (i.e., run the BPP algorithm using the name of the
  current node as its random input), where beta is a constant chosen to
  make the second-largest eigenvalue of R = M^beta at most 1/10.
  Sample 42k times and take a majority vote.  If we color nodes "good"
  or "bad" depending on whether they cause the BPP algorithm to answer
  correctly or not, then what we want is for it to be very unlikely
  that more than half of the samples are "bad" nodes.

* We'd like to say that no matter where you start, after running one
  step of R there's at most a 1/5 chance of being at a bad node.  We
  can't quite get this, but we get something similar by looking at L_2
  length.  In particular, for any vector p,

     sqrt(sum of squares of the bad entries of pR) <= 1/5 * (L_2 length of p).

  Proof: say the eigenvectors are e_1, e_2, ..., where
  e_1 = (1/n,...,1/n); all are orthogonal.  Let p = x + y, where
  x = c_1 e_1 and y = c_2 e_2 + ... + c_n e_n.  For convenience, define
  Z as the matrix that zeroes out all the good entries; i.e., Z is the
  identity but with the entries for good nodes zeroed out.  So our goal
  is to show that ||pRZ|| <= 1/5 * ||p||.

  Look at x: ||xRZ|| = ||xZ|| <= 1/10 * ||x||.  (xR = x, and since at
  most a 1/100 fraction of the entries of the uniform vector x are bad,
  zeroing out the rest shrinks its L_2 length to at most
  sqrt(1/100) * ||x||.)

  Look at y: ||yRZ|| <= ||yR|| = ||c_2 lambda_2^beta e_2 + ... +
  c_n lambda_n^beta e_n|| <= 1/10 * ||y||.

  So,  ||pRZ|| <= ||xRZ|| + ||yRZ||        (triangle inequality)
              <= 1/10 ||x|| + 1/10 ||y||
              <= 1/5 * ||p||               (since ||x||, ||y|| <= ||p||).

  (Note: this also shows that ||pRRZ|| <= 1/5 * ||p||, etc.)

  Intuitively, if p was "spread out" already, then so is pR, and
  multiplying by Z zeroes out a lot of the weight of its entries.  On
  the other hand, if p is highly concentrated, then multiplying by R by
  itself already decreases the L_2 length, by spreading out the
  distribution.
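[Aside: a quick numerical sanity check of the lemma, as a Python/numpy
sketch.  It does not build the Gabber-Galil graph; instead it uses a
small random regular multigraph made from a few random permutations
(assumed, not proved, to be a decent expander), computes beta from that
graph's actual second eigenvalue, marks a random 1% of the nodes "bad",
and then checks the spectral norm of R*Z, which bounds ||pRZ||/||p||
over all p.  All parameters are illustrative.]

import numpy as np

rng = np.random.default_rng(0)
n, half_d = 400, 4                     # an 8-regular multigraph on 400 nodes
A = np.zeros((n, n))
for _ in range(half_d):
    P = np.eye(n)[rng.permutation(n)]  # random permutation matrix
    A += P + P.T                       # add both directions to keep A symmetric
M = A / (2 * half_d)                   # transition matrix of the random walk

# pick beta so that the second-largest |eigenvalue| of R = M^beta is <= 1/10
# (assumes the graph came out connected and non-bipartite, so lam2 < 1)
eigs = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1]
lam2 = eigs[1]
beta = int(np.ceil(np.log(1 / 10) / np.log(lam2)))
R = np.linalg.matrix_power(M, beta)

# Z = identity with the "good" diagonal entries zeroed out; here a random
# 1% of the nodes play the role of the bad nodes
bad = rng.choice(n, size=n // 100, replace=False)
Z = np.zeros((n, n))
Z[bad, bad] = 1.0

# the spectral norm of R @ Z is the max over p of ||p R Z|| / ||p||
print(beta, np.linalg.norm(R @ Z, 2))  # the lemma says this is at most 1/5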
* Now, to finish the proof: we want to say it's unlikely that more than
  half the samples are bad.  Let's consider a fixed set of t samples
  and ask: what's the probability that these are all bad?

  Claim: if q is our starting distribution, then what we want is the
  L_1 length of

     q R R R Z R Z R R Z R Z

  where the t "Z"s index the t samples we took (there's at least one
  "R" between any two "Z"s).  We'll use the fact that L_1 length <=
  sqrt(n) * L_2 length.  Applying the lemma once per "Z",

     L_2 length of (q R R R Z R Z R R Z R Z) <= (1/5)^t * (L_2 length of q),

  and the L_2 length of q is 1/sqrt(n) since q is uniform.  So the
  probability that these t samples are all bad is at most (1/5)^t.

  For (at least) half of all 42k samples to be bad, some set of t = 21k
  of them must be all bad.  There are at most 2^(42k) such sets, so the
  probability of failure is at most

     2^(42k) * (1/5)^(21k) = (4/5)^(21k) <= (1/100)^k.       QED
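[Aside: to make the whole procedure concrete, here is a short Python
sketch of the amplification scheme, using the Gabber-Galil neighborhood
map above as the implicit expander.  It is only illustrative: it treats
the five maps as out-edges on a single vertex set rather than as the
bipartite graph, uses a placeholder value for beta rather than
computing the eigenvalue bound, and assumes a black-box
bpp_alg(instance, node) that answers correctly for all but a 1/100
fraction of the nodes.]

import random

def gg_neighbors(x, y, m):
    # the five neighbors of (x, y) in the Gabber-Galil graph on Z_m x Z_m,
    # exactly as listed in the construction above (arithmetic mod m)
    return [((x + y + 1) % m, y), ((x + y) % m, y), (x, y),
            (x, (x + y) % m), (x, (x + y + 1) % m)]

def amplified(bpp_alg, instance, m, k, beta=10, seed=None):
    # A node (x, y) plays the role of the algorithm's r = 2*log(m) random
    # bits.  We spend r bits once, on the starting node, and then O(1)
    # bits per walk step (a choice among 5 neighbors): r + O(k) bits in
    # total over the 42k * beta steps, since beta is a constant.
    rng = random.Random(seed)
    x, y = rng.randrange(m), rng.randrange(m)     # random starting node
    votes = 0
    for _ in range(42 * k):
        for _ in range(beta):                     # walk beta steps between samples
            x, y = rng.choice(gg_neighbors(x, y, m))
        # sample: run the BPP algorithm with the current node as its coins
        votes += 1 if bpp_alg(instance, (x, y)) else -1
    return votes > 0                              # majority vote over the 42k samples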