Notes for 10/22		15-859(D) Randomized Algorithms

* expanders and eigenvalues
* intro to approximate counting via rapid mixing

==============================================================================
Today: prove theorems showing that exapander graphs have an eigenvalue
gap, and vice versa that graphs with an eigenvalue gap are expanders.
Handing out paper and note of Alon on this subject.

Easier direction is eigenvalue gap ==> expansion.  In fact, some of the
constructions explicitly create graphs with an eigenvalue gap.  So,
could have defined expanders as "graphs with an eigenvalue gap" and
then said "here's a neat expansion property that they have".  [For
approx counting, though, we'll be setting up a neighborhood structure
and directly arguing it has expansion, and then want to conclude it's
rapidly mixing which needs expansion ==> eigenvalue gap]

THEOREM: if G is d-regular and lambda = 2nd-largest eigenvalue of A(G),
then for all W, the number of edges between W and V-W satisfies
		E(W,V-W) >= |W|*|V-W|*(d-lambda)/n
which also implies that if |W| <= n/2, then
		|N(W) - W| >= |W|*(d-lambda)/2d

[the last implication follows from |V-W| >= n/2, and degree = d]
[By the way, this kind of edge expansion is intutitively what causes
walk to be rapidly mixing.]

Proof: Define vector f such that f_v are equal (and positive) over v
in W, f_v are equal (and negative) over v in V-W, and the sum of
entries in f is 0.  E.g., f_v = |V-W| for v in W, f_v = -|W| for v in
V-W.

If we write f in terms of eigenvectors, it has zero component along
largest since sum of entries is 0.

[Remember, all eigenvectors that aren't largest have sum of entries
equal to 0.  Easy to see since largest is all 1's and they're
orthogonal.  Another way to see, pointed out by Shyjan, that hold for
any Markov chain, is if v is eigenvector of evalue lambda and z is all
1's column vector then vMz = lambda*vz = lambda*(sum of entries in
v). But, left-hand-side is just vz since rows of M add up to 1.  So,
either lambda=1 or sum of entries = 0]

Also, for later: sum_v (f_v)^2 = |W|*|V-W|^2 + |V-W|*|W|^2 = n*|W|*|V-W|

IDEA: in one step of random walk, f has to shrink since there's an
eigenvalue gap, but the only way for f to shrink is to mix between W
and V-W, which means there have to be a lot of edges.  Formally:

    f = c_2e_2 + ... + c_ne_n
   Af = c_2*lambda_2*e_2 + ... + c_n*lambda_n*e_n.

Now, take dot product with f. 

 Af.f = (c_2)^2*lambda_2 + ... + (c_n)^2*lambda_n <= lambda_2*(f.f)
  
RHS is lambda_2*sum_v (f_v)^2

LHS is sum_v [ f_v * [sum_{u in N(v)} f_u] ].   If f_v was constant,
	this would just be d*sum_v[(f_v)^2].  But f is not exactly
	constant, so instead we have (using E = E(W, V-W)):

    d*sum_v[(f_v)^2] - sum_{edges uv across cut} [|W|n + |V-W|n]
    = d*sum_v[(f_v)^2] - E*n^2.
	
and voila.
---------------------------------------------------------------------------
Intuition for both arguments: think of eigenvector as defining a cut,
with positives on one side and negatives on other. Eigenvector of 2nd
largest eigenvalue is roughly the cut with the least edge expansion
across it.
---------------------------------------------------------------------------

For the other direction, I claim that you can reduce the proof to the
following probabilistic puzzle. 

Consider a probability distribution P on a regular degree-d graph G.
Say we pick two vertices independently according to P.
Let A = Prob(the two vertices are neighbors)
Let B = Prob(the two vertices are the same)

Claim 1: for any P, A/B <= d.

E.g., if P is uniform, then A=d/n and B=1/n so A/B=d.
	Argue that any other P just decreases the ratio?
	Follows from (P_u)^2 + (P_v)^2 >= 2*P_u*P_v.
	Sum over all edges.  LHS gives d*B, RHS gives A.
  Another way to see (thanks, Hal) is that B = PP^t and A = A(G)PP^T
  and so A/B <= d  follows by the fact that all eigenvalues are <= d.

Claim 2: suppose we require P_v = 0 on at least half of the vertices
of G.  Then, if G is an expander, A/B <= d - c for some constant c>0.

 This is the tricky part.  First, some intuition: say P was flat on the
 non-zero part W.  In this case, B = 1/|W|.  A = d/|W| - (# edges leaving W)
 So, A/B = d - (# edges leaving W)/|W| = d-c since graph is an
 expander.  In general you have to do a summation, which is on the
 handout by Alon.  In his handout, x is the probability vector, and if
 you look at last displayed equation, the left hand side is dB-A, and
 the right-hand side is (c^2/2d)*B, so this gives us A/B <= d - c^2/2d.

Now, let's see why Claim 2 implies what we want: namely, lambda_2 <= d - c.

Let f = eigenvector for 2nd largest eigenvalue lambda.  WLOG, f is
positive on at most half of entries (else look at -f). Define g so
that g_v = f_v if f_v > 0 and g_v = 0 otherwise.  Let's scale f so
that the sum of entries in g is 1.  g is going to be the probability dist P.

We know A(G)f = lambda f.  Let's take dot product of both sides with g.

Get: lambda = [A(G)f.g]/[f.g]

Denominator = B

Numerator = Sum_v( g_v * (sum_{u in N(v)} f_u) ) <= A
	(replacing negative entries with zero only increases this)

So, all one has to do is prove Claim 2.  -> see handout by Alon.

=============================================================================

Approximate counting
--------------------
There are a lot of interesting problems that fall into the following
framework.  We have some universe U of points, some subset S of which
are special.  We want to know |S| (or in a continuous domain, the
volume of S), or equivalently, the probability that a randomly drawn
point from U belongs to S.

E.g., S = some polygonal object in the plane, U = a bounding rectangle.
      What is the volume of S?

E.g., U = strings of length n, S = those that satisfy a given DNF formula F.
      How many solutions are there to F?

One easy thing to do: pick a bunch of samples in U, and observe how
many fall into S (assume we can verify whether a point is in S in poly time),
and then take the ratio.

But, what if |S|/|U| is small?  In this case, this does not
approximate |S| very well.  Can we do better?

In fact, many of these counting problems are #P-hard.  E.g., reporting
the number of solutions to a given DNF formula is #P-hard.  Computing
the permanent of a matrix (which for A(G) corresponds to asking how many
perfect matchings there are in G) is #P-hard.

Goal: fpras: for any epsilon>0, produce an answer A such that 
	A is in [(1-epsilon)|S|, (1+epsilon)|S|] in time poly in 
	n and 1/epsilon.  (n = description length of points in U).


Idea: suppose we can create a sequence of polynomially-many sets:
	S=S_0 contained in S_1 contained in ... S_m ... contained in U
  where |S_i|/|S_{i-1}| is at most poly(n).  Then, we can reduce the
  counting problem to the random sampling problem:

  --> pick a random point from S_1, and see if it's in S.  Since
	|S|/|S_1| is at least 1/poly, we can get a good relative-error
	estimate of this ratio.  Say |S| = p_1*|S_1|.  We find q_1 with
	(1-epsilon/m)p_1 < q_1 < (1+epsilon/m)p_1.   Then do for |S_1|/|S_2|, 
	etc.

	Finally, output q_1*q_2*...*q_m*|U| which is at least
	(1-epsilon)*|S| and at most (1+2epsilon)*|S|.

So, if we can set up this sequence of sets, then the problem boils
down to: how do we pick a random element of S_i?  We'll do this by
setting up some appropriate neighborhood structure and doing a random
walk, and showing that this walk is rapidly mixing.