04/04/11                15-859(M) Randomized Algorithms

* amplification via expanders
* expanders and eigenvalues
========================================================================

Today we are going to talk more about expander graphs, and in
particular constant-degree expanders. Recall these have the property
that every set W of size <= n/2 has |N(W)-W| >= epsilon*|W|, for some
constant epsilon>0.

First, is a 2-d grid an expander? No. Perimeter of W might be only
sqrt(area). How about a 3-d grid? No. Surface area might be only
(volume)^{2/3}. For an expander you need surface area proportional to
volume. So expanders are intrinsically high-dimensional.

Anupam pointed out nice Gabber-Galil intuition [give intuition]

We're going to show today that expansion implies having an eigenvalue
gap (and therefore rapid mixing, as we showed last time) and vice versa.

Also [before or after?] I wanted to finish an argument from last time
on how expanders can be used to reduce the number of random bits needed
for amplifying the success probability of a randomized algorithm.

Expanders for amplification [Impagliazzo & Zuckerman]
-----------------------------------------------------
* Have BPP alg using r random bits, with failure prob 1/100.
  Claim: can decrease failure to (1/100)^k by using only r + O(k)
  random bits. (in contrast to r*k bits if ran k times independently)

* Idea: set up implicit expander graph with one node for each string
  of length r, and imagine we color nodes "good" or "bad" depending on
  whether they cause the BPP algorithm to answer correctly or not (so
  99% of the nodes are good). Start at random initial position and
  then do a random walk. Only need constant random bits per step.
  Sample every \beta steps (i.e., run the BPP alg using the current
  node as its random input) where \beta is defined to make 2nd largest
  eigenvalue of R = M^\beta at most 1/10.
  Sample 42k times and take majority vote.
  What we want is for it to be very unlikely that more than half of
  the samples are bad nodes.

* We'd like to say that no matter where you start, after running one
  step of R, there's at most 1/5 chance of being at a bad node. Can't
  quite get this. But, get something similar by looking at L_2 length.
  In particular, for any vector p,

     sqrt(sum of squares of bad entries of pR) <= 1/5 * (L_2 length of p).

  Proof: say eigenvectors are e_1, e_2, ... where e_1 = (1/n,...,1/n).
  All orthogonal. Let p = x + y, where
        x = c_1e_1,   y = c_2e_2 + ... + c_ne_n
  [since x and y are orthogonal, ||x|| <= ||p|| and ||y|| <= ||p||]

  For convenience, define Z as matrix that zeroes out all the good
  entries. I.e., Z is identity but where have zeroed out entries for
  good nodes. So, our goal is to show that ||pRZ|| <= 1/5 * ||p||.

  Look at x: ||xRZ|| = ||xZ|| <= 1/10 * ||x||.
        [x is uniform and at most 1/100 of the entries are bad,
         and 10 = sqrt(100)]

  Look at y: ||yRZ|| <= ||yR||
                      = ||c_2 lambda_2^beta e_2 + ... + c_n lambda_n^beta e_n||
                     <= 1/10 * ||y||.
        [since each component shrunk by 1/10]

  So, ||pRZ|| <= ||xRZ|| + ||yRZ||        (triangle inequality)
              <= 1/10||x|| + 1/10||y||
              <= 1/5 * ||p||

  (Note: this also shows that ||pRRZ|| <= 1/5 * ||p||, etc.)

  Intuitively, if p was "spread out" already, then so is pR, and
  multiplying by Z is zeroing out a lot of weight of entries. On the
  other hand, if p is highly concentrated, then multiplying by R by
  itself is decreasing the L_2 length by spreading out the
  distribution.

* Now, to finish the proof: We want to say it's unlikely more than
  half the samples are bad. Let's consider a fixed set of t samples,
  and ask: what's the probability that these are all bad?

  Claim: if q is our starting distribution, then what we want is the
  L_1 length of
        q R R R Z R Z R R Z R Z
  where the t "Z"s are indexing the t samples we took (there's at
  least one "R" between any two "Z"s).

  We'll use the fact that L_1 length <= sqrt(n) * L_2 length.
  And, L_2 length of (q R R R Z R Z R R Z R Z) <= (1/5)^t * L_2 length of q.
  And, L_2 length of q is 1/sqrt(n) [since we started at a *random*
  initial position -- this is where we use that fact]

  So, the probability these are all bad is at most (1/5)^t.

  For half of all 42k samples to be bad we need some set of t >= 21k
  to be bad. At most 2^(42k) such sets. Prob of failure at most
     2^(42k) * (1/5)^(21k) = (4/5)^(21k) <= (1/100)^k.
  [since (4/5)^21 < 1/100]                                    QED

==============================================================================

Now: prove theorems showing that expander graphs have an eigenvalue
gap, and vice versa that graphs with an eigenvalue gap are expanders.

First: eigenvalue gap ==> expansion.

THEOREM 1: if G is d-regular and lambda = 2nd-largest eigenvalue of
M(G), then for all W \subset V, the number of edges between W and V-W
satisfies

        E(W,V-W) >= d*|W|*|V-W|*(1-lambda)/n

which also implies that if |W| <= n/2, then

        |N(W) - W| >= |W|*(1-lambda)/2

[the last implication follows from |V-W| >= n/2, and degree = d: each
vertex of N(W)-W receives at most d of the edges leaving W]

[Note: here M(G) = A(G)/d = Markov chain of random walk]

[By the way, this kind of edge expansion is intuitively what causes
the walk to be rapidly mixing.]

Before giving the proof, first some preliminaries. Since the
degree=d, the stationary distribution is uniform, so the eigenvector
of largest eigenvalue is all 1's. This means that all the other
eigenvectors have sum of entries equal to 0 (since they are
orthogonal to it).

[[Actually, another way to see this, that holds for any Markov chain,
is: if v is an eigenvector of eigenvalue lambda and z is the all 1's
column vector, then vMz = lambda*vz = lambda*(sum of entries in v).
But the left-hand side is just vz, since rows of M add up to 1 (so
Mz = z). So, either lambda=1 or sum of entries = 0.]]

Proof of THM 1: Define vector f such that f_v are equal (and positive)
over v in W, f_v are equal (and negative) over v in V-W, and the sum
of entries in f is 0. In particular, let

        f_v = |V-W|   for v in W,
        f_v = -|W|    for v in V-W.

If we write f in terms of eigenvectors, it has zero component along
the largest, since its sum of entries is 0.
Also, for later:
        sum_v (f_v)^2 = |W|*|V-W|^2 + |V-W|*|W|^2 = n*|W|*|V-W|

IDEA: in one step of random walk, f has to shrink since there's an
eigenvalue gap, but the only way for f to shrink is to mix between W
and V-W, which means there have to be a lot of edges.

Formally: [taking the e_i to be orthonormal]

         f = c_2e_2 + ... + c_ne_n
        Mf = c_2*lambda_2*e_2 + ... + c_n*lambda_n*e_n.

Now, take dot product with f.

        Mf.f = (c_2)^2*lambda_2 + ... + (c_n)^2*lambda_n <= lambda_2*(f.f)

RHS is lambda_2*sum_v (f_v)^2

LHS is sum_v [ f_v * [avg_{u in N(v)} f_u] ].

If f_v was constant, this would just be sum_v[(f_v)^2]. But f is not
exactly constant, so instead we have

    sum_v[(f_v)^2] - (1/d)sum_{edges uv across cut} [f_v^2 + f_u^2 - 2f_u*f_v]
  = sum_v[(f_v)^2] - (1/d)sum_{edges uv across cut} [n^2]
  = sum_v[(f_v)^2] - (1/d)E(W,V-W)*n^2.

[across the cut, f_v - f_u = |V-W| + |W| = n]

Combining, (1/d)E(W,V-W)*n^2 >= (1-lambda_2)*sum_v (f_v)^2, and voila:

    E(W,V-W) >= d(1-lambda_2)*[sum_v (f_v)^2]/n^2 = d|W|*|V-W|(1-lambda)/n

---------------------------------------------------------------------------
Intuition for both arguments: think of eigenvector as defining a cut,
with positives on one side and negatives on other. Eigenvector of 2nd
largest eigenvalue is roughly the cut with the least edge expansion
across it.
---------------------------------------------------------------------------

Now we will go for the other direction. For this direction it will be
easier to work with A(G) rather than M(G). So largest eigenvalue is d
and 2nd-largest lambda_2 is <= d.

Theorem: If for all |W| <= n/2, |N(W) - W| >= c|W| for some c>0,
         then lambda_2 <= d - c^2/(4 + 2c^2)

For this direction, I claim that you can reduce the proof to the
following probabilistic puzzle.

Consider a probability distribution P on a regular degree-d graph G.
Say we pick two vertices independently according to P.

        Let A = Prob(the two vertices are neighbors)
        Let B = Prob(the two vertices are the same)

Claim 1: for any P, A/B <= d.

  E.g., if P is uniform, then A=d/n and B=1/n so A/B=d.
  Argue that any other P just decreases the ratio?
  Follows from (P_u)^2 + (P_v)^2 >= 2*P_u*P_v. Sum over all edges.
  LHS gives d*B, RHS gives A.

  Another way to see is that B = PP^T and A = P A(G) P^T, and so
  A/B <= d follows by the fact that all eigenvalues are <= d.

Claim 2: suppose we require P_v = 0 on at least half of the vertices
  of G. Then, if G is an expander, A/B <= d - c' for some constant
  c'>0.

  This is the tricky part. First, some intuition: say P was flat on
  the non-zero part W. In this case, B = 1/|W|.
        A = (avg internal degree)/|W|
          = (d - (# edges leaving W)/|W|) / |W|.
  So, A/B = d - (# edges leaving W)/|W| <= d-c since graph is an
  expander (at least c|W| edges leave W).

  In general you have to do a summation, which is on the handout by
  Alon. In his handout, x is the probability vector, and if you look
  at the last displayed equation, the left-hand side is dB-A [you get
  d*sum_i x_i^2 - sum_i sum_{j~i} x_ix_j], and the right-hand side is
  (c^2/2d)*B, so this gives us A/B <= d - c^2/2d.

Now, let's see why Claim 2 implies what we want: namely,
lambda_2 <= d - c'.

Let f = eigenvector for lambda_2. WLOG, f is positive on at most half
of entries (else look at -f). Define g so that g_v = f_v if f_v > 0
and g_v = 0 otherwise. Let's scale f so that the sum of entries in g
is 1. g is going to be the probability dist P.

We know A(G)f = lambda_2 f. Let's take dot product of both sides with
g. Get:

        lambda_2 = [A(G)f.g]/[f.g]

  Denominator = B   [f.g = sum_v (g_v)^2]
  Numerator = Sum_v( g_v * (sum_{u in N(v)} f_u) ) <= A
     (replacing negative entries with zero only increases this)

So, all one has to do is prove Claim 2. -> see handout by Alon.

=============================================================================
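
A minimal sketch for sanity-checking THEOREM 1 numerically. The test
graph (the 3-dimensional hypercube, d=3, n=8) and all function names
are illustrative assumptions, not from the lecture. lambda_2 of
M = A/d is estimated by power iteration on (M+I)/2 restricted to
vectors orthogonal to all 1's, and then every cut (W, V-W) is compared
against the bound d*|W|*|V-W|*(1-lambda)/n.

```python
from itertools import combinations


def hypercube_adjacency(k):
    """Adjacency lists of the k-dimensional hypercube (k-regular, 2^k nodes)."""
    return [[v ^ (1 << j) for j in range(k)] for v in range(1 << k)]


def second_eigenvalue(adj, iters=500):
    """Estimate the 2nd-largest eigenvalue of M = A/d by power iteration.

    Iterates with (M + I)/2, whose eigenvalues lie in [0, 1], while
    projecting off the all-ones eigenvector, then returns the Rayleigh
    quotient of the resulting unit vector with respect to M.
    """
    n, d = len(adj), len(adj[0])
    v = [float(i) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        v = [0.5 * (v[i] + sum(v[u] for u in adj[i]) / d) for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]              # stay orthogonal to all 1's
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    Mv = [sum(v[u] for u in adj[i]) / d for i in range(n)]
    return sum(a * b for a, b in zip(v, Mv))


def cut_edges(adj, W):
    """Number of edges with one endpoint in the set W and the other outside."""
    return sum(1 for v in W for u in adj[v] if u not in W)


def check_theorem1(adj):
    """Return (lambda_2 estimate, min over all W of E(W,V-W) - bound)."""
    n, d = len(adj), len(adj[0])
    lam = second_eigenvalue(adj)
    worst = float("inf")
    for size in range(1, n):
        bound = d * size * (n - size) * (1 - lam) / n
        for W in combinations(range(n), size):
            worst = min(worst, cut_edges(adj, set(W)) - bound)
    return lam, worst


lam, worst_slack = check_theorem1(hypercube_adjacency(3))
print("lambda_2 estimate:", lam)
print("smallest slack E(W,V-W) - bound:", worst_slack)
```

The smallest slack should come out non-negative up to floating-point
error if the bound holds for every W. Enumerating all subsets is only
feasible for tiny graphs; this is a check of the statement, not an
efficient algorithm.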