Notes for 02/19 15-852 Randomized Algorithms * more on expanders and eigenvalues * approximate counting via rapid mixing: overview ============================================================================== Last time, gave some intuition about why expanders have an eigenvalue gap. Actually, reverse proof is much easier. An in fact, some of the constructions explicitly create graphs with an eigenvalue gap. So, could have defined expanders as "graphs with an eigenvalue gap" and then said "here's a neat expansion property that they have". Theorem: if G is d-regular, then for all |W| <= n/2, |N(W) - W| >= |W|*(d-lambda)/2d lambda = second-largest eigenvalue of A(G). Proof: Will prove there are a lot of edges between W and V-W. In particular: E = E(W,V-W) >= |W|*|V-W|*(d-lambda)/n where E(A,B) = # edges between A and B. This implies what we want because: Since |W|<=n/2 we have e(W,V-W) >= |W|*(d-lambda)/2. Now just use the fact that each node in V-W has at most d neighbors in W. Define vector f such that f_v are equal (and positive) over v in W, f_v are equal (and negative) over v in V-W, and the sum of entries in f is 0. E.g., f_v = |V-W| for v in W, f_v = -|W| for v in V-W. If we write f in terms of eigenvectors, it has zero component along largest since sum of entries is 0. IDEA: in one step of random walk, f has to shrink since there's an eigenvalue gap, but the only way for f to shrink is to miz between W and V-W, which means there have to be a lot of edges. Formally: f = c_2e_2 + ... + c_ne_n Af = c_2*lambda_2*e_2 + ... + c_n*lambda_n*e_n. Now, take dot product with f. Af.f = (c_2)^2*lambda_2 + ... + (c_n)^2*lambda_n <= lambda_2*(f.f) RHS is lambda_2*sum_v (f_v)^2 LHS is sum_v [ f_v * [sum_{u in N(v)} f_u] ]. If f_v was constant, this would just be d*sum_v[(f_v)^2]. But f is not exactly constant, so instead we have: d*sum_v[(f_v)^2] - E*[|V-W|^2 + |W|^2 + 2|V-W|*|W|] = d*sum_v[(f_v)^2] - E*n^2. and voila. --------------------------------------------------------------------------- For the other direction, I still don't know an easy proof, but here's another piece of intuition. Turns out that can reduce the proof to the following probabilistic puzzle. Consider a probability distribution P on a regular degree-d graph G. Say we pick two vertices independently according to P. Let A = Prob(the two vertices are neighbors) Let B = Prob(the two vertices are the same) Claim 1: for any P, A/B <= d. E.g., if P is uniform, then A=d/n and B=1/n so A/B=d. Argue that any other P just decreases the ratio? Follows from (P_u)^2 + (P_v)^2 >= 2*P_u*P_v. Sum over all edges. LHS gives d*B, RHS gives A. Claim 2: suppose we require P_v = 0 on at least half of the vertices of G. Then, if G is an expander, A/B <= d - c for some constant c>0. Proving this is the hard part. Some intuition: say P was flat on the non-zero part W. In this case, B = 1/|W|. A = d/|W| - (# edges leaving W) So, A/B = d - (# edges leaving W)/|W| = d-c since graph is an expander. I don't know an intuitive proof in general. Let's just see why Claim 2 implies what we want: namely, lambda_2 <= d - c. Let f = eigenvector for 2nd largest eigenvalue lambda. WLOG, f is positive on at most half of entries (else look at -f). Define g so that g_v = f_v if f_v > 0 and g_v = 0 otherwise. Let's scale g so that the sum of entries is 1. g is going to be the probability dist P. We know A(G)f = lambda f. Let's take dot product of both sides with g. Get: lambda = [A(G)f.g]/[f.g] Denominator = B Numerator = Sum_v( g_v * (sum_{u in N(v)} f_u) ) <= A (replacing negative entries with zero only increases this) So, all one has to do is prove Claim 2. I'm still thinking about an intuitive proof of this. Extra credit if you can find one.... ============================================================================= Approximate counting -------------------- There are a lot of interesting problems that fall into the following framework. We have some universe U of points, some subset S of which are special. We want to know |S| (or in a continuous domain, the volume of S), or equivalently, the probability that a randomly drawn point from U belongs to S. E.g., S = some polygonal object in the plane, U = a bounding rectangle. What is the volume of S? E.g., U = strings of length n, S = those that satisfy a given DNF formula F. How many solutions are there to F? One easy thing to do: pick a bunch of samples in U, and observe how many fall into S (assume we can verify whether a point is in S in poly time), and then take the ratio. But, what if |S|/|U| is small? In this case, this does not approximate |S| very well. Can we do better? In fact, many of these counting problems are #P-hard. E.g., reporting the number of solutions to a given DNF formula is #P-hard. Computing the permanent of a matrix (which for A(G) corresponds to asking how many perfect matchings there are in G) is #P-hard. Goal: fpras: for any epsilon>0, produce an answer A such that A is in [(1-epsilon)|S|, (1+epsilon)|S|] in time poly in n and 1/epsilon. (n = description length of points in U). Idea: suppose we can create a sequence of polynomially-many sets: S=S_0 contained in S_1 contained in ... S_m ... contained in U where |S_i|/|S_{i-1}| is at most poly(n). Then, we can reduce the counting problem to the random sampling problem: --> pick a random point from S_1, and see if it's in S. Since |S|/|S_1| is at least 1/poly, we can get a good relative-error estimate of this ratio. Say |S| = p_1*|S_1|. We find q_1 with (1-epsilon/m)p_1 < q_1 < (1+epsilon/m)p_1. Then do for |S_1|/|S_2|, etc. Finally, output q_1*q_2*...*q_m*|U| which is at least (1-epsilon)*|S| and at most (1+2epsilon)*|S|. So, if we can set up this sequence of sets, then the problem boils down to: how do we pick a random element of S_i? We'll do this by setting up some appropriate neighborhood structure and doing a random walk, and showing that this walk is rapidly mixing.