04/06/11 15-859(M) Randomized Algorithms Reading: Chapter 11 * Approximate counting via random sampling * Random sampling via random walks on appropriately-define Markov Chains * Approximating the permanent ======================================================================== Approximate counting -------------------- There are a lot of interesting problems that fall into the following framework. We have some universe U of points, some subset S of which are special. We want to know |S| (or in a continuous domain, the volume of S), or equivalently, the probability that a randomly drawn point from U belongs to S. E.g., S = some polygonal object in the plane, U = a bounding rectangle. What is the volume of S? E.g., U = strings of length n, S = those that satisfy a given DNF formula F. How many solutions are there to F? One easy thing to do: pick a bunch of samples in U, and observe how many fall into S (assume we can verify whether a point is in S in poly time), and then take the ratio. But, what if |S|/|U| is small? In this case, this does not approximate |S| very well. Can we do better? In fact, many of these counting problems are #P-hard. E.g., reporting the number of solutions to a given DNF formula is #P-hard. Computing the permanent of a matrix (which for A(G) corresponds to asking how many perfect matchings there are in G) is #P-hard. Goal: fpras: for any epsilon>0, produce an answer A such that A is in [(1-epsilon)|S|, (1+epsilon)|S|] in time poly in n and 1/epsilon. (n = description length of points in U). Idea: suppose we can create a sequence of polynomially-many sets: S=S_0 contained in S_1 contained in ... S_m ... contained in U where |S_i|/|S_{i-1}| is at most poly(n). Then, we can reduce the counting problem to the random sampling problem: --> pick a random point from S_1, and see if it's in S. Since |S|/|S_1| is at least 1/poly, we can get a good relative-error estimate of this ratio. Say |S| = p_1*|S_1|. We find q_1 with (1-epsilon/m)p_1 < q_1 < (1+epsilon/m)p_1. Then do for |S_1|/|S_2|, etc. Finally, output q_1*q_2*...*q_m*|U| which is at least (1-epsilon)*|S| and at most (1+2epsilon)*|S|. So, if we can set up this sequence of sets, then the problem boils down to: how do we pick a random element of S_i? We'll do this by setting up some appropriate neighborhood structure and doing a random walk, and showing that this walk is rapidly mixing. Approximating the Permanent =========================== Per(A): like determinant except don't alternate signs. If G is an n by n bipartite graph on (X,Y), and we define a_ij = 1 if there is an edge from element i in X to element j in Y, then Per(A) is the number of perfect matchings in A. Equivalently, if G is a directed graph and we define A=A(G) in the usual way, then Per(A) is the number of cycle covers. *---* 1 0 1 \ / *---* 0 1 0 / \ *---* 1 0 1 Det(A) is easy to compute, but Per(A) is #P-hard. We'll be looking at approximating Per(A) where A is the matrix for a n by n bipartite graph of minimum degree n/2. (Computing exactly is still #P-hard for these.) Looking at graph G, bipartite on two sets X and Y of n vertices each. m = number of edges. * One strategy: pick a random permutation on (1,...,n) and see if it corresponds to a matching in the graph and repeat. Would be fine if graph was REALLY dense, but the problem is we might never find a matching this way. Instead, we'll do it like this: * Define M_k as set of k-edge matchings. Our goal is to estimate |M_n|. Do a random walk on M_k U M_{k-1}. (Will define some graph on these sets of matchings and show that it quickly approaches uniform.) Then see how often it ends up in the two sets to estimate |M_k|/|M_{k-1}|. Do for all k and multiply out to get estimate of |M_n|. |M_1| is easy: it's just m. * To get a good estimate, we need: (A) M_k is not much larger than M_{k-1}: easy to see it's at most a factor of n^2 larger. (B) M_k is not too much (at most a factor n^2) SMALLER than M_{k-1}: Uses fact that graph has large degree. Basically, for any y in M_{k-1}, there must be a nearby x in M_k. Specifially, pick two unmatched vertices u and v. If they are neighbors, then done. Else there must be a neighbor of u and a neighbor of v that are matched together in y, so remove that edge and add the two new ones. * In fact, will just look at M_n and M_{n-1}. Turns out we can reduce the problem of random sampling M_k U M_{k-1} in graph G to the problem of sampling M_n' U M_{n'-1} in some appropriately-defined n' by n' bipartite graph G'. (See exercise 11.11.) [[Specifically, add n-k new vertices to X and connect these each to all the old vertices in Y. Similarly add n-k new vertices to Y and connect these to all the old vertices in X. Then, can use results from walk on perfect and near-perfect matchings of G' to calculate the desired quantity for G.]] Now: define a random walk on M_n U M_{n-1}. So, this is a big graph where each perfect or near-perfect matching is a node. * Here's the walk: * with 1/2 prob, we stay put. * with 1/2 prob, we pick a random edge e in the graph G: REDUCE: if current state is in M_n and e is IN the matching, then remove e (going into M_{n-1}) ROTATE: if in M_{n-1} and e has one matched endpoint and one unmatched, then insert e and remove the other edge. AUGMENT: if in M_{n-1} and neither endpoint of e is matched, then add e (going into M_n). otherwise, stay put. Go through example on K_{2,2} (See p.321 of the text). * * *---* / / * * * * *---* * * \/ *---* /\ * * * * * * \ *---* \ * * Claim: for any states i,j Pr(i-->j) = Pr(j-->i). Why? (in particular, if the probability is non-zero, then it's 1/2m) Claim: stationary distribution is uniform. --> since symmetric, both row and column sums are 1 (called "doubly stochastic"), so uniform dist is stationary. Also, easy to see it's aperiodic. So, all that's left is to figure out how fast we converge to uniform. To do this, we want to bound the second-eigenvalue away from 1. Note: our goal is weaker than that of explicit construction of expanders. We have a walk on N (which is exponential in n) states, and unlike the case of expanders, 1) It's ok that our graph has degree O(n), and not constant. 2) Just need 1-lambda_2 >= 1/poly(n) (Or, in terms of N, we want an expander, but allowing polylog(N) degree and 1/polylog(N) eigenvalue separation. This is easier than getting both of these constant.) (Once we get this gap, can run walk poly(n) steps to get very close to stationary distrib. Not hard to see that "very close" is good enough.) Will show eigenvalue gap by showing that the graph has edge expansion, weighted according to pi. Called "high conductance" of Markov chain. * Capacity(S) C_S = sum_{i in S} pi_i. (prob of being in S according to pi. --> measure of size of S) * flow(S) = sum_{i in S, j not in S} w_ij, where w_ij = (pi_i)*(p_ij) = prob of being on edge i->j in pi. (measure of # edges leaving S). * Conductance Phi = min_{S: C_S <= 1/2} Phi_S Phi_S = flow(S)/capacity(S). E.g., in our case, pi_i = 1/N, p_ij = 1/2m, so w_ij = 1/(2mN). Phi_S = (1/2m)(# edges leaving S)/|S| N = number of states in the Markov chain. n = number of vertices in original bipartite graph, and m = # edges. Goal is conductance >= 1/poly(n). ========================================================================== Show high conductance (edge expansion) via "canonical path method". * For every pair of states (u,v), we'll define a canonical path from u to v. * Show that the number of paths through any given transition is at most bN. * This means the number of edges leaving S (|S|<= N/2) is at least |S|(N-|S|)/(bN) >= |S|/2b, and (# edges leaving S)/|S| >= 1/2b. So, if b is const or poly(n), we're done. =========================================================================== Illustrate by showing that boolean cube has edge-expansion (random walk has high conductance). * State space is {0,1}^n. * N = 2^n, degree = n = log(N). * For two vertices (v,w), define path P(v,w): flip bits that differ from left to right. Claim: no transition is contained in more than N/2 canonical paths. Proof: (ASK). Think of some transition T: 01011010 -> 01011110 What paths can possibly include this transition? Must be from some point of the form: *****010 to some point of the form: 010111** Number of such pairs is at most (2^n)/2 = N/2. So, all sets S of size at most N/2 have (# edges leaving S)/|S| >= 1. So, conductance Phi_S = sum(pi_i * p_ij) / sum(pi_i) >= 1/(2n) (given that you're at a random point in S, you have prob at least 1/2n of getting out.) ========================================================================== Now, back to the Markov chain for the Permanent. Need to specify canonical path between two states s and t. In fact, we'll only need to worry about the case that s and t are both perfect matchings. [[Do this only if there's time]] We can then handle the near-perfect matchings as follows: We argued earlier that (given our condition of the min degree at least n/2) we can associate every near-perfect matching to some perfect matching such that no perfect matching has more than n+n^2 near-perfect matchings associated to it. So, if s and/or t is a near-perfect matching, we can just first move to the perfect matching associated to s, then take the canonical path to the perfect matching associated to t, and then go to t. The canonical path is just the following: look at the symmetric difference of s and t. This is just a collection of disjoint cycles. To go from s to t, we will "unwind" each of these cycles in some canonical order -- say, doing the cycle containing the vertex of least index first, and so on. "Unwinding" a cycle corresponds to doing one Reduce operation, then a sequence of Rotates, and then finally one Augment. We can just pick some arbitrary canonical starting point and direction for this unwinding (say, Reduce the edge containing the vertex of least index, and then Rotate in the direction of that vertex). Theorem: At most N canonical paths between perfect matchings use any given transition T. (b=1) (Note: this implies that at most O(n^4*N) canonical paths include T when you also count the near-perfect matchings.) (remember Phi_S = (1/2m)(# edges leaving S)/|S|, and now we have that (# edges leaving S)/|S| is >= 1/(n^4).) Proof: [[Basically idea is: given T (say T is the transition u-->v), if we also knew s U t, then we could reconstruct which part is s and which part is t, by using our canonical ordering. And, we can encode s U t with a perfect or near-perfect matching p, which means there must be at most N such pairs. E.g., if T is reduce, then part of u looks like s and part of u looks like t, so just want p to look like t on the first portion and look like s on the second portion (and also include edges where s and t are the same just to make p be a perfect matching)]] We'll show that given a transition T (say T is the transition u-->v), for any pair (s,t) whose path goes through T, we can associate a perfect or near-perfect matching p, such that given p and T we can recover s and t. I.e., our mapping is 1-1. This means that there can be at most N such pairs (s,t). The easier cases are when T is a Reduce or Augment transition. Formally, if T is Reduce, then p = s XOR t XOR u, and if T is Augment then p = s XOR t XOR v. Informally, p is just the portion of (s U t) missing from (u U v) (where we've included the edges where s=t just to make p into a perfect matching). Given p and T, it's easy to see that we can recover (s U t). We can then split this into s and t by using the canonical ordering (and u and v). The rotate case is slightly more complicated since the portion of (s U t) missing from (u U v) (formally: s XOR t XOR (u U v)) is not a matching: it has one vertex of degree 2. But, all we have to do is to remove the edge (of s) that was the initial edge removed in the Reduce operation for this cycle. Then, when we reconstruct, we will have all of (s U t), except for this edge. But, that's OK, because when we see the path corresponding to the cycle minus this edge, we will know that we can just reconstruct the cycle by connecting together the two endpoints of the path. So, that does it.