Notes for 02/03 15-852 Randomized Algorithms * randomized rou{t,nd}ing Chap 4.3, 5.1, 5.2, 5.6 * "the probabilistic method" Start with something I should have done last time. Then, continue on. ============================================================================= Randomized routing/rounding --------------------------- Given an undirected graph and a set of pairs {(s_i, t_i)} we want to route these pairs to minimize congestion. NP-hard. Approximate? Idea: (Raghavan & Thompson) 1. Solve fractionally. Think of as multi-commodity flow. (View as a decision problem: is there a solution with congestion at most C ==> set capacities on edges to C, and then do binary search on C). Can solve with linear programming. __ /|\ |\ \|_\|/ 2. For each pair (s_i, t_i) we have a flow: Now do "path stripping": Can write this flow as a union of paths, with path j carring some fraction f_j of flow. sum_j f_j = 1. 3. Choose routing for (s_i, t_i) path by randomly selecting according to the distribution f_j. Analysis: fix some edge. Let p_i be the flow of commodity i on this edge. This also means that p_i is the probability that we picked this edge for routing (s_i, t_i) in step 3. So, for a given edge, can think as a box with of a bunch of balls, one per (s_i, t_i), each thrown into this box with prob p_i, INDEPENDENTLY. Expected number of balls is C. Chernoff gives us: Pr[total > (1+epsilon)C] < e^{-epsilon^2*C/3} So, if C >> log(n), then w.h.p., maximum is only 1+epsilon times larger than the expectation. What if C=1, or C is constant? In this case, a bound we can apply is Pr[sum > k*u] < (e^(k-1)/k^k)^u, where u is expectation. (This bound holds for all k>1, and where "sum" is a sum of independent {0,1} random variables.) So, set k to be O(log(n)/loglog(n)), and then get 1/n^c. Somewhat like our discussion of the maximum for the occupancy problem. ------------------------------------------------------------------------ Note: last time talked about how with pairwise independence, can only assume max is at most sqrt(n), and wondered how the hashing methods we discussed would behave. In fact, this is the topic of Gabor Tardos's theory seminar this Friday. ----------------------------------------------------------------------- "The Probabilistic Method": We want to show that something exists. Do this by setting up some probability distribution and showing that what we want happens with probability > 0. Spencer's Graduation/Tenure Game -------------------------------- Begin with some set of people at various positions. 2 players: Paul (partitioner) splits people into two groups Carole (chooser) selects one group to be removed, other advances. Paul wins if somebody graduates, Carole wins if nobody graduates. Theorem 1: If sum_k a_k / 2^k < 1, then Carole has a winning strategy. Theorem 2: If sum_k a_k / 2^k >= 1, then Paul has a winning strategy. Proof of theorem 1. Suppose Carole plays randomly. What is the expected number of people that reach the goal? For instance, for fixed person who is initially at level k, what is the probability that this person reaches the goal? Therefore, no matter what Paul's strategy is, a random Carole has a non-zero chance of winning. Therefore, Paul cannot have a winning strategy and so Carole does. What is Carole's winning strategy? Just use Theorem 1: pick subset where sum is less than 1/2, so when they advance, the sum is less than 1, and we still have a winning strategy. Idea sometimes called the "method of conditional expectations". General idea: say we've shown that making random choices results in a good expected value. Now we consider our first choice, and calculate conditional expectations given this first choice, and we pick one where conditional expectation is still good (which we know must occur, for appropriate definitions of "good"). What about Paul? His strategy is to split so that both groups have sum >= 1/2, by same reasoning. Can always do this. (Go through elements from largest to smallest.) MAX SAT --------- CNF formula: an AND of clauses. "exactly k" CNF: each clause of size exactly k. (k-CNF has each of size at most k.) Say we're given one and we want to satisfy as many clauses as we can. Claim: For any "exactly k"-CNF, there exists a solution that satisfies at least 1 - 1/2^k of the clauses. Proof: consider a random assignment. Expected # clauses satisfied E=(1-1/2^k)m. (m=number of clauses). How about a deterministic algorithm? Idea: given any partial assignment P, we can calculate the expected number of clauses satisfied given that those variables not specified in P are set randomly. (Just look at the clauses one by one.) So, use conditional expectation method. --> Calculate E_0 = expected number satisfied given x_1=0 and rest are random. E_1 = expected number satsified given x_1=1 and rest are random. E = (E_0 + E_1)/2. So, fix x_1 depending on whichever is larger. Continue with x_2, etc. What about MAX-SAT in general? If there are m_k clauses of size k, we satisfy sum_k m_k(1-1/2^k) of them. In the worst case, this is 1/2. Can't hope for better, e.g., if the formula is just "x_1 and not(x_1)". How about an approximation algorithm that satisfies nearly as many clauses as the best possible solution? (note: MAX 2-SAT is NP-hard). Here's a way to satisfy at least 3/4 of the maximum possible. Note: above will work so long as no singletons. Now we'll look at a randomized rounding procedure that does well so long as all clauses are small. Then we'll combine them. --> solve fractional version of problem. Instead of requiring variable x_i to be in {0,1}, we allow x_i to be in [0,1]. define not(x_i) to be 1-x_i. Allow clauses to be "partially satsified": for clause (x_1 or x_2 or not(x_3)), let "satisfiedness" be: min(1, x_1 + x_2 + (1-x_3)). I.e., if sum is less than 1, then it represents how satisfied the clause is, and if greater than 1, then we say clause is satisfied. Then find solution that maximizes total satisfaction. Set up as LP: x_i in [0,1]. Variable z_j for clause j: z_j <= 1. z_j <= sum-of-literals-in-clause-j. Then, maximize sum z_j. --> Now, let's do randomized rounding: set variable i to 1 with prob x_i. Claim: if clause j has k literals, then prob(clause j is satisfied) >= z_j * (1 - (1-1/k)^k) [Note: for k=1, this is z_j, for k=2, this is 3z/4, for k=3, this is 0.704z] Proof: idea: say z_j=1 and all variables in it are at 1/k. Then, prob not satisfied is exactly (1-1/k)^k. Say z_j may not be 1, but all variables are equal at z_j/k. Then prob satisfied = 1 - (1-z_j/k)^k. This is >= what we want since equal at z_j=0 or 1, and concave. Then, have to show that for a given sum, the prob is maximized when they're all equal which is intuitively easy to see (but I don't have a nongrungy argument right now...) --> So, this strategy does well when clauses are small, previous did well when clauses are big. E.g., prob(clause j is satisfied) as a function of k is: strategy 1 | strategy 2 ------------|------------- k=1 1/2 | z_j k=2 3/4 | 3/4 * z_j k=3 7/8 | 0.704 z_j --> so, let's just flip a coin and with prob 1/2 use strategy 1, with prob 1/2 use strategy 2. Goodness is average of two values from above table. Just want this to be >= 3/4 * z_j, which, in fact, it is. (just have to do the calculation in general). Notes: Current best approxs: 0.931 for MAX 2-SAT [Feige-Goemans], 0.801 for MAX 3-SAT [Trevisan, Sorkin, Sudan, Williamson], 0.758 for MAX-SAT [Goemans-Williamson] Current best hardness results: 36/37 = 0.973 for MAX-3SAT, 73/74=0.986 for MAX 2-SAT.