15-451 Algorithms 11/10/2010 recitation notes * NP-completeness reductions contd. * Approximation algorithm for MAX-EXACTLY-3-SAT ==================================================================== More NP-completeness reductions =============================== The practice quiz talked about SET-SPLITTING. Given a collection of sets over n points {1,2,...,n} we want to color the points red or blue so that each set gets at least one point of each color. This turns out to be NP-complete. Let's prove it by reduction from 3-SAT. Given a 3-SAT formula F, create a SET-SPLITTING instance as follows: - one point for each variable x_i and one point for each "not(x_i)". - for each i, make the set {x_i, not(x_i)}. This will force them to be different colors. (One color will represent "true" and the other will represent "false". - one additional point "f" whose color we will interpret as "false". - then, for each clause c in the given 3-SAT formula, we create a set of size 4 containing the literals in c plus the point f. Claim 1: if the 3-SAT formula was satisfiable then the set-splitting instance is solvable too. Proof: use the two colors "true" and "false". Color each literal according to the setting of some satisfying assignment, and color "f" false. Each set has at least one true and at least one false (namely f) so it's a yes-instance too. Claim 2: vice-versa. Just interpret the color of f as false and the other color as true and read off the colors as the assignment. Each clause must have at least one literal set to true, and we never set literals inconsistently because of the sets {x_i, not(x_i)} Now turning to approximation algorithms... MAX-EXACTLY-3-SAT ================= Suppose we have a 3-SAT formula where every clause has size exactly 3 with no duplicate variables per clause allowed (e.g., can't have (x_1 v x_1 v x_1) or (x_1 v not(x_1))). Then there is a simple randomized algorithm that can satisfy at least a 7/8 fraction of clauses in expectation. Can anyone see what it is? [just try a random assignment to the variables] How about a deterministic algorithm? Here's a nice way we can do that. First, let's generalize the above statement to talk about general CNF formulas. Claim: Suppose we have a CNF formula of m clauses, with m_1 clauses of size 1, m_2 of size 2, etc. (m = m_1 + m_2 + ...). Again, no duplicate variables allowed in the same clause. Then a random assignment satisfies sum_k m_k(1 - 1/2^k) clauses. Proof: Can anyone see a proof? [linearity of expectation.] Now, here is a deterministic algorithm : Look at x_1: for each of the two possible settings (0 or 1) we can calculate the expected number of clauses satisfied if we were to go with that setting, and then set the rest of the variables randomly. (It is just the number of clauses already satisfied plus sum_k m_k(1-1/2^k), where m_k is the number of clauses of size k in the ``formula to go''.) Fix x_1 to the setting that gives us a larger expectation. Now go on to x_2 and do the same thing, setting it to the value with the highest expectation-to-go, and then x_3 and so on. The point is that since we always pick the setting whose expectation-to-go is larger, this expectation-to-go never decreases (since our current expectation is the average of the ones we get by setting the next variable to 0 or 1). This is called the ``conditional expectation'' method. The algorithm itself is completely deterministic --- in fact we could rewrite it to get rid of any hint of randomization by just viewing sum_k m_k(1-1/2^k) as a way of weighting the clauses to favor the small ones, but our motivation was based on the randomized method. Interesting fact: getting a 7/8 + epsilon approximation for any constant epsilon (like .001) for MAX-exactly-3-SAT is NP-hard!