Notes for 01/10/11   15-859(M) Randomized Algorithms

* Admin (handout: course info)
* Why randomized algorithms?
* Plan for the course
* Two interesting algorithms

=============================================================================
ADMIN
-----
Grades based on homeworks + class participation + take-home final + also a
short project/presentation.  There is no TA, so students will also be asked
to help with the homework grading.

Web page and course blog as given in today's handout.  The page will have
rough notes for each class, plus handouts.

Book is Motwani-Raghavan; can get it on Amazon.

WHY A COURSE ON RANDOMIZED ALGORITHMS?
--------------------------------------
* Randomness is a useful algorithmic resource.  We will explore important
  techniques for taking advantage of it.

* The analysis can be delicate.  We will explore analysis techniques (e.g.,
  setting up the right random variable).  The course is also a vehicle for
  exploring important probabilistic concepts (k-wise independence, tail
  inequalities) and combinatorial objects/concepts (expander graphs,
  conductance), and so on.

* Randomized algorithms are often simpler than their deterministic
  counterparts.  Good opportunity for talking about some neat algorithms.

The course will have some parts that focus on techniques, and some on
different areas where the techniques are useful.  We will also try to give
connections.  E.g., the randomized exponential-weighting algorithm from
learning theory is also used in approximation algorithms and also gives a
way of proving the minimax theorem in game theory.  Likewise, we'll see
that random walks can be used for approximate counting problems and also
for derandomization of algorithms.

HOW DOES RANDOMNESS HELP IN ALGORITHMS?
---------------------------------------
1. Use a simple randomized algorithm to achieve the same worst-case
   performance as a complicated deterministic algorithm.  E.g., sometimes
   we have a simple deterministic strategy with good performance "most of
   the time" (on average over a random choice of the input) that we can
   transform into an algorithm with good worst-case performance by adding
   randomness.  (Worst case over the input, on average over the internal
   randomness of the algorithm.)  E.g., median-finding, quicksort, many
   standard algorithmic problems.

2. Sometimes we can do things we don't know how to do deterministically.
   E.g., polynomial identity testing.

3. Sometimes we can do things that are provably impossible
   deterministically (protocol or oracle or online or game-theoretic
   problems).

4. Sometimes we can do things that just seem impossible...

Today: some neat randomized algorithms.

SPLITTING A 3-COLORABLE GRAPH INTO TWO PIECES SO NEITHER HAS A TRIANGLE
-----------------------------------------------------------------------
Definition of the problem: You have a 3-colorable graph.  Ideally, you
want to split it into 3 independent sets, but that is NP-hard.  Easier
problem: split it into two pieces so that neither piece contains a
triangle.  (Could imagine this as a first step in some heuristic or
approximation algorithm.  In fact, if you could split into a constant
number of pieces so that no piece had a diamond or a pentagon, then this
would improve the current best approximation ratio for 3-coloring.)

An interesting algorithm: First, split arbitrarily (Red/Blue).  Find any
bad (monochromatic) triangle.  Then pick one of its 3 nodes AT RANDOM,
and move it to the other side.  Continue until there are no more bad
triangles.
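In code, the algorithm is just a few lines.  Here is a minimal Python
sketch, assuming the input comes as a vertex list plus a precomputed list
of the graph's triangles (this representation and the helper name
find_bad_triangle are our own choices for illustration, not from the
original paper):

    import random

    def split_triangle_free(vertices, triangles):
        # Arbitrary initial split: 0 = Red, 1 = Blue.
        color = {v: random.randrange(2) for v in vertices}

        def find_bad_triangle():
            # Any triangle whose 3 endpoints currently have the same color.
            for (a, b, c) in triangles:
                if color[a] == color[b] == color[c]:
                    return (a, b, c)
            return None

        # Repeatedly pick one of a bad triangle's 3 nodes at random
        # and move it to the other side.
        while (t := find_bad_triangle()) is not None:
            v = random.choice(t)
            color[v] ^= 1
        return color

By the analysis below, when the graph is 3-colorable the loop is expected
to terminate within O(n^2) flips.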
* Analysis trick: Fix some proper 3-coloring of the graph (one exists; we
  just can't find it), with color classes C1, C2, C3.  Look at the quantity

      T = (# nodes of C1 currently colored Red)
        + (# nodes of C2 currently colored Blue).

  T is an integer between 0 and n' = |C1| + |C2| <= n, and if T ever
  reaches 0 or n' then the current split has no bad triangle (at T = n',
  say, every triangle contains a Red node of C1 and a Blue node of C2).
  Let's look at how T changes with time.

* Consider a number line with the values of T.  What is the probability of
  going left/right/staying put in each step?  A bad triangle is
  monochromatic in our split, but its three nodes lie in three different
  color classes: flipping one of them raises T by 1, flipping another
  lowers T by 1, and flipping its C3 node leaves T unchanged.  So: left
  with prob 1/3, right with prob 1/3, and stay put with prob 1/3.  In this
  walk, the expected time to hit the boundary 0 or n is O(n^2) [*] (we
  state the bound for an interval of length n; length n' <= n is only
  faster).  This means we expect the algorithm to finish within at most
  that much time.

Proof of [*]:

Rough calculation: First, let's consider the 50/50 random walk --- proving
the bound for this case is enough, since the stay-put steps only cost a
constant factor (each step moves with probability 2/3).  Now, in a random
walk with m steps, the probability of making exactly A steps to the right
is 2^{-m} * (m choose A).  This is maximized at A = m/2, where it has
value about 1/sqrt(m)  (Stirling's approximation gives sqrt(2/(pi*m))).
So, the probability that after m steps you are strictly between 0 and n is
at most n times this, which is < 1/2 for m = 4n^2.  So, the chance we
haven't finished within 4n^2 steps is < 1/2, the chance we haven't
finished within 8n^2 steps is < 1/4, etc.  So, the expectation is O(n^2).

More exact calculation: Let E_x be the expected time to reach 0 or n given
that you start at x.  So, E_0 = E_n = 0.  What is E_x in terms of its
neighbors?

    E_x = 1 + (1/3)(E_x + E_{x+1} + E_{x-1}).

Note: this is using linearity of expectation.  (E.g., the event that in
the next k steps you make at least x more steps to the left than to the
right is correlated with the event that you make at least x+1 more steps
to the left than to the right.  Nonetheless, E(X+Y) = E(X) + E(Y) even if
X and Y are correlated.)  Rewrite as:

    E_{x+1} - E_x = E_x - E_{x-1} - 3.

So the ``second derivative'' of E with respect to x is -3.  This suggests
trying E_x = -(3/2)x^2 + cx + c'; the boundary conditions E_0 = E_n = 0
give c' = 0 and c = (3/2)n, i.e.,

    E_x = (3/2) x (n - x).

This is maximized at x = n/2, giving (3/8)n^2.

Lessons: setting up the "right" quantity/random variable (in this case, T)
is crucial.  Also, if you don't know how to walk in the right direction,
sometimes it's enough to walk randomly.

Q: Where does the analysis go wrong with 5-cycles?

================================================
IMPROVING OVER 2^n TIME FOR 3-SAT
---------------------------------
We all know the 3-SAT problem is NP-hard, so we don't expect to be able to
get a polynomial-time algorithm for finding a satisfying assignment.  But
maybe we can at least beat the naive 2^n time bound of trying all possible
solutions.  (OK, we could try all possible solutions in a random order,
getting expected time 0.5*2^n = 2^{n-1}, but that's not much of a
savings.)  Here is an idea due to Uwe Schoening that, as you will see, is
closely related to the algorithm we just analyzed.  We will first reduce
the time to 3^{n/2} ~ 1.73^n, then to 1.5^n, and then to 1.33^n.

Schoening's Alg, version 1:
  1. Pick a random initial assignment x.
  2. While there is at least one unsatisfied clause, and we have done this
     at most n times:
       - pick an *arbitrary* unsatisfied clause c.
       - flip the bit of a *random* x_i in the clause.
  3. If the formula is still not satisfied, go to 1.  (See the analysis
     for when to give up.)
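Before analyzing this, here is a minimal Python sketch of version 1 (the
clause representation --- a list of clauses, each a list of signed
variable indices, +i for x_i and -i for its negation --- and the helper
names are our own choices for illustration):

    import random

    def schoening_v1(num_vars, clauses, max_restarts):
        """Version 1: random restart, then up to n random-walk steps.
        Returns a satisfying assignment (dict: var -> bool) or None."""

        def find_unsatisfied(x):
            # Return an arbitrary unsatisfied clause, or None if x
            # satisfies every clause.
            for clause in clauses:
                if not any((lit > 0) == x[abs(lit)] for lit in clause):
                    return clause
            return None

        for _ in range(max_restarts):
            # 1. Pick a random initial assignment x.
            x = {v: random.random() < 0.5 for v in range(1, num_vars + 1)}
            # 2. Up to n times: flip a random variable of an unsatisfied
            #    clause.
            for _ in range(num_vars):
                clause = find_unsatisfied(x)
                if clause is None:
                    return x
                v = abs(random.choice(clause))
                x[v] = not x[v]
            if find_unsatisfied(x) is None:
                return x
        return None   # give up (see the analysis for how large
                      # max_restarts should be)

Per Analysis #1 below, taking max_restarts on the order of
log(n) * 2 * 3^{n/2} suffices to succeed with high probability.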
Analysis #1: Consider some arbitrary satisfying assignment A.  With
probability at least 1/2, the initial x agrees with A on at least n/2
positions.  Now, every iteration of step 2 has at least a 1/3 chance of
increasing the agreement with A by 1 (does everyone see why?).  So, with
probability at least p = (1/2)*(1/3)^{n/2}, we reach a satisfying
assignment within n/2 steps.  So, whp we will find a satisfying assignment
within O(log(n)/p) runs (does everyone see why?).  So the total time is
about O(n log n * 3^{n/2}).

Analysis #2: We can get a better bound just by doing a less crude
analysis.  Rather than saying "with probability at least 1/2, the initial
x agrees with A on at least n/2 positions", we can say that the
probability that x disagrees with A on exactly k positions is
{n \choose k}*2^{-n}.  So, the probability that in step 2 we make a
beeline towards A (increasing agreement by 1 on each step until reaching a
satisfying assignment) is at least

    \sum_k {n \choose k} * 2^{-n} * (1/3)^k.

As a cute trick, by the binomial theorem this equals 2^{-n}*(1 + 1/3)^n.
So, the probability p of success is at least (2/3)^n, i.e., the expected
number of restarts is at most (3/2)^n, giving the 1.5^n bound promised
above.

To improve this, we slightly change the algorithm:

Schoening's Alg, version 2: Same as version #1, but do step 2 for 3n
iterations (rather than just n iterations) before giving up.

Analysis: Hmm --- let's leave this to the homework!

[Notes:
 - The coloring algorithm is from: Colin McDiarmid, ``A random recolouring
   method for graphs and hypergraphs,'' Combinatorics, Probability and
   Computing 2:363-365, 1993.
 - The 3-SAT algorithm is from: U. Schoening, ``A probabilistic algorithm
   for k-SAT and constraint satisfaction problems,'' Proc. 40th Symposium
   on Foundations of Computer Science (FOCS), pp. 410-414, 1999.]
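(Addendum: the binomial-theorem step in Analysis #2 is easy to
sanity-check numerically.  A tiny Python check, included purely as a
check and not as part of any algorithm:

    from math import comb

    # Check: sum_k C(n,k) * 2^{-n} * (1/3)^k
    #        == 2^{-n} * (1 + 1/3)^n == (2/3)^n
    for n in (5, 10, 20):
        lhs = sum(comb(n, k) * (1/3) ** k for k in range(n + 1)) / 2 ** n
        print(n, lhs, (2/3) ** n)   # the two values agree up to rounding

)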