Notes for 01/10/11 15-859(M) Randomized Algorithms
* Admin Handout: course info
* Why randomized algorithms?
* Plan for the course
* Two interesting algorithms
=============================================================================
ADMIN
-----
Grades based on homeworks + class participation + take-home final +
also a short project/presentation.
There is no TA, so students will also be asked to help with the
homework grading.
Web page and course blog as given in today's handout.
Page will have rough notes for each class, plus handouts.
Book is Motwani-Raghavan. Can get on amazon.
WHY A COURSE ON RANDOMIZED ALGORITHMS?
--------------------------------------
* Randomness is a useful algorithmic resource. Explore important
techniques for taking advantage of it.
* Analysis can be delicate. Explore analysis techniques (e.g.,
setting up the right random variable). Also vehicle for exploring
important probabilistic concepts (k-wise independence, tail
inequalities), and combinatorial objects/concepts (expander graphs,
conductance), and so on.
* Randomized algs often simpler than deterministic counterparts. Good
opportunity for talking about some neat algorithms.
Course will have some parts that focus on techniques, some on
different areas where techniques are useful. Try also to give
connections, e.g., Randomied exponential-weighting alg in learning
theory is also used in approximation algorithms and also gives a way
of proving the minimax theorem in game theory. Also in random walks,
we'll see they can be used for approximate counting problems and also
for derandomization of algorithms.
HOW DOES RANDOMNESS HELP IN ALGORITHMS?
--------------------------------------
1. Use simple randomized algorithm to achieve same worst-case performance
as complicated deterministic algorithm. E.g., sometimes have simple
deterministic strategy with good performance "most of the time"
(on average over a random choice of the input)
that can transform into an algorithm with good worst-case
performance by adding randomness. (Worst case over the input,
on average over the internal randomness of the algorithm)
E.g., median-finding, quicksort, many standard algorithmic problems.
2. Sometimes can do things we don't know how to do deterministically.
E.g., polynomial identity testing.
3. Sometimes can do things provably impossible deterministically
(protocol or oracle or online or game theoretic problems)
4. Sometimes can do things that just seem impossible...
Today: Some neat randomized algs.
SPLITTING A 3-COLORABLE GRAPH INTO TWO PIECES SO NEITHER HAS A TRIANGLE
-----------------------------------------------------------------------
Definition of problem: You have a 3-colorable graph. Ideally, you
want to split into 3 independent sets. NP-hard. Easier problem:
split into two pieces so that neither has a triangle. (Could imagine
this as first step in some heuristic or approximation algorithm. In
fact, if you could split into a constant number of pieces so that no
piece had a diamond or pentagon, then this would improve current best
approx ratio for 3-coloring)
An interesting algorithm: First, split arbitrarily (Red/Blue). Find
any bad triangle. Then pick one of its 3 nodes AT RANDOM, and move it
to the other side. Continue until there are no more bad triangles.
* Analysis Trick: Fix some arbitrary correct coloring. Look at quantity
T = # nodes you've colored in same way as this coloring. T is some
integer between 0 and n. Let's look at how T changes with time.
* Consider number line with values of T. What is the probability of
going left/right/staying-put in each step? (left with prob 1/3, right
with prob 1/3 and stay put with prob 1/3). In this walk, the expected
time to hit boundary 0 or n is O(n^2) [*]. This means that expect to
end in at most that much time.
Proof of [*]:
Rough calculation: First, let's consider 50/50 random walk --- proving
for this case is enough. Now, in a random walk with m steps, prob
that make A steps to right is 2^{-m} * (m choose A). Maximized at
A=m/2, and has value about 1/sqrt(m). (Stirling's approx gives
sqrt(2/(pi*m)). So, prob that after m steps you are between 0 and n
is at most n times this, which is < 1/2 for m = 4n^2. So, chance we
haven't finished by 4n^2 steps is < 1/2. Chance we haven't finished
by 8n^2 steps is < 1/4, etc. So, expectation is O(n^2).
More exact calculation: Let E_x be the expected time to reach 0 or n
given that you start at x. So, E_0 = E_n = 0. What is E_x in terms of
its neighbors? E_x = 1 + 1/3(E_x + E_{x+1} + E_{x-1}). Note: this is
using linearity of expectation. (E.g., event that in next k steps, you
make at least x more steps to left than to right is correlated with
event that you make at least x+1 more steps to left than to right.
Nonetheless, E(X+Y) = E(X)+E(Y) even if X and Y are correlated.) Rewrite
as: E_{x+1} - E_x = E_x - E_{x-1} - 3. So ``second derivative'' of E
w.r.t. x is -3. Suggests to solve E_x = - 3/2 x^2 + cx + c' at boundary
conditions. Get: 3/2 x(n-x). This is maximized at x=n/2, giving (3/8)n^2.
Lessons: setting up the "right" quantity/random-variable (in this case,
"T") is crucial. Also, if you don't know how to walk in the right
direction, sometimes it's enough to walk randomly. Q: Where does the
analysis go wrong with 5-cycles?
================================================
Improving over 2^n-time for 3-SAT.
We all know the 3-SAT problem is NP-hard, so we don't expect to be
able to get a polynomial-time algorithm to find a satisfying
assignment. But maybe we can at least beat the naive 2^n time bound
of just trying all possible solutions. (ok, we could try all possible
solutions in a random order, getting expected time 0.5*2^n = 2^{n-1},
but that's not much of a savings).
Here is an idea due to Uwe Schoening, that you will see is closely
related to the algorithm we just analyzed. Will first reduce time to
3^{n/2} = 1.73^n, then to 1.5^n, and then to 1.33^n.
Schoening's Alg, version 1:
1. pick a random initial assignment x.
2. while there is at least one unsatisfied clause and have done this
at most n times:
- pick an *arbitrary* unstatisfied clause c.
- flip the bit of a *random* x_i in the clause.
3. If the formula is still not satisfied, go to 1.
(see analysis for when to give up).
Analysis #1: consider some arbitrary correct satisfying assignment A.
With probability at least 1/2, the initial x agrees with A on at least
n/2 positions. Now, every iteration of step 2 has at least a 1/3
chance of increasing the agreement with A by 1 (does everyone see
why?). So, there is probability at LEAST p = (1/2)*(1/3)^{n/2}, we
reach a satisfying assignment in n/2 steps. So, whp we will find a
satisfying assignment after at most log(n)/p runs (does everyone see
why?). So this is total time about O(n log n * 3^{n/2}).
Analysis #2: We can get a better bound by just doing a less crude
analysis. Rather than saying "with probability at least 1/2, the
initial x agrees with A on at least n/2 positions", we can say that
the probability x agrees with A on k positions is {n \choose k}*2^{-n}.
So, the probability that in step 2 we make a beeline towards A
(increasing agreement by 1 on each step until reaching a satisfying
assignment) is at least \sum_k {n \choose k}*2^{-n}*(1/3)^k.
As a cute trick we can then see that this equals 2^{-n}*(1 + 1/3)^n.
So, the probability p of success is at least (2/3)^n.
To improve this, we slightly change the algorithm:
Schoening's Alg, version 2:
Same as version #1, but do step 2 for 3n iterations (rather than just
n iterations) before giving up.
Analysis: Hmm - let's leave this to the homework!
[Notes:
- Coloring algorithm is from: Colin McDiarmid, ``A random recouloring
method for graphs and hypergraphs.'' Combinatorics, Probability and
Computing, 2:363-365, 1993.
- 3-SAT algorithm is from: U. Schoening. A probabliistic algorithm for
k-SAT and constraint satisfaction problems. Proc. 40th
Symp. Foundations of Computer Science, pp. 410-414, 1999.
]