Markov Chains Lecture 2.
D. Sleator
Reminder of last time.
Markov chains
variation distance
convergence rate
The general goal of this work is to efficiently estimate the number of
solutions to combinatorial problems.
Theory of #P-completeness.
Some #P-complete problems:
Computing the permanent
Counting the number of colorings of a graph
Let G be a graph with n vertices and m edges.
Let Delta be the maximum degree of G, and let k
be the number of colors we want to color G with.
A Randomized Approximation Scheme for k-colorings is:
A randomized algorithm that takes as input
a graph G and an error bound e, and produces as output
a number Y such that:
Pr((1-e)C(G) <= Y <= (1+e)C(G)) >= 3/4
Here C(G) is the number of k-colorings of the graph G.
The scheme is Fully Polynomial if it runs in polynomial time
in the input length and e^(-1). Call it "FPRAS".
Our goal is to develop an FPRAS for k-coloring a graph.
We'll be able to do it as long as k >= 2Delta+1.
Our approach is this:
(1) Give an efficient algorithm to generate a uniformly
random coloring.
(2) Use this random coloring algorithm to compute the
estimate Y of the number of colorings.
In this lecture I'll talk about (1) in detail, but just gloss over
part (2). Let's do that first.
Let G be the graph in question. Let m be the number of edges in G.
Consider a series of graphs G=G(m), G(m-1), G(m-2)... G(0) where each
one is obtained from the previous one by removing some edge. Let C(G)
be the number of k-colorings of a graph G. Our goal is to estimate
C(G).
We can write C(G) as follows:
          C(G(m))     C(G(m-1))           C(G(1))
  C(G) = --------- x ---------- x .... x --------- x C(G(0))
         C(G(m-1))   C(G(m-2))            C(G(0))
Our technique will be to estimate each of these terms.
Clearly C(G(0)) = k^n. To estimate:
     C(G(i))
   -----------
    C(G(i-1))
Note that the set of colorings of G(i) is a subset of the colorings of
G(i-1). So the ratio we want to estimate is simply the probability
that a random coloring of G(i-1) is also a coloring of G(i).
It remains to carefully work out the convergence rates so that we know
how many times we have to sample in order to get an estimate Y of C(G)
that's sufficiently close. This is all done in Jerrum's paper.
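The telescoping identity itself can be sanity-checked by brute force on a
tiny graph. A sketch (the real FPRAS replaces the exact enumeration below
with samples from the coloring chain; the graph and names are illustrative):

```python
from itertools import product

def num_colorings(n, edges, k):
    """Count proper k-colorings of a graph on vertices 0..n-1 by brute force."""
    return sum(
        all(col[u] != col[v] for (u, v) in edges)
        for col in product(range(k), repeat=n)
    )

# Tiny example: a 4-cycle, k = 3 colors.
n, k = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# G(0), G(1), ..., G(m): G(i) keeps the first i edges, so each graph
# is obtained from the next by removing one edge.
counts = [num_colorings(n, edges[:i], k) for i in range(len(edges) + 1)]

# Telescoping product: C(G) = C(G(0)) * prod_i C(G(i))/C(G(i-1)).
ratio_product = 1.0
for i in range(1, len(edges) + 1):
    ratio_product *= counts[i] / counts[i - 1]

estimate = (k ** n) * ratio_product   # C(G(0)) = k^n
assert abs(estimate - counts[-1]) < 1e-9
```

In the algorithm proper, each ratio counts[i]/counts[i-1] is estimated by
sampling random colorings of G(i-1) and checking how often they are also
proper for G(i).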
The key thing that we need is to show that the algorithm presented
last time for coloring converges quickly.
Lemma 1: The convergence time of the coloring Markov chain is
                     k
    Tau(e) <= ------------ n ln(n/e)
               k - 2Delta
We're going to prove this by using a technique called coupling.
Let me explain that first.
Let Xt and Yt be the states at time t of two identical Markov chains.
Each chain, viewed separately from the other, is perfectly random.
However, the chains are correlated in such a way that they tend to
converge to the same state. Once this event happens (they've coupled)
they remain coupled.
Let me give an example. Consider the random walk on a circle of 2n
points. At each step we either go clockwise, counterclockwise, or
stay in the same state, each with probability 1/3.
At each step, chain X randomly makes one of three moves: clockwise
(c), counterclockwise (cc), or stay (s). Chain Y decides what to do
based on the current relationship between X and Y, and on what X
decides to do.
Specifically:
If Xt=Yt then Y does what X does
If Xt and Yt have the same parity then we map X's
move to Y's move as follows
(c,cc,s) ==> (cc,c,s)
If Xt and Yt have different parity then we do this map:
(c,cc,s) ==> (c,s,cc)
It's clear that viewed separately from X, Y looks like a perfectly
good implementation of a random walk on a circle. It's also clear
that the two will converge fairly quickly. Why? Well, because
* If they have opposite parities, then in expected 3/2 steps
they'll have the same parity.
   * If they have the same parity, then the time until coupling
     occurs is the time it takes for a random walk on a line of
     n points to reach an endpoint.
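A minimal sketch of this coupling in Python (the function names are mine).
The final loop only checks a property stated above: once the chains meet,
they move in lockstep forever.

```python
import random

MOVES = {'c': 1, 'cc': -1, 's': 0}   # clockwise, counterclockwise, stay

def y_move(x_move, same_pos, same_parity):
    """Y's move, given X's move and the current relationship of the chains."""
    if same_pos:                                       # coupled: mirror X
        return x_move
    if same_parity:                                    # (c,cc,s) ==> (cc,c,s)
        return {'c': 'cc', 'cc': 'c', 's': 's'}[x_move]
    return {'c': 'c', 'cc': 's', 's': 'cc'}[x_move]    # (c,cc,s) ==> (c,s,cc)

def coupled_walk_step(x, y, size, rng):
    """One step of the coupled walks on a circle of `size` (= 2n) points."""
    mx = rng.choice(['c', 'cc', 's'])                  # X moves uniformly
    my = y_move(mx, x == y, (x - y) % 2 == 0)
    return (x + MOVES[mx]) % size, (y + MOVES[my]) % size

rng = random.Random(0)
size = 10                      # 2n points with n = 5 (size must be even)

# Run the coupling from distinct starting states.
x, y = 0, 3
for _ in range(1000):
    x, y = coupled_walk_step(x, y, size, rng)

# Once the chains meet, they remain coupled from then on.
x, y = 4, 4
for _ in range(100):
    x, y = coupled_walk_step(x, y, size, rng)
    assert x == y
```

Note that in every case the map from X's move to Y's move is a bijection
on {c, cc, s}, which is why Y alone is a faithful copy of the walk.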
It turns out that there's a very direct connection between the
variation distance and coupling. Specifically:
Lemma: Let (X,Y) be a pair of coupled Markov chains. Let Y's initial state
be chosen from the stationary distribution. Then:
variation distance of X at time t <=
Pr(X and Y have not coupled by time t)
Proof:
Recall that the variation distance is defined as:

    D(Xt) =     max      ( Pr(Xt in S) - Pi(S) )
            S subset of Omega
Alternatively, it's the sum, over all states where the
probability of Xt being at that state is larger than its
stationary probability under Pi, of the difference between
them.
Let A be the set achieving this maximum. Since Y starts in the
stationary distribution, Pr(Yt in A) = Pi(A), so
    D(Xt) = Pr(Xt in A) - Pr(Yt in A)
Now, there are four disjoint possibilities. Let
p1 = Pr(Xt in A and Yt in A)
p2 = Pr(Xt in A and Yt not in A)
p3 = Pr(Xt not in A and Yt in A)
p4 = Pr(Xt not in A and Yt not in A)
Rewriting the above:
D(Xt) = p1+p2 - (p1+p3)
= p2 - p3
<= p2
<= Pr(Xt != Yt)
This is just the probability that X and Y have not
coupled. QED.
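The two characterizations of variation distance used above (the max over
subsets, and the sum of positive differences) can be checked against each
other by brute force on a small state space. A sketch with made-up
distributions:

```python
from itertools import combinations

def max_over_subsets(p, pi):
    """max over subsets S of Pr(X in S) - Pi(S), by enumerating all subsets."""
    states = list(p)
    best = 0.0
    for r in range(len(states) + 1):
        for subset in combinations(states, r):
            best = max(best, sum(p[s] - pi[s] for s in subset))
    return best

def sum_positive_parts(p, pi):
    """Sum of p(s) - pi(s) over the states where p exceeds pi."""
    return sum(max(p[s] - pi[s], 0.0) for s in p)

# Two made-up distributions on four states.
p  = {'a': 0.5,  'b': 0.2,  'c': 0.2,  'd': 0.1}
pi = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}

assert abs(max_over_subsets(p, pi) - sum_positive_parts(p, pi)) < 1e-12
```

The maximizing subset is exactly the set A of states where p exceeds pi,
which is why the two definitions agree.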
Now returning to the random walk on the circle, we need to bound the
probability that X and Y have not coupled. Ignoring the parity phase, this
probability is just the probability that a random walk on a line of n
points reaches the endpoint. This can be bounded using expectations.
I won't do it here.
Proof of Lemma 1. We construct a coupling (X,Y), where X is the
original coloring Markov chain. Here's how it works:
(1) Select a vertex v in V at random.
(2) Compute a permutation g = g(G,Xt,Yt,v) of C (the set of colors)
by a procedure to be explained below.
(3) Choose a color c in C at random.
(4) Recolor v to be color c in Xt to give X'
Recolor v to be color g(c) in Yt to give Y'
(5) If the coloring X' is proper then X(t+1) = X'
otherwise X(t+1)= Xt. (same for Y)
Clearly, both X and Y are faithful copies of the coloring Markov
chain. We need to explain how we choose g().
Let A be the set of vertices on which Xt and Yt agree,
and D the set of vertices on which they disagree.
Consider E', the set of edges of G where one end is in A
and the other end is in D. Let d'(v) be the degree of
v in this set E'. Let m' = |E'|.
Here's how we compute g().
(a) if v in D then g is the identity
(b) if v is in A:
Let N be the neighbors of v.
Define Cx to be the set of colors used in N in
Xt but not in N in Yt.
Define Cy to be the set of colors used in N in
Yt but not in N in Xt.
Cx and Cy are disjoint.
|Cx| <= d'(v) and |Cy| <= d'(v)   Why?
Because: look at the subset of N that is in A.
The colors used there are not in Cx or Cy.
The rest of N has size d'(v).
WLOG assume |Cx| <= |Cy|. Choose any subset
C'y of Cy that is the same size as Cx.
The permutation used simply maps colors from
Cx <----> C'y, and is the identity elsewhere.
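Here is a sketch of one coupled step, with the permutation g built as in
cases (a) and (b) above. The graph, colorings, and function names are
illustrative, and the particular subset C'y chosen is arbitrary:

```python
def make_permutation(adj, X, Y, v, colors):
    """Build the color permutation g = g(G, Xt, Yt, v) described above."""
    if X[v] != Y[v]:                        # case (a): v in D, g is identity
        return {c: c for c in colors}
    N = adj[v]                              # case (b): v in A
    nx = {X[u] for u in N}                  # colors on N in Xt
    ny = {Y[u] for u in N}                  # colors on N in Yt
    Cx, Cy = nx - ny, ny - nx               # disjoint by construction
    if len(Cx) > len(Cy):                   # WLOG |Cx| <= |Cy|
        Cx, Cy = Cy, Cx
    Cy_prime = sorted(Cy)[:len(Cx)]         # any subset of Cy of size |Cx|
    g = {c: c for c in colors}
    for a, b in zip(sorted(Cx), Cy_prime):  # swap Cx <--> C'y
        g[a], g[b] = b, a
    return g

def coupled_step(adj, X, Y, v, c, colors):
    """Steps (3)-(5): recolor v with c in X and g(c) in Y, if proper."""
    g = make_permutation(adj, X, Y, v, colors)
    X2, Y2 = dict(X), dict(Y)
    X2[v], Y2[v] = c, g[c]
    if all(X2[v] != X2[u] for u in adj[v]):   # keep X' only if proper
        X = X2
    if all(Y2[v] != Y2[u] for u in adj[v]):   # keep Y' only if proper
        Y = Y2
    return X, Y

# Illustrative example: the path 0-1-2 with 4 colors.
adj = {0: [1], 1: [0, 2], 2: [1]}
colors = range(4)
X = {0: 0, 1: 2, 2: 1}   # proper coloring for chain X
Y = {0: 3, 1: 2, 2: 1}   # proper coloring for chain Y; they disagree only at 0
g = make_permutation(adj, X, Y, 1, colors)   # v = 1 is in A
assert g[0] == 3 and g[3] == 0 and g[1] == 1 and g[2] == 2
```

Here Cx = {0} and Cy = {3}, so g swaps colors 0 and 3 and fixes the rest.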
Let D(t) be the disagreeing set at time t, and |D(t)| its size.
When this reaches 0, the chains have coupled. We need to bound
the probability that this has happened after t steps.
In one step, |D(t)| changes by -1, 0, or +1. Let's look at the probability
that it goes up by one. This is the bad event, and we want its probability
to be small.
The only way this can happen is if the chosen vertex agrees in the two
chains and is recolored to disagree. This can only occur in case (b) above.
It turns out that the only way this can happen is if the choice of
color c is in Cy.
[Why? Well, you can see this just by running through all the
possibilities. Actually, you kind of need to split out the WLOG
part to make it rigorous.]
So:
                        1          d'(v)     m'
  Pr(D increases) <=   ---  Sum    -----  =  ---
                        n  v in A    k        kn
What about D decreasing? Well, this will happen if v has different
colors in Xt and Yt, and ends up with the same color in both.
This will certainly happen when the chosen color differs from all the
colors among N in both Xt and Yt. How many different colors is that?
It's clearly at least k-2d(v)+d'(v) >= k - 2Delta +d'(v).
Thus:
                        1          k - 2Delta + d'(v)     (k-2Delta)|D| + m'
  Pr(D decreases) >=   ---  Sum    ------------------  =  ------------------
                        n  v in D          k                      kn
Let's look at the expected size of D(t+1):

  E[|D(t+1)|] = Prob(D increases)(|D(t)|+1)
              + Prob(D decreases)(|D(t)|-1)
              + Prob(D stays the same)|D(t)|

              = Prob(D increases)(|D(t)|+1)
              + Prob(D decreases)(|D(t)|-1)
              + (1 - Prob(D increases) - Prob(D decreases))|D(t)|

              = |D(t)| + Prob(D increases) - Prob(D decreases)
Define
         k - 2Delta           m'
    a = ------------     b = ----
             kn               kn
Using this notation:
Prob(D increases) <= b
Prob(D decreases) >= a|D(t)| + b
E[|D(t+1)|] <= |D(t)| + b - (a|D(t)|+b)
E[|D(t+1)|] <= (1-a)|D(t)|
Iterating this, E[|D(t)|] <= (1-a)^t |D(0)| <= n(1-a)^t <= n e^(-at).
Since |D(t)| is a nonnegative integer, Markov's inequality gives
   Pr(X and Y have not coupled by time t) = Pr(|D(t)| >= 1) <= n e^(-at),
which is at most e once t >= (1/a) ln(n/e) = (k/(k-2Delta)) n ln(n/e).
By the coupling lemma, this bounds the variation distance, proving
Lemma 1. QED.
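Numerically, the contraction argument can be sanity-checked. A sketch with
made-up parameter values (`eps` plays the role of e):

```python
import math

# Contraction E[|D(t+1)|] <= (1-a)|D(t)| iterates to E[|D(t)|] <= n(1-a)^t,
# and (1-a)^t <= e^(-at), so t = ceil((1/a) ln(n/eps)) steps push the
# bound below eps.  Illustrative parameters satisfying k >= 2*Delta + 1:
n, k, Delta, eps = 50, 11, 5, 0.01
a = (k - 2 * Delta) / (k * n)          # the contraction rate from the proof
t = math.ceil(math.log(n / eps) / a)   # (1/a) ln(n/eps) steps suffice
assert n * (1 - a) ** t <= n * math.exp(-a * t) <= eps
```

Note that 1/a = kn/(k-2Delta), so this t is exactly the bound in Lemma 1.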