Notes for 1/22/97	15-852 Randomized Algorithms

* the Minimum cut problem  (Chapter 1.1, 10.2)
  A simple, fast monte-carlo algorithm

===========================================================================

THE MINIMUM CUT PROBLEM

Given a graph G, a CUT is a set of edges whose removal splits the
graph into at least two connected components.  The MINIMUM CUT is the
cut of minimum size.  For instance, say this graph represents a
network, and we want to know how "robust" it is in the sense of the
the minimum number of links whose failure causes the network to become
disconnected.   The minimum cut problem is to find a cut of minimum size.

Easy fact: the minimum cut has size at most the minimum degree of any node.

You're probably more familiar with the "s-t minimum cut" problem,
where you want a cut that separates to specific vertices s and t,
since this is the dual to the max flow problem.  In fact, for a while,
the best algorithm for finding the global minimum cut was to solve the
s-t cut problem n times.

Here's a really simple randomized algorithm:  [due to D. Karger]

THE SIMPLE ALGORITHM  (view #1)

  1. Pick an edge (x,y) at random in G.

  2. CONTRACT the edge, keeping multi-edges, but removing self-loops.

	(i.e., if there were edges (x,v) and (y,v), we keep both of them.)

  3. If there are more than 2 nodes, go back to 1.  Else, output the
	edges remaining as your cut.

  (Note: step 1 is NOT the same as picking two connected vertices x,y
	at random.  Why?)


Do example on graph:     *-------*
                        /|       |\
                       / |       | \
                      *  |       |  *
                       \ |       | /
                        \|       |/
                         *-------*

Easy algorithm, want to show it has a reasonable (1/n^2) chance of working.
(So, run n^2 * log n times to get success w.h.p.)

EQUIVALENT FORMULATION: 

   1. Randomly rank the edges.

   2. Use Kruskal to find minimum spanning tree (i.e., start with
	lighted edge, then put in next lightest that connects two different
	components, etc.)


   3. remove the last edge.  This gives you the two components.


ANALYSIS
--------
1) Say we fix some cut C.  So long as none of edges in C have been
selected, C remains a cut.  Reason: only contract edges, so things on
left side of C stay on left side, and things on right side stay on
right side.

2) in the course of the algorithm, does the min cut size go down or
up?  Claim: any cut in the new graph is also a cut in the original
graph.  Why?  So, min cut size can't go down.

3) Say original graph has min cut of size k.  Claim: when have n-i
nodes remaining (for some i>= 0), there are at least (n-i)k/2 edges in
the graph.  Why?

   The point of (3) is that we want to have lots of edges to reduce
   the chance that we kill our favorite cut.

So,..... Fix some cut C of size k.  What is the prob that C survives
the first round?   

	1 - k/(# edges) >=  1 - k/(nk/2)  =  1 - 2/n.

Suppose it's survived so far, and there are now n-i vertices.  Prov of
survival is at least:
				1 - 2/(n-i).

So, prob that C survives overall is at least:

	Prod_{i=0}^{n-3} (1 - 2/(n-i))

	= (n-2)/n * (n-3)/(n-1) * (n-4)/(n-2) * ... * 1/3

        = 2/(n(n-1))

Neat implication: How many minimum cuts are there? A: at most n(n-1)/2.

Running time: can implement in O(m log m) time, or O(n^2) time,
results in O(mn^2 log^2 n) overall if we run it O(n^2 log(n)) times.
Simple, but not all THAT fast.  How can we speed all this up?


SPEEDING UP THE ALGORITHM   [Due to D. Karger and C. Stein]
-------------------------

Claim: earlier steps are less risky than later steps.  Why?

At what point is our chance of having lost C about 50/50?  

Ans: when we have about n/sqrt(2) vertices left.  By that time, our
success probability is about: 

		(n/sqrt(2))^2 / n^2 = 1/2.

So, this suggests a speedup:

	Instead of repeating from the start, let's repeat from only
	partway down.  E.g., from when we had just n/sqrt(2) vertices left.

Idea: build a binary tree.

	Depth is 2lg(n).  # leaves is n^2.

  Can think of as: each edge is independently destroyed with prob 1/2.
  What's the probability some path to a leaf survives?

	(can think of the strategy of doing n^2 repititions as a root
	with n^2 branches leading, in depth 2lg(n) to n^2 leaves.)


First, what is running time:

	T(n) = O(n^2) + 2T(n/sqrt(2)) = O(n^2 log n)


Now: want to show that a path survives with probability at least 1/log(n).

Proof:

 For tree of depth d, let P_d be the probability a path survives.
Looking recursively, we get:

	P_d = (1/2) P_{d-1} + (1/4)(1 - (1-P_{d-1})^2)

            = (1/2) P_{d-1} + (1/4)(2P_{d-1} - (P_{d-1})^2)

            = P_{d-1} - (1/4) (P_{d-1})^2.

We can now verify that P_d > 1/d, since:
   
              1/(d-1) - 1/(4(d-1)^2)  >  1/(d-1) - 1/(d(d-1)) = 1/d


So, we can just run our algorithm log^2(n) times to get good
probability of success. 


One good way of looking at what we've shown is this:  we have shown
that we can replace n^2 indepdendent runs of the algorithm by n^2
DEPENDENT runs of the algorithm in a way that drastically reduces
running time ((n^2 * cost-of-one-run) versus (log(n) * cost-of-one-run), 
but only slightly reduces the chance of success (constant versus 1/log(n)).