15-451 Algorithms                                             10/19/06

* Graph Algorithms III: fun with BFS and DFS
   - 2-coloring
   - properties of BFS and DFS trees
   - strongly-connected components
=======================================================================

Problem 1: Graph Coloring
-------------------------
Given a graph, we want to give each node a color such that no two
neighbors have the same color.  Goal is to do this with the fewest
total number of different colors (easy to do with n colors).  E.g.,
the pentagon (a 5-cycle) needs 3 colors.

E.g., consider the problem of scheduling final exams into the fewest
number of different time-slots.  Each node is a class, and there is an
edge between two if lots of students are taking both classes.  In
compilers, this kind of problem comes up when assigning variables to
registers.

Unfortunately, the graph-coloring problem is NP-hard.  In fact, even
telling if it is possible to color with <= 3 colors is NP-hard in
general.  But, the 2-coloring problem (color with 2 colors if possible,
else say "impossible") is easy.  How to do it?

Turns out it's pretty simple.  Pick some vertex v and give it one color
(say, red).  Then all its neighbors had better be the other color (say,
blue).  All their neighbors had better be red, their neighbors blue,
and so forth, sweeping across the graph until everything in v's
connected component has a color.  Now we check: if there is any bad
edge (both endpoints the same color) then G is not 2-colorable.  Why?
Because all our decisions were forced.  Otherwise, we have a legal
2-coloring of v's component, and we run this procedure again on any
remaining components (which don't interfere with each other since
there are no edges between components).

Another term for a graph being 2-colorable is that it is bipartite.
So, this is an algorithm for testing if a graph is bipartite, and if
so, finding a bipartition.

One way to think of this is that we are doing breadth-first search.
(Can actually use DFS to do this too, but BFS is easier to visualize.)

Reminder about BFS and discussion of BFS trees and DFS trees
------------------------------------------------------------
Breadth-first search: start at some vertex v.  Then go to all neighbors
(call them "level 1" since they're distance 1 from v), then to their
unvisited neighbors (level 2, at distance 2 from v), and so on, sweeping
this wavefront across the graph.  Operationally, we can do this by
storing the current wavefront in a queue.  Something like:

   BFS(s): mark s as visited and insert s into queue q.  level(s)=0.
      while (queue not empty):
         v = q.remove()
         for all unvisited neighbors w of v:
            put edge (v,w) into BFS tree T
            mark w as visited, level(w)=level(v)+1
            q.insert(w)

Total time = time for BFS = O(m + n).

Note 1: There are multiple possible BFS trees from a given start s,
depending on the ordering of neighbors.  [example]  Same with DFS.

Note 2: What do edges in G that are *not* in the BFS tree look like?
If G is undirected, then any such edge (u,v) must be between two
vertices at the same or adjacent levels.  In particular, no back-edges,
only cross-edges.  [not true if G is *directed*]

In particular, with respect to coloring, if the coloring was illegal,
there must be an edge in G between two nodes at the same level i.
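To make the 2-coloring procedure from Problem 1 concrete, here is a
minimal sketch in Python (not part of the original notes).  It assumes
the graph is given as an adjacency-list dictionary adj with every
vertex present as a key and each undirected edge listed in both
directions; it colors each component by BFS level parity, then checks
for a bad edge, all in O(m + n) time.

   # Sketch: 2-color G if possible, else report "impossible".
   # Assumed input: adj = dict mapping each vertex to a list of neighbors.
   from collections import deque

   def two_color(adj):
       color = {}                       # color[v] in {0,1}; doubles as "visited"
       for s in adj:
           if s in color:
               continue
           color[s] = 0                 # start a new component: s is "red"
           q = deque([s])
           while q:                     # BFS over s's component
               v = q.popleft()
               for w in adj[v]:
                   if w not in color:
                       color[w] = 1 - color[v]   # forced: other color
                       q.append(w)
       for v in adj:                    # check for a bad edge
           for w in adj[v]:
               if color[v] == color[w]:
                   return None          # "impossible": G is not 2-colorable
       return color                     # legal 2-coloring = bipartition

For example, on the pentagon (odd cycle) this returns None, while on
any even cycle it returns a valid bipartition.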
How about DFS trees?  Let's write out DFS again since we're going to
use it later anyway.  Also introduce the notion of "DFS numbering",
which is just the order in which vertices are first visited by the DFS.

   DFS_main(G):
      count = 1.
      For v=1 to n: if v is not yet visited, do DFS(v).

   DFS(v):
      mark v as visited.          // entering node v
      dfsnum(v) = count++.
      for each unmarked out-neighbor w of v:
         put (v,w) into DFS tree T
         do DFS(w).
      mark v as exited.           // exiting node v

In an *undirected* graph, what do the non-tree edges of G look like?
Can you have cross-edges? [between u and v such that neither is an
ancestor of the other]  No.  Only back-edges.

In a *directed* graph, you *can* have cross-edges, but they have to
point to a node of lower dfsnum that was already exited.  Say we are at
some node v: what can we say about lower-dfsnum nodes that have *not*
yet been exited?  They are exactly the ancestors of v.

========================================================================
PROBLEM 2: strong connectivity.

A directed graph G is strongly-connected if you can get from any node
to any other node.  Equivalently, for every unordered pair {u,v}, you
can get from u to v and from v to u (they are mutually reachable).

Linear-time algorithm to tell if G is strongly-connected?
 - pick some node v.
 - see if v can reach everybody (DFS or BFS).
 - compute the reverse graph G^R (reverse all edges).
 - do it again, starting from v in G^R (i.e., see if every vertex can
   *reach* v in G).

If either one fails (v can't reach everybody, or not everybody can
reach v) then clearly G is not strongly-connected.  If both succeed,
then G *is* strongly-connected.  Why?
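Here is a minimal sketch of this test (again not from the notes; it
assumes the same adjacency-list dictionary representation as the
earlier sketch, with every vertex as a key).  It runs one search from
an arbitrary start in G, builds G^R, and repeats the search there.

   # Sketch: linear-time strong-connectivity test.
   def reaches_all(adj, s):
       seen, stack = {s}, [s]
       while stack:                     # iterative DFS (BFS works equally well)
           v = stack.pop()
           for w in adj[v]:
               if w not in seen:
                   seen.add(w)
                   stack.append(w)
       return len(seen) == len(adj)

   def strongly_connected(adj):
       if not adj:
           return True
       v = next(iter(adj))              # pick some node v
       if not reaches_all(adj, v):      # can v reach everybody?
           return False
       radj = {u: [] for u in adj}      # reverse graph G^R
       for u in adj:
           for w in adj[u]:
               radj[w].append(u)
       return reaches_all(radj, v)      # can everybody reach v in G?

(As for the "why": if v can reach every vertex and every vertex can
reach v, then any x can reach any y by going x ~~> v ~~> y.)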
========================================================================
PROBLEM 3: strongly-connected components (SCCs).

Given any directed graph G, we can decompose it into strongly-connected
components.

Def: u and v are in the same SCC if u and v are mutually reachable.

Claim: the notion of SCCs is well-defined: if u and v are in the same
SCC, and v and w are in the same SCC, then u and w are in the same SCC.
Why?  [insert example]

So, the SCCs form a partition of the vertices of G.  If you view each
SCC as a "super-node", then what's left is a DAG.  Why?

In fact, if you run depth-first search on G, it will behave at a high
level like it is running on this DAG.  In particular, once you enter
some SCC, say at vertex v, you won't exit your recursive call until you
have visited all of v's component.  This suggests we might be able to
use DFS to actually *find* all the SCCs, which in fact we can, though
it ends up being pretty tricky.

Let's define the "head" of an SCC to be the first vertex v in it
reached by the DFS.  Let's now give some observations.

Observation #1: Each SCC is a connected subtree of the DFS tree, with
its head as the root of the subtree.  [Why can't the DFS go from the
current component, off into some other component, and then back before
popping?]

So, if you chopped the tree at every edge leading into a head, you
would decompose it into the strongly-connected components.  So, all we
need to do is identify the heads, and then we can chop away...

Observation #2: v is the head of an SCC iff v cannot reach a
lower-dfsnum vertex w that can also reach v.

Now, let's try to use these to construct an algorithm.  First of all,
our goal will be to identify the heads as they are exited in the DFS.
When we do identify a head, let's at that point output the SCC and mark
all vertices in it as "outputted" (can do in time proportional to the
size of the component).  So we will be outputting SCCs in reverse
topological order.  One more observation:

Observation #3: Assuming we are acting as claimed, all vertices visited
but not yet outputted are either in the current SCC or in some ancestor
SCC.  In particular, any such vertex can reach our current node!

So, this would be great if we could somehow get DFS to output the
lowest-dfsnum reachable non-outputted vertex when it returns.  Could
then use this to identify v as a head when it's time to exit, and chop
away.  Unfortunately, this is not so easy....

                 +----->o-------+
                 |              v
   Example: o--->o--->o--->o--->o--->o
            ^         ^    |    |
            +----+----|----+    |
                      +---------+

But, what if we change "reachable" to "reachable using at most one
non-tree edge"?  Define:

   low(v) = lowest dfsnum of a non-outputted vertex reachable from v
            by a path that uses at most one non-tree edge.

[By "non-outputted" we mean not yet outputted by the time we exit v.]

This we *can* get DFS to return, since we can compute it at the leaves
of the DFS tree, and if we know the answers for all children of v, we
can compute it for v.

Claim: low numbers are sufficient: v is a head iff low(v) = dfsnum(v).
(Below we identify each vertex with its dfsnum and just write
low(v) = v.)

Direction 1: if low(v) < v then v is not a head.
Proof: Let w be the vertex with dfsnum(w) = low(v).  w is not yet
outputted, so by Observation 3 it can reach v.  Since v can reach w
and w has lower dfsnum, by Observation 2, v is not a head.

Direction 2: if v is not a head then low(v) != v.
Proof: If v is not a head, then there must be a path from v's subtree
to a proper ancestor of v (e.g., to the head of v's SCC), and this
entire path lies inside v's SCC.  Consider the first edge on it that
leaves v's subtree: it must be a back- or cross-edge, and it goes to a
non-outputted vertex of lower dfsnum.  That vertex is reachable from v
by tree edges plus this one non-tree edge, so low(v) < v.
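Putting the pieces together, here is a compact sketch of the resulting
algorithm (essentially Tarjan's SCC algorithm; the code is not from the
notes and assumes the same adjacency-list dictionary as the earlier
sketches).  It computes dfsnum and low bottom-up, keeps the
visited-but-not-yet-outputted vertices on a stack, and whenever
low(v) = dfsnum(v) at exit time, pops off v's component.

   # Sketch: SCCs via dfsnum/low numbers.
   # "On the stack" = visited but not yet outputted.
   def scc(adj):
       dfsnum, low = {}, {}
       stack, on_stack = [], set()
       out = []                         # SCCs, in reverse topological order
       counter = [1]

       def dfs(v):
           dfsnum[v] = low[v] = counter[0]
           counter[0] += 1
           stack.append(v)
           on_stack.add(v)
           for w in adj[v]:
               if w not in dfsnum:      # tree edge: recurse, inherit child's low
                   dfs(w)
                   low[v] = min(low[v], low[w])
               elif w in on_stack:      # non-tree edge to a non-outputted vertex
                   low[v] = min(low[v], dfsnum[w])
           if low[v] == dfsnum[v]:      # v is a head: output its component
               comp = []
               while True:
                   w = stack.pop()
                   on_stack.remove(w)
                   comp.append(w)
                   if w == v:
                       break
               out.append(comp)

       for v in adj:
           if v not in dfsnum:
               dfs(v)
       return out

As argued above, it outputs the SCCs in reverse topological order of
the SCC DAG and runs in O(m + n) time.  (Being recursive, it would need
an explicit stack or a raised recursion limit for very deep graphs.)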