15-451 Algorithms 10/20/05 * Graphs Algorithms III: fun with BFS and DFS - 2-coloring - properties of BFS and DFS trees - strongly-connected components ======================================================================= PROBLEM 1: GRAPH COLORING As you know, each semester the registrar schedules final exams. Can think of this as a graph problem. Each node is a class, and there is an edge between two if lots of students are taking both classes. Goal is to use fewest number of different time slots such that no two adjacent nodes are given the same time slot. E.g., pentagon. Graph Coloring Problem ---------------------- Given a graph, want to give each node a color such that no two neighbors have the same color. Goal is to do this with the fewest total number of different colors (easy to do with n colors). In the above scheduling problem, each color is a different time slot. In compilers, this comes up when assigning variables to registers. Unfortunately, graph-coloring problem is NP-hard. In fact, even telling if possible to color with <= 3 colors is NP-hard in general. But, 2-coloring problem (color with 2 colors if possible else say "impossible") is easy. How to do it? Turns out it's pretty simple. Pick some vertex v and give it one color (say, red). Then all its neighbors had better be the other color (say, blue). All their neighbors had better be red, their neighbors blue and so forth, sweeping across the graph until everything in v's connected component has a color. Now, we check: if there is any bad edge (both endpoints the same color) then G is not 2-colorable: why? because all our decisions were forced. Otherwise we have a legal 2-coloring of v's component and we run this procedure again on any remaining components (which don't interfere with each other since there are no edges between components). Another term for a graph being 2-colorable is that it is bipartite. So, this is an algorithm for testing if a graph is bipartite, and if so, finding a bipartition. One way to think of this is we are doing breadth-first search. (can actually use DFS to do this too, but BFS is easier to visualize). Reminder about BFS and discussion of BFS trees and DFS trees ------------------------------------------------------------ Breadth-first search: start at some vertex v. Then go to all neighbors (call them "level 1" since they're distance 1 from v) then to their unvisited neighbors (level 2 at distance 2 from v) and so on, sweeping this wavefront across the graph. Operationally, we can do this by storing current wavefront (fringe) in a queue. Something like BFS(s): mark s as visited and insert s into queue. level(s)=0. while (queue not empty): v <- queue.remove() for all unvisited neighbors w of v: put edge (v,w) into BFS tree T mark w as visited, level(w)=level(v)+1 queue.insert(w) Total time = time for BFS = O(m + n) Note 1: There are multiple possible BFS trees from a given start s, depending on ordering of neighbors. [example] Same with DFS. Note 2: What do edges in G that are *not* in the BFS tree look like? If G is undirected, then any such edge (u,v) must be between two vertices such that |level(u)-level(w)| <= 1. In particular, no back-edges, only cross-edges. [not true if G is *directed*] In particular, with respect to coloring, if coloring was illegal there must be an edge in G between two nodes at the same level i. How about DFS trees? Let's write out DFS again since we're going to use it later anyway. Also introduce notion of "DFS numbering" which is just the order in which vertices are first visited by the DFS. DFS_main(G): count = 1. For v=1 to n: if v is not yet visited, do DFS(v). DFS(v): mark v as visited. // entering node v dfsnum(v) = count++. for each unmarked out-neighbor w of v: put (v,w) into DFS tree T do DFS(w). mark v as exited. In an *undirected* graph, what do the non-tree edges of G look like? Can you have cross-edges? [between u and v such that neither is an ancestor of the other] No. Only back-edges. In a *directed* graph, you *can* have cross-edges, but they have to point to a node of lower dfsnum that was already exited. More coloring notes ------------------- Here's a neat problem if you like to doodle. Draw a box (or circle, it doesn't matter). Then slice it with straight lines. Now, color the regions so no two adjacent (sharing an edge) have the same color. Claim: this can always be done with at most 2 colors. Can anyone see a proof? (So, good for doodling in pencil). ======================================================================== PROBLEM 2: strong connectivity. A directed graph G is strongly-connected if you can get from any node to any other node. Equivalently, for every unordered pair {u,v}, you can get from u to v and v to u (they are mutually-reachable). Linear-time alg to tell if G is strongly-connected? - pick some node v. - see if v can reach everybody (DFS or BFS). - compute reverse graph G^R (reverse all edges) - do it again (i.e., see if every vertex can *reach* v in G). If either one fails (v can't reach everybody or not everybody can reach v) then clearly G is not strongly-connected. If both succeed, then G *is* strongly-connected. Why? ======================================================================== PROBLEM 3: strongly-connected components (SCCs). Given any graph G, can decompose into strongly-connected components. Def: u and v are in the same SCC if u and v are mutually-reachable. Claim: notion of SCCs is well-defined: if u and v are in same SCC and v and w are in same SCC then u and w are in same SCC. Why? [insert example] So, the SCCs form a partition of the vertices of G. If you view each SCC as a "super-node" then what's left is a DAG. Why? In fact, if you run depth-first-search on G, it will behave at the high-level like it is running on this DAG. In particular, once you enter some SCC, say at vertex v, you won't exit your recursive call until you have visited all of v's component. This suggests we might be able to use DFS to actually *find* all the SCCs, which in fact we can, though it ends up being pretty darn tricky. Let's start with some observations. Observation #1: if we look at the DFS-tree, each SCC has a unique "head" which is the first vertex of the component visited in the search. Furthermore, the component is connected in the tree. [Why can't the DFS go from the current component, off into some other component and then back before popping?] So, if you chopped the tree at every edge leading into a head, you would decompose it into the strongly connected components. So, all we need to do is identify the heads, then we can chop away... Also note that the head is an ancestor of all nodes in the SCC. Observation 2: suppose we're at some vertex v. All vertices with a lower dfsnum than v that have not yet been exited are ancestors of v, and vice-versa. So, v is the head of an SCC iff there is no path from v to a lower dfsnum w not yet exited. [Proof: if v is head then there better not be such a path since w is ancestor. If v is not the head then v's head needs to be an ancestor of v and by definition of SCC, that node had better be reachable from v.] So, this would be great if we could somehow get DFS to output the lowest reachable non-exited vertex when it returns. Unfortunately, this is not so easy.... Example 1: [leap-frog graph] o--->o--->o--->o--->o--->o ^ ^ | | +----|----+ | +---------+ <---+ (edge to 1 or 3) | Example 2: 1--->2---3-->4--->5 | ^ | | +-->6-->7 Instead, let's define one more category for vertices: "outputted". When we find the head of some component, we then label all not-yet outputted vertices in its subtree as being in its SCC and mark them as outputted. Can actually do this without invoking new DFS and just using a stack, but let's not worry about that. Define low(v) = lowest dfsnum non-outputted vertex reachable from v by a path that uses at most one back-edge or cross-edge. [by "non-outputted we mean not yet outputted by the time we exit v] This we *can* get DFS to return, since we can compute it at the leaves of the DFS tree, and if we know the answers for all children of v, we can compute it for v. Claim: low numbers are sufficient. v is head iff low(v)=v. Direction 1: if v is head then low(v)=v. Proof: if there's a back edge from v's subtree to ancestor of v then clearly v can't be a head. But even if there is a cross-edge to some w, let u be the head of w's component. If u isn't an ancestor of v then (inductively) we would have outputted it already. So if w is not outputted, then u *is* an ancestor of v, and so v is not the head of its component. Direction 2: if v is not head then low(v) != v. Proof: if v is not head then there must be a path from subtree of v to an ancestor of v. This entire path must be inside v's SCC, so the first back- or cross-edge on it must be to an unoutputted vertex. So, that's it...