Background Notes on Depth First Search CMU 15-451 Spring 2006 D. Sleator (Adapted from my lecture in 15-211 in Fall 2002) There's a very efficient, elegant, search procedure that is the foundation for algorithms for solving many problems on graphs. * Determine the connected components of a graph * Find a cycle in a graph (directed or undirected), or verify that it does not have one. * Determine if a graph is bipartite. * Topologically sort a directed graph (see def below). and many other things that are beyond the scope of this course: * Find the biconnected components of an undirected graph * Determine if a graph is planar, and find an embedding if it is * Find the strong components of a directed graph. All can be solved in time O(n+m) in a graph of n edges and m vertices, i.e. linear time, using DFS. (We say this is "linear time" because the size of the adjacency list representation of the graph is Theta(n+m).) Basic DFS algorithm: Each vertex has a binary field called "visited(v)". Here's the algorithm in pseudo-code: for all v in V do visited(v) = 0 for all v in V do if (visited(v)==0) DFS(v) DFS(v) { visited(v) = 1 for all w in adj(v) do if (visited(w)==0) DFS(w) } Claim 1: DFS is called exactly once for each vertex in the graph. Proof: Clearly DFS(x) is called for a vertex x only if visited(x)==0. The moment it's called, visited(x) is set to 1. Therefore the DFS(x) cannot be called more than once for any vertex x. Furthermore, the loop "for all v...DFS(v)" ensures that it will be called for every vertex at least once. QED. Claim 2: The body of the "for all w" loop is executed exactly once for each edge (v,w) in the graph. Proof: DFS(v) is called exactly once for each vertex v (Claim 1). And the body of the loop is executed once for all the edges out of v. QED. Therefore the running time of the DFS algorithm is O(n+m). Topological Sorting ------------------- If G is a DAG, then there exists an order in which the vertices can be listed such that if u is before v in the list then there is no edge from u to v. (The edges point toward the front of the list.) This is called "topological order". Application: Suppose each vertex represents a task that must be completed, and an edge (u,v) indicates that task u depends on task v. That is v must be completed before u. The topological ordering of the vertices is a valid order in which you can complete the tasks. It's easy to see that such an ordering exists: Find a vertex with no out-edges (there must be such if the graph is acyclic). Print it, delete it from the graph, and repeat. This is slow. (It's also easy to see that if the graph has a cycle, there's no topological order.) Here's a fast way to do this using DFS (note the only difference between this and the algorithm above is the print statement): for all v in V do visited(v) = 0 for all v in V do if (visited(v)==0) DFS(v) DFS(v) { visited(v) = 1 for all w in adj(v) do if (visited(w)==0) DFS(w) print v } Claim: If the graph is acyclic, then when v is printed, each element of adj(v) has already been printed. Proof: At any point in the middle of the execution of the algorithm, there's set of vertices on the call stack. We can draw this as follows: v1 | v2 | v3 | ... | vk This means that the first call was DFS(v1), then DFS(v2), etc. [v1...vk] must be a simple path in the graph, because each successive call is to a neighbor of the previous one. Consider the point when we're working on vertex vk, and a neighbor w of vk. That is, edge (vk, w). Vertex w cannot be in the set {v1, v2...vk} because if it was, it would form a cycle. There are now two cases to consider. Case 1: visited(w)==1. In this case w has already been visited, and because it's not on the call stack, the call to DFS(w) must have been completed by now. Therefore w has been printed out already. Case 2: visited(w)==0. In this case we do a recursive call to DFS(w). The last thing that happens in this call is "print w". So by the time we print vk, we've already printed w. So in either case, w is printed before vk. Since DFS processes every edge in this way (Claim 2 above), we've shown that for every edge (v,w) that w is printed before v. This proves that the printing has been done in topological order. QED. Finding a cycle in a directed graph ----------------------------------- As the analysis of topological sort shows, all we really need to do this is to keep track of which vertices are on the stack. We can do this explicitely by adding a new field for each vertex called mark(v). for all v in V do visited(v) = 0 for all v in V do mark(v) = 0 for all v in V do if (visited(v)==0) DFS(v) DFS(v) { mark(v) = 1 visited(v) = 1 for all w in adj(v) do { if (visited(w)==0) { DFS(w) } else { if (mark(w)==1) print "found a cycle" } } mark(v) = 0 } Theorem: The above algorithm prints "found a cycle" if and only if the graph has a cycle. Proof: If "found a cycle" is printed, there is a cycle. To see this, let (v,w) be the edge being explored when the message is printed. As noted in the analysis of topological sort, all vertices currently on the stack form a path. This path therefore extends from w to v. The edge (v,w) therefore completes a cycle. To see the converse, modify the above algorithm trivially by putting "print v" right after "mark(v)=0". Now this algorithm is printing vertices in the same order that the topological sorting algorithm above does. If the algorithm does not print "found a cycle", then (by the same argument we used in topological sort), the list of vertex names must be a topological ordering of the vertices. The existance of a topological ordering of the vertices proves that there is no cycle. QED. Connected Components -------------------- Here we have an undirected graph and we want to label each vertex with a component number (comp(v)) such that two vertices are connected if and only if they have the same component number. Here's the DFS algorithm for this: for all v in V do visited(v) = 0 c = 0 (a global variable) for all v in V do { if (visited(v)==0) { DFS(v) c = c+1 } } DFS(v) { visited(v) = 1 comp(v) = c for all w in adj(v) do if (visited(w)==0) DFS(w) } Intuition: This works because in a top-level call to DFS(v) we visit every vertex that is in the same connected component with v. Determining Bipartiteness ------------------------- Given an undirected graph (if directed convert to undirected) determine if it's bipartite, and if so, label each vertex with the part number (part(v)) for all v in V do visited(v) = 0 for all v in V do if (visited(v)==0) DFS(v,0) DFS(v,p) { visited(v) = 1 part(v) = p for all w in adj(v) do { if (visited(w)==0) { DFS(w,1-p) } else { if (part(v) == part(w)) print "graph not bipartite" } } } Proof sketch: The proof of correctness is not hard, and is based on the following observation. For each edge (v,w), either we label part(w) differently from part(v), or we check that this has already been accomplished.