Background Notes on Depth First Search
CMU 15451 Spring 2006 D. Sleator
(Adapted from my lecture in 15211 in Fall 2002)
There's a very efficient, elegant, search procedure that is
the foundation for algorithms for solving many problems on
graphs.
* Determine the connected components of a graph
* Find a cycle in a graph (directed or undirected), or
verify that it does not have one.
* Determine if a graph is bipartite.
* Topologically sort a directed graph (see def below).
and many other things that are beyond the scope of this course:
* Find the biconnected components of an undirected graph
* Determine if a graph is planar, and find an embedding
if it is
* Find the strong components of a directed graph.
All can be solved in time O(n+m) in a graph of n edges and m
vertices, i.e. linear time, using DFS. (We say this is
"linear time" because the size of the adjacency list
representation of the graph is Theta(n+m).)
Basic DFS algorithm: Each vertex has a binary field called
"visited(v)". Here's the algorithm in pseudocode:
for all v in V do visited(v) = 0
for all v in V do if (visited(v)==0) DFS(v)
DFS(v) {
visited(v) = 1
for all w in adj(v) do if (visited(w)==0) DFS(w)
}
Claim 1: DFS is called exactly once for each vertex in the graph.
Proof: Clearly DFS(x) is called for a vertex x only if
visited(x)==0. The moment it's called, visited(x) is set to
1. Therefore the DFS(x) cannot be called more than once for
any vertex x. Furthermore, the loop "for all v...DFS(v)"
ensures that it will be called for every vertex at least
once. QED.
Claim 2: The body of the "for all w" loop is executed
exactly once for each edge (v,w) in the graph.
Proof: DFS(v) is called exactly once for each vertex
v (Claim 1). And the body of the loop is executed once for
all the edges out of v. QED.
Therefore the running time of the DFS algorithm is O(n+m).
Topological Sorting

If G is a DAG, then there exists an order in which the
vertices can be listed such that if u is before v in the
list then there is no edge from u to v. (The edges
point toward the front of the list.) This is called
"topological order".
Application: Suppose each vertex represents a task that must
be completed, and an edge (u,v) indicates that task u
depends on task v. That is v must be completed before
u. The topological ordering of the vertices is a valid
order in which you can complete the tasks.
It's easy to see that such an ordering exists: Find a vertex
with no outedges (there must be such if the graph is
acyclic). Print it, delete it from the graph, and repeat.
This is slow.
(It's also easy to see that if the graph has a cycle,
there's no topological order.)
Here's a fast way to do this using DFS (note the only
difference between this and the algorithm above is the print
statement):
for all v in V do visited(v) = 0
for all v in V do if (visited(v)==0) DFS(v)
DFS(v) {
visited(v) = 1
for all w in adj(v) do if (visited(w)==0) DFS(w)
print v
}
Claim: If the graph is acyclic, then when v is printed, each
element of adj(v) has already been printed.
Proof: At any point in the middle of the execution of the
algorithm, there's set of vertices on the call stack. We
can draw this as follows:
v1

v2

v3

...

vk
This means that the first call was DFS(v1), then DFS(v2),
etc. [v1...vk] must be a simple path in the graph, because
each successive call is to a neighbor of the previous one.
Consider the point when we're working on vertex vk, and a
neighbor w of vk. That is, edge (vk, w). Vertex w cannot
be in the set {v1, v2...vk} because if it was, it would form
a cycle.
There are now two cases to consider.
Case 1: visited(w)==1. In this case w has already been
visited, and because it's not on the call stack, the call to
DFS(w) must have been completed by now. Therefore w has
been printed out already.
Case 2: visited(w)==0. In this case we do a recursive call
to DFS(w). The last thing that happens in this call is
"print w". So by the time we print vk, we've already
printed w.
So in either case, w is printed before vk. Since DFS
processes every edge in this way (Claim 2 above), we've
shown that for every edge (v,w) that w is printed before v.
This proves that the printing has been done in topological
order.
QED.
Finding a cycle in a directed graph

As the analysis of topological sort shows, all we really
need to do this is to keep track of which vertices are on
the stack. We can do this explicitely by adding a new field
for each vertex called mark(v).
for all v in V do visited(v) = 0
for all v in V do mark(v) = 0
for all v in V do if (visited(v)==0) DFS(v)
DFS(v) {
mark(v) = 1
visited(v) = 1
for all w in adj(v) do {
if (visited(w)==0) {
DFS(w)
} else {
if (mark(w)==1) print "found a cycle"
}
}
mark(v) = 0
}
Theorem: The above algorithm prints "found a cycle" if and
only if the graph has a cycle.
Proof: If "found a cycle" is printed, there is a cycle. To
see this, let (v,w) be the edge being explored when the
message is printed. As noted in the analysis of topological
sort, all vertices currently on the stack form a path. This
path therefore extends from w to v. The edge (v,w)
therefore completes a cycle.
To see the converse, modify the above algorithm trivially by
putting "print v" right after "mark(v)=0". Now this
algorithm is printing vertices in the same order that the
topological sorting algorithm above does. If the algorithm
does not print "found a cycle", then (by the same argument we
used in topological sort), the list of vertex names must be
a topological ordering of the vertices. The existance of a
topological ordering of the vertices proves that there is no
cycle.
QED.
Connected Components

Here we have an undirected graph and we want to label each
vertex with a component number (comp(v)) such that two
vertices are connected if and only if they have the same
component number. Here's the DFS algorithm for this:
for all v in V do visited(v) = 0
c = 0 (a global variable)
for all v in V do {
if (visited(v)==0) {
DFS(v)
c = c+1
}
}
DFS(v) {
visited(v) = 1
comp(v) = c
for all w in adj(v) do if (visited(w)==0) DFS(w)
}
Intuition: This works because in a toplevel call to DFS(v)
we visit every vertex that is in the same connected
component with v.
Determining Bipartiteness

Given an undirected graph (if directed convert to
undirected) determine if it's bipartite, and if so, label
each vertex with the part number (part(v))
for all v in V do visited(v) = 0
for all v in V do if (visited(v)==0) DFS(v,0)
DFS(v,p) {
visited(v) = 1
part(v) = p
for all w in adj(v) do {
if (visited(w)==0) {
DFS(w,1p)
} else {
if (part(v) == part(w)) print "graph not bipartite"
}
}
}
Proof sketch: The proof of correctness is not hard, and is
based on the following observation. For each edge (v,w),
either we label part(w) differently from part(v), or we
check that this has already been accomplished.