Strongly-connected components (SCCs). First, an easier problem: A directed graph G is strongly-connected if you can get from any node to any other node. Equivalently, for every unordered pair {u,v}, you can get from u to v and v to u (they are mutually-reachable). Linear-time alg to tell if G is strongly-connected? - pick some node v. - see if v can reach everybody (DFS or BFS). - compute reverse graph G^R (reverse all edges) - do it again (i.e., see if every vertex can *reach* v in G). If either one fails (v can't reach everybody or not everybody can reach v) then clearly G is not strongly-connected. If both succeed, then G *is* strongly-connected. Why? What if G is *not* strongly-connected? Given any graph G, can decompose into strongly-connected components. Def: u and v are in the same SCC if u and v are mutually-reachable. Claim: notion of SCCs is well-defined: if u and v are in same SCC and v and w are in same SCC then u and w are in same SCC. Why? [insert example] So, the SCCs form a partition of the vertices of G. If you view each SCC as a "super-node" then what's left is a DAG. Why? In fact, if you run depth-first-search on G, it will behave at the high-level like it is running on this DAG. In particular, once you enter some SCC, say at vertex v, you won't exit your recursive call until you have visited all of v's component. This suggests we might be able to use DFS to actually *find* all the SCCs, which in fact we can, though it ends up being pretty tricky. Let's define the "head" or "base" of an SCC to be the first vertex v in it reached by the DFS. Let's now give some observations. Observation #1: Each SCC is a connected subtree of the DFS tree, with its head (base) as the root of the subtree. [Why can't the DFS go from the current component, off into some other component and then back before popping?] So, if you chopped the tree at every edge leading into a head, you would decompose it into the strongly connected components. So, all we need to do is identify the heads, then we can chop away... Observation #2: v is head of an SCC iff v cannot reach a lower dfsnum w that can also reach v. Now, let's try to use these to construct an algorithm. First of all, our goal will be to identify the heads as they are exited in the DFS. When we do identify a head, let's at that point output the SCC and mark all vertices in it as "outputted" (can do in time proportional to the size of the component). So we will be outputting SCCs in reverse topological order. One more observation: Observation #3: Assuming we are acting as claimed, all vertices visited but not yet outputted are either in the current SCC or in some ancestor SCC. In particular, any such vertex can reach our current node! So, this would be great if we could somehow get DFS to output the lowest reachable non-outputted vertex when it returns. Could then use to identify v as head when it's time to exit and chop away. Unfortunately, this is not so easy.... +----->o-------+ | v Example: o--->o--->o--->o--->o--->o ^ ^ | | +----+----|----+ | +---------+ But, what if we change "reachable" to "reachable with only at most one non-tree edge". Define low(v) = lowest dfsnum non-outputted vertex reachable from v by a path that uses at most one non-tree edge. [by "non-outputted we mean not yet outputted by the time we exit v] This we *can* get DFS to return, since we can compute it at the leaves of the DFS tree, and if we know the answers for all children of v, we can compute it for v. Claim: low numbers are sufficient. v is head iff low(v)=v. Direction 1: if low(v) < v then v is not head. Proof: Let w = low(v). w is not yet outputted so by observation 3, it can reach v. So by observation 2, v is not a head. Direction 2: if v is not head then low(v) != v. Proof: if v is not head then there must be a path from subtree of v to an ancestor of v. This entire path must be inside v's SCC, so the first back- or cross-edge on it must be to an unoutputted vertex of lower dfsnum.