15-451 Algorithms                                                10/05/06

Graph algorithms I

* Basic notation/terminology
* DFS for Topological sorting
* Dynamic-Programming algs for shortest paths
  - Bellman-Ford, Floyd-Warshall (all-pairs SP)
=====================================================================

Graphs
======
A lot of algorithmic problems can be modeled as problems on graphs.
Today we will talk about a couple of basic ones, and we will continue
talking about graph algorithms for a lot of the rest of the course.

Reminder of basic terminology: a graph is a set of NODES, or VERTICES,
with edges between some of the nodes.  V = set of vertices, E = set of
edges.  If there's an edge between two vertices, we call them
NEIGHBORS.  The degree of a vertex is the number of neighbors it has.
Standard notation: n = |V|, m = |E|.

If we don't allow self-loops, what is the max possible number of
edges?  {n \choose 2}.

In a directed graph, each edge has a direction, and we can now have up
to n(n-1) edges.  For each node, we can talk about its out-neighbors
(and out-degree) and its in-neighbors (and in-degree).

==============================================================================
TOPOLOGICAL SORTING

A "DAG" or directed-acyclic-graph is a directed graph without any
cycles.  E.g.,

       A----->D--->E
       |\     ^
       | \    |
       |  \   |
       |   | /
       V   V/
       C<--B<----F

Given a DAG, the "topological sorting" problem is: find an ordering of
the vertices such that all edges go forward in the ordering.  This
typically comes up when you are given a set of tasks to do with
precedence constraints (you need to do A and F before you can do B),
and you want to find a legal ordering for performing the jobs.

Here is a nice linear-time algorithm based on depth-first search (it
can also be used to tell whether a directed graph has any cycles).  To
be specific, by "DFS of G" we mean "pick a node and do DFS from there.
If the whole graph hasn't been visited, then pick an unvisited node
and repeat the process.  Continue until everything is visited."  I.e.,

   DFS_main(G):
      For v=1 to n: if v is not yet visited, do DFS(v).

   DFS(v):
      mark v as visited.     // entering node v
      for each unmarked out-neighbor w of v: do DFS(w).
                             // exiting node v.

How to solve Topological Sorting:
   1. Do depth-first search of G, and output the nodes as you *exit*
      them.
   2. Reverse the order.

Claim: If there is an edge from u to v, then v is exited first.
(This implies that when we reverse the order, we have a topological
sorting.)

Proof of claim: [Think of u=B, v=D above.]  This is easy to see if our
DFS started at u before starting at v.  What if we started at v first?
In this case, we would finish v before even starting u, since there
can't be a path from v to u (else the graph wouldn't be acyclic).
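To make the pseudocode concrete, here is a minimal Python sketch of
the DFS-based algorithm (a sketch under the assumption that the graph
is given as a dictionary mapping each vertex to a list of its
out-neighbors; the language, representation, and function name are our
choices, not from the notes):

   def topological_sort(graph):
       """graph: dict mapping each vertex to a list of out-neighbors."""
       visited = set()
       exit_order = []                    # nodes in the order we *exit* them

       def dfs(v):
           visited.add(v)                 # entering node v
           for w in graph[v]:
               if w not in visited:
                   dfs(w)
           exit_order.append(v)           # exiting node v

       for v in graph:                    # DFS_main: restart until all visited
           if v not in visited:
               dfs(v)

       exit_order.reverse()               # step 2: reverse the exit order
       return exit_order

   # The DAG drawn above: A->D, A->C, A->B, B->C, B->D, D->E, F->B.
   g = {'A': ['D', 'C', 'B'], 'B': ['C', 'D'], 'C': [],
        'D': ['E'], 'E': [], 'F': ['B']}
   print(topological_sort(g))             # ['F', 'A', 'B', 'C', 'D', 'E']

(One caveat: for very large graphs, Python's recursion limit means you
would want to rewrite the DFS with an explicit stack.)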
===========================================================================
SHORTEST PATHS

We now turn to another basic graph problem, finding shortest paths,
and look at some algorithms based on Dynamic Programming.

Given: a weighted, directed graph (each edge has a "weight" or
"length"), a start node S, and a destination node T.

Goal: find the shortest path from S to T.  We will allow
negative-weight edges (we'll see later some problems where this comes
up when using shortest-path algorithms as a subroutine) but will
assume there are no negative-weight cycles (else the shortest path
can have length negative-infinity).

The BELLMAN-FORD Algorithm
==========================
(Note: Bellman is the person credited with inventing Dynamic
Programming.)

The algorithm will in fact find the shortest path from every node v in
the graph to T (not just from S).

                                  30                15
    Say we have a graph  5-------------------3-----------------4
    like this:           |                   |                 |
                      10 |                -40|              40 |
                         |                   |                 |
                         |                   v                 |
                         2-------------------1-----------------0
                                  30                60         T

(All edges can be traversed in either direction except the edge 3->1
of length -40, which is one-way; node 0 is the destination T.)

How can we use DP to solve this?

First of all, as usual for DP, let's just compute the *lengths* of the
shortest paths first; then we can reconstruct the paths afterwards.

Alg idea:
 - Find the length of the shortest path that uses 1 or fewer edges (or
   infinity if there is none).
   [easy: if v=T, we get 0; if v has an edge to T, we get the length of
   that edge; else infinity]

 - Now, suppose that for all v we have solved for the length of the
   shortest path to T that uses i-1 or fewer edges.  How can we use
   this to solve for i?

   Answer: the shortest path from v to T that uses i or fewer edges
   will first go to some out-neighbor x of v, and then take the
   shortest path from x to T that uses i-1 or fewer edges, which we've
   already solved for!  So we just take the min over all out-neighbors
   x.

 - How far do we need to go?  Answer: at most i = n-1 edges, since a
   shortest path never needs to revisit a vertex (there are no
   negative-weight cycles to go around).

BELLMAN-FORD pseudocode:
 - d[v][i] will be the length of the shortest path from v to T that
   uses i or fewer edges (if one exists), and infinity otherwise.
   ("d" for "distance")
 - we'll have the base case be paths of 0 edges instead of 1 edge.

   initialize: d[v][0] = infinity for v != T;  d[T][i] = 0 for all i.

   for i = 1 to n-1:
      for each v != T:
         d[v][i] = MIN  ( length(v->x edge) + d[x][i-1] )
                   v->x

   (try it on the above graph!)

The inner MIN operation takes time proportional to the out-degree of
v.  So the inner for-loop takes time O(sum of out-degrees of all
nodes) = O(m).  --> total time is O(mn).

How to reconstruct the paths?  Work backwards: if you're at vertex v
at distance d[v], move to an out-neighbor x such that
d[v] = length(v->x edge) + d[x].
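Here is a minimal Python sketch of this recurrence (the edge-list
representation, the function name, and the reading of the weights off
the reconstructed figure above are our own assumptions; the notes give
only pseudocode).  It keeps just the current and previous columns of
the d[][] table rather than the whole thing:

   INF = float('inf')

   def bellman_ford_to(nodes, edges, T):
       """edges: list of (v, x, length) triples, one per directed edge
          v->x.  Returns d[v] = length of the shortest path from v to T."""
       n = len(nodes)
       d = {v: INF for v in nodes}        # base case: paths of 0 edges
       d[T] = 0
       for i in range(1, n):              # i = 1 .. n-1
           new_d = dict(d)                # new_d plays the role of d[.][i]
           for (v, x, length) in edges:
               if v != T:
                   new_d[v] = min(new_d[v], length + d[x])
           d = new_d
       return d

   # The graph in the figure, with T = node 0.  Two-way edges are
   # listed in both directions; 3->1 (length -40) is one-way.
   edges = [(5,3,30), (3,5,30), (3,4,15), (4,3,15), (5,2,10), (2,5,10),
            (4,0,40), (0,4,40), (3,1,-40), (2,1,30), (1,2,30),
            (1,0,60), (0,1,60)]
   print(bellman_ford_to(range(6), edges, T=0))
   # -> {0: 0, 1: 60, 2: 60, 3: 20, 4: 35, 5: 50}

Keeping only one column at a time cuts the space from O(n^2) to O(n)
without changing the O(mn) running time.  (Note, e.g., how node 4
prefers the 3-edge path 4->3->1->0 of length 35 over its direct edge
of length 40.)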
========================================================================
Here is a generalization we can also solve using Dynamic Programming.
We'll see how to do it in 2 different ways.

ALL-PAIRS SHORTEST PATHS
------------------------
Say we want to compute the length of the shortest path between *every*
pair of vertices.  If we run Bellman-Ford once for each of the n
possible destinations T, this takes time O(mn^2).

Here is a nice algorithm that uses the matrix representation of a
graph, and runs in time O(n^3 log n).

ALL-PAIRS SHORTEST PATHS via MATRIX PRODUCTS
--------------------------------------------
Given a graph G, define the matrix A = A(G) as follows:
 - A[i,i] = 0 for all i.
 - if there is an edge from i to j, then A[i,j] = length of that edge.
 - otherwise, A[i,j] = infinity.

I.e., A[i,j] is the length of the shortest path from i to j using 1 or
fewer edges.

Now, following the basic Dynamic Programming idea, can we use this to
produce a new matrix B where B[i,j] = length of the shortest path from
i to j using 2 or fewer edges?

Answer: yes.  B[i,j] = min_k (A[i,k] + A[k,j]).  [think about why this
is true]

I.e., what we want to do is compute the matrix product B = A*A, except
we change "*" to "+" and we change "+" to "min".  In other words,
instead of computing the sum of products, we compute the min of sums.

What if we now want the shortest paths using 4 or fewer edges?  Just
compute C = B*B.  I.e., to get from i to j using 4 or fewer edges, we
go from i to some k using 2 or fewer edges, and then from k to j using
2 or fewer edges.  So we just need to keep squaring, O(log n) times in
all.

Running time: O(n^3 log n).  We need O(log n) of these matrix
multiplications to cover all paths of n-1 or fewer edges, and each
standard matrix multiplication takes O(n^3) operations.

==============================================================================
ALL-PAIRS SHORTEST PATHS via FLOYD-WARSHALL
-------------------------------------------
Here is an algorithm that shaves off the O(log n) factor and runs in
time O(n^3).  The idea is that instead of increasing the number of
edges allowed in the path, we'll increase the set of vertices we allow
as intermediate nodes in the path.  In other words, starting from the
same base case (the shortest path that uses no intermediate nodes),
we'll next find the shortest path that's allowed to use node 1 as an
intermediate node, then the shortest path that's allowed to use {1,2}
as intermediate nodes, and so on.

   /* after each iteration of the outside loop, A[i][j] = length of
      the shortest i->j path that's allowed to use only vertices in
      the set {1..k} as intermediate nodes */
   for k = 1 to n do:
      for each i,j do:
         A[i][j] = min( A[i][j], A[i][k] + A[k][j] );

I.e., the shortest i->j path using intermediate vertices in {1..k}
either goes through vertex k or it doesn't.

Total time is O(n^3).  What's amazing here is how compact and simple
the code is!
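To tie the two all-pairs approaches together, here is a minimal Python
sketch of both (the language, the function names, and the small test
matrix are our own choices; also, the code indexes vertices 0..n-1
rather than 1..n):

   INF = float('inf')

   def min_plus_product(A, B):
       """The modified matrix product: "+" replaced by min, "*" by +."""
       n = len(A)
       return [[min(A[i][k] + B[k][j] for k in range(n))
                for j in range(n)] for i in range(n)]

   def all_pairs_by_squaring(A):
       """Repeatedly square A (in the min-plus sense) until paths of up
          to n-1 edges are covered: O(log n) products, O(n^3) each."""
       n, hops = len(A), 1
       while hops < n - 1:
           A = min_plus_product(A, A)
           hops *= 2
       return A

   def floyd_warshall(A):
       """After iteration k, A[i][j] = length of the shortest i->j path
          all of whose intermediate vertices lie in {0..k}."""
       n = len(A)
       A = [row[:] for row in A]          # work on a copy
       for k in range(n):
           for i in range(n):
               for j in range(n):
                   A[i][j] = min(A[i][j], A[i][k] + A[k][j])
       return A

   # A small 3-node example: edges 0->1 (2), 1->2 (-1), 2->0 (4).
   A = [[0,   2,   INF],
        [INF, 0,   -1 ],
        [4,   INF, 0  ]]
   print(floyd_warshall(A))           # [[0, 2, 1], [3, 0, -1], [4, 6, 0]]
   print(all_pairs_by_squaring(A))    # same answer

===================================================================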