15-451 Algorithms                                              10/19/00

* Graph Algorithms II
   - Dijkstra's alg for shortest paths
   - Prim's alg for MST
   - Kruskal's alg for MST
=======================================================================
Dynamic Programming is a good approach to try when you're trying to get
an exponential-time algorithm down to polynomial time. (Other good
techniques are network flow and linear programming, which we'll talk
about later.) But DP will usually get you just down to O(n^2) or O(n^3)
time. If you need something faster, that's when you start thinking
about data structures and algorithms that can take advantage of them.
In recitation, we saw this with the longest-increasing-subsequence
problem.

Today - start by talking again about the shortest path problem. We have
a "goal" node and want to find the shortest path from everywhere to the
goal. We saw that Bellman-Ford takes time O(E*V). Today we'll look at a
faster alg (Dijkstra's alg), which requires that there are no
negative-weight edges. Try to see if you can figure out where the
argument breaks down if negative-weight edges are present.
===============================================================================
Dijkstra's algorithm.

Idea is to build up the shortest-path tree in a greedy way. E.g.,

         5     8
      A-----B-----C
      |     |     |
     1|    1|     |6        goal = A.
      |  2  |  4  |
      D-----E-----F

DIJKSTRA's ALG
--------------
initialize:
 - tree = {goal}. No edges. goal.dist = 0.

invariant: nodes in tree have correct distance to goal.

repeat:
 - for each nbr x of the tree, compute an (over)-estimate of its
   distance to the goal:

      x.dist =      min      [ v.dist + length(e) ]
               edges e=(x,v),
                 v in tree

 - put the x of minimum x.dist into the tree, connecting it via the
   argmin (the edge that caused x.dist to be what it is)

[do above example]

Claim: even if some of the dists are wrong, the *minimum* one is
correct.

Proof: say x is the node of smallest x.dist. What does the *true*
shortest path from x look like?
The point is that the first edge this path takes must go directly into
the tree: the path has to enter the tree sometime, and if it instead
went first through some other non-tree node y != x, it would have
length at least y.dist, and by definition of x, x.dist is no larger.
Notice that this argument would break down if there were a
negative-cost way of getting from x to y. That's why Dijkstra's
algorithm doesn't allow negative-cost edges.

RUNNING TIME & IMPLEMENTATION:

First of all, we should only recompute x.dist when it could possibly
have changed. So, a better implementation would look more like this:

Init:  goal.dist = 0, v.dist = infinity for v != goal.
       Put all nodes into a data structure H.
Repeat:
  - pull out the v of minimum v.dist from H. Mark v as being in the
    tree.
  - for each unmarked neighbor x of v:
      set x.dist = min(x.dist, v.dist + length(xv edge))
      if the new value is smaller, set x.parent = v.

Operations done on H:
  insert:       |V| times
  remove-min:   |V| times
  decrease-key: |E| times

Linked list: insert is O(1), decrease-key is O(1), remove-min is O(V).
Regular heap: all operations are O(log V), for a total of O(E log V).
Fibonacci heap: insert and decrease-key are O(1), remove-min is
O(log V) (amortized, but that's fine for us).

So, the total time using a Fibonacci heap is O(V log V + E). This is
linear time for dense graphs (average degree >= log V).
============================================================================
Now we turn to the related problem of the MINIMUM SPANNING TREE.

A SPANNING TREE is a tree that touches all the nodes of a graph. (So,
it only makes sense in a connected graph.) An MST is the shortest
spanning tree (the size of a tree is the sum of its edge lengths). For
instance, you can imagine you're setting up a communication network
between these nodes or something like that.

Let's go back to the above example graph:

         5     8
      A-----B-----C
      |     |     |
     1|    1|     |6
      |  2  |  4  |
      D-----E-----F

What is the MST here?  (total length = 14)

There's a nice algorithm that's a lot like Dijkstra's shortest path
alg, called Prim's algorithm.
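Before moving on to Prim's algorithm, here is a minimal Python sketch
of the heap-based Dijkstra implementation above. The graph
representation (a dict mapping each node to a list of (neighbor,
length) pairs) is an assumption made for the sketch, and since
Python's heapq has no decrease-key operation, this version pushes a
duplicate entry instead and skips stale entries when they come off the
heap:

```python
import heapq

def dijkstra(graph, goal):
    """Shortest distances from every node to goal.
    graph: dict mapping node -> [(neighbor, length), ...],
    with nonnegative lengths (assumed representation)."""
    dist = {v: float('inf') for v in graph}
    parent = {goal: None}
    dist[goal] = 0
    heap = [(0, goal)]            # (v.dist, v); may contain stale entries
    in_tree = set()               # nodes whose distance is final
    while heap:
        d, v = heapq.heappop(heap)        # remove-min
        if v in in_tree:
            continue                      # stale entry: skip it
        in_tree.add(v)                    # v's distance is now correct
        for x, length in graph[v]:
            if x not in in_tree and d + length < dist[x]:
                dist[x] = d + length      # "decrease-key" via re-push
                parent[x] = v
                heapq.heappush(heap, (dist[x], x))
    return dist, parent

# The lecture's example graph, goal = A:
G = {'A': [('B', 5), ('D', 1)],
     'B': [('A', 5), ('C', 8), ('E', 1)],
     'C': [('B', 8), ('F', 6)],
     'D': [('A', 1), ('E', 2)],
     'E': [('B', 1), ('D', 2), ('F', 4)],
     'F': [('C', 6), ('E', 4)]}
dist, parent = dijkstra(G, 'A')
# dist == {'A': 0, 'B': 4, 'C': 12, 'D': 1, 'E': 3, 'F': 7}
```

The duplicate-push trick does O(E) inserts instead of O(E)
decrease-keys, giving O(E log E) = O(E log V) total, matching the
regular-heap bound above.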
Prim's algorithm also grows a tree one edge at a time, beginning at
some arbitrary starting node. It's simpler than Dijkstra's, though,
because the algorithm is simply: at each step, add on the shortest
edge into the tree that doesn't create a cycle. I.e., to grow the
tree, put in the node of smallest distance to the *tree* (not to the
*root*).

Prim:
  initialize: tree = {root}. No edges.
  repeat: add in the shortest edge between a tree node and a non-tree
          node.

--> run Prim on above graph

The proof of correctness is less obvious, though. We'll prove it by
induction.

First, a neat fact about spanning trees: if you take a spanning tree
and add an edge to it, you get a cycle. If you then remove any edge
in that cycle, you get back a spanning tree.

Proof of correctness of Prim's alg:

* Inductively, assume the tree built so far is consistent with an MST
  (there might be more than one). This is certainly true at the start.

* Need to show this is maintained each time we add an edge. Say T is
  the tree so far, and M is an MST consistent with it. If the new edge
  e is in M, then fine. If e is not in M, then e forms a cycle with M.
  The edge e goes from T to outside T, so if we follow the cycle we
  must eventually get to an edge e' that goes back into T. We know
  length(e') >= length(e) by definition of the algorithm (e' was also
  a candidate edge when the algorithm chose e). So, if we add e to M
  and remove e', we get a new spanning tree that is no longer than M
  was. This maintains the inductive hypothesis.

Running time is like Dijkstra's: O(E log V) if we use a regular heap,
or O(E + V log V) if we use a Fibonacci heap.

----------------------
Here's another nice algorithm, called Kruskal's algorithm.

Algorithm: Sort the edges by length and examine them from shortest to
longest. Put an edge into the current forest (a forest is a set of
trees) if it doesn't form a cycle.

E.g., look at it in this graph:

         5     8
      A-----B-----C
      |     |     |
     1|    1|     |6
      |  2  |  4  |
      D-----E-----F

Proof of correctness: Can basically use the same argument as before:
assume inductively that there is some MST M consistent with the forest
F found so far.
Now we add in a new edge e between components C and C' of our forest.
We know that M connects C and C' somehow. In particular, either M has
e in it, or else e forms a cycle with M, and this cycle has some other
edge e' in it that connects C to some other component (maybe C', or
maybe an intermediate one) of our forest. By definition of our
algorithm, len(e) <= len(e'): e' connects two different components of
F, so it couldn't have been rejected earlier and must come no earlier
than e in sorted order. So either M has e already, or substituting e
for e' produces a spanning tree M' that is just as good, which
maintains our inductive hypothesis.

What about running time? It takes O(E log E) to sort. Then, for each
edge, we need to test whether it connects two different components.
It turns out there's a nice data structure for doing this, called the
"union-find" data structure. The general picture of this ADT is that
we maintain a collection {S_1, S_2, ..., S_k} of disjoint sets over
some universe. The operations are:

  MakeSet(x) - creates a set containing just x.
  Union(x,y) - replaces the set S_x containing x and the set S_y
               containing y with the single set S_x U S_y.
  Find(x)    - returns the unique ID of the set containing x (this is
               just some representative element of that set).

Given this, we begin with MakeSet(v) for all vertices v. When we
consider some edge (v,w), we just check whether Find(v) == Find(w).
To put in the edge (v,w), we do a Union.

It turns out there's a nice implementation of this data structure with
"almost constant" amortized time per operation (we'll get into the
specifics next time). So, the total time is
O(E log E + E * alpha(E,V)), where alpha(E,V) is "almost constant".
This is dominated by the sorting time and is O(E log E). But, if for
some reason we have the edges already sorted by length, then we get a
running time of O(E * alpha(E,V)), where basically alpha(E,V) <= 4.
(To make alpha > 4, you would need E > 2^2^2^65536 (actually, a couple
more 2's in the tower).)
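Here is a sketch of how Kruskal's algorithm fits together with
union-find, in Python. The union-find implementation below uses the
standard union-by-rank and path-compression tricks; those are the
details deferred to next lecture, so treat this as a preview rather
than the official version, and the edge representation (a list of
(length, v, w) triples) is an assumption made for the sketch:

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression."""
    def __init__(self, universe):
        self.parent = {x: x for x in universe}   # MakeSet(x) for all x
        self.rank = {x: 0 for x in universe}

    def find(self, x):
        # Walk up to the root, halving the path as we go (compression).
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx        # attach smaller-rank root under larger
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

def kruskal(vertices, edges):
    """edges: list of (length, v, w). Returns (total_length, mst_edges)."""
    uf = UnionFind(vertices)
    mst, total = [], 0
    for length, v, w in sorted(edges):     # O(E log E) sort
        if uf.find(v) != uf.find(w):       # different components?
            uf.union(v, w)                 # merge them; no cycle formed
            mst.append((v, w))
            total += length
    return total, mst

# The lecture's example graph:
V = 'ABCDEF'
E = [(5, 'A', 'B'), (8, 'B', 'C'), (1, 'A', 'D'), (1, 'B', 'E'),
     (6, 'C', 'F'), (2, 'D', 'E'), (4, 'E', 'F')]
total, mst = kruskal(V, E)   # total == 14, matching the MST above
```

Note that the sketch rejects AB (it would close a cycle with AD, DE,
BE) and BC (a cycle with CF, EF, ...), which is exactly the "doesn't
form a cycle" test stated in the algorithm.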