15-451 Algorithms                                              10/12/06

* Maximum bottleneck path
* Minimum Spanning Tree recap: Prim and Kruskal
* Midterm review

[Midterm on Tues.  1 sheet of notes, no book.]
=========================================================================

One nice thing about Dijkstra's algorithm is that you can modify it to
solve other related problems.  (This kind of thing often makes a good
test question.)

DIJKSTRA'S ALG recap (finds shortest path tree)
-----------------------------------------------
E.g.,
            5     8
         A-----B-----C
         |     |     |
        1|    1|     |3         start = A.
         |  2  |  4  |
         D-----E-----F

Initialize:
 - tree = {start}.  No edges.  Label start as having distance 0.

Invariant: nodes in tree have correct distance to start.

Repeat:
 - for each nbr x of tree, compute an (over)-estimate:

       dist(x)  =     min      [ dist(v) + length(e) ]
                  edges e=(v,x)
                    v in tree

   In other words, by our invariant, this is the length of the
   shortest path to x whose only edge not in the tree is the very
   last edge.

 - put the x of minimum distance into the tree, connecting it via the
   argmin (the edge e used to get dist(x)).

Actual implementation: just update the dists of the neighbors of the
node put into the tree, since they are the only ones that can change.

Maximum-bottleneck path
=======================
Now consider the following problem, called the "maximum bottleneck
path" problem.  Imagine the edge weights represent capacities of the
edges (like "widths" rather than "lengths") and you want the path
between two nodes whose minimum capacity is largest.  How can we
solve this?

Def: capacity of a path = minimum capacity of the edges on the path.
     cap(v) = capacity of the maximum-capacity path to v.

Initialize cap(start) = infinity.  Now, instead, we'd update:

       cap(x)  =     max      [ min( cap(v), w(e) ) ]
                  edges e=(v,x)
                    v in tree

 - put the x of maximum cap(x) into the tree, connecting it via the
   argmax.

We'll actually use this later in the course...

===========================================================================
MINIMUM SPANNING TREE
---------------------

A SPANNING TREE is a tree that touches all the nodes of a graph.
(So, it only makes sense in a connected graph.)  A Minimum Spanning
Tree is the shortest spanning tree (the size of a tree is the sum of
its edge lengths).

Prim's algorithm
================
Prim's algorithm is the following simple MST algorithm:
 - pick some arbitrary start node.
 - add the shortest edge to the current tree.
 - continue until the tree spans all nodes.

For example:
            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6
         |  2  |  4  |
         D-----E-----F

[So, a lot like Dijkstra, except we just put in the shortest edge
rather than adding its length to the distance of its endpoint to the
start node.  But even though the algorithm is simpler than Dijkstra,
the analysis is a bit trickier...]

Now we need to prove that this works.  We will prove it by induction.
Assume inductively that there exists an MST M that contains our tree
T built so far.  We want to show there exists an MST that contains T
*and* the new edge e we are about to add.  If M contains e then we're
done.  If not, then add e to M.  This creates a cycle.  Since e has
one endpoint in T and one outside T, if we trace around this cycle we
must eventually get to an edge e' that goes back into T.  We know
length(e') >= length(e) by definition of the algorithm.  So, if we
add e to M and remove e', we get a new tree M' that is no larger than
M was, and contains both T and e.

Running time: To make this efficient, we need a good way of storing
the edges into our current tree so we can quickly pull off the
shortest one.  If we use a heap, we can do this in O(log n) time per
operation.  So, the total time is O(m log n), where m is the number
of edges and n is the number of vertices.

Kruskal's algorithm
===================
Here's another nice algorithm, called Kruskal's algorithm.

Algorithm: Sort the edges by length and examine them from shortest to
longest.  Put an edge into the current forest (a forest is a set of
trees) if it doesn't form a cycle.

E.g., look at this graph:
            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6
         |  2  |  4  |
         D-----E-----F

Proof of correctness: We can basically use the same argument as
before.
Assume by induction that there exists an MST M containing all the
edges in our forest F built so far.  We want to show there exists an
MST M' containing F and the next edge e we are about to include.  If
M contains e then we are done.  Else, put e into M.  This makes a
cycle.  Go around it until you take some edge e' jumping between
trees of F.  By definition of our algorithm, len(e) <= len(e').  So,
putting in e and removing e' to get M' can only make M better.

What about running time?  It takes O(m log m) to sort.  Then, for
each edge, we need to test if it connects two different components.
This seems like it should be a real pain: how can we tell if an edge
has both endpoints in the same component?  It turns out there's a
nice data structure called the "union-find" data structure for doing
this, so fast that it will actually be low-order compared to the
sorting.

Simple version of Union-Find: total time O(m + n log n) for our
series of operations.

More sophisticated version of Union-Find: total time O(m lg^*(n)) --
or even O(m alpha(n)), where alpha(n) is the inverse-Ackermann
function, which grows even more slowly than lg^*.

 -> lg^*(n) is the number of times you need to take the lg until you
    get down to 1.  So,
        lg^*(2) = 1
        lg^*(2^2 = 4) = 2
        lg^*(2^2^2 = 16) = 3
        lg^*(2^2^2^2 = 65536) = 4
        lg^*(2^65536) = 5

    [I won't define inverse-Ackermann here, but to get alpha(n) up to
    5, you need n to be at least a stack of 256 2's.]

Simple UNION-FIND, O(m + n log n) time
======================================
The general picture is that we maintain a collection
{S_1, S_2, ..., S_k} of components, initially each with one element.
Operations:

  Union(x,y) - replace the component S_x containing x and the
               component S_y containing y with the single component
               S_x U S_y.

  Find(x)    - return the unique ID of the component containing x
               (this is just some representative element in it).

Given this, when we consider some edge (v,w) we just check whether
Find(v) == Find(w).  To put in the edge (v,w), we do a Union.
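Given these two operations, the Kruskal loop can be sketched in
Python.  This is a minimal illustrative sketch, not code from the
lecture: the edge format (length, v, w) and the dictionary-based
bookkeeping are my own choices, and the Union step already uses the
"append the shorter list onto the longer one" idea from the next
section.

```python
def kruskal(vertices, edges):
    """edges: list of (length, v, w) tuples.  Returns a list of MST edges."""
    head = {v: v for v in vertices}       # representative of each vertex
    members = {v: [v] for v in vertices}  # elements of each component

    def find(x):                          # constant time: follow head pointer
        return head[x]

    def union(x, y):
        a, b = find(x), find(y)
        if len(members[a]) < len(members[b]):
            a, b = b, a                   # append shorter list onto longer
        for z in members[b]:
            head[z] = a                   # update head pointers of B's elements
        members[a] += members.pop(b)

    mst = []
    for length, v, w in sorted(edges):    # shortest to longest
        if find(v) != find(w):            # edge doesn't form a cycle
            mst.append((v, w))
            union(v, w)
    return mst

# Example: the six-node graph from the Kruskal section above.
edges = [(5, 'A', 'B'), (8, 'B', 'C'), (1, 'A', 'D'), (1, 'B', 'E'),
         (6, 'C', 'F'), (2, 'D', 'E'), (4, 'E', 'F')]
mst = kruskal(list("ABCDEF"), edges)
```

On this graph the loop accepts the edges of lengths 1, 1, 2, 4, and 6,
and rejects the length-5 and length-8 edges since each would close a
cycle.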
Algorithm
---------
Each set will be just a linked list: each element has a pointer to
the next element in the list.  But we augment the list so that each
element also has a pointer directly to the head of the list.  The
head of the list is the representative element.

Initialization: for each x, let x->head = x.  Total time O(n).

Find(x) - constant time: return(x->head).  We do this O(m) times, so
the total time is O(m).

Union(x,y) - to union the two lists (let's call them A and B, with
lengths L_A and L_B), we append B onto the end of A.  What is the
time needed?  Answer: we first have to walk down A to get to its
bottom element.  Then we hook the head of B up to this.  Then we walk
down B, updating the pointers to the head.  So the total time is
O(L_A + L_B).

Q: Can we get just O(L_B)?
   Method 1: splice B into the middle of A, at x.
   Method 2: if we store a pointer to the tail in the head, this also
             avoids walking down A.

Q: Can we reduce this to O(min(L_A, L_B))?
A: Yes.  Just store the length of the list in the head.  Then
   compare, and append the shorter one onto the end of the longer
   one.  Then update the length count.

Claim: this gives us the O(m + n log n) total running time.

PROOF: We already analyzed the initialization and find operations.
Each union costs O(length of the list we walk down).  Key idea: let's
pay for this by charging O(1) to each element that we walked over.
Think of it as a charge for updating that element's head pointer.
Now, let's look at this from the point of view of some lowly element
x sitting in a list somewhere.  Over time, how many times does x get
stepped on and have its head pointer updated?  Answer: at most
O(log n) times.  Reason: every time x gets stepped on, the size of
the list it is in at least doubles.  So, we were able to pay for
unions by charging the elements stepped on, and no element gets
charged more than O(log n), so the total cost is O(n log n) for
unions, or O(m + n log n) overall.

======================================================================
MIDTERM REVIEW
==============
- Last year's midterm now up on webpage.
- Main topics:
    - recurrences
    - probabilistic reasoning
    - reasoning about upper and lower bounds
    - binary search trees
    - hashing
    - dynamic programming
    - MST and shortest paths