We use priority first search for finding shortest paths: today we'll see some examples and why it works.
Draw some weighted graph. E.g.,
      5
  A------B
 2|     /|
  C----' |
 2|  2   |11
  D      |
 2|     ,'
  E----'

(i.e., edges A-B of length 5, A-C 2, C-B 2, C-D 2, D-E 2, and B-E 11)
First, here's another way to think about it.
Think of an army of ants (or whatever) all starting at the START vertex (say, A), moving at unit speed away from it in all directions. Whenever a group of ants hits a vertex, they split apart going in all directions. The first ant to hit a node has gone along the shortest path.
The idea of priority first search is to simulate this in a fast way. In PFS, when we say a vertex is ``done'', this corresponds to an ant having already visited it (so we know the shortest path to it). The ``fringe'' is the set of vertices not yet visited but with an ant on an edge leading in (so at time = 1, B and C are on the fringe; at time = 2, C gets visited, B is still on the fringe, and D gets added to the fringe). The ``unseen'' vertices are all the rest.
Thinking about the ant method, there are some obvious speedups: we don't need to track the ants continuously, only the moments when an ant first reaches a vertex.
So you might think of the ``done'' vertices as those already visited by the ants. Each has the shortest path length to it assigned (as well as which neighbor that first ant came from). The ``fringe'' holds the vertices that are currently being traveled to.
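Just to fix the vocabulary, here is a tiny sketch of the three states we track for each vertex (the enum and its names are illustrative, not part of the lecture):

    from enum import Enum

    class State(Enum):
        DONE = 1     # an ant has reached it; shortest distance is known
        FRINGE = 2   # not reached yet, but an ant is on an incoming edge
        UNSEEN = 3   # no ant is headed here yet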
Notice that to run this algorithm, you need some sort of data structure that lets you know which node is closest, and where you can insert and delete things. What to use...?
Our plan: ``Fringe'' vertices will be on priority queue.
Priority Queue: operations insert(x, priority), removeMIN(), and maybe adjust-priority(x, newpriority). Like a queue but what you remove isn't what's been in there the longest time, but what has the lowest key.
A good way to implement this is to use a ``min heap''.
For us, the priority is the current estimate of distance to start node.
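For concreteness, here is a tiny sketch of this interface using Python's heapq module (heapq gives insert and remove-min directly; it has no built-in adjust-priority, which is part of why Method #2 below is convenient):

    import heapq

    pq = []                             # min-heap of (priority, item) pairs
    heapq.heappush(pq, (5, 'B'))        # insert('B', priority 5)
    heapq.heappush(pq, (2, 'C'))        # insert('C', priority 2)
    priority, item = heapq.heappop(pq)  # removeMIN() -> (2, 'C'), the lowest key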
Here's how we use priority-first search for finding shortest paths. (Goal: find shortest paths from the start to everywhere; this is called single-source shortest paths.)
Method # 1 (lecture version):

    insert start into Pqueue with priority 0
    while queue not empty:
        pull the top node V from the priority queue, mark it as done
        for each neighbor X:
            if X is already done, ignore it
            if X is not in the queue, insert it with
                priority = priority(V) + w(V, X), and set parent(X) = V
            if X is in the queue (FRINGE), update its priority:
                priority(X) = MIN(priority(X), priority(V) + w(V, X)).
                If the new value was the smaller one, set parent(X) = V
                (i.e., that ant is going to get there first)

Way to think about this: the shortest way to X is either the old way, or going through V.
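Here is a minimal sketch of Method #1 in Python (the function and variable names are just illustrative). To keep the update step trivial, the fringe is a plain dict mapping vertex -> current priority, so ``update priority'' is just a dict assignment; a real implementation would use a heap that supports decrease-key.

    def shortest_paths_method1(graph, start):
        # graph: dict mapping each vertex to a list of (neighbor, weight) pairs
        dist = {}                 # DONE vertices: final shortest distances
        parent = {start: None}    # which neighbor the first "ant" came from
        fringe = {start: 0}       # fringe vertex -> current priority

        while fringe:
            # pull the top (smallest-priority) node V and mark it done
            v = min(fringe, key=fringe.get)
            dist[v] = fringe.pop(v)

            for x, w in graph[v]:
                if x in dist:                  # already done: ignore it
                    continue
                new_priority = dist[v] + w
                if x not in fringe or new_priority < fringe[x]:
                    fringe[x] = new_priority   # insert, or keep the smaller value
                    parent[x] = v
        return dist, parent

For example, on the graph drawn at the top (assuming its edges were read off correctly):

    g = {'A': [('B', 5), ('C', 2)],
         'B': [('A', 5), ('C', 2), ('E', 11)],
         'C': [('A', 2), ('B', 2), ('D', 2)],
         'D': [('C', 2), ('E', 2)],
         'E': [('D', 2), ('B', 11)]}
    dist, parent = shortest_paths_method1(g, 'A')
    # dist == {'A': 0, 'C': 2, 'B': 4, 'D': 4, 'E': 6}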
Method # 2 (doesn't need to update priorities):

    insert start into Pqueue with priority = 0
    while queue not empty:
        pull the top node V from the priority queue
        if V hasn't been visited yet:
            mark V visited
            for each unvisited neighbor X:
                insert X with priority = priority(V) + w(V, X)

In other words, instead of updating priorities, some nodes appear in the queue in several places. This is like the ants without calculators.
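And a minimal sketch of Method #2 using Python's heapq (names again illustrative). Duplicate entries for a vertex are allowed; stale ones are simply skipped when popped.

    import heapq

    def shortest_paths_method2(graph, start):
        dist = {}                      # visited vertices: final shortest distances
        parent = {}
        pq = [(0, start, None)]        # entries are (priority, vertex, parent)

        while pq:
            d, v, p = heapq.heappop(pq)    # pull the top node V
            if v in dist:                  # already visited: stale entry, skip it
                continue
            dist[v] = d                    # mark V visited
            parent[v] = p
            for x, w in graph[v]:
                if x not in dist:          # unvisited neighbor
                    heapq.heappush(pq, (d + w, x, v))
        return dist, parent

As a quick sanity check, both methods agree on the example graph g from above:

    d2, _ = shortest_paths_method2(g, 'A')
    assert d2 == dist          # same answers as Method #1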
Running time: using a heap, insert, remove-min, and change-priority each take O(log n) time when there are n elements in the pqueue.
For each node, when we mark it DONE (or VISITED), we look at all its neighbors, and for each one do at most log(|fringe|) < log n work. So the cost for each node is
    O(log n * size of its adjacency list).
Now if we sum this over all nodes, we get log n times the sum of the sizes of all the adjacency lists, and that sum is just 2m, since each edge appears in two adjacency lists. Thus the total running time of the algorithm is
    O(m log n).
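Spelled out as one sum (writing deg(v) for the size of v's adjacency list, and using that the degrees of an undirected graph sum to 2m):

    \sum_{v \in V} O(\deg(v) \log n)
      = O\Big( \log n \sum_{v \in V} \deg(v) \Big)
      = O(2m \log n)
      = O(m \log n)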
Why does this work? One way to see it is that we are just simulating the ants. Here's another way: at any point in time, we have some visited/DONE vertices, some fringe vertices sitting on the PQueue, and some unseen vertices.
Let's do a new example. Say at some point the algorithm looks like this:

+------+  9  |    4      3
| DONE |---A-|--------X-----Y
|      |  12 |   3   /2
|      |---B-|------Z
|      |  15 |     /5
+------+---C-|----'
       fringe|

(the numbers next to A, B, C are their priorities; the other numbers are edge lengths; there is also an A-C edge of length 2, not drawn)
We maintain the invariant that:
  (1) the fringe vertex with the smallest priority has priority equal to its true shortest-path distance from the start, and
  (2) every fringe vertex's priority is the length of some legal path from the start to it.
The first property means that when we take the smallest one off and visit that node, we have the correct value for it. Making sure the second property stays true afterwards is easy, since any priority we assign is the distance along some legal path, even if not the shortest one. (E.g., in the figure, when A comes off with priority 9, the A-C edge of length 2 lets us lower C's priority to min(15, 9 + 2) = 11, which is the length of a real path through A.) The big question: when we do an operation, how do we maintain that the smallest-priority node on the fringe has the correct distance?
Here's the trick. After any step, look at the fringe node that is truly closest to the start: its shortest path must enter it directly from the visited/DONE region. Why? Otherwise that path would pass through some other non-DONE vertex first, and (since edge lengths are positive) that vertex would be closer, contradicting our choice.
Also, all vertices in the DONE region have the correct number. So the TRUE closest fringe node also has the correct number, since its shortest path comes directly from a node with the correct number, and we relaxed that edge when that node was marked DONE. This means we maintain the first invariant.