10 Apr 1996

Outline

Let's talk about Assignment 6 today.

Input files
Tasks
Data Structures
The breadth-first search algorithm

Input files

The input for this assignment is a set of 3 files containing data that describes a street map of Pittsburgh. The following examples are taken directly from the assignment handout (small map). The comments are mine, and don't actually appear in the files. (You probably don't have to write all of this down.)

places.txt

Places are listed alphabetically.

8                          // number of places
Ali_Baba 404 S_Craig_St    // name of place 0, num. on street, name of street
Baker_Hall 4850 Frew_St
...

intersections.txt

For this assignment, only the first line of this file is needed. We'll look at the rest of the file in assignment seven.

19                         // number of intersections
631912 499709              // x and y coordinates of intersection 0
632883 485117              // x and y coordinates of intersection 1
...

streets.txt

Streets are listed alphabetically.

12                         // number of streets
5th_Ave 2                  // first street, alphabetically, 2 blocks
 7 5000 5099 982 8         // first block on 5th_Ave, from node 7 to 8
 8 5100 5199 419 1         // second block on 5th_ave, from node 8 to 10
Amberson_Ave 1
 9 834 999 356 8
...

The street map can be thought of as a graph, with intersections corresponding to nodes, and blocks corresponding to edges. (See figure on handout.)

Tasks

The program is to perform 4 tasks:

Given a street address (e.g., 450 S._Craig), output the numbers of the two adjacent intersections (0 and 1).

If the name does not exactly match a street name, look for a partial match. If there is more than one partial match, list them all. If there is one exact match or partial match, check to see if the address lies on any of the blocks on the streets.
Given a street address as in Task 1, report all known places (i.e., items in places.txt) that are on the same block as the given address.
Given an intersection number and a radius R, report the numbers of all intersections within R blocks of the given intersection.
Given a street address and a radius, list all the places in places.txt that are within the given number of blocks of the given address. Let us say that any place on the same block as the given address has distance 0.

In addition to writing the code that solves these four tasks, you will also have to write the code that reads in the input data and builds your internal data structures.

Data structures

Before writing any code to solve any of the tasks, it makes sense to spend some time thinking about what data structures are most appropriate for this assignment. The first of these is easy:

Graph

Adjacency list representation

advantages: space efficient, since graph is known to be sparse.
disadvantages: code is a little more complicated than for adjacency matrix

Now let's brainstorm a little bit about what data structures we could use to represent the places and streets, and what algorithms we might use to solve the tasks. Try to think of some alternative yourself before looking at what we suggest.

Street

a struct, containing name, number of blocks, and a linked list of blocks, sorted by starting address

advantages

space efficient
blocks are given in sorted order
easy access to name and number of blocks,
easy to walk through blocks in order

disadvantages

linear search for block containing address on street
linear search even if address is invalid (easy solutions)

a struct, containing name, number of blocks, and an array of blocks, sorted by starting address

advantages

space efficient
blocks are given in sorted order
easy access to name and number of blocks,
easy to walk through blocks in order
binary search for block containing an address

disadvantages

more difficult to code
most streets don't contain that many blocks anyway

Any other reasonable representations?

Streets

array of streets, sorted by street name

advantages

streets are given in sorted order
no insertions/deletions
finding exact and partial matches (Tasks 1, 2, and 4)
can be very easily coded using linear search
with just a little more effort, matches can be found using binary search

disadvantages

are there any?

hash table

advantages

your solution will be unique

disadvantages

searching for partial matches will probably be difficult, unless your hash function maps similar names to the same bucket, in which case you're losing the advantages of a hash table

binary search tree / splay tree

advantages

supports fast searching for an exact match
supports fast searching for partial matches using ``members in range'' function
Sleator's ``bulletproof'' code is available for you to use

disadvantages

getting the details of partial matching right is a little tricky
debugging may be trickier, since code is more complicated

Places

This won't be used until Task 4.

array, sorted by street name

advantages

input already given in sorted order
no insertions or deletions to be done
will support looking up of places by name -- not needed in this assignment, but a natural operation

disadvantages

sorted order doesn't help any with task 4!
we might have to search the whole array for each block

For each block, associate a linked list containing the places on that block.

advantages

makes listing the places on a block in Tasks 2 and 4 easy.

disadvantages

are there any?

Queue

A queue will be needed for Tasks 3 and 4.

array

advantages

you might already have working code
can be very fast and space efficient

disadvantages

have to worry about array overflowing

linked list

advantages

you might already have working code
don't have to worry about resizing an array

disadvantages

have to use pointers

Now let's take a step back and look at the different objects for which we may want to define a struct:

intersection/node (maybe)
block/edge        (probably)
street            (probably)
place             (probably)

We'll want several global structures, too:

graph                 (yes)
list of places        (not globally, but one list per block)
list of streets       (yes)
list of intersections (no, already part of adjacency list rep of graph)

The breadth-first search algorithm

Normally we think of this algorithm as performing a search from a single node to all of the other nodes in the graph. In Task 4, however, there are two important differences. First, we are going to search from a block, or edge, to the rest of the graph. One way to do this is to put both endpoints of the edge in the queue initially. Second, instead of listing the nodes (intersections) as we visit them, we want to list the places that we see, which are located on the edges. This is an important difference, because breadth-first search normally ignores an edge if it leads to an already visited node, which we can't necessarily do.

Example: (nodes listed in order they are removed from queue.)

    0       1
      o----o         4 o
      |               /|
      |              / |
      |           3 /  | /~\
      o------------o   | |_|  Anja & Jason's House
    2      /~\      \  |
           |_|       \ |
         Bruce's      \|
         House       5 o

Even though 5 is already in the queue when the edge from 4 to 5 is examined, we have to list Anja & Jason's House