Carnegie Mellon University
15721 Database System Design and Implementation
Spring 2003 - C. Faloutsos
Homework 1 - Due: 3/4

0) Reminders:

The homeworks should be done individually.
Due date: March 4, 3pm (in class)
URL for this very page: http://www.cs.cmu.edu/~christos/courses/721.S03/HW1/hw1.html

0.1) Time estimates

Q1 should take about 1-2 days, depending on your familiarity with SQL and MSAccess.
Q2 requires a lot of programming - start early! Rough estimate: about 11 days, broken down as follows:

2 days: to become familiar with B-tree code
1 day: to implement the z-ordering estimation for a point
1 day: to modify the B-tree code to accommodate z-values and labels, instead of strings
3 days: to design and implement the z-order value estimation for a query region
1 day: to implement the range search routine
3 days: integrating, testing and debugging

1) Q1: SQL [10 pts]

Consider the file peer-oregon+010526.txt.gz, which has information about Internet routers. Specifically, it has pairs of nodes, blank-separated, one pair per line. For example, the pair

10 12

means that node '10' has an outgoing link to node '12'. Load this file into a DBMS (e.g., MSAccess on the clusters) and answer the following query:

QUERY: for each node, list the node-id and the count of out-going edges, sorted in decreasing count order

[5 pts] Give the SQL code.
[5 pts] Give the top 10 nodes with the highest out-degree, and their out-degrees
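
As a rough illustration of the query shape (not a reference solution), the sketch below loads edge pairs into an in-memory SQLite table and groups by source node. SQLite stands in for MSAccess here, and the table name edges, the column names src and dst, and the helper names parse_pairs and out_degrees are all assumptions made for this example. Note that a plain GROUP BY lists only nodes with at least one outgoing edge, and that repeated lines in the input would be counted more than once.

```python
import sqlite3


def parse_pairs(lines):
    """Yield (src, dst) integer pairs from blank-separated text lines."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip empty or malformed lines
            yield int(parts[0]), int(parts[1])


def out_degrees(pairs, limit=None):
    """Return (node, out_degree) rows, highest out-degree first."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per directed edge src -> dst.
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", pairs)
    sql = (
        "SELECT src, COUNT(*) AS outdeg "
        "FROM edges "
        "GROUP BY src "
        "ORDER BY outdeg DESC, src ASC"  # ties broken by node id
    )
    if limit is not None:
        sql += " LIMIT %d" % int(limit)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = ["10 12", "10 13", "11 10"]
    for node, degree in out_degrees(parse_pairs(sample), limit=10):
        print(node, degree)
```

Passing limit=10 restricts the output to the ten highest out-degrees; the SELECT statement itself should carry over to other SQL dialects with little change, although the LIMIT clause is not supported everywhere.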

No extra credit: FYI, there is a large repository of Internet datasets at http://topology.eecs.umich.edu/archive/asgraph.tar.gz

2) Q2: Z-ordering and B-trees [90pts]

Obtain B-tree code from any source you would like (suggested: btree721_S03.tar.gz ) and build a z-ordering method on top of it. Your final system (say, with the executable called 'zorder') should be able to handle the commands 'c'lear, 'i'nsert, 'r'ange-search, e'x'it. Sample usage scenario ('%' is the UNIX prompt, 'z>' is the prompt of your zorder system, and bold font stands for user input)

% zorder
z> c                         # should clear the B-tree
z> i 10 20 pitt             # should insert the point (10,20), and label it "pitt"
z> r 0 100 0 200             # should return all the points in the range 0<=x<= 100 ; 0<= y <= 200
        10 20 pitt
z> x                         #exit
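
The command syntax in the scenario above could be parsed along the following lines. This is only a sketch: the function name parse_command and the tuple layout it returns are made up for illustration, and treating everything after '#' as a comment is an assumption based on the annotations in the sample, not a stated requirement.

```python
def parse_command(line):
    """Parse one line typed at the 'z>' prompt.

    Returns a tuple such as ("insert", x, y, label), or None for a
    blank line. Raises ValueError for anything unrecognized.
    """
    text = line.split("#", 1)[0]  # assumption: '#' starts a comment
    parts = text.split()
    if not parts:
        return None
    op, args = parts[0], parts[1:]
    if op == "c" and not args:
        return ("clear",)
    if op == "x" and not args:
        return ("exit",)
    if op == "i" and len(args) == 3:
        # i <x> <y> <label>
        return ("insert", int(args[0]), int(args[1]), args[2])
    if op == "r" and len(args) == 4:
        # r <xlow> <xhigh> <ylow> <yhigh>
        xlo, xhi, ylo, yhi = (int(a) for a in args)
        return ("range", xlo, xhi, ylo, yhi)
    raise ValueError("unrecognized command: %r" % line)


if __name__ == "__main__":
    for sample in ["c", "i 10 20 pitt", "r 0 100 0 200", "x"]:
        print(parse_command(sample))
```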

2.1) Assumptions you may make:

there are no duplicate points
coordinates are all integers, in the range 0-1023 X 0-1023
labels are all exactly 4 characters long
we only need this system for 2-d data
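
Under these assumptions each coordinate fits in 10 bits, so a point's z-value fits in 20 bits. One common way to compute it is bit interleaving, sketched below. The function names z_value and z_decode are hypothetical, and placing each x bit ahead of the matching y bit is an arbitrary choice made for this example; the opposite order works just as well as long as it is used consistently.

```python
BITS = 10  # coordinates lie in 0..1023, i.e. 10 bits each


def z_value(x, y, bits=BITS):
    """Interleave the bits of x and y into a single integer key."""
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("coordinate out of range")
    z = 0
    for i in range(bits - 1, -1, -1):  # most significant bit first
        x_bit = (x >> i) & 1
        y_bit = (y >> i) & 1
        z = (z << 2) | (x_bit << 1) | y_bit  # assumption: x bit first
    return z


def z_decode(z, bits=BITS):
    """Invert z_value: recover (x, y) from an interleaved key."""
    x = y = 0
    for i in range(bits - 1, -1, -1):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


if __name__ == "__main__":
    z = z_value(10, 20)
    print(z, z_decode(z))
```

With this bit order the point (10,20) maps to 408 (binary 0110011000). Since the suggested B-tree code stores strings, one option is to store the z-value as a fixed-width, zero-padded string so that string order matches numeric order.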

2.2) Important specifications:

Your range-search algorithm should be efficient (i.e., much faster than sequential scan, for small queries)
Stop splitting the query region as soon as it has approximately k = 8 pieces.
The data should be written on disk, that is, should be persistent across invocations of your 'zorder' system
You may use some other language (Java, C++, Perl), if you want.
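
One possible reading of the splitting rule is sketched below: start from the whole 1024 x 1024 space, repeatedly split the largest block that is only partly inside the query into its four quadrants, drop quadrants that miss the query, and stop once another split would push the number of contiguous z-intervals past k. The names z_ranges and merge_blocks, the greedy largest-block-first order, and the x-bit-first interleaving are all assumptions of this sketch rather than part of the assignment. The result may cover more than the query, so every point fetched from the B-tree still has to be checked against the actual x and y bounds.

```python
def z_value(x, y, bits=10):
    """Bit-interleaved key (x bit ahead of y bit; an assumed convention)."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def merge_blocks(blocks):
    """Turn blocks (x0, y0, size, z0) into sorted, coalesced z-intervals."""
    merged = []
    for lo, hi in sorted((b[3], b[3] + b[2] * b[2] - 1) for b in blocks):
        if merged and lo == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], hi)  # extend the previous interval
        else:
            merged.append((lo, hi))
    return merged


def z_ranges(xlo, xhi, ylo, yhi, k=8, bits=10):
    """Cover the query rectangle with at most k inclusive z-intervals."""
    if xlo > xhi or ylo > yhi:
        return []  # empty query

    def overlaps(b):
        x0, y0, size, _ = b
        return (x0 <= xhi and x0 + size - 1 >= xlo and
                y0 <= yhi and y0 + size - 1 >= ylo)

    def inside(b):
        x0, y0, size, _ = b
        return (xlo <= x0 and x0 + size - 1 <= xhi and
                ylo <= y0 and y0 + size - 1 <= yhi)

    blocks = [(0, 0, 1 << bits, 0)]  # the whole address space
    while True:
        partial = [b for b in blocks if not inside(b)]
        if not partial:
            break  # exact cover reached
        b = max(partial, key=lambda blk: blk[2])  # largest partial block
        x0, y0, size, z0 = b
        half = size // 2
        quarter = half * half  # number of z-values per quadrant
        kids = [(x0, y0, half, z0),
                (x0, y0 + half, half, z0 + quarter),
                (x0 + half, y0, half, z0 + 2 * quarter),
                (x0 + half, y0 + half, half, z0 + 3 * quarter)]
        candidate = [c for c in blocks if c != b]
        candidate += [c for c in kids if overlaps(c)]
        if len(merge_blocks(candidate)) > k:
            break  # one more split would exceed the piece budget
        blocks = candidate
    return merge_blocks(blocks)


if __name__ == "__main__":
    for lo, hi in z_ranges(0, 100, 0, 200):
        print(lo, hi)
```

Each returned interval corresponds to one B-tree range scan over the z-value keys, which is what keeps a small query much cheaper than a sequential scan.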

2.3) What to hand in

[70pts] Hard copy of ALL your source code - mark the routines related to the z-order: range-search, point-insert, etc.
[20pts] Hard copy of the output of your code on the test scripts that we will announce later.
[0pts] e-mail a tar-ball with a 'makefile' to the instructor - 'make demo' should compile your program and run it on the test scripts above.

Last modified by Christos Faloutsos, 2/2/2003
