Carnegie Mellon University
15721 Database System Design and Implementation
Spring 2003 - C. Faloutsos
Homework 1 - Due: 3/4
0) Reminders:
0.1) Time estimates
- Q1 would require about 1-2 days, depending on your familiarity with SQL
  and MSAccess
- Q2 requires a lot of programming - start early! Rough time
  estimate: approximately 11 days:
  - 2 days: to become familiar with the B-tree code
  - 1 day: to implement the z-order value estimation for a point
  - 1 day: to modify the B-tree code to accommodate z-values and labels, instead
    of strings
  - 3 days: to design and implement the z-order value estimation for a query
    region
  - 1 day: to implement the range search routine
  - 3 days: integrating, testing and debugging
1) Q1: SQL [10 pts]
Consider the file peer-oregon+010526.txt.gz,
which has information about Internet routers. Specifically, it has
pairs of nodes, blank-separated, one pair per line. For example, the pair
10 12
means that node '10' has an outgoing link to node '12'. Load this file
into a DBMS (e.g., MSAccess on the clusters) and answer the following query:
QUERY: for each node, list the node-id and the count of outgoing
edges, sorted in decreasing order of count
- [5 pts] Give the SQL code.
- [5 pts] Give the top 10 nodes with the highest out-degree, and their
  out-degrees.
No extra credit: FYI, there is a large repository of Internet datasets
at http://topology.eecs.umich.edu/archive/asgraph.tar.gz
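The loading step can be sketched as follows. This is only an illustration, assuming SQLite (through Python's standard sqlite3 module) in place of MSAccess; the table name `edges`, the column names `src` and `dst`, and the helper `load_edges` are hypothetical choices, not part of the assignment.

```python
import sqlite3

def load_edges(conn, lines):
    """Create an edges(src, dst) table and fill it from 'src dst' text lines."""
    conn.execute("CREATE TABLE IF NOT EXISTS edges (src INTEGER, dst INTEGER)")
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            rows.append((int(parts[0]), int(parts[1])))
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# Toy input standing in for the real file (made-up pairs).
conn = sqlite3.connect(":memory:")
loaded = load_edges(conn, ["10 12", "10 13", "", "12 13"])
total = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
print(loaded, total)  # sanity check: rows parsed vs. rows stored
```

For the real data, the same function could be fed the decompressed file, e.g. `load_edges(conn, gzip.open("peer-oregon+010526.txt.gz", "rt"))` after `import gzip`.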
2) Q2: Z-ordering and B-trees [90pts]
Obtain B-tree code from any source you would like (suggested: btree721_S03.tar.gz)
and build a z-ordering method on top of it. Your final system
(say, with the executable called 'zorder') should be able to handle
the commands 'c'lear, 'i'nsert, 'r'ange-search, e'x'it. Sample usage
scenario ('%' is the UNIX prompt, 'z>' is the prompt of your zorder system,
the text after a prompt is user input, and lines starting with '#' are comments):
% zorder
z> c
# should clear the B-tree
z> i 10 20 pitt
# should insert the point (10,20), and label it "pitt"
z> r 0 100 0 200
# should return all the points in the range 0<=x<=100 ; 0<=y<=200
10 20 pitt
z> x
# exit
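The command syntax in the session above can be sketched as a small parser. This is a minimal illustration only: the function name `parse_command` and the tuple layout it returns are hypothetical, and the argument order for 'r' (x-low, x-high, y-low, y-high) is read off the sample comment.

```python
def parse_command(line):
    """Turn one input line into a (verb, args...) tuple, or None for a blank line."""
    parts = line.split()
    if not parts:
        return None
    op, args = parts[0], parts[1:]
    if op == "c" and not args:
        return ("clear",)
    if op == "x" and not args:
        return ("exit",)
    if op == "i" and len(args) == 3:
        # i <x> <y> <label>
        return ("insert", int(args[0]), int(args[1]), args[2])
    if op == "r" and len(args) == 4:
        # r <xlo> <xhi> <ylo> <yhi>
        xlo, xhi, ylo, yhi = (int(a) for a in args)
        return ("range", xlo, xhi, ylo, yhi)
    raise ValueError("unrecognized command: %r" % line)

print(parse_command("i 10 20 pitt"))
print(parse_command("r 0 100 0 200"))
```

Keeping parsing separate from the B-tree calls makes it easier to test the command loop on its own before the index is working.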
2.1) Assumptions you may make:
- there are no duplicate points
- coordinates are all integers, in the range 0-1023 X 0-1023
- labels are all exactly 4 characters long
- we only need this system for 2-d data
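Under these assumptions each coordinate fits in 10 bits, so the z-value of a point fits in 20 bits. A minimal sketch of the bit interleaving follows; the names `z_value` and `z_key` are hypothetical, and taking the x bit before the y bit at each level is an assumed convention (the opposite order works equally well if used consistently).

```python
BITS = 10  # coordinates lie in 0..1023, i.e. 10 bits each

def z_value(x, y, bits=BITS):
    """Interleave the bits of x and y, most significant first, x bit before y bit."""
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("coordinate out of range")
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def z_key(x, y):
    """Fixed-width binary string; string order then matches numeric z-order."""
    return format(z_value(x, y), "020b")

print(z_value(10, 20))  # 408
```

A fixed-width string such as the one from `z_key` is one possible way to reuse string-keyed B-tree code with little change, since comparing equal-length zero-padded strings gives the same order as comparing the numbers.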
2.2) Important specifications:
- Your range-search algorithm should be efficient (i.e., much faster
  than sequential scan, for small queries)
- Stop splitting the query region as soon as it has roughly k = 8 pieces.
- The data should be written on disk, that is, should be persistent
  across invocations of your 'zorder' system
- You may use some other language (Java, C++, Perl), if you want.
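One possible way to split a query region into a bounded number of pieces is sketched below. It is not the required method, only an illustration under stated assumptions: the region is refined breadth-first into quadtree blocks, a block is left whole once another split could push the count past k, and adjacent z-intervals are merged. The names `z_ranges` and `in_ranges` are hypothetical, and the x-bit-first interleaving is an assumed convention.

```python
from collections import deque

def z_value(x, y, bits=10):
    """Interleave the bits of x and y (x bit first at each level; assumed convention)."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z

def z_ranges(xlo, xhi, ylo, yhi, bits=10, k=8):
    """Cover xlo<=x<=xhi, ylo<=y<=yhi with at most k inclusive z-intervals."""
    done = []
    todo = deque([(0, 0, 1 << bits, 0)])  # (x0, y0, side, first z-value of block)
    while todo:
        x0, y0, side, zlo = todo.popleft()
        inside = (xlo <= x0 and x0 + side - 1 <= xhi and
                  ylo <= y0 and y0 + side - 1 <= yhi)
        # A split adds up to 4 children; keep the block whole if that could exceed k.
        if inside or side == 1 or len(done) + len(todo) + 4 > k:
            done.append((zlo, zlo + side * side - 1))
            continue
        half = side // 2
        quadrants = ((0, 0), (0, half), (half, 0), (half, half))  # in z-order
        for idx, (dx, dy) in enumerate(quadrants):
            cx, cy = x0 + dx, y0 + dy
            if cx <= xhi and cx + half - 1 >= xlo and cy <= yhi and cy + half - 1 >= ylo:
                todo.append((cx, cy, half, zlo + idx * half * half))
    merged = []
    for lo, hi in sorted(done):
        if merged and merged[-1][1] + 1 == lo:
            merged[-1] = (merged[-1][0], hi)  # glue adjacent intervals together
        else:
            merged.append((lo, hi))
    return merged

def in_ranges(z, ranges):
    """True if z falls in one of the inclusive (lo, hi) intervals."""
    return any(lo <= z <= hi for lo, hi in ranges)

print(z_ranges(10, 10, 20, 20))
```

Each returned interval would correspond to one B-tree range scan. Because the intervals cover a superset of the query region, the points retrieved still have to be checked against the actual x and y bounds; with a small cap such as k = 8 the cover can be fairly coarse.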
2.3) What to hand in
- [70pts] Hard copy of ALL your source code - mark the routines related
  to the z-order: range-search, point-insert, etc.
- [20pts] Hard copy of the output of your code on the test scripts
  that we will announce later.
- [0pts] e-mail a tar-ball with a 'makefile' to the instructor - 'make
  demo' should compile your program and run it on the test scripts above.
Last modified by Christos Faloutsos, 2/2/2003