You are given a dataset of galaxies from the Sloan
Digital Sky Survey, where each galaxy is represented by an object-id and
its (x, y, z) coordinates. Insert them into an R-Tree. Then implement the
spatial join, and report the counts of qualifying pairs of galaxies, for
the specified ranges below.
2) Details
-
INPUT DATA: here.
Careful - it is 2Mb uncompressed (30,000 3-d points)
-
CODE: The code for the R-Tree is here
(actually,
it is called 'DRtree' for 'Deferred Split R-tree - but you don't
worry about this distinction). It is in C; 'gunzip; tar xvf'
and do 'make demo'. This creates the bin/DRmain program
and runs it on some small datasets.
-
IGNORE the 'warnings' of the compiler and the 'make: Fatal error'.
-
Do 'bin/DRmain' and insert some points of your own, to become
familiar with the package.
-
PREPARATION: Insert these galaxies into an R-Tree. Read the 'README ' and
'READ_this_after_README' files.
-
Note#1: the program expects rectangles - treat points as degenerate rectangles
(x_low = x_high etc).
-
Note#2: the first line of the data file is header info.
-
Note#3: in case of unexplained errors, do 'make spotless' ,
to delete all libraries etc.
-
Note#4: use 'DRmain -float' to insert floats instead of integers.
-
SPECS: We want to implement the spatial join. Currently the R-tree package
supports 's' for range search, 'i' for insertion etc. You have to implement
'j' for the spatial join - then, your program should:
-
read the desired distance range R from the terminal and
-
print the count of pairs of galaxies that are within distance R
or less.
-
include self-pairs
-
include mirror-pairs
-
use the 'L-infinity' distance, which, for the 3-d case is:
Distance( (x1,x2,x3), (y1,y2,y3) ) = max ( |x1-y1|,
|x2-y2|, |x3-y3|)
-
(of course, your code should handle *any* dimensionality, like the current
R-tree does).
-
Suggestion: you might want to test your program with small data files,
where you know the answer.
3) What to Turn In
-
Hard copy: a printout of ONLY the part of your source code dealing with
spatial joins.
-
Hard copy: results (= pair-counts) from your program from running spatial
joins at the following ranges:
-
0.00
-
1E-09
-
1E-05
-
2E-5
-
3E-05
For your information (no points for this part)
If you are interested in astronomy, check the web site for the
Sloan
Digital Sky Survey, which is expecting to collect 0.5 billion galaxies,
and which actually has a relational DBMS underneath (MS SQL server),
to help astronomers collect the data. Also, the
pair-count
of galaxies is an important measure, which gives astronomers an idea
about the intrinsic ('fractal') dimensionality of the galaxies.
Last updated: 11/18/2002, by Christos and Deepayan