CARNEGIE MELLON UNIVERSITY
15-826 - Multimedia databases and data mining
Spring 2005
Homework 1 - Due: Feb. 1, 3:01pm, in class.
Important:
- Please turn in a typed report - handwritten material will not be graded.
- Due date: Feb. 1, 3:01pm, in class
For your information:
- Expected time effort for this homework: 6h-10h, broken down as follows:
  - Q1: 30 minutes to figure out the queries; 1-2 hours to load the data,
    debug the queries, etc.
  - Q2: 1 hour to review the lecture notes; 2-3 hours to design the
    algorithm; 2-3 hours to implement and debug it.
- Points are in [bold pts] and add to 100.
Q1: SQL [30 pts]
Retrieve the table FLIGHTS (from, to) and load it into a
DBMS of your choice: MS-Access is recommended, but any other is
acceptable (e.g., MySQL, PostgreSQL). The table is a
blank-separated file, at http://www.cs.cmu.edu/~christos/courses/826.S05/HW1/flights.txt
Then, answer the following queries:
- [5 pts] list all the airports along with their number of outgoing flights,
sorted in decreasing count order (that is, busiest airport first)
- [5 pts] list all the pairs of airports (a,b) such that airport 'a' can
reach airport 'b' within 2 hops or less. A direct flight counts as 1 hop.
  - Eliminate duplicates;
  - do not report self-pairs;
  - sort your answer in increasing alphabetical order (sort on 'from' first, then on 'to').
  - Notice that not all flights have return flights.
- [5 pts] report the number of pairs within 2 hops or less.
- [10 pts] repeat, for 3 hops or less: list all the (a,b) pairs of airports, so that 'a'
can reach 'b' within 3 hops or less.
- [5 pts] report the number of pairs within 3 hops or less.
For each query give
- the resulting table [half of the points] and
- the corresponding SQL statement [the other half of the points].
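As a sanity check on what "within 2 hops or less" means (it is not a substitute for the SQL statements), the sketch below counts the reachable (a,b) pairs on a small made-up flight list. The names `Flight`, `MAX_AIRPORTS` and `count_pairs_within`, as well as the integer airport ids, are assumptions made for illustration; none of them come from flights.txt or from any DBMS.

```c
#include <assert.h>
#include <string.h>

#define MAX_AIRPORTS 16   /* assumption: small toy graphs only */

/* One directed flight between two airports, identified by integer ids
   (hypothetical encoding; flights.txt uses airport names). */
typedef struct {
    int from;
    int to;
} Flight;

/*
 * Sets reach[a][b] = 1 when 'b' can be reached from 'a' using at most
 * max_hops flights, with a != b, and returns the number of such pairs.
 * Storing the answer in a matrix removes duplicates by construction.
 */
int count_pairs_within(const Flight *flights, int n_flights, int n_airports,
                       int max_hops,
                       unsigned char reach[MAX_AIRPORTS][MAX_AIRPORTS])
{
    unsigned char adj[MAX_AIRPORTS][MAX_AIRPORTS];
    unsigned char next[MAX_AIRPORTS][MAX_AIRPORTS];
    int a, m, b, i, hop, pairs = 0;

    assert(n_airports > 0 && n_airports <= MAX_AIRPORTS);
    assert(max_hops >= 1);

    /* Direct flights: exactly 1 hop. */
    memset(adj, 0, sizeof adj);
    for (i = 0; i < n_flights; i++)
        adj[flights[i].from][flights[i].to] = 1;
    memcpy(reach, adj, sizeof adj);

    /* Each round extends every known route by one more flight. */
    for (hop = 2; hop <= max_hops; hop++) {
        memcpy(next, reach, sizeof next);
        for (a = 0; a < n_airports; a++)
            for (m = 0; m < n_airports; m++)
                if (reach[a][m])
                    for (b = 0; b < n_airports; b++)
                        if (adj[m][b])
                            next[a][b] = 1;
        memcpy(reach, next, sizeof next);
    }

    /* Drop self-pairs such as a -> b -> a, then count what is left. */
    for (a = 0; a < n_airports; a++) {
        reach[a][a] = 0;
        for (b = 0; b < n_airports; b++)
            pairs += reach[a][b];
    }
    return pairs;
}
```

For the hypothetical flights 0->1, 1->2, 2->3 and 1->0, this gives 4 pairs within 1 hop, 6 pairs within 2 hops and 7 pairs within 3 hops. The round trip 0->1->0 is a self-pair and is not counted, and the missing return flights (for example, nothing leaves airport 3) show why (a,b) and (b,a) must be treated separately.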
Q2: K-d trees [70 pts]
Consider the k-d tree package, in C, at http://www.cs.cmu.edu/~christos/courses/826.S05/HW1/kdtree1_1.tar
(notice the update, as of 1/28). Unpack and build it with 'tar xvf' and 'make'.
We want to augment the nearest neighbor
search, so that it returns the k=10 nearest neighbors, instead of the
1 nearest neighbor that it returns now.
- If the tree has fewer than k=10 nodes, then return all the nodes,
with a warning: "only <number> nodes found".
- If you prefer some language other than 'C' for the
implementation, contact the instructor.
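One possible building block for the k-nearest-neighbor extension is a small list of the k best candidates seen so far, kept sorted by distance. The sketch below illustrates that idea on a plain array of points, using a brute-force scan in place of the tree traversal. It does not use the k-d tree package's data structures or functions: `Candidate`, `offer`, `k_nearest`, `DIMS` and the flat point layout are hypothetical names and choices made for this example only.

```c
#include <assert.h>
#include <stdio.h>

#define DIMS 2   /* assumption: 2-d points, stored as x0,y0,x1,y1,... */

typedef struct {
    int    index;   /* position of the point in the input array */
    double dist2;   /* squared Euclidean distance to the query */
} Candidate;

static double squared_distance(const double *a, const double *b)
{
    double sum = 0.0;
    int d;
    for (d = 0; d < DIMS; d++)
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

/* Offers one point to best[0..*count), which holds at most k candidates
   sorted by increasing distance. */
static void offer(Candidate *best, int *count, int k, int index, double dist2)
{
    int pos;

    if (*count == k && dist2 >= best[k - 1].dist2)
        return;                    /* no closer than the current k-th best */

    if (*count < k)
        pos = (*count)++;          /* list not full yet: append a slot */
    else
        pos = k - 1;               /* list full: overwrite the worst one */

    /* Shift farther candidates right to keep the list sorted. */
    while (pos > 0 && best[pos - 1].dist2 > dist2) {
        best[pos] = best[pos - 1];
        pos--;
    }
    best[pos].index = index;
    best[pos].dist2 = dist2;
}

/* Fills best[] (capacity k) with the nearest points to 'query' and returns
   how many were found; warns when fewer than k points exist. */
int k_nearest(const double *points, int n, const double *query,
              int k, Candidate *best)
{
    int count = 0, i;

    assert(k > 0);
    for (i = 0; i < n; i++)
        offer(best, &count, k, i,
              squared_distance(points + i * DIMS, query));

    if (count < k)
        printf("only %d nodes found\n", count);
    return count;
}
```

In a tree-based search, the same list could be filled while visiting nodes, and the distance of the current k-th best candidate (once the list is full) would presumably take the place of the single best distance when deciding whether a subtree can be skipped. How that fits into the package's existing search routine is left to your design.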
Please hand in
- [30 pts] a hard copy of your code
- [35 pts] a hard copy of the output of your code, on
the two included input scripts (2d-input.txt, 3d-input.txt). In your output, keep
only the lines that are the result of the query, to save paper. For
your convenience, try 'make hw1'.
- [5 pts] please e-mail your 'tar' file to the TA (create it with 'make kdtree.tar').
Christos Faloutsos, 1/18 2005