15-826 - Multimedia databases and data mining

Spring 2006

Homework 1 - Due: Feb. 7, 3:01pm, in class.


Q1: SQL [20pts]

Retrieve the following tables and load them in a DBMS of your choice: MS-Access is recommended, but any other is acceptable (eg., MySQL, postgres, etc). Please refer to if you want to run postgres in one of the andrew machines.

Note for postgres users:


The tables are comma-separated ASCII files. Answer the following queries:

  1. [2pts] List the titles of all movies having more than 3 actors.
  2. [2pts] List the titles of all type=M movies of "Tom Cruise" made after the year 1996 (>1996).
  3. [5pts] Report the actor (first_name, last_name) who has acted in the maximum number of movies. In case of tie, report all such actors.
  4. [4pts] List the titles of all movies that have at least one common actor with the movie "EuroTrip".
  5. [7pts] List the titles of all movies that have more than one common actor with the movie "EuroTrip".

For each query give

Caution: The ASCII data files have windows end-of-line termination. For unix/linux use "dos2unix" to convert them.

Q2 K-d trees : Spatial Join [30pts]

Consider the k-d-tree package, in C, at   (tar xvf; make). You are required to augment it to handle spatial joins.

Input Data (ASCII files with windows end-of-line termination):

Implement an option 's' for spatial join. Then your program should:

You are not allowed to make n (= number of points) range queries. You should be able to exploit the space partition represented by the kd tree to do it more efficiently.

Turn in:

  1. [15pts] Hard copy: a printout of your source code (please give only the parts that you have added/modified). Email the tar-ball to kaustav+15826 at cs. Please make sure that the program correctly compiles with make.
  2. [8pts] Hard copy results: The coordinates of the pair of points and the count for ε = 1, 10, 20 for the 2D dataset.
  3. [7pts] Hard copy results: The counts for ε = 1, 10, 20, 50, 100, 200 for both data sets.


Q3: R trees [50pts]

You are required to extend the capabilities of an R tree package, and implement plot and count queries.