CARNEGIE MELLON UNIVERSITY

15-826 - Multimedia databases and data mining

Spring 2006

Homework 1 - Due: Feb. 7, 3:01pm, in class.

Important:

Q1: SQL [20pts]

Retrieve the following tables and load them in a DBMS of your choice: MS-Access is recommended, but any other is acceptable (eg., MySQL, postgres, etc). Please refer to http://www.cs.cmu.edu/~olston/15-415/F05/HW/PostgreSQL_Readme.htm if you want to run postgres in one of the andrew machines.

Note for postgres users:

Tables:

The tables are comma-separated ASCII files. Answer the following queries:

  1. [2pts] List the titles of all movies having more than 3 actors.
  2. [2pts] List the titles of all type=M movies of "Tom Cruise" made after the year 1996 (>1996).
  3. [5pts] Report the actor (first_name, last_name) who has acted in the maximum number of movies. In case of tie, report all such actors.
  4. [4pts] List the titles of all movies that have at least one common actor with the movie "EuroTrip".
  5. [7pts] List the titles of all movies that have more than one common actor with the movie "EuroTrip".

For each query give

Caution: The ASCII data files have windows end-of-line termination. For unix/linux use "dos2unix" to convert them.

Q2 K-d trees : Spatial Join [30pts]

Consider the k-d-tree package, in C, at www.cs.cmu.edu/~kaustav/15826/hw1/kdtree1_1.tar   (tar xvf; make). You are required to augment it to handle spatial joins.

Input Data (ASCII files with windows end-of-line termination):

Implement an option 's' for spatial join. Then your program should:

You are not allowed to make n (= number of points) range queries. You should be able to exploit the space partition represented by the kd tree to do it more efficiently.

Turn in:

  1. [15pts] Hard copy: a printout of your source code (please give only the parts that you have added/modified). Email the tar-ball to kaustav+15826 at cs. Please make sure that the program correctly compiles with make.
  2. [8pts] Hard copy results: The coordinates of the pair of points and the count for ε = 1, 10, 20 for the 2D dataset.
  3. [7pts] Hard copy results: The counts for ε = 1, 10, 20, 50, 100, 200 for both data sets.

 

Q3: R trees [50pts]

You are required to extend the capabilities of an R tree package, and implement plot and count queries.