15-826 - Multimedia databases and data mining
Fall 2011 C. Faloutsos

Homework 0
Out: Sept. 13 2011
Due: Sept. 20 2011, before 12 noon (class time), via e-mail to the TA


Task                Estimated time
Setup MySQL         10 minutes to 1 hour
SQL test queries    5 minutes
Build KD tree       5 minutes

Q1: Test SQL

This question assumes you have access to a MySQL server. If you do not, follow the MySQL setup instructions first.
  1. At the MySQL prompt, create a table with the following command:

    create table wikileaks (
    report_key varchar(36),
    report_date varchar(30),
    report_type varchar(30),
    category varchar(30),
    tracking_number varchar(50)
    );

  2. Import data from the file Wikileaks2.csv by running the command:

    load data local infile '/absolute/path/to/Wikileaks2.csv' into table wikileaks
    fields terminated by ','
    enclosed by '"'
    lines terminated by '\n';

    Replace '/absolute/path/to/Wikileaks2.csv' with the absolute file path - for instance, if you save the file in your home directory, the full path will be something like /afs/
  3. Count the number of records in the table. Report the answer via e-mail.

    select count(*) from wikileaks;

  4. How many distinct dates are there in the table? Report the answer via e-mail.

    select count(distinct(report_date)) from wikileaks;
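
As an optional sanity check on steps 3 and 4, the same two numbers can be estimated straight from the CSV file with standard shell tools. This is only a sketch and not part of the assignment: it assumes one record per line, no header row, and no commas inside the first two fields. The function names and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Cross-check for steps 3 and 4 without MySQL.
# Assumptions: one record per line, no header row, and no commas
# inside the first two quoted fields.

count_records() {
    # Under the assumptions above, lines == records.
    wc -l < "$1" | tr -d ' '
}

count_distinct_dates() {
    # Field 2 is report_date in the table definition.
    cut -d',' -f2 "$1" | sort -u | wc -l | tr -d ' '
}

# Made-up sample rows, only to show the calls; this is not real data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"key-1","2004-01-01 00:00:00","Type A","Cat X","T-001"
"key-2","2004-01-01 00:00:00","Type B","Cat Y","T-002"
"key-3","2004-01-02 00:00:00","Type A","Cat X","T-003"
EOF

echo "records: $(count_records "$sample")"
echo "distinct dates: $(count_distinct_dates "$sample")"
rm -f "$sample"
```

To check the real file, call the two functions on your own path, for instance count_records /absolute/path/to/Wikileaks2.csv. If the numbers differ from the MySQL answers, one of the assumptions above probably does not hold for the data.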

Q2: Build and run KD-tree package

The following instructions are for the Andrew Linux servers. You may use other environments (Cygwin, Mac OS) at your own risk.
  1. Download the kdtree package to your home directory on the Andrew Linux server (log in with your Andrew credentials on
  2. Untar the archive

      % tar -xvf kdtree_base.tar

  3. Change into the kdtree_base directory
  4. Clean the directory of any previous builds

      % make clean

  5. Build the package

      % make

  6. Run a test script and send the output by e-mail to the TA

      % kdtree_main -d 2 < script1
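
Steps 2 through 6 can also be run in one go. The following is a minimal sketch, assuming the archive unpacks into kdtree_base and that make leaves kdtree_main in that directory. The function name build_and_run and the output file name kdtree_output.txt are hypothetical, and ./kdtree_main is used in case the current directory is not on your PATH.

```shell
#!/bin/sh
# Steps 2-6 in one function (a sketch, not an official course script).
build_and_run() {
    archive=$1    # e.g. kdtree_base.tar
    outfile=$2    # where to save the test output for the e-mail

    # Stop early if the archive has not been downloaded yet.
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }

    # Step 2: untar the archive.
    tar -xvf "$archive" || return 1

    # Steps 3-5: enter the directory, clean old builds, build.
    ( cd kdtree_base && make clean && make ) || return 1

    # Step 6: run the test script; show the output and keep a copy.
    ( cd kdtree_base && ./kdtree_main -d 2 < script1 ) | tee "$outfile"
}

# Only run when the archive is in the current directory.
if [ -f kdtree_base.tar ]; then
    build_and_run kdtree_base.tar kdtree_output.txt
fi
```

The saved file can then be pasted into, or attached to, the e-mail to the TA.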

What to e-mail:

In a single e-mail message, send the TA the answers to Q1 (steps 3 and 4) and the output of the test script from Q2 (step 6).

Last modified by Ina Fiterau and Christos Faloutsos, Sept. 11, 2011