Homework 0
Out: Sept. 11 2012
Due: Sept. 18 2012, before 12:00 pm (class time), via blackboard deposit
Task |
Estimated time |
Setup SQLite |
5-20 minutes |
SQL test queries |
5-10 minutes |
KD tree |
5-30 minutes |
For this question we will be using the Marvel dataset, answering questions like 'who is the most popular Marvel-comic character'.
(For your information, the Marvel dataset lists the
appearances of Marvel characters in comic books, and was
obtained from: http://bioinfo.uib.es/~joemiro/marvel.html. If
you are interested, you can find the names of the comic books
and characters in the files marvelCharacters.txt and marvelComicBooks.txt.)
This question assumes you have access to
SQLite, which is already installed on the
Andrew Linux servers.
(Alternatively, you are welcome to install sqlite3 (or
mysql, or postgres) on your own machine - click here for
instructions for sqlite3.)
Once you have a running sqlite3 engine, run the scripts below. These commands have been tested with SQLite version 3.6.20 - earlier versions may give syntax errors.
1. On the SQLite3 prompt, create a table with the following commands
linux_prompt% sqlite3 marvel.db sqlite>
create table marvel ( |
2. Import data from the file marvel.txt by running the following commands:
sqlite> .separator ' ' sqlite> .import marvel.txt marvel |
3. Replace '/absolute/path/to/marvel.txt' with the absolute file path - for instance, if you save the file in your home directory, the full path will be something like /afs/andrew.cmu.edu/usr2/ANDREWID/marvel.txt. Count the number of records in the table. Report the answer in your submission.
sqlite> select count(*) from marvel; |
4. How many distinct comic books are there in the dataset? Report the answer in your submission.
sqlite> select
count(distinct(comic_id)) |
5. (Optional) If you are curious, find which characters appear in the most comic books:
sqlite> select
char_id,count(*)
ct |
Q2: Build and run the KD-tree
package
The
following instructions are for the andrew linux
servers. You may use alternative settings (cygwin, mac-os), at
your own risk.
1. Download the kdtree package in your home dir on the andrew linux server (log in with your andrew credentials on linux.andrew.cmu.edu)
2. Untar the archive
linux_prompt% tar -xvf kdtree_base.tar.gz |
3. Go in the kdtree_base directory.
4. Clean the directory of any previous builds.
linux_prompt% make clean |
5. Build the package
linux_prompt% make |
6. Run the test scripts provided (script1, script2, script3) and report the number of data points found in your submission for each command.
linux_prompt%
kdtree_main -d 3 <
script1 |
What to submit:
Submit your answers (5
numbers, all on the same line, blank separated), on the CMU
Blackboard system (http://www.cmu.edu/blackboard),
under 'HW0'. If (and only if) there is trouble depositing it,
please email your answers mailto:ajuarez@andrew.cmu.edu,
with the exact header "15826
HW0".
* Q1.3 (a single number)
* Q1.4 (a single number)
*
Q2.6 (three numbers, one per
script)
Last modified by Antonio Juarez and Christos Faloutsos, Sept. 9, 2011