Lab6: A Simple Flickr Crawler


Due Date: Saturday, April 4th, 2009, by midnight


Objective:

Once you are done with this lab, you should be familiar with the following:


Assignment:

Part 1

In this part of the assignment, you will write a Perl script that downloads a set of images from the www.flickr.com website based on one or more keywords. We will use LWP (the Library for WWW in Perl) to retrieve the HTML, and you are required to use wget to download the images once the HTML has been retrieved. A starter file is available as Lab6_starter.pl.
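The flow described above (fetch the search page with LWP, pull image URLs out of the HTML, hand each URL to wget) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (search_url, fetch_html, extract_image_urls, download_image) are hypothetical, the search URL format is a guess that should be replaced by whatever Lab6_starter.pl uses, and the regular expression assumes the images appear as plain <img src="....jpg"> tags, which may not match the real Flickr markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the search URL for one keyword.
# ASSUMPTION: the query format below is a guess; use the one from the starter file.
sub search_url {
    my ($keyword) = @_;
    # Percent-encode anything that is not an unreserved URL character.
    (my $escaped = $keyword) =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return "http://www.flickr.com/search/?q=$escaped";
}

# Fetch a page with LWP. The module is loaded only when this sub is called.
sub fetch_html {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);
    die "Could not fetch $url: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Collect the src attribute of every <img> tag that points at a .jpg file.
# ASSUMPTION: a generic pattern, not tuned to Flickr's actual page structure.
sub extract_image_urls {
    my ($html) = @_;
    my @urls = $html =~ /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+\.jpg)["']/gi;
    return @urls;
}

# Download one image with wget. The list form of system() bypasses the shell,
# so unusual characters in the URL or file name are passed through safely.
sub download_image {
    my ($url, $file) = @_;
    return system('wget', '-q', '-O', $file, $url) == 0;
}
```

A caller would typically chain these: my @urls = extract_image_urls(fetch_html(search_url('beach'))); and then call download_image on each entry, keeping @urls around afterwards.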

The Perl script accepts a list of keywords as run-time parameters, along with optional flags. It passes the keywords to www.flickr.com as part of the search query. The two optional flags used in this part are -n and -t. The -n num_of_images flag tells the script to find and download num_of_images images for each keyword, while the -t target_dir flag tells the script to save the images to target_dir. Each downloaded image should be identified by a keyword (derived from the run-time parameter) and an index number (0, 1, 2, etc.). For example, if the command perl lab6.pl -n 10 -t test beach is given, the program will get 10 images for the keyword beach from Flickr and save them into a folder called test, with images named beach1.jpg, beach2.jpg, etc.

usage : perl lab6.pl [options] keyword1 keyword2 ...
options:
	-n num_of_images	the number of images to download for each keyword
				(the default is 10)
	-t target_dir		where to put the downloaded images.
				(the default is the current directory)
	-f file_name		the name of the htm file to organize images.
				(the default name is images.htm)
IMPORTANT: Image URLs should be saved in an array to be used in Part 2.
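One way to handle the usage shown above is Perl's standard Getopt::Long module. The following is a minimal sketch, not the required approach: the sub name parse_args and the layout of the returned hash are hypothetical, and the defaults are taken from the usage text.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse the lab's command line. Returns a hash reference holding the option
# values plus the leftover arguments, which are treated as the keywords.
sub parse_args {
    my @args = @_;
    my $usage = "usage : perl lab6.pl [options] keyword1 keyword2 ...\n";

    # Defaults as stated in the usage text.
    my %opt = (n => 10, t => '.', f => 'images.htm');

    GetOptionsFromArray(
        \@args,
        'n=i' => \$opt{n},    # number of images per keyword (integer)
        't=s' => \$opt{t},    # target directory
        'f=s' => \$opt{f},    # name of the generated htm file
    ) or die $usage;

    die $usage unless @args;  # at least one keyword is needed
    $opt{keywords} = [@args];
    return \%opt;
}
```

In the script itself this would be called as my $opt = parse_args(@ARGV); after which $opt->{keywords} holds the search terms in the order given.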

Part 2

In this part of the assignment, you will create a simple web page that organizes the images (four images per row). Keep the image sizes at height=200 width=200, as shown in sample.htm. Each image will link to the actual image on Flickr. You do not need to know a lot about HTML, but your program must create the htm file dynamically based on the keyword(s). Take a look at the source code of sample.htm (given in the folder) to see how to create this file. The sample output shows 8 images from Flickr for the search word "beach", and each image is linked directly to the image on Flickr. You DO NOT need to download the images for this part; instead, write the image URLs directly into the HTML file.
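As a rough illustration of building the page from the array of image URLs, the sketch below lays the images out in a table with four cells per row. The sub name gallery_html and the exact markup are assumptions; sample.htm remains the reference for what the output should look like, and URLs are inserted as-is without HTML escaping.

```perl
use strict;
use warnings;

# Build an HTML page showing the given image URLs in a table.
# $urls is an array reference; $per_row defaults to 4.
sub gallery_html {
    my ($title, $urls, $per_row) = @_;
    $per_row ||= 4;

    my $html = "<html>\n<head><title>$title</title></head>\n<body>\n<table>\n";
    for my $i (0 .. $#$urls) {
        my $url = $urls->[$i];

        # Open a new row at every multiple of $per_row.
        $html .= "<tr>\n" if $i % $per_row == 0;

        # Each thumbnail is 200x200 and links to the image itself.
        $html .= qq{<td><a href="$url"><img src="$url" height="200" width="200"></a></td>\n};

        # Close the row when it is full or when the images run out.
        $html .= "</tr>\n" if $i % $per_row == $per_row - 1 || $i == $#$urls;
    }
    $html .= "</table>\n</body>\n</html>\n";
    return $html;
}
```

The returned string can then be written to the file named by the -f option, for example with open(my $fh, '>', $file_name) followed by print {$fh} gallery_html('beach', \@urls);.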

Extra Credit

If you are interested in some extra credit (up to 10 points), you can do the following. First, enhance the Part 2 HTML page by including titles under the pictures. Second, extend your Perl script to find the 3 most popular tags for the images downloaded. This requires you to extract the tags from each image's HTML, rank them, and list the top 3 tags present in the images. If there are ties, list all of them.
The top 3 tags for the images downloaded
elephant
monkey
tiger
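The ranking step can be sketched as below, assuming the tags have already been extracted into one list per image (how to pull them out of the Flickr HTML is left to you). The sub name top_tags is hypothetical, and the tie handling is one possible reading of the requirement: once the limit is reached, any further tags with the same count as the last one listed are also included.

```perl
use strict;
use warnings;

# top_tags($limit, \@tags_of_image_1, \@tags_of_image_2, ...)
# Returns the most frequent tags, most popular first; ties are broken
# alphabetically for ordering, and tags tied with the last place are kept.
sub top_tags {
    my ($limit, @tag_lists) = @_;

    # Count in how many images each tag appears (case-insensitive,
    # and at most once per image).
    my %count;
    for my $tags (@tag_lists) {
        my %seen;
        for my $tag (map { lc } @$tags) {
            next if $seen{$tag}++;
            $count{$tag}++;
        }
    }

    my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

    my @top;
    for my $tag (@ranked) {
        # Stop once we have enough tags and this one is strictly less popular.
        last if @top >= $limit && $count{$tag} < $count{ $top[-1] };
        push @top, $tag;
    }
    return @top;
}
```

For example, top_tags(3, [qw(elephant tiger)], [qw(elephant monkey)], [qw(elephant monkey tiger)], [qw(zebra)]) yields elephant, monkey, tiger.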


Getting Started


Lab Requirements


Downloading Files

As usual, download files from /afs/andrew.cmu.edu/course/15/123/downloads/lab6

Handing in your Solution

Your solution should be in the form of a .pl file. Submit to /afs/andrew.cmu.edu/course/15/123/handin/lab6

Grading

Your program will be graded according to the rubric.txt file given in the downloads folder.

FAQ

We always try to maintain an updated FAQ.txt, accessible from Bb->labs. Please read the FAQ.txt file if you have any questions. If you cannot find the answer in FAQ.txt, please send email to any member of the course staff, with a cc to guna@andrew.

Better Mouse Trap

Do you have a creative idea about how to make this lab more interesting and practical? If so, please write the idea(s) in a file called idea.txt and place it in the handin folder. If we like your idea, we will seriously consider using it in future labs and, more importantly, will give you some extra credit in this lab and an acknowledgment in future labs.


The original idea for the lab came from Hassan Rom, CS major, 2006.