15-719: Advanced Cloud Computing (Fall 2014)

Project 1 overview and FAQ

The goal of project 1 is to learn to write simple iterative big data processing jobs (using MapReduce and Spark) and to learn to run them effectively within AWS's elastic computing infrastructure. By comparing MapReduce with Spark, you will learn the strengths and weaknesses of the MapReduce framework.
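To make the programming model concrete before diving into the handout: the core of a MapReduce job is a map function that emits key/value pairs and a reduce function that aggregates all values for a key, with the framework handling the shuffle in between. Below is a minimal sketch of that flow in plain Python (word count, the canonical example); it simulates the three phases locally and is not EMR- or Spark-specific code.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in text.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate all counts for one word.
    return (key, sum(values))

# Tiny illustrative input (hypothetical documents).
docs = {1: "the quick brown fox", 2: "the lazy dog the fox"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["the"] == 3, counts["fox"] == 2
```

In a real MapReduce or Spark job you write only the map and reduce logic; the cluster framework parallelizes the phases and performs the shuffle across machines.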

The project handout is available here. This document will be updated for Phases 2 and 3. The project presentation is available here. Frequently asked questions will be answered here.

The project will not be done in groups; each student does this project on their own.

Readings helpful for project 1

1. Dean, Jeffrey and Ghemawat, Sanjay. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation, Volume 6, 2004. See the lecture readings for [Dean04].
2. Zaharia, Matei, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010. See the lecture readings for [Zaharia10].
3. Amazon EMR developer guide.

Last updated: Wed Sep 10 19:55:56 -0400 2014