Timing Hadoop programs

In the HWs, you need to time your programs. This includes:
  1. Wall clock time of the whole Hadoop job from start to finish.
    Look in the web interface for the wall clock time of your job.
  2. Average Map, Shuffle, and Reduce task times, as shown in the Job Tracker History of the web interface.
    Open Job Tracker History from the bottom of the job tracker page, click the Hadoop job you want timings for (it must have finished successfully), and click "Analyse This Job".
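As a rough illustration of what these numbers mean, the sketch below derives a wall clock time and average phase times from per-task timestamps. It assumes each task has a start and finish time and each reduce task also records when its shuffle finished, with shuffle time counted from task start to shuffle finish and reduce time from shuffle finish to task finish. The `TaskRecord` type and its fields are hypothetical, and the "Analyse This Job" page may attribute phases (for example the sort) differently, so treat this as a model, not the page's exact formula.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    # Hypothetical per-task record; times are in milliseconds since the epoch.
    start: int
    finish: int
    shuffle_finished: Optional[int] = None  # only meaningful for reduce tasks


def average(values: List[float]) -> float:
    # Mean of a list; 0.0 for an empty list.
    return sum(values) / len(values) if values else 0.0


def job_timings(job_start: int, job_finish: int,
                maps: List[TaskRecord], reduces: List[TaskRecord]) -> dict:
    """Return the wall clock time and average phase times, in seconds."""
    return {
        "wall_clock": (job_finish - job_start) / 1000,
        "avg_map": average([(t.finish - t.start) / 1000 for t in maps]),
        # Assumption: shuffle runs from reduce-task start until the shuffle finishes.
        "avg_shuffle": average([(t.shuffle_finished - t.start) / 1000 for t in reduces]),
        # Assumption: the reduce phase is everything after the shuffle.
        "avg_reduce": average([(t.finish - t.shuffle_finished) / 1000 for t in reduces]),
    }


# Made-up timestamps, for illustration only.
maps = [TaskRecord(0, 4000), TaskRecord(0, 6000)]
reduces = [TaskRecord(1000, 10000, shuffle_finished=7000)]
print(job_timings(0, 12000, maps, reduces))
# -> {'wall_clock': 12.0, 'avg_map': 5.0, 'avg_shuffle': 6.0, 'avg_reduce': 3.0}
```

Note that the wall clock time is not the sum of the averages: tasks run in parallel and overlap, which is exactly why both kinds of timing are requested.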

For some of the programs, you are also asked to vary the number of reduce tasks to see how the timings change. For the HWs, it is sufficient to vary only the #reduce tasks, using the following setups: 1, 2, 4, and 8 reduces, with #maps set to 1.
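One way to keep the four runs consistent is to generate one command per setup. The sketch below only builds the command lines. It assumes your driver is launched through `ToolRunner`, so that generic `-D` options are honored, and it uses the Hadoop 1.x property names `mapred.map.tasks` and `mapred.reduce.tasks`. The jar name, driver class, and HDFS paths are hypothetical placeholders.

```python
from typing import List

REDUCE_SETUPS = [1, 2, 4, 8]  # the setups listed above
NUM_MAPS = 1                   # a hint only; the input block count can raise it

# Hypothetical placeholders: substitute your own jar, driver class, and paths.
JAR = "hw.jar"
DRIVER = "WordCount"
INPUT = "/user/me/input"
OUTPUT_PREFIX = "/user/me/output"


def sweep_commands() -> List[List[str]]:
    """Build one `hadoop jar` command per reduce setup.

    Assumes the driver uses ToolRunner, so the generic -D options are parsed.
    Each run writes to its own output directory because Hadoop refuses to
    write into an output directory that already exists.
    """
    commands = []
    for reduces in REDUCE_SETUPS:
        commands.append([
            "hadoop", "jar", JAR, DRIVER,
            "-D", f"mapred.map.tasks={NUM_MAPS}",
            "-D", f"mapred.reduce.tasks={reduces}",
            INPUT, f"{OUTPUT_PREFIX}-r{reduces}",
        ])
    return commands


if __name__ == "__main__":
    for cmd in sweep_commands():
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell; the timings themselves are still read from the web interface as described above.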
You may choose other combinations of #maps and #reduces, but you need to justify your choices. If you want to try different #maps, note that the value you set is only a minimum number of map tasks: the actual number also depends on how many input blocks there are. For example, the 1GB input file has 18 blocks, so if you set the #map tasks below 18, the job will still run 18 map tasks.
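Why the block count acts as a floor can be seen in a simplified model of how the classic (old-API) `FileInputFormat` sizes its splits: the goal split size is the file size divided by the requested number of maps, and the split size actually used is that goal capped at the block size. The sketch assumes a single input file, the default minimum split size, and the 10% slop the classic implementation allows for the last split; it is an approximation, not the exact Hadoop code. The 64 MB block size in the usage lines is an assumption chosen only to build a file of exactly 18 blocks.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def estimate_map_tasks(file_size: int, block_size: int,
                       requested_maps: int, min_split_size: int = 1) -> int:
    """Estimate the number of input splits (map tasks) for one file.

    Simplified model of the classic FileInputFormat split computation:
      goal_size  = file_size / requested_maps
      split_size = max(min_split_size, min(goal_size, block_size))
    """
    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits


BLOCK = 64 * 1024 * 1024  # assumed block size, for illustration only
print(estimate_map_tasks(18 * BLOCK, BLOCK, requested_maps=1))   # -> 18
print(estimate_map_tasks(18 * BLOCK, BLOCK, requested_maps=36))  # -> 36
```

Asking for fewer maps than blocks cannot make a split larger than a block, so the count stays at 18; asking for more shrinks the goal size and produces more, smaller splits.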

Timings for the same job will vary from run to run. If you think it is necessary, you can run the job several times and take the grand average, but this is not required for the HWs.
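If you do repeat runs, the grand average is the mean of each timing across runs. The sketch below averages per-run timing dictionaries key by key and also reports the sample standard deviation, a quick way to judge how large the run-to-run variation is. The dictionary layout and key names are assumptions; every run is expected to record the same keys.

```python
import statistics
from typing import Dict, List, Tuple


def grand_average(runs: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Average each timing (in seconds) over several runs of the same job.

    Returns {timing_name: (mean, sample standard deviation)}. The standard
    deviation is reported as 0.0 when there is only one run, since
    statistics.stdev needs at least two data points.
    """
    if not runs:
        raise ValueError("need at least one run")
    result = {}
    for key in runs[0]:
        values = [run[key] for run in runs]
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        result[key] = (statistics.mean(values), spread)
    return result


# Made-up numbers, for illustration only.
runs = [
    {"wall_clock": 100.0, "avg_map": 20.0},
    {"wall_clock": 110.0, "avg_map": 22.0},
    {"wall_clock": 105.0, "avg_map": 21.0},
]
print(grand_average(runs))
```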