Frequently Asked Questions on HW1

  1. Q: Can I use Eclipse + plugin instead of NetBeans?
    A: The NetBeans plugin is more powerful than the Eclipse plugin. Unless you have prior Hadoop programming experience, we highly recommend that you use NetBeans + plugin. If you plan to use Eclipse, you need to get permission from the instructors by demonstrating your prior Hadoop experience.
  2. Q: Cannot install NetBeans plugin on unix.andrew.cmu.edu?
    A: The default NetBeans on that host is version 6.5. Please install version 6.7 or higher and make sure you launch that version.
  3. Q: Default program generated by NetBeans plugin does not produce any results or output when running?
    A: If the workflow view of your default program runs fine but your jar does not run correctly on Hadoop, check whether the main class of your project is set correctly. See instructions.
  4. Q: Default program doesn't run, or there is no jar file after building the project?
    A: Make sure the project libraries include Hadoop-0.20.0 and ONLY Hadoop-0.20.0.
  5. Q: Drop down menu in NetBeans not working?
    A: If you are using multiple monitors and viewing NetBeans on a secondary monitor, drop down menus (for example the ones in the workflow view) may fail to display. Using NetBeans on the primary monitor should solve the problem.
  6. Q: Program runs fine locally but doesn't run on the cluster?
    A: Use the web interface to look at the error logs of your job, your map tasks, or your reduce tasks to find out exactly what the problem is.
  7. Q: Default program generated an output directory on HDFS, but the directory contains no text output, or the output is exactly the same as the input text?
    A: You probably created your Hadoop MapReduce Job class before you included the Hadoop-0.20 library. Try including the library first, and then create the job class.
    Generally, if a program runs OK locally but fails on the cluster, you should debug your program. Start by looking at the output from the web interface.
  8. Q: How to kill a long-running Hadoop job on the cluster?
    A: To kill a Hadoop job, run:
        hadoop job -kill "jobID"
    For example:
        hadoop job -kill job_201001200252_0269
  9. Q: The instructions for accessing the job monitoring page don't work. Help!
    A: Yes, this is confusing, and depends on whether you're using a machine on the CMU network or not. If you are ssh'ing into a CMU machine, and then the cluster, follow these steps:
    1. Start X11 on the machine (desktop, laptop) that you are physically at (your LOCAL machine) if one isn't already running.
    2. From xterm on your LOCAL machine, type ssh -X CMUHOST.cmu.edu to log into a Linux/Unix machine on the CMU network. The -X option is important -- it will set up your X11 environment on CMUHOST to forward all X11 connections to your LOCAL machine.
    3. Type firefox & to start Firefox in the background on CMUHOST. This should open in an X window on your LOCAL machine.
    4. Type the tunneling command (ssh -L...) from the status page instructions. This will log you into the cluster login node, but just leave that terminal window open.
    5. In the Firefox window on your LOCAL machine (actually running on the CMUHOST machine, but forwarding X), set up the HTTP proxy server to be localhost:8888
    6. In that same window, go to the job status page: http://ltijt.opencloud:40030/
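
For question 8, finding and killing a job can be sketched roughly as follows. The hadoop job -list and hadoop job -kill commands are standard Hadoop commands; the is_job_id and kill_job helpers are hypothetical names added here for illustration, and the ID pattern is only an assumption based on the example ID job_201001200252_0269.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): succeed only if the argument
# looks like a job ID. Assumes IDs have the form job_<digits>_<digits>,
# as in the FAQ example job_201001200252_0269.
is_job_id() {
  printf '%s\n' "$1" | grep -Eq '^job_[0-9]+_[0-9]+$'
}

# Hypothetical wrapper: refuse to call hadoop with a malformed ID,
# for example the literal placeholder text "jobID".
kill_job() {
  if is_job_id "$1"; then
    hadoop job -kill "$1"
  else
    echo "not a job ID: $1" >&2
    return 1
  fi
}

# Typical use on the cluster login node:
#   hadoop job -list                  # shows running jobs and their IDs
#   kill_job job_201001200252_0269    # kills the job with that ID
```

The format check is there only to catch a pasted placeholder or a truncated ID before anything is sent to the cluster; calling hadoop job -kill directly works just as well.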