Software (install the following software on your machine; Linux or Mac only)
(On a Windows desktop, which we will call {WINDESK}, install PuTTY so you can SSH to a Linux machine; see instructions. We will call this Linux machine {LINUXSERVER}.)
- Java (JDK 1.6.0 or higher), NetBeans, and the Hadoop plugin for NetBeans
For Java, make sure you install it in a directory you have write permission for. If you are using unix.andrew.cmu.edu as your {LINUXSERVER}, you don't need to install Java; the default version there is 1.6.0.
For NetBeans, any version with Java SE support should work. Follow this install guide.
- If you are running a Linux or Mac desktop, follow the install guide directly.
- If you are running Windows and SSH into a Linux machine (e.g. unix.andrew.cmu.edu) as your {LINUXSERVER}:
- First, download the installation file (e.g. netbeans-6.7.1-ml-javase-linux.sh) to {LINUXSERVER}.
(Alternatively, download it to {WINDESK} and transfer it to {LINUXSERVER} using SSH Secure Shell or WinSCP. Make sure to transfer the file in binary mode; otherwise it will be corrupted when copied from Windows to Linux.)
- Next, install X-Win on your desktop (downloadable from here).
- Launch an X-Win shell window from {WINDESK}; you will run the installer from this window.
- From the X-Win shell window, run the NetBeans install program on {LINUXSERVER}, e.g.: sh netbeans-6.7.1-ml-javase-linux.sh
- Your Windows desktop should receive a request from {LINUXSERVER} to display the installation GUI; accept the request.
- Follow the install process through, and make sure you install NetBeans into a directory on {LINUXSERVER} or AFS that you have write permission for.
For the Hadoop plugin for NetBeans, closely follow the docs here up to the Installation Guide.
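The quickest way to confirm the JDK requirement above is to run java -version (and javac -version) on the machine you will compile on. As a minimal sketch of the same check from Java, the hypothetical class JavaVersionCheck below reads the standard java.version system property and compares its major version against 6 (i.e. 1.6.0). It assumes the two usual version-string formats, "1.6.0_17" before Java 9 and "11.0.2" or "17" from Java 9 on, and it reports on the JVM that runs it, so run it with the same installation you compile with.

```java
public class JavaVersionCheck {

    /**
     * Returns the major Java version from a java.version string (assumed formats).
     * Legacy scheme: "1.6.0_17" -> 6; newer scheme: "11.0.2" -> 11, "21-ea" -> 21.
     */
    public static int majorVersion(String version) {
        // Split on the separators used by both schemes: '.', '_', '+', '-'.
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the string starts with "1.", so the major version is the second field.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** True if the version string is at least the given major version. */
    public static boolean meetsMinimum(String version, int minMajor) {
        return majorVersion(version) >= minMajor;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // This guide needs JDK 1.6.0 or higher, i.e. major version 6.
        boolean ok = meetsMinimum(version, 6);
        System.out.println("java.version = " + version
                + (ok ? " (OK)" : " (too old: need 1.6.0 or higher)"));
    }
}
```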
- Test compilation on your local machine or {LINUXSERVER}
Compilation can be automatic (right-click your project -> Properties -> Compiling, then check Compile on Save), or you can compile manually by right-clicking the project -> Build.
- Build the source code into a jar package
- Enable project -> Properties -> Build -> Packaging -> Build JAR after Compiling.
- Right-click your project -> Build.
- The resulting yourPackage.jar will be in the project/dist directory.
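Before copying the jar anywhere, it can be worth checking what NetBeans actually packaged. The sketch below (hypothetical class JarInspector, using only java.util.jar from the standard library) prints the manifest's Main-Class attribute, if any, and the .class entries in the jar. This matters for the run step: as we understand Hadoop's RunJar, hadoop jar uses the Main-Class from the manifest when it is present, and otherwise expects the main class name as the first argument after the jar.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class JarInspector {

    /** Returns the Main-Class attribute from the jar's manifest, or null if absent. */
    public static String mainClass(String jarPath) throws IOException {
        JarFile jar = new JarFile(jarPath);
        try {
            Manifest manifest = jar.getManifest(); // null if the jar has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } finally {
            jar.close();
        }
    }

    /** Lists the .class entries in the jar, e.g. "org/myorg/WordCount.class". */
    public static List<String> classEntries(String jarPath) throws IOException {
        List<String> classes = new ArrayList<String>();
        JarFile jar = new JarFile(jarPath);
        try {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    classes.add(name);
                }
            }
        } finally {
            jar.close();
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarInspector dist/yourPackage.jar
        String jarPath = args[0];
        System.out.println("Main-Class: " + mainClass(jarPath));
        for (String c : classEntries(jarPath)) {
            System.out.println("  " + c);
        }
    }
}
```

A Main-Class of null only means no main class was recorded in the manifest; in that case pass the fully qualified class name after the jar when running it.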
- Transfer yourPackage.jar to the machine that runs Hadoop; we will call it {HADOOP_MASTER}.
- Create a project directory {Project} on {HADOOP_MASTER}.
- Copy the jar into that directory (using scp, WinSCP, a secure shell client, etc.).
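Because a text-mode transfer can silently corrupt a jar (see the binary-transfer note in the NetBeans steps), comparing checksums on both machines is a cheap sanity check. On Linux, md5sum yourPackage.jar does this directly (md5 on a Mac). The hypothetical Md5Check class below is a minimal Java equivalent built on java.security.MessageDigest; it prints the digest in the same "hash  filename" layout as md5sum, so the two outputs can be compared by eye.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {

    /** Returns the lowercase hex MD5 digest of everything read from the stream. */
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int n;
        // Feed the file to the digest in chunks so large jars need not fit in memory.
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Md5Check yourPackage.jar   (run on both machines and compare)
        InputStream in = new FileInputStream(args[0]);
        try {
            System.out.println(md5Hex(in) + "  " + args[0]);
        } finally {
            in.close();
        }
    }
}
```

If the digests printed on your desktop and on {HADOOP_MASTER} differ, copy the jar again in binary mode.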
- Upload data to HDFS (Hadoop Distributed File System)
Run: hadoop fs -put src dest (src is a path on the local file system, dest a path in HDFS). Alternatively, if you configured the NetBeans plugin, you can browse and modify HDFS files from the plugin.
- Run a sample program:
- Run: hadoop jar yourPackage.jar arguments...
e.g. hadoop jar /usr/local/sw/hadoop/hadoop-0.20.1-examples.jar wordcount input output
(The output directory must not already exist in HDFS; otherwise the job fails.)
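The wordcount example counts how many times each whitespace-separated token occurs in the files under input: map tasks emit a (word, 1) pair per token, the framework groups the pairs by word, and reduce tasks sum them. The plain-Java sketch below (hypothetical class LocalWordCount; it does not use the Hadoop API) reproduces that computation in memory for a tiny input, which can help you predict what the job should produce before running it on the cluster. A TreeMap stands in for the sort-and-group step, so the result comes out sorted by word, as it does within each reducer's output file (for ASCII words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    /** "Map" step: split a line on whitespace; each token stands for one (word, 1) pair. */
    public static List<String> map(String line) {
        List<String> words = new ArrayList<String>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken());
        }
        return words;
    }

    /** "Shuffle" and "reduce" steps: group identical words (sorted by the TreeMap) and sum the 1s. */
    public static TreeMap<String, Integer> countWords(List<String> lines) {
        TreeMap<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String word : map(line)) {
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    /** Formats the result as "word<TAB>count" lines, one per distinct word. */
    public static List<String> format(TreeMap<String, Integer> counts) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello hadoop", "hello world");
        for (String line : format(countWords(input))) {
            System.out.println(line);
        }
    }
}
```

For the two sample lines in main it prints hadoop 1, hello 2 and world 1, one pair per line with a tab between word and count. If the real job runs with a single reduce task, output/part-00000 should use the same word, tab, count layout.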
To see the output on HDFS:
- Run: hadoop fs -ls output. You should see one or more files named part-xxxxx (one per reduce task).
- To check the first 10 lines of output/part-00000, run: hadoop fs -cat output/part-00000 | head -n 10
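If you would rather inspect results locally, hadoop fs -get output/part-00000 . copies the part file to your current directory. The hypothetical PartFileHead class below is a minimal Java stand-in for head -n 10 on that local copy; it also splits each line at the first tab, which is how the default text output separates a key (here, the word) from its value (the count).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class PartFileHead {

    /** Returns at most n lines from the reader, like `head -n n`. */
    public static List<String> head(Reader source, int n) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        List<String> lines = new ArrayList<String>();
        String line;
        // Stop after n lines or at end of input, whichever comes first.
        while (lines.size() < n && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    /** Splits a "key<TAB>value" line at the first tab; returns {line, ""} if there is no tab. */
    public static String[] splitKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PartFileHead part-00000   (local copy; read with the platform default encoding)
        Reader in = new FileReader(args[0]);
        try {
            for (String line : head(in, 10)) {
                String[] kv = splitKeyValue(line);
                System.out.println(kv[0] + " -> " + kv[1]);
            }
        } finally {
            in.close();
        }
    }
}
```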