Machine
Find your own Linux or Mac machine to debug and test locally.
If you have a Windows desktop, you will have to use a Linux server as your local machine. The following extra steps let you access the Linux machine from your Windows desktop:
- The idea is to use X-win on your desktop to view and control NetBeans running on a Linux server. (Look here for a nice guide for using X-win and putty.)
- The server can be any public Andrew machine, e.g. unix.andrew.cmu.edu; call it {LINUXSERVER}.
You will need SSH client software, such as SSH Secure Shell, installed on your Windows desktop ({WINDESK}) to connect to the Andrew machines.
You can log in to these machines via your SSH client or X-win, using your Andrew ID and password.
You will be using Andrew AFS for storage on these machines.
These machines are restarted every day, so make sure you save your work.
- To run NetBeans on {LINUXSERVER} and control its GUI from your Windows desktop (call it {WINDESK}):
- run X-win on your {WINDESK} and keep it running in the background,
- open a shell window in X-win connected to {LINUXSERVER}.
- The above procedure is needed for installing NetBeans as well as for running it.
- in the shell, go to the directory where you installed NetBeans and run netbeans-6.8/bin/netbeans
If you are using unix.andrew.cmu.edu, do NOT run "netbeans" directly in the shell: this invokes the default version 6.5, and you will not be able to install the plugin on version 6.5.
- on {WINDESK}, accept the access request from {LINUXSERVER}, and a new window will open
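The launch step above can be sketched as a small shell helper that starts NetBeans by its full path, so the default "netbeans" (version 6.5) is never picked up by accident. This is only a sketch: the NB_HOME variable and its default location are hypothetical, so point it at the directory where you actually installed NetBeans 6.8.

```shell
# Hypothetical install location; change it to wherever you installed NetBeans 6.8.
NB_HOME="${NB_HOME:-$HOME/netbeans-6.8}"

# Print the launcher path if it exists and is executable; otherwise report an error and fail.
netbeans_launcher() {
    launcher="$NB_HOME/bin/netbeans"
    if [ -x "$launcher" ]; then
        printf '%s\n' "$launcher"
    else
        printf 'NetBeans launcher not found at %s\n' "$launcher" >&2
        return 1
    fi
}

# Start NetBeans in the background only when the launcher is present.
if launcher_path="$(netbeans_launcher)"; then
    "$launcher_path" &
fi
```

If the launcher is missing, the helper prints the path it tried, which is usually enough to spot a wrong install directory.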
Hadoop Cluster
- Login node
- Use an SSH client (such as SSH Secure Shell or putty) to connect to ltilogin.cloud.pdl.cmu.local. When prompted, enter your Andrew ID and the initial password (provided in class). You should change your initial password after you log in, using the command "passwd".
- Your home directory should be /mnt/ltihome/{ANDREW_ID} and you can upload jar packages of your Hadoop programs there.
- You will submit Hadoop jobs to the cluster via the login node
- For security reasons, the login node is accessible only from within CMU. To access it from elsewhere, either use the VPN or first ssh to a CMU Linux server and then ssh from there to the login node.
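For the off-campus case, the two-hop route can be sketched as follows. This assumes unix.andrew.cmu.edu is the CMU Linux server you hop through and that the same Andrew ID is used on both machines; the ANDREW_ID default is a placeholder. The sketch only prints the command so you can review it before running it yourself.

```shell
# Hypothetical placeholder; replace with your own Andrew ID.
ANDREW_ID="${ANDREW_ID:-your_andrew_id}"
GATEWAY="unix.andrew.cmu.edu"               # assumed gateway: a public Andrew machine
LOGIN_NODE="ltilogin.cloud.pdl.cmu.local"   # the cluster login node

# Build the two-hop command: ssh to the gateway, then ssh from there to the login node.
# -t allocates a terminal on the gateway so the inner ssh can prompt for a password.
two_hop_cmd() {
    printf 'ssh -t %s@%s ssh %s@%s\n' "$ANDREW_ID" "$GATEWAY" "$ANDREW_ID" "$LOGIN_NODE"
}

# Print the command for review; paste it into your terminal to connect.
two_hop_cmd
```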
- Hadoop
The remote cluster runs Hadoop version 0.20.1 (at /usr/local/sw/hadoop/). You need to choose the same version in the NetBeans plugin when including libraries for your Hadoop projects.
HDFS: before running any Hadoop jobs, create your own directory on HDFS by running: hadoop fs -mkdir /user/{ANDREW_ID}.
If the Linux shell cannot locate the hadoop executable, run this exact command in the shell: setenv PATH {$PATH}:/usr/local/sw/hadoop/bin/ . It adds the Hadoop directory to the PATH environment variable so that the shell can find the executable. (setenv is csh/tcsh syntax.)
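If your shell turns out to be sh or bash rather than csh/tcsh (an assumption, since setenv is not available there), the same two steps might look like the sketch below. ANDREW_ID is a placeholder you must set yourself, and the hadoop call is skipped when the executable cannot be found.

```shell
# Hadoop location as given in the text above.
HADOOP_BIN="/usr/local/sw/hadoop/bin"

# Append the Hadoop bin directory to PATH only if it is not already there (sh/bash syntax).
case ":$PATH:" in
    *":$HADOOP_BIN:"*) ;;
    *) PATH="$PATH:$HADOOP_BIN"; export PATH ;;
esac

# Create your HDFS directory when the hadoop command is available.
# ANDREW_ID is a placeholder; set it to your own Andrew ID first.
if command -v hadoop >/dev/null 2>&1 && [ -n "${ANDREW_ID:-}" ]; then
    hadoop fs -mkdir "/user/$ANDREW_ID"
fi
```

The PATH change lasts only for the current shell session; rerun it (or add it to your shell startup file) after logging in again.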
- Politeness
Be polite to your fellow students:
- Use at most 8 reducers in total (you can run multiple Hadoop jobs, but the total number of reducers should be <= 8).
- Do NOT store more than 1GB of data on the login node. Any large data should go to HDFS.
- Do NOT keep multiple copies of your output index. Delete old ones when you are building another.
- Do NOT run NetBeans or any heavy-duty program on the login node.
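Two of the rules above can be checked from the shell. The sketch below measures how much space a directory uses against the 1GB limit and removes an old output directory from HDFS. OLD_INDEX is a hypothetical variable naming the HDFS path you want to delete; "hadoop fs -rmr" is the recursive delete in Hadoop 0.20.x, and it is skipped here when hadoop is not on the PATH.

```shell
# Limit from the rules above: 1 GB, expressed in kilobytes.
LIMIT_KB=1048576

# Print the disk usage of a directory in kilobytes (defaults to your home directory).
usage_kb() {
    du -sk "${1:-$HOME}" | cut -f1
}

# Succeed when the directory is within the limit, fail otherwise.
within_limit() {
    [ "$(usage_kb "${1:-}")" -le "$LIMIT_KB" ]
}

# Remove an old output index from HDFS, only if hadoop is available and OLD_INDEX is set.
if command -v hadoop >/dev/null 2>&1 && [ -n "${OLD_INDEX:-}" ]; then
    hadoop fs -rmr "$OLD_INDEX"
fi
```

For example, running "within_limit || echo over the limit" on the login node checks your home directory.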
- Documentation
About Web Interface
FAQ about the cluster.