The "Cloudera Hadoop AMI Instances":http://www.cloudera.com/hadoop-ec2 for Amazon's EC2 compute cloud are the fastest, easiest way to get up and running with Hadoop. Unfortunately, developing streaming scripts can be a pain, especially if you're doing iterative development.

Installing NFS to share files across the cluster gives the following conveniences:

* You don't have to bundle everything up with each run: any path in the shared home directory (@~chimpy/@ for the user created below) will refer back via NFS to the filesystem on the master.
* The user can now ssh among the nodes without a password, since there's only one shared home directory and since we included the user's own public key in the @authorized_keys2@ file. This lets you easily rsync files among the nodes.

First, you need to take note of the _internal_ name for your master, perhaps something like @domU-xx-xx-xx-xx-xx-xx.compute-1.internal@.

As root, on the master (change @compute-1.internal@ to match your setup):
    apt-get install nfs-kernel-server
    echo "/home *.compute-1.internal(rw)" >> /etc/exports
    /etc/init.d/nfs-kernel-server restart   # restart so the new export takes effect
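If you'd rather not edit the domain by hand, the export line can be derived from the master's internal name. This is only a minimal sketch: @export_line_for@ is a hypothetical helper and the hostname in the example is made up. It assumes the internal name has the form @host.domain@.

```shell
# export_line_for FQDN [DIR]  (hypothetical helper, not part of the AMI)
# Prints an /etc/exports line that shares DIR (default /home) read-write
# with every host in the same domain as FQDN.
export_line_for() {
    fqdn="$1"
    dir="${2:-/home}"
    domain="${fqdn#*.}"    # drop everything up to and including the first dot
    printf '%s *.%s(rw)\n' "$dir" "$domain"
}

# Example with a made-up internal name:
export_line_for domU-12-31-39-00-00-01.compute-1.internal
```

On the master, something like @export_line_for "$(hostname -f)" >> /etc/exports@ should then do the job, assuming @hostname -f@ reports the internal name on your instance. Check the output before appending it.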
(The @*.compute-1.internal@ part limits host access, but you should take a look at the security settings of both EC2 and the built-in portmapper as well.)

Next, set up a regular user account on the *master only*. In this case our user will be named 'chimpy':
  visudo # uncomment the last line, to allow group sudo to sudo
  groupadd admin 
  adduser chimpy
  usermod -a -G sudo,admin chimpy
  su chimpy                  # now you are the new user
  ssh-keygen -t rsa          # accept all the defaults
  cat ~/.ssh/id_rsa.pub      # can paste this public key into your github, etc
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
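Note that the last command appends another copy of the key every time it is rerun. A minimal sketch of a rerunnable version follows. @authorize_key@ is a hypothetical helper, not an ssh tool, and the @chmod@ is a precaution rather than something the steps above require.

```shell
# authorize_key PUBKEY_FILE AUTH_FILE  (hypothetical helper)
# Appends the public key to AUTH_FILE only if that exact line is not there yet.
authorize_key() {
    pubkey_file="$1"
    auth_file="$2"
    key=$(cat "$pubkey_file") || return 1
    # -x: match the whole line, -F: treat the key as a fixed string, not a regex
    grep -qxF "$key" "$auth_file" 2>/dev/null || printf '%s\n' "$key" >> "$auth_file"
    # keep the file readable and writable by its owner only
    chmod 600 "$auth_file"
}
```

As the new user, it would be called as @authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys2@.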
Then, as root on each slave (replacing @domU-xx-...@ with the internal name of the master node):
    apt-get install nfs-common
    echo "domU-xx-xx-xx-xx-xx-xx.compute-1.internal:/home  /mnt/home  nfs  rw  0 0"  >> /etc/fstab
    /etc/init.d/nfs-common restart
    mkdir /mnt/home
    mount /mnt/home
    ln -s /mnt/home/chimpy /home/chimpy
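Before launching jobs, it's worth confirming that the mount actually took on each slave. Here is a minimal sketch, assuming a Linux slave where @/proc/mounts@ lists the active mounts. @is_nfs_mounted@ is a hypothetical helper, and its optional second argument exists only so the check can be tried against a sample file.

```shell
# is_nfs_mounted MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper)
# Succeeds if MOUNTPOINT appears in MOUNTS_FILE with an nfs filesystem type.
is_nfs_mounted() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    # fields per line: device, mount point, filesystem type, options, ...
    awk -v mp="$mountpoint" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example: @is_nfs_mounted /mnt/home && echo "shared home is mounted"@.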
You should now be in business. Performance tradeoffs should be small as long as you're just sending code files and gems around. *Don't* write out log entries or data to NFS partitions, or you'll effectively perform a denial-of-service attack on the master node.

------------------------------

The "Setting up an NFS Server HOWTO":http://nfs.sourceforge.net/nfs-howto/index.html was an immense help, and I recommend reading it carefully.