The "Cloudera Hadoop AMI Instances": for Amazon's EC2 compute cloud are the fastest, easiest way to get up and running with Hadoop. Unfortunately, developing streaming scripts can be a pain, especially if you're doing iterative development.

Installing NFS to share files across the cluster gives the following conveniences:

* You don't have to bundle everything up with each run: any path in ~coder/ will refer back via NFS to the filesystem on master.
* The user can now ssh among the nodes without a password, since there's only one shared home directory and since we included the user's own public key in the authorized_keys2 file. This lets you easily rsync files among the nodes.

First, you need to take note of the _internal_ name for your master, perhaps something like @domU-xx-xx-xx-xx-xx-xx.compute-1.internal@.

As root, on the master (change @compute-1.internal@ to match your setup):

<pre>
apt-get install nfs-kernel-server
echo "/home *.compute-1.internal(rw)" >> /etc/exports
/etc/init.d/nfs-kernel-server restart
</pre>

(The @*.compute-1.internal@ part limits host access, but you should take a look at the security settings of both EC2 and the built-in portmapper as well.)

Next, set up a regular user account on the *master only*. In this case our user will be named 'chimpy':

<pre>
visudo                  # uncomment the last line, to allow group sudo to sudo
groupadd admin
adduser chimpy
usermod -a -G sudo,admin chimpy
su chimpy               # now you are the new user
ssh-keygen -t rsa       # accept all the defaults
cat ~/.ssh/             # can paste this public key into your github, etc
cat ~/.ssh/ >> ~/.ssh/authorized_keys2
</pre>

Then on each slave, as root (replacing @domU-xx-...@ with the internal name for the master node):

<pre>
apt-get install nfs-common
echo "domU-xx-xx-xx-xx-xx-xx.compute-1.internal:/home /mnt/home nfs rw 0 0" >> /etc/fstab
/etc/init.d/nfs-common restart
mkdir /mnt/home
mount /mnt/home
ln -s /mnt/home/chimpy /home/chimpy
</pre>

You should now be in business.

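
One wrinkle with the slave recipe: it appends to @/etc/fstab@ with @>>@, so running it twice on the same node leaves a duplicate entry. Here is a minimal sketch of a guard against that. The function names @nfs_fstab_line@ and @append_once@ and the @MASTER@ variable are made up for illustration and are not part of the original setup.

```shell
#!/bin/sh
# Sketch: build the fstab entry for the NFS-shared /home and add it only once.
# nfs_fstab_line, append_once and MASTER are illustrative names (assumptions).

# Print the fstab line from the slave setup for a given master hostname.
nfs_fstab_line() {
  printf '%s:/home /mnt/home nfs rw 0 0\n' "$1"
}

# Append a line to a file unless an identical whole line is already there.
append_once() {
  file=$1
  line=$2
  # -x matches whole lines, -F treats the line as a fixed string, not a regex
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, as root on a slave (commented out so sourcing has no side effects):
# MASTER=domU-xx-xx-xx-xx-xx-xx.compute-1.internal
# append_once /etc/fstab "$(nfs_fstab_line "$MASTER")"
```

With this in place you would swap the @echo ... >> /etc/fstab@ line for the @append_once@ call and leave the rest of the slave steps as they are.
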
Performance tradeoffs should be small as long as you're just sending code files and gems around. *Don't* write out log entries or data to NFS partitions, or you'll effectively perform a denial-of-service attack on the master node.

------------------------------

The "Setting up an NFS Server HOWTO": was an immense help, and I recommend reading it carefully.