--- layout: default title: Install Notes collapse: false --- h1(gemheader). {{ site.gemname }} %(small):: install% ** "Get the code":#getcode ** "Setup":#setup ** "Installing and Running Wukong with Hadoop":#gethadoop ** "Installing and Running Wukong with Datamapper, ActiveRecord, the command-line and more":#others
h2(#getcode). Get the code We're still actively developing {{ site.gemname }}. The newest version is available via "Git":http://git-scm.com on "github:":http://github.com/mrflip/{{ site.gemname }} pre. $ git clone git://github.com/mrflip/{{ site.gemname }} A gem is available from "gemcutter:":http://gemcutter.org/gems/{{ site.gemname }} pre. $ sudo gem install {{ site.gemname }} --source=http://gemcutter.org (don't use the gems.github.com version -- it's way out of date.) You can instead download this project in either "zip":http://github.com/mrflip/{{ site.gemname }}/zipball/master or "tar":http://github.com/mrflip/{{ site.gemname }}/tarball/master formats. h3. Get the Dependencies * Hadoop * Pig (optional) * Parts of {{ site.gemname }} require these gems: ** addressable/uri ** htmlentities ** extlib ** YAML ** JSON
h2(#setup). Setup 1. Allow Wukong to discover where his elephant friend lives by setting a $HADOOP_HOME environment variable: @export HADOOP_HOME="/usr/local/share/hadoop"@ 2. Add wukong's @bin/@ directory to your $PATH if you'd like to use the "wutils":wutils.html (see also: "Ruby Hadoop Quickstart":http://blog.pdatasolutions.com/post/191978092/ruby-on-hadoop-quickstart)
h2(#gethadoop). Installing and Running Wukong with Hadoop Wukong was primarily developed for Hadoop, and we think it's the best way to use Hadoop (it's certainly the most fun!). h3. Run Wukong on the Amazon AWS EC2 Cloud h3. Hadoop Infrastructure Even if you have a bunch of machines with spare cycles, lots of RAM, and a shared filesystem... do yourself a favor and start out using the "Cloudera AMIs on Amazon's EC2 cloud.":http://www.cloudera.com/hadoop-ec2 There are an overwhelming number of fiddly little parameters and you'll be glad for the user experience before you get into server setup. If it's still mid-late 2009 when you read this, ignore prudence and jump straight to using Hadoop 0.20. It will be a) more fun, b) much more robust (trust me, at "v0.20" you want to live on the bleeding edge), and c) you won't have to suffer through migrating your HDFS two weeks after setup. To set up hadoop, your best bet are the Cloudera AMIs on Amazon's EC2 compute cloud: * http://www.cloudera.com/hadoop-ec2 * http://www.cloudera.com/hadoop-ec2-ebs-beta EC2 means anyone with a $10 bill can rent a 10-machine cluster with 1TB of distributed storage for 8 hours. h3. Run Wukong using Amazon AWS Elastic MapReduce AWS Elastic MapReduce saves the trouble of even setting up a cluster: click, bam, there it is. Phil Ripperger has prepared a "Ruby Hadoop Quickstart":http://blog.pdatasolutions.com/post/191978092/ruby-on-hadoop-quickstart explaining how to get started with Wukong, Hadoop and the Amazon Elastic MapReduce cloud -- it's better than anything we could put here. Thanks Phil! h3. Set up a Hadoop cluster If you have a local cluster, or just want to experiment with a single-machine install, check out the Cloudera packages for both Debian/Ubuntu-based and Redhat/RPM-based Linux systems. h3. More Hadoop Notes I've braindumped some random notes on configuring and using hadoop "over here":hadoop-tips.html
h2(#others). Wukong isn't just Hadoop: Datamapper, ActiveRecord, command-line usage and more Wukong is used by many in an non-Hadoop environment -- anywhere you can stream data records, you can unleash its monkey power. Please see the "usage notes":usage.html#playnice for more!