---
layout: default
title: Usage notes
---

h1(gemheader). {{ site.gemname }} %(small):: usage%
h2. How to run a Wukong script

To run your script using local files and no connection to a Hadoop cluster:

pre. your/script.rb --run=local path/to/input_files path/to/output_dir

To run the command across a Hadoop cluster:

pre. your/script.rb --run=hadoop path/to/input_files path/to/output_dir

You can set the default in the config/wukong-site.yaml file, and then just use @--run@ instead of @--run=something@; it will use the default run mode.

If you're running @--run=hadoop@, all file paths are HDFS paths. If you're running @--run=local@, all file paths are local paths. (The path to your script itself, of course, lives on the local filesystem.)

You can supply arbitrary command-line arguments (they wind up as key-value pairs in the options hash your mapper and reducer receive), and you can use the Hadoop syntax to specify more than one input file:

pre. ./path/to/your/script.rb --any_specific_options --options=can_have_vals \
  --run "input_dir/part_*,input_file2.tsv,etc.tsv" path/to/output_dir

Note that all @--options@ must precede (in any order) all non-options.
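The options-first convention above can be illustrated in plain Ruby. This is a hypothetical sketch, not Wukong's actual argument parser: @split_args@ is a made-up helper that turns leading @--key=value@ arguments into a hash (a bare @--flag@ becomes @true@) and stops at the first non-option, which is one way to see why options have to come before the paths.

```ruby
# Hypothetical sketch, NOT Wukong's real parser: it only illustrates the
# "options first, then paths" convention described above.
def split_args(argv)
  options = {}
  rest = argv.dup
  # Consume leading --key or --key=value arguments; stop at the first non-option.
  while (arg = rest.first) && arg.start_with?("--") && arg.length > 2
    rest.shift
    key, val = arg[2..-1].split("=", 2)
    options[key.to_sym] = val.nil? ? true : val # a bare --flag becomes true
  end
  [options, rest] # everything from the first non-option onward is positional
end

opts, paths = split_args([
  "--any_specific_options", "--options=can_have_vals", "--run",
  "input_dir/part_*,input_file2.tsv,etc.tsv", "path/to/output_dir"
])
# opts sets :any_specific_options and :run to true, :options to "can_have_vals";
# paths holds the comma-separated input spec and the output directory.
puts opts[:options] # prints can_have_vals
puts paths.length   # prints 2

# In this sketch, an option placed after a path is not treated as an option.
late_opts, late_paths = split_args(["in.tsv", "--run=local", "out"])
puts late_opts.empty? # prints true
```

In this sketch the comma-separated input string is passed through untouched as a single positional argument, leaving it to Hadoop to interpret as multiple input paths.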
h2. How to test your scripts

To run the mapper on its own:

pre. cat ./local/test/input.tsv | ./examples/word_count.rb --map | more

or, if your test data lies on HDFS:

pre. hdp-cat test/input.tsv | ./examples/word_count.rb --map | more

Next, graduate to running in @--run=local@ mode so you can inspect the reducer.
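To see what the map, sort, and reduce stages do to the data before involving a cluster, here is a hypothetical plain-Ruby word count that follows the Hadoop streaming convention of tab-separated key/value lines. It does not use Wukong's own classes; the file name @word_count_sketch.rb@ and the helpers @map_lines@, @reduce_lines@ and @run_local@ are assumptions for illustration. @run_local@ chains map, sort, and reduce in memory, roughly the shape of pipeline a local run emulates.

```ruby
# Hypothetical plain-Ruby word count; it does NOT use Wukong's own classes.
# It follows the Hadoop streaming convention of one "key\tvalue" line per record.

# Mapper: emit "word\t1" for every (lowercased) word on every input line.
def map_lines(lines)
  lines.flat_map do |line|
    line.downcase.scan(/\w+/).map { |word| "#{word}\t1" }
  end
end

# Reducer: expects its input sorted so that identical keys are adjacent,
# then sums the counts within each run of identical keys.
def reduce_lines(sorted_lines)
  sorted_lines
    .map { |line| line.chomp.split("\t", 2) }
    .chunk_while { |a, b| a[0] == b[0] }
    .map { |group| "#{group[0][0]}\t#{group.sum { |_, n| n.to_i }}" }
end

# In-memory stand-in for a local run: map, then sort, then reduce.
def run_local(lines)
  reduce_lines(map_lines(lines).sort)
end

# Mimic --map / --reduce switches so each stage can be tested in a shell pipe.
case ARGV.first
when "--map"    then puts map_lines($stdin.each_line)
when "--reduce" then puts reduce_lines($stdin.each_line)
else puts run_local(["the cat", "The dog"]) # tiny in-memory example
end
```

With those switches, the two stages can be exercised the same way as the commands above, with @sort@ standing in for Hadoop's shuffle:

pre. cat ./local/test/input.tsv | ruby word_count_sketch.rb --map | sort | ruby word_count_sketch.rb --reduce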
h2. What tools does Wukong work with? Wukong is friends with "Hadoop":http://hadoop.apache.org/core the elephant, "Pig":http://hadoop.apache.org/pig/ the query language, and the @cat@ on your command line. We're looking forward to being friends with "martinis":http://datamapper.org and "express trains":http://wiki.rubyonrails.org/rails/pages/ActiveRecord down the road.
h2. Design ...
h2. Caveats ...
h2. TODOs ...
h2. Note on Patches/Pull Requests

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit; do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a commit by itself that I can ignore when I pull.)
* Send me a pull request. Bonus points for topic branches.
h2. Endnotes

h3. Why is it called Wukong?

Hadoop, as you may know, is "named after a stuffed elephant.":http://en.wikipedia.org/wiki/Hadoop Since Wukong was started by the "infochimps":http://infochimps.org team, we needed a simian analog. A Monkey King who journeyed to the land of the Elephant seems to fit the bill:

bq. Sun Wukong (孙悟空), known in the West as the Monkey King, is the main character in the classical Chinese epic novel Journey to the West. In the novel, he accompanies the monk Xuanzang on the journey to retrieve Buddhist sutras from India.

bq. Sun Wukong possesses incredible strength, being able to lift his 13,500 jīn (8,100 kg) Ruyi Jingu Bang with ease. He also has superb speed, traveling 108,000 li (54,000 kilometers) in one somersault. Sun knows 72 transformations, which allows him to transform into various animals and objects; he is, however, shown with slight problems transforming into other people, since he is unable to complete the transformation of his tail. He is a skilled fighter, capable of holding his own against the best generals of heaven. Each of his hairs possesses magical properties, and is capable of transforming into a clone of the Monkey King himself, or various weapons, animals, and other objects. He also knows various spells in order to command wind, part water, conjure protective circles against demons, freeze humans, demons, and gods alike.

-- ["Sun Wukong's Wikipedia entry":http://en.wikipedia.org/wiki/Wukong]

The "Jamie Hewlett / Damon Albarn short":http://news.bbc.co.uk/sport1/hi/olympics/monkey that the BBC made for their 2008 Olympics coverage gives the general idea.

* What's up with Wukong::AndPig?
** @Wukong::AndPig@ is a small library to more easily generate code for the "Pig":http://hadoop.apache.org/pig data analysis language. See its "README":wukong/and_pig/README.textile for more.