README.rdoc in jruby-on-hadoop-0.0.3 vs README.rdoc in jruby-on-hadoop-0.0.4
- old
+ new
@@ -1,36 +1,48 @@
= JRuby on Hadoop
JRuby on Hadoop is a thin JRuby wrapper for the Hadoop Mapper / Reducer.
+We recommend using this together with hadoop-rubydsl, available on GitHub and Gemcutter.
+== Description
+
== Install
All required gems are available on Gemcutter.
1. Upgrade RubyGems to 1.3.5
2. Install gems
$ gem install jruby-on-hadoop
-== Description
+== Usage
1. Run a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.
2. Put your input files into HDFS, e.g. test/inputs/file1
3. Now you can run 'joh' as shown below:
$ joh examples/wordcount.rb test/inputs test/outputs
The job results are written to HDFS as test/outputs/part-*
-Script example. (see also examples/wordcount.rb)
+== Example
+See also examples/wordcount.rb.
    def setup(conf)
      # setup jobconf
    end
-   def map(script, key, value, output, reporter)
+   def map(key, value, output, reporter)
      # mapper process
+     # (wordcount example)
+     value.split.each do |word|
+       output.collect(word, 1)
+     end
    end
-   def reduce(script, key, values, output, reporter)
+   def reduce(key, values, output, reporter)
      # reducer process
+     # (wordcount example)
+     sum = 0
+     values.each {|v| sum += v }
+     output.collect(key, sum)
    end
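The wordcount map/reduce pair can be exercised locally in plain Ruby before submitting a job to the cluster. This is a minimal sketch under stated assumptions: `ArrayCollector` and `run_local` are hypothetical helpers, not part of jruby-on-hadoop; the collector only mimics the `collect(key, value)` call used in the example; `reporter` is passed as `nil`; and inputs are plain Ruby strings rather than Hadoop Writable objects. The shuffle phase is approximated by grouping mapped pairs by key and sorting the keys before reducing.

```ruby
# Hypothetical local harness (not part of jruby-on-hadoop):
# runs the wordcount map/reduce without Hadoop or JRuby.

# Stand-in for Hadoop's OutputCollector: records (key, value) pairs in order.
class ArrayCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same signatures as the 0.0.4 example; reporter is unused here.
def map(key, value, output, reporter)
  value.split.each do |word|
    output.collect(word, 1)
  end
end

def reduce(key, values, output, reporter)
  sum = 0
  values.each { |v| sum += v }
  output.collect(key, sum)
end

# Simulates map -> shuffle (group by key, sort keys) -> reduce.
def run_local(lines)
  mapped = ArrayCollector.new
  lines.each_with_index { |line, i| map(i, line, mapped, nil) }

  grouped = mapped.pairs.group_by(&:first)
  reduced = ArrayCollector.new
  grouped.keys.sort.each do |word|
    reduce(word, grouped[word].map(&:last), reduced, nil)
  end
  reduced.pairs
end

p run_local(["hello hadoop", "hello jruby"])
# => [["hadoop", 1], ["hello", 2], ["jruby", 1]]
```

Sorting the keys mirrors the fact that Hadoop delivers keys to each reducer in sorted order; on a real cluster the output is instead split across test/outputs/part-* files.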
== Build
You can build hadoop-ruby.jar by running "ant".