= JRuby on Hadoop

== Description

JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper / Reducer.
We recommend using it together with hadoop-rubydsl, which is available on GitHub and GemCutter.

== Install

All required gems are available on GemCutter.

1. Upgrade RubyGems to 1.3.5.
2. Install the gems:
 $ gem install jruby-on-hadoop

== Usage

1. Start a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.
2. Put the input files into HDFS, e.g. test/inputs/file1.
3. Run 'joh' as follows:
 $ joh examples/wordcount.rb test/inputs test/outputs
The Hadoop job results are written to HDFS under test/outputs/part-*
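
The usage steps can be sketched as a small Ruby helper that only builds the command lines and leaves running them to you. This is a rough sketch, not part of the gem: the helper name joh_commands is hypothetical, and it assumes that the hadoop launcher lives at $HADOOP_HOME/bin/hadoop and that the local input directory has the same relative path as its HDFS destination. 'hadoop fs -put' and 'hadoop fs -cat' are the standard Hadoop file system shell commands.

```ruby
# Hypothetical helper (not part of jruby-on-hadoop): returns the shell
# commands for the usage steps as argument arrays. Nothing is executed.
def joh_commands(hadoop_home, script, input, output)
  raise ArgumentError, "HADOOP_HOME is not set" if hadoop_home.nil? || hadoop_home.empty?

  # Assumption: the hadoop launcher is at $HADOOP_HOME/bin/hadoop.
  hadoop = File.join(hadoop_home, "bin", "hadoop")
  [
    # step 2: copy the local input directory into HDFS
    [hadoop, "fs", "-put", input, input],
    # step 3: run the job through the joh command
    ["joh", script, input, output],
    # afterwards: print the result files from HDFS
    [hadoop, "fs", "-cat", File.join(output, "part-*")]
  ]
end
```

Calling joh_commands(ENV["HADOOP_HOME"], "examples/wordcount.rb", "test/inputs", "test/outputs") returns three argument arrays. Each one can be printed with cmd.join(" ") or run with system(*cmd).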

== Example
See also examples/wordcount.rb.

 def setup(conf)
   # configure the jobconf here
 end

 def map(key, value, output, reporter)
   # mapper process (wordcount example):
   # emit (word, 1) for every word in the input value
   value.split.each do |word|
     output.collect(word, 1)
   end
 end

 def reduce(key, values, output, reporter)
   # reducer process (wordcount example):
   # sum the counts collected for each word
   sum = 0
   values.each {|v| sum += v }
   output.collect(key, sum)
 end
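
To see what the wordcount map and reduce methods compute without a cluster, the flow can be imitated in plain Ruby. This is a minimal sketch under stated assumptions and not how the gem itself runs jobs: FakeCollector, the WordCount module and run_locally are hypothetical names, the collector only records pairs instead of writing to HDFS, and plain Ruby strings and integers stand in for the types Hadoop would pass.

```ruby
# Hypothetical stand-in for the Hadoop output collector: it only records pairs.
class FakeCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same map / reduce bodies as the wordcount example, wrapped in a module
# so they do not clash with other top-level methods.
module WordCount
  module_function

  def map(key, value, output, reporter)
    value.split.each { |word| output.collect(word, 1) }
  end

  def reduce(key, values, output, reporter)
    sum = 0
    values.each { |v| sum += v }
    output.collect(key, sum)
  end
end

# Runs map over each line, groups the pairs by key, then reduces each group.
def run_locally(lines)
  mapped = FakeCollector.new
  lines.each_with_index { |line, i| WordCount.map(i, line, mapped, nil) }

  grouped = Hash.new { |hash, key| hash[key] = [] }
  mapped.pairs.each { |word, count| grouped[word] << count }

  reduced = FakeCollector.new
  grouped.keys.sort.each { |word| WordCount.reduce(word, grouped[word], reduced, nil) }
  reduced.pairs
end
```

For example, run_locally(["a b a", "b c"]) returns [["a", 2], ["b", 2], ["c", 1]].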

== Build

You can build hadoop-ruby.jar with Ant:
 ant

The HADOOP_HOME environment variable must be set on your system.
The build assumes Hadoop version 0.19.2.

== Author
Koichi Fujikawa <fujibee@gmail.com>

== Copyright
License: Apache License
