Sha256: ce25e798ddffcaf11f03dd0c4a2eca2386d992195508c97289feae2495fac94e

Contents?: true

Size: 1.37 KB

Versions: 2

Compression:

Stored size: 1.37 KB

Contents

= hadoop-rubydsl

== Description
HadoopのMapper/ReducerをRubyによるDSLで記述することができます。
hadoop-ruby.jarを利用します。

例)
apachelog.rb

# log:
#   127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
#   127.0.0.1 - frank2 [10/Oct/2000:13:55:36 -0700] "GET /apache_pb2.gif HTTP/1.0" 200 2326
#   127.0.0.1 - frank2 [10/Oct/2000:13:55:36 -0700] "GET /apache_pb3.gif HTTP/1.0" 404 2326

use 'LogAnalysis'
data.pattern /(.*) (.*) (.*) (\[.*\]) (".*") (\d*) (\d*)/
column[2].count_uniq
column[3].count_uniq
column[4].count_uniq
column[5].count_uniq
column[6].sum

=> 
col2    frank   1
col2    frank2  2
col3    [10/Oct/2000:13:55:36 -0700]    3
col4    "GET /apache_pb.gif HTTP/1.0"   1
col4    "GET /apache_pb2.gif HTTP/1.0"  1
col4    "GET /apache_pb3.gif HTTP/1.0"  1
col5    200     2
col5    404     1
col6    6978

== Usage
0. HADOOP_HOMEを正しく設定し、Hadoopを一式立ち上げておく。

1. jruby-complete-*.jar を lib/java 以下にコピー
ex)
$ wget http://jruby.kenai.com/downloads/1.4.0RC2/jruby-complete-1.4.0RC2.jar
$ cp jruby-complete-*.jar lib/java/

2. データを HDFS にアップロード
ex)
$ hadoop dfs -copyFromLocal apachelog inputs/

3. MapReduce実行
$ bin/hadoop-ruby.sh examples/apachelog.rb inputs outputs

== Author
Koichi Fujikawa <fujibee@gmail.com>

== Copyright
License: Apache License

Version data entries

2 entries across 2 versions & 1 rubygems

Version Path
hadoop-rubydsl-0.0.2 README
hadoop-rubydsl-0.0.1 README