Feature: Run wu-hadoop from the command line

  In order to execute hadoop streaming commands
  As a user of wu-hadoop
  I should be able to run wu-hadoop with wukong processors

  Scenario: Simple wu-hadoop command
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run --input=/foo --output=/bar`
    Then the output should contain:
    """
    /usr/lib/hadoop/bin/hadoop \
    jar /usr/lib/hadoop/contrib/streaming/hadoop-*streaming*.jar \
    -D mapred.job.name='word_count.rb---/foo---/bar' \
    """
    And the output should match:
    """
    -mapper '.*ruby bundle exec wu-local .*word_count.rb --run=mapper ' \\
    -reducer '.*ruby bundle exec wu-local .*word_count.rb --run=reducer ' \\
    """
    And the output should contain:
    """
    -input '/foo' \
    -output '/bar' \
    """
    And the output should match:
    """
    -file '.*word_count.rb' \\
    -cmdenv 'BUNDLE_GEMFILE=.*wukong-hadoop/Gemfile'
    """

  Scenario: A wu-hadoop command without an input or output
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run`
    Then the output should contain:
    """
    Missing values for: input (Comma-separated list of input paths.), output (Output directory for the hdfs.)
    """

  Scenario: Specifying an alternative gemfile
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run --input=/foo --output=/bar --gemfile=alt/Gemfile`
    Then the output should contain:
    """
    -cmdenv 'BUNDLE_GEMFILE=alt/Gemfile'
    """

  Scenario: Skipping the reduce step
    Given a file named "wukong_script.rb" with:
    """
    Wukong.processor(:mapper) do
    end
    """
    When I run `bundle exec wu-hadoop wukong_script.rb --dry_run --input=/foo --output=/bar`
    Then the output should contain:
    """
    -D mapred.reduce.tasks=0 \
    """

  Scenario: A processor without a mapper
    Given a file named "wukong_script.rb" with:
    """
    Wukong.processor(:reducer) do
    end
    """
    When I run `bundle exec wu-hadoop wukong_script.rb --dry_run --input=/foo --output=/bar`
    Then the output should match:
    """
    No :mapper definition found in .*wukong_script.rb
    """

  Scenario: Translating hadoop jobconf options
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run --input=/foo --output=/bar --max_tracker_failures=12`
    Then the output should match:
    """
    -D mapred.max.tracker.failures=12 \\
    """

  Scenario: Passing along extra configuration options
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run --input=/foo --output=/bar --foo=bar`
    Then the output should match:
    """
    -mapper '.* --foo=bar' \\
    -reducer '.* --foo=bar' \\
    """

  Scenario: Specifying input and output formats
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run --input=/foo --output=/bar --input_format=com.foo.BarInputFormat`
    Then the output should contain:
    """
    -inputformat 'com.foo.BarInputFormat' \
    """

  Scenario: Specifying additional java options
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --dry_run --input=/foo --output=/bar --java_opts=-Dfoo.bar=baz,-Dother.opts=cool`
    Then the output should contain:
    """
    -D foo.bar=baz \
    -D other.opts=cool \
    """

  Scenario: Failed hadoop job
    Given a wukong script "examples/word_count.rb"
    When I run `bundle exec wu-hadoop examples/word_count.rb --input=/foo --output=/bar`
    Then the output should contain:
    """
    Streaming command failed!
    """