Sha256: 9316c5501d93678db7131d644381ef6f85aaaac7fc1525852aeecf372210953a
Contents?: true
Size: 931 Bytes
Versions: 3
Compression:
Stored size: 931 Bytes
Contents
    #!/usr/bin/env ruby
    require 'rubygems'
    require 'wukong/script'

    # Run as (local mode)
    #
    #   ./examples/stupidly_simple_filter.rb --run=local input.tsv output.tsv
    #
    # for hadoop mode,
    #
    #   ./examples/stupidly_simple_filter.rb --run=hadoop input.tsv output.tsv
    #
    # For debugging, run
    #
    #   cat input.tsv | ./examples/stupidly_simple_filter.rb --map input.tsv | more
    #
    class Mapper < LineStreamer
      include Filter
      MATCHER = %r{(ford|mercury|saab|mazda|isuzu)}

      #
      # A very simple mapper -- looks for a regex match in one field,
      # and emits the whole record if the field matches
      #
      #
      # Given a series of records like:
      #
      #    tweet 123456789 20100102030405 @frank: I'm having a bacon sandwich
      #    tweet 123456789 20100102030405 @jerry, I'm having your baby
      #
      # emits only the lines matching that regex
      #
      def emit? line
        MATCHER.match line
      end
    end

    # Execute the script
    Wukong.run(Mapper)
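To see the filtering rule on its own, the following is a minimal sketch that applies the same `MATCHER` pattern to an in-memory list of lines, without Wukong or Hadoop. The `filter_lines` helper and the sample records are hypothetical, introduced only for illustration. The standalone `emit?` here returns a strict boolean, which is an assumption of this sketch and not how the script above defines it.

```ruby
# Standalone sketch of the filter idea; does not use Wukong.
# MATCHER is copied from the script above.
MATCHER = %r{(ford|mercury|saab|mazda|isuzu)}

# True when the line contains any of the listed names (case-sensitive).
def emit?(line)
  !MATCHER.match(line).nil?
end

# Hypothetical helper: keep only the lines that pass emit?, in order.
def filter_lines(lines)
  lines.select { |line| emit?(line) }
end

# Made-up tab-separated records for illustration only.
sample = [
  "tweet\t1\t20100102030405\tjust bought a saab",
  "tweet\t2\t20100102030405\t@frank: I'm having a bacon sandwich",
  "tweet\t3\t20100102030405\ttest drove a mazda today"
]

# Prints the first and third records.
puts filter_lines(sample)
```

Because the pattern is unanchored and case-sensitive, it matches anywhere in the line, including inside longer words such as "afford", and it does not match capitalized forms such as "Ford".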
Version data entries
3 entries across 3 versions and 1 rubygem
Version | Path |
---|---|
wukong-3.0.0.pre | old/examples/stupidly_simple_filter.rb |
wukong-2.0.2 | examples/stupidly_simple_filter.rb |
wukong-2.0.1 | examples/stupidly_simple_filter.rb |