Sha256: 6b8a5bcd400a3fba291b17c031a71552d73cbcc42f7bb48be83b63561ebb39ba

Contents?: true

Size: 292 Bytes

Versions: 7

Compression:

Stored size: 292 Bytes

Contents

Wukong.processor(:tokenizer) do

  field :min_length, Integer, :default => 1

  def process(record)
    words   = record.downcase.strip.split(/\W/)
    lengthy = words.select{ |word| word.length >= min_length }
    lengthy.each do |word|
      yield [ word, 1 ].join("\t")
    end
  end

end
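
The tokenizing logic above can be tried without Wukong installed. The following is a minimal sketch, assuming plain Ruby only: `tokenize` is a hypothetical helper name (not part of Wukong), `min_length` is passed as an ordinary argument instead of a declared field, and the lines are returned as an array rather than yielded.

```ruby
# Plain-Ruby sketch of the processor's `process` logic.
# `tokenize` is a hypothetical helper, not a Wukong API.
def tokenize(record, min_length = 1)
  # Lowercase, trim, and split on every single non-word character.
  # Adjacent separators (e.g. ", ") produce empty strings here.
  words = record.downcase.strip.split(/\W/)

  # Keep words that meet the minimum length; with min_length >= 1
  # this also drops the empty strings produced by the split.
  lengthy = words.select { |word| word.length >= min_length }

  # Emit one tab-separated "word<TAB>1" line per remaining word.
  lengthy.map { |word| [word, 1].join("\t") }
end

tokenize("Hello, world! Hello").each { |line| puts line }
```

Each occurrence of a word produces its own line with a count of 1; summing the counts per word would be left to a later stage of the word-count example.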

Version data entries

7 entries across 7 versions & 2 rubygems

Version Path
ul-wukong-4.1.1 examples/basic/word_count/tokenizer.rb
ul-wukong-4.1.0 examples/basic/word_count/tokenizer.rb
wukong-4.0.0 examples/basic/word_count/tokenizer.rb
wukong-3.0.1 examples/basic/word_count/tokenizer.rb
wukong-3.0.0 examples/basic/word_count/tokenizer.rb
wukong-3.0.0.pre3 examples/basic/word_count/tokenizer.rb
wukong-3.0.0.pre2 examples/word_count/tokenizer.rb