Sha256: 6b8a5bcd400a3fba291b17c031a71552d73cbcc42f7bb48be83b63561ebb39ba
Contents?: true
Size: 292 Bytes
Versions: 7
Compression:
Stored size: 292 Bytes
Contents
Wukong.processor(:tokenizer) do
  field :min_length, Integer, :default => 1

  def process(record)
    words   = record.downcase.strip.split(/\W/)
    lengthy = words.select { |word| word.length >= min_length }
    lengthy.each do |word|
      yield [word, 1].join("\t")
    end
  end
end
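To see what the processor emits without installing Wukong, the body of `process` can be approximated in plain Ruby. This is only a sketch: the `tokenize` helper and its `min_length` argument are hypothetical stand-ins for the processor and its `min_length` field, and it returns an array where the processor yields each line.

```ruby
# Hypothetical standalone helper mirroring the processor body above;
# it is not part of Wukong's API.
def tokenize(record, min_length = 1)
  # Lowercase, trim, and split on every non-word character.
  words = record.downcase.strip.split(/\W/)
  # Drop short tokens (including the empty strings produced by adjacent
  # separators), then pair each remaining word with a count of 1.
  words.select { |word| word.length >= min_length }
       .map { |word| [word, 1].join("\t") }
end

puts tokenize("Hello, big World!", 3)
```

With a `min_length` of 3 this prints `hello`, `big`, and `world`, each followed by a tab and `1`. The empty string that the split produces between the comma and the space is filtered out, because its length is below any `min_length` of 1 or more.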
Version data entries
7 entries across 7 versions & 2 rubygems