Sha256: c6148b0e0f146039befcd2827f06809acfe0fa3b3c0731ddbcc7eadbdf26b6a4
Contents?: true
Size: 668 Bytes
Versions: 5
Compression:
Stored size: 668 Bytes
Contents
Wukong.processor(:tokenizer) do

  field :min_length, Integer,  :default => 1
  field :max_length, Integer,  :default => 256
  field :split_on,   Regexp,   :default => /\s+/
  field :remove,     Regexp,   :default => /[^a-zA-Z0-9\']+/
  field :fold_case,  :boolean, :default => false

  def process string
    tokenize(string).each do |token|
      yield token if acceptable?(token)
    end
  end

  private

  def tokenize string
    string.split(split_on).map do |token|
      stripped = token.gsub(remove, '')
      fold_case ? stripped.downcase : stripped
    end
  end

  def acceptable? token
    (min_length..max_length).include?(token.length)
  end

end
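The processor's logic (split, strip unwanted characters, optionally fold case, then filter by length) can be tried without Wukong installed. The following is a minimal plain-Ruby sketch of that same pipeline. The class name `SimpleTokenizer` and its `tokens` method are hypothetical and not part of Wukong; only the defaults and the three steps are taken from the file above.

```ruby
# Plain-Ruby sketch of the tokenizer steps. SimpleTokenizer is a
# hypothetical stand-in for illustration, not a Wukong class.
class SimpleTokenizer
  def initialize(min_length: 1, max_length: 256, split_on: /\s+/,
                 remove: /[^a-zA-Z0-9']+/, fold_case: false)
    @min_length = min_length
    @max_length = max_length
    @split_on   = split_on
    @remove     = remove
    @fold_case  = fold_case
  end

  # Returns the acceptable tokens of +string+, in order, as an Array.
  def tokens(string)
    string.split(@split_on).map { |token|
      # Drop every run of characters matched by the remove pattern.
      stripped = token.gsub(@remove, '')
      @fold_case ? stripped.downcase : stripped
    }.select { |token|
      # Keep only tokens whose length is within the configured bounds.
      (@min_length..@max_length).include?(token.length)
    }
  end
end

p SimpleTokenizer.new(fold_case: true).tokens("Hello, World! it's  -- fine")
```

Note that a token made up entirely of removed characters (such as `--`) becomes an empty string after stripping, and the default `min_length` of 1 is what filters it out.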
Version data entries
5 entries across 5 versions & 1 rubygems