Sha256: d2ccd61337382e3a2ac0ee23d553537d6c2e6527c6c84f2c1137233d7aecd269
Contents?: true
Size: 449 Bytes
Versions: 2
Compression:
Stored size: 449 Bytes
Contents
require 'unicode_utils/each_word'
require 'tf-idf-similarity/token'

# A tokenizer using UnicodeUtils to tokenize a text.
#
# @see https://github.com/lang/unicode_utils
module TfIdfSimilarity
  class Tokenizer
    # Tokenizes a text.
    #
    # @param [String] text
    # @return [Enumerator] an enumerator of Token objects
    def tokenize(text)
      UnicodeUtils.each_word(text).map do |word|
        Token.new(word)
      end
    end
  end
end
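The tokenizer above wraps each word produced by `UnicodeUtils.each_word` in a `Token`. The following is a minimal sketch of the same shape that runs without either gem. `SketchToken` and `SketchTokenizer` are hypothetical names, not part of tf-idf-similarity, and the `WORD` regex is only a rough approximation of Unicode word segmentation, not what UnicodeUtils does.

```ruby
# Hypothetical stand-in for TfIdfSimilarity::Token; wraps a single word.
SketchToken = Struct.new(:text)

# Hypothetical stand-in for the tokenizer, using only the standard library.
class SketchTokenizer
  # Assumption: runs of letters, digits, and apostrophes count as words.
  WORD = /[\p{L}\p{N}']+/

  # @param [String] text
  # @return [Array<SketchToken>] one token per matched word
  def tokenize(text)
    text.scan(WORD).map { |word| SketchToken.new(word) }
  end
end

tokens = SketchTokenizer.new.tokenize("Crème brûlée isn't hard")
words = tokens.map(&:text)
```

In this sketch `tokenize` returns an `Array`, because `map` called with a block collects its results into an array. Callers that want lazy evaluation would need `lazy.map` or similar.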
Version data entries
2 entries across 2 versions & 1 rubygems
Version | Path |
---|---|
tf-idf-similarity-0.3.0 | lib/tf-idf-similarity/tokenizer.rb |
tf-idf-similarity-0.2.0 | lib/tf-idf-similarity/tokenizer.rb |