Sha256: b5bb33e8dd7b3c533506dfd1076dd263ce9875c5d835b03afe599ddcae4cacf5

Contents?: true

Size: 902 Bytes

Versions: 2

Compression:

Stored size: 902 Bytes

Contents

module TextRank
  ##
  # Tokenizers are responsible for transforming a single String of text into an
  # array of potential keywords ("tokens").  There are no requirements of tokens
  # other than to be non-empty.  When used in combination with token filters, it
  # may make sense for a tokenizer to temporarily create tokens which might seem
  # like ill-suited keywords.  The token filter may use these "bad" keywords to
  # help inform its decision on which tokens to keep and which to drop.  An example
  # of this is the part of speech token filter which uses punctuation tokens to
  # help guess the part of speech of each non-punctuation token.
  ##
  module Tokenizer

    autoload :Regex,                'text_rank/tokenizer/regex'
    autoload :Whitespace,           'text_rank/tokenizer/whitespace'
    autoload :WordsAndPunctuation,  'text_rank/tokenizer/words_and_punctuation'

  end
end
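
The module comment above describes the tokenizer contract: take a single String and return an array of non-empty tokens, with the actual keyword quality decisions deferred to token filters. As a rough, hypothetical sketch (not taken from the gem's own tokenizer classes), a custom tokenizer satisfying that contract could look like this:

# Hypothetical example, not part of the text_rank gem: a minimal tokenizer
# honoring the contract described above -- accept a single String and
# return an array of non-empty tokens.
class SimpleWordTokenizer
  # Split on runs of non-word characters and drop any empty strings.
  def tokenize(text)
    text.split(/\W+/).reject(&:empty?)
  end
end

SimpleWordTokenizer.new.tokenize("TextRank builds a graph of tokens.")
# => ["TextRank", "builds", "a", "graph", "of", "tokens"]

A tokenizer like this keeps punctuation out entirely; as the comment notes, a tokenizer meant to feed a part-of-speech token filter might instead emit punctuation tokens so the filter has more context to work with.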

Version data entries

2 entries across 2 versions & 1 rubygem

Version Path
text_rank-1.1.1 lib/text_rank/tokenizer.rb
text_rank-1.1.0 lib/text_rank/tokenizer.rb