Sha256: ac8e1b484b01eba8c677472ef5bda6c8706130d2870d5f7a53fba4d7e364784f

Contents?: true

Size: 1.1 KB

Versions: 8

Compression:

Stored size: 1.1 KB

Contents

module TextRank
  ##
  # Character filters pre-process text prior to tokenization.  This is the phase
  # in which the text should be "cleaned up" so that the tokenizer produces valid
  # tokens.  Character filters should not attempt to remove undesired tokens,
  # however; that is the job of the token filter.  Examples include converting
  # non-ASCII characters to related ASCII characters, forcing text to lowercase,
  # stripping out HTML, and converting English contractions (e.g. "won't") to
  # the non-contracted form ("will not").
  #
  # Character filters are applied as a chain, so care should be taken to use them
  # in the desired order.
  ##
  module CharFilter

    autoload :AsciiFolding,     'text_rank/char_filter/ascii_folding'
    autoload :Lowercase,        'text_rank/char_filter/lowercase'
    autoload :StripEmail,       'text_rank/char_filter/strip_email'
    autoload :StripHtml,        'text_rank/char_filter/strip_html'
    autoload :StripPossessive,  'text_rank/char_filter/strip_possessive'
    autoload :UndoContractions, 'text_rank/char_filter/undo_contractions'

  end
end
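
The comment above notes that character filters run as a chain and that their order matters. The following is a minimal sketch of that idea only. It does not use the gem's classes: the lambdas `strip_html`, `lowercase`, and `undo_contractions` and the helper `apply_char_filters` are hypothetical stand-ins invented for illustration, and the real filters may expose a different interface.

```ruby
# Hypothetical stand-ins for character filters (NOT the gem's own classes).
# Each one takes a string and returns a new, cleaned-up string.
strip_html        = ->(text) { text.gsub(/<[^>]+>/, ' ') }
lowercase         = ->(text) { text.downcase }
# Deliberately case-sensitive, to show why ordering matters.
undo_contractions = ->(text) { text.gsub("won't", 'will not') }

# Hypothetical helper: apply each filter in order, feeding the output of one
# filter into the next.
def apply_char_filters(text, filters)
  filters.reduce(text) { |current, filter| filter.call(current) }
end

input = "<p>We WON'T stop</p>"

# Lowercasing before undoing contractions lets the case-sensitive match succeed.
good_order = apply_char_filters(input, [strip_html, lowercase, undo_contractions])

# Undoing contractions before lowercasing misses the upper-case "WON'T".
bad_order = apply_char_filters(input, [strip_html, undo_contractions, lowercase])

puts good_order.inspect
puts bad_order.inspect
```

In this sketch the two orderings give different results for the same input, which is the kind of surprise the "desired order" warning is meant to prevent.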

Version data entries

8 entries across 8 versions & 1 rubygem

Version Path
text_rank-1.2.3 lib/text_rank/char_filter.rb
text_rank-1.2.2 lib/text_rank/char_filter.rb
text_rank-1.2.0 lib/text_rank/char_filter.rb
text_rank-1.1.7 lib/text_rank/char_filter.rb
text_rank-1.1.6 lib/text_rank/char_filter.rb
text_rank-1.1.5 lib/text_rank/char_filter.rb
text_rank-1.1.1 lib/text_rank/char_filter.rb
text_rank-1.1.0 lib/text_rank/char_filter.rb