Sha256: 743bc28796f9722cb4c0ae113522a86271a0cf704652347261183cd476445a23

Contents?: true

Size: 347 Bytes

Versions: 6

Stored size: 347 Bytes

Contents

module TextRank
  module Tokenizer
    ##
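    # A tokenizer regex that preserves entire URLs as single tokens (rather than splitting them up).
    # A match starts with a scheme (e.g. http://) or "www.", and ends with a parenthesized
    # word, a non-punctuation character, or a slash, so trailing punctuation is excluded.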
    ##
    Url = %r{
      (
        (?:[\w-]+://?|www[.])
        [^\s()<>]+
        (?:
          \([\w\d]+\)
          |
          (?:[^[:punct:]\s]
          |
          /)
        )
      )
    }xi

  end
end
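
Usage sketch

The following is a minimal sketch of how the pattern above might be applied to pull URLs out of free text. The regex is reproduced verbatim so the snippet runs on its own; the helper name `extract_urls` and the sample sentence are illustrative assumptions and are not part of the text_rank gem.

```ruby
# Regex reproduced from the listing above so the snippet runs standalone.
module TextRank
  module Tokenizer
    Url = %r{
      (
        (?:[\w-]+://?|www[.])
        [^\s()<>]+
        (?:
          \([\w\d]+\)
          |
          (?:[^[:punct:]\s]
          |
          /)
        )
      )
    }xi
  end
end

# Hypothetical helper (not part of the gem): returns every URL found in the text.
def extract_urls(text)
  # The pattern contains one capture group, so String#scan yields
  # one-element arrays; flatten turns them into a flat list of strings.
  text.scan(TextRank::Tokenizer::Url).flatten
end

sample = 'See https://example.com/path, or www.ruby-lang.org.'
urls = extract_urls(sample)
puts urls
```

Because the final group requires the match to end on a non-punctuation character, a slash, or a parenthesized word, the trailing comma and full stop in the sample sentence should be left out of the extracted tokens.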

Version data entries

6 entries across 6 versions and 1 rubygem

Version Path
text_rank-1.2.3 lib/text_rank/tokenizer/url.rb
text_rank-1.2.2 lib/text_rank/tokenizer/url.rb
text_rank-1.2.0 lib/text_rank/tokenizer/url.rb
text_rank-1.1.7 lib/text_rank/tokenizer/url.rb
text_rank-1.1.6 lib/text_rank/tokenizer/url.rb
text_rank-1.1.5 lib/text_rank/tokenizer/url.rb