Sha256: 0c5b2f68a3b5451ebff0cca7d938f7d0d935ea4cbab8d1fb2d8cffe5ae91a319

Size: 484 bytes

Stored size: 484 bytes

Compression: none

Versions: 56

Contents

require 'strscan'
require 'linguist/linguist'

module Linguist
  # Generic programming language tokenizer.
  #
  # Tokens are designed for use in the language Bayes classifier.
  # It strips any data strings or comments and preserves significant
  # language symbols.
  class Tokenizer
    # Public: Extract tokens from data
    #
    # data - String to tokenize
    #
    # Returns Array of token Strings.
    def self.tokenize(data)
      new.extract_tokens(data)
    end
  end
end
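The `extract_tokens` instance method called above is not defined in this Ruby file; it appears to be supplied by the `linguist/linguist` native extension loaded at the top. As a rough illustration of what the tokenizer's doc comment describes (dropping string literals and comments while keeping significant symbols), here is a hypothetical, simplified pure-Ruby sketch using `StringScanner`. The class name, regexes, and token rules are assumptions for illustration, not Linguist's actual implementation:

```ruby
require 'strscan'

# Hypothetical simplified tokenizer: skips string literals and
# line comments, collects identifiers and operator symbols.
class SimpleTokenizer
  def self.tokenize(data)
    new.extract_tokens(data)
  end

  def extract_tokens(data)
    scanner = StringScanner.new(data)
    tokens = []
    until scanner.eos?
      if scanner.scan(/"[^"]*"|'[^']*'/)   # skip string literals
      elsif scanner.scan(/#[^\n]*/)        # skip line comments
      elsif (tok = scanner.scan(/[A-Za-z_][A-Za-z0-9_]*|[(){}\[\];=+*\/<>!&|.-]+/))
        tokens << tok                      # keep identifiers and symbols
      else
        scanner.getch                      # discard whitespace and the rest
      end
    end
    tokens
  end
end

SimpleTokenizer.tokenize(%q{x = "hello" # comment})
# => ["x", "="]
```

The string `"hello"` and the trailing comment are stripped, leaving only the tokens that carry language signal for a classifier.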

Version data entries

56 entries across 56 versions & 1 rubygem

Version Path
github-linguist-7.16.0 lib/linguist/tokenizer.rb
github-linguist-7.15.0 lib/linguist/tokenizer.rb
github-linguist-7.14.0 lib/linguist/tokenizer.rb
github-linguist-7.13.0 lib/linguist/tokenizer.rb
github-linguist-7.12.2 lib/linguist/tokenizer.rb
github-linguist-7.12.1 lib/linguist/tokenizer.rb
github-linguist-7.12.0 lib/linguist/tokenizer.rb
github-linguist-7.11.1 lib/linguist/tokenizer.rb
github-linguist-7.11.0 lib/linguist/tokenizer.rb
github-linguist-7.10.0 lib/linguist/tokenizer.rb
github-linguist-7.9.0 lib/linguist/tokenizer.rb
github-linguist-7.8.0 lib/linguist/tokenizer.rb
github-linguist-7.7.0 lib/linguist/tokenizer.rb
github-linguist-7.6.1 lib/linguist/tokenizer.rb
github-linguist-7.6.0 lib/linguist/tokenizer.rb
github-linguist-7.5.1 lib/linguist/tokenizer.rb
github-linguist-7.5.0 lib/linguist/tokenizer.rb
github-linguist-7.4.0 lib/linguist/tokenizer.rb
github-linguist-7.3.1 lib/linguist/tokenizer.rb
github-linguist-7.3.0 lib/linguist/tokenizer.rb