Sha256: 0c5b2f68a3b5451ebff0cca7d938f7d0d935ea4cbab8d1fb2d8cffe5ae91a319

Size: 484 Bytes

Versions: 56

Stored size: 484 Bytes

Contents

require 'strscan'
require 'linguist/linguist' # native extension that implements Tokenizer#extract_tokens

module Linguist
  # Generic programming language tokenizer.
  #
  # Tokens are designed for use in the language Bayes classifier.
  # It strips any data strings or comments and preserves significant
  # language symbols.
  class Tokenizer
    # Public: Extract tokens from data
    #
    # data - String to tokenize
    #
    # Returns Array of token Strings.
    def self.tokenize(data)
      new.extract_tokens(data)
    end
  end
end
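
Example usage

A minimal sketch of how the tokenizer is called (assuming the github-linguist gem and its compiled native extension are installed; the sample input and the indicated result are illustrative only, not output recorded in the stored file):

require 'linguist'

# String literals and comments are stripped by the tokenizer, so only
# significant language symbols and keywords remain.
tokens = Linguist::Tokenizer.tokenize("puts 'hello' # greeting")
# e.g. the result would include "puts" but not the string or the comment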

Version data entries

56 entries across 56 versions & 1 rubygem

Version Path
github-linguist-9.0.0 lib/linguist/tokenizer.rb
github-linguist-8.0.1 lib/linguist/tokenizer.rb
github-linguist-8.0.0 lib/linguist/tokenizer.rb
github-linguist-7.30.0 lib/linguist/tokenizer.rb
github-linguist-7.29.0 lib/linguist/tokenizer.rb
github-linguist-7.28.0 lib/linguist/tokenizer.rb
github-linguist-7.27.0 lib/linguist/tokenizer.rb
github-linguist-7.26.0 lib/linguist/tokenizer.rb
github-linguist-7.25.0 lib/linguist/tokenizer.rb
github-linguist-7.24.1 lib/linguist/tokenizer.rb
github-linguist-7.24.0 lib/linguist/tokenizer.rb
github-linguist-7.23.0 lib/linguist/tokenizer.rb
github-linguist-7.22.1 lib/linguist/tokenizer.rb
github-linguist-7.22.0 lib/linguist/tokenizer.rb
github-linguist-7.21.0 lib/linguist/tokenizer.rb
github-linguist-7.20.0 lib/linguist/tokenizer.rb
github-linguist-7.19.0 lib/linguist/tokenizer.rb
github-linguist-7.18.0 lib/linguist/tokenizer.rb
github-linguist-7.17.0 lib/linguist/tokenizer.rb
github-linguist-7.16.1 lib/linguist/tokenizer.rb