Sha256: daf32e917f15055b40e6e4b31ae5aa9709177e6d14fb0bdbc7b71fe762e9dfdc

Contents?: true

Size: 681 Bytes

Versions: 4

Compression:

Stored size: 681 Bytes

Contents

module PragmaticTokenizer
  module Languages
    module Common
      PUNCTUATION = ['。', '.', '.', '!', '!', '?', '?', '、', '¡', '¿', '„', '“', '[', ']', '"', '#', '$', '%', '&', '(', ')', '*', '+', ',', ':', ';', '<', '=', '>', '@', '^', '_', '`', "'", '{', '|', '}', '~', '-', '«', '»']
      PUNCTUATION_MAP = ['♳', '♴', '♵', '♶', '♷', '♸', '♹', '♺', '⚀', '⚁', '⚂', '⚃', '⚄', '⚅', '☇', '☈', '☉', '☊', '☋', '☌', '☍', '☠', '☢', '☣', '☤', '☥', '☦', '☧', '☀', '☁', '☂', '☃', '☄', "☮", '♔', '♕', '♖', '♗', '♘', '♙', '♚']
      SEMI_PUNCTUATION = ['。', '.', '.']
    end
  end
end
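
PUNCTUATION and PUNCTUATION_MAP each hold 41 entries, which suggests they are parallel arrays: the symbol at a given index in PUNCTUATION_MAP stands in for the punctuation mark at the same index in PUNCTUATION. The following is a minimal sketch of how such a pairing could be used, assuming that one-to-one reading is correct. The constants and methods below (SAMPLE_PUNCTUATION, SAMPLE_PUNCTUATION_MAP, TO_PLACEHOLDER, FROM_PLACEHOLDER, mask_punctuation, unmask_punctuation) are hypothetical names chosen for illustration and are not taken from the gem.

```ruby
# Hypothetical excerpt: a few entries copied from matching positions in
# PUNCTUATION and PUNCTUATION_MAP. All names here are illustrative only.
SAMPLE_PUNCTUATION     = ['。', '、', '¡', '¿', ',', ':', ';'].freeze
SAMPLE_PUNCTUATION_MAP = ['♳', '♺', '⚀', '⚁', '☣', '☤', '☥'].freeze

# Pair entries by index to build a lookup table, plus its inverse.
TO_PLACEHOLDER   = SAMPLE_PUNCTUATION.zip(SAMPLE_PUNCTUATION_MAP).to_h.freeze
FROM_PLACEHOLDER = TO_PLACEHOLDER.invert.freeze

# Replace every known punctuation mark with its placeholder symbol.
def mask_punctuation(text)
  text.gsub(Regexp.union(TO_PLACEHOLDER.keys), TO_PLACEHOLDER)
end

# Reverse the substitution, restoring the original punctuation.
def unmask_punctuation(text)
  text.gsub(Regexp.union(FROM_PLACEHOLDER.keys), FROM_PLACEHOLDER)
end

masked = mask_punctuation('¡Hola, mundo; adiós!')
puts masked
puts unmask_punctuation(masked)
```

Building the hash with zip keeps the two arrays as the single source of truth, so a mismatch in their lengths would show up as a nil value in the table rather than as a silent off-by-one.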

Version data entries

4 entries across 4 versions & 1 rubygem

Version Path
pragmatic_tokenizer-0.1.3 lib/pragmatic_tokenizer/languages/common.rb
pragmatic_tokenizer-0.1.2 lib/pragmatic_tokenizer/languages/common.rb
pragmatic_tokenizer-0.1.1 lib/pragmatic_tokenizer/languages/common.rb
pragmatic_tokenizer-0.1.0 lib/pragmatic_tokenizer/languages/common.rb