Sha256: 4891e56f04c96ec98ba4ea1c495d1462245d70dd1e0c0cfc45ca59c8de617976

Size: 1013 Bytes

Versions: 4

Stored size: 1013 Bytes

Contents

module KeywordMatcher
  class Prophet
    attr_reader :phrase

    PRECISION = 0.5
    SPLIT = 0.2
    SEPARATOR = %r{[\s()/*:"'\\$.,=]+} # runs of whitespace and punctuation that delimit tokens
    MEASURES = 'кг|г|л|мл|уп|ед|шт|мг|пак'.freeze

    def initialize(phrase)
      @phrase = phrase
    end

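    # Splits the normalized phrase into lowercase keyword tokens.
    def explode
      prepare
        .split(SEPARATOR)
        .map(&:strip)
        .map(&:downcase)
        .reject { |w| w.size < 2 } # drop empty and one-character tokens
        .reject { |w| w.match?(/\d{5,}/) } # drop tokens containing a run of 5+ digits
    end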

    # Normalizes the raw phrase before it is split into tokens.
    def prepare
      phrase.gsub(/(\p{Ll})(\p{Lu})/, '\1 \2') # split camelCase words (must run before downcase)
            .downcase
            .gsub(/(\p{Ll})(\d+\S)/, '\1 \2') # split text from digits
            .gsub(/%([\p{L}\d])/, '% \1') # add space after percents
            .gsub(/(\d)[\.,](\d)/, '\1-\2') # replace separator between digits from , or . to -
            .gsub(/(\d)[\.,\s]+(#{MEASURES})\.?/, '\1\2') # remove gaps between numbers and measures
            .gsub(/(\d)-0+(#{MEASURES})/, '\1\2') # drop a zero fractional part before a measure (1-0кг -> 1кг)
    end
  end
end
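
As a rough illustration of how the number and measure normalization feeds into tokenization, here is a minimal, self-contained sketch. It copies only the three digit-related substitutions and the token filters from the listing above into two hypothetical free-standing helpers, `normalize` and `tokenize`. These names are assumptions made for the example and are not part of the gem, and the remaining `prepare` steps are left out for brevity.

```ruby
# Illustrative sketch only: `normalize` and `tokenize` are hypothetical helpers,
# not methods of KeywordMatcher::Prophet. The patterns are copied from the listing.
MEASURES = 'кг|г|л|мл|уп|ед|шт|мг|пак'.freeze
SEPARATOR = %r{[\s()/*:"'\\$.,=]+}

# Applies only the digit-related substitutions from `prepare`.
def normalize(text)
  text.downcase
      .gsub(/(\d)[.,](\d)/, '\1-\2')                 # 2,5 -> 2-5
      .gsub(/(\d)[.,\s]+(#{MEASURES})\.?/, '\1\2')   # "1 л." -> "1л"
      .gsub(/(\d)-0+(#{MEASURES})/, '\1\2')          # "1-0кг" -> "1кг"
end

# Splits the normalized text and applies the same filters as `explode`.
def tokenize(text)
  normalize(text)
    .split(SEPARATOR)
    .reject { |w| w.size < 2 }          # drop empty and one-character tokens
    .reject { |w| w.match?(/\d{5,}/) }  # drop tokens with a run of 5+ digits
end

p normalize('Сахар 1.0 кг.')                  # => "сахар 1кг"
p tokenize('Молоко 2,5% 1 л. арт. 1234567')   # => ["молоко", "2-5%", "1л", "арт"]
```

Decimal separators are rewritten to `-` before splitting because `.` and `,` are themselves in `SEPARATOR`. Without that step a value such as `2,5` would be split into two one-character tokens and then discarded.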

Version data entries

4 entries across 4 versions and 1 gem

Version Path
keyword_matcher-0.3.1 lib/keyword_matcher/prophet.rb
keyword_matcher-0.3.0 lib/keyword_matcher/prophet.rb
keyword_matcher-0.2.0 lib/keyword_matcher/prophet.rb
keyword_matcher-0.1.0 lib/keyword_matcher/prophet.rb