Sha256: f6f30d30ec3c77709082eeafe4fa5c2c7f3ef6623acc836b1eeb5f400072b315

Size: 836 Bytes

Versions: 3

Stored size: 836 Bytes

Contents

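require 'delegate'

module Fuzzily
  # Wraps a String and exposes the character trigrams used for fuzzy matching.
  class String < SimpleDelegator

    # Distinct 3-character substrings of the normalized string.
    def trigrams
      normalized = normalize
      # A string of length n has n - 2 trigrams, starting at indices 0..(n - 3).
      last_index = normalized.length - 3
      (0..last_index).map { |index| normalized[index, 3] }.uniq
    end

    # Pairs each trigram with the length of the original (un-normalized) string.
    def scored_trigrams
      trigrams.map { |t| [t, self.length] }
    end

    protected

    # Remove accents, downcase, collapse every run of non-letters into a
    # single '*', then prefix '**' and append '*'. Returns one string,
    # e.g. "New York" becomes "**new*york*".
    def normalize
      # NFKD (Ruby >= 2.2) splits accented letters into a base letter plus a
      # combining mark; the ASCII filter then drops the marks. This replaces
      # the ActiveSupport mb_chars.normalize(:kd) call, which ActiveSupport
      # deprecated in favour of String#unicode_normalize.
      __getobj__.to_s.unicode_normalize(:nfkd).
        gsub(/[^\x00-\x7F]/, '').
        downcase.
        gsub(/[^a-z]/, ' ').
        gsub(/\s+/, '*').
        gsub(/^/, '**').
        gsub(/$/, '*')
    end
  end
end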
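To see what the class produces without installing the gem, here is a standalone sketch that mirrors the normalization, trigram, and scoring steps above. It assumes Ruby 2.2 or later for `String#unicode_normalize`; the helper names `sketch_normalize`, `sketch_trigrams`, and `sketch_scored_trigrams` are hypothetical and not part of fuzzily's API.

```ruby
# Hypothetical standalone helpers mirroring Fuzzily::String (not the gem's API).

# Strip accents, downcase, turn each run of non-letters into '*',
# then pad with '**' in front and '*' at the end.
def sketch_normalize(text)
  # NFKD splits "é" into "e" plus a combining accent, which the
  # ASCII-only filter then removes.
  ascii = text.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase
  # Collapse each run of non-letters into one '*', then pad both ends.
  ascii.gsub(/[^a-z]/, ' ').gsub(/\s+/, '*').gsub(/^/, '**').gsub(/$/, '*')
end

# Every distinct 3-character window of the normalized string.
def sketch_trigrams(text)
  normalized = sketch_normalize(text)
  (0..normalized.length - 3).map { |i| normalized[i, 3] }.uniq
end

# Each trigram paired with the length of the original input.
def sketch_scored_trigrams(text)
  sketch_trigrams(text).map { |t| [t, text.length] }
end

p sketch_normalize("Café Rouge")   # prints "**cafe*rouge*"
p sketch_trigrams("Paris")         # prints ["**p", "*pa", "par", "ari", "ris", "is*"]
p sketch_trigrams("aaaa")          # prints ["**a", "*aa", "aaa", "aa*"]
```

The `"aaaa"` case shows why `uniq` matters: the window `"aaa"` occurs twice in `"**aaaa*"` but is kept once.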

Version data entries

3 entries across 3 versions & 1 rubygem

Version        Path
fuzzily-0.2.3  lib/fuzzily/trigram.rb
fuzzily-0.2.2  lib/fuzzily/trigram.rb
fuzzily-0.2.1  lib/fuzzily/trigram.rb