Sha256: 7c2238e410ea8f9071e837a8ee1a64c7367f18abd923afc52bd5286585a76ad5

Contents?: true

Size: 669 Bytes

Versions: 3

Compression:

Stored size: 669 Bytes

Contents

require 'active_support/inflector'
require 'levenshtein-ffi'

module Crawler
  module Utils
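    # Normalizes a string for comparison: replaces common punctuation with
    # spaces, collapses runs of whitespace, transliterates, and downcases.
    def self.transliterate(string)
      normalized = string.gsub(/[:\-.,!?]/, ' ').strip.gsub(/\s+/, ' ')
      ActiveSupport::Inflector.transliterate(normalized, nil).downcase
    end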

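    # Returns a similarity score from 0.0 (nothing in common) to 1.0
    # (identical after normalization), based on Levenshtein distance.
    def self.levenshtein_score(string_1, string_2)
      string_1_transliterated = transliterate(string_1)
      string_2_transliterated = transliterate(string_2)
      levenshtein_distance = Levenshtein.distance(string_1_transliterated, string_2_transliterated)
      max_size = [string_1_transliterated.size, string_2_transliterated.size].max.to_f
      # Two empty strings are identical; this also avoids 0.0 / 0.0 (NaN).
      return 1.0 if max_size.zero?

      (max_size - levenshtein_distance) / max_size
    end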
  end
end
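
The scoring formula above, (max_size - distance) / max_size, can be tried without installing either gem. The following is only a sketch under stated assumptions: `edit_distance` and `similarity` are hypothetical names that are not part of crawler-core, the plain-Ruby distance function stands in for `Levenshtein.distance`, and the transliteration step is skipped, so inputs are assumed to be normalized already.

```ruby
# Hypothetical stand-in for Levenshtein.distance, written in plain Ruby so
# the sketch runs without the levenshtein-ffi gem (two-row dynamic programming).
def edit_distance(a, b)
  previous = (0..b.size).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # Minimum of deletion, insertion, and substitution (or match).
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical mirror of the scoring formula: (max_size - distance) / max_size.
# Assumes inputs are already normalized; no transliteration is done here.
def similarity(a, b)
  max_size = [a.size, b.size].max.to_f
  return 1.0 if max_size.zero? # assumption: two empty strings count as identical

  (max_size - edit_distance(a, b)) / max_size
end

puts similarity('kitten', 'sitting')
```

For 'kitten' and 'sitting' the edit distance is 3 and the longer string has 7 characters, so the score is 4/7, roughly 0.57. Dividing by the longer length keeps the score between 0.0 and 1.0, because the distance can never exceed that length.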

Version data entries

3 entries across 3 versions and 1 rubygem

Version Path
crawler-core-1.1.0 lib/crawler/utils.rb
crawler-core-1.0.0 lib/crawler/utils.rb
crawler-core-0.2.0 lib/crawler/utils.rb