Sha256: d1fb1f90205e251c51262b604b2baadb87ed6fea08f928a9e14b66b0129dd2fd

Contents?: true

Size: 1.74 KB

Versions: 11

Compression:

Stored size: 1.74 KB

Contents

#
# Levenshtein distance algorithm implementation for Ruby, with UTF-8 support.
#
# The Levenshtein distance is a measure of how similar two strings s and t are,
# calculated as the number of deletions/insertions/substitutions needed to
# transform s into t. The greater the distance, the more the strings differ.
#
# The Levenshtein distance is also sometimes referred to as the
# easier-to-pronounce-and-spell 'edit distance'.
#
# Author: Paul Battley (pbattley@gmail.com)
#

module Text # :nodoc:
module Levenshtein

  # Calculate the Levenshtein distance between two strings +str1+ and +str2+.
  # +str1+ and +str2+ should be ASCII, UTF-8, or a one-byte-per character encoding such
  # as ISO-8859-*.
  #
  # The strings will be treated as UTF-8 if $KCODE is set appropriately (i.e. 'u').
  # Otherwise, the comparison will be performed byte-by-byte. There is no specific support 
  # for Shift-JIS or EUC strings.
  #
  # When using Unicode text, be aware that this algorithm does not perform normalisation. 
  # If there is a possibility of different normalised forms being used, normalisation
  # should be performed beforehand.
  #
  def distance(str1, str2)
    if $KCODE =~ /^U/i
      unpack_rule = 'U*'
    else
      unpack_rule = 'C*'
    end
    s = str1.unpack(unpack_rule)
    t = str2.unpack(unpack_rule)
    n = s.length
    m = t.length
    return m if (0 == n)
    return n if (0 == m)
  
    d = (0..m).to_a
    x = nil

    (0...n).each do |i|
      e = i+1
      (0...m).each do |j|
        cost = (s[i] == t[j]) ? 0 : 1
        x = [
          d[j+1] + 1, # insertion
          e + 1,      # deletion
          d[j] + cost # substitution
        ].min
        d[j] = e
        e = x
      end
      d[m] = x
    end

    return x
  end

  extend self
end
end

Version data entries

11 entries across 9 versions & 3 rubygems

Version Path
text-0.1.13 lib/text/levenshtein.rb
Text-1.0.0 lib/text/levenshtein.rb
Text-1.1.1 lib/text/levenshtein.rb
Text-1.1.0 lib/text/levenshtein.rb
Text-1.1.2 lib/text/levenshtein.rb
flatulent-0.0.3 lib/flatulent/text/levenshtein.rb
flatulent-0.0.3 rails/lib/flatulent/text/levenshtein.rb
flatulent-0.0.4 lib/flatulent/text/levenshtein.rb
flatulent-0.0.1 lib/flatulent/text/levenshtein.rb
flatulent-0.0.2 lib/flatulent/text/levenshtein.rb
flatulent-0.0.2 rails/lib/flatulent/text/levenshtein.rb