Sha256: 57a49a5ff5d720fa5737c62b055a4425ae83543e3dfcb68208f12db185b2132c

Contents?: true

Size: 1.04 KB

Versions: 9

Compression:

Stored size: 1.04 KB

Contents

class LooseTightDictionary
  # "Record linkage typically involves two main steps: blocking and scoring..."
  # http://en.wikipedia.org/wiki/Record_linkage
  #
  # Blockings divide the haystack into groups of records that match a pattern.
  #
  # A blocking (as in a grouping) applies when a string matches its regexp.
  # The needle must then match the same regexp as well.
  class Blocking
    attr_reader :regexp
    
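    # Accepts a Regexp or a String. Note that #to_regexp is not core Ruby;
    # it is presumably supplied by the to_regexp gem, loaded elsewhere.
    def initialize(regexp_or_str)
      @regexp = regexp_or_str.to_regexp
    end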

    # Returns true if str matches the blocking's regexp, false otherwise.
    def match?(str)
      !!(regexp.match(str))
    end

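    # If a blocking "joins" two strings, both match its regexp and yield the
    # same captures.
    #
    # Returns true if both strings match and their captures are equal.
    # Returns false if str2 matches but str1 does not, or if their captures differ.
    # Returns nil if the blocking doesn't apply, i.e. str2 doesn't match the regexp.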
    def join?(str1, str2)
      if (str2_match_data = regexp.match(str2))
        if (str1_match_data = regexp.match(str1))
          str2_match_data.captures == str1_match_data.captures
        else
          false
        end
      else
        nil
      end
    end
  end
end
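
The three possible results of join? are easiest to see with a concrete pattern. The following is a minimal sketch, not the gem's own code: BlockingSketch is a hypothetical stand-in that copies the join? logic above but takes a plain Regexp, so it runs without the #to_regexp extension. The aircraft strings and the regexp are made up for illustration.

```ruby
# Hypothetical stand-in for LooseTightDictionary::Blocking (illustration only).
# Assumption: it accepts only a Regexp, so no #to_regexp conversion is needed.
class BlockingSketch
  attr_reader :regexp

  def initialize(regexp)
    @regexp = regexp
  end

  # Same three-valued logic as Blocking#join? above, written with guard clauses.
  def join?(str1, str2)
    str2_match_data = regexp.match(str2)
    return nil unless str2_match_data      # blocking does not apply to str2

    str1_match_data = regexp.match(str1)
    return false unless str1_match_data    # str2 fits the blocking, str1 does not

    # Both fit: they join only if the captured groups are identical.
    str2_match_data.captures == str1_match_data.captures
  end
end

boeing = BlockingSketch.new(/boeing (7\d\d)/i)

# Both strings match and both capture "747".
p boeing.join?('Boeing 747-400', 'BOEING 747 freighter')

# Both strings match, but the captures differ ("737" vs "747").
p boeing.join?('Boeing 737', 'Boeing 747')

# str2 matches, str1 does not.
p boeing.join?('Airbus A380', 'Boeing 747')

# str2 does not match, so the blocking does not apply.
p boeing.join?('Boeing 747', 'Airbus A380')
```

One consequence of comparing captures: a regexp with no capture groups yields empty capture lists on both sides, so any two strings that both match it are joined.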

Version data entries

9 entries across 9 versions & 1 rubygem

Version Path
loose_tight_dictionary-1.0.5 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-1.0.4 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-1.0.3 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-1.0.2 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-1.0.1 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-1.0.0 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-0.2.3 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-0.2.2 lib/loose_tight_dictionary/blocking.rb
loose_tight_dictionary-0.2.1 lib/loose_tight_dictionary/blocking.rb