= loose_tight_dictionary

Match things based on string similarity (using the Pair Distance algorithm) and regular expressions.

== Quickstart

    >> d = LooseTightDictionary.new %w{seamus andy ben}
    => [...]
    >> d.find 'Shamus Heaney'
    => "seamus"

Try running the included example file:

    $ ruby examples/first_name_matching.rb
    ######################################################################################################################################################
    # Match "Mr. Seamus" => "seamus"
    ######################################################################################################################################################

    Needle (needle_reader proc not defined, so downcasing everything)
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    "mr. seamus"

    Haystack (haystack_reader proc not defined, so downcasing everything)
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    "seamus"
    "andy"
    "ben"

    Tighteners
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    (none)

    Comparisons
    Score               t_haystack [=> tightened/prefixed]                 t_needle [=> tightened/prefixed]
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    0.8333333333333334  "seamus"                                           "mr. seamus"
    0.0                 "andy"                                             "mr. seamus"
    0.0                 "ben"                                              "mr. seamus"

    Match
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    "seamus"

    # [... there's more output ...]

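To make the Score and Tighteners parts of these traces easier to read, here is a minimal sketch of how such numbers could be computed. It is not the gem's implementation: +letter_pairs+, +pair_distance+ and +tighten+ are hypothetical helper names, and two behaviours are assumptions inferred from the trace output, namely that pair distance is a Dice coefficient over adjacent character pairs taken per whitespace-separated word, and that a tightener replaces a string with the joined captures of its regexp. The regexp used at the end is the tightener shown in the Boeing example.

```ruby
# Hypothetical helpers for illustration only; not part of LooseTightDictionary.

# Split on whitespace and collect adjacent character pairs within each word.
def letter_pairs(str)
  str.downcase.split(/\s+/).flat_map do |word|
    (0...word.length - 1).map { |i| word[i, 2] }
  end
end

# Dice coefficient over character pairs: 2 * shared / (pairs in a + pairs in b).
# Each pair in b can be matched at most once.
def pair_distance(a, b)
  pairs_a = letter_pairs(a)
  pairs_b = letter_pairs(b)
  total = pairs_a.length + pairs_b.length
  return 0.0 if total.zero?

  shared = 0
  pairs_a.each do |pair|
    index = pairs_b.index(pair)
    next unless index
    pairs_b.delete_at(index)
    shared += 1
  end
  2.0 * shared / total
end

# Assumed tightening rule: keep only the regexp captures, joined together.
# Strings that do not match are returned unchanged.
def tighten(str, regexp)
  match = regexp.match(str)
  match ? match.captures.join : str
end

tightener = /(7\d)(7|0)-?(\d{1,3})/i
puts pair_distance("seamus", "mr. seamus")
puts tighten("boeing boeing 737-100/200", tightener)
puts pair_distance(tighten("boeing boeing 737-100/200", tightener),
                   tighten("boeing 737100", tightener))
```

Under these assumptions, "seamus" shares all five of its pairs with the twelve pairs counted across both strings, which gives the 10/12 score in the first comparison row above, and both Boeing strings tighten to "737100", so they compare as identical.
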
== The Boeing example

From the tests:

    ######################################################################################################################################################
    # Match "BOEING 737100" => "BOEING BOEING 737-100/200"
    ######################################################################################################################################################

    Needle (needle_reader proc not defined, so downcasing everything)
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    "boeing 737100"

    Haystack (haystack_reader proc not defined, so downcasing everything)
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    "boeing boeing 737-100/200"
    "boeing boeing 737-900"

    Tighteners
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    /(7\d)(7|0)-?(\d{1,3})/i

    Comparisons
    Score               t_haystack [=> tightened/prefixed]                 t_needle [=> tightened/prefixed]
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    1.0                 "boeing boeing 737-100/200" => "737100"            "boeing 737100" => "737100"
    0.6666666666666666  "boeing boeing 737-100/200" => "737100"            "boeing 737100"
    0.6153846153846154  "boeing boeing 737-900"                            "boeing 737100"
    0.6                 "boeing boeing 737-900" => "737900"                "boeing 737100" => "737100"
    0.6                 "boeing boeing 737-100/200"                        "boeing 737100"
    0.4                 "boeing boeing 737-900" => "737900"                "boeing 737100"
    0.32                "boeing boeing 737-100/200"                        "boeing 737100" => "737100"
    0.2857142857142857  "boeing boeing 737-900"                            "boeing 737100" => "737100"

    Match
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    "BOEING BOEING 737-100/200"

== Improving dictionaries

Similarity matching will only get you so far.

TODO: regex usage

== Note on Patches/Pull Requests

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit, do not mess with rakefile, version, or history.
  (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
* Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2011 Seamus Abshere. See LICENSE for details.