= ICU Name

Canonicalises and matches person names with Western European characters and first and last names.

== Installation

For Ruby 1.9.2, 1.9.3, 2.0.0.

  gem install icu_name

It depends on _active_support_ and _i18n_.

== Names

This class exists for two main purposes:

* to normalise to a common format the different ways names are typed in practice
* to be able to match two names even if they are not exactly the same

To create a name object, supply both the first and second names separately to the constructor.

  robert = ICU::Name.new(' robert  j ', ' FISCHER ')

Capitalisation, white space and punctuation will all be automatically corrected:

  robert.name                                    # => 'Robert J. Fischer'
  robert.rname                                   # => 'Fischer, Robert J.'  (reversed name)

The input text, without any changes apart from white-space cleanup and the insertion of a comma
(to separate the two names), is returned by the _original_ method:

  robert.original                                # => 'FISCHER, robert j'

To avoid ambiguity when either the first or second names consist of multiple words, it is better to
supply the two separately. If the full name is supplied alone to the constructor, without any
indication of where the first names end, then the last distinct name is assumed to be the last name.

  bobby = ICU::Name.new(' bobby  fischer ')
  bobby.first                                    # => 'Bobby'
  bobby.last                                     # => 'Fischer'

In this case, since the names were not supplied separately, the original text will not contain a comma:

  bobby.original                                 # => 'bobby fischer'

Names will match even if one is missing middle initials or if a nickname is used for one of the first names.

  bobby.match('Robert J.', 'Fischer')            # => true

The _alternatives_ method can be used to list alternatives to a given first or last name:

  Name.new('Stephen', 'Orr').alternatives(:first)          # => ["Steve"]
  Name.new('Michael Stephen', 'Orr').alternatives(:first)  # => ["Steve", "Mike", "Mick", "Mikey"]
  Name.new('Mark', 'Orr').alternatives(:first)             # => []

By default the class is only aware of a few common alternatives for first names
(e.g. _Bobby_ and _Robert_, _Bill_ and _William_, etc.). However, this can be customized (see below).

Supplying the _match_ method with strings is equivalent to creating an instance from the same
strings and then matching against it. So, for example, the following are equivalent:

  robert.match('R.', 'Fischer')                  # => true
  robert.match(ICU::Name.new('R.', 'Fischer'))   # => true

Here the initial _R_ matches the first letter of _Robert_. However, nickname matches will not
always work with initials. In the next example, the initial _R_ does not match the first letter
_B_ of the nickname _Bobby_.

  bobby.match('R. J.', 'Fischer')                # => false

Some other ways last names are canonicalised are illustrated below:

  ICU::Name.new('John', 'O Reilly').rname        # => "O'Reilly, John"
  ICU::Name.new('dave', 'mcmanus').rname         # => "McManus, Dave"

== Characters and Encoding

The class can only cope with Latin characters, including those with diacritics (accents).
Along with hyphens and single quotes (which represent apostrophes), letters in ISO-8859-1
(e.g. "a", "è", "Ö") and letters outside ISO-8859-1 which are decomposable into a US-ASCII
character plus one or more diacritics (e.g. "ł" or "Ś") are preserved, while everything
else is removed.

  ICU::Name.new('éric', 'PRIÉ').name             # => "Éric Prié"
  ICU::Name.new('BARTŁOMIEJ', 'śliwa').name      # => "Bartłomiej Śliwa"
  ICU::Name.new('Սմբատ', 'Լպուտյան').name        # => ""

The various accessors (_first_, _last_, _name_, _rname_, _to_s_, _original_) always return
strings encoded in UTF-8, no matter what the input encoding.

  eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
  eric.rname                                     # => "Prié, Éric"
  eric.rname.encoding.name                       # => "UTF-8"
  eric.original                                  # => "PRIÉ, éric"
  eric.original.encoding.name                    # => "UTF-8"

Accented letters can be transliterated into their US-ASCII counterparts by setting the
:chars option, which is available in all accessors. For example:

  eric.rname(:chars => "US-ASCII")               # => "Prie, Eric"
  eric.original(:chars => "US-ASCII")            # => "PRIE, eric"

It is also possible to preserve ISO-8859-1 characters while transliterating all other
accented characters:

  joe = Name.new('Józef', 'Żabiński')
  joe.rname                                      # => "Żabiński, Józef"
  joe.rname(:chars => "ISO-8859-1")              # => "Zabinski, Józef"
  joe.rname(:chars => "US-ASCII")                # => "Zabinski, Jozef"

Note that the character encoding of the strings returned is still UTF-8 in all cases.
The same option also relaxes the need for accented characters to match exactly:

  eric.match('Eric', 'Prie')                                # => false
  eric.match('Eric', 'Prie', :chars => "US-ASCII")          # => true
  joe.match('Józef', 'Zabinski')                            # => false
  joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1")    # => true

== Customization of Alternative Names

We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
matcher is aware of some common English nicknames. These name alternatives can be
customised to handle additional nicknames and other types of alternative names such
as common spelling errors and player name changes.

The alternative names consist of two arrays, one for first names and
one for last names. Each array element is itself an array of strings
representing a set of equivalent names.
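
The structure just described can be sketched in plain Ruby. This is only an illustration under assumptions: the variable +first_name_groups+ and the helper +equivalents+ are hypothetical names invented here, not part of the gem, and the gem's own lookup may work differently.

```ruby
# Hypothetical data in the shape described above: one array of groups,
# where each group is an array of equivalent names.
first_name_groups = [
  ["Robert", "Bob", "Bobby"],
  ["Stephen", "Steve"],
  ["Steven", "Steve"],
]

# Illustrative helper (not part of the gem): returns every other name
# that shares at least one group with the given name.
def equivalents(groups, name)
  groups.select { |group| group.include?(name) }.flatten.uniq - [name]
end

equivalents(first_name_groups, "Bobby")    # => ["Robert", "Bob"]
equivalents(first_name_groups, "Steve")    # => ["Stephen", "Steven"]
equivalents(first_name_groups, "Stephen")  # => ["Steve"]
equivalents(first_name_groups, "Mark")     # => []
```

A name that appears in several groups collects the members of all of them, while a name that appears in none has no equivalents.
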
Here, for example, are some of the default first name alternatives:

  ["Anthony", "Tony"]
  ["James", "Jim", "Jimmy"]
  ["Michael", "Mike", "Mick", "Mikey"]
  ["Robert", "Bob", "Bobby"]
  ["Stephen", "Steve"]
  ["Steven", "Steve"]
  ["Thomas", "Tom", "Tommy"]
  ["William", "Will", "Willy", "Willie", "Bill"]

The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.

  Name.new("Tony", "Miles").match("Anthony", "Miles")       # => true

Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
same group, they don't match each other.

  Name.new("Steven", "Hanly").match("Steve", "Hanly")       # => true
  Name.new("Stephen", "Hanly").match("Steve", "Hanly")      # => true
  Name.new("Stephen", "Hanly").match("Steven", "Hanly")     # => false

To change alternative name behaviour, you can replace the default alternatives
with a customized set, perhaps stored in a database or a YAML file, as illustrated below:

  require 'yaml'

  # Load the replacement last name alternatives.
  data = YAML.load(File.read("my_last_name_alternatives.yaml"))
  Name.load_alternatives(:last, data)
  # Load the replacement first name alternatives.
  data = YAML.load(File.read("my_first_name_alternatives.yaml"))
  Name.load_alternatives(:first, data)

One way in which you might want to customize the alternatives is to cater for common
spelling mistakes such as _Steven_ and _Stephen_. These two names don't match by default,
but you can make them match by replacing the two default rules:

  ["Stephen", "Steve"]
  ["Steven", "Steve"]

with the following single rule:

  ["Stephen", "Steven", "Steve"]

so that now:

  Name.new("Stephen", "Hanly").match("Steven", "Hanly")     # => true

This kind of rule risks producing false positives, so you must judge carefully whether,
in the context of your application, that risk is outweighed by the benefit of being able
to overcome spelling mistakes.

Another use is to cater for English and Irish versions of the same name.
For example, for last names:

  [Murphy, Murchadha]

or for first names, including spelling variations:

  [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]

== Conditional Alternatives

Normally, entries in the two arrays are just lists of alternative names. There is one
exception, however: one of the entries (it doesn't matter which one but, by convention,
the last one) may be a regular expression. Here is an example that might be added to the
last name alternatives:

  ["Quinn", "Benjamin", /^(Debbie|Deborah)$/]

What this means is that the last names _Quinn_ and _Benjamin_ match, but only when the
first name matches the given regular expression. In this case it caters for a woman whose
last name changed after marriage.

  Name.new("Debbie", "Quinn").match("Debbie", "Benjamin")   # => true
  Name.new("Mark", "Quinn").match("Mark", "Benjamin")       # => false

Another example, this time for first names, is:

  ["Sean", "John", /^Bradley$/]

This caters for an individual who is known by two normally unrelated first names.
The two first names only match when the last name is _Bradley_.

  Name.new("John", "Bradley").match("Sean", "Bradley")      # => true
  Name.new("John", "Alfred").match("Sean", "Alfred")        # => false

== Author

Mark Orr, rating officer for the Irish Chess Union (ICU[http://icu.ie]).