= ICU Tournament
Canonicalises and matches person names with Western European characters.
Note: version 1.1.0 dropped support for characters beyond codepoint 255 and became independent of activesupport and i18n.
== Installation
For ruby 1.9.2, 1.9.3, 2.0.0.
gem install icu_name
== Names
This class exists for two main purposes:
* to normalise to a common format the different ways Irish person names are typed in practice
* to be able to match two names even if they are not exactly the same in their original form
To create a name object, supply both the first and second names separately to the constructor.
robert = ICU::Name.new(' robert j ', ' FISHER ')
Capitalisation, white space and punctuation will all be automatically corrected:
robert.name # => 'Robert J. Fischer'
robert.rname # => 'Fischer, Robert J.' (reversed name)
The input text, without any changes apart from white-space cleanup and the insertion of a comma
(to separate the two names), is returned by the original method:
robert.original # => 'FISCHER, robert j'
To avoid ambiguity when either the first or second names consist of multiple words, it is better to
supply the two separately. If the full name is supplied alone to the constructor, without any indication
of where the first names end, then the last distinct name is assumed to be the last name.
bobby = ICU::Name.new(' bobby fischer ')
bobby.first # => 'Bobby'
bobby.last # => 'Fischer'
In this case, since the names were not supplied separately, the original text will not contain a comma:
bobby.original # => 'bobby fischer'
Names will match even if one is missing middle initials or if a nickname is used for one of the first names.
bobby.match('Robert J.', 'Fischer') # => true
The method alternatives can be used to list alternatives to a given first or last name:
Name.new('Stephen', 'Orr').alternatives(:first) # => ["Steve"]
Name.new('Michael Stephen', 'Orr').alternatives(:first) # => ["Steve", "Mike", "Mick", "Mikey"],
Name.new('Mark', 'Orr').alternatives(:first) # => []
By default the class is only aware of a few common alternatives for first names (e.g. _Bobby_ and _Robert_,
_Bill_ and _William_, etc). However, this can be customized (see below).
Supplying the match method with strings is equivalent to instantiating an instance with the same
strings and then matching it. So, for example the following are equivalent:
robert.match('R.', 'Fischer') # => true
robert.match(ICU::Name.new('R.', 'Fischer')) # => true
Here the inital _R_ matches the first letter of _Robert_. However, nickname matches will not
always work with initials. In the next example, the initial _R_ does not match the first letter
_B_ of the nickname _Bobby_.
bobby.match('R. J.', 'Fischer') # => false
Some other ways last names are canonicalised are illustrated below:
ICU::Name.new('John', 'O Reilly').last # => "O'Reilly, John"
ICU::Name.new('dave', 'mcmanus').last # => "McManus, Dave"
== Characters and Encoding
The class can only cope with Latin characters, including those with diacritics (accents).
Hyphens, single quotes (which represent apostophes) and letters in the ISO-8859-1 range
(e.g. "a", "è", "Ö") are preserved, while everything else is removed (unsupported).
ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartomiej Liwa"
ICU::Name.new('Սմբատ', 'Լպուտյան').name # => ""
The various accessors (first, last, name, rname, to_s, original) always return
strings encoded in UTF-8, no matter what the input encoding.
eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
eric.rname # => "Prié, Éric"
eric.rname.encoding.name # => "UTF-8"
eric.original # => "PRIÉ, éric"
eric.original.encoding.name # => "UTF-8"
Accented letters can be transliterated into their US-ASCII counterparts by setting the
:chars option, which is available in all accessors. For example:
eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
eric.original(:chars => "US-ASCII") # => "PRIE, eric"
Note that the character encoding of the strings returned is still UTF-8 in all cases.
The same option also relaxes the need for accented characters to match exactly:
eric.match('Eric', 'Prie') # => false
eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
== Customization of Alternative Names
We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
matcher is aware of some common English nicknames. These name alternatives can be
customised to handle additional nicknames and other types of alternative names
such as common spelling error and player name changes.
The alternative names consist of two arrays, one for first names and
one for last names. Each array element is itself an array of strings
representing a set of equivalent names. Here, for example, are some
of the default first name alternatives:
["Anthony", "Tony"]
["James", "Jim", "Jimmy"]
["Michael", "Mike", "Mick", "Mikey"]
["Robert", "Bob", "Bobby"]
["Stephen", "Steve"]
["Steven", "Steve"]
["Thomas", "Tom", "Tommy"]
["William", "Will", "Willy", "Willie", "Bill"]
The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
Name.new("Tony", "Miles").match("Anthony", "Miles") # => true
Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
same group, they don't match each other.
Name.new("Steven", "Hanly").match("Steve", "Hanly") # => true
Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false
To change alternative name behaviour, you can replace the default alternatives
with a customized set perhaps stored in a database or a YAML file, as illustrated below:
data = YAML.load(File open "my_last_name_alternatives.yaml")
Name.load_alternatives(:last, data)
data = YAML.load(File open "my_first_name_alternatives.yaml")
Name.load_alternatives(:first, data)
An example of one way in which you might want to customize the alternatives is to
cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
don't match by default, but you can make them so by replacing the two default rules:
["Stephen", "Steve"]
["Steven", "Steve"]
with the following single rule:
["Stephen", "Steven", "Steve"]
so that now:
Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
This kind of rule risks producing false positives - you must judge
whether that risk is outweighed by the benefits of being able to overcome
spelling mistakes in the context of your application.
Another use is to cater for English and Irish versions of the same name.
For example, for last names:
[Murphy, Murchadha]
or for first names, including spelling variations:
[Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
== Conditional Alternatives
Normally, entries in the two arrays are just lists of alternative names. There is one
exception to this however, when one of the entries (it doesn't matter which one but,
by convention, the last one) is a regular expression. Here is an example that might
be added to the last name alternatives:
["Quinn", "Benjamin", /^(Debbie|Deborah)$/]
What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
first name matches the given regular expression. In this case it caters for a female
whose last name changed after marriage.
Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false
Another example, this time for first names, is:
["Sean", "John", /^Bradley$/]
This caters for an individual who is known by two normally unrelated first names.
The two first names only match when the last name is _Bradley_.
Name.new("John", "Bradley").match("Sean", "Bradley") # => true
Name.new("John", "Alfred").match("Sean", "Alfred") # => false
== Author
Mark Orr, rating officer for the Irish Chess Union (ICU[http://icu.ie]).