= ICU Name
Canonicalises and matches person names that have first and last names and are written in Latin characters.
== Installation
For Ruby 1.9.2 and above.
gem install icu_name
It depends on _active_support_ and _i18n_.
== Names
This class exists for two main purposes:
* to normalise to a common format the different ways names are typed in practice
* to be able to match two names even if they are not exactly the same
To create a name object, supply both the first and second names separately to the constructor.
robert = ICU::Name.new(' robert j ', ' FISCHER ')
Capitalisation, white space and punctuation will all be automatically corrected:
robert.name # => 'Robert J. Fischer'
robert.rname # => 'Fischer, Robert J.' (reversed name)
The input text, without any changes apart from white-space cleanup and the insertion of a comma
(to separate the two names), is returned by the original method:
robert.original # => 'FISCHER, robert j'
To avoid ambiguity when either the first or second names consist of multiple words, it is better to
supply the two separately. If the full name is supplied alone to the constructor, without any indication
of where the first names end, then the last distinct name is assumed to be the last name.
bobby = ICU::Name.new(' bobby fischer ')
bobby.first # => 'Bobby'
bobby.last # => 'Fischer'
In this case, since the names were not supplied separately, the original text will not contain a comma:
bobby.original # => 'bobby fischer'
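The rule for a single-string name can be pictured with a few lines of plain Ruby. This is only a sketch of the idea described above, not the gem's own code: +split_full_name+ is a hypothetical helper, and it does no capitalisation or punctuation clean-up.

```ruby
# Hypothetical helper (not part of icu_name): split a full name into
# [first, last] by treating the last whitespace-separated word as the last name.
def split_full_name(full)
  words = full.strip.split(/\s+/)
  last = words.pop.to_s
  [words.join(" "), last]
end

first, last = split_full_name("  bobby   fischer ")
puts first # bobby
puts last  # fischer
```

With this rule a name such as "Edwin van der Sar" would end up with the last name "Sar", which is why supplying the two parts separately is the safer choice.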
Names will match even if one is missing middle initials or if a nickname is used for one of the first names.
bobby.match('Robert J.', 'Fischer') # => true
The method alternatives can be used to list alternatives to a given first or last name:
Name.new('Stephen', 'Orr').alternatives(:first) # => ["Steve"]
Name.new('Michael Stephen', 'Orr').alternatives(:first) # => ["Steve", "Mike", "Mick", "Mikey"]
Name.new('Mark', 'Orr').alternatives(:first) # => []
By default the class is only aware of a few common alternatives for first names (e.g. _Bobby_ and _Robert_,
_Bill_ and _William_, etc). However, this can be customized (see below).
Supplying the match method with strings is equivalent to creating an instance from the same
strings and then matching against it. For example, the following are equivalent:
robert.match('R.', 'Fischer') # => true
robert.match(ICU::Name.new('R.', 'Fischer')) # => true
Here the initial _R_ matches the first letter of _Robert_. However, nickname matches will not
always work with initials. In the next example, the initial _R_ does not match the first letter
_B_ of the nickname _Bobby_.
bobby.match('R. J.', 'Fischer') # => false
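One way to picture why _R._ matches _Robert_ but not _Bobby_ is the sketch below. It is an illustration under simplifying assumptions, not the gem's matching code: +NICKNAMES+, +initial?+ and +first_name_match?+ are hypothetical, and only single-word first names are compared.

```ruby
# Hypothetical nickname groups, modelled on the documented defaults.
NICKNAMES = [%w[Robert Bob Bobby], %w[William Will Bill]].freeze

# True if the string is a bare initial such as "R" or "R.".
def initial?(name)
  name.match?(/\A[A-Za-z]\.?\z/)
end

# Compare two single-word first names. An initial is compared on its first
# letter only; otherwise the names must be equal or share a nickname group.
def first_name_match?(a, b)
  return false if a.empty? || b.empty?
  return a[0].casecmp?(b[0]) if initial?(a) || initial?(b)
  return true if a.casecmp?(b)
  NICKNAMES.any? { |group| group.include?(a) && group.include?(b) }
end

puts first_name_match?("Robert", "R.")    # true
puts first_name_match?("Bobby", "R.")     # false
puts first_name_match?("Bobby", "Robert") # true
```

In this sketch the initial is never expanded through the nickname groups, so _R._ is only ever compared with the letter _B_ of _Bobby_.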
Some other ways last names are canonicalised are illustrated below:
ICU::Name.new('John', 'O Reilly').last # => "O'Reilly"
ICU::Name.new('dave', 'mcmanus').last # => "McManus"
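For readers curious how this kind of clean-up could be done, here is a minimal sketch using two regular expressions. It is not taken from the gem: +tidy_last_name+ is a hypothetical helper that only covers the _O'_ and _Mc_ cases shown above.

```ruby
# Hypothetical helper (not the gem's code): tidy the spacing and
# capitalisation of a last name.
def tidy_last_name(raw)
  # Trim, then capitalise each whitespace-separated word.
  name = raw.strip.split(/\s+/).map(&:capitalize).join(" ")
  # "O Reilly" or "O'reilly" becomes "O'Reilly".
  name = name.sub(/\AO[ ']([a-z])/i) { "O'#{$1.upcase}" }
  # "Mcmanus" becomes "McManus".
  name.sub(/\AMc([a-z])/) { "Mc#{$1.upcase}" }
end

puts tidy_last_name("O Reilly")  # O'Reilly
puts tidy_last_name("mcmanus")   # McManus
puts tidy_last_name(" FISCHER ") # Fischer
```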
== Characters and Encoding
The class can only cope with Latin characters, including those with diacritics (accents).
Along with hyphens and single quotes (which represent apostrophes) letters in ISO-8859-1
(e.g. "a", "è", "Ö") and letters outside ISO-8859-1 which are decomposable into a US-ASCII
character plus one or more diacritics (e.g. "ł" or "Ś") are preserved, while everything
else is removed.
ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
ICU::Name.new('Սմբատ', 'Լպուտյան').name # => ""
The various accessors (first, last, name, rname, to_s, original) always return
strings encoded in UTF-8, no matter what the input encoding.
eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
eric.rname # => "Prié, Éric"
eric.rname.encoding.name # => "UTF-8"
eric.original # => "PRIÉ, éric"
eric.original.encoding.name # => "UTF-8"
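The idea of accepting differently encoded input and always returning UTF-8 can be sketched with Ruby's standard String methods. This is an assumption about one reasonable approach, not a description of the gem's internals: +to_utf8+ is a hypothetical helper, and binary input that is not valid UTF-8 is assumed to be ISO-8859-1.

```ruby
# Hypothetical helper: return a UTF-8 copy of a string in any encoding.
def to_utf8(text)
  copy = text.dup
  if copy.encoding == Encoding::BINARY
    # Binary strings carry no encoding information, so guess: UTF-8 if the
    # bytes are valid UTF-8, otherwise assume ISO-8859-1.
    copy.force_encoding(Encoding::UTF_8)
    copy.force_encoding(Encoding::ISO_8859_1) unless copy.valid_encoding?
  end
  copy.encode(Encoding::UTF_8)
end

latin1 = "éric".encode("ISO-8859-1")
binary = "PRIÉ".b
puts to_utf8(latin1)               # éric
puts to_utf8(binary)               # PRIÉ
puts to_utf8(binary).encoding.name # UTF-8
```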
Accented letters can be transliterated into their US-ASCII counterparts by setting the
:chars option, which is available in all accessors. For example:
eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
eric.original(:chars => "US-ASCII") # => "PRIE, eric"
Also possible is the preservation of ISO-8859-1 characters, but the transliteration of
all other accented characters:
joe = Name.new('Józef', 'Żabiński')
joe.rname # => "Żabiński, Józef"
joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
Note that the character encoding of the strings returned is still UTF-8 in all cases.
The same option also relaxes the need for accented characters to match exactly:
eric.match('Eric', 'Prie') # => false
eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
joe.match('Józef', 'Zabinski') # => false
joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
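The two transliteration levels can be approximated with Unicode normalisation from Ruby's core library. The sketch below is illustrative only and makes no claim about how the gem does it: +strip_accents+ and +keep_latin1+ are hypothetical helpers.

```ruby
# Hypothetical helper: decompose the text (NFD) and drop the combining marks,
# leaving the base letters.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Hypothetical helper: keep characters that exist in ISO-8859-1 and strip the
# accents from all others.
def keep_latin1(text)
  text.each_char.map do |ch|
    begin
      ch.encode("ISO-8859-1") # raises if ch is not representable
      ch
    rescue Encoding::UndefinedConversionError
      strip_accents(ch)
    end
  end.join
end

puts strip_accents("Żabiński, Józef") # Zabinski, Jozef
puts keep_latin1("Żabiński, Józef")   # Zabinski, Józef
```

Comparing two names after passing both through the same helper gives the relaxed matching shown above. Letters that have no canonical Unicode decomposition, such as "ł", pass through +strip_accents+ unchanged and would need an explicit mapping table in a sketch like this.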
== Customization of Alternative Names
We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
matcher is aware of some common English nicknames. These name alternatives can be
customised to handle additional nicknames and other types of alternative names
such as common spelling mistakes and name changes.
The alternative names are specified in two YAML files, one for first names and
one for last names. Each YAML file represents an array and each element in the
array is an array representing a set of alternative names. Here, for example,
are some of the default first name alternatives:
[Anthony, Tony]
[James, Jim, Jimmy]
[Michael, Mike, Mick, Mikey]
[Robert, Bob, Bobby]
[Stephen, Steve]
[Steven, Steve]
[Thomas, Tom, Tommy]
[William, Will, Willy, Willie, Bill]
The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
Name.new("Tony", "Miles").match("Anthony", "Miles") # => true
Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
same group, they don't match each other.
Name.new("Steven", "Hanly").match("Steve", "Hanly") # => true
Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false
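The effect of grouping can be reproduced outside the gem with Ruby's standard YAML library. The sketch below is not the gem's loader: the YAML text and the +alternatives?+ helper are hypothetical. Note that, written as a YAML array of arrays, each group is one item of the top-level sequence and so carries a leading "- ".

```ruby
require "yaml"

# Illustrative YAML text with three of the default groups.
text = <<~'YAML'
  - [Anthony, Tony]
  - [Stephen, Steve]
  - [Steven, Steve]
YAML

groups = YAML.safe_load(text)

# Hypothetical check: two names are alternatives if they are identical or if
# at least one group contains both of them.
def alternatives?(groups, a, b)
  a == b || groups.any? { |group| group.include?(a) && group.include?(b) }
end

puts alternatives?(groups, "Steven", "Steve")   # true
puts alternatives?(groups, "Stephen", "Steve")  # true
puts alternatives?(groups, "Stephen", "Steven") # false
```

Because membership is tested one group at a time, sharing _Steve_ is not enough to link _Stephen_ and _Steven_.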
To customize alternative name behaviour, prepare YAML files with your chosen alternatives
and then replace the default alternatives like this:
Name.load_alternatives(:first, "my_first_name_alternatives.yaml")
Name.load_alternatives(:last, "my_last_name_alternatives.yaml")
An example of one way in which you might want to customize the alternatives is to
cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
don't match by default, but you can make them match by replacing the two default rules:
[Stephen, Steve]
[Steven, Steve]
with the following single rule:
[Stephen, Steven, Steve]
so that now:
Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
Another use is to cater for English and Irish versions of the same name. For example,
for last names:
[Murphy, Murchadha]
or for first names, including spelling variations:
[Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
== Conditional Alternatives
Normally, entries in the two YAML files are just lists of alternative names. The one
exception is when one of the elements of an entry (it doesn't matter which one but,
by convention, the last one) is a regular expression. Here is an example that might
be added to the last name alternatives:
[Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
first name matches the regular expression.
Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false
Another example, this time for first names, is:
[Sean, John, !ruby/regexp /^Bradley$/]
This caters for an individual who is known by two normally unrelated first names.
We only want these two names to match for that individual and no others.
Name.new("John", "Bradley").match("Sean", "Bradley") # => true
Name.new("John", "Alfred").match("Sean", "Alfred") # => false
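A conditional rule can be read and applied with Ruby's standard YAML library, as in the sketch below. It is an illustration rather than the gem's implementation: +last_names_match?+ is a hypothetical helper, it checks a single first name for simplicity, and +Regexp+ has to be permitted explicitly because +safe_load+ rejects the <tt>!ruby/regexp</tt> tag by default.

```ruby
require "yaml"

# Illustrative last name alternatives: one plain rule, one conditional rule.
text = <<~'YAML'
  - [Murphy, Murchadha]
  - [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
YAML

rules = YAML.safe_load(text, permitted_classes: [Regexp])

# Hypothetical check: last names a and b match if they are identical, or if
# some rule lists both and the rule's regexp (when present) matches the
# first name.
def last_names_match?(rules, first, a, b)
  return true if a == b
  rules.any? do |rule|
    condition = rule.find { |entry| entry.is_a?(Regexp) }
    names = rule.reject { |entry| entry.is_a?(Regexp) }
    names.include?(a) && names.include?(b) &&
      (condition.nil? || condition.match?(first))
  end
end

puts last_names_match?(rules, "Debbie", "Quinn", "Benjamin")  # true
puts last_names_match?(rules, "Mark", "Quinn", "Benjamin")    # false
puts last_names_match?(rules, "Mark", "Murphy", "Murchadha")  # true
```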
== Author
Mark Orr, rating officer for the Irish Chess Union (ICU[http://icu.ie]).