README.rdoc in icu_name-0.1.4 vs README.rdoc in icu_name-1.0.0
- old
+ new
@@ -21,96 +21,189 @@
robert = ICU::Name.new(' robert j ', ' FISCHER ')
Capitalisation, white space and punctuation will all be automatically corrected:
- robert.name # => 'Robert J. Fischer'
- robert.rname # => 'Fischer, Robert J.' (reversed name)
+ robert.name # => 'Robert J. Fischer'
+ robert.rname # => 'Fischer, Robert J.' (reversed name)
The input text, without any changes apart from white-space cleanup and the insertion of a comma
-(to separate the two names), is returned by the _original_ method:
+(to separate the two names), is returned by the <tt>original</tt> method:
- robert.original # => 'FISCHER, robert j'
+ robert.original # => 'FISCHER, robert j'
To avoid ambiguity when either the first or second names consist of multiple words, it is better to
-supply the two separately, if known. However, the full name can be supplied alone to the constructor
-and a guess will be made as to the first and last names (the last distinct word becomes the last name).
+supply the two separately. If the full name is supplied alone to the constructor, without any indication
+of where the first names end, then the last distinct name is assumed to be the last name.
bobby = ICU::Name.new(' bobby fischer ')
- bobby.first # => 'Bobby'
- bobby.last # => 'Fischer'
+ bobby.first # => 'Bobby'
+ bobby.last # => 'Fischer'
-But in this case, since the names were not supplied separately, the _original_ text will not contain a comma:
+In this case, since the names were not supplied separately, the <tt>original</tt> text will not contain a comma:
- bobby.original # => 'bobby fischer'
+ bobby.original # => 'bobby fischer'
Names will match even if one is missing middle initials or if a nickname is used for one of the first names.
- bobby.match('Robert J.', 'Fischer') # => true
+ bobby.match('Robert J.', 'Fischer') # => true
-Note that the class is aware of only common nicknames (e.g. _Bobby_ and _Robert_, _Bill_ and _William_, etc)
-and not all possibilities.
+The method <tt>alternatives</tt> can be used to list alternatives to a given first or last name:
-Supplying the _match_ method with strings is equivalent to instantiating a Name instance with the same
+ Name.new('Stephen', 'Orr').alternatives(:first) # => ["Steve"]
+ Name.new('Michael Stephen', 'Orr').alternatives(:first) # => ["Steve", "Mike", "Mick", "Mikey"]
+ Name.new('Mark', 'Orr').alternatives(:first) # => []
+
+By default the class is only aware of a few common alternatives for first names (e.g. _Bobby_ and _Robert_,
+_Bill_ and _William_, etc). However, this can be customized (see below).
+
+Supplying the <tt>match</tt> method with strings is equivalent to instantiating an instance with the same
strings and then matching it. So, for example, the following are equivalent:
- robert.match('R.', 'Fischer') # => true
- robert.match(ICU::Name.new('R.', 'Fischer')) # => true
+ robert.match('R.', 'Fischer') # => true
+ robert.match(ICU::Name.new('R.', 'Fischer')) # => true
-The inital _R_, for example, matches the first letter of _Robert_. However, nickname matches will not
-always work with initials. In the next example, the initial _R_ does not match the first letter _B_ of the
-nickname _Bobby_.
+Here the initial _R_ matches the first letter of _Robert_. However, nickname matches will not
+always work with initials. In the next example, the initial _R_ does not match the first letter
+_B_ of the nickname _Bobby_.
- bobby.match('R. J.', 'Fischer') # => false
+ bobby.match('R. J.', 'Fischer') # => false
-Some of the ways last names are canonicalised are illustrated below:
+Some other ways last names are canonicalised are illustrated below:
- ICU::Name.new('John', 'O Reilly').last # => "O'Reilly"
- ICU::Name.new('dave', 'mcmanus').last # => "McManus"
+ ICU::Name.new('John', 'O Reilly').last # => "O'Reilly"
+ ICU::Name.new('dave', 'mcmanus').last # => "McManus"
== Characters and Encoding
The class can only cope with Latin characters, including those with diacritics (accents).
Along with hyphens and single quotes (which represent apostrophes) letters in ISO-8859-1
(e.g. "a", "è", "Ö") and letters outside ISO-8859-1 which are decomposable into a US-ASCII
character plus one or more diacritics (e.g. "ł" or "Ś") are preserved, while everything
else is removed.
- ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
- ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
- ICU::Name.new(' 渡井美代子').name # => ""
+ ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
+ ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
+ ICU::Name.new('Սմբատ', 'Լպուտյան').name # => ""
-The various accessors (_first_, _last_, _name_, _rname_, _to_s_, _original_) always return
+The various accessors (<tt>first</tt>, <tt>last</tt>, <tt>name</tt>, <tt>rname</tt>, <tt>to_s</tt>, <tt>original</tt>) always return
strings encoded in UTF-8, no matter what the input encoding.
eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
- eric.rname # => "Prié, Éric"
- eric.rname.encoding.name # => "UTF-8"
- eric.original # => "PRIÉ, éric"
- eric.original.encoding.name # => "UTF-8"
+ eric.rname # => "Prié, Éric"
+ eric.rname.encoding.name # => "UTF-8"
+ eric.original # => "PRIÉ, éric"
+ eric.original.encoding.name # => "UTF-8"
Accented letters can be transliterated into their US-ASCII counterparts by setting the
-_chars_ option, which is available in all accessors. For example:
+<tt>:chars</tt> option, which is available in all accessors. For example:
- eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
- eric.original(:chars => "US-ASCII") # => "PRIE, eric"
+ eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
+ eric.original(:chars => "US-ASCII") # => "PRIE, eric"
Also possible is the preservation of ISO-8859-1 characters, but the transliteration of
all other accented characters:
joe = Name.new('Józef', 'Żabiński')
- joe.rname # => "Żabiński, Józef"
- joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
- joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
+ joe.rname # => "Żabiński, Józef"
+ joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
+ joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
Note that the character encoding of the strings returned is still UTF-8 in all cases.
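The behaviour of the <tt>:chars</tt> option can be approximated with the standard library alone. The following is a minimal sketch, not the gem's implementation: the helper name +transliterate+ is hypothetical, and it assumes that stripping combining marks after canonical decomposition is an acceptable stand-in for whatever the class does internally.

```ruby
# Hypothetical helper, not part of icu_name: keep every character that the
# target character set can represent and strip the diacritics from the rest.
# The returned string is always UTF-8, as with the real accessors.
def transliterate(text, chars = "US-ASCII")
  text.each_char.map do |ch|
    begin
      ch.encode(chars) # raises if the character is not representable
      ch
    rescue Encoding::UndefinedConversionError
      # Decompose into base letter plus combining marks, then drop the marks.
      # Letters with no canonical decomposition (e.g. "ł") are left unchanged.
      ch.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
    end
  end.join
end

transliterate("Żabiński, Józef", "ISO-8859-1") # => "Zabinski, Józef"
transliterate("Żabiński, Józef", "US-ASCII")   # => "Zabinski, Jozef"
```

Working one character at a time keeps the rule easy to see: a letter is only altered when the chosen character set cannot hold it.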
The same option also relaxes the need for accented characters to match exactly:
- eric.match('Eric', 'Prie') # => false
- eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
- joe.match('Józef', 'Zabinski') # => false
- joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
+ eric.match('Eric', 'Prie') # => false
+ eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
+ joe.match('Józef', 'Zabinski') # => false
+ joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
+
+== Customization of Alternative Names
+
+We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
+matcher is aware of some common English nicknames. These name alternatives can be
+customised to handle additional nicknames and other types of alternative names
+such as common spelling mistakes and name changes.
+
+The alternative names are specified in two YAML files, one for first names and
+one for last names. Each YAML file represents an array and each element in the
+array is an array representing a set of alternative names. Here, for example,
+are some of the default first name alternatives:
+
+ [Anthony, Tony]
+ [James, Jim, Jimmy]
+ [Michael, Mike, Mick, Mikey]
+ [Robert, Bob, Bobby]
+ [Stephen, Steve]
+ [Steven, Steve]
+ [Thomas, Tom, Tommy]
+ [William, Will, Willy, Willie, Bill]
+
+The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
+
+ Name.new("Tony", "Miles").match("Anthony", "Miles") # => true
+
+Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
+same group, they don't match each other.
+
+ Name.new("Steven", "Hanly").match("Steve", "Hanly") # => true
+ Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
+ Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false
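The group rule described above can be illustrated without the gem. This is only a sketch under the assumption that two names are alternatives exactly when they are equal or appear together in at least one group; the constant +FIRST_NAME_GROUPS+ and the method <tt>alternatives?</tt> are hypothetical names, and the real matcher also handles initials, case and accents.

```ruby
# Hypothetical illustration only: a few of the default groups shown above.
FIRST_NAME_GROUPS = [
  %w[Anthony Tony],
  %w[Stephen Steve],
  %w[Steven Steve]
].freeze

# Two names are treated as alternatives when they are identical
# or when at least one group contains both of them.
def alternatives?(a, b, groups = FIRST_NAME_GROUPS)
  return true if a == b
  groups.any? { |group| group.include?(a) && group.include?(b) }
end

alternatives?("Tony", "Anthony")   # => true
alternatives?("Steven", "Steve")   # => true
alternatives?("Stephen", "Steven") # => false (no shared group)
```

Because membership is checked per group rather than transitively, _Steve_ can belong to two groups without linking _Stephen_ and _Steven_.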
+
+To customize alternative name behaviour, prepare YAML files with your chosen alternatives
+and then replace the default alternatives like this:
+
+ Name.load_alternatives(:first, "my_first_name_alternatives.yaml")
+ Name.load_alternatives(:last, "my_last_name_alternatives.yaml")
+
+An example of one way in which you might want to customize the alternatives is to
+cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
+don't match by default, but you can make them match by replacing the two default rules:
+
+ [Stephen, Steve]
+ [Steven, Steve]
+
+with the following single rule:
+
+ [Stephen, Steven, Steve]
+
+so that now:
+
+ Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
+
+Another use is to cater for English and Irish versions of the same name. For example,
+for last names:
+
+ [Murphy, Murchadha]
+
+or for first names, including spelling variations:
+
+ [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
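Before passing a custom file to <tt>load_alternatives</tt> it can be worth checking that it parses into the expected array of arrays. The sketch below uses only Ruby's standard YAML library. It assumes the top-level array is written as a block sequence (one <tt>- [...]</tt> item per line), and the helper <tt>valid_alternatives?</tt> is a hypothetical name rather than part of the gem. It covers plain name lists only, not entries containing regular expressions.

```ruby
require "yaml"

# Assumed file layout: a block sequence whose items are flow sequences.
SAMPLE_ALTERNATIVES = <<~YAML
  - [Stephen, Steven, Steve]
  - [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
YAML

# Hypothetical helper: true when the text parses to an array of arrays of strings.
def valid_alternatives?(yaml_text)
  data = YAML.safe_load(yaml_text)
  data.is_a?(Array) &&
    data.all? { |group| group.is_a?(Array) && group.all? { |name| name.is_a?(String) } }
rescue Psych::SyntaxError
  false
end

valid_alternatives?(SAMPLE_ALTERNATIVES) # => true
valid_alternatives?("- Stephen\n- Steve") # => false (a flat list, not a list of groups)
```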
+
+== Conditional Alternatives
+
+Normally, entries in the two YAML files are just lists of alternative names. There is one
+exception, however: when one of the entries (it doesn't matter which one but,
+by convention, the last one) is a regular expression. Here is an example that might
+be added to the last name alternatives:
+
+ [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
+
+What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
+first name matches the regular expression.
+
+ Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
+ Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false
+
+Another example, this time for first names, is:
+
+ [Sean, John, !ruby/regexp /^Bradley$/]
+
+This caters for an individual who is known by two normally unrelated first names.
+We only want these two names to match for that individual and no others.
+
+ Name.new("John", "Bradley").match("Sean", "Bradley") # => true
+ Name.new("John", "Alfred").match("Sean", "Alfred") # => false
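The effect of a conditional entry can be sketched in plain Ruby. This is an illustration under stated assumptions, not the gem's code: the rules are written directly as Ruby arrays (roughly what the YAML would parse to), and both +LAST_NAME_RULES+ and <tt>last_names_match?</tt> are hypothetical names.

```ruby
# Hypothetical parsed rules: names, optionally followed by a Regexp condition.
LAST_NAME_RULES = [
  ["Quinn", "Benjamin", /^(Debbie|Deborah)$/],
  ["Murphy", "Murchadha"]
].freeze

# Last names match when they are identical, or when some rule lists both
# and every Regexp in that rule matches the given first name.
def last_names_match?(last_a, last_b, first_name, rules = LAST_NAME_RULES)
  return true if last_a == last_b
  rules.any? do |rule|
    conditions, names = rule.partition { |entry| entry.is_a?(Regexp) }
    names.include?(last_a) && names.include?(last_b) &&
      conditions.all? { |re| re.match?(first_name) }
  end
end

last_names_match?("Quinn", "Benjamin", "Debbie") # => true
last_names_match?("Quinn", "Benjamin", "Mark")   # => false
last_names_match?("Murphy", "Murchadha", "Mark") # => true (unconditional rule)
```

Separating the entries with +partition+ reflects the statement above that the position of the regular expression within the entry does not matter.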
== Author
Mark Orr, rating officer for the Irish Chess Union (ICU[http://icu.ie]).