README.rdoc in icu_name-0.1.4 vs README.rdoc in icu_name-1.0.0

- old
+ new
@@ -21,96 +21,189 @@
 
   robert = ICU::Name.new(' robert  j ', ' FISCHER ')
 
 Capitalisation, white space and punctuation will all be automatically corrected:
 
-  robert.name                                             # => 'Robert J. Fischer'
-  robert.rname                                            # => 'Fischer, Robert J.'  (reversed name)
+  robert.name                                                 # => 'Robert J. Fischer'
+  robert.rname                                                # => 'Fischer, Robert J.'  (reversed name)
 
 The input text, without any changes apart from white-space cleanup and the insertion of a comma
-(to separate the two names), is returned by the _original_ method:
+(to separate the two names), is returned by the <tt>original</tt> method:
 
-  robert.original                                         # => 'FISCHER, robert j'
+  robert.original                                             # => 'FISCHER, robert j'
 
 To avoid ambiguity when either the first or second names consist of multiple words, it is better to
-supply the two separately, if known. However, the full name can be supplied alone to the constructor
-and a guess will be made as to the first and last names (the last distinct word becomes the last name).
+supply the two separately. If the full name is supplied alone to the constructor, without any indication
+of where the first names end, then the last distinct name is assumed to be the last name.
 
   bobby = ICU::Name.new(' bobby  fischer ')
 
-  bobby.first                                             # => 'Bobby'
-  bobby.last                                              # => 'Fischer'
+  bobby.first                                                 # => 'Bobby'
+  bobby.last                                                  # => 'Fischer'
 
-But in this case, since the names were not supplied separately, the _original_ text will not contain a comma:
+In this case, since the names were not supplied separately, the <tt>original</tt> text will not contain a comma:
 
-  bobby.original                                          # => 'bobby fischer'
+  bobby.original                                              # => 'bobby fischer'
 
 Names will match even if one is missing middle initials or if a nickname is used for one of the first names.
 
-  bobby.match('Robert J.', 'Fischer')                     # => true
+  bobby.match('Robert J.', 'Fischer')                         # => true
 
-Note that the class is aware of only common nicknames (e.g. _Bobby_ and _Robert_, _Bill_ and _William_, etc)
-and not all possibilities.
+The method <tt>alternatives</tt> can be used to list alternatives to a given first or last name:
 
-Supplying the _match_ method with strings is equivalent to instantiating a Name instance with the same
+  Name.new('Stephen', 'Orr').alternatives(:first)             # => ["Steve"]
+  Name.new('Michael Stephen', 'Orr').alternatives(:first)     # => ["Steve", "Mike", "Mick", "Mikey"]
+  Name.new('Mark', 'Orr').alternatives(:first)                # => []
+  
+By default the class is only aware of a few common alternatives for first names (e.g. _Bobby_ and _Robert_,
+_Bill_ and _William_, etc.). However, this can be customized (see below).
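As a rough illustration of what <tt>alternatives</tt> computes, here is a self-contained sketch (not the gem's code) that looks up each first name in a list of groups and collects the other members of every group it belongs to. The group list is copied from the default first-name examples given later in this README; the method name <tt>first_name_alternatives</tt> is hypothetical, it assumes the names are already capitalised, and its result order may differ from the gem's.

```ruby
# Hypothetical group list, copied from the default first-name examples in this README.
FIRST_NAME_GROUPS = [
  %w[Anthony Tony],
  %w[James Jim Jimmy],
  %w[Michael Mike Mick Mikey],
  %w[Robert Bob Bobby],
  %w[Stephen Steve],
  %w[Steven Steve],
  %w[Thomas Tom Tommy],
  %w[William Will Willy Willie Bill]
].freeze

# Sketch only: collect the other members of every group that contains one of
# the (already capitalised) first names, without duplicates.
def first_name_alternatives(first_names, groups = FIRST_NAME_GROUPS)
  names = first_names.split
  groups.each_with_object([]) do |group, alts|
    next if (group & names).empty?
    (group - names).each { |alt| alts << alt unless alts.include?(alt) }
  end
end

first_name_alternatives("Stephen")          # => ["Steve"]
first_name_alternatives("Michael Stephen")  # => ["Mike", "Mick", "Mikey", "Steve"]
first_name_alternatives("Mark")             # => []
```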
+
+Supplying the <tt>match</tt> method with strings is equivalent to instantiating an instance with the same
 strings and then matching it. So, for example, the following are equivalent:
 
-  robert.match('R.', 'Fischer')                           # => true
-  robert.match(ICU::Name.new('R.', 'Fischer'))            # => true
+  robert.match('R.', 'Fischer')                               # => true
+  robert.match(ICU::Name.new('R.', 'Fischer'))                # => true
 
-The inital _R_, for example, matches the first letter of _Robert_. However, nickname matches will not
-always work with initials. In the next example, the initial _R_ does not match the first letter _B_ of the
-nickname _Bobby_.
+Here the initial _R_ matches the first letter of _Robert_. However, nickname matches will not
+always work with initials. In the next example, the initial _R_ does not match the first letter
+_B_ of the nickname _Bobby_.
 
-  bobby.match('R. J.', 'Fischer')                         # => false
+  bobby.match('R. J.', 'Fischer')                             # => false
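The initial matching described above can be approximated with a position-by-position comparison of first names. This is a hypothetical sketch (the method names are invented for illustration and the gem's actual rules may be more elaborate): a one-letter token is treated as an initial of any name starting with that letter, and extra trailing first names on either side are ignored. Nicknames are deliberately not considered, which is why _R_ fails against _Bobby_.

```ruby
# Split a first-name string into lower-case tokens, dropping full stops.
def name_tokens(names)
  names.delete(".").split.map(&:downcase)
end

# A token matches if it is identical, or if one side is a single-letter
# initial equal to the first letter of the other.
def token_match?(a, b)
  return true if a == b
  (a.length == 1 && b.start_with?(a)) || (b.length == 1 && a.start_with?(b))
end

# Hypothetical rule: compare tokens pairwise up to the shorter list, so a
# missing trailing middle name or initial does not block a match.
def first_names_match?(x, y)
  xs, ys = name_tokens(x), name_tokens(y)
  return false if xs.empty? || ys.empty?
  n = [xs.size, ys.size].min
  (0...n).all? { |i| token_match?(xs[i], ys[i]) }
end

first_names_match?("Robert J.", "R.")   # => true
first_names_match?("Bobby", "R. J.")    # => false
```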
 
-Some of the ways last names are canonicalised are illustrated below:
+Some other ways last names are canonicalised are illustrated below:
 
-  ICU::Name.new('John', 'O Reilly').last                  # => "O'Reilly"
-  ICU::Name.new('dave', 'mcmanus').last                   # => "McManus"
+  ICU::Name.new('John', 'O Reilly').rname                     # => "O'Reilly, John"
+  ICU::Name.new('dave', 'mcmanus').rname                      # => "McManus, Dave"
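A hypothetical sketch of the two last-name rules illustrated above: an apostrophe restored after a separate _O_, and a capital letter after _Mc_. It covers only these patterns, is not the gem's implementation, and <tt>canonical_last</tt> is an invented name.

```ruby
# Sketch: canonicalise a last name for the "O Reilly" and "mcmanus" cases only.
def canonical_last(name)
  n = name.strip.downcase.split(/\s+/).join(" ")
  # A lone leading "o" is taken to be an Irish "O'" prefix.
  n = n.sub(/\Ao\s+/, "o'")
  # Capitalise each part separated by a space, apostrophe or hyphen,
  # keeping the separators (the capture group makes split return them).
  n = n.split(/([ '-])/).map(&:capitalize).join
  # Capitalise the letter following a leading "Mc".
  n.sub(/\AMc([a-z])/) { "Mc#{Regexp.last_match(1).upcase}" }
end

canonical_last("O Reilly")  # => "O'Reilly"
canonical_last("mcmanus")   # => "McManus"
```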
 
 == Characters and Encoding
 
 The class can only cope with Latin characters, including those with diacritics (accents).
 Along with hyphens and single quotes (which represent apostrophes) letters in ISO-8859-1
 (e.g. "a", "è", "Ö") and letters outside ISO-8859-1 which are decomposable into a US-ASCII
 character plus one or more diacritics (e.g. "ł" or "Ś") are preserved, while everything
 else is removed.
 
-  ICU::Name.new('éric', 'PRIÉ').name                      # => "Éric Prié"
-  ICU::Name.new('BARTŁOMIEJ', 'śliwa').name               # => "Bartłomiej Śliwa"
-  ICU::Name.new(' 渡井美代子').name                            # => ""
+  ICU::Name.new('éric', 'PRIÉ').name                          # => "Éric Prié"
+  ICU::Name.new('BARTŁOMIEJ', 'śliwa').name                   # => "Bartłomiej Śliwa"
+  ICU::Name.new('Սմբատ', 'Լպուտյան').name                     # => ""
 
-The various accessors (_first_, _last_, _name_, _rname_, _to_s_, _original_) always return
+The various accessors (<tt>first</tt>, <tt>last</tt>, <tt>name</tt>, <tt>rname</tt>, <tt>to_s</tt>, <tt>original</tt>) always return
 strings encoded in UTF-8, no matter what the input encoding.
 
   eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
-  eric.rname                                              # => "Prié, Éric"
-  eric.rname.encoding.name                                # => "UTF-8"
-  eric.original                                           # => "PRIÉ, éric"
-  eric.original.encoding.name                             # => "UTF-8"
+  eric.rname                                                  # => "Prié, Éric"
+  eric.rname.encoding.name                                    # => "UTF-8"
+  eric.original                                               # => "PRIÉ, éric"
+  eric.original.encoding.name                                 # => "UTF-8"
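Ruby's standard <tt>String</tt> methods are enough to bring input in other encodings to UTF-8. The sketch below is an assumption about how this might be done, not the gem's code: binary (ASCII-8BIT) input carries no encoding information, so it is first tried as UTF-8 and falls back to ISO-8859-1 if the bytes are not valid UTF-8. The method name <tt>to_utf8</tt> is hypothetical.

```ruby
# Hypothetical helper: return a UTF-8 copy of str. Binary strings are guessed
# to be UTF-8, falling back to ISO-8859-1 (in which every byte is valid).
def to_utf8(str)
  s = str.dup
  if s.encoding == Encoding::ASCII_8BIT
    s.force_encoding(Encoding::UTF_8)
    s.force_encoding(Encoding::ISO_8859_1) unless s.valid_encoding?
  end
  s.encode(Encoding::UTF_8)
end

to_utf8("éric".encode("ISO-8859-1"))  # => "éric"
to_utf8("PRIÉ".b)                     # => "PRIÉ"
to_utf8("\xE9ric".b)                  # => "éric"
```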
 
 Accented letters can be transliterated into their US-ASCII counterparts by setting the
-_chars_ option, which is available in all accessors. For example:
+<tt>:chars</tt> option, which is available in all accessors. For example:
 
-  eric.rname(:chars => "US-ASCII")                        # => "Prie, Eric"
-  eric.original(:chars => "US-ASCII")                     # => "PRIE, eric"
+  eric.rname(:chars => "US-ASCII")                            # => "Prie, Eric"
+  eric.original(:chars => "US-ASCII")                         # => "PRIE, eric"
 
 Also possible is the preservation of ISO-8859-1 characters, but the transliteration of
 all other accented characters:
 
   joe = Name.new('Józef', 'Żabiński')
-  joe.rname                                               # => "Żabiński, Józef"
-  joe.rname(:chars => "ISO-8859-1")                       # => "Zabinski, Józef"
-  joe.rname(:chars => "US-ASCII")                         # => "Zabinski, Jozef"
+  joe.rname                                                   # => "Żabiński, Józef"
+  joe.rname(:chars => "ISO-8859-1")                           # => "Zabinski, Józef"
+  joe.rname(:chars => "US-ASCII")                             # => "Zabinski, Jozef"
 
 Note that the character encoding of the strings returned is still UTF-8 in all cases.
 The same option also relaxes the need for accented characters to match exactly:
 
-  eric.match('Eric', 'Prie')                              # => false
-  eric.match('Eric', 'Prie', :chars => "US-ASCII")        # => true
-  joe.match('Józef', 'Zabinski')                          # => false
-  joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1")  # => true
+  eric.match('Eric', 'Prie')                                  # => false
+  eric.match('Eric', 'Prie', :chars => "US-ASCII")            # => true
+  joe.match('Józef', 'Zabinski')                              # => false
+  joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1")      # => true
+
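Transliteration of this kind can be approximated in plain Ruby by decomposing each character (Unicode NFD) and deleting the combining marks. The sketch below is an illustration under that assumption, not the gem's implementation, and <tt>fold</tt> and its helpers are hypothetical names. Letters such as "ł" have no Unicode decomposition, so this simple approach leaves them unchanged. Folding both names before comparing them shows why the <tt>:chars</tt> option relaxes matching.

```ruby
# Remove accents by decomposing to NFD ("é" -> "e" + U+0301) and deleting
# the nonspacing combining marks. The result is still a UTF-8 string.
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Keep ISO-8859-1 letters (code points U+0000..U+00FF) and strip accents
# from everything else.
def keep_latin1(str)
  str.each_char.map { |c| c.ord <= 0xFF ? c : strip_accents(c) }.join
end

# Hypothetical equivalent of the :chars option.
def fold(str, chars = nil)
  case chars
  when "US-ASCII"   then strip_accents(str)
  when "ISO-8859-1" then keep_latin1(str)
  else str
  end
end

fold("Żabiński, Józef", "ISO-8859-1")                 # => "Zabinski, Józef"
fold("Żabiński, Józef", "US-ASCII")                   # => "Zabinski, Jozef"
fold("Prié", "US-ASCII") == fold("Prie", "US-ASCII")  # => true
```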
+== Customization of Alternative Names
+
+We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
+matcher is aware of some common English nicknames. These name alternatives can be
+customised to handle additional nicknames and other types of alternative names
+such as common spelling mistakes and name changes.
+
+The alternative names are specified in two YAML files, one for first names and
+one for last names. Each YAML file represents an array and each element in the
+array is an array representing a set of alternative names. Here, for example,
+are some of the default first name alternatives:
+
+  [Anthony, Tony]
+  [James, Jim, Jimmy]
+  [Michael, Mike, Mick, Mikey]
+  [Robert, Bob, Bobby]
+  [Stephen, Steve]
+  [Steven, Steve]
+  [Thomas, Tom, Tommy]
+  [William, Will, Willy, Willie, Bill]
+
+The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
+
+  Name.new("Tony", "Miles").match("Anthony", "Miles")         # => true
+
+Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
+same group, they don't match each other.
+
+  Name.new("Steven", "Hanly").match("Steve", "Hanly")         # => true
+  Name.new("Stephen", "Hanly").match("Steve", "Hanly")        # => true
+  Name.new("Stephen", "Hanly").match("Steven", "Hanly")       # => false
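The grouping rule above amounts to this: two names match if they are identical or appear together in at least one group, and matches do not chain from one group to another. A minimal sketch of that rule, with a hypothetical method name and only a few of the groups listed above:

```ruby
# Two names are alternatives if they are identical or share a group.
# Membership does not chain: Stephen~Steve and Steve~Steven do not imply
# Stephen~Steven unless all three are in one group.
def alternative_match?(a, b, groups)
  a == b || groups.any? { |group| group.include?(a) && group.include?(b) }
end

groups = [%w[Anthony Tony], %w[Stephen Steve], %w[Steven Steve]]

alternative_match?("Tony", "Anthony", groups)    # => true
alternative_match?("Steven", "Steve", groups)    # => true
alternative_match?("Stephen", "Steve", groups)   # => true
alternative_match?("Stephen", "Steven", groups)  # => false
```

Replacing the two _Steve_ groups with a single <tt>[Stephen, Steven, Steve]</tt> group makes the last call return true.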
+
+To customize alternative name behaviour, prepare YAML files with your chosen alternatives
+and then replace the default alternatives like this:
+
+  Name.load_alternatives(:first, "my_first_name_alternatives.yaml")
+  Name.load_alternatives(:last, "my_last_name_alternatives.yaml")
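The README shows the groups themselves but not a complete file. Assuming a file is simply a YAML sequence whose items are flow sequences, one <tt>- </tt> item per group (this layout is an assumption, and the gem's own default files may differ), Ruby's standard <tt>yaml</tt> library can read it as below. The <tt>!ruby/regexp</tt> tag, used for the conditional alternatives described later, is only accepted by <tt>YAML.safe_load</tt> when <tt>Regexp</tt> is listed in <tt>permitted_classes</tt> (Ruby 2.6 or later).

```ruby
require "yaml"

# Hypothetical contents of a last-name alternatives file: a sequence of groups.
last_name_yaml = <<~'YAML'
  - [Murphy, Murchadha]
  - [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
YAML

# safe_load refuses arbitrary Ruby objects; permit Regexp for !ruby/regexp.
groups = YAML.safe_load(last_name_yaml, permitted_classes: [Regexp])

groups.first            # => ["Murphy", "Murchadha"]
groups.last.last.class  # => Regexp
```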
+
+An example of one way in which you might want to customize the alternatives is to
+cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
+don't match by default, but you can make them match by replacing the two default rules:
+
+  [Stephen, Steve]
+  [Steven, Steve]
+
+with the following single rule:
+
+  [Stephen, Steven, Steve]
+
+so that now:
+
+  Name.new("Stephen", "Hanly").match("Steven", "Hanly")       # => true
+
+Another use is to cater for English and Irish versions of the same name. For example,
+for last names:
+
+  [Murphy, Murchadha]
+
+or for first names, including spelling variations:
+
+  [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
+
+== Conditional Alternatives
+
+Normally, entries in the two YAML files are just lists of alternative names. There is one
+exception to this, however: one of the entries (it doesn't matter which one but,
+by convention, the last one) may be a regular expression. Here is an example that might
+be added to the last name alternatives:
+
+  [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
+
+This means that the last names _Quinn_ and _Benjamin_ match, but only when the
+first name matches the regular expression.
+
+  Name.new("Debbie", "Quinn").match("Debbie", "Benjamin")     # => true
+  Name.new("Mark", "Quinn").match("Mark", "Benjamin")         # => false
+
+Another example, this time for first names, is:
+
+  [Sean, John, !ruby/regexp /^Bradley$/]
+
+This caters for an individual who is known by two normally unrelated first names.
+We only want these two names to match for that individual and no others.
+
+  Name.new("John", "Bradley").match("Sean", "Bradley")        # => true
+  Name.new("John", "Alfred").match("Sean", "Alfred")          # => false
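The conditional rule can be sketched by treating any <tt>Regexp</tt> in a group as a condition on the other part of the name (the first name for a last-name group, and the last name for a first-name group). This is a hypothetical illustration with an invented method name; it assumes the other part of the name has already been matched and is checked only once.

```ruby
# A group is an array of names, optionally with a Regexp anywhere in it.
# With a Regexp present, the names only count as alternatives when the
# other part of the name (e.g. the first name for a last-name group) matches.
def conditional_match?(a, b, other, groups)
  return true if a == b
  groups.any? do |group|
    conditions, names = group.partition { |entry| entry.is_a?(Regexp) }
    names.include?(a) && names.include?(b) &&
      conditions.all? { |re| re.match?(other) }
  end
end

last_groups  = [["Quinn", "Benjamin", /^(Debbie|Deborah)$/]]
first_groups = [["Sean", "John", /^Bradley$/]]

conditional_match?("Quinn", "Benjamin", "Debbie", last_groups)  # => true
conditional_match?("Quinn", "Benjamin", "Mark", last_groups)    # => false
conditional_match?("John", "Sean", "Bradley", first_groups)     # => true
conditional_match?("John", "Sean", "Alfred", first_groups)      # => false
```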
 
 == Author
 
 Mark Orr, rating officer for the Irish Chess Union (ICU[http://icu.ie]).