README.rdoc in icu_name-0.1.4 vs README.rdoc in icu_name-1.0.0

- old
+ new

@@ -21,96 +21,189 @@
   robert = ICU::Name.new(' robert j ', ' FISHER ')

 Capitalisation, white space and punctuation will all be automatically corrected:

-  robert.name # => 'Robert J. Fischer'
-  robert.rname # => 'Fischer, Robert J.' (reversed name)
+  robert.name # => 'Robert J. Fischer'
+  robert.rname # => 'Fischer, Robert J.' (reversed name)

 The input text, without any changes apart from white-space cleanup and the insertion of a comma
-(to separate the two names), is returned by the _original_ method:
+(to separate the two names), is returned by the <tt>original</tt> method:

-  robert.original # => 'FISCHER, robert j'
+  robert.original # => 'FISCHER, robert j'

 To avoid ambiguity when either the first or second names consist of multiple words, it is better to
-supply the two separately, if known. However, the full name can be supplied alone to the constructor
-and a guess will be made as to the first and last names (the last distinct word becomes the last name).
+supply the two separately. If the full name is supplied alone to the constructor, without any indication
+of where the first names end, then the last distinct name is assumed to be the last name.

   bobby = ICU::Name.new(' bobby fischer ')
-  bobby.first # => 'Bobby'
-  bobby.last # => 'Fischer'
+  bobby.first # => 'Bobby'
+  bobby.last # => 'Fischer'

-But in this case, since the names were not supplied separately, the _original_ text will not contain a comma:
+In this case, since the names were not supplied separately, the <tt>original</tt> text will not contain a comma:

-  bobby.original # => 'bobby fischer'
+  bobby.original # => 'bobby fischer'

 Names will match even if one is missing middle initials or if a nickname is used for one of the first names.

-  bobby.match('Robert J.', 'Fischer') # => true
+  bobby.match('Robert J.', 'Fischer') # => true

-Note that the class is aware of only common nicknames (e.g. _Bobby_ and _Robert_, _Bill_ and _William_, etc)
-and not all possibilities.
+The method <tt>alternatives</tt> can be used to list alternatives to a given first or last name:

-Supplying the _match_ method with strings is equivalent to instantiating a Name instance with the same
+  Name.new('Stephen', 'Orr').alternatives(:first) # => ["Steve"]
+  Name.new('Michael Stephen', 'Orr').alternatives(:first) # => ["Steve", "Mike", "Mick", "Mikey"],
+  Name.new('Mark', 'Orr').alternatives(:first) # => []
+
+By default the class is only aware of a few common alternatives for first names (e.g. _Bobby_ and _Robert_,
+_Bill_ and _William_, etc). However, this can be customized (see below).
+
+Supplying the <tt>match</tt> method with strings is equivalent to instantiating an instance with the same
 strings and then matching it. So, for example the following are equivalent:

-  robert.match('R.', 'Fischer') # => true
-  robert.match(ICU::Name.new('R.', 'Fischer')) # => true
+  robert.match('R.', 'Fischer') # => true
+  robert.match(ICU::Name.new('R.', 'Fischer')) # => true

-The inital _R_, for example, matches the first letter of _Robert_. However, nickname matches will not
-always work with initials. In the next example, the initial _R_ does not match the first letter _B_ of the
-nickname _Bobby_.
+Here the inital _R_ matches the first letter of _Robert_. However, nickname matches will not
+always work with initials. In the next example, the initial _R_ does not match the first letter
+_B_ of the nickname _Bobby_.

-  bobby.match('R. J.', 'Fischer') # => false

-Some of the ways last names are canonicalised are illustrated below:
+Some other ways last names are canonicalised are illustrated below:

-  ICU::Name.new('John', 'O Reilly').last # => "O'Reilly"
-  ICU::Name.new('dave', 'mcmanus').last # => "McManus"
+  ICU::Name.new('John', 'O Reilly').last # => "O'Reilly, John"
+  ICU::Name.new('dave', 'mcmanus').last # => "McManus, Dave"

 == Characters and Encoding

 The class can only cope with Latin characters, including those with diacritics (accents). Along with
 hyphens and single quotes (which represent apostophes) letters in ISO-8859-1 (e.g. "a", "è", "Ö")
 and letters outside ISO-8859-1 which are decomposable into a US-ASCII character plus one or more
 diacritics (e.g. "ł" or "Ś") are preserved, while everything else is removed.

-  ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
-  ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
-  ICU::Name.new(' 渡井美代子').name # => ""
+  ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
+  ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
+  ICU::Name.new('Սմբատ', 'Լպուտյան').name # => ""

-The various accessors (_first_, _last_, _name_, _rname_, _to_s_, _original_) always return
+The various accessors (<tt>first</tt>, <tt>last</tt>, <tt>name</tt>, <tt>rname</tt>, <tt>to_s</tt>, <tt>original</tt>) always return
 strings encoded in UTF-8, no matter what the input encoding.

   eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
-  eric.rname # => "Prié, Éric"
-  eric.rname.encoding.name # => "UTF-8"
-  eric.original # => "PRIÉ, éric"
-  eric.original.encoding.name # => "UTF-8"
+  eric.rname # => "Prié, Éric"
+  eric.rname.encoding.name # => "UTF-8"
+  eric.original # => "PRIÉ, éric"
+  eric.original.encoding.name # => "UTF-8"

 Accented letters can be transliterated into their US-ASCII counterparts by setting the
-_chars_ option, which is available in all accessors. For example:
+<tt>:chars</tt> option, which is available in all accessors. For example:

-  eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
-  eric.original(:chars => "US-ASCII") # => "PRIE, eric"
+  eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
+  eric.original(:chars => "US-ASCII") # => "PRIE, eric"

 Also possible is the preservation of ISO-8859-1 characters, but the transliteration of all other accented characters:

   joe = Name.new('Józef', 'Żabiński')
-  joe.rname # => "Żabiński, Józef"
-  joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
-  joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
+  joe.rname # => "Żabiński, Józef"
+  joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
+  joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"

 Note that the character encoding of the strings returned is still UTF-8 in all cases.
 The same option also relaxes the need for accented characters to match exactly:

-  eric.match('Eric', 'Prie') # => false
-  eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
-  joe.match('Józef', 'Zabinski') # => false
-  joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
+  eric.match('Eric', 'Prie') # => false
+  eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
+  joe.match('Józef', 'Zabinski') # => false
+  joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
+
+== Customization of Alternative Names
+
+We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
+matcher is aware of some common English nicknames. These name alternatives can be
+customised to handle additional nick names and other types of alternative names
+such as common spelling mistakes and name changes.
+
+The alternative names are specified in two YAML files, one for first names and
+one for last names. Each YAML file represents an array and each element in the
+array is an array representing a set of alternative names. Here, for example,
+are some of the default first name alternatives:
+
+  [Anthony, Tony]
+  [James, Jim, Jimmy]
+  [Michael, Mike, Mick, Mikey]
+  [Robert, Bob, Bobby]
+  [Stephen, Steve]
+  [Steven, Steve]
+  [Thomas, Tom, Tommy]
+  [William, Will, Willy, Willie, Bill]
+
+The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
+
+  Name.new("Tony", "Miles").match("Anthony", "Miles") # => true
+
+Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
+same group, they don't match each other.
+
+  Name.new("Steven", "Hanly").match("Steve", "Hanly") # => true
+  Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
+  Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false
+
+To customize alternative name behaviour, prepare YAML files with your chosen alternatives
+and then replace the default alternatives like this:
+
+  Name.load_alternatives(:first, "my_first_name_alternatives.yaml")
+  Name.load_alternatives(:last, "my_last_name_alternatives.yaml")
+
+An example of one way in which you might want to customize the alternatives is to
+cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
+don't match by default, but you can make them so by replacing the two default rules:
+
+  [Stephen, Steve]
+  [Steven, Steve]
+
+with the following single rule:
+
+  [Stephen, Steven, Steve]
+
+so that now:
+
+  Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
+
+Another use is to cater for English and Irish versions of the same name. For example,
+for last names:
+
+  [Murphy, Murchadha]
+
+or for first names, including spelling variations:
+
+  [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
+
+== Conditional Alternatives
+
+Normally, entries in the two YAML files are just lists of alternative names. There is one
+exception to this however, when one of the entries (it doesn't matter which one but,
+by convention, the last one) is a regular expression. Here is an example that might
+be added to the last name alternatives:
+
+  [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
+
+What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
+first name matches the regular expression.
+
+  Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
+  Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false
+
+Another example, this time for first names, is:
+
+  [Sean, John, !ruby/regexp /^Bradley$/]
+
+This caters for an individual who is known by two normally unrelated first names.
+We only want these two names to match for that individual and no others.
+
+  Name.new("John", "Bradley").match("Sean", "Bradley") # => true
+  Name.new("John", "Alfred").match("Sean", "Alfred") # => false

 == Author

 Mark Orr, rating officer for the Irish Chess Union (ICU[http://icu.ie]).
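
The grouping rules described in the new "Customization of Alternative Names" and "Conditional Alternatives" sections can be illustrated with a small standalone sketch. This is not the gem's implementation and does not use its API: the constant +GROUPS+ and the method <tt>first_names_match?</tt> are hypothetical names invented here, and the groups are written as plain Ruby arrays rather than loaded from YAML. It only shows the idea that two names match when they share a group, and that a regular expression in a group restricts it to people whose other name satisfies the expression.

```ruby
# Hypothetical sketch of group-based name matching (not icu_name's code).
# Each group lists interchangeable first names. An optional Regexp entry
# limits the group to people whose last name matches that expression.
GROUPS = [
  ["Stephen", "Steve"],
  ["Steven", "Steve"],
  ["Sean", "John", /^Bradley$/],
].freeze

# Returns true when the two first names are identical, or when some group
# contains both of them and the group's condition (if any) accepts `last`.
def first_names_match?(a, b, last)
  return true if a == b

  GROUPS.any? do |group|
    names     = group.grep(String)        # the alternative names
    condition = group.grep(Regexp).first  # nil for unconditional groups
    names.include?(a) && names.include?(b) &&
      (condition.nil? || condition.match?(last))
  end
end

first_names_match?("Steven", "Steve", "Hanly")    # => true
first_names_match?("Stephen", "Steven", "Hanly")  # => false (different groups)
first_names_match?("John", "Sean", "Bradley")     # => true
first_names_match?("John", "Sean", "Alfred")      # => false (condition fails)
```

Keeping _Stephen_ and _Steven_ in separate groups reproduces the default behaviour shown in the README, where each matches _Steve_ but the two do not match each other.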