README.rdoc in icu_name-1.0.16 vs README.rdoc in icu_name-1.1.0
- old
+ new
@@ -1,23 +1,23 @@
= ICU Tournament
-Canonicalises and matches person names with Western European characters and first and last names.
+Canonicalises and matches person names with Western European characters.
+Note: version 1.1.0 dropped support for characters beyond codepoint 255 and became independent of activesupport and i18n.
+
== Installation
For ruby 1.9.2, 1.9.3, 2.0.0.
gem install icu_name
-It depends on _active_support_ and _i18n_.
-
== Names
This class exists for two main purposes:
-* to normalise to a common format the different ways names are typed in practice
-* to be able to match two names even if they are not exactly the same
+* to normalise to a common format the different ways Irish person names are typed in practice
+* to be able to match two names even if they are not exactly the same in their original form
To create a name object, supply both the first and second names separately to the constructor.
robert = ICU::Name.new(' robert j ', ' FISHER ')
@@ -34,11 +34,10 @@
To avoid ambiguity when either the first or second names consist of multiple words, it is better to
supply the two separately. If the full name is supplied alone to the constructor, without any indication
of where the first names end, then the last distinct name is assumed to be the last name.
bobby = ICU::Name.new(' bobby fischer ')
-
bobby.first # => 'Bobby'
bobby.last # => 'Fischer'
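The "last distinct name is the last name" rule can be illustrated with a small stand-alone sketch. The helper +split_full_name+ below is hypothetical and is not the gem's implementation; it only shows the idea of taking the final whitespace-separated word as the last name.

```ruby
# Hypothetical helper (not part of icu_name): treat the final word of a
# full name as the last name and everything before it as the first names.
def split_full_name(full)
  # Trim the ends, split on runs of whitespace, and capitalise each word.
  words = full.strip.split(/\s+/).map(&:capitalize)
  last = words.pop.to_s          # "" when the input has no words at all
  [words.join(" "), last]
end

first, last = split_full_name(" bobby fischer ")
puts first
puts last
```

Note that plain +capitalize+ is a simplification: it would not produce forms such as "McManus", which the real class handles.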
In this case, since the names were not supplied separately, the <tt>original</tt> text will not contain a comma:
@@ -75,17 +74,15 @@
ICU::Name.new('dave', 'mcmanus').rname # => "McManus, Dave"
== Characters and Encoding
The class can only cope with Latin characters, including those with diacritics (accents).
-Along with hyphens and single quotes (which represent apostrophes) letters in ISO-8859-1
-(e.g. "a", "è", "Ö") and letters outside ISO-8859-1 which are decomposable into a US-ASCII
-character plus one or more diacritics (e.g. "ł" or "Ś") are preserved, while everything
-else is removed.
+Hyphens, single quotes (which represent apostrophes) and letters in the ISO-8859-1 range
+(e.g. "a", "è", "Ö") are preserved, while everything else is removed (unsupported).
ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
- ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
+ ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartomiej Liwa"
ICU::Name.new('Սմբատ', 'Լպուտյան').name # => ""
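The 1.1.0 filtering rule described above can be approximated without the gem. This is a minimal sketch under the assumption that "ISO-8859-1 range" means codepoints up to 255; +keep_supported_chars+ is a hypothetical name, and the real class additionally normalises case and spacing.

```ruby
# Hypothetical helper (not part of icu_name): keep hyphens, single quotes,
# spaces and letters whose codepoint is at most 255; drop everything else.
def keep_supported_chars(text)
  text.each_char.select { |ch|
    ch == "-" || ch == "'" || ch == " " ||
      (ch.ord <= 255 && ch =~ /\p{L}/)
  }.join
end

puts keep_supported_chars("BARTŁOMIEJ")   # the "Ł" (U+0141) is dropped
puts keep_supported_chars("PRIÉ")         # the "É" (U+00C9) is kept
```

This mirrors why "Bartłomiej Śliwa" loses its "ł" and "Ś" in 1.1.0 instead of being preserved.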
The various accessors (<tt>first</tt>, <tt>last</tt>, <tt>name</tt>, <tt>rname</tt>, <tt>to_s</tt>, <tt>original</tt>) always return
strings encoded in UTF-8, no matter what the input encoding.
@@ -99,25 +96,15 @@
<tt>:chars</tt> option, which is available in all accessors. For example:
eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
eric.original(:chars => "US-ASCII") # => "PRIE, eric"
-Also possible is the preservation of ISO-8859-1 characters, but the transliteration of
-all other accented characters:
-
- joe = Name.new('Józef', 'Żabiński')
- joe.rname # => "Żabiński, Józef"
- joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
- joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
-
Note that the character encoding of the strings returned is still UTF-8 in all cases.
The same option also relaxes the need for accented characters to match exactly:
eric.match('Eric', 'Prie') # => false
eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
- joe.match('Józef', 'Zabinski') # => false
- joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
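One way to picture what <tt>:chars => "US-ASCII"</tt> does when matching is to strip diacritics from both names before comparing them. The sketch below is an assumption about the general technique, not the gem's code; +ascii_fold+ and +loose_match?+ are hypothetical names, and <tt>String#unicode_normalize</tt> needs Ruby 2.2 or later, which is newer than the versions listed under Installation.

```ruby
# Hypothetical helpers (not part of icu_name).

# Decompose accented letters (NFD) and remove the combining marks,
# so that "é" becomes "e". Letters with no decomposition, such as "ø"
# or "ß", are left unchanged by this simple approach.
def ascii_fold(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Compare two names after folding accents and ignoring case.
def loose_match?(a, b)
  ascii_fold(a).downcase == ascii_fold(b).downcase
end

puts ascii_fold("Éric Prié")
puts loose_match?("Prié", "Prie")
```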
== Customization of Alternative Names
We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
matcher is aware of some common English nicknames. These name alternatives can be
@@ -151,11 +138,11 @@
To change alternative name behaviour, you can replace the default alternatives
with a customized set perhaps stored in a database or a YAML file, as illustrated below:
data = YAML.load(File.open "my_last_name_alternatives.yaml")
- Name.load_alternatives(:first, data)
+ Name.load_alternatives(:last, data)
data = YAML.load(File.open "my_first_name_alternatives.yaml")
Name.load_alternatives(:first, data)
An example of one way in which you might want to customize the alternatives is to
cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
@@ -171,11 +158,11 @@
so that now:
Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
This kind of rule risks producing false positives - you must judge
-carefully whether that risk is outweighed by the benefits of being
-able to overcome spelling mistakes in the context of your application.
+whether that risk is outweighed by the benefits of being able to overcome
+spelling mistakes in the context of your application.
Another use is to cater for English and Irish versions of the same name.
For example, for last names:
[Murphy, Murchadha]
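To make the idea of alternative groups concrete, here is a minimal stand-alone sketch. It assumes a YAML file that is simply a list of name groups like the one above; the helpers +build_lookup+ and +alternatives?+ are hypothetical and do not reflect how <tt>Name.load_alternatives</tt> stores or applies its data.

```ruby
require "yaml"

# Hypothetical stand-in for the contents of an alternatives file:
# each entry is a group of names to be treated as equivalent.
yaml_text = "- [Murphy, Murchadha]\n- [Steven, Stephen]\n"
groups = YAML.load(yaml_text)

# Map each lower-cased name to the index of the group it belongs to.
# A name listed in several groups keeps only its last group here.
def build_lookup(groups)
  lookup = {}
  groups.each_with_index do |group, index|
    group.each { |name| lookup[name.downcase] = index }
  end
  lookup
end

# Two names are alternatives if they are equal (ignoring case)
# or if both belong to the same group.
def alternatives?(lookup, a, b)
  a = a.downcase
  b = b.downcase
  return true if a == b
  !lookup[a].nil? && lookup[a] == lookup[b]
end

lookup = build_lookup(groups)
puts alternatives?(lookup, "Murphy", "Murchadha")
puts alternatives?(lookup, "Murphy", "Stephen")
```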