# errata
Correct strings based on remote errata files.
## Example
Every errata file is a table whose structure is based on the [IETF RFC Editor's "How to Report Errata"](http://www.rfc-editor.org/how_to_report.html).

| date | name | email | type | section | action | x | y | condition | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | meta | Intended use | | http://example.com/original-data-with-errors.xls | | | A hypothetical document that uses non-ISO country names |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/ANTIGUA & BARBUDA/` | ANTIGUA AND BARBUDA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/BOLIVIA/` | BOLIVIA, PLURINATIONAL STATE OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/BOSNIA & HERZEGOVINA/` | BOSNIA AND HERZEGOVINA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/BRITISH VIRGIN ISLANDS/` | VIRGIN ISLANDS, BRITISH | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/COTE D'IVOIRE/` | CÔTE D'IVOIRE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/DEM\. PEOPLE'S REP\. OF KOREA/` | KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/DEM\. REP\. OF THE CONGO/` | CONGO, THE DEMOCRATIC REPUBLIC OF THE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/HONG KONG SAR/` | HONG KONG | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | `/IRAN \(ISLAMIC REPUBLIC OF\)/` | IRAN, ISLAMIC REPUBLIC OF | | |

Which would be saved as a CSV:

    date,name,email,type,section,action,x,y,condition,notes
    2011-03-22,Ian Hough,ian@brighterplanet.com,meta,Intended use,,http://example.com/original-data-with-errors.xls,,,A hypothetical document that uses non-ISO country names
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOSNIA & HERZEGOVINA/,BOSNIA AND HERZEGOVINA,,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BRITISH VIRGIN ISLANDS/,"VIRGIN ISLANDS, BRITISH",,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/COTE D'IVOIRE/,CÔTE D'IVOIRE,,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. PEOPLE'S REP\. OF KOREA/,"KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF",,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/HONG KONG SAR/,HONG KONG,,
    2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,

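
To make the column layout concrete, here is a minimal sketch that reads two of the rows above with Ruby's `csv` library. It is not how errata itself loads the file; `ERRATA_CSV`, `pattern_from` and `load_errata` are hypothetical names introduced only for this illustration. It shows that a quoted `y` value keeps its comma and that the `/.../` text in `x` can be turned into a `Regexp`.

```ruby
require 'csv'

# Header plus two rows copied from the errata CSV above.
# The single-quoted heredoc keeps the regexp backslashes intact.
ERRATA_CSV = <<~'EOS'
  date,name,email,type,section,action,x,y,condition,notes
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
EOS

# Hypothetical helper: turn the "/.../" text of the x column into a Regexp.
def pattern_from(x)
  Regexp.new(x[1..-2]) # drop the leading and trailing slash
end

# Hypothetical helper: one hash per erratum, keyed by column name.
def load_errata(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

load_errata(ERRATA_CSV).each do |erratum|
  puts "#{pattern_from(erratum['x']).inspect} -> #{erratum['y']}"
end
```

Empty cells such as `condition` and `notes` come back as `nil` with this parser, so code that inspects them should allow for that.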
And then used:

    require 'errata'
    require 'remote_table'

    errata = Errata.new(:url => 'http://example.com/errata.csv')
    original = RemoteTable.new(:url => 'http://example.com/original-data-with-errors.xls')
    original.each do |row|
      errata.correct! row # destructively correct each row
    end

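
For readers who want a feel for what a destructive correction does without fetching anything over the network, the following plain-Ruby sketch imitates the `replace` action on a row hash. It is an illustration under assumptions, not errata's implementation: `RULES` and `apply_rules!` are hypothetical, and it assumes a `replace` erratum substitutes `y` for the part of the `section` column that matches `x`.

```ruby
# Hypothetical rule list mirroring two "replace" rows of the errata table.
RULES = [
  { section: 'Country Name', x: /ANTIGUA & BARBUDA/, y: 'ANTIGUA AND BARBUDA' },
  { section: 'Country Name', x: /HONG KONG SAR/,     y: 'HONG KONG' }
].freeze

# Illustrative stand-in for a destructive correction; not errata's own code.
# The row hash is modified in place and also returned.
def apply_rules!(row, rules = RULES)
  rules.each do |rule|
    value = row[rule[:section]]
    next unless value.is_a?(String)

    row[rule[:section]] = value.sub(rule[:x], rule[:y])
  end
  row
end

row = { 'Country Name' => 'HONG KONG SAR', 'Population' => '7000000' }
apply_rules!(row)
puts row['Country Name'] # prints "HONG KONG"
```

Columns that no rule names, such as `Population` here, are left untouched.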
## UTF-8
errata assumes all input strings are UTF-8. Otherwise, Ruby 1.9 can run into problems with `Regexp::FIXEDENCODING`: an ASCII-8BIT regexp might be applied to a UTF-8 string (or vice versa), raising `Encoding::CompatibilityError`.
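
The failure mode can be reproduced in plain Ruby, independent of errata. In this sketch, `PATTERN`, `as_utf8` and `incompatible?` are hypothetical names used only for illustration, and `as_utf8` assumes the bytes are already valid UTF-8 and only the encoding label is wrong (data in another encoding would need `String#encode` instead).

```ruby
# A pattern containing a non-ASCII character has a fixed UTF-8 encoding.
PATTERN = Regexp.new("C\u00D4TE D'IVOIRE") # "CÔTE D'IVOIRE"

# Hypothetical helper (not part of errata): relabel a string as UTF-8.
# Assumes the bytes already are valid UTF-8 and only the encoding tag is wrong.
def as_utf8(string)
  string.dup.force_encoding(Encoding::UTF_8)
end

# Returns true when matching raises Encoding::CompatibilityError.
def incompatible?(string, pattern)
  string =~ pattern
  false
rescue Encoding::CompatibilityError
  true
end

# The same bytes, but tagged ASCII-8BIT, as data read in binary mode would be.
binary = "C\u00D4TE D'IVOIRE".b

puts PATTERN.fixed_encoding?                  # true
puts incompatible?(binary, PATTERN)           # true
puts incompatible?(as_utf8(binary), PATTERN)  # false
```

Relabeling or transcoding input before it reaches the corrections is one way to stay within the UTF-8 assumption.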
## Real-life usage
Used by [data_miner](http://github.com/seamusabshere/data_miner)
## Authors
* Seamus Abshere