Sterile
=======
Sterilize your strings! Transliterate, generate slugs, smart format, strip tags, encode/decode entities and more.
Usage
-----
Sterile provides its functionality both as module methods on Sterile and as extensions to the String class. Each String method also has a "bang" version that modifies the string in place.
Sterile.transliterate("šţɽĩɳģ") # => "string"
"šţɽĩɳģ".transliterate # => "string"
str = "šţɽĩɳģ"
str.transliterate!
str == "string" # => true
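The three calling styles above follow a common Ruby pattern, sketched below. `MiniSterile` and `shout` are hypothetical stand-ins invented for illustration; this is not Sterile's source, only one way such an API can be wired together.

```ruby
# Hypothetical stand-in module; not Sterile's actual implementation.
module MiniSterile
  # Module-level function: returns a new string and leaves the argument alone.
  def self.shout(string)
    string.upcase
  end
end

class String
  # Non-destructive String extension that delegates to the module function.
  def shout
    MiniSterile.shout(self)
  end

  # "Bang" version: overwrites the receiver's contents in place.
  def shout!
    replace(MiniSterile.shout(self))
  end
end

str = String.new("quiet")    # explicitly unfrozen copy
MiniSterile.shout("quiet")   # => "QUIET"
str.shout                    # => "QUIET" (str itself is unchanged)
str.shout!
str                          # => "QUIET"
```

The bang method relies on `String#replace`, so it raises on a frozen string, which is the usual behavior for in-place string methods.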
Transliterate
-------------
Transliterate Unicode characters (including accented Latin letters) to their plain-text ASCII equivalents. This is based on data from the stringex gem (https://github.com/rsl/stringex), which is in turn a port of Perl's Unidecode and ostensibly provides superior results to iconv. The optical conversion data is based on Eric Boehs's to_slug (https://github.com/ericboehs/to_slug).
"šţɽĩɳģ".transliterate # => "string"
Passing :optical => true prefers optical mappings over the more pedantic matches. The optical dataset is incomplete, so any character without an optical mapping falls back to its pedantic match.
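The fallback behavior can be pictured as two lookup tables, with the optical table consulted first. The sketch below is an assumption about the general shape, not Sterile's code: `PEDANTIC`, `OPTICAL`, and `transliterate_sketch` are hypothetical names, the tables hold a few illustrative entries, and returning "?" for unknown characters is an arbitrary choice.

```ruby
# Illustrative data only; a real transliteration table is far larger.
PEDANTIC = {
  "\u041D" => "N",   # Cyrillic En: sounds like N
  "\u0161" => "s",   # s with caron
  "\u00DF" => "ss"   # German sharp s
}.freeze

OPTICAL = {
  "\u041D" => "H"    # Cyrillic En looks like a Latin H
}.freeze

def transliterate_sketch(text, optical: false)
  text.each_char.map do |ch|
    next ch if ch.ascii_only?                 # ASCII passes through untouched
    mapped = optical ? OPTICAL[ch] : nil      # try the optical table first
    mapped || PEDANTIC.fetch(ch, "?")         # otherwise fall back to pedantic
  end.join
end

transliterate_sketch("\u041D\u0161")                 # => "Ns"
transliterate_sketch("\u041D\u0161", optical: true)  # => "Hs"
```

In the second call the "s" still comes from the pedantic table, because the optical table has no entry for it.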
Smart Format
------------
Format text with proper "curly" quotes, em dashes, copyright and trademark symbols, etc.
%q{"He said, 'Away with you, Drake!'"}.smart_format
# => “He said, ‘Away with you, Drake!’”
You can also use smart formatting with HTML:
%q{"He said, 'Away with you, Drake!'"}.smart_format_tags
# => "“He said, ‘Away with you, Drake!’“"
Entities
--------
Turn Unicode characters into their HTML equivalents. If a valid HTML entity is not possible, it will create a numeric entity.
%q{“Economy Hits Bottom,” ran the headline}.encode_entities # => "&ldquo;Economy Hits Bottom,&rdquo; ran the headline"
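The "named entity if available, numeric entity otherwise" rule can be sketched as follows. `NAMED_ENTITIES` is a deliberately tiny, illustrative table and `encode_entities_sketch` is a hypothetical name. The sketch only touches non-ASCII characters, which is an assumption rather than a statement about what Sterile encodes.

```ruby
# Tiny illustrative table; the full HTML entity set is much larger.
NAMED_ENTITIES = {
  "\u201C" => "ldquo",
  "\u201D" => "rdquo",
  "\u00A9" => "copy"
}.freeze

def encode_entities_sketch(text)
  text.gsub(/[^\x00-\x7F]/) do |ch|                # every non-ASCII character
    name = NAMED_ENTITIES[ch]
    if name
      "&" + name + ";"                             # named entity when one is known
    else
      format("&#%d;", ch.ord)                      # decimal numeric entity otherwise
    end
  end
end

encode_entities_sketch("\u201CEconomy Hits Bottom,\u201D ran the headline")
# => "&ldquo;Economy Hits Bottom,&rdquo; ran the headline"
encode_entities_sketch("\u2605")                   # black star has no entry in the table
# => "&#9733;"
```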
Turn HTML entities into Unicode characters:
"&ldquo;Economy Hits Bottom,&rdquo; ran the headline".decode_entities # => "“Economy Hits Bottom,” ran the headline"
Titlecase
---------
Format text appropriately for titles. This method is much smarter than ActiveSupport's titlecase. The algorithm is based on work done by John Gruber et al. (http://daringfireball.net/2008/08/title_case_update). It gets closer to the AP standard for title capitalization, including proper support for small words, and handles a variety of edge cases.
"Q&A with Steve Jobs: 'That's what happens in technology'".titlecase
# => "Q&A With Steve Jobs: 'That's What Happens in Technology'"
"Small word at end is nothing to be afraid of".titleize # alias for titlecase
# => "Small Word at End Is Nothing to Be Afraid Of"
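The small-word rule is the core of this style of title casing. The sketch below is a much-reduced approximation, not the Gruber algorithm or Sterile's code. It assumes a short small-word list, always capitalizes the first and last word, and leaves any word that already contains a capital untouched; `SMALL_WORDS` and `titlecase_sketch` are hypothetical names.

```ruby
# Assumed small-word list, for illustration only.
SMALL_WORDS = %w[a an and as at but by en for if in of on or the to via vs].freeze

def titlecase_sketch(text)
  words = text.split(" ")
  last = words.length - 1
  words.each_with_index.map do |word, i|
    if word.match?(/[[:upper:]]/)
      word                           # already has a capital (e.g. "Q&A"): leave it
    elsif i != 0 && i != last && SMALL_WORDS.include?(word.downcase)
      word                           # small word in the middle stays lowercase
    else
      word[0].upcase + word[1..-1]   # capitalize the first character only
    end
  end.join(" ")
end

titlecase_sketch("Small word at end is nothing to be afraid of")
# => "Small Word at End Is Nothing to Be Afraid Of"
```

Edge cases such as small words after a colon or words that begin with a quote mark are not handled here; those are the kind of cases the full algorithm addresses.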
Strip Tags
----------
Remove HTML/XML tags from text. Also strips out comments, PHP and ERB style tags.
'Visit our website!'.strip_tags # => "Visit our website!"
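A regular-expression approach to tag stripping can be sketched in two passes: comments first, then anything else between angle brackets. `strip_tags_sketch` is a hypothetical name, and the patterns are assumptions chosen for brevity. They also catch ERB and PHP style tags only as long as the tag body contains no `>` character.

```ruby
def strip_tags_sketch(html)
  html
    .gsub(/<!--.*?-->/m, "")   # remove comments, including multi-line ones
    .gsub(/<[^>]*>/, "")       # remove anything else between < and >
end

strip_tags_sketch('<a href="http://example.com">Visit our website!</a>')
# => "Visit our website!"
strip_tags_sketch("Hi<!-- a > b --> there <%= name %>!")
# => "Hi there !"
```

Removing comments in a separate first pass matters because a comment may itself contain `>`, which would otherwise end the second pattern's match too early.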
Miscellaneous
-------------
Transliterate to ASCII, downcase and format for URL permalink/slug by stripping out all non-alphanumeric characters and replacing spaces with a delimiter (defaults to '-', configured by :delimiter option).
"Hello World!".sluggerize # => "hello-world"
"Hello World!".to_slug # => "hello-world"
Transliterate to ASCII and strip out any HTML/XML tags.
"nåsty".sterilize # => "nasty"
Trim whitespace from the start and end of a string and collapse any redundant whitespace in between.
"  Hello   world! ".trim_whitespace # => "Hello world!"
Iterate over all text in between HTML/XML tags, yield each piece of text to a block, and replace it with what the block returns.
"Only uppercase the text in this".gsub_tags { |t| t.upcase }
Iterate over all text in between HTML/XML tags and yield to a block.
"Only output the text in this".scan_tags { |t| puts t }
Warning / To Do
---------------
All the *_tags functions are based on regular expressions. Yes, I know this is [wrong](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) and I plan to use a proper parser for it in the future.
Installation
------------
Install with RubyGems:
gem install sterile
License
-------
Copyright (c) 2011 Patrick Hogan, released under the MIT License.
http://www.opensource.org/licenses/mit-license