README.markdown in mdalessio-dryopteris-0.1.0 vs README.markdown in mdalessio-dryopteris-0.1.1
- old
+ new
@@ -49,9 +49,22 @@
The returned string will contain exactly one (1) well-formed HTML document, with all broken HTML fixed and all harmful tags and attributes removed.
Coolness: <tt>dangerous\_html\_document</tt> can be a string OR an IO object (a file, or a socket, or ...), which makes it particularly easy to sanitize large numbers of docs.
+### Whitewashing HTML
+
+Other times, you may want to allow a user to submit HTML, and remove all styling, attributes and invalid HTML tags. I like to call this "whitewashing", since it's putting a new layer of paint on top of the user's HTML input to make it look nice.
+
+One use case for this feature is to clean up HTML that was cut-and-pasted from Microsoft(tm) Word into a WYSIWYG editor/textarea. Microsoft's editor is famous for injecting all kinds of cruft into its HTML output. Who needs that? Certainly not me.
+
+ whitewashed_html = Dryopteris.whitewash(ugly_microsoft_html_snippet)
+
+Please note that whitewashing implicitly also sanitizes your HTML, as it uses the same HTML tag whitelist as <tt>sanitize()</tt>. Its implementation is:
+
+ 1. unless the tag is on the whitelist, remove it from the document
+ 2. if the tag has an XML namespace on it, remove it from the document
+ 3. remove all attributes from the node
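
To make the three steps above concrete, here is a minimal sketch in plain Ruby. It is not Dryopteris's actual code and does not use Nokogiri: `ToyNode`, `TOY_WHITELIST` and `toy_whitewash` are hypothetical names invented for this illustration, the whitelist is a made-up short list, and the sketch assumes that "remove it from the document" drops the node together with everything inside it.

```ruby
# Hypothetical stand-in for a parsed HTML element (not a Dryopteris or Nokogiri class).
# Text content is modelled as plain Strings inside `children`.
ToyNode = Struct.new(:name, :namespace, :attributes, :children)

# Made-up whitelist for illustration only; the real one ships with Dryopteris.
TOY_WHITELIST = %w[p b i em strong ul ol li].freeze

# Returns a cleaned copy of the node, or nil if the node should be removed.
def toy_whitewash(node)
  # Text passes through untouched.
  return node if node.is_a?(String)

  # Step 1: unless the tag is on the whitelist, remove it.
  return nil unless TOY_WHITELIST.include?(node.name)

  # Step 2: if the tag carries an XML namespace, remove it.
  return nil if node.namespace

  # Step 3: remove all attributes, then clean the children the same way.
  kept = node.children.map { |child| toy_whitewash(child) }.compact
  ToyNode.new(node.name, nil, {}, kept)
end

# A Word-flavoured paragraph: a class attribute, a namespaced tag,
# a non-whitelisted <font> tag, and a styled <b>.
sample = ToyNode.new("p", nil, { "class" => "MsoNormal" }, [
  "Hello ",
  ToyNode.new("p", "o", {}, ["namespaced cruft"]),
  ToyNode.new("font", nil, { "face" => "Arial" }, ["font cruft"]),
  ToyNode.new("b", nil, { "style" => "color:red" }, ["world"])
])

cleaned = toy_whitewash(sample)
```

In this toy run, `cleaned` keeps only the outer `p`, the text `"Hello "`, and the `b` element, all with empty attribute hashes. Whether the real library discards or escapes the contents of a removed tag is not stated above, so treat that part of the sketch as an assumption.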
Standing on the Shoulders of Giants
-----
Dryopteris uses [Nokogiri](http://nokogiri.rubyforge.org/) and [libxml2](http://xmlsoft.org/), so it's fast.