README.markdown in mdalessio-dryopteris-0.1.1 vs README.markdown in mdalessio-dryopteris-0.1.2

- old
+ new

@@ -22,11 +22,11 @@
 
 Yeah, it's that easy.
 
 In this example, <tt>safe\_html\_snippet</tt> will have all of its __broken markup fixed__ by libxml2, and it will also be completely __sanitized of harmful tags and attributes__. That's twice as clean!
 
-More Usage
+Sanitization Usage
 -----
 
 You're still here? Ok, let me tell you a little something about the two different methods of sanitizing the Dryopteris offers.
 
 ### Fragments
@@ -49,22 +49,29 @@
 The returned string will contain exactly one (1) well-formed HTML document, with all broken HTML fixed and all harmful tags and attributes removed.
 
 Coolness: <tt>dangerous\_html\_document</tt> can be a string OR an IO object (a file, or a socket, or ...). Which makes it particularly easy to sanitize large numbers of docs.
 
-### Whitewashing HTML
+Whitewashing Usage
+-----
 
-Other times, you may want to allow a user to submit HTML, and remove all styling, attributes and invalid HTML tags. I like to call this "whitewashing", since it's putting a new layer of paint on top of the user's HTML input to make it look nice.
+### Whitewashing Fragments
 
+Other times, you may want to remove all styling, attributes and invalid HTML tags. I like to call this "whitewashing", since it's putting a new layer of paint on top of the HTML input to make it look nice.
+
 One use case for this feature is to clean up HTML that was cut-and-pasted from Microsoft(tm) Word into a WYSIWYG editor/textarea. Microsoft's editor is famous for injecting all kinds of cruft into its HTML output. Who needs that? Certainly not me.
 
     whitewashed_html = Dryopteris.whitewash(ugly_microsoft_html_snippet)
 
 Please note that whitewashing implicitly also sanitizes your HTML, as it uses the same HTML tag whitelist as <tt>sanitize()</tt>. It's implementation is:
 
 1. unless the tag is on the whitelist, remove it from the document
 2. if the tag has an XML namespace on it, remove it from the document
 2. remove all attributes from the node
+
+### Whitewashing Documents
+
+Also note the existence of <tt>whitewash\_document</tt>, which is analogous to <tt>sanitize\_document</tt>.
 
 Standing on the Shoulders of Giants
 -----
 
 Dryopteris uses [Nokogiri](http://nokogiri.rubyforge.org/) and [libxml2](http://xmlsoft.org/), so it's fast.
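
Note on the three whitewashing rules listed in the README text above: they can be sketched in plain Ruby on a toy node tree. This is only an illustration under stated assumptions, not Dryopteris' actual code (which works on Nokogiri nodes). The `Node` struct, `ALLOWED_TAGS`, `whitewash_node` and `render` are all hypothetical names, the whitelist is a made-up subset, and "remove it from the document" is assumed here to drop the element together with its children.

```ruby
# Hypothetical stand-in for a parsed HTML element; not part of Dryopteris.
# Children are either Node instances or plain Strings (text).
Node = Struct.new(:name, :namespace, :attributes, :children)

# Illustrative whitelist only; the real one is whatever sanitize() uses.
ALLOWED_TAGS = %w[p b i em strong ul ol li].freeze

# Returns a whitewashed copy of the node, or nil if the node must be removed.
def whitewash_node(node)
  return node if node.is_a?(String)                   # text passes through
  return nil unless ALLOWED_TAGS.include?(node.name)  # rule 1: not on the whitelist
  return nil if node.namespace                        # rule 2: has an XML namespace
  kept = node.children.map { |child| whitewash_node(child) }.compact
  Node.new(node.name, nil, {}, kept)                  # rule 3: drop all attributes
end

# Minimal serializer for the toy tree (no escaping; illustration only).
def render(node)
  return node if node.is_a?(String)
  "<#{node.name}>#{node.children.map { |child| render(child) }.join}</#{node.name}>"
end

word_paste = Node.new("p", nil, { "class" => "MsoNormal", "style" => "margin:0" }, [
  "Hello ",
  Node.new("b", nil, { "style" => "color:red" }, ["world"]),
  Node.new("p", "o", {}, ["office cruft"]),      # namespaced, like <o:p>
  Node.new("script", nil, {}, ["alert(1)"])      # not on the whitelist
])

puts render(whitewash_node(word_paste))
# prints <p>Hello <b>world</b></p>
```

In this sketch the namespaced element and the non-whitelisted element disappear entirely, and the surviving `p` and `b` elements lose their `class` and `style` attributes, which matches the "new layer of paint" description in the README.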