README.md in sanitize-6.0.0 vs README.md in sanitize-6.0.1
- old
+ new
@@ -9,31 +9,30 @@
elements, certain attributes within those elements, and even certain URL
protocols within attributes that contain URLs. You can also allow specific CSS
properties, @ rules, and URL protocols in elements or attributes containing CSS.
Any HTML or CSS that you don't explicitly allow will be removed.
-Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
-exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your allowlist config only
-allows safe markup and CSS, even the most malformed or malicious input will be
-transformed into safe output.
+Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
+the same way modern browsers do, and [Crass][crass], which parses CSS the same
+way modern browsers do. As long as your allowlist config only allows safe markup
+and CSS, even the most malformed or malicious input will be transformed into
+safe output.
[![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
[![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
[crass]:https://github.com/rgrove/crass
-[nokogumbo]:https://github.com/rubys/nokogumbo
+[nokogiri]:https://github.com/sparklemotion/nokogiri
Links
-----
* [Home](https://github.com/rgrove/sanitize/)
-* [API Docs](http://rubydoc.info/github/rgrove/sanitize/master)
+* [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
* [Issues](https://github.com/rgrove/sanitize/issues)
-* [Release History](https://github.com/rgrove/sanitize/blob/master/HISTORY.md#sanitize-history)
-* [Online Demo](https://sanitize.herokuapp.com/)
-* [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
+* [Release History](https://github.com/rgrove/sanitize/releases)
+* [Online Demo](https://sanitize-web.fly.dev/)
Installation
-------------
```
@@ -70,14 +69,15 @@
* CSS stylesheets inside HTML `<style>` elements
* CSS properties inside HTML `style` attributes
* Standalone CSS stylesheets
* Standalone CSS properties
-However, please note that Sanitize _cannot_ fully sanitize the contents of
-`<math>` or `<svg>` elements, since these elements don't follow the same parsing
-rules as the rest of HTML. If this is something you need, you may want to look
-for another solution.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
### HTML Fragments
A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
element.
@@ -418,14 +418,20 @@
a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
q s samp small strike strong sub sup time u ul var
]
```
-**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
-elements, since these elements don't follow the same parsing rules as the rest
-of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
-content inside them will be allowed, even if that content would otherwise be
-removed by Sanitize.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
+
+> **Note**
+>
+> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
+>
+> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
#### :parser_options (Hash)
[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.