README.md in sanitize-6.0.0 vs README.md in sanitize-6.0.1

- old
+ new

@@ -9,31 +9,30 @@
 elements, certain attributes within those elements, and even certain URL
 protocols within attributes that contain URLs. You can also allow specific CSS
 properties, @ rules, and URL protocols in elements or attributes containing CSS.
 Any HTML or CSS that you don't explicitly allow will be removed.

-Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
-exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your allowlist config only
-allows safe markup and CSS, even the most malformed or malicious input will be
-transformed into safe output.
+Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
+the same way modern browsers do, and [Crass][crass], which parses CSS the same
+way modern browsers do. As long as your allowlist config only allows safe markup
+and CSS, even the most malformed or malicious input will be transformed into
+safe output.

 [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
 [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)

 [crass]:https://github.com/rgrove/crass
-[nokogumbo]:https://github.com/rubys/nokogumbo
+[nokogiri]:https://github.com/sparklemotion/nokogiri

 Links
 -----

 * [Home](https://github.com/rgrove/sanitize/)
-* [API Docs](http://rubydoc.info/github/rgrove/sanitize/master)
+* [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
 * [Issues](https://github.com/rgrove/sanitize/issues)
-* [Release History](https://github.com/rgrove/sanitize/blob/master/HISTORY.md#sanitize-history)
-* [Online Demo](https://sanitize.herokuapp.com/)
-* [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
+* [Release History](https://github.com/rgrove/sanitize/releases)
+* [Online Demo](https://sanitize-web.fly.dev/)

 Installation
 -------------

 ```
@@ -70,14 +69,15 @@
 * CSS stylesheets inside HTML `<style>` elements
 * CSS properties inside HTML `style` attributes
 * Standalone CSS stylesheets
 * Standalone CSS properties

-However, please note that Sanitize _cannot_ fully sanitize the contents of
-`<math>` or `<svg>` elements, since these elements don't follow the same parsing
-rules as the rest of HTML. If this is something you need, you may want to look
-for another solution.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.

 ### HTML Fragments

 A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
 element.
@@ -418,14 +418,20 @@
   a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre q s
   samp small strike strong sub sup time u ul var
 ]
 ```

-**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
-elements, since these elements don't follow the same parsing rules as the rest
-of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
-content inside them will be allowed, even if that content would otherwise be
-removed by Sanitize.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
+
+> **Note**
+>
+> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
+>
+> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.

 #### :parser_options (Hash)

 [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.