README.md in compare-xml-0.6 vs README.md in compare-xml-0.61

- old
+ new

@@ -15,21 +15,19 @@
 ## Installation
 Add this line to your application's Gemfile:
-```ruby
-gem 'compare-xml'
-```
+ gem 'compare-xml'
 And then execute:
- $ bundle
+ bundle
 Or install it yourself as:
- $ gem install compare-xml
+ gem install compare-xml
 ## Usage
@@ -63,283 +61,250 @@
 ## Options at a Glance
 CompareXML has a variety of options that can be invoked as an optional argument, e.g.:
 ```ruby
-CompareXML.equivalent?(doc1, doc2, {ignore_comments: false, verbose: true, ...})
+CompareXML.equivalent?(doc1, doc2, {collapse_whitespace: false, verbose: true, ...})
 ```
-- `collapse_whitespace: {true|false}` default: **`true`** [&rarr; read more &larr;](#collapse_whitespace)
- - when `true`, trims and collapses whitespace
+- `collapse_whitespace: {true|false}` default: **`true`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#collapse_whitespace)
+ - when `true`, trims and collapses whitespace
-- `ignore_attr_order: {true|false}` default: **`true`** [&rarr; read more &larr;](#ignore_attr_order)
- - when `true`, ignores attribute order within tags
+- `ignore_attr_order: {true|false}` default: **`true`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#ignore_attr_order)
+ - when `true`, ignores attribute order within tags
-- `ignore_attr_content: [string1, string2, ...]` default: **`[]`** [&rarr; read more &larr;](#ignore_attr_content)
- - when provided, ignores all attributes that contain substrings `string`, `string2`, etc.
+- `ignore_attr_content: [string1, string2, ...]` default: **`[]`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#ignore_attr_content)
+ - when provided, ignores all attributes that contain substrings `string`, `string2`, etc.
-- `ignore_attrs: [css_selector1, css_selector1, ...]` default: **`[]`** [&rarr; read more &larr;](#ignore_attrs)
- - when provided, ignores specific *attributes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)
+- `ignore_attrs: [css_selector1, css_selector1, ...]` default: **`[]`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#ignore_attrs)
+ - when provided, ignores specific *attributes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)
-- `ignore_comments: {true|false}` default: **`true`** [...](#ignore_comments)
- - when `true`, ignores comments, such as `<!-- comment -->`
+- `ignore_comments: {true|false}` default: **`true`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#ignore_comments)
+ - when `true`, ignores comments, such as `<!-- comment -->`
-- `ignore_nodes: [css_selector1, css_selector1, ...]` default: **`[]`** [&rarr; read more &larr;](#ignore_nodes)
- - when provided, ignores specific *nodes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)
+- `ignore_nodes: [css_selector1, css_selector1, ...]` default: **`[]`** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#ignore_nodes)
+ - when provided, ignores specific *nodes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)
-- `ignore_text_nodes: {true|false}` default: **`false`** [&rarr; read more &larr;](#ignore_text_nodes)
- - when `true`, ignores all text content within a document
+- `ignore_text_nodes: {true|false}` default: **`false`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#ignore_text_nodes)
+ - when `true`, ignores all text content within a document
-- `verbose: {true|false}` default: **`false`** [&rarr; read more &larr;](#verbose)
- - when `true`, instead of a boolean, `CompareXML.equivalent?` returns an array of discrepancies.
+- `verbose: {true|false}` default: **`false`**&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[⇨ show examples ⇦](#verbose)
+ - when `true`, instead of a boolean, `CompareXML.equivalent?` returns an array of discrepancies.
 ## Options in Depth
 - <a id="collapse_whitespace"></a>`collapse_whitespace: {true|false}` default: **`true`**
 When `true`, all text content within the document is trimmed (i.e. space removed from left and right) and whitespace is collapsed (i.e. tabs, new lines, multiple whitespace characters are replaced by a single whitespace).
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {collapse_whitespace: true})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {collapse_whitespace: true})`
 **Example:** When `true` the following HTML strings are considered equal:
- <a href="/admin"> SOME TEXT CONTENT </a>
- <a href="/index"> SOME TEXT CONTENT </a>
+ <a href="/admin"> SOME TEXT CONTENT </a>
+ <a href="/index"> SOME TEXT CONTENT </a>
- **Example:** When `true` the following HTML strings are considered equal:
+ **Example:** When `true` the following HTML strings are considered equal:
- <html>
- <title>
- This is my title
- </title>
- </html>
+ <html>
+ <title>
+ This is my title
+ </title>
+ </html>
- <html><title>This is my title</title></html>
+ <html><title>This is my title</title></html>
 ----------
 - <a id="ignore_attr_order"></a>`ignore_attr_order: {true|false}` default: **`true`**
 When `true`, all attributes are sorted before comparison and only attributes of the same type are compared.
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_order: true})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_order: true})`
 **Example:** When `true` the following HTML strings are considered equal:
- <a href="/admin" class="button" target="_blank">Link</a>
- <a class="button" target="_blank" href="/admin">Link</a>
+ <a href="/admin" class="button" target="_blank">Link</a>
+ <a class="button" target="_blank" href="/admin">Link</a>
- **Example:** When `false` the above HTML strings are compared as follows:
+ **Example:** When `false` the above HTML strings are compared as follows:
- href="admin" != class="button
+ href="admin" != class="button
- The comparison of the `<a>` element will stop at this point, since a discrepancy is found.
+ The comparison of the `<a>` element will stop at this point, since a discrepancy is found.
- **Example:** When `true` the following HTML strings are compared as follows:
+ **Example:** When `true` the following HTML strings are compared as follows:
- <a href="/admin" class="button" target="_blank">Link</a>
- <a class="button" target="_blank" href="/admin" rel="nofollow">Link</a>
+ <a href="/admin" class="button" target="_blank">Link</a>
+ <a class="button" target="_blank" href="/admin" rel="nofollow">Link</a>
- class="button" == class="button"
- href="/admin" == href="/admin"
- =! rel="nofollow"
- target="_blank" == target="_blank"
+ class="button" == class="button"
+ href="/admin" == href="/admin"
+ =! rel="nofollow"
+ target="_blank" == target="_blank"
 ----------
 - <a id="ignore_attr_content"></a>`ignore_attr_content: [string1, string2, ...]` default: **`[]`**
 When provided, ignores all **attributes** that contain any of the given substrings. **Note:** types of attributes still have to match (i.e. `<p>` = `<p>`, `<div>` = `<div>`, etc).
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_content: ['button']})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_content: ['button']})`
 **Example:** With `ignore_attr_content: ['button']` the following HTML strings are considered equal:
- <a href="/admin" id="button_1" class="blue button">Link</a>
- <a href="/admin" id="button_2" class="info button">Link</a>
+ <a href="/admin" id="button_1" class="blue button">Link</a>
+ <a href="/admin" id="button_2" class="info button">Link</a>
 **Example:** With `ignore_attr_content: ['menu']` the following HTML strings are considered equal:
- <a class="menu left" data-scope="abrth$menu" role="side-menu">Link</a>
- <a class="main menu" data-scope="ergeh$menu" role="main-menu">Link</a>
+ <a class="menu left" data-scope="abrth$menu" role="side-menu">Link</a>
+ <a class="main menu" data-scope="ergeh$menu" role="main-menu">Link</a>
 ----------
 - <a id="ignore_attrs"></a>`ignore_attrs: [css_selector1, css_selector1, ...]` default: **`[]`**
 When provided, ignores all **attributes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs: ['a[rel="nofollow"]', 'input[type="hidden"']})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs: ['a[rel="nofollow"]', 'input[type="hidden"']})`
 **Example:** With `ignore_attrs: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:
- <a href="/admin" class="button" target="_blank">Link</a>
- <a href="/admin" class="button" target="_self" rel="nofollow">Link</a>
+ <a href="/admin" class="button" target="_blank">Link</a>
+ <a href="/admin" class="button" target="_self" rel="nofollow">Link</a>
- **Example:** With `ignore_attrs: ['a[href^="http"]', 'a[class*="button"]']` the following HTML strings are considered equal:
+ **Example:** With `ignore_attrs: ['a[href^="http"]', 'a[class*="button"]']` the following HTML strings are considered equal:
- <a href="http://google.ca" class="primary button">Link</a>
- <a href="https://google.com" class="primary button rounded">Link</a>
+ <a href="http://google.ca" class="primary button">Link</a>
+ <a href="https://google.com" class="primary button rounded">Link</a>
 ----------
 - <a id="ignore_comments"></a>`ignore_comments: {true|false}` default: **`true`**
 When `true`, ignores comments, such as `<!-- This is a comment -->`.
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_comments: true})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_comments: true})`
 **Example:** When `true` the following HTML strings are considered equal:
- <!-- This is a comment -->
- <!-- This is another comment -->
+ <!-- This is a comment -->
+ <!-- This is another comment -->
- **Example:** When `true` the following HTML strings are considered equal:
+ **Example:** When `true` the following HTML strings are considered equal:
- <a href="/admin"><!-- This is a comment -->Link</a>
- <a href="/admin">Link</a>
+ <a href="/admin"><!-- This is a comment -->Link</a>
+ <a href="/admin">Link</a>
 ----------
 - <a id="ignore_nodes"></a>`ignore_nodes: [css_selector1, css_selector1, ...]` default: **`[]`**
 When provided, ignores all **nodes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_nodes: ['script', 'object']})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_nodes: ['script', 'object']})`
 **Example:** With `ignore_nodes: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:
- <a href="/admin" class="icon" target="_blank">Link 1</a>
- <a href="/index" class="button" target="_self" rel="nofollow">Link 2</a>
+ <a href="/admin" class="icon" target="_blank">Link 1</a>
+ <a href="/index" class="button" target="_self" rel="nofollow">Link 2</a>
- **Example:** With `ignore_nodes: ['b', 'i']` the following HTML strings are considered equal:
+ **Example:** With `ignore_nodes: ['b', 'i']` the following HTML strings are considered equal:
- <a href="/admin"><i class"icon bulb"></i><b>Warning:</b> Link</a>
- <a href="/admin"><i class"icon info"></i><b>Message:</b> Link</a>
+ <a href="/admin"><i class"icon bulb"></i><b>Warning:</b> Link</a>
+ <a href="/admin"><i class"icon info"></i><b>Message:</b> Link</a>
 ----------
 - <a id="ignore_text_nodes"></a>`ignore_text_nodes: {true|false}` default: **`false`**
 When `true`, ignores all text content. Text content is anything that is included between an opening and a closing tag, e.g. `<tag>THIS IS TEXT CONTENT</tag>`.
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_text_nodes: true})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_text_nodes: true})`
 **Example:** When `true` the following HTML strings are considered equal:
- <a href="/admin">SOME TEXT CONTENT</a>
- <a href="/admin">DIFFERENT TEXT CONTENT</a>
+ <a href="/admin">SOME TEXT CONTENT</a>
+ <a href="/admin">DIFFERENT TEXT CONTENT</a>
- **Example:** When `true` the following HTML strings are considered equal:
+ **Example:** When `true` the following HTML strings are considered equal:
- <i class="icon></i> <b>Warning:</b>
- <i class="icon> </i> <b>Message:</b>
+ <i class="icon></i> <b>Warning:</b>
+ <i class="icon> </i> <b>Message:</b>
 ----------
 - <a id="verbose"></a>`verbose: {true|false}` default: **`false`**
 When `true`, instead of returning a boolean value `CompareXML.equivalent?` returns an array of all errors encountered when performing a comparison.
- > **Warning:** When `true`, the comparison takes longer! Not only because more processing is required to produce meaningful differences, but also because in this mode, comparison does **NOT** stop when a first difference is encountered, because the goal is to capture as many differences as possible.
+ > **Warning:** When `true`, the comparison takes longer! Not only because more processing is required to produce meaningful differences, but also because in this mode, comparison does **NOT** stop when a first difference is encountered, because the goal is to capture as many differences as possible.
- **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {verbose: true})`
+ **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {verbose: true})`
 **Example:** When `true` given the following HTML strings:
- ![diffing](https://dl.dropboxusercontent.com/u/1001101/input.png)
+ ![diffing](https://github.com/vkononov/compare-xml/raw/master/img/diffing.png)
- `CompareXML.equivalent?(doc1, doc2, {verbose: true})` will produce an array shown below.
+ `CompareXML.equivalent?(doc1, doc2, {verbose: true})` will produce an array shown below.
- ```ruby
- [
- {
- node1: '<title>TITLE</title>',
- node2: '<title>ANOTHER TITLE</title>',
- diff1: 'TITLE',
- diff2: 'ANOTHER TITLE',
- },
- {
- node1: '<h1>SOME HEADING</h1>',
- node2: '<h1 id="main">SOME HEADING</h1>',
- diff1: nil,
- diff2: 'id="main"',
- },
- {
- node1: '<a href="/admin" rel="icon">Link</a>',
- node2: '<a rel="button" href="/admin">Link</a>',
- diff1: '"rel="icon"',
- diff2: '"rel="button"',
- },
- {
- node1: '<cite>Author Name</cite>',
- node2: nil,
- diff1: '<cite>Author Name</cite>',
- diff2: nil,
- },
- {
- node1: '<p class="footer">FOOTER</p>',
- node1: '<div class="footer">FOOTER</div>',
- diff1: 'p',
- diff2: 'div',
- }
- ]
- ```
-
- The structure of each hash inside the array is:
-
- node1: [Nokogiri::XML::Node] left node that contains the difference
- node2: [Nokogiri::XML::Node] right node that contains the difference
- diff1: [Nokogiri::XML::Node|String] left difference
- diff1: [Nokogiri::XML::Node|String] right difference
-
- **Node location** of `html:body:p(4)` means that the element in question is `<p>`, its hierarchical ancestors are `html > body`, and it is the **4th** `<p>` tag. That is, it could be found in
-
- <html><body><p>one</p...p>two</p...p>three</p...p>TARGET</p></body></html>
-
- > **Note:** `p(4)` means that it is the fourth tag of type `<p>`, but there could be many other tags of other types between `p(3)` and `p(4)`.
-
- **Node content** displays the discrepancy in content (which could be the name of the tag, attributes, text content, comments, etc)
-
- **Error code** is a numeric value that indicates the type of a discrepancy. CompareXML implements the following error codes
-
- ```ruby
- EQUIVALENT = 1 # nodes are equal (for internal use only)
- MISSING_ATTRIBUTE = 2 # attribute is missing its counterpart
- MISSING_NODE = 3 # node is missing its counterpart
- UNEQUAL_ATTRIBUTES = 4 # attributes are not equal
- UNEQUAL_COMMENTS = 5 # comment contents are not equal
- UNEQUAL_DOCUMENTS = 6 # document types are not equal
- UNEQUAL_ELEMENTS = 7 # nodes have the same type but are not equal
- UNEQUAL_NODES_TYPES = 8 # nodes do not have the same type
- UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
- ```
-
- Here is an example of how these could be used:
- ```ruby
- case error_code
- when CompareXML::UNEQUAL_ATTRIBUTES
- '!='
- when CompareXML::MISSING_ATTRIBUTE
- '?'
- end
+ [
+ {
+ node1: '<title>TITLE</title>',
+ node2: '<title>ANOTHER TITLE</title>',
+ diff1: 'TITLE',
+ diff2: 'ANOTHER TITLE',
+ },
+ {
+ node1: '<h1>SOME HEADING</h1>',
+ node2: '<h1 id="main">SOME HEADING</h1>',
+ diff1: nil,
+ diff2: 'id="main"',
+ },
+ {
+ node1: '<a href="/admin" rel="icon">Link</a>',
+ node2: '<a rel="button" href="/admin">Link</a>',
+ diff1: '"rel="icon"',
+ diff2: '"rel="button"',
+ },
+ {
+ node1: '<cite>Author Name</cite>',
+ node2: nil,
+ diff1: '<cite>Author Name</cite>',
+ diff2: nil,
+ },
+ {
+ node1: '<p class="footer">FOOTER</p>',
+ node2: '<div class="footer">FOOTER</div>',
+ diff1: 'p',
+ diff2: 'div',
+ }
+ ]
 ```
+
+ The structure of each hash inside the array is:
+
+ node1: [Nokogiri::XML::Node] left node that contains the difference
+ node2: [Nokogiri::XML::Node] right node that contains the difference
+ diff1: [Nokogiri::XML::Node|String] left difference
+ diff2: [Nokogiri::XML::Node|String] right difference
 ## Contributing
\ No newline at end of file
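
The 0.61 README documents each verbose-mode discrepancy as a hash with the keys `node1`, `node2`, `diff1` and `diff2`. As a rough sketch of how such an array might be consumed, the snippet below prints one line per discrepancy. The helper `summarize_discrepancies` and the constant `SAMPLE` are hypothetical names introduced here for illustration and are not part of compare-xml. The sample nodes are plain strings rather than `Nokogiri::XML::Node` objects so that the sketch runs without either gem installed.

```ruby
# Hypothetical helper (not part of compare-xml): turns a verbose-mode result
# array into one readable line per discrepancy.
def summarize_discrepancies(discrepancies)
  discrepancies.map do |d|
    # diff1/diff2 may be nil when one side has no counterpart.
    left  = d[:diff1].nil? ? '(nothing)' : d[:diff1].to_s
    right = d[:diff2].nil? ? '(nothing)' : d[:diff2].to_s
    "#{left} != #{right}"
  end
end

# Sample data copied from the README's verbose example. In real use the array
# would come from CompareXML.equivalent?(doc1, doc2, {verbose: true}).
SAMPLE = [
  { node1: '<title>TITLE</title>', node2: '<title>ANOTHER TITLE</title>',
    diff1: 'TITLE', diff2: 'ANOTHER TITLE' },
  { node1: '<h1>SOME HEADING</h1>', node2: '<h1 id="main">SOME HEADING</h1>',
    diff1: nil, diff2: 'id="main"' },
  { node1: '<cite>Author Name</cite>', node2: nil,
    diff1: '<cite>Author Name</cite>', diff2: nil }
].freeze

puts summarize_discrepancies(SAMPLE)
```

Calling `to_s` on each difference keeps the helper usable whether `diff1`/`diff2` hold strings or node objects, since the README allows either.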