# CompareXML [![Gem Version](https://badge.fury.io/rb/compare-xml.svg)](https://rubygems.org/gems/compare-xml) CompareXML is a fast, lightweight and feature-rich tool that will solve your XML/HTML comparison or diffing needs. its purpose is to compare two instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet` for equality or equivalency. **Features** - Fast, light-weight and highly customizable - Compares XML/HTML documents and document fragments - Can produce both detailed diffing discrepancies or execute silently - Has the ability to exclude specific nodes or attributes from all comparisons ## Installation Add this line to your application's Gemfile: gem 'compare-xml' And then execute: bundle Or install it yourself as: gem install compare-xml ## Usage Using CompareXML is as simple as ```ruby CompareXML.equivalent?(doc1, doc2) ``` where `doc1` and `doc2` are instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet`. **Example** Suppose you have two files `1.html` and `2.html` that you would like to compare. You could do it as follows: ```ruby doc1 = Nokogiri::HTML(open('1.html')) doc2 = Nokogiri::HTML(open('2.html')) puts CompareXML.equivalent?(doc1, doc2) ``` The above code will print `true` or `false` depending on the result of the comparison. > If you are using CompareXML in a script, then you need to require it manually with: ```ruby require 'compare-xml' ``` ## Options at a Glance CompareXML has a variety of options that can be invoked as an optional argument, e.g.: ```ruby CompareXML.equivalent?(doc1, doc2, {collapse_whitespace: false, verbose: true, ...}) ``` - `collapse_whitespace: {true|false}` default: **`true`**     [show examples ⇨](#collapse_whitespace) - when `true`, trims and collapses whitespace - `ignore_attr_order: {true|false}` default: **`true`**     [show examples ⇨](#ignore_attr_order) - when `true`, ignores attribute order within tags - `ignore_attr_content: [string1, string2, ...]` default: **`[]`**     [show examples ⇨](#ignore_attr_content) - when provided, ignores all attributes that contain substrings `string`, `string2`, etc. - `ignore_attrs: [css_selector1, css_selector1, ...]` default: **`[]`**     [show examples ⇨](#ignore_attrs) - when provided, ignores specific *attributes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp) - `ignore_attrs_by_name: [string1, string2, ...]` default: **`[]`**     [show examples ⇨](#ignore_attrs_by_name) - when provided, ignores specific *attributes* using [String] - `ignore_comments: {true|false}` default: **`true`**     [show examples ⇨](#ignore_comments) - when `true`, ignores comments, such as `` - `ignore_nodes: [css_selector1, css_selector1, ...]` default: **`[]`**      [show examples ⇨](#ignore_nodes) - when provided, ignores specific *nodes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp) - `ignore_text_nodes: {true|false}` default: **`false`**     [show examples ⇨](#ignore_text_nodes) - when `true`, ignores all text content within a document - `verbose: {true|false}` default: **`false`**          [show examples ⇨](#verbose) - when `true`, instead of a boolean, `CompareXML.equivalent?` returns an array of discrepancies. - `ignore_children {true|false}` default **`false`**          [show examples ⇨](#ignore_children) - when `true`, the subnodes of a node in the xml are ignored - `force_children {true|false}` default **`false`**          [show examples ⇨](#force_children) - when `true`, the subnodes of a node are checked independently of the status of the parent node ## Options in Depth - `collapse_whitespace: {true|false}` default: **`true`** When `true`, all text content within the document is trimmed (i.e. space removed from left and right) and whitespace is collapsed (i.e. tabs, new lines, multiple whitespace characters are replaced by a single whitespace). **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {collapse_whitespace: true})` **Example:** When `true` the following HTML strings are considered equal: SOME TEXT CONTENT SOME TEXT CONTENT **Example:** When `true` the following HTML strings are considered equal: This is my title This is my title ---------- - `ignore_attr_order: {true|false}` default: **`true`** When `true`, all attributes are sorted before comparison and only attributes of the same type are compared. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_order: true})` **Example:** When `true` the following HTML strings are considered equal: Link Link **Example:** When `false` the above HTML strings are compared as follows: href="admin" != class="button The comparison of the `` element will stop at this point, since a discrepancy is found. **Example:** When `true` the following HTML strings are compared as follows: Link Link class="button" == class="button" href="/admin" == href="/admin" =! rel="nofollow" target="_blank" == target="_blank" ---------- - `ignore_attr_content: [string1, string2, ...]` default: **`[]`** When provided, ignores all **attributes** that contain any of the given substrings. **Note:** types of attributes still have to match (i.e. `

` = `

`, `

` = `
`, etc). **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_content: ['button']})` **Example:** With `ignore_attr_content: ['button']` the following HTML strings are considered equal: Link Link **Example:** With `ignore_attr_content: ['menu']` the following HTML strings are considered equal: Link Link ---------- - `ignore_attrs: [css_selector1, css_selector1, ...]` default: **`[]`** When provided, ignores all **attributes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp). **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs: ['a[rel="nofollow"]', 'input[type="hidden"']})` **Example:** With `ignore_attrs: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal: Link Link **Example:** With `ignore_attrs: ['a[href^="http"]', 'a[class*="button"]']` the following HTML strings are considered equal: Link Link ---------- - `ignore_attrs_by_name: [string1, string2, ...]` default: **`false`** When provided, ignores all **attributes** which name is specified in the string array. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs_by_name: ['target'])` **Example:** With `ignore_attrs_by_name: ['target', 'rel']` the following HTML strings are considered equal: Link Link ---------- - `ignore_comments: {true|false}` default: **`true`** When `true`, ignores comments, such as ``. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_comments: true})` **Example:** When `true` the following HTML strings are considered equal: **Example:** When `true` the following HTML strings are considered equal: Link Link ---------- - `ignore_nodes: [css_selector1, css_selector1, ...]` default: **`[]`** When provided, ignores all **nodes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp). **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_nodes: ['script', 'object']})` **Example:** With `ignore_nodes: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal: Link 1 Link 2 **Example:** With `ignore_nodes: ['b', 'i']` the following HTML strings are considered equal: Warning: Link Message: Link ---------- - `ignore_text_nodes: {true|false}` default: **`false`** When `true`, ignores all text content. Text content is anything that is included between an opening and a closing tag, e.g. `THIS IS TEXT CONTENT`. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_text_nodes: true})` **Example:** When `true` the following HTML strings are considered equal: SOME TEXT CONTENT DIFFERENT TEXT CONTENT **Example:** When `true` the following HTML strings are considered equal: Message: ---------- - `verbose: {true|false}` default: **`false`** When `true`, instead of returning a boolean value `CompareXML.equivalent?` returns an array of all errors encountered when performing a comparison. > **Warning:** When `true`, the comparison takes longer! Not only because more processing is required to produce meaningful differences, but also because in this mode, comparison does **NOT** stop when a first difference is encountered, because the goal is to capture as many differences as possible. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {verbose: true})` **Example:** When `true` given the following HTML strings: ![diffing](https://github.com/vkononov/compare-xml/raw/master/img/diffing.png) `CompareXML.equivalent?(doc1, doc2, {verbose: true})` will produce an array shown below. ```ruby [ { node1: 'TITLE', node2: 'ANOTHER TITLE', diff1: 'TITLE', diff2: 'ANOTHER TITLE', }, { node1: '

SOME HEADING

', node2: '

SOME HEADING

', diff1: nil, diff2: 'id="main"', }, { node1: '', node2: 'Link', diff1: '"rel="icon"', diff2: '"rel="button"', }, { node1: 'Author Name', node2: nil, diff1: 'Author Name', diff2: nil, }, { node1: '', node2: '', diff1: 'p', diff2: 'div', } ] ``` The structure of each hash inside the array is: node1: [Nokogiri::XML::Node] left node that contains the difference node2: [Nokogiri::XML::Node] right node that contains the difference diff1: [Nokogiri::XML::Node|String] left difference diff2: [Nokogiri::XML::Node|String] right difference ---------- - `ignore_children: {true|false}` default: **`false`** When provided, ignores all **subnodes** of any node. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_children: true})` **Example:** With `ignore_children: true` the following HTML strings are considered equal: Link 1 Link 2 ---------- - `force_children: {true|false}` default: **`false`** When provided, compares all **subnodes** of any node. **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_children: true})` ---------- ## Contributing 1. Fork it 2. Create your feature branch (`git checkout -b my-new-feature`) 3. Commit your changes (`git commit -am 'Add some feature'`) 4. Push to the branch (`git push origin my-new-feature`) 5. Create new Pull Request ## Credits This gem was inspired by [Michael B. Klein](https://github.com/mbklein)'s gem [`equivalent-xml`](https://github.com/mbklein/equivalent-xml) - another excellent tool for XML comparison. ## License The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).