# CompareXML [![Gem Version](https://badge.fury.io/rb/compare-xml.svg)](https://rubygems.org/gems/compare-xml)

CompareXML is a fast, lightweight and feature-rich tool for comparing and diffing XML/HTML. Its purpose is to compare two instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet` for equality or equivalency.

**Features**

- Fast, lightweight and highly customizable
- Compares XML/HTML documents and document fragments
- Can either produce a detailed list of discrepancies or run silently
- Can exclude specific nodes or attributes from all comparisons

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'compare-xml'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install compare-xml

## Usage

Using CompareXML is as simple as

```ruby
CompareXML.equivalent?(doc1, doc2)
```

where `doc1` and `doc2` are instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet`.

**Example**

Suppose you have two files `1.html` and `2.html` that you would like to compare. You could do it as follows:

```ruby
# Parse both files into Nokogiri documents, then compare them.
doc1 = Nokogiri::HTML(File.read('1.html'))
doc2 = Nokogiri::HTML(File.read('2.html'))
puts CompareXML.equivalent?(doc1, doc2)
```

The above code will print `true` or `false` depending on the result of the comparison.

> If you are using CompareXML in a script, then you need to require it manually with:

```ruby
require 'compare-xml'
```

## Options at a Glance

CompareXML has a variety of options that can be passed as an optional argument, e.g.:

```ruby
CompareXML.equivalent?(doc1, doc2, {ignore_comments: false, verbose: true, ...})
```

- `collapse_whitespace: {true|false}` default: **`true`** [→ read more ←](#collapse_whitespace)
    - when `true`, trims and collapses whitespace

- `ignore_attr_order: {true|false}` default: **`true`** [→ read more ←](#ignore_attr_order)
    - when `true`, ignores attribute order within tags

- `ignore_attr_content: [string1, string2, ...]` default: **`[]`** [→ read more ←](#ignore_attr_content)
    - when provided, ignores all attributes that contain substrings `string1`, `string2`, etc.

- `ignore_attrs: [css_selector1, css_selector2, ...]` default: **`[]`** [→ read more ←](#ignore_attrs)
    - when provided, ignores specific *attributes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)

- `ignore_comments: {true|false}` default: **`true`** [→ read more ←](#ignore_comments)
    - when `true`, ignores comments, such as `<!-- comment -->`

- `ignore_nodes: [css_selector1, css_selector2, ...]` default: **`[]`** [→ read more ←](#ignore_nodes)
    - when provided, ignores specific *nodes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)

- `ignore_text_nodes: {true|false}` default: **`false`** [→ read more ←](#ignore_text_nodes)
    - when `true`, ignores all text content within a document

- `verbose: {true|false}` default: **`false`** [→ read more ←](#verbose)
    - when `true`, instead of a boolean, `CompareXML.equivalent?` returns an array of discrepancies.

## Options in Depth

- `collapse_whitespace: {true|false}` default: **`true`**

When `true`, all text content within the document is trimmed (i.e. whitespace is removed from the left and right) and whitespace is collapsed (i.e. tabs, newlines and runs of whitespace characters are each replaced by a single space).
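
The trimming and collapsing described above can be approximated in plain Ruby. This is only a minimal sketch of the idea, not CompareXML's actual implementation; the helper name `collapse_whitespace_sketch` is hypothetical and the sample strings are made up for illustration.

```ruby
# Hypothetical helper, not part of CompareXML: mimics the documented
# effect of collapse_whitespace on a single string.
def collapse_whitespace_sketch(text)
  # strip drops leading/trailing whitespace; gsub replaces every run of
  # spaces, tabs and newlines with a single space.
  text.strip.gsub(/\s+/, ' ')
end

a = "   SOME \t TEXT\n   CONTENT   "
b = 'SOME TEXT CONTENT'

# Both strings normalize to the same value, so this prints true.
puts collapse_whitespace_sketch(a) == collapse_whitespace_sketch(b)
```

With the option set to `false`, strings like `a` and `b` would presumably be compared as-is and so differ.
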

**Usage Example:** `CompareXML.equivalent?(doc1, doc2, {collapse_whitespace: true})`

**Example:** When `true` the following HTML strings are considered equal:

    SOME TEXT CONTENT
    SOME TEXT CONTENT

**Example:** When `true` the following HTML strings are considered equal:

For a node reported as `p(4)`, the tag name is `p`, its hierarchical ancestors are `html > body`, and it is the **4th** `<p>` tag. That is, it could be found in

    <html><body><p>one</p><p>two</p><p>three</p><p>TARGET</p></body></html>

> **Note:** `p(4)` means that it is the fourth tag of type `<p>`, but there could be many other tags of other types between `p(3)` and `p(4)`.

**Node content** displays the discrepancy in content (which could be the name of the tag, attributes, text content, comments, etc.).

**Error code** is a numeric value that indicates the type of a discrepancy. CompareXML implements the following error codes:

```ruby
EQUIVALENT = 1            # nodes are equal (for internal use only)
MISSING_ATTRIBUTE = 2     # attribute is missing its counterpart
MISSING_NODE = 3          # node is missing its counterpart
UNEQUAL_ATTRIBUTES = 4    # attributes are not equal
UNEQUAL_COMMENTS = 5      # comment contents are not equal
UNEQUAL_DOCUMENTS = 6     # document types are not equal
UNEQUAL_ELEMENTS = 7      # nodes have the same type but are not equal
UNEQUAL_NODES_TYPES = 8   # nodes do not have the same type
UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
```

Here is an example of how these could be used:

```ruby
case error_code
when CompareXML::UNEQUAL_ATTRIBUTES
  '!='
when CompareXML::MISSING_ATTRIBUTE
  '?'
end
```

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

## Credits

This gem was inspired by [Michael B. Klein](https://github.com/mbklein)'s gem [`equivalent-xml`](https://github.com/mbklein/equivalent-xml), another excellent tool for XML comparison.

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).