# CompareXML
[![Gem Version](https://badge.fury.io/rb/compare-xml.svg)](https://rubygems.org/gems/compare-xml)
CompareXML is a fast, lightweight and feature-rich tool for XML/HTML comparison and diffing. Its purpose is to compare two instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet` for equality or equivalency.
**Features**
- Fast, lightweight and highly customizable
- Compares XML/HTML documents and document fragments
- Can either report detailed discrepancies or run silently
- Can exclude specific nodes or attributes from all comparisons
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'compare-xml'
```
And then execute:

    $ bundle

Or install it yourself as:

    $ gem install compare-xml

## Usage
Using CompareXML is as simple as
```ruby
CompareXML.equivalent?(doc1, doc2)
```
where `doc1` and `doc2` are instances of `Nokogiri::XML::Node` or `Nokogiri::XML::NodeSet`.
**Example**
Suppose you have two files `1.html` and `2.html` that you would like to compare. You could do it as follows:
```ruby
doc1 = Nokogiri::HTML(File.read('1.html'))
doc2 = Nokogiri::HTML(File.read('2.html'))
puts CompareXML.equivalent?(doc1, doc2)
```
The above code will print `true` or `false` depending on the result of the comparison.
> If you are using CompareXML in a script, then you need to require it manually with:
```ruby
require 'compare-xml'
```
## Options at a Glance
CompareXML accepts a variety of options, passed as an optional hash argument, e.g.:
```ruby
CompareXML.equivalent?(doc1, doc2, {ignore_comments: false, verbose: true, ...})
```
- `ignore_attr_order: {true|false}` default: **`true`**
    - when `true`, ignores attribute order within tags
- `ignore_attrs: {css}` default: **`{}`**
    - when provided, ignores specific *attributes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)
- `ignore_comments: {true|false}` default: **`true`**
    - when `true`, ignores comments, such as `<!-- This is a comment -->`
- `ignore_nodes: {css}` default: **`{}`**
    - when provided, ignores specific *nodes* using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp)
- `ignore_text_nodes: {true|false}` default: **`false`**
    - when `true`, ignores all text content within a document
- `collapse_whitespace: {true|false}` default: **`true`**
    - when `true`, trims and collapses whitespace
- `verbose: {true|false}` default: **`false`**
    - when `true`, instead of a boolean, `CompareXML.equivalent?` returns an array of discrepancies.

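Because the options are passed as a plain hash, a mistyped key could go unnoticed. The sketch below checks the keys before the call is made. It is only an illustration: `KNOWN_OPTIONS` and `checked_options` are hypothetical names that are not part of CompareXML, and nothing is assumed here about how the gem itself treats unknown keys.

```ruby
# Option names copied from the list above. KNOWN_OPTIONS and checked_options
# are hypothetical helpers for illustration, not part of CompareXML.
KNOWN_OPTIONS = %i[
  ignore_attr_order ignore_attrs ignore_comments ignore_nodes
  ignore_text_nodes collapse_whitespace verbose
].freeze

# Returns the hash unchanged, or raises if it contains an unlisted key.
def checked_options(opts = {})
  unknown = opts.keys - KNOWN_OPTIONS
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  opts
end

opts = checked_options(ignore_comments: false, verbose: true)
puts opts.inspect
# Intended use (requires the gem and two parsed documents):
#   CompareXML.equivalent?(doc1, doc2, opts)
```

Keeping the check in a small wrapper means the call to `CompareXML.equivalent?` itself stays exactly as documented.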
## Options in Depth
- `ignore_attr_order: {true|false}` default: **`true`**

    When `true`, all attributes are sorted before comparison, and only attributes with the same name are compared with each other.

    **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attr_order: true})`

    **Example:** When `true` the following HTML strings are considered equal:

        <a href="/admin" class="button" target="_blank">Link</a>
        <a class="button" target="_blank" href="/admin">Link</a>

    **Example:** When `false` the above HTML strings are compared as follows:

        href="/admin" != class="button"

    The comparison of the `<a>` element will stop at this point, since a discrepancy is found.

    **Example:** When `true` the following HTML strings are compared as follows:

        <a href="/admin" class="button" target="_blank">Link</a>
        <a class="button" target="_blank" href="/admin" rel="nofollow">Link</a>

        class="button"  == class="button"
        href="/admin"   == href="/admin"
                        != rel="nofollow"
        target="_blank" == target="_blank"

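The sorted comparison described for `ignore_attr_order` can be sketched in plain Ruby. This is not CompareXML's implementation: attributes are modelled as ordinary hashes, and `compare_sorted_attrs` and its status symbols are hypothetical names chosen for the illustration. The attribute values are the ones used in the example above.

```ruby
# Pair attributes by name after sorting, so their order in the markup
# does not matter. An attribute present on only one side is :missing.
def compare_sorted_attrs(left, right)
  (left.keys | right.keys).sort.map do |name|
    l = left[name]
    r = right[name]
    status =
      if l.nil? || r.nil?
        :missing
      elsif l == r
        :equal
      else
        :unequal
      end
    [name, l, r, status]
  end
end

left  = { 'href' => '/admin', 'class' => 'button', 'target' => '_blank' }
right = { 'class' => 'button', 'target' => '_blank',
          'href' => '/admin', 'rel' => 'nofollow' }

compare_sorted_attrs(left, right).each do |name, l, r, status|
  puts format('%-7s %-10s %-11s %s', name, l.inspect, r.inspect, status)
end
```

Walking the sorted union of names is what lets `rel` show up as a one-sided entry instead of shifting every later attribute out of alignment.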
- `ignore_attrs: {css}` default: **`{}`**

    When provided, ignores all **attributes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).

    **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_attrs: ['a[rel="nofollow"]', 'input[type="hidden"]']})`

    **Example:** With `ignore_attrs: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:

        <a href="/admin" class="button" target="_blank">Link</a>
        <a href="/admin" class="button" target="_self" rel="nofollow">Link</a>

    **Example:** With `ignore_attrs: ['a[href^="http"]', 'a[class*="button"]']` the following HTML strings are considered equal:

        <a href="http://google.ca" class="primary button">Link</a>
        <a href="https://google.com" class="primary button rounded">Link</a>

- `ignore_comments: {true|false}` default: **`true`**

    When `true`, ignores comments, such as `<!-- This is a comment -->`.

    **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_comments: true})`

    **Example:** When `true` the following HTML strings are considered equal:

        <!-- This is a comment -->
        <!-- This is another comment -->

    **Example:** When `true` the following HTML strings are considered equal:

        <a href="/admin"><!-- This is a comment -->Link</a>
        <a href="/admin">Link</a>

- `ignore_nodes: {css}` default: **`{}`**

    When provided, ignores all **nodes** that satisfy a particular rule using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).

    **Usage Example:** `CompareXML.equivalent?(doc1, doc2, {ignore_nodes: ['script', 'object']})`

    **Example:** With `ignore_nodes: ['a[rel="nofollow"]', 'a[target]']` the following HTML strings are considered equal:

        <a href="/admin" target="_blank">Link 1</a>
        <a href="/index" rel="nofollow">Link 2</a>

    **Example:** With `ignore_nodes: ['b', 'i']` the following HTML strings are considered equal:

        <a href="/admin"><b>Warning:</b> Link</a>
        <a href="/admin"><i>Message:</i> Link</a>

- `ignore_text_nodes: {true|false}` default: **`false`**

    When `true`, ignores all text content. Text content is anything that is included between an opening and a closing tag.

Extra content
Some fancy quote Author Name
Some more text
Yet more text
Too much text
Extra content
Some fancy quote
Some more text
Yet more text
Too much text
`CompareXML.equivalent?(doc1, doc2, {verbose: true})` will produce an array shown below.

    [ "html:head:title", "TITLE", 10, "ANOTHER TITLE", "html:head:title" ],
    [ "html:body:h1", nil, 2, "id=\"main\"", "html:body:h1" ],
    [ "html:body:div(2):a", "rel=\"button\"", 4, "rel=\"icon\"", "html:body:div(2):a" ],
    [ "html:body:blockquote:cite", "cite", 3, nil, "html:body:blockquote:cite" ],
    [ "html:body:p(4)", "p", 8, "div", "html:body:div(3)" ]

The structure of the array is as follows:

    [left_node_location, left_content, error_code, right_content, right_node_location]

**Node location** of `html:body:p(4)` means that the element in question is `<p>`, its hierarchical ancestors are `html > body`, and it is the **4th** `<p>` tag. That is, it could be found in

    <html>
      <body>
        <p>one</p>
        ...<p>two</p>
        ...<p>three</p>
        ...<p>TARGET</p>
      </body>
    </html>

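To work with these entries in code, each one can be unpacked into its five fields and the node location split into its segments. The sketch below is a plain-Ruby illustration, not part of CompareXML: `parse_location` is a hypothetical helper, and treating a segment without `(n)` as position 1 is an assumption about the notation. The sample entry is copied from the output above.

```ruby
# Hypothetical helper: split a location such as "html:body:p(4)" into
# [tag, position] pairs. A segment without "(n)" is assumed to mean 1.
def parse_location(location)
  location.split(':').map do |segment|
    match = segment.match(/\A(?<tag>[^()]+)(?:\((?<index>\d+)\))?\z/)
    raise ArgumentError, "unrecognised segment: #{segment}" unless match

    [match[:tag], (match[:index] || 1).to_i]
  end
end

# One entry in the documented five-field layout.
entry = ['html:body:p(4)', 'p', 8, 'div', 'html:body:div(3)']
left_location, left_content, error_code, right_content, right_location = entry

p parse_location(left_location)
puts "#{left_location} (#{left_content}) vs " \
     "#{right_location} (#{right_content}), code #{error_code}"
```

Splitting on `:` keeps the helper short, but it would need adjusting for tag names that themselves contain a colon, such as namespaced XML elements.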
> **Note:** `p(4)` means that it is the fourth tag of type `<p>`, but there could be many other tags of other types between `p(3)` and `p(4)`.

**Node content** displays the discrepancy in content (which could be the name of the tag, attributes, text content, comments, etc.).

**Error code** is a numeric value that indicates the type of a discrepancy. CompareXML implements the following error codes:

```ruby
EQUIVALENT = 1            # nodes are equal (for internal use only)
MISSING_ATTRIBUTE = 2     # attribute is missing its counterpart
MISSING_NODE = 3          # node is missing its counterpart
UNEQUAL_ATTRIBUTES = 4    # attributes are not equal
UNEQUAL_COMMENTS = 5      # comment contents are not equal
UNEQUAL_DOCUMENTS = 6     # document types are not equal
UNEQUAL_ELEMENTS = 7      # nodes have the same type but are not equal
UNEQUAL_NODES_TYPES = 8   # nodes do not have the same type
UNEQUAL_TEXT_CONTENTS = 9 # text contents are not equal
```

Here is an example of how these could be used:

```ruby
case error_code
when CompareXML::UNEQUAL_ATTRIBUTES
  '!='
when CompareXML::MISSING_ATTRIBUTE
  '?'
end
```

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request

## Credits

This gem was inspired by [Michael B. Klein](https://github.com/mbklein)'s gem [`equivalent-xml`](https://github.com/mbklein/equivalent-xml), another excellent tool for XML comparison.

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).