= Ox: A fast XML parser and Object marshaller.
*GitHub* *repo*: https://github.com/ohler55/ox
*RubyGems* *repo*: https://rubygems.org/gems/ox
=== Description:
Optimized XML (Ox), as the name implies was written to provide speed optimized
XML handling. It was designed to be an alternative to Nokogiri in generic XML
parsing and as an alternative to Marshal for Object serialization.
Nokogiri relies on libXml while Ox is self contained. Ox uses nothing other
than standard C libraries so version issues with libXml are not an issue.
Marshal uses a binary format for serializing Objects. That binary format
changes with releases making Marshal dumped Object incompatible between some
versions. The use of a binary format make debugging message streams or file
contents next to impossible unless the same version of Ruby and only Ruby is
used for inspecting the serialize Object. Ox on the other hand uses human
readable XML. Ox also includes options that allow strict, tolerant, or a mode
that automatically defines missing classes.
It is possible to write an XML serialization gem with Nokogiri but writing
such a package in Ruby results in a module significantly slower than
Marshal. This is what triggered the start of Ox development.
Ox handles XML documents in two ways. It is a generic XML parser and writer as
well as a fast Object / XML marshaller. Ox was written for speed as a
replacement for Nokogiri and for Marshal.
As an XML parser it is 2 or more times faster than Nokogiri and as a generic
XML writer it is as much as 20 times faster than Nokogiri. Of course different
files may result in slightly different times.
As an Object serializer Ox is up to 6 times faster than the standard Ruby
Marshal.dump() and up to 3 times faster than Marshal.load().
=== Object Dump Sample:
require 'ox'
class Sample
attr_accessor :a, :b, :c
def initialize(a, b, c)
@a = a
@b = b
@c = c
end
end
# Create Object
obj = Sample.new(1, "bee", ['x', :y, 7.0])
# Now dump the Object to an XML String.
xml = Ox.dump(obj)
# Convert the object back into a Sample Object.
obj2 = Ox.parse_obj(xml)
=== Generic XML Writing and Parsing:
require 'ox'
doc = Ox::Document.new(:version => '1.0')
top = Ox::Element.new('top')
top[:name] = 'sample'
doc << top
mid = Ox::Element.new('middle')
mid[:name] = 'second'
top << mid
bot = Ox::Element.new('bottom')
bot[:name] = 'third'
mid << bot
xml = Ox.dump(doc)
puts xml
doc2 = Ox.parse(xml)
puts "Same? #{doc == doc2}"
=== Object XML format
The XML format used for Object encoding follows the structure of the
Object. Each XML element is encoded so that the XML element name is a type
indicator. Attributes of the element provide additional information such as
the Class if relevant, the Object attribute name, and Object ID if
necessary.
The type indicator map is:
- +a+ => Array
- +b+ => Base64
- +c+ => Class
- +f+ => Float
- +g+ => Regexp
- +h+ => Hash
- +i+ => Fixnum
- +j+ => Bignum
- +l+ => Rational
- +m+ => Symbol
- +n+ => FalseClass
- +o+ => Object
- +p+ => Ref
- +r+ => Range
- +s+ => String
- +t+ => Time
- +u+ => Struct
- +v+ => Complex
- +x+ => Raw
- +y+ => TrueClass
- +z+ => NilClass
If the type is an Object, type 'o' then an attribute named 'c' should be set
with the full Class name including the Module names. If the XML element
represents an Object then a sub-elements is included for each attribute of
the Object. An XML element attribute 'a' is set with a value that is the
name of the Ruby Object attribute. In all cases, except for the Exception
attribute hack the attribute names begin with an @ character. (Exception are
strange in that the attributes of the Exception Class are not named with a @
suffix. A hack since it has to be done in C and can not be done through the
interpreter.)
Values are encoded as the text portion of an element or in the sub-elements
of the principle. For example, a Fixnum is encoded as:
123
An Array has sub-elements and is encoded similar to this example.
1
abc
A Hash is encoded with an even number of elements where the first element is
the key and the second is the value. This is repeated for each entry in the
Hash. An example is of { 1 => 'one', 2 => 'two' } encoding is:
1
one
2
two
Strings with characters not allowed in XML are base64 encoded amd will be
converted back into a String when loaded.
Ox supports circular references where attributes of one Object can refer to
an Object that refers back to the first Object. When this option is used an
Object ID is added to each XML Object element as the value of the 'a'
attribute.