= Moxml: Modern XML processing for Ruby :toc: macro :toclevels: 3 :toc-title: Contents :source-highlighter: highlight.js image:https://github.com/lutaml/moxml/workflows/rake/badge.svg["Build Status", link="https://github.com/lutaml/moxml/actions?workflow=rake"] toc::[] == Introduction and purpose Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying. Key features: * Intuitive, Ruby-idiomatic API for XML manipulation * Consistent interface across different XML libraries * Efficient node mapping for XPath queries * Support for all XML node types and features * Easy switching between XML processing engines * Clean separation between interface and implementation == Getting started Install the gem and at least one supported XML library: [source,ruby] ---- # In your Gemfile gem 'moxml' gem 'nokogiri' # Or 'ox' or 'oga' ---- === Basic document creation [source,ruby] ---- require 'moxml' # Create a new XML document doc = Moxml.new.create_document # Add XML declaration doc.add_declaration(version: "1.0", encoding: "UTF-8") # Create root element with namespace root = doc.create_element('book') root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/') doc.add_child(root) # Add content title = doc.create_element('dc:title') title.text = 'XML Processing with Ruby' root.add_child(title) # Output formatted XML puts doc.to_xml(indent: 2) ---- == Working with documents === Using the builder pattern The builder pattern provides a clean DSL for creating XML documents: [source,ruby] ---- doc = Moxml.new.build do declaration version: "1.0", encoding: "UTF-8" element 'library', xmlns: 'http://example.org/library' do element 'book' do element 'title' do text 'Ruby Programming' end element 'author' do text 'Jane Smith' end comment 'Publication details' element 'published', year: '2024' cdata '<custom>metadata</custom>' end end end ---- === Direct document manipulation [source,ruby] ---- doc = Moxml.new.create_document # Add declaration doc.add_declaration(version: "1.0", encoding: "UTF-8") # Create root with namespace root = doc.create_element('library') root.add_namespace(nil, 'http://example.org/library') doc.add_child(root) # Add elements with attributes book = doc.create_element('book') book['id'] = 'b1' root.add_child(book) # Add mixed content book.add_child(doc.create_comment('Book details')) title = doc.create_element('title') title.text = 'Ruby Programming' book.add_child(title) ---- == XML objects and their methods === Document object The Document object represents an XML document and serves as the root container for all XML nodes. [source,ruby] ---- # Creating a document doc = Moxml.new.create_document doc = Moxml.new.parse(xml_string) # Document properties and methods doc.encoding # Get document encoding doc.encoding = "UTF-8" # Set document encoding doc.version # Get XML version doc.version = "1.1" # Set XML version doc.standalone # Get standalone declaration doc.standalone = "yes" # Set standalone declaration # Document structure doc.root # Get root element doc.children # Get all top-level nodes doc.add_child(node) # Add a child node doc.remove_child(node) # Remove a child node # Node creation methods doc.create_element(name) # Create new element doc.create_text(content) # Create text node doc.create_cdata(content) # Create CDATA section doc.create_comment(content) # Create comment doc.create_processing_instruction(target, content) # Create PI # Document querying doc.xpath(expression) # Find nodes by XPath doc.at_xpath(expression) # Find first node by XPath # Serialization doc.to_xml(options) # Convert to XML string ---- === Element object Elements are the primary structural components of an XML document, representing tags with attributes and content. [source,ruby] ---- # Element properties element.name # Get element name element.name = "new_name" # Set element name element.text # Get text content element.text = "content" # Set text content element.inner_html # Get inner XML content element.inner_html = xml # Set inner XML content # Attributes element[name] # Get attribute value element[name] = value # Set attribute value element.attributes # Get all attributes element.remove_attribute(name) # Remove attribute # Namespace handling element.namespace # Get element's namespace element.namespace = ns # Set element's namespace element.add_namespace(prefix, uri) # Add new namespace element.namespaces # Get all namespace definitions # Node structure element.parent # Get parent node element.children # Get child nodes element.add_child(node) # Add child node element.remove_child(node) # Remove child node element.add_previous_sibling(node) # Add sibling before element.add_next_sibling(node) # Add sibling after element.replace(node) # Replace with another node element.remove # Remove from document # Node type checking element.element? # Returns true element.text? # Returns false element.cdata? # Returns false element.comment? # Returns false element.processing_instruction? # Returns false # Node querying element.xpath(expression) # Find nodes by XPath element.at_xpath(expression) # Find first node by XPath ---- === Text object Text nodes represent character data in the XML document. [source,ruby] ---- # Creating text nodes text = doc.create_text("content") # Text properties text.content # Get text content text.content = "new" # Set text content # Node type checking text.text? # Returns true # Structure text.parent # Get parent node text.remove # Remove from document text.replace(node) # Replace with another node ---- === CDATA object CDATA sections contain text that should not be parsed as markup. [source,ruby] ---- # Creating CDATA sections cdata = doc.create_cdata("<raw>content</raw>") # CDATA properties cdata.content # Get CDATA content cdata.content = "new" # Set CDATA content # Node type checking cdata.cdata? # Returns true # Structure cdata.parent # Get parent node cdata.remove # Remove from document cdata.replace(node) # Replace with another node ---- === Comment object Comments contain human-readable notes in the XML document. [source,ruby] ---- # Creating comments comment = doc.create_comment("Note") # Comment properties comment.content # Get comment content comment.content = "new" # Set comment content # Node type checking comment.comment? # Returns true # Structure comment.parent # Get parent node comment.remove # Remove from document comment.replace(node) # Replace with another node ---- === Processing instruction object Processing instructions provide instructions to applications processing the XML. [source,ruby] ---- # Creating processing instructions pi = doc.create_processing_instruction("xml-stylesheet", 'type="text/xsl" href="style.xsl"') # PI properties pi.target # Get PI target pi.target = "new" # Set PI target pi.content # Get PI content pi.content = "new" # Set PI content # Node type checking pi.processing_instruction? # Returns true # Structure pi.parent # Get parent node pi.remove # Remove from document pi.replace(node) # Replace with another node ---- === Attribute object Attributes represent name-value pairs on elements. [source,ruby] ---- # Attribute properties attr.name # Get attribute name attr.name = "new" # Set attribute name attr.value # Get attribute value attr.value = "new" # Set attribute value # Namespace handling attr.namespace # Get attribute's namespace attr.namespace = ns # Set attribute's namespace # Node type checking attr.attribute? # Returns true ---- === Namespace object Namespaces define XML namespaces used in the document. [source,ruby] ---- # Namespace properties ns.prefix # Get namespace prefix ns.uri # Get namespace URI # Formatting ns.to_s # Format as xmlns declaration # Node type checking ns.namespace? # Returns true ---- === Node traversal and inspection Each node type provides methods for traversing the document structure: [source,ruby] ---- node.parent # Get parent node node.children # Get child nodes node.next_sibling # Get next sibling node.previous_sibling # Get previous sibling node.ancestors # Get all ancestor nodes node.descendants # Get all descendant nodes # Type checking node.element? # Is it an element? node.text? # Is it a text node? node.cdata? # Is it a CDATA section? node.comment? # Is it a comment? node.processing_instruction? # Is it a PI? node.attribute? # Is it an attribute? node.namespace? # Is it a namespace? # Node information node.document # Get owning document node.path # Get XPath to node node.line_number # Get source line number (if available) ---- == Advanced features === XPath querying and node mapping Moxml provides efficient XPath querying by leveraging the native XML library's implementation while maintaining consistent node mapping: [source,ruby] ---- # Find all book elements books = doc.xpath('//book') # Returns Moxml::Element objects mapped to native nodes # Find with namespaces titles = doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/') # Find first matching node first_book = doc.at_xpath('//book') # Chain queries doc.xpath('//book').each do |book| # Each book is a mapped Moxml::Element title = book.at_xpath('.//title') puts "#{book['id']}: #{title.text}" end ---- === Namespace handling [source,ruby] ---- # Add namespace to element element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/') # Create element in namespace title = doc.create_element('dc:title') title.text = 'Document Title' # Query with namespaces doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/') ---- === Accessing native implementation While not typically needed, you can access the underlying XML library's nodes: [source,ruby] ---- # Get native node native_node = element.native # Get adapter being used adapter = element.context.config.adapter # Create from native node element = Moxml::Element.new(native_node, context) ---- == Error handling Moxml provides specific error classes for different types of errors that may occur during XML processing: [source,ruby] ---- begin doc = context.parse(xml_string) rescue Moxml::ParseError => e # Handles XML parsing errors puts "Parse error at line #{e.line}, column #{e.column}" puts "Message: #{e.message}" rescue Moxml::ValidationError => e # Handles XML validation errors puts "Validation error: #{e.message}" rescue Moxml::XPathError => e # Handles XPath expression errors puts "XPath error: #{e.message}" rescue Moxml::Error => e # Handles other Moxml-specific errors puts "Error: #{e.message}" end ---- == Configuration Moxml can be configured globally or per instance: [source,ruby] ---- # Global configuration Moxml.configure do |config| config.default_adapter = :nokogiri config.strict = true config.encoding = 'UTF-8' end # Instance configuration moxml = Moxml.new do |config| config.adapter = :ox config.strict = false end ---- == Thread safety Moxml is thread-safe when used properly. Each instance maintains its own state and can be used safely in concurrent operations: [source,ruby] ---- class XmlProcessor def initialize @mutex = Mutex.new @context = Moxml.new end def process(xml) @mutex.synchronize do doc = @context.parse(xml) # Modify document doc.to_xml end end end ---- == Performance considerations === Memory management Moxml maintains a node registry to ensure consistent object mapping: [source,ruby] ---- doc = context.parse(large_xml) # Process document doc = nil # Allow garbage collection of document and registry GC.start # Force garbage collection if needed ---- === Efficient querying Use specific XPath expressions for better performance: [source,ruby] ---- # More efficient - specific path doc.xpath('//book/title') # Less efficient - requires full document scan doc.xpath('//title') # Most efficient - direct child access root.xpath('./title') ---- == Best practices === Document creation [source,ruby] ---- # Preferred - using builder pattern doc = Moxml.new.build do declaration version: "1.0", encoding: "UTF-8" element 'root' do element 'child' do text 'content' end end end # Alternative - direct manipulation doc = Moxml.new.create_document doc.add_declaration(version: "1.0", encoding: "UTF-8") root = doc.create_element('root') doc.add_child(root) ---- === Node manipulation [source,ruby] ---- # Preferred - chainable operations element .add_namespace('dc', 'http://purl.org/dc/elements/1.1/') .add_child(doc.create_text('content')) # Preferred - clear node type checking if node.element? node.add_child(doc.create_text('content')) end ---- == Contributing 1. Fork the repository 2. Create your feature branch (`git checkout -b feature/my-new-feature`) 3. Commit your changes (`git commit -am 'Add some feature'`) 4. Push to the branch (`git push origin feature/my-new-feature`) 5. Create a new Pull Request == License Copyright (c) 2024 Ribose Inc. This project is licensed under the BSD-2-Clause License. See the LICENSE file for details.