= Moxml: Modular XML processing for Ruby

Moxml provides a unified API for XML processing in Ruby, supporting multiple
XML parsing backends (Nokogiri, Ox, and Oga).

Moxml ("mox-em-el") stands for "Modular XML" and aims to provide a consistent
interface for working with XML documents, regardless of the underlying XML
library.

== Installation

Add the gem to your application's Gemfile:

[source,ruby]
----
gem 'moxml'
----

== Basic usage

=== Configuration

Configure Moxml to use your preferred XML backend:

[source,ruby]
----
require 'moxml'

Moxml.configure do |config|
  config.backend = :nokogiri # or :ox, :oga
end
----

=== Creating and parsing documents

[source,ruby]
----
# Create new empty document
doc = Moxml::Document.new

# Parse from string
doc = Moxml::Document.parse("<root><child>content</child></root>")

# Parse with encoding
doc = Moxml::Document.parse(xml_string, encoding: 'UTF-8')
----

=== Document creation patterns

[source,ruby]
----
# Method 1: Create and build
doc = Moxml::Document.new
root = doc.create_element('root')
doc.add_child(root)

# Method 2: Parse from string
doc = Moxml::Document.parse("<root/>")

# Method 3: Parse with encoding
doc = Moxml::Document.parse(xml_string, encoding: 'UTF-8')

# Method 4: Parse with options
doc = Moxml::Document.parse(xml_string, {
  encoding: 'UTF-8',
  strict: true
})
----

=== Common XML patterns

[source,ruby]
----
# Working with namespaces
doc = Moxml::Document.new
root = doc.create_element('root')
root['xmlns:custom'] = 'http://example.com/ns'
child = doc.create_element('custom:element')
root.add_child(child)

# Creating structured data
person = doc.create_element('person')
person['id'] = '123'
name = doc.create_element('name')
name.add_child(doc.create_text('John Doe'))
person.add_child(name)

# Working with attributes
element = doc.create_element('div')
element['class'] = 'container'
element['data-id'] = '123'
element['style'] = 'color: blue'

# Handling special characters
text = doc.create_text('Special chars: < > & " \'')
cdata = doc.create_cdata('<script>alert("Hello!");</script>')

# Processing instructions
pi = doc.create_processing_instruction(
  'xml-stylesheet',
  'type="text/xsl" href="style.xsl"'
)
doc.add_child(pi)
----

=== Working with elements

[source,ruby]
----
# Create new element
element = Moxml::Element.new('tagname')

# Add attributes
element['class'] = 'content'

# Access attributes
class_attr = element['class']

# Add child elements
child = element.create_element('child')
element.add_child(child)

# Access text content
text_content = element.text

# Add text content
text = element.create_text('content')
element.add_child(text)

# Chaining operations (assumes add_child returns a node that can be chained)
element
  .add_child(doc.create_element('child'))
  .add_child(doc.create_text('content'))
# Attribute assignment cannot continue a method chain, so it is a separate statement
element['class'] = 'new-class'

# Complex element creation
div = doc.create_element('div')
div['class'] = 'container'
div.add_child(doc.create_element('span'))
   .add_child(doc.create_text('Hello'))
div.add_child(doc.create_element('br'))
div.add_child(doc.create_text('World'))
----

=== Working with different node types

[source,ruby]
----
# Text nodes with various content
plain_text = Moxml::Text.new("Simple text")
multiline_text = Moxml::Text.new("Line 1\nLine 2")
special_chars = Moxml::Text.new("Special: & < > \" '")

# CDATA sections for different content types
script_cdata = Moxml::Cdata.new("function() { alert('Hello!'); }")
xml_cdata = Moxml::Cdata.new("<data><item>value</item></data>")
mixed_cdata = Moxml::Cdata.new("Text with ]]> characters")

# Comments for documentation
# (XML comments must not contain "--", so the separator uses "=")
todo_comment = Moxml::Comment.new("TODO: Add validation")
section_comment = Moxml::Comment.new("===== Section Break =====")
debug_comment = Moxml::Comment.new("DEBUG: Remove in production")

# Processing instructions for various uses
# (PI content must not contain "?>")
style_pi = Moxml::ProcessingInstruction.new(
  "xml-stylesheet",
  'type="text/css" href="style.css"'
)
php_pi = Moxml::ProcessingInstruction.new(
  "php",
  'echo $var;'
)
custom_pi = Moxml::ProcessingInstruction.new(
  "custom-processor",
  'param1="value1" param2="value2"'
)
----

=== Element manipulation examples

[source,ruby]
----
# Building complex structures
doc = Moxml::Document.new
root = doc.create_element('html')
doc.add_child(root)

# Create head section
head = doc.create_element('head')
root.add_child(head)

title = doc.create_element('title')
title.add_child(doc.create_text('Example Page'))
head.add_child(title)

meta = doc.create_element('meta')
meta['charset'] = 'UTF-8'
head.add_child(meta)

# Create body section
body = doc.create_element('body')
root.add_child(body)

div = doc.create_element('div')
div['class'] = 'container'
body.add_child(div)

# Add multiple paragraphs
3.times do |i|
  p = doc.create_element('p')
  p.add_child(doc.create_text("Paragraph #{i + 1}"))
  div.add_child(p)
end

# Working with lists
ul = doc.create_element('ul')
div.add_child(ul)

['Item 1', 'Item 2', 'Item 3'].each do |text|
  li = doc.create_element('li')
  li.add_child(doc.create_text(text))
  ul.add_child(li)
end

# Adding link element
a = doc.create_element('a')
a['href'] = 'https://example.com'
a.add_child(doc.create_text('Visit Example'))
div.add_child(a)
----

=== Advanced node manipulation

[source,ruby]
----
# Cloning nodes
original = doc.create_element('div')
original['id'] = 'original'
clone = original.clone

# Moving nodes
target = doc.create_element('target')
source = doc.create_element('source')
source.add_child(doc.create_text('Content'))
target.add_child(source)

# Replacing nodes
old_node = doc.at_xpath('//old')
new_node = doc.create_element('new')
old_node.replace(new_node)

# Inserting before/after
reference = doc.create_element('reference')
before = doc.create_element('before')
after = doc.create_element('after')
reference.add_previous_sibling(before)
reference.add_next_sibling(after)

# Conditional manipulation
element = doc.at_xpath('//conditional')
if element['flag'] == 'true'
  element.add_child(doc.create_text('Flag is true'))
else
  element.remove
end
----

=== Working with namespaces

[source,ruby]
----
# Creating namespaced document
doc = Moxml::Document.new
root = doc.create_element('root')
root['xmlns'] = 'http://example.com/default'
root['xmlns:custom'] = 'http://example.com/custom'
doc.add_child(root)

# Adding namespaced elements
default_elem = doc.create_element('default-elem')
custom_elem = doc.create_element('custom:elem')
root.add_child(default_elem)
root.add_child(custom_elem)

# Working with attributes in namespaces
custom_elem['custom:attr'] = 'value'

# Accessing namespaced content
ns_elem = doc.at_xpath('//custom:elem')
ns_attr = ns_elem['custom:attr']
----

=== Document serialization examples

[source,ruby]
----
# Basic serialization
xml_string = doc.to_xml

# Pretty printing with indentation
formatted_xml = doc.to_xml(
  indent: 2,
  pretty: true
)

# Controlling XML declaration
with_declaration = doc.to_xml(
  xml_declaration: true,
  encoding: 'UTF-8',
  standalone: 'yes'
)

# Compact output
minimal_xml = doc.to_xml(
  indent: 0,
  pretty: false,
  xml_declaration: false
)

# Custom formatting
custom_format = doc.to_xml(
  indent: 4,
  encoding: 'ISO-8859-1',
  xml_declaration: true
)
----

== Implementation details

=== Memory management

[source,ruby]
----
# Efficient document handling
doc = Moxml::Document.parse(large_xml)
begin
  # Process document
  result = process_document(doc)
ensure
  # Clear references
  doc = nil
  GC.start
end

# Streaming large node sets
doc.xpath('//large-set/*').each do |node|
  # Process node
  process_node(node)
  # Clear reference
  node = nil
end

# Handling large collections
def process_large_nodeset(nodeset)
  nodeset.each do |node|
    yield node if block_given?
  end
ensure
  # Clear references
  nodeset = nil
  GC.start
end
----

=== Backend-specific optimizations

[source,ruby]
----
# Nokogiri-specific optimizations
if Moxml.config.backend == :nokogiri
  # Use native CSS selectors
  nodes = doc.native.css('complex > selector')
  nodes.each do |native_node|
    node = Moxml::Node.wrap(native_node)
    # Process node
  end

  # Use native XPath
  results = doc.native.xpath('//complex/xpath/expression')
end

# Ox-specific optimizations
if Moxml.config.backend == :ox
  # Use native parsing options
  doc = Moxml::Document.parse(xml, {
    mode: :generic,
    effort: :tolerant,
    smart: true
  })

  # Direct element creation
  element = Ox::Element.new('name')
  wrapped = Moxml::Element.new(element)
end

# Oga-specific optimizations
if Moxml.config.backend == :oga
  # Use native parsing features
  doc = Moxml::Document.parse(xml, {
    encoding: 'UTF-8',
    strict: true
  })

  # Direct access to native methods
  nodes = doc.native.xpath('//element')
end
----

=== Threading patterns

[source,ruby]
----
# Thread-safe document creation
require 'thread'

class ThreadSafeXmlProcessor
  def initialize
    @mutex = Mutex.new
  end

  def process_document(xml_string)
    @mutex.synchronize do
      doc = Moxml::Document.parse(xml_string)
      # Process document
      result = doc.to_xml
      doc = nil
      result
    end
  end
end

# Parallel document processing
def process_documents(xml_strings)
  threads = xml_strings.map do |xml|
    Thread.new do
      doc = Moxml::Document.parse(xml)
      # Process document
      doc = nil
    end
  end
  threads.each(&:join)
end

# Thread-local document storage
Thread.new do
  Thread.current[:document] = Moxml::Document.new
  # Process document
ensure
  Thread.current[:document] = nil
end
----

== Troubleshooting

=== Common issues and solutions

==== Parsing errors

[source,ruby]
----
# Handle malformed XML (cleanup_xml is a user-supplied helper)
attempts = 0
begin
  doc = Moxml::Document.parse(xml_string)
rescue Moxml::ParseError => e
  puts "Parse error at line #{e.line}, column #{e.column}: #{e.message}"
  # Attempt recovery once; re-raise if the cleaned input still fails
  attempts += 1
  raise if attempts > 1
  xml_string = cleanup_xml(xml_string)
  retry
end

# Handle encoding issues (detect_encoding is a user-supplied helper)
encoding = 'UTF-8'
begin
  doc = Moxml::Document.parse(xml_string, encoding: encoding)
rescue Moxml::ParseError => e
  if e.message =~ /encoding/
    # Retry once with the detected encoding, if it differs
    detected_encoding = detect_encoding(xml_string)
    if detected_encoding && detected_encoding != encoding
      encoding = detected_encoding
      retry
    end
  end
  raise
end
----

==== Memory issues

[source,ruby]
----
# Handle large documents
def process_large_document(path)
  # Parse the file, then process it chunk by chunk
  File.open(path) do |file|
    doc = Moxml::Document.parse(file)
    doc.xpath('//chunk').each do |chunk|
      process_chunk(chunk)
      chunk = nil
    end
    doc = nil
  end
  GC.start
end

# Monitor memory usage
require 'get_process_mem'

def memory_safe_processing(xml)
  memory = GetProcessMem.new
  initial_memory = memory.mb

  doc = Moxml::Document.parse(xml)
  result = process_document(doc)
  doc = nil
  GC.start

  final_memory = memory.mb
  puts "Memory usage: #{final_memory - initial_memory}MB"

  result
end
----

==== Backend-specific issues

[source,ruby]
----
# Handle backend limitations
def safe_xpath(doc, xpath)
  case Moxml.config.backend
  when :nokogiri
    doc.xpath(xpath)
  when :ox
    # Ox has limited XPath support
    fallback_xpath_search(doc, xpath)
  when :oga
    # Handle Oga-specific XPath syntax
    modified_xpath = adjust_xpath_for_oga(xpath)
    doc.xpath(modified_xpath)
  end
end

# Handle backend switching
def with_backend(backend)
  original_backend = Moxml.config.backend
  Moxml.config.backend = backend
  yield
ensure
  Moxml.config.backend = original_backend
end
----

=== Performance optimization

==== Document creation

[source,ruby]
----
# Efficient document building
def build_large_document
  doc = Moxml::Document.new
  root = doc.create_element('root')
  doc.add_child(root)

  # Pre-allocate elements
  elements = Array.new(1000) do |i|
    elem = doc.create_element('item')
    elem['id'] = i.to_s
    elem
  end

  # Batch add elements
  elements.each do |elem|
    root.add_child(elem)
  end

  doc
end

# Memory-efficient processing
def process_large_xml(xml_string)
  result = []

  doc = Moxml::Document.parse(xml_string)
  doc.xpath('//item').each do |item|
    # Process and immediately discard
    result << process_item(item)
    item = nil
  end

  doc = nil
  GC.start

  result
end
----

==== Query optimization

[source,ruby]
----
# Optimize node selection
def efficient_node_selection(doc)
  # Cache frequently used nodes
  @header_nodes ||= doc.xpath('//header').to_a

  # Use specific selectors
  doc.xpath('//specific/path') # Better than '//*[name()="specific"]'

  # Combine queries when possible
  doc.xpath('//a | //b') # Better than two separate queries
end

# Optimize attribute access
def efficient_attribute_handling(element)
  # Cache attribute values
  @cached_attrs ||= element.attributes

  # Direct attribute access
  value = element['attr'] # Better than element.attributes['attr']

  # Batch attribute updates
  attrs = { 'id' => '1', 'class' => 'new', 'data' => 'value' }
  attrs.each { |k, v| element[k] = v }
end
----

==== Serialization optimization

[source,ruby]
----
# Efficient output generation
def optimized_serialization(doc)
  # Minimal output
  compact = doc.to_xml(
    indent: 0,
    pretty: false,
    xml_declaration: false
  )

  # Balanced formatting
  readable = doc.to_xml(
    indent: 2,
    pretty: true,
    xml_declaration: true
  )

  # Stream large documents
  File.open('large.xml', 'w') do |file|
    doc.write_to(file, indent: 2)
  end
end
----

=== Debugging tips

==== Inspection helpers

[source,ruby]
----
# Debug node structure
def inspect_node(node, level = 0)
  indent = "  " * level
  puts "#{indent}#{node.class.name}: #{node.name}"

  if node.respond_to?(:attributes)
    node.attributes.each do |name, attr|
      puts "#{indent}  @#{name}=#{attr.value.inspect}"
    end
  end

  if node.respond_to?(:children)
    node.children.each { |child| inspect_node(child, level + 1) }
  end
end

# Track node operations
# (skeleton: increment the counters from your own instrumentation)
def debug_node_operations
  nodes_created = 0
  nodes_removed = 0

  yield
ensure
  puts "Nodes created: #{nodes_created}"
  puts "Nodes removed: #{nodes_removed}"
end
----

==== Backend validation

[source,ruby]
----
# Verify backend behavior
def verify_backend_compatibility
  doc = Moxml::Document.new

  # Test basic operations
  element = doc.create_element('test')
  doc.add_child(element)

  # Verify node handling
  raise "Node creation failed" unless doc.root
  raise "Node type wrong" unless doc.root.is_a?(Moxml::Element)

  # Verify serialization
  xml = doc.to_xml
  raise "Serialization failed" unless xml.include?('<test/>')

  puts "Backend verification successful"
rescue => e
  puts "Backend verification failed: #{e.message}"
end
----

== Error handling

Moxml provides unified error handling:

* `Moxml::Error` - Base error class
* `Moxml::ParseError` - XML parsing errors
* `Moxml::ArgumentError` - Invalid argument errors

=== Error handling patterns

[source,ruby]
----
# Handle parsing errors
begin
  doc = Moxml::Document.parse(xml_string)
rescue Moxml::ParseError => e
  logger.error "Parse error: #{e.message}"
  logger.error "At line #{e.line}, column #{e.column}"
  raise
end

# Handle invalid operations
begin
  element['invalid/name'] = 'value'
rescue Moxml::ArgumentError => e
  logger.warn "Invalid operation: #{e.message}"
  # Use alternative approach
end

# Custom error handling
class XmlProcessor
  def process(xml)
    doc = Moxml::Document.parse(xml)
    yield doc
  rescue Moxml::Error => e
    handle_moxml_error(e)
  rescue StandardError => e
    handle_standard_error(e)
  ensure
    doc = nil
  end
end
----

== Contributing

Bug reports and pull requests are welcome on GitHub at
https://github.com/lutaml/moxml.

=== Development guidelines

* Follow the Ruby style guide
* Add tests for new features
* Update documentation
* Ensure backwards compatibility
* Consider performance implications
* Test with all supported backends

== Copyright and license

Copyright Ribose.

The gem is available as open source under the terms of the BSD-2-Clause
License.