require 'ndr_support/safe_file'
require 'ndr_support/utf8_encoding'

module NdrImport
  module Helpers
    module File
      # This mixin adds XML functionality to unified importers.
      module Xml
        include UTF8Encoding

        private

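        # Reads the file at `path` via SafeFile, coerces its contents to UTF-8,
        # and parses them with Nokogiri, raising if libxml reports fatal errors.
        def read_xml_file(path)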
          file_data = SafeFile.new(path).read

          require 'nokogiri'

          Nokogiri::XML(ensure_utf8!(file_data)).tap do |doc|
            doc.encoding = 'UTF-8'
            emulate_strict_mode_fatal_check!(doc)
          end
        end

        # Nokogiri can pass a `STRICT` parse option to libxml, but our friendly
        # handling of muddled encodings causes XML explicitly declared as something
        # other than UTF-8 to fail (because it has been recoded to UTF-8 by the
        # time it is given to Nokogiri / libxml).
        # Instead, this raises a Nokogiri::XML::SyntaxError if strict mode would
        # have found any other (fatal) issues with the document.
        def emulate_strict_mode_fatal_check!(document)
          # Tolerate libxml's complaint that XML declared as one of our
          # AUTO_ENCODINGS has UTF-8 content; that is expected after recoding:
          encoding_pattern = AUTO_ENCODINGS.map { |name| Regexp.escape(name) }.join('|')
          encoding_warning = /Document labelled (#{encoding_pattern}) but has UTF-8 content\z/
          fatal_errors     = document.errors.select do |error|
            error.fatal? && (encoding_warning !~ error.message)
          end

          return unless fatal_errors.any?
          raise Nokogiri::XML::SyntaxError, <<~MSG
            The file had #{fatal_errors.length} fatal error(s)!
            #{fatal_errors.join("\n")}
          MSG
        end
      end
    end
  end
end
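The filtering step in `emulate_strict_mode_fatal_check!` can be exercised without libxml. The sketch below is a minimal, standalone illustration, assuming hypothetical stand-ins: `FakeError` plays the role of a `Nokogiri::XML::SyntaxError` (only `message`, `fatal?` and `to_s`), and `SAMPLE_AUTO_ENCODINGS` is an illustrative list, not the real `UTF8Encoding::AUTO_ENCODINGS`. The helper names `unexpected_fatal_errors` and `fatal_error_summary` are likewise hypothetical.

```ruby
# Hypothetical stand-in for Nokogiri::XML::SyntaxError: a message plus a
# fatal flag, so the filtering logic can run without Nokogiri/libxml.
FakeError = Struct.new(:message, :fatal) do
  def fatal?
    fatal
  end

  def to_s
    message
  end
end

# Illustrative values only; the mixin uses UTF8Encoding::AUTO_ENCODINGS.
SAMPLE_AUTO_ENCODINGS = %w[UTF-16 windows-1252].freeze

# Keeps fatal errors, except the tolerated
# "Document labelled <auto encoding> but has UTF-8 content" complaint.
def unexpected_fatal_errors(errors, auto_encodings = SAMPLE_AUTO_ENCODINGS)
  encoding_pattern = auto_encodings.map { |name| Regexp.escape(name) }.join('|')
  encoding_warning = /Document labelled (#{encoding_pattern}) but has UTF-8 content\z/

  errors.select { |error| error.fatal? && encoding_warning !~ error.message }
end

# Builds a message in the same shape as the one the mixin raises.
def fatal_error_summary(fatal_errors)
  <<~MSG
    The file had #{fatal_errors.length} fatal error(s)!
    #{fatal_errors.join("\n")}
  MSG
end

errors = [
  FakeError.new('Document labelled windows-1252 but has UTF-8 content', true),
  FakeError.new('Opening and ending tag mismatch: a line 1 and b', true),
  FakeError.new('Some recoverable warning', false)
]

remaining = unexpected_fatal_errors(errors)
puts remaining.length # prints 1
puts fatal_error_summary(remaining)
```

Only the second error survives: the first matches the tolerated encoding pattern and the third is not fatal. An encoding outside the list (say `ISO-8859-1`) would still be reported, which mirrors how the mixin only excuses encodings it recodes itself.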

ndr_import-5.0.0 lib/ndr_import/helpers/file/xml.rb