Sha256: 4db13e959bfeee8342ba0346bf63fe8d832aa86e87986c83f858cdcc759952ee

Contents?: true

Size: 1.08 KB

Versions: 3

Stored size: 1.08 KB

Contents

require 'open3'

module HttpSpell
  class SpellChecker
    def initialize(personal_dictionary_path = nil, tracing: false)
      @personal_dictionary_arg = "-p #{personal_dictionary_path}" if personal_dictionary_path
      @tracing = tracing
    end

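    # Converts the HTML document to plain text with pandoc, pipes the result
    # into hunspell, and returns the unique words hunspell reports as misspelled.
    def check(doc, lang)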
      commands = [
        'pandoc --from html --to plain',
        "hunspell -d #{translate(lang)} #{@personal_dictionary_arg} -i UTF-8 -l",
      ]

      if @tracing
        warn "Piping the HTML document into the following chain of commands:"
        warn commands
      end

      Open3.pipeline_rw(*commands) do |stdin, stdout, _wait_thrs|
        stdin.puts(doc)
        stdin.close
        stdout.read.split.uniq
      end
    end

    private

    # The W3C [recommends](https://www.w3.org/International/questions/qa-html-language-declarations)
    # specifying the language with identifiers as per [RFC 5646](https://tools.ietf.org/html/rfc5646),
    # which uses dashes. Hunspell, however, uses underscores. This method translates RFC-style
    # identifiers to hunspell-style ones.
    def translate(lang)
      lang.tr('-', '_')
    end
  end
end
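
The way `check` assembles the hunspell command can be illustrated in isolation. The following is a minimal sketch, not code from the gem: `hunspell_command` is a hypothetical helper name, and quoting the dictionary path with `Shellwords.escape` is an assumption added here so that paths containing spaces survive the shell. The dash-to-underscore translation mirrors `translate` above.

```ruby
require 'shellwords'

# Hypothetical helper (not part of httpspell): builds the hunspell command
# string the same way SpellChecker#check does, without running anything.
def hunspell_command(lang, personal_dictionary_path = nil)
  # RFC 5646 identifiers use dashes; hunspell dictionary names use underscores.
  parts = ['hunspell', '-d', lang.tr('-', '_')]

  # Assumption: escape the path so spaces do not split it into two arguments.
  parts += ['-p', Shellwords.escape(personal_dictionary_path)] if personal_dictionary_path

  # -i sets the input encoding, -l lists only the misspelled words.
  parts += ['-i', 'UTF-8', '-l']
  parts.join(' ')
end

puts hunspell_command('en-US')
puts hunspell_command('de-DE', 'my words.dic')
```

Building the command from an array and joining it also avoids the doubled space that string interpolation leaves behind when no personal dictionary is given.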

Version data entries

3 entries across 3 versions & 1 rubygems

Version Path
httpspell-1.3.0 lib/httpspell/spellchecker.rb
httpspell-1.2.1 lib/httpspell/spellchecker.rb
httpspell-1.2.0 lib/httpspell/spellchecker.rb