Sha256: 959976797e36ad9e4191b1a3a4ae20bd4fdbe003a67ca66dee3fd4332b48ead8

Contents?: true

Size: 667 Bytes

Versions: 1

Compression:

Stored size: 667 Bytes

Contents

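require 'shellwords'

# Plain-text extractor for HTML documents: html2text renders the markup as text,
# and iconv converts the result to UTF-8 when the source uses another encoding.
PlainTextExtractor.new {
  every :html, :htm
  as "text/html"
  aka "HyperText Markup Language document"
  extract_content_with {|source|
    encoding = File.encoding(source)
    # An unknown (empty) or UTF-8 encoding ("UTF-8", "utf8", ...) needs no conversion.
    if encoding.empty? || encoding.gsub(/\W/, '').downcase == 'utf8'
      # -nobs: do not use backspaces for bold/underline in the output.
      # Shell-escape the path so spaces or quotes cannot break the command.
      %x{html2text -nobs #{Shellwords.escape(source)}}
    else
      %x{html2text -nobs #{Shellwords.escape(source)} | iconv -f #{Shellwords.escape(encoding)} -t utf8}
    end
  }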
  which_requires 'html2text', 'iconv'
  which_should_for_example_extract 'zentrum für angewandte forschung an fachhochschulen nachhaltige energietechnik Baden-Württemberg', :from => 'zafh.net.html'
  or_extract 'Málaga', :from => '7.html'
  or_extract 'le monde', :from => 'lemonde.htm'
}
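The branch inside extract_content_with can be isolated as plain Ruby so it can be checked without html2text or iconv installed. This is a minimal sketch, assuming the path and the detected encoding are ordinary strings; the helper names utf8_or_unknown? and html2text_command are hypothetical and not part of picolena. Shellwords.escape (Ruby standard library) is used so that a path containing spaces or quotes reaches html2text as a single argument.

```ruby
require 'shellwords'

# Hypothetical helper mirroring the condition in extract_content_with:
# true when no encoding was detected, or when it is some spelling of
# UTF-8 ("UTF-8", "utf8", "utf-8", ...) once non-word characters are dropped.
def utf8_or_unknown?(encoding)
  encoding.empty? || encoding.gsub(/\W/, '').downcase == 'utf8'
end

# Hypothetical helper: returns the shell pipeline the extractor would run.
# Only non-UTF-8 input is piped through iconv; both the path and the
# encoding are shell-escaped before being interpolated.
def html2text_command(source, encoding)
  command = "html2text -nobs #{Shellwords.escape(source)}"
  return command if utf8_or_unknown?(encoding)

  "#{command} | iconv -f #{Shellwords.escape(encoding)} -t utf8"
end

puts html2text_command('zafh.net.html', 'UTF-8')
# prints: html2text -nobs zafh.net.html
puts html2text_command('my page.htm', 'ISO-8859-1')
# prints: html2text -nobs my\ page.htm | iconv -f ISO-8859-1 -t utf8
```

Keeping the command construction in a pure function like this makes both the UTF-8 short-circuit and the escaping easy to test in isolation; the extractor itself then only has to run the resulting string.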

Version data entries

1 entry across 1 version & 1 rubygem

Version Path
picolena-0.2.2 lib/picolena/templates/lib/plain_text_extractors/html.rb