Sha256: d20426874ee6592ff5361c1057c925a362a5922168f4ff0451e9efe429e6b731

Contents?: true

Size: 651 Bytes

Versions: 8

Compression:

Stored size: 651 Bytes

Contents

PlainTextExtractor.new {
  every :html, :htm
  as "text/html"
  aka "HyperText Markup Language document"
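  with {|source|
    # File.encoding is not a core Ruby method; presumably a helper defined elsewhere in Picolena.
    encoding = File.encoding(source)
    # Skip iconv when the encoding is unknown or is already some spelling of UTF-8.
    if encoding.empty? || encoding.gsub(/[^\w]/, '').downcase == "utf8"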
      %x{html2text -nobs "#{source}"}
    else
      %x{html2text -nobs "#{source}" | iconv -f #{encoding} -t utf8}
    end
  }
  which_requires 'html2text', 'iconv'
  which_should_for_example_extract 'zentrum für angewandte forschung an fachhochschulen nachhaltige energietechnik Baden-Württemberg', :from => 'zafh.net.html'
  or_extract 'Málaga', :from => '7.html'
  or_extract 'le monde', :from => 'lemonde.htm'
}
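
The branch inside the with block hinges on whether the detected encoding is empty or some spelling of UTF-8. What follows is a minimal sketch of that check and of the resulting command line in plain Ruby, outside the Picolena DSL. The helper names utf8_or_unknown? and extraction_command are hypothetical and not part of Picolena, and quoting through Shellwords.escape is an added precaution that the original code does not take. The sketch only builds the command string; it does not run html2text or iconv.

```ruby
require 'shellwords'

# Hypothetical helper (not in Picolena): true when no iconv step is needed,
# i.e. the encoding is unknown (empty) or normalizes to "utf8".
def utf8_or_unknown?(encoding)
  encoding.empty? || encoding.gsub(/[^\w]/, '').downcase == 'utf8'
end

# Hypothetical helper (not in Picolena): builds the shell command as a string
# without executing it. File name and encoding are shell-escaped, which is an
# assumption added here and differs from the plain "#{source}" interpolation above.
def extraction_command(source, encoding)
  command = "html2text -nobs #{Shellwords.escape(source)}"
  return command if utf8_or_unknown?(encoding)
  "#{command} | iconv -f #{Shellwords.escape(encoding)} -t utf8"
end

puts extraction_command('lemonde.htm', 'iso-8859-1')
puts extraction_command('7.html', 'UTF-8')
```

Building the string separately from running it makes the branch easy to check without html2text or iconv installed, and escaping the file name avoids surprises with paths that contain spaces or quotes.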

Version data entries

8 entries across 8 versions and 1 rubygem

Version Path
picolena-0.1.2 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.1.3 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.1.4 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.1.5 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.1.6 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.1.7 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.2.0 lib/picolena/templates/lib/plain_text_extractors/html.rb
picolena-0.1.8 lib/picolena/templates/lib/plain_text_extractors/html.rb