Sha256: 959976797e36ad9e4191b1a3a4ae20bd4fdbe003a67ca66dee3fd4332b48ead8
Contents?: true
Size: 667 Bytes
Versions: 1
Compression:
Stored size: 667 Bytes
Contents
PlainTextExtractor.new {
  every :html, :htm
  as "text/html"
  aka "HyperText Markup Language document"
  extract_content_with {|source|
    encoding=File.encoding(source)
    if encoding.empty? or encoding.gsub(/[^\w]/,'').downcase=="utf8" then
      %x{html2text -nobs "#{source}"}
    else
      %x{html2text -nobs "#{source}" | iconv -f #{encoding} -t utf8}
    end
  }
  which_requires 'html2text', 'iconv'
  which_should_for_example_extract 'zentrum für angewandte forschung an fachhochschulen nachhaltige energietechnik Baden-Württemberg', :from => 'zafh.net.html'
  or_extract 'Málaga', :from => '7.html'
  or_extract 'le monde', :from => 'lemonde.htm'
}
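The branch inside `extract_content_with` skips the `iconv` step when the detected encoding is empty or already UTF-8. The sketch below isolates that test so it can be tried on its own. The method name `utf8_or_unknown?` is hypothetical and not part of picolena; only the expression in its body is taken from the extractor.

```ruby
# Hypothetical helper (not part of picolena): reproduces the extractor's
# check for "no encoding detected" or "already UTF-8".
def utf8_or_unknown?(encoding)
  # Strip every non-word character, lowercase, and compare with "utf8",
  # so that "UTF-8" and "utf8" are treated alike.
  encoding.empty? || encoding.gsub(/[^\w]/, '').downcase == 'utf8'
end

puts utf8_or_unknown?('UTF-8')       # prints true  (html2text output used as-is)
puts utf8_or_unknown?('')            # prints true  (nothing detected, no conversion)
puts utf8_or_unknown?('iso-8859-1')  # prints false (output would be piped through iconv)
```

When the check returns false, the extractor above converts the `html2text` output with `iconv -f <encoding> -t utf8`.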
Version data entries
1 entry across 1 version & 1 rubygem
Version | Path |
---|---|
picolena-0.2.2 | lib/picolena/templates/lib/plain_text_extractors/html.rb |