Sha256: d20426874ee6592ff5361c1057c925a362a5922168f4ff0451e9efe429e6b731
Contents?: true
Size: 651 Bytes
Versions: 8
Compression:
Stored size: 651 Bytes
Contents
PlainTextExtractor.new {
  every :html, :htm
  as "text/html"
  aka "HyperText Markup Language document"
  with {|source|
    encoding=File.encoding(source)
    if encoding.empty? or encoding.gsub(/[^\w]/,'').downcase=="utf8" then
      %x{html2text -nobs "#{source}"}
    else
      %x{html2text -nobs "#{source}" | iconv -f #{encoding} -t utf8}
    end
  }
  which_requires 'html2text', 'iconv'
  which_should_for_example_extract 'zentrum für angewandte forschung an fachhochschulen nachhaltige energietechnik Baden-Württemberg', :from => 'zafh.net.html'
  or_extract 'Málaga', :from => '7.html'
  or_extract 'le monde', :from => 'lemonde.htm'
}
Version data entries
8 entries across 8 versions & 1 rubygems