RubygemsResearch

Sha256: fa2ae568f00fd99ec04012703abef917e5016fc266900575087099360e35791c

Contents?: true

Size: 1.25 KB

Versions: 2

Compression:

Stored size: 1.25 KB

=begin rdoc
html2text that works with Nokogiri
=end
module WWMD

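  # Tag groups that Page#html2text uses to decide how text is laid out.
  # INLINETAGS is not referenced by html2text; text under any tag outside
  # the other groups is emitted inline by default.
  INLINETAGS =  ['a','abbr','acronym','address','b','bdo','big','cite',
                 'code','del','dfn','em','font','i','ins','kbd','label',
                 'noframes','noscript','q','s','samp','small','span',
                 'strike','strong','sub','sup','td','th','tt','u',
                 'html','body','table']
  # Text under these tags is followed by a newline.
  BLOCKTAGS =   ['blockquote','center','dd','div','fieldset','form',
                 'h1','h2','h3','h4','h5','h6','p','pre','tr','var']
  LISTTAGS =    ['dir','dl','menu','ol','ul']
  # Text under these tags is prefixed with "* " and followed by a newline.
  ITEMTAGS =    ['li','dt']
  # Empty elements that produce output on their own: a newline or a rule.
  SPECIALTAGS = ['br','hr']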

  class Page
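    # Render the scraped document as plain text.
    def html2text
      # Pass 1: collect [tag name, text] pairs in traversal order. Text
      # nodes are keyed by their parent's tag name; <br> and <hr> are
      # recorded with an empty string.
      arr = []
      self.scrape.hdoc.traverse do |x|
        arr << [x.parent.name, x.text] if x.text?
        arr << [x.name, ""] if x.elem? && SPECIALTAGS.include?(x.name)
      end
      # Pass 2: turn each pair into output text.
      ret = ""
      arr.each do |name, str|
        case name
        when "br"
          ret += "\n"
        when "hr"
          ret += "\n" + ("-" * 72) + "\n"
        else
          s = str.strip
          if BLOCKTAGS.include?(name) || LISTTAGS.include?(name)
            s += "\n"
          elsif ITEMTAGS.include?(name)
            s = "* " + s + "\n"
          end
          ret += s
        end
      end
      # Collapse runs of newlines into a single newline.
      ret.gsub(/\n+/, "\n")
    end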
  end
end
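
The second half of html2text (turning [tag name, text] pairs into output) can be tried without Nokogiri or a WWMD::Page. The sketch below is not part of the gem: `render_pairs` is a hypothetical helper name, the pairs are written by hand instead of coming from `hdoc.traverse`, and the tag lists are copied from the constants above.

```ruby
# Hypothetical standalone sketch of the rendering stage of html2text.
# Tag lists are copied from the WWMD constants; INLINETAGS is omitted
# because the rendering logic never consults it.
BLOCKTAGS = %w[blockquote center dd div fieldset form
               h1 h2 h3 h4 h5 h6 p pre tr var]
LISTTAGS  = %w[dir dl menu ol ul]
ITEMTAGS  = %w[li dt]

# pairs: array of [tag_name, text], in document order (hand-built here).
def render_pairs(pairs)
  out = pairs.map do |name, str|
    case name
    when "br" then "\n"
    when "hr" then "\n" + ("-" * 72) + "\n"
    else
      s = str.strip
      if BLOCKTAGS.include?(name) || LISTTAGS.include?(name)
        s + "\n"                 # block text ends its line
      elsif ITEMTAGS.include?(name)
        "* " + s + "\n"          # list items become bullets
      else
        s                        # anything else is emitted inline
      end
    end
  end.join
  out.gsub(/\n+/, "\n")          # collapse runs of newlines
end

pairs = [
  ["h1",   "Title"],
  ["p",    "  First paragraph. "],
  ["br",   ""],
  ["li",   "one"],
  ["li",   "two"],
  ["hr",   ""],
  ["span", "inline"],
  ["b",    "bold"]
]
puts render_pairs(pairs)
```

Because each fragment is stripped before it is appended, adjacent inline fragments run together ("inline" and "bold" come out as "inlinebold"). The same applies to the method above, since it strips and concatenates in the same way.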

Version data entries

2 entries across 2 versions & 1 rubygem

Version	Path
miketracy-wwmd-0.2.11	lib/wwmd/nokogiri_html2text.rb
miketracy-wwmd-0.2.12	lib/wwmd/nokogiri_html2text.rb