Sha256: 674d78d95418d2707775c34565a14e61ad86bf6581d47bd3a1a6f1f8a74185f0

Contents?: true

Size: 338 Bytes

Versions: 2

Compression:

Stored size: 338 Bytes

Contents

module Boilerpipe::SAX
  class Preprocessor
    def self.strip(text)
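      # Work around a script-handling bug: delete <script>...</script> blocks.
      # The /m flag lets `.` match newlines, so multi-line scripts are removed too.
      text = text.gsub(/<script.+?<\/script>/im, '')
      # Nokogiri uses libxml2 on MRI and NekoHTML on JRuby.
      # On MRI, an "&nbsp" missing its semicolon is not removed, so append
      # the semicolon when "&nbsp" is followed by a space.
      text.gsub(/(&nbsp) /, '\1; ')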
    end
  end
end
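
A minimal usage sketch of the preprocessor listed above. The class body is repeated with nested `module` declarations so the snippet runs without the gem installed. The sample HTML string and the `html` and `cleaned` variable names are illustrative assumptions, not part of the gem.

```ruby
# Standalone copy of the listed preprocessor (nested modules so that
# Boilerpipe does not need to be defined beforehand).
module Boilerpipe
  module SAX
    class Preprocessor
      def self.strip(text)
        # Drop script blocks, including ones that span several lines.
        text = text.gsub(/<script.+?<\/script>/im, '')
        # Add the missing semicolon to "&nbsp" when a space follows it.
        text.gsub(/(&nbsp) /, '\1; ')
      end
    end
  end
end

# Hypothetical input: one unterminated entity and one multi-line script.
html = "<p>Hello&nbsp world</p>\n" \
       "<script type=\"text/javascript\">\nvar x = 1;\n</script>\n" \
       "<p>Bye</p>"

cleaned = Boilerpipe::SAX::Preprocessor.strip(html)
puts cleaned
```

Note that the second substitution only fires when `&nbsp` is immediately followed by a space. An already well-formed `&nbsp;` is left unchanged, and so is an unterminated `&nbsp` followed by any other character.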

Version data entries

2 entries across 2 versions & 1 rubygem

Version Path
boilerpipe-ruby-0.5.0 lib/boilerpipe/sax/preprocessor.rb
boilerpipe-ruby-0.4.4 lib/boilerpipe/sax/preprocessor.rb