Sha256: 01e5230c53c9fb3ad9bf85406b9b444696a1563904eef83d5fefb746f2657358

Contents?: true

Size: 478 Bytes

Versions: 5

Compression:

Stored size: 478 Bytes

Contents

# Marks trailing headline TextBlocks (those labeled :HEADING) as
# boilerplate. Trailing means they are marked as content and come
# after every other content block.

module Boilerpipe::Filters
  class TrailingHeadlineToBoilerplateFilter
    def self.process(doc)
      # Walk from the last block backwards so only trailing headlines are hit.
      doc.text_blocks.reverse_each do |tb|
        next unless tb.is_content?

        if tb.has_label? :HEADING
          tb.content = false
        else
          break
        end
      end

      doc
    end
  end
end
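
The header comment's notion of "trailing" can be illustrated with a small stand-alone sketch. It assumes that trailing headlines are found by walking the blocks from the end of the document. `StubBlock`, `StubDoc`, and `demote_trailing_headlines` are hypothetical stand-ins written for this example, not the gem's own classes; they model only the three methods the filter calls (`is_content?`, `has_label?`, and `content=`).

```ruby
# Hypothetical stand-ins for the gem's document and text block classes;
# only the methods the filter relies on are modeled here.
StubBlock = Struct.new(:text, :content, :labels) do
  def is_content?
    content
  end

  def has_label?(label)
    labels.include?(label)
  end
end

StubDoc = Struct.new(:text_blocks)

# Walks the blocks from the end of the document and demotes content
# headings until it reaches a content block that is not a heading.
def demote_trailing_headlines(doc)
  doc.text_blocks.reverse_each do |tb|
    next unless tb.is_content?            # skip blocks already marked boilerplate
    break unless tb.has_label?(:HEADING)  # real content reached, stop

    tb.content = false
  end
  doc
end

doc = StubDoc.new([
  StubBlock.new('Intro heading', true, [:HEADING]),
  StubBlock.new('Body paragraph', true, []),
  StubBlock.new('Footer links', false, []),
  StubBlock.new('Related articles', true, [:HEADING])
])

demote_trailing_headlines(doc)
kept = doc.text_blocks.select(&:is_content?).map(&:text)
puts kept.inspect
```

In this sketch only the last heading is demoted: the boilerplate block before it is skipped, and the walk stops at the body paragraph, so the heading at the top of the document stays marked as content. With the gem loaded, the equivalent call would be `Boilerpipe::Filters::TrailingHeadlineToBoilerplateFilter.process(doc)` on a real document object.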

Version data entries

5 entries across 5 versions & 1 rubygem

Version Path
boilerpipe-ruby-0.5.0 lib/boilerpipe/filters/trailing_headline_to_boilerplate_filter.rb
boilerpipe-ruby-0.4.4 lib/boilerpipe/filters/trailing_headline_to_boilerplate_filter.rb
boilerpipe-ruby-0.4.3 lib/boilerpipe/filters/trailing_headline_to_boilerplate_filter.rb
boilerpipe-ruby-0.4.2 lib/boilerpipe/filters/trailing_headline_to_boilerplate_filter.rb
boilerpipe-ruby-0.4.1 lib/boilerpipe/filters/trailing_headline_to_boilerplate_filter.rb