Sha256: e78381a707f33d1f879e43678c94c540eb301142962e46aad61bbe5fece1b5b8

Size: 318 Bytes

Versions: 5

Stored size: 318 Bytes

Contents

# Keeps only those content blocks that contain at least min_words words;
# shorter content blocks are relabeled as non-content.

module Boilerpipe::Filters
  class MinWordsFilter
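    # Mutates doc in place and returns it so calls can be chained.
    def self.process(min_words, doc)
      doc.text_blocks.each do |tb|
        # Blocks already marked as non-content are left untouched.
        next if tb.is_not_content?

        # Demote content blocks that fall below the word threshold.
        tb.content = false if tb.num_words < min_words
      end
      doc
    end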
  end
end
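
The following is a minimal, self-contained sketch of how the filter behaves, not taken from the gem itself. `FakeBlock` and `FakeDoc` are hypothetical stand-ins that model only the members the filter touches (`text_blocks`, `num_words`, `is_not_content?`, `content=`); counting words by splitting on whitespace is also an assumption. The filter body is repeated so the snippet runs without the gem installed.

```ruby
# Hypothetical stand-in for a text block: only what the filter needs.
FakeBlock = Struct.new(:text, :content) do
  # Assumption: a "word" is any whitespace-separated token.
  def num_words
    text.split.size
  end

  def is_not_content?
    !content
  end
end

# Hypothetical stand-in for a document holding text blocks.
FakeDoc = Struct.new(:text_blocks)

module Boilerpipe
  module Filters
    # Same logic as the listing above, repeated to keep the sketch runnable.
    class MinWordsFilter
      def self.process(min_words, doc)
        doc.text_blocks.each do |tb|
          next if tb.is_not_content?

          tb.content = false if tb.num_words < min_words
        end
        doc
      end
    end
  end
end

doc = FakeDoc.new([
  FakeBlock.new('Home', true),                          # 1 word, demoted
  FakeBlock.new('This paragraph has five words', true), # 5 words, kept
  FakeBlock.new('Ad banner', false),                    # already non-content
  FakeBlock.new('Exactly three words', true)            # 3 words, kept
])

result = Boilerpipe::Filters::MinWordsFilter.process(3, doc)
kept = result.text_blocks.select(&:content).map(&:text)
puts kept
```

Note that the comparison is strict (`<`), so a block with exactly `min_words` words keeps its content flag, and blocks that were already non-content are never promoted.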

Version data entries

5 entries across 5 versions and 1 rubygem

Version Path
boilerpipe-ruby-0.5.0 lib/boilerpipe/filters/min_words_filter.rb
boilerpipe-ruby-0.4.4 lib/boilerpipe/filters/min_words_filter.rb
boilerpipe-ruby-0.4.3 lib/boilerpipe/filters/min_words_filter.rb
boilerpipe-ruby-0.4.2 lib/boilerpipe/filters/min_words_filter.rb
boilerpipe-ruby-0.4.1 lib/boilerpipe/filters/min_words_filter.rb