Sha256: 882b21388be67c92cfca6f71791b23664067a982759db871ea89bb3e43697139

Contents?: true

Size: 320 Bytes

Versions: 1

Compression:

Stored size: 320 Bytes

Contents

# Keeps only those content blocks which contain at least min_words words.
# Blocks already marked as non-content are skipped and left unchanged.

module Boilerpipe::Filters
  class MinWordsFilter

    def self.process(min_words, doc)
      doc.text_blocks.each do |tb|
        next if tb.is_not_content?
        tb.content = false if tb.num_words < min_words
      end
      doc
    end

  end
end
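
A minimal usage sketch of the filter above. The `TextBlock` and `Document` structs here are hypothetical stand-ins, not the gem's real classes. They model only the methods the filter calls (`text_blocks`, `is_not_content?`, `num_words`, `content=`), and the whitespace-based word count is an assumption. The filter body is repeated so the example runs on its own.

```ruby
# Hypothetical stand-ins for the gem's text block and document classes.
# Only the methods used by the filter are modeled.
TextBlock = Struct.new(:text, :content) do
  # Assumption: words are whitespace-separated tokens.
  def num_words
    text.split.size
  end

  def is_not_content?
    !content
  end
end

Document = Struct.new(:text_blocks)

# Copy of the filter shown above, with the modules nested so that
# Boilerpipe does not need to be defined beforehand.
module Boilerpipe
  module Filters
    class MinWordsFilter
      def self.process(min_words, doc)
        doc.text_blocks.each do |tb|
          next if tb.is_not_content?
          tb.content = false if tb.num_words < min_words
        end
        doc
      end
    end
  end
end

doc = Document.new([
  TextBlock.new("Home", true),
  TextBlock.new("This paragraph has more than five words in it.", true),
  TextBlock.new("Footer navigation links that were already rejected", false)
])

# Demote every content block with fewer than 5 words.
Boilerpipe::Filters::MinWordsFilter.process(5, doc)

kept = doc.text_blocks.select(&:content).map(&:text)
puts kept
```

Note that the filter mutates the blocks in place and returns the same document object, so the return value is only useful for chaining with other filters.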

Version data entries

1 entry across 1 version and 1 rubygem

Version: boilerpipe-ruby-0.4.0

Path: lib/boilerpipe/filters/min_words_filter.rb