Sha256: e78381a707f33d1f879e43678c94c540eb301142962e46aad61bbe5fece1b5b8
Size: 318 Bytes
Versions: 5
Compression:
Stored size: 318 Bytes
Contents
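# Keeps only those content blocks which contain at least min_words words;
# content blocks with fewer words are marked as non-content.
module Boilerpipe::Filters
  class MinWordsFilter
    def self.process(min_words, doc)
      doc.text_blocks.each do |tb|
        # Blocks already marked as non-content are left untouched.
        next if tb.is_not_content?
        # Demote content blocks that fall below the word threshold.
        tb.content = false if tb.num_words < min_words
      end
      # The document is modified in place and also returned.
      doc
    end
  end
end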
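To see what the filter does without installing the gem, the sketch below runs the same `process` logic against minimal stand-in objects. `TextBlock`, `Document` and `make_block` are hypothetical: they only mimic the three members the filter touches (`num_words`, `content=`, `is_not_content?`). Counting words with `scan(/\w+/)` is also an assumption, not necessarily how boilerpipe-ruby counts words.

```ruby
# Hypothetical stand-in for the gem's text block: only the members that
# MinWordsFilter reads or writes are modelled here.
TextBlock = Struct.new(:text, :num_words, :content) do
  def is_not_content?
    !content
  end
end

# Hypothetical stand-in for the gem's document: just a list of blocks.
Document = Struct.new(:text_blocks)

# Hypothetical helper; word counting via \w+ runs is an assumption.
def make_block(text, content)
  TextBlock.new(text, text.scan(/\w+/).size, content)
end

module Boilerpipe
  module Filters
    # Same logic as the MinWordsFilter listed above.
    class MinWordsFilter
      def self.process(min_words, doc)
        doc.text_blocks.each do |tb|
          next if tb.is_not_content?
          tb.content = false if tb.num_words < min_words
        end
        doc
      end
    end
  end
end

doc = Document.new([
  make_block("Home | About | Contact", true),   # 3 words
  make_block("The quick brown fox jumps over the lazy dog near the river bank.", true), # 13 words
  make_block("Copyright 2024", false)           # already non-content
])

Boilerpipe::Filters::MinWordsFilter.process(5, doc)
doc.text_blocks.each do |tb|
  puts "#{tb.content ? 'content' : 'boilerplate'}: #{tb.text}"
end
# boilerplate: Home | About | Contact
# content: The quick brown fox jumps over the lazy dog near the river bank.
# boilerplate: Copyright 2024
```

The filter only ever demotes blocks: a block that is already non-content is skipped, so it is never promoted back to content, however many words it has.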