Sha256: 286d2ec951bcf2c26914ecfed519773ae841432f1020252056d127b41f8b1e0c
Contents?: true
Size: 521 Bytes
Versions: 7
Compression:
Stored size: 521 Bytes
Contents
module Boilerpipe::Extractors
  class LargestContentExtractor
    def self.text(contents)
      doc = ::Boilerpipe::SAX::BoilerpipeHTMLParser.parse(contents)
      ::Boilerpipe::Extractors::LargestContentExtractor.process doc
      doc.content
    end

    def self.process(doc)
      filters = ::Boilerpipe::Filters
      filters::NumWordsRulesClassifier.process doc
      filters::BlockProximityFusion::MAX_DISTANCE_1.process doc
      filters::KeepLargestBlockFilter::INSTANCE.process doc
      doc
    end
  end
end
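The extractor above runs three filters in a fixed order: classify blocks, fuse nearby blocks, then keep only the largest one. The following is a minimal, self-contained sketch of that ordering and does not use the gem. `ToyBlock`, `ToyDocument`, `ToyFilters`, `toy_largest_content`, and the `min_words` threshold are all hypothetical stand-ins invented for illustration; the real filters' rules are not shown in this file and are likely more involved.

```ruby
# Toy stand-ins, NOT the boilerpipe-ruby classes: they only mirror the
# classify -> fuse -> keep-largest order used by LargestContentExtractor.
ToyBlock = Struct.new(:text, :is_content) do
  def num_words
    text.split.size
  end
end

ToyDocument = Struct.new(:blocks) do
  # Joins the text of every block still flagged as content.
  def content
    blocks.select(&:is_content).map(&:text).join("\n")
  end
end

module ToyFilters
  # Flags a block as content when it has at least min_words words.
  # The threshold is an assumption made for this sketch.
  def self.classify(doc, min_words: 5)
    doc.blocks.each { |b| b.is_content = b.num_words >= min_words }
    doc
  end

  # Merges directly neighbouring content blocks into a single block.
  def self.fuse_adjacent(doc)
    fused = []
    doc.blocks.each do |b|
      prev = fused.last
      if prev && prev.is_content && b.is_content
        prev.text = "#{prev.text} #{b.text}"
      else
        fused << b
      end
    end
    doc.blocks = fused
    doc
  end

  # Keeps only the content block with the most words; all others are unflagged.
  def self.keep_largest(doc)
    largest = doc.blocks.select(&:is_content).max_by(&:num_words)
    doc.blocks.each { |b| b.is_content = b.equal?(largest) }
    doc
  end
end

# Hypothetical helper: runs the three toy filters in the same order as
# LargestContentExtractor.process, then returns the remaining text.
def toy_largest_content(texts)
  doc = ToyDocument.new(texts.map { |t| ToyBlock.new(t, false) })
  ToyFilters.classify(doc)
  ToyFilters.fuse_adjacent(doc)
  ToyFilters.keep_largest(doc)
  doc.content
end
```

Called with `["Home | About", "The quick brown fox jumps over", "the lazy dog and keeps running far", "Share", "Read our other long article here"]`, the sketch fuses the two middle sentences into one block and returns only that block, dropping the navigation text, the "Share" label, and the shorter teaser.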
Version data entries
7 entries across 7 versions & 1 rubygems