Sha256: a97b1ffe59a66161de5b7cb8a9ffe345ada3d43585cfe3ef440cb1f2372cf72a
Contents?: true
Size: 580 Bytes
Versions: 6
Compression:
Stored size: 580 Bytes
Contents
# A full-text extractor which is tuned towards extracting sentences from news articles.
module Boilerpipe::Extractors
  class ArticleSentenceExtractor
    def self.text(contents)
      doc = ::Boilerpipe::SAX::BoilerpipeHTMLParser.parse(contents)
      ::Boilerpipe::Extractors::ArticleSentenceExtractor.process(doc)
      doc.content
    end

    def self.process(doc)
      ::Boilerpipe::Extractors::ArticleExtractor.process doc
      ::Boilerpipe::Filters::SplitParagraphBlocksFilter.process doc
      ::Boilerpipe::Filters::MinClauseWordsFilter.process doc
    end
  end
end
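The extractor above is a fixed pipeline: parse the input into a document, run a sequence of filters that mutate it in place, then read `doc.content`. The sketch below illustrates that pattern only and does not reproduce the gem's code. `Block`, `Doc`, `SplitParagraphs`, `MinClauseWords`, and `SentenceExtractorSketch` are hypothetical stand-ins, and the five-word threshold and the punctuation-based clause split are assumptions made for the example.

```ruby
# Hypothetical stand-ins for illustration; these are NOT the boilerpipe classes.
Block = Struct.new(:text, :content) # content: true while the block counts as main text

Doc = Struct.new(:blocks) do
  # Join the text of every block still marked as content.
  def content
    blocks.select(&:content).map(&:text).join("\n")
  end
end

# Splits each block at line breaks so later filters see one paragraph per block.
module SplitParagraphs
  def self.process(doc)
    doc.blocks = doc.blocks.flat_map do |block|
      block.text.split(/\n+/).map { |t| Block.new(t.strip, block.content) }
    end
    doc
  end
end

# Keeps a block only if at least one clause has MIN_WORDS words or more.
module MinClauseWords
  MIN_WORDS = 5 # assumed threshold for this sketch

  def self.process(doc)
    doc.blocks.each do |block|
      next unless block.content
      clauses = block.text.split(/[.,;:!?]+/)
      block.content = clauses.any? { |c| c.split.size >= MIN_WORDS }
    end
    doc
  end
end

# Mirrors the text/process split of the extractor shown above.
module SentenceExtractorSketch
  FILTERS = [SplitParagraphs, MinClauseWords].freeze

  def self.process(doc)
    FILTERS.each { |filter| filter.process(doc) }
    doc
  end

  def self.text(raw_blocks)
    doc = Doc.new(raw_blocks.map { |t| Block.new(t, true) })
    process(doc)
    doc.content
  end
end

puts SentenceExtractorSketch.text(
  ["Share\nThe mayor opened the new bridge on Monday.", "Read more"]
)
```

Keeping `process` separate from `text` lets a caller that already holds a parsed document reuse the filter chain without parsing again, which is the same split the original class makes.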
Version data entries
6 entries across 6 versions & 1 rubygems