Sha256: a97b1ffe59a66161de5b7cb8a9ffe345ada3d43585cfe3ef440cb1f2372cf72a

Size: 580 Bytes

Versions: 6

Stored size: 580 Bytes

Contents

# A full-text extractor which is tuned towards extracting sentences from news articles.

module Boilerpipe::Extractors
  class ArticleSentenceExtractor
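    # Returns the extracted text for the given HTML string.
    def self.text(contents)
      doc = ::Boilerpipe::SAX::BoilerpipeHTMLParser.parse(contents)
      # The filters modify doc in place; the result is read from doc.content.
      ::Boilerpipe::Extractors::ArticleSentenceExtractor.process(doc)
      doc.content
    end

    # Runs the article extractor first, then splits paragraph blocks,
    # then applies the minimum-clause-words filter, in that order.
    def self.process(doc)
      ::Boilerpipe::Extractors::ArticleExtractor.process doc
      ::Boilerpipe::Filters::SplitParagraphBlocksFilter.process doc
      ::Boilerpipe::Filters::MinClauseWordsFilter.process doc
    end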
  end
end
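
The extractor above follows a simple pattern: build a document, pass it through a fixed sequence of stages that each modify it in place, then read the result from the document. The sketch below illustrates only that pattern and does not use the boilerpipe-ruby gem. `Block`, `Doc`, `SplitParagraphs`, `MinClauseWords`, and `SentencePipeline` are hypothetical stand-ins, and the splitting rule and the five-word threshold are assumptions chosen for the example, not the behaviour of the real filters.

```ruby
# Hypothetical stand-ins for illustration; these are NOT the boilerpipe-ruby classes.
Block = Struct.new(:text, :is_content)

class Doc
  attr_reader :blocks

  def initialize(blocks)
    @blocks = blocks
  end

  # Join the text of every block still marked as content.
  def content
    blocks.select(&:is_content).map(&:text).join("\n")
  end
end

# Stage 1 (stand-in): split each block at line breaks so that
# later stages see one paragraph per block.
module SplitParagraphs
  def self.process(doc)
    split = doc.blocks.flat_map do |b|
      b.text.split(/\n+/).map { |t| Block.new(t.strip, b.is_content) }
    end
    doc.blocks.replace(split)
  end
end

# Stage 2 (stand-in): keep a block as content only if at least one
# punctuation-delimited clause has min_words words (assumed default: 5).
module MinClauseWords
  def self.process(doc, min_words = 5)
    doc.blocks.each do |b|
      next unless b.is_content
      clauses = b.text.split(/[.,;:!?]+/)
      b.is_content = clauses.any? { |c| c.split.size >= min_words }
    end
  end
end

# Chains the stages in order, mirroring the text/process split above.
module SentencePipeline
  STAGES = [SplitParagraphs, MinClauseWords].freeze

  def self.process(doc)
    STAGES.each { |stage| stage.process(doc) }
    doc
  end

  def self.text(blocks)
    doc = Doc.new(blocks)
    process(doc)
    doc.content
  end
end

result = SentencePipeline.text([
  Block.new("The council approved the new budget on Monday.\nRead more", true),
  Block.new("Share this", true)
])
puts result
```

With the gem itself, the entry point shown in the source is `Boilerpipe::Extractors::ArticleSentenceExtractor.text(contents)`, where `contents` is the HTML passed to the parser.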

Version data entries

6 entries across 6 versions and 1 gem

Version Path
boilerpipe-ruby-0.5.0 lib/boilerpipe/extractors/article_sentence_extractor.rb
boilerpipe-ruby-0.4.4 lib/boilerpipe/extractors/article_sentence_extractor.rb
boilerpipe-ruby-0.4.3 lib/boilerpipe/extractors/article_sentence_extractor.rb
boilerpipe-ruby-0.4.2 lib/boilerpipe/extractors/article_sentence_extractor.rb
boilerpipe-ruby-0.4.1 lib/boilerpipe/extractors/article_sentence_extractor.rb
boilerpipe-ruby-0.4.0 lib/boilerpipe/extractors/article_sentence_extractor.rb