Sha256: 4897d5724f49b0e42330c7ee0ec42b6a559135e9c8ecf47f3acf91ac085a439b

Contents?: true

Size: 548 Bytes

Versions: 3

Compression:

Stored size: 548 Bytes

Contents

# Merges two subsequent blocks if their text densities are equal.

module Boilerpipe::Filters
  class SimpleBlockFusionProcessor
    def self.process(doc)
      tbs = doc.text_blocks
      return doc if tbs.size < 2

      blocks_to_remove = []
      tb1 = tbs.first
      tbs.drop(1).each do |tb|
        if tb1.text_density == tb.text_density
          tb1.merge_next(tb)
          blocks_to_remove << tb
        else
          tb1 = tb
        end
      end

      doc.replace_text_blocks!( tbs - blocks_to_remove )
      doc
    end
  end
end
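
The fusion pass can be tried in isolation with a small sketch like the one below. `FakeBlock` and `FakeDocument` are hypothetical stand-ins, not the gem's own classes; they model only the four methods the processor calls (`text_density`, `merge_next`, `text_blocks`, `replace_text_blocks!`). The stand-in `merge_next` simply appends text and leaves the density unchanged, which is an assumption and may differ from the real implementation. The processor logic is copied into `FusionSketch` so the example runs without the gem installed.

```ruby
# Hypothetical stand-in for a text block. A plain class (not a Struct) is used
# so that Array#- below compares blocks by identity, not by value.
class FakeBlock
  attr_accessor :text, :text_density

  def initialize(text, text_density)
    @text = text
    @text_density = text_density
  end

  # Assumed behavior: append the next block's text, keep the density as is.
  def merge_next(other)
    self.text = "#{text}\n#{other.text}"
  end
end

# Hypothetical stand-in for the document object.
class FakeDocument
  attr_reader :text_blocks

  def initialize(blocks)
    @text_blocks = blocks
  end

  def replace_text_blocks!(blocks)
    @text_blocks = blocks
  end
end

# Same logic as the processor listed above, copied so the sketch is standalone.
module FusionSketch
  def self.process(doc)
    tbs = doc.text_blocks
    return doc if tbs.size < 2

    blocks_to_remove = []
    tb1 = tbs.first
    tbs.drop(1).each do |tb|
      if tb1.text_density == tb.text_density
        tb1.merge_next(tb)       # fold tb into the current anchor block
        blocks_to_remove << tb
      else
        tb1 = tb                 # density changed: tb becomes the new anchor
      end
    end

    doc.replace_text_blocks!(tbs - blocks_to_remove)
    doc
  end
end

doc = FakeDocument.new([
  FakeBlock.new('a', 2.0),
  FakeBlock.new('b', 2.0), # same density as 'a', so it is merged into 'a'
  FakeBlock.new('c', 5.0),
  FakeBlock.new('d', 2.0)  # same density as 'a', but not adjacent to it
])
FusionSketch.process(doc)
fused_texts = doc.text_blocks.map(&:text)
puts fused_texts.inspect
```

Only adjacent blocks are compared, so `'d'` stays separate even though its density matches the first block. The comparison is exact equality on the density value, so two blocks whose densities differ by any amount are not merged.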

Version data entries

3 entries across 3 versions and 1 rubygem

Version Path
boilerpipe-ruby-0.4.0 lib/boilerpipe/filters/simple_block_fusion_processor.rb
boilerpipe-ruby-0.3.0 lib/boilerpipe/filters/simple_block_fusion_processor.rb
boilerpipe-ruby-0.2.0 lib/boilerpipe/filters/simple_block_fusion_processor.rb