Sha256: a98f9c25f244a72d819bb610bbbaf51c60591f75425c902596f4c27e101a6788
Contents?: true
Size: 838 Bytes
Versions: 1
Compression:
Stored size: 838 Bytes
Contents
require 'nokogiri'
require 'tmpdir'

class RTesseract
  module Box
    def self.temp_dir
      @file_path = Pathname.new(Dir.tmpdir)
    end

    def self.run(source, options)
      name = "rtesseract_#{SecureRandom.uuid}"
      options.tessedit_create_hocr = 1
      RTesseract::Command.new(source, temp_dir.join(name).to_s, options).run
      parse(temp_dir.join("#{name}.hocr").read)
    end

    def self.parse(content)
      html = Nokogiri::HTML(content)
      html.css('span.ocrx_word, span.ocr_word').map do |word|
        @attributes = word.attributes['title'].value.to_s.gsub(';', '').split(' ')
        {
          word: word.text,
          x_start: @attributes[1].to_i,
          y_start: @attributes[2].to_i,
          x_end: @attributes[3].to_i,
          y_end: @attributes[4].to_i
        }
      end
    end
  end
end
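
The `parse` method above reads each word's bounding box from the `title` attribute of its hOCR span. A minimal sketch of that string handling in isolation, without Nokogiri, may make the index arithmetic easier to follow. The `title` value here is a hypothetical sample in the usual hOCR `bbox x0 y0 x1 y1; x_wconf N` shape, not output taken from this gem.

```ruby
# Hypothetical hOCR title value (an assumption for illustration);
# real values come from the .hocr file Tesseract writes.
title = 'bbox 36 92 96 116; x_wconf 95'

# Same transformation as Box.parse: drop semicolons, then split on spaces.
# Index 0 holds the literal "bbox"; indices 1..4 hold the coordinates.
attributes = title.gsub(';', '').split(' ')

box = {
  x_start: attributes[1].to_i,
  y_start: attributes[2].to_i,
  x_end: attributes[3].to_i,
  y_end: attributes[4].to_i
}

puts box.inspect
```

Because the coordinates are picked by position, this approach relies on `bbox` being the first property in the `title` string.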
Version data entries
1 entry across 1 version & 1 rubygem
Version | Path |
---|---|
rtesseract-3.0.0 | lib/rtesseract/box.rb |