Sha256: 0d40121e265d818111c982af5592fe39a81ac68f96c3597d13befdf3fd3b1905

Contents?: true

Size: 957 Bytes

Versions: 3

Compression:

Stored size: 957 Bytes

Contents

# PDF Table Data Extractor
# by Eresse <eresse@eresse.net>

# External Includes
require 'htmlentities'
require 'pdftohtml'

# Internal Includes
require 'pdftdx/parser'
require 'pdftdx/version'

# PDF TDX Module
module PDFTDX

	# Extract Data from PDF
	# @param [String] pdf_file Path to a PDF file
	# @return [Array] An array of tables. Each table is a hash containing an optional header (:head) and the table data (:data).
	#   The data is either a single array of rows or a hash of sub-tables (arrays of rows) mapped by name.
	#   Each table row is an array of table cells.
	#   Example: [{ head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'], data: { 'System' => [['Machine OS', 'Win32', 'Linux', 'MacOS'], ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']] } }]
	def self.extract_data pdf_file

		# Dump PDF Data
		page_data = Pdftohtml.convert pdf_file

		# Process Page Data
		PDFTDX::Parser.process_page_files page_data
	end
end
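
The return value documented above can take two shapes (a flat array of rows, or a hash of named sub-tables), so calling code may want to normalize it before use. The following is a minimal sketch of one way to do that. It does not call pdftdx or read a PDF: the `tables` literal is copied from the example in the doc comment, and `sub_tables` and `flatten_rows` are hypothetical helpers that are not part of the gem. Treating the first cell of each row as a label is also an assumption based on that example.

```ruby
# Sample result in the shape documented for PDFTDX.extract_data
# (copied from the doc comment example; no PDF is read here).
tables = [
  {
    head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
    data: {
      'System' => [
        ['Machine OS', 'Win32', 'Linux', 'MacOS'],
        ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
      ]
    }
  }
]

# Hypothetical helper (not part of pdftdx): returns [name, rows] pairs
# whether :data is a flat array of rows or a hash of named sub-tables.
# A flat array is reported under the name nil.
def sub_tables(table)
  data = table[:data]
  data.is_a?(Hash) ? data.to_a : [[nil, data]]
end

# Hypothetical helper (not part of pdftdx): flattens every table into
# one record per row. Assumes the first cell of a row is its label.
def flatten_rows(tables)
  tables.flat_map do |table|
    sub_tables(table).flat_map do |name, rows|
      rows.map { |row| { sub_table: name, label: row.first, values: row.drop(1) } }
    end
  end
end

records = flatten_rows(tables)
records.each do |r|
  puts "#{r[:sub_table]} / #{r[:label]}: #{r[:values].join(', ')}"
end
```

With a real PDF, `tables` would presumably come from `PDFTDX.extract_data('path/to/file.pdf')` instead of the literal above; the path here is a placeholder.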

Version data entries

3 entries across 3 versions & 1 rubygem

Version Path
pdftdx-1.0.3 lib/pdftdx.rb
pdftdx-1.0.2 lib/pdftdx.rb
pdftdx-1.0.1 lib/pdftdx.rb