Sha256: 0d40121e265d818111c982af5592fe39a81ac68f96c3597d13befdf3fd3b1905
Size: 957 Bytes
Versions: 3
Compression:
Stored size: 957 Bytes
Contents
```ruby
# PDF Table Data Extractor
# by Eresse <eresse@eresse.net>

# External Includes
require 'htmlentities'
require 'pdftohtml'

# Internal Includes
require 'pdftdx/parser'
require 'pdftdx/version'

# PDF TDX Module
module PDFTDX

  # Extract Data from PDF
  # @param [String] pdf_file Path to a PDF file
  # @return [Array] An array of tables, each represented as a hash containing
  #   an optional header and table data, in the form of either one single
  #   array of rows, or a hash of sub-tables (arrays of rows) mapped by name.
  #   Table rows are represented as an array of table cells.
  #   Example:
  #     [{
  #       head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
  #       data: {
  #         'System' => [
  #           ['Machine OS', 'Win32', 'Linux', 'MacOS'],
  #           ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
  #         ]
  #       }
  #     }]
  def self.extract_data pdf_file

    # Dump PDF Data
    page_data = Pdftohtml.convert pdf_file

    # Process Page Data
    PDFTDX::Parser.process_page_files page_data
  end
end
```
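The `@return` documentation of `PDFTDX.extract_data` allows `data` to take two shapes: a plain array of rows, or a hash of named sub-tables. Consumers therefore have to branch on that shape. The sketch below is a hypothetical helper, not part of pdftdx, that walks a structure of the documented form and yields every row together with its table index and sub-table name (`nil` when the table's data is a plain array). It assumes symbol keys `:head` and `:data`, as in the documented example; the second table in the sample input is made up for illustration.

```ruby
# Hypothetical helper (not part of pdftdx): iterates over a table structure
# shaped like the documented return value of PDFTDX.extract_data.
# Yields (table_index, sub_table_name, row); sub_table_name is nil when the
# table's data is a single array of rows.
def each_table_row(tables)
  # Without a block, return an Enumerator so callers can chain (e.g. #to_a)
  return enum_for(:each_table_row, tables) unless block_given?

  tables.each_with_index do |table, index|
    data = table[:data]
    if data.is_a?(Hash)
      # Data is a hash of sub-tables (arrays of rows) mapped by name
      data.each do |name, rows|
        rows.each { |row| yield index, name, row }
      end
    else
      # Data is one single array of rows (Array(nil) gives [] if absent)
      Array(data).each { |row| yield index, nil, row }
    end
  end
end

# Sample input: the first table is the documented example; the second,
# header-less table is hypothetical.
tables = [
  {
    head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
    data: {
      'System' => [
        ['Machine OS', 'Win32', 'Linux', 'MacOS'],
        ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
      ]
    }
  },
  { data: [['a', 'b']] }
]

each_table_row(tables) do |index, name, row|
  puts "#{index} #{name.inspect} #{row.join(' | ')}"
end
```

Returning an `Enumerator` when no block is given follows the usual Ruby iterator convention, so `each_table_row(tables).to_a` collects the triples as arrays.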
Version data entries
3 entries across 3 versions & 1 rubygem
Version | Path |
---|---|
pdftdx-1.0.3 | lib/pdftdx.rb |
pdftdx-1.0.2 | lib/pdftdx.rb |
pdftdx-1.0.1 | lib/pdftdx.rb |