Sha256: 5700f73b8f256ed9c8887e43ae98bd8228ee8afcee5f52089bba95c7890538ff
Contents?: true
Size: 946 Bytes
Versions: 5
Compression:
Stored size: 946 Bytes
Contents
# PDF Table Data Extractor
# by Eresse <eresse@eresse.net>

# External Includes
require 'htmlentities'
require 'pdftohtml'

# Internal Includes
require 'pdftdx/parser'
require 'pdftdx/version'

# PDF TDX Module
module PDFTDX

  # Extract Data from PDF
  # @param [String] pdf_file Path to a PDF file
  # @return [Array] An array of tables, each represented as a hash containing an optional header and table data,
  #   in the form of either one single array of rows, or a hash of sub-tables (arrays of rows) mapped by name.
  #   Table rows are represented as an array of table cells.
  #   Example: [{ head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'], data: { 'System' => [['Machine OS', 'Win32', 'Linux', 'MacOS'], ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']] } }]
  def self.extract_data pdf_file

    # Dump PDF Data
    page_data = Pdftohtml.convert pdf_file

    # Process Page Data
    PDFTDX::Parser.process page_data
  end
end
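The return value described in the comment above can take two shapes under `:data` (a plain array of rows, or a hash of named sub-tables), so calling code may want to normalize it before iterating. The following is a minimal sketch of that idea, not part of the gem: `table_rows` is a hypothetical helper, and the sample data is copied from the documentation comment rather than produced by an actual `PDFTDX.extract_data` call, so no PDF or external gem is needed to run it.

```ruby
# Hypothetical helper (NOT part of pdftdx): flattens one table into
# [sub_table_name, row] pairs, whichever form :data takes.
# For a plain array of rows, the sub-table name is nil.
def table_rows(table)
  data = table[:data]
  if data.is_a?(Hash)
    # Hash of sub-tables: pair every row with its sub-table name.
    data.flat_map { |name, rows| rows.map { |row| [name, row] } }
  else
    # Single array of rows (or nil when no data is present).
    Array(data).map { |row| [nil, row] }
  end
end

# Sample structure taken from the documentation comment of extract_data.
tables = [{
  head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
  data: {
    'System' => [
      ['Machine OS', 'Win32', 'Linux', 'MacOS'],
      ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
    ]
  }
}]

tables.each do |table|
  # The header is optional, so only print it when present.
  puts "Header: #{table[:head].join(', ')}" if table[:head]
  table_rows(table).each do |name, row|
    label = name ? "[#{name}] " : ''
    puts "#{label}#{row.join(' | ')}"
  end
end
```

With the gem installed, `tables` would presumably come from `PDFTDX.extract_data('path/to/file.pdf')` instead of the literal above; the path shown here is a placeholder.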
Version data entries
5 entries across 5 versions & 1 rubygems
Version | Path |
---|---|
pdftdx-1.2.1 | lib/pdftdx.rb |
pdftdx-1.2.0 | lib/pdftdx.rb |
pdftdx-1.1.8 | lib/pdftdx.rb |
pdftdx-1.1.7 | lib/pdftdx.rb |
pdftdx-1.0.4 | lib/pdftdx.rb |