= Simple Spreadsheet Extractor

Authors::   Stuart Owen, Finn Bacall
Version::   0.8.0
Contact::   mailto:stuart.owen@manchester.ac.uk
Licence::   BSD (See LICENCE or http://www.opensource.org/licenses/bsd-license.php)
Copyright:: (c) 2010-2012 The University of Manchester, UK

== Synopsis

This is a simple gem that reads an XLS or XLSX Excel spreadsheet document and produces an XML representation of its content. CSV output can also be generated for a single sheet.

Internally it uses Apache POI, via the sister tool http://github.com/myGrid/simple-spreadsheet-extractor.

This is a simple tool developed for use within SysMO-DB[http://www.sysmo-db.org].

== Installation

Java 1.6 (JRE) is required.

  gem install simple-spreadsheet-extractor

*Note that Windows is no longer supported* (since version 0.7.2.1)

== Usage

* require 'simple-spreadsheet-extractor'
* include the module SysMODB::SpreadsheetExtractor
* pass an IO object to the method spreadsheet_to_xml, which returns the XML for the contents of the spreadsheet. Alternatively, use spreadsheet_to_csv for CSV.
* if something goes wrong with the extraction, a SysMODB::SpreadsheetExtractionException is raised

e.g.

  # examples/example.rb - takes the path to a spreadsheet, e.g.
  #   ruby example.rb /tmp/spreadsheet.xls
  require 'rubygems'
  require 'simple-spreadsheet-extractor'

  include SysMODB::SpreadsheetExtractor

  path = ARGV.first
  abort "Usage: ruby example.rb <spreadsheet>" unless path

  # Open in binary mode, since XLS/XLSX files are binary formats
  File.open(path, 'rb') do |f|
    begin
      puts spreadsheet_to_xml(f)
    rescue SysMODB::SpreadsheetExtractionException => e
      puts "Something went wrong: #{e.message}"
    end
  end

Formulas are evaluated and the result is placed in the XML produced for that cell; the original formula is also included as an attribute.

Row and column indexes start at 1, rather than 0, to be consistent with the cell names used in Excel (e.g. A1).

An XSD schema for the XML is available in {doc/schema-v1.xsd}[tree/master/doc/schema-v1.xsd].

The spreadsheet extractor jar to use can be specified by defining SPREADSHEET_EXTRACTOR_JAR_PATH in a config file (e.g. environment.rb).

CSV can be generated in a similar way with spreadsheet_to_csv, which also takes an optional sheet number. If the sheet number is omitted, the first sheet is used. Note that the first sheet is numbered 1, and the sheet number can be given as either a string or an integer, e.g.

  puts spreadsheet_to_csv(f, "1")

== Example XML
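Because the row and column indexes in the XML start at 1, they map onto Excel's A1-style cell names without any offset. The helper below is a hypothetical sketch, not part of the simple-spreadsheet-extractor API; the name cell_name is an assumption made for illustration. It converts a (row, column) pair read from the XML into that notation, using the bijective base-26 scheme Excel uses for column letters (A..Z, then AA, AB, ...).

```ruby
# Hypothetical helper (not part of simple-spreadsheet-extractor): converts the
# 1-based row and column indexes used in the XML into an Excel-style A1 name.
def cell_name(row, column)
  raise ArgumentError, "row and column must be >= 1" if row < 1 || column < 1

  letters = +""
  n = column
  # Excel column letters are bijective base-26: 1 => A, 26 => Z, 27 => AA
  while n > 0
    n, remainder = (n - 1).divmod(26)
    letters.prepend((65 + remainder).chr) # 65 is the character code of "A"
  end
  "#{letters}#{row}"
end

puts cell_name(1, 1)   # A1
puts cell_name(3, 2)   # B3
puts cell_name(10, 27) # AA10
```

No offset is applied to either index, which is exactly why the extractor's 1-based numbering was chosen.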