=ORCFILE

Ruby gem for reading and writing Apache Optimized Row Columnar (ORC) files. This gem can also be paired with the factory_girl gem.

==Installation

This gem requires JRuby.

Add this line to your application's Gemfile:

  gem 'orc_file'

And then execute:

  $ bundle install

Or install it yourself as:

  $ gem install orc_file

==Usage

===+OrcFileWriter+

To write a file, initialize the OrcFileWriter class. The constructor takes a table schema, your dataset, the path where the file will be stored, and an optional options hash.

  OrcFileWriter.new(table_schema, data_set, path, options = {})

====_table_schema_

The table_schema must be a hash with column names as keys and datatypes as values. Valid datatypes are:

- integer
- decimal
- float
- date
- datetime
- time
- string

For example:

  table_schema = {:id => :integer, :amount => :decimal, :rate => :float}

====_data_set_

The data_set holds the rows to write, each row a hash with column names as keys and data values as values. For one row in the dataset:

  data_set = {:id => 1, :amount => 1000.01, :rate => 0.0005}

For multiple rows, pass an array of hashes:

  data_set = [{:id => 1, :amount => 1000.01, :rate => 0.0005},
              {:id => 2, :amount => 2500.5, :rate => 0.1},
              {:id => 3, :amount => 10.12, :rate => 10.0134}]

====_path_

The path can be absolute or relative to your working directory, and it must include the file name.

  path = '/temp/orc_file.orc'

====_options_

options is an optional hash with the following settings for writing an ORC file:

- `:stripe_size` sets the stripe size; defaults to 67,108,864 bytes (64 MiB).
- `:row_index_stride` sets the number of rows between row index entries; defaults to 10,000.
- `:buffer_size` sets the ORC buffer size; defaults to 262,144 bytes (256 KiB).
- `:compression` sets the compression codec (NONE, ZLIB, SNAPPY, or LZO); defaults to ZLIB.
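The defaults above can be pictured as a base hash that a user-supplied options hash overrides key by key. The following is a minimal sketch under that assumption; this README does not describe how the gem merges options internally, and ORC_DEFAULT_OPTIONS and resolve_orc_options are hypothetical names for illustration, not part of the orc_file API. It is plain Ruby and runs on JRuby as well.

```ruby
# Hypothetical helper: NOT part of the orc_file gem's API.
# Default values are copied from the option list in this README.
ORC_DEFAULT_OPTIONS = {
  :stripe_size      => 67_108_864, # bytes (64 MiB)
  :row_index_stride => 10_000,     # rows between row index entries
  :buffer_size      => 262_144,    # bytes (256 KiB)
  :compression      => 'ZLIB'
}.freeze

VALID_COMPRESSION = %w[NONE ZLIB SNAPPY LZO].freeze

# Assumption: user options override defaults key by key, and unknown
# keys are rejected instead of being silently ignored.
def resolve_orc_options(options = {})
  unknown = options.keys - ORC_DEFAULT_OPTIONS.keys
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  merged = ORC_DEFAULT_OPTIONS.merge(options)
  # Normalize the codec name so 'snappy' and :snappy are accepted too.
  codec = merged[:compression].to_s.upcase
  unless VALID_COMPRESSION.include?(codec)
    raise ArgumentError, "unsupported compression: #{merged[:compression]}"
  end
  merged.merge(:compression => codec)
end

# Settings not given here keep their documented defaults.
p resolve_orc_options(:stripe_size => 70_000_000, :compression => 'snappy')
```

Rejecting unknown keys catches typos such as :stripe_sizes before a file is written; whether the real gem validates its options this way is not documented.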
Define the options parameter as a hash:

  options = {:stripe_size => 70000000, :compression => 'SNAPPY'}

===+write_to_orc+

Once the OrcFileWriter object is initialized, call write_to_orc to write out the file:

  OrcFileWriter.new(table_schema, data_set, path, options).write_to_orc

===+OrcFileReader+

To read a file, initialize the OrcFileReader class. The constructor takes a table schema and the path of the file to read.

  OrcFileReader.new(table_schema, path)

====_table_schema_

The table_schema must be a hash with column names as keys and datatypes as values. Valid datatypes are:

- integer
- decimal
- float
- date
- datetime
- time
- string

For example:

  table_schema = {:id => :integer, :amount => :decimal, :rate => :float}

====_path_

The path can be absolute or relative to your working directory, and it must include the file name.

  path = '/temp/orc_file.orc'
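Both OrcFileWriter and OrcFileReader take a table_schema, and the writer expects each data_set row to use the schema's column names. A minimal pre-flight check along those lines might look like the sketch below. check_orc_inputs and VALID_ORC_TYPES are hypothetical names, not part of the gem; the sketch only checks datatype symbols and column names (not value types), and treating missing or extra columns as errors is an assumption, since this README does not say how the gem handles them.

```ruby
# Hypothetical pre-flight check: NOT part of the orc_file gem's API.
# Datatypes are the ones listed for table_schema in this README.
VALID_ORC_TYPES = [:integer, :decimal, :float, :date, :datetime, :time, :string].freeze

# Accepts either documented data_set shape (a single row hash or an
# array of row hashes) and returns an array of row hashes.
def check_orc_inputs(table_schema, data_set)
  bad_types = table_schema.reject { |_col, type| VALID_ORC_TYPES.include?(type) }
  unless bad_types.empty?
    raise ArgumentError, "invalid datatype(s): #{bad_types.inspect}"
  end

  rows = data_set.is_a?(Hash) ? [data_set] : Array(data_set)
  rows.each_with_index do |row, i|
    # Assumption: every row must have exactly the schema's columns.
    missing = table_schema.keys - row.keys
    extra   = row.keys - table_schema.keys
    next if missing.empty? && extra.empty?
    raise ArgumentError, "row #{i}: missing #{missing.inspect}, unexpected #{extra.inspect}"
  end
  rows
end

table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
rows = check_orc_inputs(table_schema, {:id => 1, :amount => 1000.01, :rate => 0.0005})
puts rows.length
```

Normalizing a single hash into a one-element array means later code can always iterate over rows, whichever of the two documented shapes was passed in.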