=ORCFILE
Ruby Gem for reading and writing Apache Optimized Row Columnar (ORC) files.
This gem can also be paired using the factory_girl gem.
==Installation
Must use jruby.
Add this line to your application's Gemfile:
gem 'orc_file'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install orc_file
==Usage
===+OrcFileWriter+
To write a file, you will need to initialize the OrcFileWriter class.
This object needs a table schema, your dataset, the path to store the file, and an optional configuration hash.
OrcFileWriter.new(table_schema, data_set, path, *options={})
====_table_schema_
The table_schema must be a hash containing the column name and datatype as the key-value pair.
Valid datatypes are:
- integer
- decimal
- float
- date
- datetime
- time
- string
table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
====_data_set_
The data_set must contain a hash with the column name and data value as the key-value pair.
For one row in the dataset:
data_set = {:id => 1, :amount => 1000.01, :rate => 0.0005}
For multiple rows in the dataset:
dataset = [{:id => 1, :amount => 1000.01, :rate => 0.0005},
{:id => 2, :amount => 2500.5, :rate => 0.1},
{:id => 3, :amount => 10.12, :rate => 10.0134}]
====_path_
The path should be the full file path or relative to your working directory. You must also specify the file name.
path = '/temp/orc_file.orc'
====_options_
Options is an optional hash parameter containing 5 configurable settings for writing an ORC file.
`:stripe_size` defines the size of the stripe, defaulted as 67,108,864 bytes
`:row_index_stride` defines the number of rows between row index entries, defaulted as 10,000
`:buffer_size` defines the orc buffer size, defaulted as 262,144 bytes
`:compression` defines the compression codec (NONE,ZLIB,SNAPPY,LZO), defaulted as ZLIB.
Define the options parameter has a hash
options = {:stripe_size => 70000000, :compression => 'SNAPPY'}
===+write_to_orc+
Once you have the OrcFileWriter object initialized you must call write_to_orc to write out the file
OrcFileWriter.new(table_schema, data_set, path, options).write_to_orc
===+OrcFileReader+
To read a file, you will need to initialize the OrcFileReader class.
This object needs a table schema, and the path of the file to be read.
OrcFileReader.new(table_schema, path)
====_table_schema_
The table_schema must be a hash containing the column name and datatype as the key-value pair.
Valid datatypes are:
- integer
- decimal
- float
- date
- datetime
- time
- string
table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
====_path_
The path should be the full file path or relative to your working directory. You must also specify the file name.
path = '/temp/orc_file.orc'