# RedAmber A simple dataframe library for Ruby (experimental) - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) - Simple API similar to [Rover-df](https://github.com/ankane/rover) ## Requirements ```ruby gem 'red-arrow', '>= 7.0.0' gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame ``` ## Installation Add this line to your Gemfile: ```ruby gem 'red_amber' ``` And then execute: ```shell bundle install ``` Or install it yourself as: ```shell gem install red_amber ``` ## `RedAmber::DataFrame` Represents a set of data in 2D-shape. ```ruby require 'red_amber' require 'datasets-arrow' penguins = Datasets::Penguins.new.to_arrow puts RedAmber::DataFrame.new(penguins).tdr # => RedAmber::DataFrame : 344 x 8 Vectors Vectors : 5 numeric, 3 strings # key type level data_preview 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124} 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124} 3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils 4 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils 5 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils 6 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils 7 :sex string 3 {"male"=>168, "female"=>165, nil=>11} 8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120} ``` ### DataFrame model ![dataframe model of RedAmber](doc/image/dataframe_model.png) For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame. ```ruby df = penguins.pick(:body_mass_g) # => # Vector : 1 numeric # key type level data_preview 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils ``` `DataFrame#assign` can accept a block and create new variables. ```ruby df.assign do {:body_mass_kg => penguins[:body_mass_g] / 1000.0} end # => # Vectors : 2 numeric # key type level data_preview 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils 2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils ``` Other DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove` and `rename` also accept a block. See [DataFrame.md](doc/DataFrame.md) for details. ## `RedAmber::Vector` Class `RedAmber::Vector` represents a series of data in the DataFrame. ```ruby penguins[:species] # => # ["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ] ``` Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html). See [Vector.md](doc/Vector.md) for details. ## TDR concept I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation). See [TDR.md](doc/tdr.md) for details. ## Development ```shell git clone https://github.com/heronshoes/red_amber.git cd red_amber bundle install bundle exec rake test ``` ## License The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).