README.md in red_amber-0.1.4 vs README.md in red_amber-0.1.5

- old
+ new

@@ -1,22 +1,31 @@ # RedAmber A simple dataframe library for Ruby (experimental) - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) -- Simple API similar to [Rover-df](https://github.com/ankane/rover) +- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover) ## Requirements ```ruby -gem 'red-arrow', '>= 7.0.0' -gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet +gem 'red-arrow', '>= 8.0.0' +gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame ``` ## Installation +Install requirements before you install Red Amber. + +- Apache Arrow GLib (>= 8.0.0) +- Apache Parquet GLib (>= 8.0.0) + + See [Apache Arrow install document](https://arrow.apache.org/install/). + + Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber. + Add this line to your Gemfile: ```ruby gem 'red_amber' ``` @@ -39,12 +48,13 @@ ```ruby require 'red_amber' require 'datasets-arrow' -penguins = Datasets::Penguins.new.to_arrow -puts RedAmber::DataFrame.new(penguins).tdr +arrow = Datasets::Penguins.new.to_arrow +penguins = RedAmber::DataFrame.new(arrow) +penguins.tdr # => RedAmber::DataFrame : 344 x 8 Vectors Vectors : 5 numeric, 3 strings # key type level data_preview 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124} @@ -69,37 +79,61 @@ Vector : 1 numeric # key type level data_preview 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils ``` -`DataFrame#assign` can accept a block and create new variables. +`DataFrame#assign` creates new variables (column in the table). ```ruby -df.assign do - {:body_mass_kg => penguins[:body_mass_g] / 1000.0} -end +df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0) # => #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28> Vectors : 2 numeric # key type level data_preview 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils 2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils ``` -Other DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove` and `rename` also accept a block. +DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block. +This is an exaple to eliminate observations (row in the table) containing nil. + +```ruby +# remove all observation contains nil +nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) } +nil_removed.tdr +# => +RedAmber::DataFrame : 342 x 8 Vectors +Vectors : 5 numeric, 3 strings +# key type level data_preview +1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123} +2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124} +3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ] +4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ] +5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ] +6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ] +7 :sex string 3 {"male"=>168, "female"=>165, ""=>9} +8 :year int64 3 {2007=>109, 2008=>114, 2009=>119} +``` + +For this frequently needed task, we can do it much simpler. + +```ruby +penguins.remove_nil # => same result as above +``` + See [DataFrame.md](doc/DataFrame.md) for details. ## `RedAmber::Vector` Class `RedAmber::Vector` represents a series of data in the DataFrame. ```ruby -penguins[:species] +penguins[:bill_length_mm] # => -#<RedAmber::Vector(:string, size=344):0x000000000000f8e8> -["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ] +#<RedAmber::Vector(:double, size=344):0x000000000000f8fc> +[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ] ``` Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html). See [Vector.md](doc/Vector.md) for details.