README.md in red_amber-0.1.4 vs README.md in red_amber-0.1.5
- old
+ new
@@ -1,22 +1,31 @@
# RedAmber
A simple dataframe library for Ruby (experimental)
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
-- Simple API similar to [Rover-df](https://github.com/ankane/rover)
+- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
## Requirements
```ruby
-gem 'red-arrow', '>= 7.0.0'
-gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
+gem 'red-arrow', '>= 8.0.0'
+gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
```
## Installation
+Install requirements before you install Red Amber.
+
+- Apache Arrow GLib (>= 8.0.0)
+- Apache Parquet GLib (>= 8.0.0)
+
+ See [Apache Arrow install document](https://arrow.apache.org/install/).
+
+ A minimal installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section of the CI test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
+
Add this line to your Gemfile:
```ruby
gem 'red_amber'
```
@@ -39,12 +48,13 @@
```ruby
require 'red_amber'
require 'datasets-arrow'
-penguins = Datasets::Penguins.new.to_arrow
-puts RedAmber::DataFrame.new(penguins).tdr
+arrow = Datasets::Penguins.new.to_arrow
+penguins = RedAmber::DataFrame.new(arrow)
+penguins.tdr
# =>
RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key type level data_preview
1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
@@ -69,37 +79,61 @@
Vector : 1 numeric
# key type level data_preview
1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
```
-`DataFrame#assign` can accept a block and create new variables.
+`DataFrame#assign` creates new variables (columns in the table).
```ruby
-df.assign do
- {:body_mass_kg => penguins[:body_mass_g] / 1000.0}
-end
+df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
# =>
#<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
Vectors : 2 numeric
# key type level data_preview
1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
```
-Other DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove` and `rename` also accept a block.
+DataFrame manipulation methods such as `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
+This example eliminates observations (rows in the table) that contain nil.
+
+```ruby
+# remove all observations that contain nil
+nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+nil_removed.tdr
+# =>
+RedAmber::DataFrame : 342 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key type level data_preview
+1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
+2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
+3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
+6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
+7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
+8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
+```
+
+For this frequently needed task, there is a much simpler way.
+
+```ruby
+penguins.remove_nil # => same result as above
+```
+
See [DataFrame.md](doc/DataFrame.md) for details.
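
To make the logic of the `remove { vectors.map(&:is_nil).reduce(&:|) }` block easier to follow, here is a minimal plain-Ruby sketch of the same idea. It is not RedAmber code: ordinary arrays stand in for Vectors, and the names `columns`, `nil_masks` and `row_has_nil` are hypothetical, chosen only for illustration. It assumes that `is_nil` yields one boolean per element and that `|` combines two masks element-wise.

```ruby
# Plain-Ruby sketch (RedAmber is not required); arrays stand in for Vectors.
columns = {
  body_mass_g: [3750, 3800, nil, 3450],
  sex: ["male", "female", nil, "female"],
  bill_length_mm: [39.1, nil, 40.3, 36.7],
}

# One boolean mask per column: true where the value is nil.
nil_masks = columns.values.map { |values| values.map(&:nil?) }

# OR the masks element-wise, so a row is flagged if any column is nil there.
row_has_nil = nil_masks.reduce { |acc, mask| acc.zip(mask).map { |a, b| a | b } }

# Keep only the rows that were not flagged.
nil_removed = columns.transform_values do |values|
  values.reject.with_index { |_, i| row_has_nil[i] }
end
```

Here the second and third rows each contain a nil in at least one column, so only the first and last rows remain in `nil_removed`.
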
## `RedAmber::Vector`
Class `RedAmber::Vector` represents a series of data in the DataFrame.
```ruby
-penguins[:species]
+penguins[:bill_length_mm]
# =>
-#<RedAmber::Vector(:string, size=344):0x000000000000f8e8>
-["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ]
+#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
+[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
```
Vectors accept some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
See [Vector.md](doc/Vector.md) for details.