README.md in red_amber-0.1.6 vs README.md in red_amber-0.1.7
- old
+ new
@@ -1,27 +1,32 @@
# RedAmber
+[![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
+[![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+
A simple dataframe library for Ruby (experimental).
-- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
+- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
## Requirements
```ruby
gem 'red-arrow', '>= 8.0.0'
-gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
-gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
+
+gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
+gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
```
## Installation
Install requirements before you install Red Amber.
- Apache Arrow GLib (>= 8.0.0)
-- Apache Parquet GLib (>= 8.0.0)
+- Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
+
See [Apache Arrow install document](https://arrow.apache.org/install/).
Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
Add this line to your Gemfile:
@@ -40,98 +45,78 @@
```shell
gem install red_amber
```
-(From v0.1.6)
-
-RedAmber uses TDR mode for `#inspect` and `#to_iruby` by default. If you prefer Table mode, please set the environment variable
-`RED_AMBER_OUTPUT_MODE` to `"table"`. See [TDR section](#TDR) for detail.
-
## `RedAmber::DataFrame`
Represents a set of data in a 2D shape. The underlying entity is a Red Arrow Table object.
```ruby
require 'red_amber' # require 'red-amber' is also OK.
require 'datasets-arrow'
arrow = Datasets::Penguins.new.to_arrow
-penguins = RedAmber::DataFrame.new(arrow)
-penguins.table
+RedAmber::DataFrame.new(arrow)
# =>
-#<Arrow::Table:0x111271098 ptr=0x7f9118b3e0b0>
- species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
- 0 Adelie Torgersen 39.100000 18.700000 181 3750 male 2007
- 1 Adelie Torgersen 39.500000 17.400000 186 3800 female 2007
- 2 Adelie Torgersen 40.300000 18.000000 195 3250 female 2007
- 3 Adelie Torgersen (null) (null) (null) (null) (null) 2007
- 4 Adelie Torgersen 36.700000 19.300000 193 3450 female 2007
- 5 Adelie Torgersen 39.300000 20.600000 190 3650 male 2007
- 6 Adelie Torgersen 38.900000 17.800000 181 3625 female 2007
- 7 Adelie Torgersen 39.200000 19.600000 195 4675 male 2007
- 8 Adelie Torgersen 34.100000 18.100000 193 3475 (null) 2007
- 9 Adelie Torgersen 42.000000 20.200000 190 4250 (null) 2007
-...
-334 Gentoo Biscoe 46.200000 14.100000 217 4375 female 2009
-335 Gentoo Biscoe 55.100000 16.000000 230 5850 male 2009
-336 Gentoo Biscoe 44.500000 15.700000 217 4875 (null) 2009
-337 Gentoo Biscoe 48.800000 16.200000 222 6000 male 2009
-338 Gentoo Biscoe 47.200000 13.700000 214 4925 female 2009
-339 Gentoo Biscoe (null) (null) (null) (null) (null) 2009
-340 Gentoo Biscoe 46.800000 14.300000 215 4850 female 2009
-341 Gentoo Biscoe 50.400000 15.700000 222 5750 male 2009
-342 Gentoo Biscoe 45.200000 14.800000 212 5200 female 2009
-343 Gentoo Biscoe 49.900000 16.100000 213 5400 male 2009
+#<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
+ <string> <string> <double> <double> <uint8> ... <uint16>
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
+ : : : : : : ... :
+342 Gentoo Biscoe 50.4 15.7 222 ... 2009
+343 Gentoo Biscoe 45.2 14.8 212 ... 2009
+344 Gentoo Biscoe 49.9 16.1 213 ... 2009
```
-By default, RedAmber shows self by compact transposed style. This unfamiliar style (TDR) is designed for
-the exploratory data processing. It keeps Vectors as row vectors, shows keys and types at a glance, shows levels
-for the 'factor-like' variables and shows the number of abnormal values like NaN and nil.
-
-```ruby
-penguins
-
-# =>
-RedAmber::DataFrame : 344 x 8 Vectors
-Vectors : 5 numeric, 3 strings
-# key type level data_preview
-1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
-2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
-3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
-4 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
-5 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils
-6 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
-7 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
-8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
-```
-
### DataFrame model
![dataframe model of RedAmber](doc/image/dataframe_model.png)
For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
```ruby
df = penguins.pick(:body_mass_g)
+df
+
# =>
-#<RedAmber::DataFrame : 344 x 1 Vector, 0x000000000000fa14>
-Vector : 1 numeric
-# key type level data_preview
-1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
+#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000015cc0>
+ body_mass_g
+ <uint16>
+ 1 3750
+ 2 3800
+ 3 3250
+ 4 (nil)
+ 5 3450
+ : :
+342 5750
+343 5200
+344 5400
```
`DataFrame#assign` creates new variables (column in the table).
```ruby
df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
+
# =>
-#<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
-Vectors : 2 numeric
-# key type level data_preview
-1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
-2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
+#<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
+ body_mass_g body_mass_kg
+ <uint16> <double>
+ 1 3750 3.8
+ 2 3800 3.8
+ 3 3250 3.3
+ 4 (nil) (nil)
+ 5 3450 3.5
+ : : :
+342 5750 5.8
+343 5200 5.2
+344 5400 5.4
```
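The arithmetic in the `assign` example above can be sketched in plain Ruby, with an Array standing in for a Vector. This is only an illustration under that assumption, not RedAmber's implementation; the helper name `grams_to_kg` is hypothetical. It shows element-wise division with nil values passed through unchanged.

```ruby
# Hypothetical helper (not part of RedAmber): converts grams to kilograms
# element by element, keeping nil where the input is nil.
def grams_to_kg(grams)
  grams.map { |g| g.nil? ? nil : g / 1000.0 }
end

# First five body_mass_g values from the penguins output above.
body_mass_g = [3750, 3800, 3250, nil, 3450]

p grams_to_kg(body_mass_g)
# prints [3.75, 3.8, 3.25, nil, 3.45]
```

The table preview above appears to print these values to one decimal place, so 3.75 is shown as 3.8.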
DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
This is an example of eliminating observations (rows in the table) that contain nil.
@@ -176,22 +161,12 @@
Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
See [Vector.md](doc/Vector.md) for details.
-## TDR
+## Jupyter notebook
-I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation).
-
-This library can be used with both TDR mode and usual Table mode.
-If you set the environment variable `RED_AMBER_OUTPUT_MODE` to `"table"`, output style by `inspect` and `to_iruby` is the Table mode. Other value including nil will output TDR style.
-
-You can switch the mode in Ruby like this.
-```ruby
-ENV['RED_AMBER_OUTPUT_STYLE'] = 'table' # => Table mode
-```
-
-For more detail information about TDR, see [TDR.md](doc/tdr.md).
+[47 Examples of Red Amber](doc/47_examples_of_red_amber.ipynb)
## Development
```shell
git clone https://github.com/heronshoes/red_amber.git