README.md in red_amber-0.1.8 vs README.md in red_amber-0.2.0
- old
+ new
@@ -1,31 +1,37 @@
# RedAmber
[![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
[![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
-A simple dataframe library for Ruby (experimental).
+A simple dataframe library for Ruby.
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
## Requirements
+Supported Ruby version is >= 2.7.
+
+Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It works, but Ruby 2.7 prints a warning message.
+I recommend Ruby 3 for performance.
+
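For readers unfamiliar with the feature, here is a minimal sketch of Ruby's `case`/`in` pattern matching (experimental in 2.7, no longer experimental since 3.0). The README does not show how RedAmber uses it internally; `describe_column` and the hash layout below are hypothetical and only illustrate the syntax.

```ruby
# Hypothetical helper, not part of RedAmber: classify a column description
# using case/in pattern matching.
def describe_column(spec)
  case spec
  in { key: Symbol => key, values: [] }
    "#{key}: empty"
  in { key: Symbol => key, values: [Integer | Float, *] => values }
    "#{key}: numeric, #{values.size} values"
  in { key: Symbol => key, values: Array => values }
    "#{key}: other, #{values.size} values"
  else
    'unknown'
  end
end

puts describe_column({ key: :float, values: [0.0, 1.1] })  # prints "float: numeric, 2 values"
puts describe_column({ key: :string, values: %w[A B C] })  # prints "string: other, 3 values"
```

On Ruby 2.7 this code runs but emits the experimental-feature warning mentioned above.
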
```ruby
-gem 'red-arrow', '>= 8.0.0'
+# Libraries required
+gem 'red-arrow', '>= 9.0.0'
-gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
+gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
```
## Installation
Install requirements before you install Red Amber.
-- Apache Arrow GLib (>= 8.0.0)
+- Apache Arrow GLib (>= 9.0.0)
-- Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
+- Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
See [Apache Arrow install document](https://arrow.apache.org/install/).
A minimal installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section of the CI test workflow](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
@@ -120,26 +126,26 @@
# Same as df.drop(:species, :island)
df = df.drop(true, true, false)
# =>
#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
- body_mass_g
- <uint16>
- 1 3750
- 2 3800
- 3 3250
- 4 (nil)
- 5 3450
- : :
-342 5750
-343 5200
+ body_mass_g
+ <uint16>
+ 1 3750
+ 2 3800
+ 3 3250
+ 4 (nil)
+ 5 3450
+ : :
+342 5750
+343 5200
344 5400
```
Arrow data is immutable, so these methods always return a new object.
-`DataFrame#assign` creates new variables (column in the table).
+`DataFrame#assign` creates new columns or updates existing columns.
![assign method image](doc/image/dataframe/assign.png)
```ruby
# New column is created because ':body_mass_kg' is a new key.
@@ -206,11 +212,11 @@
244 Gentoo Biscoe 49.9 16.1 213 ... 2009
```
DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
-This example is usage of block to update numeric columns.
+This example uses a block to update a column.
```ruby
df = RedAmber::DataFrame.new(
integer: [0, 1, 2, 3, nil],
float: [0.0, 1.1, 2.2, Float::NAN, nil],
@@ -227,34 +233,32 @@
3 2 2.2 C true
4 3 NaN D false
5 (nil) (nil) (nil) (nil)
df.assign do
- vectors.each_with_object({}) do |v, h|
- h[v.key] = -v if v.numeric?
- end
+ vectors.select(&:float?).map { |v| [v.key, -v] }
+ # => returns [[:float, [-0.0, -1.1, -2.2, NaN, nil]]] (pairs of key and negated Vector)
end
# =>
-#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000009a1b4>
- integer float string boolean
- <uint8> <double> <string> <boolean>
-1 0 -0.0 A true
-2 255 -1.1 B false
-3 254 -2.2 C true
-4 253 NaN D false
-5 (nil) (nil) (nil) (nil)
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
+ index float string
+ <uint8> <double> <string>
+1 0 -0.0 A
+2 1 -1.1 B
+3 2 -2.2 C
+4 3 NaN D
+5 (nil) (nil) (nil)
```
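The shape of the block's return value (an array of `[key, values]` pairs that replace the columns with the same keys) can be sketched in plain Ruby without Arrow. This is only an illustration under assumptions: the hash stands in for a DataFrame, `float_column` is a hypothetical stand-in for `Vector#float?`, and the sample data mirrors the example above.

```ruby
# Plain-Ruby stand-in for a DataFrame: column key => array of values.
columns = {
  integer: [0, 1, 2, 3, nil],
  float: [0.0, 1.1, 2.2, Float::NAN, nil],
  string: ['A', 'B', 'C', 'D', nil]
}

# Hypothetical predicate standing in for Vector#float?.
float_column = ->(values) { values.compact.any? && values.compact.all?(Float) }

# Build [key, negated values] pairs for float columns only; nil stays nil.
pairs = columns
        .select { |_key, values| float_column.call(values) }
        .map { |key, values| [key, values.map { |x| x.nil? ? nil : -x }] }

# Columns with matching keys are replaced; the others are left untouched.
updated = columns.merge(pairs.to_h)
```

Only `:float` is replaced in `updated`; `:integer` and `:string` keep their original values.
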
-Negate (-@) method of unsigned integer Vector returns complement.
+The next example eliminates rows containing nil.
-Next example is to eliminate observations (row in the table) containing nil.
-
```ruby
# remove all observations containing nil
nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
nil_removed.tdr
+
# =>
RedAmber::DataFrame : 342 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key type level data_preview
1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
@@ -271,10 +275,25 @@
```ruby
penguins.remove_nil # => same result as above
```
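The logic behind `vectors.map(&:is_nil).reduce(&:|)` is to build one nil mask per column and OR the masks element-wise, so that a row is flagged when any column is nil there. A plain-Ruby sketch of that idea follows; the hash of arrays and the sample values are made up for illustration and are not the penguins dataset.

```ruby
# Plain-Ruby stand-in for a DataFrame (illustrative values only).
columns = {
  species: ['Adelie', 'Adelie', nil, 'Gentoo'],
  bill_length_mm: [39.1, nil, 40.3, 46.1],
  year: [2007, 2007, 2008, 2009]
}

# One boolean mask per column: true where the value is nil.
nil_masks = columns.values.map { |values| values.map(&:nil?) }

# OR the masks element-wise: a row is flagged if any column is nil there.
row_has_nil = nil_masks.reduce { |acc, mask| acc.zip(mask).map { |a, b| a | b } }

# Keep only the rows that are not flagged.
nil_removed = columns.transform_values do |values|
  values.reject.with_index { |_value, i| row_has_nil[i] }
end

p row_has_nil # prints [false, true, true, false]
```

Here rows 2 and 3 are dropped, leaving two complete rows in every column.
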
+`DataFrame#summary` shows summary statistics in a DataFrame.
+
+```ruby
+puts penguins.summary.to_s(width: 82)
+
+# =>
+ variables count mean std min 25% median 75% max
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
+1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
+2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
+3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
+4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
+5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
+```
+
The `DataFrame#group` method can be used for grouping tasks.
```ruby
starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
starwars
@@ -309,11 +328,11 @@
7 Twi'lek 2 179.0 55.0
8 Mirialan 2 168.0 53.1
9 Kaminoan 2 221.0 88.0
```
-See [DataFrame.md](doc/DataFrame.md) for details.
+See [DataFrame.md](doc/DataFrame.md) for other examples and details.
## `RedAmber::Vector`
Class `RedAmber::Vector` represents a series of data in the DataFrame.
@@ -353,19 +372,25 @@
See [Vector.md](doc/Vector.md) for details.
## Jupyter notebook
-[53 Examples of Red Amber](doc/examples_of_red_amber.ipynb)
+[61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
## Development
```shell
git clone https://github.com/heronshoes/red_amber.git
cd red_amber
bundle install
bundle exec rake test
```
+
+I would appreciate your help in improving this project. Here are a few ways you can help:
+
+- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
+- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
+- Write, clarify, or fix documentation
## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).