README.md in red_amber-0.1.8 vs README.md in red_amber-0.2.0

- old
+ new

@@ -1,31 +1,37 @@
 # RedAmber
 
 [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
 [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
 
-A simple dataframe library for Ruby (experimental).
+A simple dataframe library for Ruby.
 
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
 ## Requirements
 
+Supported Ruby version is >= 2.7.
+
+Since v0.2.0, this library uses pattern matching which is an experimental feature in 2.7 . It is usable but a warning message will be shown in 2.7 .
+I recommend Ruby 3 for performance.
+
 ```ruby
-gem 'red-arrow', '>= 8.0.0'
+# Libraries required
+gem 'red-arrow', '>= 9.0.0'
-gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
+gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
 gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 ```
 
 ## Installation
 
 Install requirements before you install Red Amber.
 
-- Apache Arrow GLib (>= 8.0.0)
+- Apache Arrow GLib (>= 9.0.0)
 
-- Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
+- Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
 
 See [Apache Arrow install document](https://arrow.apache.org/install/).
 
 Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
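Note on the Requirements hunk: the new README text says the library uses pattern matching, which is why Ruby 2.7 is the minimum and why 2.7 prints an experimental-feature warning. The sketch below only illustrates the `case`/`in` syntax in plain Ruby. `describe_spec` is a hypothetical helper written for this note; it is not part of RedAmber and does not show how the library uses the feature internally.

```ruby
# Hypothetical helper (not a RedAmber API): classify an argument with
# case/in pattern matching, which requires Ruby 2.7 or later.
def describe_spec(spec)
  case spec
  in [Symbol => key, Array => values]
    # Array pattern: a two-element [key, values] pair.
    "pair #{key} with #{values.size} values"
  in { key: Symbol => key }
    # Hash pattern: a hash that has a Symbol under :key.
    "hash naming #{key}"
  in Symbol | String => name
    # Alternative pattern: either class, bound to one variable.
    "bare name #{name}"
  else
    'unsupported'
  end
end

puts describe_spec([:float, [0.0, 1.1]])
puts describe_spec({ key: :integer })
puts describe_spec('species')
```

Running this on Ruby 2.7 should work but emit the experimental warning the README mentions; on Ruby 3 the same code runs without it.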
@@ -120,26 +126,26 @@
 # Same as df.drop(:species, :island)
 df = df.drop(true, true, false)
 
 # =>
 #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
-    body_mass_g
-       <uint16>
-  1        3750
-  2        3800
-  3        3250
-  4       (nil)
-  5        3450
-  :           :
-342        5750
-343        5200
+    body_mass_g
+       <uint16>
+  1        3750
+  2        3800
+  3        3250
+  4       (nil)
+  5        3450
+  :           :
+342        5750
+343        5200
 344        5400
 ```
 
 Arrow data is immutable, so these methods always return an new object.
 
-`DataFrame#assign` creates new variables (column in the table).
+`DataFrame#assign` creates new columns or update existing columns.
 
 ![assign method image](doc/image/dataframe/assign.png)
 
 ```ruby
 # New column is created because ':body_mass_kg' is a new key.
@@ -206,11 +212,11 @@
 244 Gentoo Biscoe 49.9 16.1 213 ... 2009
 ```
 
 DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
 
-This example is usage of block to update numeric columns.
+This example is usage of block to update a column.
 
 ```ruby
 df = RedAmber::DataFrame.new(
   integer: [0, 1, 2, 3, nil],
   float: [0.0, 1.1, 2.2, Float::NAN, nil],
@@ -227,34 +233,32 @@
 3       2      2.2 C        true
 4       3      NaN D        false
 5   (nil)    (nil) (nil)    (nil)
 
 df.assign do
-  vectors.each_with_object({}) do |v, h|
-    h[v.key] = -v if v.numeric?
-  end
+  vectors.select(&:float?).map { |v| [v.key, -v] }
+  # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
 end
 
 # =>
-#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000009a1b4>
-  integer    float string   boolean
-  <uint8> <double> <string> <boolean>
-1       0     -0.0 A        true
-2     255     -1.1 B        false
-3     254     -2.2 C        true
-4     253      NaN D        false
-5   (nil)    (nil) (nil)    (nil)
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
+    index    float string
+  <uint8> <double> <string>
+1       0     -0.0 A
+2       1     -1.1 B
+3       2     -2.2 C
+4       3      NaN D
+5   (nil)    (nil) (nil)
 ```
 
-Negate (-@) method of unsigned integer Vector returns complement.
+Next example is to eliminate rows containing nil.
 
-Next example is to eliminate observations (row in the table) containing nil.
-
 ```ruby
 # remove all observations containing nil
 nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
 nil_removed.tdr
+
 # =>
 RedAmber::DataFrame : 342 x 8 Vectors
 Vectors : 5 numeric, 3 strings
 #   key                type   level data_preview
 1   :species           string     3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
@@ -271,10 +275,25 @@
 
 ```ruby
 penguins.remove_nil # => same result as above
 ```
 
+`DataFrame#summary` shows summary statistics in a DataFrame.
+
+```ruby
+puts penguins.summary.to_s(width: 82)
+
+# =>
+  variables            count     mean      std      min      25%   median      75%      max
+  <dictionary>      <uint16> <double> <double> <double> <double> <double> <double> <double>
+1 bill_length_mm         342    43.92     5.46     32.1    39.23    44.38     48.5     59.6
+2 bill_depth_mm          342    17.15     1.97     13.1     15.6    17.32     18.7     21.5
+3 flipper_length_mm      342   200.92    14.06    172.0    190.0    197.0    213.0    231.0
+4 body_mass_g            342  4201.75   801.95   2700.0   3550.0   4031.5   4750.0   6300.0
+5 year                   344  2008.03     0.82   2007.0   2007.0   2008.0   2009.0   2009.0
+```
+
 `DataFrame#group` method can be used for the grouping tasks.
 
 ```ruby
 starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
 starwars
@@ -309,11 +328,11 @@
 7 Twi'lek         2      179.0      55.0
 8 Mirialan        2      168.0      53.1
 9 Kaminoan        2      221.0      88.0
 ```
 
-See [DataFrame.md](doc/DataFrame.md) for details.
+See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 
 ## `RedAmber::Vector` Class
 
 `RedAmber::Vector` represents a series of data in the DataFrame.
 
@@ -353,19 +372,25 @@
 See [Vector.md](doc/Vector.md) for details.
 
 ## Jupyter notebook
 
-[53 Examples of Red Amber](doc/examples_of_red_amber.ipynb)
+[61 Examples of Red Amber](doc/examples_of_red_amber.ipynb)
 shows more examples in jupyter notebook.
 
 ## Development
 
 ```shell
 git clone https://github.com/heronshoes/red_amber.git
 cd red_amber
 bundle install
 bundle exec rake test
 ```
+
+I will appreciate if you could help to improve this project. Here are a few ways you can help:
+
+- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
+- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
+- Write, clarify, or fix documentation
 
 ## License
 
 The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
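
Note on hunk `@@ -227,34 +233,32 @@`: the removed README line says that negating an unsigned integer Vector returns the complement, and the removed table shows the `<uint8>` values 0, 1, 2, 3 becoming 0, 255, 254, 253. The plain-Ruby sketch below models that wraparound as negation modulo 2**8. It is an illustration only: `negate_uint8` and `UINT8_MODULUS` are names made up for this note, no RedAmber or Arrow call is involved, and the diff does not show how Arrow computes the result internally.

```ruby
# Plain-Ruby model of 8-bit unsigned negation (assumption: wraparound
# modulo 2**8, consistent with the values in the removed table).
UINT8_MODULUS = 256

# Negate each value modulo 256, passing nil through unchanged.
def negate_uint8(values)
  values.map { |v| v.nil? ? nil : (-v) % UINT8_MODULUS }
end

p negate_uint8([0, 1, 2, 3, nil]) # => [0, 255, 254, 253, nil]
```

The new README example avoids this effect by negating only the float column (`vectors.select(&:float?)`), so the integer column is left unchanged.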