README.md in red_amber-0.1.2 vs README.md in red_amber-0.1.3

- old
+ new

@@ -43,32 +43,46 @@
   - `RedAmber::DataFrame.new(Arrow::Table.new(x: [1, 2, 3]))`
 - [x] `new` from a Rover::DataFrame
   - `RedAmber::DataFrame.new(Rover::DataFrame.new(x: [1, 2, 3]))`
-- [ ] `load` (class method)
+- [x] `load` (class method)
   - [x] from a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
     - `RedAmber::DataFrame.load("test/entity/with_header.csv")`
   - [x] from a string buffer
   - [x] from a URI
     - `RedAmber::DataFrame.load(URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv"))`
-  - [ ] from a parquet file
+  - [x] from a Parquet file

-- [ ] `save` (instance method)
+    `red-parquet` gem is required.
+    ```ruby
+    require 'parquet'
+    dataframe = RedAmber::DataFrame.load("file.parquet")
+    ```
+
+- [x] `save` (instance method)
+
   - [x] to a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
   - [x] to a string buffer
   - [x] to a URI
-  - [ ] to a parquet file
+  - [x] to a Parquet file
+    `red-parquet` gem is required.
+
+    ```ruby
+    require 'parquet'
+    dataframe.save("file.parquet")
+    ```
+
 ### Properties
 - [x] `table`
   Reader of Arrow::Table object inside.
@@ -127,45 +141,54 @@
   Returns a `Rover::DataFrame`.
 - [x] `inspect(tally_level: 5, max_element: 5)`
-  Shows some information about self.
+  Shows some information about self in a transposed style.
 ```ruby
-hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
-RedAmber::DataFrame.new(hash)
+require 'red_amber'
+require 'datasets-arrow'
+
+penguins = Datasets::Penguins.new.to_arrow
+RedAmber::DataFrame.new(penguins)
 # =>
-RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
-Variables : 2 numeric, 1 string
-# key type   level data_preview
-1 :a  uint8      3 [1, 2, 3]
-2 :b  string     3 [A, B, C]
-3 :c  double     3 [1.0, 2.0, 3.0]
+RedAmber::DataFrame : 344 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key                type   level data_preview
+1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
+3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
+5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
+6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
+7 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
+8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}
 ```
   - tally_level: max level to use tally mode
   - max_element: max num of element to show values in each row
 ### Selecting
 - [x] Select columns by `[]` as `[key]`, `[keys]`, `[keys[index]]`
   - Key in a Symbol: `df[:symbol]`
   - Key in a String: `df["string"]`
-  - Keys in an Array: `df[:symbol1`, `"string"`, `:symbol2`
+  - Keys in an Array: `df[:symbol1, "string", :symbol2]`
   - Keys in indices: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
   - Keys in a Range: An endless Range can be used to represent keys.
+
 ```ruby
 hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
 df = RedAmber::DataFrame.new(hash)
 df[:b..:c, "a"]
 # =>
-RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
-Variables : 2 numeric, 1 string
+RedAmber::DataFrame : 3 x 3 Vectors
+Vectors : 2 numeric, 1 string
 # key type   level data_preview
-1 :b  string     3 [A, B, C]
+1 :b  string     3 ["A", "B", "C"]
 2 :c  double     3 [1.0, 2.0, 3.0]
 3 :a  uint8      3 [1, 2, 3]
 ```
 - [x] Select rows by `[]` as `[index]`, `[range]`, `[array]`
@@ -256,93 +279,130 @@
 - [ ] `each_chunk`
 - [x] `tally`
-- [ ] `n_nulls`
+- [x] `n_nils`, `n_nans`
+  - `n_nulls` is an alias of `n_nils`
+
+- [x] `inspect(limit: 80)`
+
+  - `limit` sets size limit to display long array.
+
 ### Functions
-#### Unary aggregations: vector.func => Scalar
+#### Unary aggregations: vector.func => scalar

-| Method |Boolean|Numeric|String|Remarks|
-| ------------ | --- | --- | --- | ----- |
-|[x] `all` | [x] | | | |
-|[x] `any` | [x] | | | |
-|[x] `approximate_median`| | [x] | | |
-|[x] `count` | [x] | [x] | [x] | |
-|[x] `count_distinct`| [x] | [x] | [x] | |
-|[x] `count_uniq` | [x] | [x] | [x] |an alias of `count_distinct`|
-|[ ] `index` | | | | |
-|[x] `max` | [x] | [x] | [x] | |
-|[x] `mean` | [x] | [x] | | |
-|[x] `min` | [x] | [x] | [x] | |
-|[ ] `min_max` | | | | |
-|[ ] `mode` | | | | |
-|[x] `product` | [x] | [x] | | |
-|[ ] `quantile`| | | | |
-|[x] `stddev` | | [x] | | |
-|[x] `sum` | [x] | [x] | | |
-|[ ] `tdigest` | | | | |
-|[x] `variance`| | [x] | | |
+| Method |Boolean|Numeric|String|Options|Remarks|
+| ----------- | --- | --- | --- | --- | --- |
+| ✓ `all` | ✓ | | | ✓ ScalarAggregate| |
+| ✓ `any` | ✓ | | | ✓ ScalarAggregate| |
+| ✓ `approximate_median`| |✓| | ✓ ScalarAggregate| alias `median`|
+| ✓ `count` | ✓ | ✓ | ✓ | ✓ Count | |
+| ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓ Count |alias `count_uniq`|
+|[ ]`index` | [ ] | [ ] | [ ] |[ ] Index | |
+| ✓ `max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
+| ✓ `mean` | ✓ | ✓ | | ✓ ScalarAggregate| |
+| ✓ `min` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
+|[ ]`min_max` | [ ] | [ ] | [ ] |[ ] ScalarAggregate| |
+|[ ]`mode` | | [ ] | |[ ] Mode | |
+| ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
+|[ ]`quantile`| | [ ] | |[ ] Quantile| |
+|[ ]`stddev` | | ✓ | |[ ] Variance| |
+| ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
+|[ ]`tdigest` | | [ ] | |[ ] TDigest | |
+|[ ]`variance`| | ✓ | |[ ] Variance| |

-#### Unary element-wise: vector.func => Vector
-| Method |Boolean|Numeric|String|Remarks|
-| ------------ | --- | --- | --- | ----- |
-|[x] `-@` | | [x] | |as `-vector`|
-|[x] `negate` | | [x] | |`-@` |
-|[x] `abs` | | [x] | | |
-|[ ] `acos` | | [ ] | | |
-|[ ] `asin` | | [ ] | | |
-|[x] `atan` | | [x] | | |
-|[ ] `ceil` | | [x] | | |
-|[x] `cos` | | [x] | | |
-|[ ] `floor` | | [x] | | |
-|[ ] `ln` | | [ ] | | |
-|[ ] `log10` | | [ ] | | |
-|[ ] `log1p` | | [ ] | | |
-|[ ] `log2` | | [ ] | | |
-|[x] `sign` | | [x] | | |
-|[x] `sin` | | [x] | | |
-|[x] `tan` | | [x] | | |
-|[ ] `trunc` | | [x] | | |
+Options can be used as follows.
+See the [document of C++ function](https://arrow.apache.org/docs/cpp/compute.html) for detail.

-#### Binary element-wise: vector.func(vector) => Vector
+```ruby
+double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
+#=>
+#<RedAmber::Vector(:double, size=6):0x000000000000f910>
+[1.0, NaN, -Infinity, Infinity, nil, 0.0]

-| Method |Boolean|Numeric|String|Remarks|
-| ------------------ | --- | --- | --- | ----- |
-|[x] `add` | | [x] | | `+` |
-|[x] `atan2` | | [x] | | |
-|[x] `and` | [x] | | | |
-|[x] `and_kleene` | [x] | | | |
-|[x] `and_not` | [x] | | | |
-|[x] `and_not_kleene`| [x] | | | |
-|[x] `bit_wise_and` | |([x])| |`&`, integer only|
-|[ ] `bit_wise_not` | |([x])| |`!`, integer only|
-|[x] `bit_wise_or` | |([x])| |`|`, integer only|
-|[x] `bit_wise_xor` | |([x])| |`^`, integer only|
-|[x] `divide` | | [x] | | `/` |
-|[x] `equal` | [x] | [x] | [x] |`==`, alias `eq`|
-|[x] `greater` | [x] | [x] | [x] |`>`, alias `gt`|
-|[x] `greater_equal` | [x] | [x] | [x] |`>=`, alias `ge`|
-|[x] `less` | [x] | [x] | [x] |`<`, alias `lt`|
-|[x] `less_equal` | [x] | [x] | [x] |`<=`, alias `le`|
-|[ ] `logb` | | [ ] | | |
-|[ ] `mod` | | [ ] | | |
-|[x] `multiply` | | [x] | | `*` |
-|[x] `not_equal` | [x] | [x] | [x] |`!=`, alias `ne`|
-|[x] `or` | [x] | | | |
-|[x] `or_kleene` | [x] | | | |
-|[x] `power` | | [x] | | `**` |
-|[x] `subtract` | | [x] | | `-` |
-|[x] `shift_left` | |([x])| |`<<`, integer only|
-|[x] `shift_right` | |([x])| |`>>`, integer only|
-|[x] `xor` | [x] | | | |
+double.count #=> 5
+double.count(opts: {mode: :only_valid}) #=> 5, default
+double.count(opts: {mode: :only_null}) #=> 1
+double.count(opts: {mode: :all}) #=> 6
+boolean = RedAmber::Vector.new([true, true, nil])
+#=>
+#<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
+[true, true, nil]
+
+boolean.all #=> true
+boolean.all(opts: {skip_nulls: true}) #=> true
+boolean.all(opts: {skip_nulls: false}) #=> false
+```
+
+#### Unary element-wise: vector.func => vector
+
+| Method |Boolean|Numeric|String|Options|Remarks|
+| ------------ | --- | --- | --- | --- | ----- |
+| ✓ `-@` | | ✓ | | |as `-vector`|
+| ✓ `negate` | | ✓ | | |`-@` |
+| ✓ `abs` | | ✓ | | | |
+|[ ]`acos` | | [ ] | | | |
+|[ ]`asin` | | [ ] | | | |
+| ✓ `atan` | | ✓ | | | |
+| ✓ `bit_wise_not`| | (✓) | | |integer only|
+|[ ]`ceil` | | ✓ | | | |
+| ✓ `cos` | | ✓ | | | |
+|[ ]`floor` | | ✓ | | | |
+| ✓ `invert` | ✓ | | | |`!`, alias `not`|
+|[ ]`ln` | | [ ] | | | |
+|[ ]`log10` | | [ ] | | | |
+|[ ]`log1p` | | [ ] | | | |
+|[ ]`log2` | | [ ] | | | |
+|[ ]`round` | | [ ] | |[ ] Round| |
+|[ ]`round_to_multiple`| | [ ] | |[ ] RoundToMultiple| |
+| ✓ `sign` | | ✓ | | | |
+| ✓ `sin` | | ✓ | | | |
+| ✓ `tan` | | ✓ | | | |
+|[ ]`trunc` | | ✓ | | | |
+
+#### Binary element-wise: vector.func(vector) => vector
+
+| Method |Boolean|Numeric|String|Options|Remarks|
+| ----------------- | --- | --- | --- | --- | ----- |
+| ✓ `add` | | ✓ | | | `+` |
+| ✓ `atan2` | | ✓ | | | |
+| ✓ `and_kleene` | ✓ | | | | `&` |
+| ✓ `and_org` | ✓ | | | |`and` in Red Arrow|
+| ✓ `and_not` | ✓ | | | | |
+| ✓ `and_not_kleene`| ✓ | | | | |
+| ✓ `bit_wise_and` | | (✓) | | |integer only|
+| ✓ `bit_wise_or` | | (✓) | | |integer only|
+| ✓ `bit_wise_xor` | | (✓) | | |integer only|
+| ✓ `divide` | | ✓ | | | `/` |
+| ✓ `equal` | ✓ | ✓ | ✓ | |`==`, alias `eq`|
+| ✓ `greater` | ✓ | ✓ | ✓ | |`>`, alias `gt`|
+| ✓ `greater_equal` | ✓ | ✓ | ✓ | |`>=`, alias `ge`|
+| ✓ `is_finite` | | ✓ | | | |
+| ✓ `is_inf` | | ✓ | | | |
+| ✓ `is_na` | ✓ | ✓ | ✓ | | |
+| ✓ `is_nan` | | ✓ | | | |
+|[ ]`is_nil` | ✓ | ✓ | ✓ |[ ] Null|alias `is_null`|
+| ✓ `is_valid` | ✓ | ✓ | ✓ | | |
+| ✓ `less` | ✓ | ✓ | ✓ | |`<`, alias `lt`|
+| ✓ `less_equal` | ✓ | ✓ | ✓ | |`<=`, alias `le`|
+|[ ]`logb` | | [ ] | | | |
+|[ ]`mod` | | [ ] | | | `%` |
+| ✓ `multiply` | | ✓ | | | `*` |
+| ✓ `not_equal` | ✓ | ✓ | ✓ | |`!=`, alias `ne`|
+| ✓ `or_kleene` | ✓ | | | | `\|` |
+| ✓ `or_org` | ✓ | | | |`or` in Red Arrow|
+| ✓ `power` | | ✓ | | | `**` |
+| ✓ `subtract` | | ✓ | | | `-` |
+| ✓ `shift_left` | | (✓) | | |`<<`, integer only|
+| ✓ `shift_right` | | (✓) | | |`>>`, integer only|
+| ✓ `xor` | ✓ | | | | `^` |
+
 ##### (Not implemented)
-- [ ] invert, round, round_to_multiple
 - [ ] sort, sort_index
-- [ ] minmax, var, median, quantile
 - [ ] argmin, argmax
 - [ ] (array functions)
 - [ ] (strings functions)
 - [ ] (temporal functions)
 - [ ] (conditional functions)
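
The new binary table separates `and_kleene`/`or_kleene` from `and_org`/`or_org` (Red Arrow's plain `and`/`or`). In Apache Arrow, the plain logical kernels emit null whenever either input is null. The Kleene variants instead treat null as "unknown", so `false AND null` is `false` and `true OR null` is `true`. The plain-Ruby sketch below illustrates those rules, using `nil` for null. The helper names `and_propagate`, `and_kleene` and `or_kleene` are hypothetical illustrations, not RedAmber's or Red Arrow's implementation.

```ruby
# Plain-Ruby sketch of Arrow's two flavours of boolean AND/OR.
# nil stands for an Arrow null ("unknown"). The helper names are
# hypothetical; this is not RedAmber's or Red Arrow's implementation.

# Null-propagating AND (Arrow's plain `and`, exposed as `and_org`):
# a nil on either side always yields nil.
def and_propagate(a, b)
  return nil if a.nil? || b.nil?
  a && b
end

# Kleene AND (`and_kleene`): false wins, so false AND unknown is false.
def and_kleene(a, b)
  return false if a == false || b == false
  return nil if a.nil? || b.nil?
  true
end

# Kleene OR (`or_kleene`): true wins, so true OR unknown is true.
def or_kleene(a, b)
  return true if a == true || b == true
  return nil if a.nil? || b.nil?
  false
end

left  = [true, false, nil, nil]
right = [nil, nil, true, false]

p left.zip(right).map { |a, b| and_propagate(a, b) } # => [nil, nil, nil, nil]
p left.zip(right).map { |a, b| and_kleene(a, b) }    # => [nil, false, nil, false]
p left.zip(right).map { |a, b| or_kleene(a, b) }     # => [true, nil, true, nil]
```

The Kleene variants are usually what you want when nulls mean "missing but could be anything", because a definite `false` (for AND) or `true` (for OR) still decides the result.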
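
The `count` example in the new README relies on Arrow's `CountOptions` modes. `only_valid` (the default) counts non-null values, `only_null` counts nulls, and `all` counts every element. NaN and the infinities are ordinary double values, not nulls, which is why the six-element vector gives 5 valid values and 1 null. What follows is a minimal plain-Ruby sketch of the same rule, assuming `nil` stands for null. `count_with_mode` is a hypothetical helper, not part of RedAmber.

```ruby
# Hypothetical helper mirroring Arrow's CountOptions modes.
# nil stands for a null; NaN and Infinity are ordinary (valid) doubles.
def count_with_mode(values, mode: :only_valid)
  case mode
  when :only_valid then values.count { |v| !v.nil? } # non-null values
  when :only_null  then values.count(&:nil?)          # nulls only
  when :all        then values.size                   # every element
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

double = [1.0, Float::NAN, -Float::INFINITY, Float::INFINITY, nil, 0.0]

p count_with_mode(double)                   # => 5
p count_with_mode(double, mode: :only_null) # => 1
p count_with_mode(double, mode: :all)       # => 6
```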