README.md in red_amber-0.1.2 vs README.md in red_amber-0.1.3
- old
+ new
@@ -43,32 +43,46 @@
- `RedAmber::DataFrame.new(Arrow::Table.new(x: [1, 2, 3]))`
- [x] `new` from a Rover::DataFrame
- `RedAmber::DataFrame.new(Rover::DataFrame.new(x: [1, 2, 3]))`
-- [ ] `load` (class method)
+- [x] `load` (class method)
- [x] from a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
- `RedAmber::DataFrame.load("test/entity/with_header.csv")`
- [x] from a string buffer
- [x] from a URI
- `RedAmber::DataFrame.load(URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv"))`
- - [ ] from a parquet file
+ - [x] from a Parquet file
-- [ ] `save` (instance method)
+ `red-parquet` gem is required.
+ ```ruby
+ require 'parquet'
+ dataframe = RedAmber::DataFrame.load("file.parquet")
+ ```
+
+- [x] `save` (instance method)
+
- [x] to a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
- [x] to a string buffer
- [x] to a URI
- - [ ] to a parquet file
+ - [x] to a Parquet file
+ `red-parquet` gem is required.
+
+ ```ruby
+ require 'parquet'
+ dataframe.save("file.parquet")
+ ```
+
### Properties
- [x] `table`
Reader of Arrow::Table object inside.
@@ -127,45 +141,54 @@
Returns a `Rover::DataFrame`.
- [x] `inspect(tally_level: 5, max_element: 5)`
- Shows some information about self.
+ Shows some information about self in a transposed style.
```ruby
-hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
-RedAmber::DataFrame.new(hash)
+require 'red_amber'
+require 'datasets-arrow'
+
+penguins = Datasets::Penguins.new.to_arrow
+RedAmber::DataFrame.new(penguins)
# =>
-RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
-Variables : 2 numeric, 1 string
-# key type level data_preview
-1 :a uint8 3 [1, 2, 3]
-2 :b string 3 [A, B, C]
-3 :c double 3 [1.0, 2.0, 3.0]
+RedAmber::DataFrame : 344 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key type level data_preview
+1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
+3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+4 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
+5 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils
+6 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
+7 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
+8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
```
- tally_level: maximum level (number of unique values) at which a row's preview uses tally mode
- max_element: maximum number of elements shown in each row's preview
### Selecting
- [x] Select columns by `[]` as `[key]`, `[keys]`, `[keys[index]]`
- Key in a Symbol: `df[:symbol]`
- Key in a String: `df["string"]`
- - Keys in an Array: `df[:symbol1`, `"string"`, `:symbol2`
+ - Keys in an Array: `df[:symbol1, "string", :symbol2]`
- Keys selected by index: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
- Keys in a Range:
An endless Range can be used to represent keys.
+
```ruby
hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
df = RedAmber::DataFrame.new(hash)
df[:b..:c, "a"]
# =>
-RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
-Variables : 2 numeric, 1 string
+RedAmber::DataFrame : 3 x 3 Vectors
+Vectors : 2 numeric, 1 string
# key type level data_preview
-1 :b string 3 [A, B, C]
+1 :b string 3 ["A", "B", "C"]
2 :c double 3 [1.0, 2.0, 3.0]
3 :a uint8 3 [1, 2, 3]
```
- [x] Select rows by `[]` as `[index]`, `[range]`, `[array]`
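The `tally_level` and `max_element` options of `inspect` decide how each row's `data_preview` is built. The following plain-Ruby sketch imitates that rule as it appears in the penguins output. It assumes two things. First, a row switches to tally mode when its number of distinct values (`level`, with `nil` counted as a value) is at most `tally_level`. Second, a row otherwise lists its first `max_element` values plus a nil count. `preview` is a hypothetical helper, not part of RedAmber.

```ruby
# Hypothetical helper imitating the data_preview column of
# DataFrame#inspect; it is not RedAmber's own implementation.
def preview(values, tally_level: 5, max_element: 5)
  level = values.uniq.size # number of distinct values, nil included

  if level <= tally_level
    # Tally mode: count each distinct value, in order of first appearance.
    pairs = values.tally.map { |value, count| "#{value.inspect}=>#{count}" }
    "{#{pairs.join(', ')}}"
  else
    # List mode: show the first max_element values, then the nil count.
    shown = values.first(max_element).map(&:inspect).join(", ")
    shown += ", ... " if values.size > max_element
    nils = values.count(nil)
    nils.zero? ? "[#{shown}]" : "[#{shown}], #{nils} nils"
  end
end

puts preview(%w[Adelie Adelie Gentoo Chinstrap Gentoo])
# {"Adelie"=>2, "Gentoo"=>2, "Chinstrap"=>1}
puts preview([39.1, 39.5, 40.3, nil, 36.7, 39.3, nil])
# [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
```

Under this rule with `tally_level: 5`, the three-level `species` and `sex` columns of the penguins example fall into tally mode. `bill_length_mm`, with 165 levels, is listed element by element.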
@@ -256,93 +279,130 @@
- [ ] `each_chunk`
- [x] `tally`
-- [ ] `n_nulls`
+- [x] `n_nils`, `n_nans`
+ - `n_nulls` is an alias of `n_nils`
+
+- [x] `inspect(limit: 80)`
+
+ - `limit` sets the size limit for displaying a long array.
+
### Functions
-#### Unary aggregations: vector.func => Scalar
+#### Unary aggregations: vector.func => scalar
-| Method |Boolean|Numeric|String|Remarks|
-| ------------ | --- | --- | --- | ----- |
-|[x] `all` | [x] | | | |
-|[x] `any` | [x] | | | |
-|[x] `approximate_median`| | [x] | | |
-|[x] `count` | [x] | [x] | [x] | |
-|[x] `count_distinct`| [x] | [x] | [x] | |
-|[x] `count_uniq` | [x] | [x] | [x] |an alias of `count_distinct`|
-|[ ] `index` | | | | |
-|[x] `max` | [x] | [x] | [x] | |
-|[x] `mean` | [x] | [x] | | |
-|[x] `min` | [x] | [x] | [x] | |
-|[ ] `min_max` | | | | |
-|[ ] `mode` | | | | |
-|[x] `product` | [x] | [x] | | |
-|[ ] `quantile`| | | | |
-|[x] `stddev` | | [x] | | |
-|[x] `sum` | [x] | [x] | | |
-|[ ] `tdigest` | | | | |
-|[x] `variance`| | [x] | | |
+| Method |Boolean|Numeric|String|Options|Remarks|
+| ----------- | --- | --- | --- | --- | --- |
+| ✓ `all` | ✓ | | | ✓ ScalarAggregate| |
+| ✓ `any` | ✓ | | | ✓ ScalarAggregate| |
+| ✓ `approximate_median`| |✓| | ✓ ScalarAggregate| alias `median`|
+| ✓ `count` | ✓ | ✓ | ✓ | ✓ Count | |
+| ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓ Count |alias `count_uniq`|
+|[ ]`index` | [ ] | [ ] | [ ] |[ ] Index | |
+| ✓ `max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
+| ✓ `mean` | ✓ | ✓ | | ✓ ScalarAggregate| |
+| ✓ `min` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
+|[ ]`min_max` | [ ] | [ ] | [ ] |[ ] ScalarAggregate| |
+|[ ]`mode` | | [ ] | |[ ] Mode | |
+| ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
+|[ ]`quantile`| | [ ] | |[ ] Quantile| |
+|[ ]`stddev` | | ✓ | |[ ] Variance| |
+| ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
+|[ ]`tdigest` | | [ ] | |[ ] TDigest | |
+|[ ]`variance`| | ✓ | |[ ] Variance| |
-#### Unary element-wise: vector.func => Vector
-| Method |Boolean|Numeric|String|Remarks|
-| ------------ | --- | --- | --- | ----- |
-|[x] `-@` | | [x] | |as `-vector`|
-|[x] `negate` | | [x] | |`-@` |
-|[x] `abs` | | [x] | | |
-|[ ] `acos` | | [ ] | | |
-|[ ] `asin` | | [ ] | | |
-|[x] `atan` | | [x] | | |
-|[ ] `ceil` | | [x] | | |
-|[x] `cos` | | [x] | | |
-|[ ] `floor` | | [x] | | |
-|[ ] `ln` | | [ ] | | |
-|[ ] `log10` | | [ ] | | |
-|[ ] `log1p` | | [ ] | | |
-|[ ] `log2` | | [ ] | | |
-|[x] `sign` | | [x] | | |
-|[x] `sin` | | [x] | | |
-|[x] `tan` | | [x] | | |
-|[ ] `trunc` | | [x] | | |
+Options can be passed through the `opts:` keyword as follows.
+See the [documentation of the C++ compute functions](https://arrow.apache.org/docs/cpp/compute.html) for details.
-#### Binary element-wise: vector.func(vector) => Vector
+```ruby
+double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
+#=>
+#<RedAmber::Vector(:double, size=6):0x000000000000f910>
+# [1.0, NaN, -Infinity, Infinity, nil, 0.0]
-| Method |Boolean|Numeric|String|Remarks|
-| ------------------ | --- | --- | --- | ----- |
-|[x] `add` | | [x] | | `+` |
-|[x] `atan2` | | [x] | | |
-|[x] `and` | [x] | | | |
-|[x] `and_kleene` | [x] | | | |
-|[x] `and_not` | [x] | | | |
-|[x] `and_not_kleene`| [x] | | | |
-|[x] `bit_wise_and` | |([x])| |`&`, integer only|
-|[ ] `bit_wise_not` | |([x])| |`!`, integer only|
-|[x] `bit_wise_or` | |([x])| |`|`, integer only|
-|[x] `bit_wise_xor` | |([x])| |`^`, integer only|
-|[x] `divide` | | [x] | | `/` |
-|[x] `equal` | [x] | [x] | [x] |`==`, alias `eq`|
-|[x] `greater` | [x] | [x] | [x] |`>`, alias `gt`|
-|[x] `greater_equal` | [x] | [x] | [x] |`>=`, alias `ge`|
-|[x] `less` | [x] | [x] | [x] |`<`, alias `lt`|
-|[x] `less_equal` | [x] | [x] | [x] |`<=`, alias `le`|
-|[ ] `logb` | | [ ] | | |
-|[ ] `mod` | | [ ] | | |
-|[x] `multiply` | | [x] | | `*` |
-|[x] `not_equal` | [x] | [x] | [x] |`!=`, alias `ne`|
-|[x] `or` | [x] | | | |
-|[x] `or_kleene` | [x] | | | |
-|[x] `power` | | [x] | | `**` |
-|[x] `subtract` | | [x] | | `-` |
-|[x] `shift_left` | |([x])| |`<<`, integer only|
-|[x] `shift_right` | |([x])| |`>>`, integer only|
-|[x] `xor` | [x] | | | |
+double.count #=> 5
+double.count(opts: {mode: :only_valid}) #=> 5, default
+double.count(opts: {mode: :only_null}) #=> 1
+double.count(opts: {mode: :all}) #=> 6
+
+boolean = RedAmber::Vector.new([true, true, nil])
+#=>
+#<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
+# [true, true, nil]
+
+boolean.all #=> true
+boolean.all(opts: {skip_nulls: true}) #=> true
+boolean.all(opts: {skip_nulls: false}) #=> false
+```
+
+#### Unary element-wise: vector.func => vector
+
+| Method |Boolean|Numeric|String|Options|Remarks|
+| ------------ | --- | --- | --- | --- | ----- |
+| ✓ `-@` | | ✓ | | |as `-vector`|
+| ✓ `negate` | | ✓ | | |`-@` |
+| ✓ `abs` | | ✓ | | | |
+|[ ]`acos` | | [ ] | | | |
+|[ ]`asin` | | [ ] | | | |
+| ✓ `atan` | | ✓ | | | |
+| ✓ `bit_wise_not`| | (✓) | | |integer only|
+|[ ]`ceil` | | ✓ | | | |
+| ✓ `cos` | | ✓ | | | |
+|[ ]`floor` | | ✓ | | | |
+| ✓ `invert` | ✓ | | | |`!`, alias `not`|
+| ✓ `is_finite` | | ✓ | | | |
+| ✓ `is_inf` | | ✓ | | | |
+| ✓ `is_na` | ✓ | ✓ | ✓ | | |
+| ✓ `is_nan` | | ✓ | | | |
+|[ ]`is_nil` | ✓ | ✓ | ✓ |[ ] Null|alias `is_null`|
+| ✓ `is_valid` | ✓ | ✓ | ✓ | | |
+|[ ]`ln` | | [ ] | | | |
+|[ ]`log10` | | [ ] | | | |
+|[ ]`log1p` | | [ ] | | | |
+|[ ]`log2` | | [ ] | | | |
+|[ ]`round` | | [ ] | |[ ] Round| |
+|[ ]`round_to_multiple`| | [ ] | |[ ] RoundToMultiple| |
+| ✓ `sign` | | ✓ | | | |
+| ✓ `sin` | | ✓ | | | |
+| ✓ `tan` | | ✓ | | | |
+|[ ]`trunc` | | ✓ | | | |
+
+#### Binary element-wise: vector.func(vector) => vector
+
+| Method |Boolean|Numeric|String|Options|Remarks|
+| ----------------- | --- | --- | --- | --- | ----- |
+| ✓ `add` | | ✓ | | | `+` |
+| ✓ `atan2` | | ✓ | | | |
+| ✓ `and_kleene` | ✓ | | | | `&` |
+| ✓ `and_org` | ✓ | | | |`and` in Red Arrow|
+| ✓ `and_not` | ✓ | | | | |
+| ✓ `and_not_kleene`| ✓ | | | | |
+| ✓ `bit_wise_and` | | (✓) | | |integer only|
+| ✓ `bit_wise_or` | | (✓) | | |integer only|
+| ✓ `bit_wise_xor` | | (✓) | | |integer only|
+| ✓ `divide` | | ✓ | | | `/` |
+| ✓ `equal` | ✓ | ✓ | ✓ | |`==`, alias `eq`|
+| ✓ `greater` | ✓ | ✓ | ✓ | |`>`, alias `gt`|
+| ✓ `greater_equal` | ✓ | ✓ | ✓ | |`>=`, alias `ge`|
+| ✓ `less` | ✓ | ✓ | ✓ | |`<`, alias `lt`|
+| ✓ `less_equal` | ✓ | ✓ | ✓ | |`<=`, alias `le`|
+|[ ]`logb` | | [ ] | | | |
+|[ ]`mod` | | [ ] | | | `%` |
+| ✓ `multiply` | | ✓ | | | `*` |
+| ✓ `not_equal` | ✓ | ✓ | ✓ | |`!=`, alias `ne`|
+| ✓ `or_kleene` | ✓ | | | | `\|` |
+| ✓ `or_org` | ✓ | | | |`or` in Red Arrow|
+| ✓ `power` | | ✓ | | | `**` |
+| ✓ `subtract` | | ✓ | | | `-` |
+| ✓ `shift_left` | | (✓) | | |`<<`, integer only|
+| ✓ `shift_right` | | (✓) | | |`>>`, integer only|
+| ✓ `xor` | ✓ | | | | `^` |
+
##### (Not implemented)
-- [ ] invert, round, round_to_multiple
- [ ] sort, sort_index
-- [ ] minmax, var, median, quantile
- [ ] argmin, argmax
- [ ] (array functions)
- [ ] (strings functions)
- [ ] (temporal functions)
- [ ] (conditional functions)
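The binary element-wise table lists both `and_kleene`/`or_kleene` and `and_org`/`or_org` (Red Arrow's `and`/`or`). The two families differ only when an element is null. Apache Arrow's C++ compute documentation describes the plain functions as returning null whenever either input is null. The Kleene variants treat null as "unknown" and still return a definite answer when the other operand decides the result. The following plain-Ruby sketch models those per-element rules, with `nil` standing in for null. The helper names are hypothetical and are not RedAmber methods.

```ruby
# Per-element rules behind the Boolean binary functions; nil stands for null.
# These helpers are hypothetical illustrations, not RedAmber methods.

# Plain `and` / `or` (listed as `and_org` / `or_org`): any null input gives null.
def and_plain(a, b)
  return nil if a.nil? || b.nil?
  a && b
end

def or_plain(a, b)
  return nil if a.nil? || b.nil?
  a || b
end

# Kleene logic: null means "unknown", so a deciding operand wins anyway.
def and_kleene(a, b)
  return false if a == false || b == false # false AND anything is false
  return nil if a.nil? || b.nil?           # otherwise the result is unknown
  true
end

def or_kleene(a, b)
  return true if a == true || b == true    # true OR anything is true
  return nil if a.nil? || b.nil?
  false
end

left  = [true, false, nil,  nil]
right = [nil,  nil,   true, false]

p left.zip(right).map { |a, b| and_plain(a, b) }  # [nil, nil, nil, nil]
p left.zip(right).map { |a, b| and_kleene(a, b) } # [nil, false, nil, false]
p left.zip(right).map { |a, b| or_kleene(a, b) }  # [true, nil, true, nil]
```

The table maps `&` and `|` to the Kleene variants. Under these rules, a `false` element combined with a null through `&` would therefore be expected to yield `false` rather than null.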