doc/DataFrame.md in red_amber-0.3.0 vs doc/DataFrame.md in red_amber-0.4.0
- old
+ new
@@ -55,10 +55,14 @@
- from a `.arrow`, `.arrows`, `.csv`, `.csv.gz` or `.tsv` file
```ruby
RedAmber::DataFrame.load("test/entity/with_header.csv")
```
+
+ ```ruby
+ RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
+ ```
- from a string buffer
- from a URI
@@ -273,10 +277,11 @@
### `tdr(limit = 10, tally: 5, elements: 5)`
- Shows some information about self in a transposed style.
- `tdr_str` returns same info as a String.
+ - `glimpse` is an alias. It is similar to dplyr's (or Polars's) `glimpse()`.
```ruby
require 'red_amber'
require 'datasets-arrow'
@@ -566,11 +571,11 @@
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
[1, 2, 3]
```
-### `slice ` - slice and select records -
+### `slice ` - cut into slices of records -
Slice and select records (rows) to create a sub DataFrame.
![slice method image](doc/../image/dataframe/slice.png)
@@ -599,15 +604,18 @@
9 Gentoo Biscoe 49.9 16.1 213 ... 2009
```
- Booleans as an argument
- `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+ `filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray. Booleans must be the same length as `size`.
+ Note: `slice(booleans)` is also accepted, which keeps `slice` and `remove` orthogonal.
+
```ruby
vector = penguins[:bill_length_mm]
- penguins.slice(vector >= 40)
+ penguins.filter(vector >= 40)
+ # penguins.slice(vector >= 40) is also acceptable
# =>
#<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
<string> <string> <double> <double> <uint8> ... <uint16>
@@ -831,18 +839,18 @@
### `assign`
Assign new or updated variables (columns) and create an updated DataFrame.
- - Variables with new keys will append new columns from the right.
+ - Variables with new keys are appended as new columns on the right.
- Variables with existing keys will update corresponding vectors.
![assign method image](doc/../image/dataframe/assign.png)
- Variables as arguments
- `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`.
+ `assign(key_value_pairs)` accepts key-value pairs as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either a `Vector`, an `Array` or an `Arrow::Array`.
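The two accepted shapes of `key_value_pairs` carry the same information. A minimal plain-Ruby sketch of that equivalence follows; it does not call RedAmber, and `normalize_pairs` is a hypothetical helper introduced only for illustration, not the library's implementation.

```ruby
# Plain-Ruby sketch (no RedAmber required).
# `normalize_pairs` is a hypothetical helper, not a RedAmber method:
# it turns either accepted shape into a single Hash of key => array_like.
def normalize_pairs(key_value_pairs)
  case key_value_pairs
  when Hash  then key_value_pairs
  when Array then key_value_pairs.to_h # [[key, array_like], ...] becomes a Hash
  else raise ArgumentError, 'expected a Hash or an Array of [key, array_like] pairs'
  end
end

as_hash  = normalize_pairs({ age: [68, 49, 28], brother: ['Santa', nil, 'Momotaro'] })
as_array = normalize_pairs([[:age, [68, 49, 28]], [:brother, ['Santa', nil, 'Momotaro']]])

# Both shapes describe the same columns.
puts as_hash == as_array
```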
```ruby
df = RedAmber::DataFrame.new(
name: %w[Yasuko Rui Hinata],
age: [68, 49, 28])
@@ -855,16 +863,16 @@
0 Yasuko 68
1 Rui 49
2 Hinata 28
# update :age and add :brother
- df.assign do
+ df.assign(
{
age: age + 29,
brother: ['Santa', nil, 'Momotaro']
}
- end
+ )
# =>
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
name age brother
<string> <uint8> <string>
@@ -930,11 +938,11 @@
If assigner is empty or nil, returns self.
- Append from left
- `assign_left` method accepts the same parameters and block as `assign`, but append new columns from leftside.
+ `assign_left` method accepts the same parameters and block as `assign`, but appends new columns from the left.
```ruby
df.assign_left(new_index: df.indices(1))
# =>
@@ -1451,10 +1459,12 @@
<string> <uint8>
0 A 1
1 B 4
2 D 5
```
+##### `set_operable?(other)`
+ Check whether the `types` of self and other are the same.
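As a rough illustration of the check described above, the following plain-Ruby sketch compares column types in order. A Hash of key => type stands in for a DataFrame's schema, and `same_types?` is a hypothetical helper, not RedAmber's implementation.

```ruby
# Plain-Ruby sketch (no RedAmber required).
# A Hash of key => type symbol stands in for a DataFrame's schema.
# `same_types?` is a hypothetical helper that compares the types in column order.
def same_types?(schema, other_schema)
  schema.values == other_schema.values
end

df_schema    = { KEY1: :string, KEY2: :uint8 }
other_schema = { KEY1: :string, KEY2: :uint8 }
mixed_schema = { KEY1: :string, KEY2: :double }

puts same_types?(df_schema, other_schema) # types match
puts same_types?(df_schema, mixed_schema) # KEY2 differs (:uint8 vs :double)
```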
##### `intersect(other)`
Select records appearing in both self and other.
@@ -1496,19 +1506,27 @@
#<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
KEY1 KEY2
<string> <uint8>
1 B 2
2 C 3
+
+ other.difference(df)
+ #=>
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
+ KEY1 KEY2
+ <string> <uint8>
+ 0 B 4
+ 1 D 5
```
## Binding
### `concatenate(other)`
- Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+ Concatenate another DataFrame or Table onto the bottom of self. The types of other must be the same as those of self.
- The alias is `concat`.
+ The aliases are `concat` and `bind_rows`.
An array of DataFrames or Tables is also acceptable as other.
```ruby
df
@@ -1536,12 +1554,14 @@
1 2 B
2 3 C
3 4 D
```
-### `merge(other)`
+### `merge(*other)`
- Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+ Concatenate another DataFrame or Table side by side with self (column-wise). The size of other must be the same as that of self. Self and other must not share the same key.
+
+ The alias is `bind_cols`.
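The two preconditions stated for `merge` (equal size, no shared keys) can be sketched in plain Ruby. Hashes of key => column stand in for DataFrames here, and `bind_cols_sketch` is a hypothetical helper that only illustrates the rules, not how RedAmber implements them.

```ruby
# Plain-Ruby sketch (no RedAmber required).
# Hashes of key => column Array stand in for DataFrames.
# `bind_cols_sketch` is a hypothetical helper illustrating the two stated rules.
def bind_cols_sketch(left, right)
  shared = left.keys & right.keys
  raise ArgumentError, "shared keys: #{shared.inspect}" unless shared.empty?

  sizes = (left.values + right.values).map(&:size).uniq
  raise ArgumentError, 'column sizes differ' if sizes.size > 1

  left.merge(right) # columns of `right` are appended after those of `left`
end

df    = { x: [1, 2], y: %w[A B] }
other = { z: [true, false] }

bound = bind_cols_sketch(df, other)
p bound.keys
```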
```ruby
df
#=>
#<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000009150>