doc/DataFrame.md in red_amber-0.1.5 vs doc/DataFrame.md in red_amber-0.1.6
- old
+ new
@@ -2,15 +2,17 @@
Class `RedAmber::DataFrame` represents 2D data. A `DataFrame` consists of:
- A collection of data that all share the same data type. We call it a `Vector`.
- A label attached to a `Vector`. We call it a `key`.
- A `Vector` and its associated `key`, grouped together as a `variable`.
-- `variable`s with same vector length are aligned and arranged to be a `DaTaFrame`.
+- `variable`s with same vector length are aligned and arranged to be a `DataFrame`.
- The values at the same position in each `Vector` of a `DataFrame` form a set of related data. We call it an `observation`.
![dataframe model image](doc/../image/dataframe_model.png)
+(This model is unchanged in v0.1.6.)
+
## Constructors and saving
### `new` from a Hash
```ruby
@@ -50,11 +52,11 @@
- from a string buffer
- from a URI
```ruby
- uri = URI("uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
+ uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
RedAmber::DataFrame.load(uri)
```
- from a Parquet file
@@ -145,13 +147,13 @@
### `vectors`
- Returns an Array of Vectors.
-### `indexes`, `indices`
+### `indices`, `indexes`
-- Returns all indexes in a Range.
+- Returns all indexes in an Array.
### `to_h`
- Returns column-oriented data in a Hash.
@@ -177,10 +179,14 @@
### `to_rover`
- Returns a `Rover::DataFrame`.
+### `to_iruby`
+
+- Shows the DataFrame as a table in Jupyter Notebook or JupyterLab with IRuby.
+
### `tdr(limit = 10, tally: 5, elements: 5)`
- Shows some information about self in a transposed style.
- `tdr_str` returns the same info as a String.
@@ -278,10 +284,13 @@
- Select obs. by indices in a Range: `df[1..2]`
An endless or a beginless Range can also be used to represent indices.
- Select obs. by indices in an Array: `df[1, 2]`
+
+- You can use float indices.
+
- Mixed case: `df[2, 0..]`
```ruby
hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
df = RedAmber::DataFrame.new(hash)
@@ -421,14 +430,16 @@
Slice and select observations (rows) to create a sub DataFrame.
![slice method image](doc/../image/dataframe/slice.png)
-- Keys as arguments
+- Indices as arguments
- `slice(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
+ `slice(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.
+ Negative indices, counted from the tail as in Ruby's Array, are also accepted.
+
```ruby
# returns 5 obs. at start and 5 obs. from end
penguins.slice(0...5, -5..-1)
# =>
#<RedAmber::DataFrame : 10 x 8 Vectors, 0x000000000000f230>
@@ -455,11 +466,11 @@
2 :island string 3 {"Torgersen"=>18, "Biscoe"=>139, "Dream"=>85}
3 :bill_length_mm double 115 [40.3, 42.0, 41.1, 42.5, 46.0, ... ]
... 5 more Vectors ...
```
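The index arguments that `slice` accepts follow Ruby's Array conventions, so they can be illustrated without RedAmber at all. The sketch below is only an analogue: `rows` is a hypothetical plain Array standing in for a DataFrame, and `Array#values_at` stands in for `slice`. How RedAmber itself converts Float indices is not stated in this document; Ruby's Array truncates them toward zero, which is what the last line shows.

```ruby
# Plain-Ruby analogue (not RedAmber): 12 row hashes stand in for observations.
rows = (0...12).map { |i| { id: i, label: "row#{i}" } }

# Ranges and negative indices, mirroring `slice(0...5, -5..-1)`:
# the first five rows plus the last five rows.
head_and_tail = rows.values_at(0...5, -5..-1)
picked_ids = head_and_tail.map { |r| r[:id] }

# Ruby's Array truncates Float indices toward zero (2.9 is treated as 2).
# Whether RedAmber rounds or truncates is an assumption not confirmed here.
float_ids = rows.values_at(1.0, 2.9).map { |r| r[:id] }
```

With 12 rows, `picked_ids` holds the ids of rows 0 to 4 and 7 to 11, in the order the arguments were given.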
-- Keys or booleans by a block
+- Indices or booleans by a block
`slice {block}` is also acceptable. Arguments and a block cannot be used at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.
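The boolean form of the block can be pictured as a mask with one entry per observation. The following is a minimal plain-Ruby sketch of that idea, not RedAmber code: `values` is made-up sample data, and the standard deviation uses the sample (n - 1) formula, which is an assumption of this sketch rather than a statement about `Vector#std`.

```ruby
# Plain-Ruby sketch of boolean-mask selection (hypothetical data, no RedAmber).
values = [39.1, 39.5, 40.3, 36.7, 50.0, 45.2, 30.1]

mean = values.sum / values.size
# Sample standard deviation (n - 1 denominator); an assumption of this sketch.
std = Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1))

# One boolean per element, true when the value lies within mean - std..mean + std.
mask = values.map { |v| ((mean - std)..(mean + std)).include?(v) }

# Keep only the elements whose mask entry is true.
selected = values.zip(mask).select { |_, keep| keep }.map(&:first)
```

The mask has the same length as the data, which matches the requirement that a block returning booleans must return an Array with the same length as `size`.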
```ruby
# returns a DataFrame whose bill_length_mm is within one std of the mean (a 2*std-wide range)
@@ -467,10 +478,11 @@
vector = self[:bill_length_mm]
min = vector.mean - vector.std
max = vector.mean + vector.std
vector.to_a.map { |e| (min..max).include? e }
end
+
# =>
#<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f30c>
Vectors : 5 numeric, 3 strings
# key type level data_preview
1 :species string 3 {"Adelie"=>82, "Chinstrap"=>33, "Gentoo"=>89}
@@ -507,11 +519,11 @@
Slice and reject observations (rows) to create a remainder DataFrame.
![remove method image](doc/../image/dataframe/remove.png)
-- Keys as arguments
+- Indices as arguments
`remove(indices)` accepts indices as arguments. Indices should be Integers or Ranges of Integers.
```ruby
# returns 6th to 339th obs.
@@ -546,11 +558,11 @@
6 :body_mass_g uint16 93 [3750, 3800, 3250, 3450, 3650, ... ]
7 :sex string 2 {"male"=>168, "female"=>165}
8 :year uint16 3 {2007=>103, 2008=>113, 2009=>117}
```
-- Keys or booleans by a block
+- Indices or booleans by a block
`remove {block}` is also acceptable. Arguments and a block cannot be used at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.
```ruby
penguins.remove do
@@ -745,9 +757,11 @@
Remove any observations containing nil.
## Grouping
### `group(aggregating_keys, function, target_keys)`
+
+ (This is a temporary API and may change in a future version.)
Creates a grouped DataFrame by `aggregating_keys`, applies `function` to each group, and returns the results for `target_keys`. Aggregated keys are named in `function(key)` style.
(The current implementation is not intuitive. Needs improvement.)
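
Since the `group` API is marked as temporary, the sketch below shows only the general idea in plain Ruby and makes no claim about RedAmber's implementation. `group_aggregate` is a hypothetical helper, the sample rows are made up, and representing the `function(key)` name as a Symbol is an assumption.

```ruby
# Plain-Ruby sketch of group-then-aggregate (hypothetical helper, not RedAmber).
rows = [
  { species: "Adelie", body_mass_g: 3750 },
  { species: "Adelie", body_mass_g: 3800 },
  { species: "Gentoo", body_mass_g: 5000 }
]

# Groups rows by one key, applies an Array method such as :sum, :max or :count
# to the target values of each group, and names the result "function(key)".
def group_aggregate(rows, aggregating_key, function, target_key)
  rows.group_by { |row| row[aggregating_key] }.map do |group_value, members|
    values = members.map { |row| row[target_key] }
    {
      aggregating_key => group_value,
      :"#{function}(#{target_key})" => values.public_send(function)
    }
  end
end

result = group_aggregate(rows, :species, :sum, :body_mass_g)
```

Each element of `result` carries the grouping value and one aggregated value under a key such as `:"sum(body_mass_g)"`, which is the `function(key)` naming described above.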