doc/DataFrame.md in red_amber-0.1.5 vs doc/DataFrame.md in red_amber-0.1.6

- old
+ new

@@ -2,15 +2,17 @@
 Class `RedAmber::DataFrame` represents 2D-data.
 A `DataFrame` consists with:
 - A collection of data which have same data type within. We call it `Vector`.
 - A label is attached to `Vector`. We call it `key`.
 - A `Vector` and associated `key` is grouped as a `variable`.
-- `variable`s with same vector length are aligned and arranged to be a `DaTaFrame`.
+- `variable`s with same vector length are aligned and arranged to be a `DataFrame`.
 - Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `observation`.
 ![dataframe model image](doc/../image/dataframe_model.png)
+(No change in this model in v0.1.6 .)
+
 ## Constructors and saving
 ### `new` from a Hash
 ```ruby
@@ -50,11 +52,11 @@
 - from a string buffer
 - from a URI
 ```ruby
- uri = URI("uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
+ uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
 RedAmber::DataFrame.load(uri)
 ```
 - from a Parquet file
@@ -145,13 +147,13 @@
 ### `vectors`
 - Returns an Array of Vectors.
-### `indexes`, `indices`
+### `indices`, `indexes`
-- Returns all indexes in a Range.
+- Returns all indexes in an Array.
 ### `to_h`
 - Returns column-oriented data in a Hash.
@@ -177,10 +179,14 @@
 ### `to_rover`
 - Returns a `Rover::DataFrame`.
+### `to_iruby`
+
+- Show the DataFrame as a Table in Jupyter Notebook or Jupyter Lab with IRuby.
+
 ### `tdr(limit = 10, tally: 5, elements: 5)`
 - Shows some information about self in a transposed style.
 - `tdr_str` returns same info as a String.
@@ -278,10 +284,13 @@
 - Select obs. by indeces in a Range: `df[1..2]`
 An end-less or a begin-less Range can be used to represent indeces.
 - Select obs. by indeces in an Array: `df[1, 2]`
+
+- You can use float indices.
+
 - Mixed case: `df[2, 0..]`
 ```ruby
 hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
 df = RedAmber::DataFrame.new(hash)
@@ -421,14 +430,16 @@
 Slice and select observations (rows) to create a sub DataFrame.
 ![slice method image](doc/../image/dataframe/slice.png)
-- Keys as arguments
+- Indices as arguments
- `slice(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
+ `slice(indeces)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.
+ Negative index from the tail like Ruby's Array is also acceptable.
+
 ```ruby
 # returns 5 obs. at start and 5 obs. from end
 penguins.slice(0...5, -5..-1)
 # => #<RedAmber::DataFrame : 10 x 8 Vectors, 0x000000000000f230>
@@ -455,11 +466,11 @@
 2 :island string 3 {"Torgersen"=>18, "Biscoe"=>139, "Dream"=>85}
 3 :bill_length_mm double 115 [40.3, 42.0, 41.1, 42.5, 46.0, ... ]
 ... 5 more Vectors ...
 ```
-- Keys or booleans by a block
+- Indices or booleans by a block
 `slice {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
 ```ruby
 # return a DataFrame with bill_length_mm is in 2*std range around mean
@@ -467,10 +478,11 @@
 vector = self[:bill_length_mm]
 min = vector.mean - vector.std
 max = vector.mean + vector.std
 vector.to_a.map { |e| (min..max).include? e }
 end
+
 # => #<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f30c>
 Vectors : 5 numeric, 3 strings
 # key type level data_preview
 1 :species string 3 {"Adelie"=>82, "Chinstrap"=>33, "Gentoo"=>89}
@@ -507,11 +519,11 @@
 Slice and reject observations (rows) to create a remainer DataFrame.
 ![remove method image](doc/../image/dataframe/remove.png)
-- Keys as arguments
+- Indices as arguments
 `remove(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
 ```ruby
 # returns 6th to 339th obs.
@@ -546,11 +558,11 @@
 6 :body_mass_g uint16 93 [3750, 3800, 3250, 3450, 3650, ... ]
 7 :sex string 2 {"male"=>168, "female"=>165}
 8 :year uint16 3 {2007=>103, 2008=>113, 2009=>117}
 ```
-- Keys or booleans by a block
+- Indices or booleans by a block
 `remove {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
 ```ruby
 penguins.remove do
@@ -745,9 +757,11 @@
 Remove any observations containing nil.
 ## Grouping
 ### `group(aggregating_keys, function, target_keys)`
+
+ (This is a temporary API and may change in the future version.)
 Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
 (The current implementation is not intuitive. Needs improvement.)
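
The `slice` and `remove` hunks above describe selecting rows by Integers and Ranges, with negative indices counting from the tail "like Ruby's Array". The following is a minimal plain-Ruby sketch of that indexing behaviour on an ordinary Array, so it runs without the gem. It is not RedAmber's implementation: `expand_indices`, `pick_rows` and `drop_rows` are hypothetical helper names, and Float indices are left out because the diff does not say how they are resolved.

```ruby
# Plain-Ruby sketch of index-based row selection; not RedAmber's implementation.
# `expand_indices`, `pick_rows` and `drop_rows` are hypothetical helpers.

# Expand Integers and Ranges into a flat list of non-negative positions.
def expand_indices(size, *indices)
  positions = (0...size).to_a
  indices.flat_map do |index|
    case index
    when Range   then positions[index] || []   # Array#[] resolves negative and endless ranges
    when Integer then [positions.fetch(index)] # a negative Integer counts from the tail
    else raise ArgumentError, "unsupported index: #{index.inspect}"
    end
  end
end

# Keep only the rows at the given indices (analogous to `slice`).
def pick_rows(rows, *indices)
  expand_indices(rows.size, *indices).map { |i| rows[i] }
end

# Keep every row except those at the given indices (analogous to `remove`).
def drop_rows(rows, *indices)
  dropped = expand_indices(rows.size, *indices)
  rows.reject.with_index { |_, i| dropped.include?(i) }
end

rows = %w[r0 r1 r2 r3 r4 r5 r6 r7]
head_and_tail = pick_rows(rows, 0...2, -2..-1) # first two and last two rows
middle        = drop_rows(rows, 0...2, -2..-1) # everything in between
mixed         = pick_rows(rows, 2, 0..)        # one row, then all rows (mixed case)
```

With the same arguments, `pick_rows` and `drop_rows` return complementary sets of rows, which mirrors the "first 5 and last 5" versus "6th to 339th" pair of examples in the diff.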