{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e355db8b-ebb6-4ea6-97b5-3b9fdadc302c",
   "metadata": {},
   "source": [
    "# 61 examples of Red Amber"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f20f4970-db38-4d96-9a36-d4cf9d007596",
   "metadata": {},
   "source": [
    "Last update: August 14, 2022, for RedAmber Version 0.2.0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6e927d0-b59a-4c4e-9f8a-4fa08f9a6b2f",
   "metadata": {},
   "source": [
    "## 1. Install"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85eacfe6-fa11-4749-844f-5914d6cd7dbc",
   "metadata": {},
   "source": [
    "Install requirements before you install Red Amber.\n",
    "\n",
    "- Apache Arrow GLib (>= 8.0.0)\n",
    "\n",
    "- Apache Parquet GLib (>= 8.0.0)  # if you need IO from/to Parquet resource.\n",
    "\n",
    "  See [Apache Arrow install document](https://arrow.apache.org/install/).\n",
    "  \n",
    "  Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.\n",
    "\n",
    "Then add this line to your Gemfile:\n",
    "```\n",
    "gem 'red_amber'\n",
    "```\n",
    "\n",
    "And then execute:\n",
    "```\n",
    "$ bundle install\n",
    "```\n",
    "\n",
    "Or install it yourself as:\n",
    "```\n",
    "$ gem install red_amber\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c08c45d-0818-4b43-bc65-4d43dd8b6b66",
   "metadata": {},
   "source": [
    "## 2. Require"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "74b76022-03ea-40ae-bac8-fc8743659042",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{:RedAmber=>\"0.2.0\", :Arrow=>\"9.0.0\"}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "require 'red_amber' # require 'red-amber' is also OK\n",
    "include RedAmber\n",
    "{RedAmber: VERSION, Arrow: Arrow::VERSION}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8fb6289-39ea-4fa9-a165-b87ee6d125e9",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 3. Initialize"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "51f81824-626a-4741-a29b-30ea357fe7b5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>A</td></tr><tr><td>2</td><td>B</td></tr><tr><td>3</td><td>C</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f154>\n",
       "        x y\n",
       "  <uint8> <string>\n",
       "1       1 A\n",
       "2       2 B\n",
       "3       3 C\n"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# From a Hash\n",
    "DataFrame.new(x: [1, 2, 3], y: %w[A B C])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "20b696eb-c199-444d-a957-e0b1081f1506",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>A</td></tr><tr><td>2</td><td>B</td></tr><tr><td>3</td><td>C</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f168>\n",
       "        x y\n",
       "  <uint8> <string>\n",
       "1       1 A\n",
       "2       2 B\n",
       "3       3 C\n"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# From a schema and a row-oriented array\n",
    "DataFrame.new({ x: :uint8, y: :string }, [[1, 'A'], [2, 'B'], [3, 'C']])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "21eab151-f977-4474-a6d1-576169e24b26",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>A</td></tr><tr><td>2</td><td>B</td></tr><tr><td>3</td><td>C</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f17c>\n",
       "        x y\n",
       "  <uint8> <string>\n",
       "1       1 A\n",
       "2       2 B\n",
       "3       3 C\n"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# From an Arrow::Table\n",
    "table = Arrow::Table.new(x: [1, 2, 3], y: %w[A B C])\n",
    "DataFrame.new(table)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "aa09d3da-f332-45cd-92ca-712c6a679035",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>A</td></tr><tr><td>2</td><td>B</td></tr><tr><td>3</td><td>C</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f2a8>\n",
       "        x y\n",
       "  <uint8> <string>\n",
       "1       1 A\n",
       "2       2 B\n",
       "3       3 C\n"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# From a Rover::DataFrame\n",
    "require 'rover'\n",
    "rover = Rover::DataFrame.new(x: [1, 2, 3], y: %w[A B C])\n",
    "DataFrame.new(rover)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cd2c3677-00fb-48fe-bb94-18bc0815db72",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <344 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f2bc>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007\n",
       "  5 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "342 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "343 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "344 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# from a red-datasets\n",
    "require 'datasets-arrow'\n",
    "dataset = Datasets::Penguins.new\n",
    "penguins = DataFrame.new(dataset.to_arrow)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a2d12b4-7623-42c7-9e32-76cf303c7cea",
   "metadata": {},
   "source": [
    "It should be in the future version;\n",
    "```ruby\n",
    "require 'datasets-red-amber'\n",
    "penguins = Datasets::Penguins.new.to_red_amber\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2e4619b7-bf6d-4081-9066-b186da8fdf5b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <32 x 11 vectors> <table><tr><th>mpg</th><th>cyl</th><th>disp</th><th>hp</th><th>drat</th><th>wt</th><th>qsec</th><th>vs</th><th>am</th><th>gear</th><th>carb</th></tr><tr><td>21.0</td><td>6</td><td>160.0</td><td>110</td><td>3.9</td><td>2.62</td><td>16.46</td><td>0</td><td>1</td><td>4</td><td>4</td></tr><tr><td>21.0</td><td>6</td><td>160.0</td><td>110</td><td>3.9</td><td>2.875</td><td>17.02</td><td>0</td><td>1</td><td>4</td><td>4</td></tr><tr><td>22.8</td><td>4</td><td>108.0</td><td>93</td><td>3.85</td><td>2.32</td><td>18.61</td><td>1</td><td>1</td><td>4</td><td>1</td></tr><tr><td>21.4</td><td>6</td><td>258.0</td><td>110</td><td>3.08</td><td>3.215</td><td>19.44</td><td>1</td><td>0</td><td>3</td><td>1</td></tr><tr><td colspan='11'>&#8942;</td></tr><tr><td>19.7</td><td>6</td><td>145.0</td><td>175</td><td>3.62</td><td>2.77</td><td>15.5</td><td>0</td><td>1</td><td>5</td><td>6</td></tr><tr><td>15.0</td><td>8</td><td>301.0</td><td>335</td><td>3.54</td><td>3.57</td><td>14.6</td><td>0</td><td>1</td><td>5</td><td>8</td></tr><tr><td>21.4</td><td>4</td><td>121.0</td><td>109</td><td>4.11</td><td>2.78</td><td>18.6</td><td>1</td><td>1</td><td>4</td><td>2</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 32 x 11 Vectors, 0x000000000000f2d0>\n",
       "        mpg     cyl     disp       hp     drat       wt     qsec      vs      am ...    carb\n",
       "   <double> <uint8> <double> <uint16> <double> <double> <double> <uint8> <uint8> ... <uint8>\n",
       " 1     21.0       6    160.0      110      3.9     2.62    16.46       0       1 ...       4\n",
       " 2     21.0       6    160.0      110      3.9     2.88    17.02       0       1 ...       4\n",
       " 3     22.8       4    108.0       93     3.85     2.32    18.61       1       1 ...       1\n",
       " 4     21.4       6    258.0      110     3.08     3.22    19.44       1       0 ...       1\n",
       " 5     18.7       8    360.0      175     3.15     3.44    17.02       0       0 ...       2\n",
       " :        :       :        :        :        :        :        :       :       : ...       :\n",
       "30     19.7       6    145.0      175     3.62     2.77     15.5       0       1 ...       6\n",
       "31     15.0       8    301.0      335     3.54     3.57     14.6       0       1 ...       8\n",
       "32     21.4       4    121.0      109     4.11     2.78     18.6       1       1 ...       2\n"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset = Datasets::Rdatasets.new('datasets', 'mtcars')\n",
    "mtcars = DataFrame.new(dataset.to_arrow)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1f77a54-3a43-4d17-bb6f-332ef13832a3",
   "metadata": {},
   "source": [
    "## 4. Load"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fed4f43-3fbb-43e5-af0d-f93401deea78",
   "metadata": {},
   "source": [
    "`RedAmber::DataFrame` delegates `#load` to `Arrow::Table#load`. We can load from `[.arrow, .arrows, .csv, .csv.gz, .tsv]` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4203e671-0a0a-405c-8482-53a8cd78a891",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>name</th><th>age</th></tr><tr><td>Yasuko</td><td>68</td></tr><tr><td>Rui</td><td>49</td></tr><tr><td>Hinata</td><td>28</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f2e4>\n",
       "  name         age\n",
       "  <string> <int64>\n",
       "1 Yasuko        68\n",
       "2 Rui           49\n",
       "3 Hinata        28\n"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame.load(\"../test/entity/with_header.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29875147-1371-4575-a565-69c3534c15f2",
   "metadata": {},
   "source": [
    "## 5. Load from a URI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "916b86e2-e3a2-4ebb-8770-9e8a29c46523",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <344 x 7 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>MALE</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>FEMALE</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>FEMALE</td></tr><tr><td>Adelie</td><td>Torgersen</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td></td></tr><tr><td colspan='7'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>MALE</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>FEMALE</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>MALE</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 344 x 7 Vectors, 0x000000000000f2f8>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ... sex\n",
       "    <string> <string>        <double>      <double>           <int64> ... <string>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ... MALE\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ... FEMALE\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ... FEMALE\n",
       "  4 Adelie   Torgersen          (nil)         (nil)             (nil) ...\n",
       "  5 Adelie   Torgersen           36.7          19.3               193 ... FEMALE\n",
       "  : :        :                      :             :                 : ... :\n",
       "342 Gentoo   Biscoe              50.4          15.7               222 ... MALE\n",
       "343 Gentoo   Biscoe              45.2          14.8               212 ... FEMALE\n",
       "344 Gentoo   Biscoe              49.9          16.1               213 ... MALE\n"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "uri = URI(\"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv\")\n",
    "DataFrame.load(uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6abe64d-e97f-437e-9c54-18f9e06e9668",
   "metadata": {},
   "source": [
    "## 6. Save"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "91c0fb62-7990-47f1-9fb6-b0529bc1783f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.save(\"file.arrow\")\n",
    "penguins.save(\"file.arrows\")\n",
    "penguins.save(\"file.csv\")\n",
    "penguins.save(\"file.csv.gz\")\n",
    "penguins.save(\"file.tsv\")\n",
    "penguins.save(\"file.feather\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1d30973-9e2f-406a-9f42-9e6e4c966baf",
   "metadata": {},
   "source": [
    "## 7. to_s/inspect"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7bc9cb7-eae4-495f-831e-b747e486d0bd",
   "metadata": {},
   "source": [
    "`to_s` or `inspect` (it uses to_s inside) shows a preview of the dataframe.\n",
    "\n",
    "It shows first 5 and last 3 rows if it has many rows. Columns are also omitted if line is exceeded 80 letters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "af6d29ef-2e1c-4a08-a8b2-d69acda79ec5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000000f30c>\n",
      "        x        y s        b\n",
      "  <uint8> <double> <string> <boolean>\n",
      "1       1      1.0 A        true\n",
      "2       2      2.0 B        false\n",
      "3       3      3.0 C        true\n",
      "4       4      NaN D        false\n",
      "5       5    (nil) (nil)    (nil)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "df = DataFrame.new(\n",
    "  x: [1, 2, 3, 4, 5],\n",
    "  y: [1, 2, 3, 0/0.0, nil],\n",
    "  s: %w[A B C D] << nil,\n",
    "  b: [true, false, true, false, nil])\n",
    "p df; nil"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "cdff2e60-bd0a-4d12-b348-201a49bbbbbe",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f2bc>\n",
      "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
      "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
      "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
      "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
      "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
      "  4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007\n",
      "  5 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
      "  : :        :                      :             :                 : ...        :\n",
      "342 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
      "343 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
      "344 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n",
      "\n"
     ]
    }
   ],
   "source": [
    "p penguins; nil"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb44df38-58f7-479c-b7a4-c9c305639292",
   "metadata": {},
   "source": [
    "## 8. Show table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "fc710035-8134-4b18-89fe-8c58b95e0e0e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<Arrow::Table:0x7f54b8433320 ptr=0x55d81a4486e0>\n",
       "\tx\t         y\ts\tb\n",
       "0\t1\t  1.000000\tA\ttrue\n",
       "1\t2\t  2.000000\tB\tfalse\n",
       "2\t3\t  3.000000\tC\ttrue\n",
       "3\t4\t       NaN\tD\tfalse\n",
       "4\t5\t    (null)\t(null)\t(null)\n"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "2634fb7b-194f-4277-94ba-05f39c497ffa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<Arrow::Table:0x7f54b849dec8 ptr=0x55d81ac6f650>\n",
       "\tspecies\tisland\tbill_length_mm\tbill_depth_mm\tflipper_length_mm\tbody_mass_g\tsex\tyear\n",
       "  0\tAdelie \tTorgersen\t     39.100000\t    18.700000\t              181\t       3750\tmale\t2007\n",
       "  1\tAdelie \tTorgersen\t     39.500000\t    17.400000\t              186\t       3800\tfemale\t2007\n",
       "  2\tAdelie \tTorgersen\t     40.300000\t    18.000000\t              195\t       3250\tfemale\t2007\n",
       "  3\tAdelie \tTorgersen\t        (null)\t       (null)\t           (null)\t     (null)\t(null)\t2007\n",
       "  4\tAdelie \tTorgersen\t     36.700000\t    19.300000\t              193\t       3450\tfemale\t2007\n",
       "  5\tAdelie \tTorgersen\t     39.300000\t    20.600000\t              190\t       3650\tmale\t2007\n",
       "  6\tAdelie \tTorgersen\t     38.900000\t    17.800000\t              181\t       3625\tfemale\t2007\n",
       "  7\tAdelie \tTorgersen\t     39.200000\t    19.600000\t              195\t       4675\tmale\t2007\n",
       "  8\tAdelie \tTorgersen\t     34.100000\t    18.100000\t              193\t       3475\t(null)\t2007\n",
       "  9\tAdelie \tTorgersen\t     42.000000\t    20.200000\t              190\t       4250\t(null)\t2007\n",
       "...\n",
       "334\tGentoo \tBiscoe\t     46.200000\t    14.100000\t              217\t       4375\tfemale\t2009\n",
       "335\tGentoo \tBiscoe\t     55.100000\t    16.000000\t              230\t       5850\tmale\t2009\n",
       "336\tGentoo \tBiscoe\t     44.500000\t    15.700000\t              217\t       4875\t(null)\t2009\n",
       "337\tGentoo \tBiscoe\t     48.800000\t    16.200000\t              222\t       6000\tmale\t2009\n",
       "338\tGentoo \tBiscoe\t     47.200000\t    13.700000\t              214\t       4925\tfemale\t2009\n",
       "339\tGentoo \tBiscoe\t        (null)\t       (null)\t           (null)\t     (null)\t(null)\t2009\n",
       "340\tGentoo \tBiscoe\t     46.800000\t    14.300000\t              215\t       4850\tfemale\t2009\n",
       "341\tGentoo \tBiscoe\t     50.400000\t    15.700000\t              222\t       5750\tmale\t2009\n",
       "342\tGentoo \tBiscoe\t     45.200000\t    14.800000\t              212\t       5200\tfemale\t2009\n",
       "343\tGentoo \tBiscoe\t     49.900000\t    16.100000\t              213\t       5400\tmale\t2009\n"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "9dba2a67-ede7-4663-907b-9b2dd5db1605",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x: uint8\n",
      "y: double\n",
      "s: string\n",
      "b: bool\n",
      "----\n",
      "x:\n",
      "  [\n",
      "    [\n",
      "      1,\n",
      "      2,\n",
      "      3,\n",
      "      4,\n",
      "      5\n",
      "    ]\n",
      "  ]\n",
      "y:\n",
      "  [\n",
      "    [\n",
      "      1,\n",
      "      2,\n",
      "      3,\n",
      "      nan,\n",
      "      null\n",
      "    ]\n",
      "  ]\n",
      "s:\n",
      "  [\n",
      "    [\n",
      "      \"A\",\n",
      "      \"B\",\n",
      "      \"C\",\n",
      "      \"D\",\n",
      "      null\n",
      "    ]\n",
      "  ]\n",
      "b:\n",
      "  [\n",
      "    [\n",
      "      true,\n",
      "      false,\n",
      "      true,\n",
      "      false,\n",
      "      null\n",
      "    ]\n",
      "  ]\n"
     ]
    }
   ],
   "source": [
    "# This is a Red Arrow's feature\n",
    "puts df.table.to_s(format: :column)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "d1cc17b8-1cfc-4986-9dec-7bca02be32f0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==================== 0 ====================\n",
      "x: 1\n",
      "y:   1.000000\n",
      "s: A\n",
      "b: true\n",
      "==================== 1 ====================\n",
      "x: 2\n",
      "y:   2.000000\n",
      "s: B\n",
      "b: false\n",
      "==================== 2 ====================\n",
      "x: 3\n",
      "y:   3.000000\n",
      "s: C\n",
      "b: true\n",
      "==================== 3 ====================\n",
      "x: 4\n",
      "y:        NaN\n",
      "s: D\n",
      "b: false\n",
      "==================== 4 ====================\n",
      "x: 5\n",
      "y: (null)\n",
      "s: (null)\n",
      "b: (null)\n"
     ]
    }
   ],
   "source": [
    "# This is also a Red Arrow's feature\n",
    "puts df.table.to_s(format: :list)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16e4ae6b-2399-43f0-be8e-65669b95c7b6",
   "metadata": {},
   "source": [
    "## 9. TDR"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d14eb4b-9026-4cc5-a71a-598946d40b67",
   "metadata": {},
   "source": [
    "TDR means 'Transposed Dataframe Representation'. It shows columns in lateral just the same shape as initializing by a Hash. TDR has some information which is useful for the exploratory data processing.\n",
    "\n",
    "- DataFrame shape: n_rows x n_columns\n",
    "- Data types\n",
    "- Levels: number of unique elements\n",
    "- Data preview: same data is aggregated if level is smaller (tally mode)\n",
    "- Show counts of abnormal element: NaN and nil"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8050462f-7c60-41b7-a011-af11763784dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RedAmber::DataFrame : 5 x 4 Vectors\n",
      "Vectors : 2 numeric, 1 string, 1 boolean\n",
      "# key type    level data_preview\n",
      "1 :x  uint8       5 [1, 2, 3, 4, 5]\n",
      "2 :y  double      5 [1.0, 2.0, 3.0, NaN, nil], 1 NaN, 1 nil\n",
      "3 :s  string      5 [\"A\", \"B\", \"C\", \"D\", nil], 1 nil\n",
      "4 :b  boolean     3 {true=>2, false=>2, nil=>1}\n"
     ]
    }
   ],
   "source": [
    "# use the same dataframe as #7\n",
    "df.tdr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "bb616ffe-c19a-4b02-a011-601ceb3db656",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RedAmber::DataFrame : 344 x 8 Vectors\n",
      "Vectors : 5 numeric, 3 strings\n",
      "# key                type   level data_preview\n",
      "1 :species           string     3 {\"Adelie\"=>152, \"Chinstrap\"=>68, \"Gentoo\"=>124}\n",
      "2 :island            string     3 {\"Torgersen\"=>52, \"Biscoe\"=>168, \"Dream\"=>124}\n",
      "3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils\n",
      "4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils\n",
      "5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils\n",
      "6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils\n",
      "7 :sex               string     3 {\"male\"=>168, \"female\"=>165, nil=>11}\n",
      "8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}\n"
     ]
    }
   ],
   "source": [
    "penguins.tdr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73b8dc18-079f-4d40-8d0e-239f010550da",
   "metadata": {},
   "source": [
    "`#tdr` has some options:\n",
    "\n",
    "`limit` : to limit a number of variables to show. Default value is `limit=10`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "0962845d-e642-4d2a-9607-43e197b46bc5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RedAmber::DataFrame : 344 x 8 Vectors\n",
      "Vectors : 5 numeric, 3 strings\n",
      "# key                type   level data_preview\n",
      "1 :species           string     3 {\"Adelie\"=>152, \"Chinstrap\"=>68, \"Gentoo\"=>124}\n",
      "2 :island            string     3 {\"Torgersen\"=>52, \"Biscoe\"=>168, \"Dream\"=>124}\n",
      "3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils\n",
      " ... 5 more Vectors ...\n"
     ]
    }
   ],
   "source": [
    "penguins.tdr(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "573606c4-23b9-4b38-8c92-a04f1c1e8781",
   "metadata": {},
   "source": [
    "`elements` : max number of elements to show in observations. Default value is `elements: 5`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "f957d2bd-e8c0-42a1-a3b4-0a9478e740bf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RedAmber::DataFrame : 344 x 8 Vectors\n",
      "Vectors : 5 numeric, 3 strings\n",
      "# key                type   level data_preview\n",
      "1 :species           string     3 {\"Adelie\"=>152, \"Chinstrap\"=>68, \"Gentoo\"=>124}\n",
      "2 :island            string     3 {\"Torgersen\"=>52, \"Biscoe\"=>168, \"Dream\"=>124}\n",
      "3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, ... ], 2 nils\n",
      "4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, ... ], 2 nils\n",
      "5 :flipper_length_mm uint8     56 [181, 186, 195, ... ], 2 nils\n",
      "6 :body_mass_g       uint16    95 [3750, 3800, 3250, ... ], 2 nils\n",
      "7 :sex               string     3 {\"male\"=>168, \"female\"=>165, nil=>11}\n",
      "8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}\n"
     ]
    }
   ],
   "source": [
    "penguins.tdr(elements: 3) # Show first 3 items in data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d37ece79-1999-49eb-a2d1-831184ee6509",
   "metadata": {},
   "source": [
    "`tally` : max level to use tally mode. Level means size of `tally`ed hash. Default value is `tally: 5`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "9c1c472c-3d15-4bca-9a1b-7f86c63d3ed8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RedAmber::DataFrame : 344 x 8 Vectors\n",
      "Vectors : 5 numeric, 3 strings\n",
      "# key                type   level data_preview\n",
      "1 :species           string     3 [\"Adelie\", \"Adelie\", \"Adelie\", \"Adelie\", \"Adelie\", ... ]\n",
      "2 :island            string     3 [\"Torgersen\", \"Torgersen\", \"Torgersen\", \"Torgersen\", \"Torgersen\", ... ]\n",
      "3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils\n",
      "4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils\n",
      "5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils\n",
      "6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils\n",
      "7 :sex               string     3 [\"male\", \"female\", \"female\", nil, \"female\", ... ], 11 nils\n",
      "8 :year              uint16     3 [2007, 2007, 2007, 2007, 2007, ... ]\n"
     ]
    }
   ],
   "source": [
    "penguins.tdr(tally: 0) # Don't use tally mode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3c38037-90a1-4fc5-9904-41fc74085908",
   "metadata": {},
   "source": [
    "`#tdr_str` returns a String. `#tdr` do the same thing as `puts #tdr_str`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21d68764-1bc1-4915-99b6-5ae938b85999",
   "metadata": {},
   "source": [
    "## 10. Size and shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "487399f8-a3ef-467f-aa7f-ecbaee5fcb75",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# same as n_rows, n_obs\n",
    "df.size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "dc7441c3-7c85-4ce1-a20e-de8f41f280b4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# same as n_cols, n_vars\n",
    "df.n_keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "3d42fea6-801a-45f4-8e22-ea9d76ae070f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[5, 4]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# [df.size, df.n_keys], [df.n_rows, df.n_cols]\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc5caa94-325f-4014-9c90-8ac909c2b378",
   "metadata": {},
   "source": [
    "## 11. Keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "bb47775f-fed0-42e6-8781-aa8b721d6112",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[:x, :y, :s, :b]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "3d540ab0-3e52-47b7-b338-b4e0b3d929cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[:species, :island, :bill_length_mm, :bill_depth_mm, :flipper_length_mm, :body_mass_g, :sex, :year]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.keys"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "decc6a61-9994-4d60-9827-b257cafafb70",
   "metadata": {},
   "source": [
    "## 12. Types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "bf9cd2bc-a213-427e-bc00-f2083b0e0471",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[:uint8, :double, :string, :boolean]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "b1ecb891-98b5-4919-9f37-1847202007d8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[:string, :string, :double, :double, :uint8, :uint16, :string, :uint16]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.types"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "869b3670-62f8-4c23-807b-d6d100a1981e",
   "metadata": {},
   "source": [
    "## 13. Data type classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "776ab4db-073b-4b30-931a-8ec77284cdc4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Arrow::UInt8DataType, Arrow::DoubleDataType, Arrow::StringDataType, Arrow::BooleanDataType]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.type_classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "0546a5d0-cab1-4ca8-a2e5-0637d0fd48b6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Arrow::StringDataType, Arrow::StringDataType, Arrow::DoubleDataType, Arrow::DoubleDataType, Arrow::UInt8DataType, Arrow::UInt16DataType, Arrow::StringDataType, Arrow::UInt16DataType]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.type_classes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c2513f6-909e-47fd-a543-66c4f424f44e",
   "metadata": {},
   "source": [
    "## 14. Indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "e6e9d7ef-1471-4f23-9210-56045c9fabd5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 1, 2, 3, 4]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.indexes\n",
    "# or\n",
    "df.indices"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3908395f-b086-4fbb-9855-e1ce233f0595",
   "metadata": {},
   "source": [
    "## 15. To an Array or a Hash"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22cb724e-cf61-40d9-a58b-9cc793e83645",
   "metadata": {},
   "source": [
    "DataFrame#to_a returns an array of row-oriented data without a header."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "4054daad-9266-4002-8942-c0891050cb4d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[1, 1.0, \"A\", true], [2, 2.0, \"B\", false], [3, 3.0, \"C\", true], [4, NaN, \"D\", false], [5, nil, nil, nil]]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.to_a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6abae59-fe31-4056-9de8-7c36e35235de",
   "metadata": {},
   "source": [
    "If you need a column-oriented array with keys, use `.to_h.to_a`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "d3631290-eb74-4d21-a469-86381c668c7f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{:x=>[1, 2, 3, 4, 5], :y=>[1.0, 2.0, 3.0, NaN, nil], :s=>[\"A\", \"B\", \"C\", \"D\", nil], :b=>[true, false, true, false, nil]}"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.to_h"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "08c45e92-f640-4e62-bc96-ee259d0ecff4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[:x, [1, 2, 3, 4, 5]], [:y, [1.0, 2.0, 3.0, NaN, nil]], [:s, [\"A\", \"B\", \"C\", \"D\", nil]], [:b, [true, false, true, false, nil]]]"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.to_h.to_a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39b65fc0-4405-4414-9a74-91c724ef587c",
   "metadata": {},
   "source": [
    "## 16. Schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "36db7842-e9b0-4473-84d4-3aef987d427f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{:x=>:uint8, :y=>:double, :s=>:string, :b=>:boolean}"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e61237d-ac67-45bb-827c-a769dff61809",
   "metadata": {},
   "source": [
    "## 17. Vector"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27402307-aaad-49c8-88ca-65346668601d",
   "metadata": {},
   "source": [
    "Each variable (column in the table) is represented by a Vector object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "6c9ba041-231d-4057-a280-acf620b68525",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f320>\n",
       "[1, 2, 3, 4, 5]\n"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[:x] # This syntax comes later"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e13d06d-b432-45b2-9745-0c6ef9228e23",
   "metadata": {},
   "source": [
    "Or create new Vector by the constructor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "3e18a4e0-238c-4800-8bda-a88a57dde3e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f334>\n",
       "[1, 2, 3, 4, 5]\n"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Vector.new(1, 2, 3, 4, 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "3bd55d9d-b988-46b2-bc11-e3dc5f4adc6c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f348>\n",
       "[1, 2, 3, 4, 5]\n"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Vector.new(1..5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "19688e6e-b59b-4a84-8c07-57e87cd0e242",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f35c>\n",
       "[1, 2, 3, 4, 5]\n"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Vector.new([1, 2, 3], [4, 5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "076bd0e2-01ab-4497-9b9b-84f72a4805bc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f370>\n",
       "[1, 2, 3, 4, 5]\n"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "array = Arrow::Array.new([1, 2, 3, 4, 5])\n",
    "Vector.new(array)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22091661-e78a-4c66-9e48-4c3c676469b4",
   "metadata": {},
   "source": [
    "- TODO: `Vector[1..5]` as a constructor"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b729bdba-87a2-4282-bd0e-319fe17f42da",
   "metadata": {},
   "source": [
    "## 18. Vectors"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5ddd840-2f84-467b-a9bb-feb769573b69",
   "metadata": {},
   "source": [
    "Returns an Array of Vectors in a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "d3ae03f2-e2fe-4a15-abe1-331185448d61",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[#<RedAmber::Vector(:uint8, size=5):0x000000000000f320>\n",
       "[1, 2, 3, 4, 5]\n",
       ", #<RedAmber::Vector(:double, size=5):0x000000000000f384>\n",
       "[1.0, 2.0, 3.0, NaN, nil]\n",
       ", #<RedAmber::Vector(:string, size=5):0x000000000000f398>\n",
       "[\"A\", \"B\", \"C\", \"D\", nil]\n",
       ", #<RedAmber::Vector(:boolean, size=5):0x000000000000f3ac>\n",
       "[true, false, true, false, nil]\n",
       "]"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.vectors"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ac88ff3-0cb6-43d6-a999-0c2e8c6defb7",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 19. Variables\n",
    "\n",
    "Returns key and Vector pairs in a Hash."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "3351a216-6fe5-485e-8686-53c1e754fa2e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{:x=>#<RedAmber::Vector(:uint8, size=5):0x000000000000f320>\n",
       "[1, 2, 3, 4, 5]\n",
       ", :y=>#<RedAmber::Vector(:double, size=5):0x000000000000f384>\n",
       "[1.0, 2.0, 3.0, NaN, nil]\n",
       ", :s=>#<RedAmber::Vector(:string, size=5):0x000000000000f398>\n",
       "[\"A\", \"B\", \"C\", \"D\", nil]\n",
       ", :b=>#<RedAmber::Vector(:boolean, size=5):0x000000000000f3ac>\n",
       "[true, false, true, false, nil]\n",
       "}"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.variables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b518c1c-eda7-406f-a885-b2344b1726eb",
   "metadata": {},
   "source": [
    "## 20. Select columns by #[ ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "767b4e49-19eb-4d5f-b030-91bd78f0f5b9",
   "metadata": {},
   "source": [
    "`DataFrame#[]` is overloading column operations and row operations.\n",
    "\n",
    "- For columns (variables)\n",
    "  - Key in a Symbol: `df[:symbol]`\n",
    "  - Key in a String: `df[\"string\"]`\n",
    "  - Keys in an Array: `df[:symbol1, \"string\", :symbol2]`\n",
    "  - Keys by indeces: `df[df.keys[0]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "ccf60edc-cccf-49e3-a503-1ca532247130",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>1.0</td></tr><tr><td>2</td><td>2.0</td></tr><tr><td>3</td><td>3.0</td></tr><tr><td>4</td><td>NaN</td></tr><tr><td>5</td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f3c0>\n",
       "        x        y\n",
       "  <uint8> <double>\n",
       "1       1      1.0\n",
       "2       2      2.0\n",
       "3       3      3.0\n",
       "4       4      NaN\n",
       "5       5    (nil)\n"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keys in a Symbol and a String\n",
    "df[:x, 'y']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "8500f8c0-ff5a-4537-9f47-03d675e31b18",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>1.0</td></tr><tr><td>2</td><td>2.0</td></tr><tr><td>3</td><td>3.0</td></tr><tr><td>4</td><td>NaN</td></tr><tr><td>5</td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f3d4>\n",
       "        x        y\n",
       "  <uint8> <double>\n",
       "1       1      1.0\n",
       "2       2      2.0\n",
       "3       3      3.0\n",
       "4       4      NaN\n",
       "5       5    (nil)\n"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keys in a Range\n",
    "df['x'..'y']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "db35cae1-35c2-47de-a7e8-906161f21282",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 3 vectors> <table><tr><th>s</th><th>b</th><th>x</th></tr><tr><td>A</td><td>true</td><td>1</td></tr><tr><td>B</td><td>false</td><td>2</td></tr><tr><td>C</td><td>true</td><td>3</td></tr><tr><td>D</td><td>false</td><td>4</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td><td>5</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f3e8>\n",
       "  s        b               x\n",
       "  <string> <boolean> <uint8>\n",
       "1 A        true            1\n",
       "2 B        false           2\n",
       "3 C        true            3\n",
       "4 D        false           4\n",
       "5 (nil)    (nil)           5\n"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keys with a index Range, and a symbol\n",
    "df[df.keys[2..], :x]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03e14403-f7bc-4350-9e7b-715901164331",
   "metadata": {},
   "source": [
    "## 21. Select rows by #[ ]\n",
    "`DataFrame#[]` is overloading column operations and row operations.\n",
    "\n",
    "- For rows (observations)\n",
    "  - Select rows by a Index: `df[index]`\n",
    "  - Select rows by Indices: `df[indices]` # Array, Arrow::Array, Vectors are acceptable for indices\n",
    "  - Select rows by Ranges: `df[range]`\n",
    "  - Select rows by Booleans: `df[booleans]` # Array, Arrow::Array, Vectors are acceptable for booleans"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "e3bc60a7-611e-4fd8-9770-8e0d167d3fee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>x</th><th>y</th><th>s</th><th>b</th></tr><tr><td>1</td><td>1.0</td><td>A</td><td>true</td></tr><tr><td>3</td><td>3.0</td><td>C</td><td>true</td></tr><tr><td>2</td><td>2.0</td><td>B</td><td>false</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000f3fc>\n",
       "        x        y s        b\n",
       "  <uint8> <double> <string> <boolean>\n",
       "1       1      1.0 A        true\n",
       "2       3      3.0 C        true\n",
       "3       2      2.0 B        false\n"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# indices\n",
    "df[0, 2, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "2b8b3801-ae37-4629-9db5-ff937941c895",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>x</th><th>y</th><th>s</th><th>b</th></tr><tr><td>2</td><td>2.0</td><td>B</td><td>false</td></tr><tr><td>3</td><td>3.0</td><td>C</td><td>true</td></tr><tr><td>5</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000f410>\n",
       "        x        y s        b\n",
       "  <uint8> <double> <string> <boolean>\n",
       "1       2      2.0 B        false\n",
       "2       3      3.0 C        true\n",
       "3       5    (nil) (nil)    (nil)\n"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# including a Range\n",
    "# negative indices are also acceptable\n",
    "df[1..2, -1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "3f6f8d73-a66c-4773-9bf5-0878c700f2d6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>x</th><th>y</th><th>s</th><th>b</th></tr><tr><td>2</td><td>2.0</td><td>B</td><td>false</td></tr><tr><td>3</td><td>3.0</td><td>C</td><td>true</td></tr><tr><td>5</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000f424>\n",
       "        x        y s        b\n",
       "  <uint8> <double> <string> <boolean>\n",
       "1       2      2.0 B        false\n",
       "2       3      3.0 C        true\n",
       "3       5    (nil) (nil)    (nil)\n"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# booleans\n",
    "# length of boolean should be the same as self\n",
    "df[false, true, true, false, true]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "abe57279-54fd-48ec-a1a4-c7453211e776",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>x</th><th>y</th><th>s</th><th>b</th></tr><tr><td>1</td><td>1.0</td><td>A</td><td>true</td></tr><tr><td>3</td><td>3.0</td><td>C</td><td>true</td></tr><tr><td>5</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000f438>\n",
       "        x        y s        b\n",
       "  <uint8> <double> <string> <boolean>\n",
       "1       1      1.0 A        true\n",
       "2       3      3.0 C        true\n",
       "3       5    (nil) (nil)    (nil)\n"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Arrow::Array\n",
    "indices = Arrow::UInt8Array.new([0,2,4])\n",
    "df[indices]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "2266611f-23d8-4645-a1e8-b07c2370fb3f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>x</th><th>y</th><th>s</th><th>b</th></tr><tr><td>3</td><td>3.0</td><td>C</td><td>true</td></tr><tr><td>4</td><td>NaN</td><td>D</td><td>false</td></tr><tr><td>5</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000f44c>\n",
       "        x        y s        b\n",
       "  <uint8> <double> <string> <boolean>\n",
       "1       3      3.0 C        true\n",
       "2       4      NaN D        false\n",
       "3       5    (nil) (nil)    (nil)\n"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# By a Vector as indices\n",
    "indices = Vector.new(df.indices)\n",
    "# indices > 1 returns a boolean Vector\n",
    "df[indices > 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "0ea2da7e-aeca-4874-be4a-6af563aa378b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=5):0x000000000000f3ac>\n",
       "[true, false, true, false, nil]\n"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# By a Vector as booleans\n",
    "booleans = df[:b]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "9f842890-6359-4266-9a23-2f8f813ef548",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 4 vectors> <table><tr><th>x</th><th>y</th><th>s</th><th>b</th></tr><tr><td>1</td><td>1.0</td><td>A</td><td>true</td></tr><tr><td>3</td><td>3.0</td><td>C</td><td>true</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 4 Vectors, 0x000000000000f460>\n",
       "        x        y s        b\n",
       "  <uint8> <double> <string> <boolean>\n",
       "1       1      1.0 A        true\n",
       "2       3      3.0 C        true\n"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[booleans]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98a04874-cb2c-44c0-b410-b330b9d12b0f",
   "metadata": {},
   "source": [
    "## 22. empty?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "7b1ab319-90a7-4f09-8629-04dcd94076cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.empty?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "1e09c32f-20a8-4175-827f-cdb98063535a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame.new.empty?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "3f9f8771-87dd-44eb-8aac-6a3ed8b4c183",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(empty DataFrame)"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame.new"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86b826dd-10e6-4087-9162-b89ac6561a61",
   "metadata": {},
   "source": [
    "## 23. Select columns by pick"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5aefd22-4e96-4dc5-91d2-e6826256bda6",
   "metadata": {
    "tags": []
   },
   "source": [
    "`DataFrame#pick` accepts an Array of keys to pick up columns (variables) and creates a new DataFrame. You can change the order of columns at a same time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "68124521-b823-424d-9e06-d11aa927d618",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>s</th><th>y</th></tr><tr><td>A</td><td>1.0</td></tr><tr><td>B</td><td>2.0</td></tr><tr><td>C</td><td>3.0</td></tr><tr><td>D</td><td>NaN</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f474>\n",
       "  s               y\n",
       "  <string> <double>\n",
       "1 A             1.0\n",
       "2 B             2.0\n",
       "3 C             3.0\n",
       "4 D             NaN\n",
       "5 (nil)       (nil)\n"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pick(:s, :y)\n",
    "# or\n",
    "df.pick([:s, :y]) # OK too."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a76dca00-da8f-4959-be18-7a1015a9d13c",
   "metadata": {},
   "source": [
    "Or use a boolean Array of lengeh `n_key` to `pick`. This style remains the order of variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "b91f8925-529c-43c9-93ba-e21bcac0f2f7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>y</th><th>s</th></tr><tr><td>1.0</td><td>A</td></tr><tr><td>2.0</td><td>B</td></tr><tr><td>3.0</td><td>C</td></tr><tr><td>NaN</td><td>D</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f488>\n",
       "         y s\n",
       "  <double> <string>\n",
       "1      1.0 A\n",
       "2      2.0 B\n",
       "3      3.0 C\n",
       "4      NaN D\n",
       "5    (nil) (nil)\n"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pick(false, true, true, false)\n",
    "# or\n",
    "df.pick([false, true, true, false]) # OK"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f903182-745b-4923-99d8-14a9b9c6ea4c",
   "metadata": {},
   "source": [
    "`#pick` also accepts a block in the context of self.\n",
    "\n",
    "Next example is picking up numeric variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "37bb0a49-c38a-484c-91d4-3e23ab43a727",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>1.0</td></tr><tr><td>2</td><td>2.0</td></tr><tr><td>3</td><td>3.0</td></tr><tr><td>4</td><td>NaN</td></tr><tr><td>5</td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f49c>\n",
       "        x        y\n",
       "  <uint8> <double>\n",
       "1       1      1.0\n",
       "2       2      2.0\n",
       "3       3      3.0\n",
       "4       4      NaN\n",
       "5       5    (nil)\n"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# reciever is required with the argument style\n",
    "df.pick(df.vectors.map(&:numeric?))\n",
    "\n",
    "# with a block\n",
    "df.pick { vectors.map(&:numeric?) }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e51f07c0-54eb-4114-8cd6-63c7780e7248",
   "metadata": {},
   "source": [
    "The name `pick` comes from the action to pick variables(columns) according to the label keys."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c1815e4-de6c-425e-8602-b8dd66836250",
   "metadata": {},
   "source": [
    "## 24. Reject columns by drop"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1ab045e-66f9-4922-8bf2-35aee7f2812e",
   "metadata": {
    "tags": []
   },
   "source": [
    "`DataFrame#drop` accepts an Array keys to drop columns (variables) to create a remainer DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "7ccace08-62b0-4b0b-93fb-81edf673abf7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>y</th><th>s</th></tr><tr><td>1.0</td><td>A</td></tr><tr><td>2.0</td><td>B</td></tr><tr><td>3.0</td><td>C</td></tr><tr><td>NaN</td><td>D</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f4b0>\n",
       "         y s\n",
       "  <double> <string>\n",
       "1      1.0 A\n",
       "2      2.0 B\n",
       "3      3.0 C\n",
       "4      NaN D\n",
       "5    (nil) (nil)\n"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop(:x, :b)\n",
    "# df.drop([:x, :b]) #is OK too."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2085b349-95c5-4607-b029-f7c3d630ac1c",
   "metadata": {},
   "source": [
    "Or use a boolean Array of lengeh `n_key` to `drop`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "785c02f1-1e16-4722-9961-4b49223c8290",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>y</th><th>s</th></tr><tr><td>1.0</td><td>A</td></tr><tr><td>2.0</td><td>B</td></tr><tr><td>3.0</td><td>C</td></tr><tr><td>NaN</td><td>D</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f4c4>\n",
       "         y s\n",
       "  <double> <string>\n",
       "1      1.0 A\n",
       "2      2.0 B\n",
       "3      3.0 C\n",
       "4      NaN D\n",
       "5    (nil) (nil)\n"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop(true, false, false, true)\n",
    "# df.drop([true, false, false, true]) # is OK too"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d246161e-02cc-40fb-8921-26b37eb3956f",
   "metadata": {},
   "source": [
    "`#drop` also accepts a block in the context of self.\n",
    "\n",
    "Next example will drop variables which have nil or NaN values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "069932e3-d393-4ede-9eb5-7aac8625e0c0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 1 vector> <table><tr><th>x</th></tr><tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr><tr><td>4</td></tr><tr><td>5</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 1 Vector, 0x000000000000f4d8>\n",
       "        x\n",
       "  <uint8>\n",
       "1       1\n",
       "2       2\n",
       "3       3\n",
       "4       4\n",
       "5       5\n"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop { vectors.map { |v| v.is_na.any } }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88b064d6-7d90-4a0b-b9c8-d92e103269fb",
   "metadata": {},
   "source": [
    "Argument style is also acceptable but it requires the reciever 'df'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "3003a5c2-0966-4f2c-9643-59e8b546c8aa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 1 vector> <table><tr><th>x</th></tr><tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr><tr><td>4</td></tr><tr><td>5</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 1 Vector, 0x000000000000f4ec>\n",
       "        x\n",
       "  <uint8>\n",
       "1       1\n",
       "2       2\n",
       "3       3\n",
       "4       4\n",
       "5       5\n"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop(df.vectors.map { |v| v.is_na.any })"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6fce15c-d4a9-4281-9c07-457e78d3c13e",
   "metadata": {},
   "source": [
    "The name `drop` was chosen as the counterpart of `pick`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f6dc86c-828d-4f9f-8b07-fce63c30fdca",
   "metadata": {},
   "source": [
    "## 25. Pick/drop and nil"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a108878-565b-400e-9a47-a15aae09429c",
   "metadata": {},
   "source": [
    "When `pick` or `drop` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `BasicObject#!`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "7c01fbb4-9bfa-4afc-8e6b-45c97c0beb03",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans = [true, true, false, nil]\n",
    "booleans_invert = booleans.map(&:!) # => [false, false, true, true] because nil.! is true\n",
    "df.pick(booleans) == df.drop(booleans_invert)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12a24264-9b7a-42a1-a541-e292e3876e35",
   "metadata": {},
   "source": [
    "## 26. Vector#invert, #primitive_invert\n",
    "\n",
    "For a boolean Vector:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "ea352e12-7e8a-43be-b8ac-797adbc47708",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=4):0x000000000000f500>\n",
       "[true, true, false, nil]\n"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector = Vector.new(booleans)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a0f82e0-157b-4185-9254-0618be291f9b",
   "metadata": {},
   "source": [
    "`Vector#invert` leaves nil as nil."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "596c521f-12bf-4448-9e5d-e1b4a2c3d896",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=4):0x000000000000f514>\n",
       "[false, false, true, nil]\n"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector.invert\n",
    "# or\n",
    "!vector"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1aec910-3055-4627-a02b-22d45f2ceb70",
   "metadata": {},
   "source": [
    "So `df.pick(booleans) != df.drop(booleans.invert)` when booleans contain any nil.\n",
    "\n",
    "On the other hand, `Vector#primitive_invert` follows the behavior of Ruby's `BasicObject#!`, so nil is inverted to true. With it, pick and drop keep their 'MECE' (mutually exclusive, collectively exhaustive) behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "4dcaba48-1cea-4ce9-b4a9-b079b43af7ec",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=4):0x000000000000f528>\n",
       "[false, false, true, true]\n"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector.primitive_invert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "c7ae4dad-275a-49e0-a0b0-bf3686248070",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pick(vector) == df.drop(vector.primitive_invert)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a6cec74-43f0-4a72-8262-25b1e311f602",
   "metadata": {},
   "source": [
    "## 27. Pick/drop and [ ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32c8f74d-b3ce-4305-9af7-6ea70052c773",
   "metadata": {},
   "source": [
    "When `pick` or `drop` selects a single column (variable), it returns a `DataFrame` with one column (variable)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "e13aee24-cac6-41ad-b8a3-0ec26edbe5d1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 1 vector> <table><tr><th>x</th></tr><tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr><tr><td>4</td></tr><tr><td>5</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 1 Vector, 0x000000000000f53c>\n",
       "        x\n",
       "  <uint8>\n",
       "1       1\n",
       "2       2\n",
       "3       3\n",
       "4       4\n",
       "5       5\n"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pick(:x) # or\n",
    "df.drop(:y, :s, :b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e47b9d2-929e-4674-9690-0a1fdf7b0a7d",
   "metadata": {},
   "source": [
    "In contrast, when `[]` selects a single column (variable), it returns a `Vector`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "60d228be-7357-434d-9d39-ee72c110e6fe",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f320>\n",
       "[1, 2, 3, 4, 5]\n"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[:x]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d973934-e08b-4b45-8efb-52f9167e7238",
   "metadata": {},
   "source": [
    "This behavior is useful in combination with DataFrame manipulation verbs (such as pick, drop, slice, remove, assign, rename)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "6beefc5a-dc47-42cc-a283-456073c4251e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 2 vectors> <table><tr><th>x</th><th>y</th></tr><tr><td>1</td><td>1.0</td></tr><tr><td>2</td><td>2.0</td></tr><tr><td>3</td><td>3.0</td></tr><tr><td>4</td><td>NaN</td></tr><tr><td>5</td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f550>\n",
       "        x        y\n",
       "  <uint8> <double>\n",
       "1       1      1.0\n",
       "2       2      2.0\n",
       "3       3      3.0\n",
       "4       4      NaN\n",
       "5       5    (nil)\n"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pick { keys.select { |key| df[key].numeric? } }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34c9bcb0-889a-4190-b2b8-49765cd059c2",
   "metadata": {},
   "source": [
    "## 28. Slice"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a428ba8-c306-4ab8-8607-51174e8e6ebe",
   "metadata": {},
   "source": [
    "`slice` selects rows (observations) to create a subset of a DataFrame."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6016d6d4-72d6-4ae2-b7dd-3d526c91ae61",
   "metadata": {},
   "source": [
    "`slice(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers. Negative indices counted from the tail, like Ruby's Array, are also acceptable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "9cdce2e4-7876-4be6-bd1f-bc8ab6e6c871",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <10 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 10 x 8 Vectors, 0x000000000000f564>\n",
       "   species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "   <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       " 1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       " 2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       " 3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       " 4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007\n",
       " 5 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       " : :        :                      :             :                 : ...        :\n",
       " 8 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       " 9 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "10 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# returns 5 rows from the start and 5 rows from the end\n",
    "penguins.slice(0...5, -5..-1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "93c3f6f0-7bc9-4909-8f32-20e8c1ddfd3a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <1 x 9 vectors> <table><tr><th>index</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>113</td><td>Adelie</td><td>Biscoe</td><td>42.2</td><td>19.5</td><td>197</td><td>4275</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 1 x 9 Vectors, 0x000000000000f578>\n",
       "     index species  island   bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "  <uint16> <string> <string>       <double>      <double>           <uint8> ... <uint16>\n",
       "1      113 Adelie   Biscoe             42.2          19.5               197 ...     2009\n"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# slice accepts a Float index\n",
    "# 33% of 344 observations => index 113.52, which selects the row at index 113\n",
    "indexed_penguins = penguins.assign_left { [:index, indexes] } # #assign_left and the Array-style assigner are 0.2.0 features\n",
    "indexed_penguins.slice(penguins.size * 0.33)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8139bb28-89f8-4058-b824-dde33ead0b60",
   "metadata": {},
   "source": [
    "Indices in Vectors or Arrow::Arrays are also acceptable."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f79db8c-c706-4d60-949b-3f644474d375",
   "metadata": {},
   "source": [
    "Another way to select rows in `slice` is to use booleans.\n",
    "- Booleans can be an Array, an Arrow::Array, a Vector, or an Array of them.\n",
    "- Each data type must be boolean.\n",
    "- The size of booleans must be the same as the size of self."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "f58ca131-7375-4489-90ce-6ba54b898eb5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=344):0x000000000000f58c>\n",
       "[false, false, true, nil, false, false, false, false, false, true, false, false, ... ]\n"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# make booleans to check over 40\n",
    "booleans = penguins[:bill_length_mm] > 40"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "176ab365-c66a-4712-97b9-4381a536321b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <242 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>42.0</td><td>20.2</td><td>190</td><td>4250</td><td><i>(nil)</i></td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>41.1</td><td>17.6</td><td>182</td><td>3200</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>42.5</td><td>20.7</td><td>197</td><td>4500</td><td>male</td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 242 x 8 Vectors, 0x000000000000f5a0>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  2 Adelie   Torgersen           42.0          20.2               190 ...     2007\n",
       "  3 Adelie   Torgersen           41.1          17.6               182 ...     2007\n",
       "  4 Adelie   Torgersen           42.5          20.7               197 ...     2007\n",
       "  5 Adelie   Torgersen           46.0          21.5               194 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "240 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "241 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "242 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.slice(booleans)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3264a182-6b72-461a-b712-c3b708c53516",
   "metadata": {},
   "source": [
    "`slice` accepts a block.\n",
    "- We can't use both arguments and a block at the same time.\n",
    "- The block should return indices of any length or a boolean Array with the same length as `size`.\n",
    "- The block is called in the context of self, so the receiver 'self' can be omitted in the block."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "c95d3426-0bbb-430e-8d83-6e22434d99ed",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <204 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>47.2</td><td>13.7</td><td>214</td><td>4925</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>46.8</td><td>14.3</td><td>215</td><td>4850</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f5b4>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  4 Adelie   Torgersen           39.3          20.6               190 ...     2007\n",
       "  5 Adelie   Torgersen           38.9          17.8               181 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "202 Gentoo   Biscoe              47.2          13.7               214 ...     2009\n",
       "203 Gentoo   Biscoe              46.8          14.3               215 ...     2009\n",
       "204 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# return a DataFrame whose bill_length_mm is within mean +/- std (a range of width 2*std)\n",
    "penguins.slice do\n",
    "  vector = self[:bill_length_mm]\n",
    "  min = vector.mean - vector.std\n",
    "  max = vector.mean + vector.std\n",
    "  vector.to_a.map { |e| (min..max).include? e }\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fa42801-64f5-4432-856b-85c26a68515d",
   "metadata": {},
   "source": [
    "## 29. Slice and nil option"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31017a7e-0923-4283-bc92-246ebe2591c3",
   "metadata": {},
   "source": [
    "`Arrow::Table#slice` uses the `#filter` method with the option `Arrow::FilterOptions.null_selection_behavior = :emit_null`. This propagates nil into the corresponding row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "8e4a8108-154b-4621-acd1-704ddf229d61",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<Arrow::Table:0x7f54b8439518 ptr=0x55d81a8f6f30>\n",
       "\t     a\tb\t         c\n",
       "0\t     1\tA\t  1.000000\n",
       "1\t(null)\t(null)\t    (null)\n"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }\n",
    "table = Arrow::Table.new(hash)\n",
    "table.slice([true, false, nil])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbb57c5a-e949-42b8-a82c-9affb3fe5b7b",
   "metadata": {},
   "source": [
    "In contrast, RedAmber's `DataFrame#slice` treats nil in booleans as false. This behavior comes from `Arrow::FilterOptions.null_selection_behavior = :drop`, which is the default value for the `Arrow::Table#filter` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "851c3bf6-b9e9-41bd-92c5-5372ed934549",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<Arrow::Table:0x7f54b83fdf40 ptr=0x55d81abf1a70>\n",
       "\ta\tb\t         c\n",
       "0\t1\tA\t  1.000000\n"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RedAmber::DataFrame.new(table).slice([true, false, nil]).table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56398a3d-6146-43af-8b96-fec37730fc49",
   "metadata": {},
   "source": [
    "## 30. Remove"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e042a97-8a5d-412e-8e4a-fda382225a2d",
   "metadata": {},
   "source": [
    "`remove` selects rows (observations) and rejects them to create a DataFrame of the remaining rows."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b4cbb97-eef3-4db8-8f25-c44c208ec554",
   "metadata": {},
   "source": [
    "`#remove(indices)` accepts indices as arguments. Indices should be Integers or Ranges of Integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "17e38ab8-886b-4114-bcaf-ee18df7d00cd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <334 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190</td><td>3650</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.2</td><td>19.6</td><td>195</td><td>4675</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td><i>(nil)</i></td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>44.5</td><td>15.7</td><td>217</td><td>4875</td><td><i>(nil)</i></td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>48.8</td><td>16.2</td><td>222</td><td>6000</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>47.2</td><td>13.7</td><td>214</td><td>4925</td><td>female</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 334 x 8 Vectors, 0x000000000000f5c8>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.3          20.6               190 ...     2007\n",
       "  2 Adelie   Torgersen           38.9          17.8               181 ...     2007\n",
       "  3 Adelie   Torgersen           39.2          19.6               195 ...     2007\n",
       "  4 Adelie   Torgersen           34.1          18.1               193 ...     2007\n",
       "  5 Adelie   Torgersen           42.0          20.2               190 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "332 Gentoo   Biscoe              44.5          15.7               217 ...     2009\n",
       "333 Gentoo   Biscoe              48.8          16.2               222 ...     2009\n",
       "334 Gentoo   Biscoe              47.2          13.7               214 ...     2009\n"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# returns the 6th to 339th observations: the remainder of the 1st example of #28\n",
    "penguins.remove(0...5, -5..-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "def1c1c4-6b60-4864-ae24-c797fbf008a7",
   "metadata": {},
   "source": [
    "`remove(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray. Booleans must be the same length as `#size`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "6f169420-7eb2-457f-8d59-7a5c90aa3fa5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <333 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 333 x 8 Vectors, 0x000000000000f5dc>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  4 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       "  5 Adelie   Torgersen           39.3          20.6               190 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "331 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "332 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "333 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# remove all observations that contain nil\n",
    "removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f1864c9-4ae4-4fcd-9840-ea424ef5e27d",
   "metadata": {},
   "source": [
    "`remove {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "a6807c65-25e5-4ee1-8d1b-6018c46b3999",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <140 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>34.1</td><td>18.1</td><td>193</td><td>3475</td><td><i>(nil)</i></td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td><i>(nil)</i></td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000000f5f0>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007\n",
       "  2 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       "  3 Adelie   Torgersen           34.1          18.1               193 ...     2007\n",
       "  4 Adelie   Torgersen           37.8          17.1               186 ...     2007\n",
       "  5 Adelie   Torgersen           37.8          17.3               180 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "138 Gentoo   Biscoe             (nil)         (nil)             (nil) ...     2009\n",
       "139 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "140 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# remove observations whose bill_length_mm is within mean +/- std (a range of width 2*std)\n",
    "penguins.remove do\n",
    "  vector = self[:bill_length_mm]\n",
    "  min = vector.mean - vector.std\n",
    "  max = vector.mean + vector.std\n",
    "  vector.to_a.map { |e| (min..max).include? e }\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "591e6b22-da98-4336-b22e-c7bc9bcf2ebf",
   "metadata": {},
   "source": [
    "## 31. Remove and nil"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67926d1b-c76e-4cb7-b679-6545d850e7e4",
   "metadata": {},
   "source": [
    "When `remove` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `BasicObject#!`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "8575614e-f702-4ee4-ac7b-745e9b32e803",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 3 vectors> <table><tr><th>a</th><th>b</th><th>c</th></tr><tr><td>1</td><td>A</td><td>1.0</td></tr><tr><td>2</td><td>B</td><td>2.0</td></tr><tr><td><i>(nil)</i></td><td>C</td><td>3.0</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000f604>\n",
       "        a b               c\n",
       "  <uint8> <string> <double>\n",
       "1       1 A             1.0\n",
       "2       2 B             2.0\n",
       "3   (nil) C             3.0\n"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "id": "932a5e71-8cef-44e5-a789-ce97329bc001",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=3):0x000000000000f618>\n",
       "[true, false, nil]\n"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans = df[:a] < 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "74cf6aa6-8913-433d-97ad-bba2d548afe5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[false, true, true]"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans_invert = booleans.to_a.map(&:!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "id": "5e466a06-cb17-4dc1-a5b0-34bfd3ffb78b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.slice(booleans) == df.remove(booleans_invert)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bca0b06-2d08-4c28-8b4c-4fd088f2d2d3",
   "metadata": {},
   "source": [
    "Whereas `Vector#invert` returns nil for elements nil. This will bring different result. (See #26)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "id": "077b216f-0a08-413e-95c9-12789d15a9ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=3):0x000000000000f62c>\n",
       "[false, true, nil]\n"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans.invert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "id": "b3df62a6-c4a3-44cb-bde6-f6be12b120c8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 3 vectors> <table><tr><th>a</th><th>b</th><th>c</th></tr><tr><td>1</td><td>A</td><td>1.0</td></tr><tr><td><i>(nil)</i></td><td>C</td><td>3.0</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000f640>\n",
       "        a b               c\n",
       "  <uint8> <string> <double>\n",
       "1       1 A             1.0\n",
       "2   (nil) C             3.0\n"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.remove(booleans.invert)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e05f00b6-3bae-4650-8bbc-d4e0692f6f85",
   "metadata": {},
   "source": [
    "We have `#primitive_invert` method in Vector. This method returns the same result as `.to_a.map(&:!)` above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "id": "296ca3cd-a6da-4603-a576-d8c36a810e4f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=3):0x000000000000f654>\n",
       "[false, true, true]\n"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans.primitive_invert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "ba5b8c0b-b94e-4209-adcd-258ea3b87bfd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <1 x 3 vectors> <table><tr><th>a</th><th>b</th><th>c</th></tr><tr><td>1</td><td>A</td><td>1.0</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000f668>\n",
       "        a b               c\n",
       "  <uint8> <string> <double>\n",
       "1       1 A             1.0\n"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.remove(booleans.primitive_invert)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "id": "2446792f-0b0a-4642-acae-b4fec89261c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.slice(booleans) == df.remove(booleans.primitive_invert)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c23a4ad-0c17-4178-b58a-abfd8153d49b",
   "metadata": {},
   "source": [
    "## 32. Remove nil"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84c7238b-1029-416f-b495-9d045f77b22c",
   "metadata": {},
   "source": [
    "Remove any observations containing nil."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "de4bb615-d14d-4c90-ab54-db2f375b9f00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <333 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 333 x 8 Vectors, 0x000000000000f67c>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  4 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       "  5 Adelie   Torgersen           39.3          20.6               190 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "331 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "332 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "333 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.remove_nil"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a4ae8f9-dcf8-4dad-bb77-af076e9cadb5",
   "metadata": {},
   "source": [
    "The roundabout way for this is to use `#remove`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "id": "27a3da5f-0ea2-4c5d-a6c3-c0e20f2224a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <333 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 333 x 8 Vectors, 0x000000000000f690>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  4 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       "  5 Adelie   Torgersen           39.3          20.6               190 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "331 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "332 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "333 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.remove { vectors.map(&:is_nil).reduce(&:|) }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f2a58fd-f033-44f6-9eb4-ed893a2b5d1d",
   "metadata": {},
   "source": [
    "## 33. Rename"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0d39506-8ae5-48e7-9dd2-acf38d4ec1a9",
   "metadata": {},
   "source": [
    "Rename keys (column names) to create a updated DataFrame."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f6924ec-e86c-4089-ae40-6783027d3ce0",
   "metadata": {},
   "source": [
    "`#rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Array `[[existing_key, new_key], ...]` ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "id": "9396c96d-83d7-4b92-a4ca-27bc9e4d7b9d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>name</th><th>age</th></tr><tr><td>Yasuko</td><td>68</td></tr><tr><td>Rui</td><td>49</td></tr><tr><td>Hinata</td><td>28</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f6a4>\n",
       "  name         age\n",
       "  <string> <uint8>\n",
       "1 Yasuko        68\n",
       "2 Rui           49\n",
       "3 Hinata        28\n"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "h = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }\n",
    "comecome = RedAmber::DataFrame.new(h)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "id": "fad279c6-1ca0-4493-bd69-0e9ef011bff7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>name</th><th>age_in_1993</th></tr><tr><td>Yasuko</td><td>68</td></tr><tr><td>Rui</td><td>49</td></tr><tr><td>Hinata</td><td>28</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f6b8>\n",
       "  name     age_in_1993\n",
       "  <string>     <uint8>\n",
       "1 Yasuko            68\n",
       "2 Rui               49\n",
       "3 Hinata            28\n"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "comecome.rename(:age => :age_in_1993)\n",
    "# comecome.rename(:age, :age_in_1993) # is also OK\n",
    "# comecome.rename([:age, :age_in_1993]) # is also OK"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dabb005-9822-4c4b-aaa5-fa6f28f2ed43",
   "metadata": {},
   "source": [
    "`#rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Array `[[existing_key, new_key], ...]`. Block is called in the context of self."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aabbba20-6ef8-4da2-8dc0-0cb243cf3b23",
   "metadata": {},
   "source": [
    "Symbol key and String key are distinguished."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07f98b31-6123-4466-b4f8-f995c7cde474",
   "metadata": {},
   "source": [
    "## 34. Assign"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99f6787f-2b36-4360-b155-1c2d7874d25e",
   "metadata": {},
   "source": [
    "Assign new or updated columns (variables) and create a updated DataFrame.\n",
    "\n",
    "- Columns with new keys will append new variables at right (bottom in TDR).\n",
    "- Columns with exisiting keys will update corresponding vectors."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4b22da0-4ee2-4196-88e1-1cfea6a72f4d",
   "metadata": {},
   "source": [
    "`#assign(key_pairs)` accepts pairs of key and array_like values as arguments. The pairs should be a Hash of `{key => array_like}` or an Array of Array `[[key, array_like], ... ]`. `array_like` is one of `Vector`, `Array` or `Arrow::Array`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "id": "56dcfed8-a6f9-4d8c-bac3-e8ce7c0674a7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>name</th><th>age</th></tr><tr><td>Yasuko</td><td>68</td></tr><tr><td>Rui</td><td>49</td></tr><tr><td>Hinata</td><td>28</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f6cc>\n",
       "  name         age\n",
       "  <string> <uint8>\n",
       "1 Yasuko        68\n",
       "2 Rui           49\n",
       "3 Hinata        28\n"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "comecome = RedAmber::DataFrame.new( name: %w[Yasuko Rui Hinata], age: [68, 49, 28] )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "id": "8da8d282-8798-44d5-bb7b-7fa2df922308",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 3 vectors> <table><tr><th>name</th><th>age</th><th>brother</th></tr><tr><td>Yasuko</td><td>97</td><td>Santa</td></tr><tr><td>Rui</td><td>78</td><td><i>(nil)</i></td></tr><tr><td>Hinata</td><td>57</td><td>Momotaro</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000f6e0>\n",
       "  name         age brother\n",
       "  <string> <uint8> <string>\n",
       "1 Yasuko        97 Santa\n",
       "2 Rui           78 (nil)\n",
       "3 Hinata        57 Momotaro\n"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# update :age and add :brother\n",
    "assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }\n",
    "comecome.assign(assigner)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6d3ddfc-b16d-4b20-83df-357e9cdb32e6",
   "metadata": {},
   "source": [
    "`#assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and array_like values as a Hash of `{key => array_like}` or an Array of Array `[[key, array_like], ... ]`. `array_like` is one of `Vector`, `Array` or `Arrow::Array`. Block is called in the context of self."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "id": "8d69edd0-7ad7-4318-8033-1785ce2543db",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 3 vectors> <table><tr><th>index</th><th>float</th><th>string</th></tr><tr><td>0</td><td>0.0</td><td>A</td></tr><tr><td>1</td><td>1.1</td><td>B</td></tr><tr><td>2</td><td>2.2</td><td>C</td></tr><tr><td>3</td><td>NaN</td><td>D</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f6f4>\n",
       "    index    float string\n",
       "  <uint8> <double> <string>\n",
       "1       0      0.0 A\n",
       "2       1      1.1 B\n",
       "3       2      2.2 C\n",
       "4       3      NaN D\n",
       "5   (nil)    (nil) (nil)\n"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = RedAmber::DataFrame.new(\n",
    "  index: [0, 1, 2, 3, nil],\n",
    "  float: [0.0, 1.1,  2.2, Float::NAN, nil],\n",
    "  string: ['A', 'B', 'C', 'D', nil])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "id": "e884af01-d82b-42e7-8e92-62baf19919cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 3 vectors> <table><tr><th>index</th><th>float</th><th>string</th></tr><tr><td>0</td><td>-0.0</td><td>A</td></tr><tr><td>255</td><td>-1.1</td><td>B</td></tr><tr><td>254</td><td>-2.2</td><td>C</td></tr><tr><td>253</td><td>NaN</td><td>D</td></tr><tr><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f708>\n",
       "    index    float string\n",
       "  <uint8> <double> <string>\n",
       "1       0     -0.0 A\n",
       "2     255     -1.1 B\n",
       "3     254     -2.2 C\n",
       "4     253      NaN D\n",
       "5   (nil)    (nil) (nil)\n"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# update numeric variables\n",
    "df.assign do\n",
    "  vectors.select(&:numeric?).map { |v| [v.key, -v] }\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b8e2090-628f-4b17-8929-cbb5e0285dff",
   "metadata": {},
   "source": [
    "In this example, columns :x and :y are updated. Column :x returns complements for #negate method because :x is :uint8 type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "id": "9452f8db-5f23-4044-ac87-ac5695fae8ae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[:uint8, :double, :string]"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.types"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5c83559-f4d8-4ed2-8b20-5c50eb1faa14",
   "metadata": {},
   "source": [
    "## 35. Coerce (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77bdfc69-b728-4335-b76e-e4be92f94310",
   "metadata": {},
   "source": [
    "Vector has coerce method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "id": "2bfbe584-be54-486b-af32-e76b37c10e49",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=3):0x000000000000f71c>\n",
       "[1, 2, 3]\n"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector = RedAmber::Vector.new(1,2,3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "ce35d901-38a8-4f13-b2d1-29b83f6c5438",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:int16, size=3):0x000000000000f730>\n",
       "[-1, -2, -3]\n"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Vector's `#*` method\n",
    "vector * -1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "id": "7d5fc2be-f590-4678-92e9-faa27b618266",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:int16, size=3):0x000000000000f744>\n",
       "[-1, -2, -3]\n"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# coerced calculation\n",
    "-1 * vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "id": "fa90a6af-add7-42f2-9707-7d726575aeb6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=3):0x000000000000f758>\n",
       "[255, 254, 253]\n"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# `@-` operator\n",
    "-vector"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4820b527-44e9-4738-aa0e-73604078b3b0",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 36. to_ary (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8507dcc4-74e3-44ad-aa54-cf43d55f2131",
   "metadata": {},
   "source": [
    "`Vector#to_ary` will enable implicit conversion to an Array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "id": "b12bd7c8-2981-426c-8ae3-154504a8ea15",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[3, 4, 5]"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Array(Vector.new([3, 4, 5]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "id": "c0cb5a98-7cdf-43a8-b2f7-f9df1961c761",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4, 5]"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[1, 2] + Vector.new([3, 4, 5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "216dde4f-e4d8-4f29-903a-8cbf75de5b8e",
   "metadata": {},
   "source": [
    "## 37. Fill nil (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1959d0d7-6d09-4fa5-9365-1e2f7fc35d61",
   "metadata": {},
   "source": [
    "`Vector#fill_nil_forward` or `Vector#fill_nil_backward` will\n",
    "propagate the last valid observation forward (or backward).\n",
    "Or preserve nil if all previous values are nil or at the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "id": "d003b06a-859f-4de0-9e35-803efac85169",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f76c>\n",
       "[0, 1, 1, 3, 3]\n"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "integer = Vector.new([0, 1, nil, 3, nil])\n",
    "integer.fill_nil_forward"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "c5d74006-d364-4e86-8a5e-9e96e87a96e0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000f780>\n",
       "[0, 1, 3, 3, nil]\n"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "integer.fill_nil_backward"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "347785a6-eab0-4864-a871-2c320005211e",
   "metadata": {},
   "source": [
    "## 38. all?/any? (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f82a6f5d-03d3-4645-85f5-d25999165378",
   "metadata": {},
   "source": [
    "`Vector#all?` returns true if all elements is true.\n",
    "\n",
    "`Vector#any?` returns true if exists any true.\n",
    "\n",
    "These are unary aggregation function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "id": "ebad37ad-0a09-48b1-ba3a-4e030a917837",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans = Vector.new([true, true, nil])\n",
    "booleans.all?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "id": "97fc24da-03d4-406d-b353-562896775d60",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans.any?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ff3b22d-9f7c-42f2-8d18-c89a06af681b",
   "metadata": {},
   "source": [
    "If these methods are used with option `skip_nulls: false` nil is considered."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "id": "3e0e5800-665a-4a05-b2cb-d152f3f077de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans.all?(skip_nulls: false)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "id": "3e43f0c4-a254-4735-ac28-de14d2670c67",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booleans.any?(skip_nulls: false)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abc71a85-7958-4a21-91cf-8c96c0784525",
   "metadata": {},
   "source": [
    "## 39. count/count_uniq (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d556118-4105-4d12-806d-ba56c6ae3d1b",
   "metadata": {},
   "source": [
    "`Vector#count` counts element.\n",
    "\n",
    "`Vector#count_uniq` counts unique element. `#count_distinct` is an alias (Arrow's name).\n",
    "\n",
    "These are unary aggregation function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "id": "2af73e32-1d7e-4f80-b54e-c40ef08b7034",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string = Vector.new(%w[A B A])\n",
    "string.count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "id": "fe6d8d85-27b0-438f-b1b4-1b15e9eb05f9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string.count_uniq # count_distinct is also OK"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70abed9f-665a-4ea7-939e-4b185ee53755",
   "metadata": {},
   "source": [
    "## 40. stddev/variance (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "965de338-b3be-4d33-92e1-5ad7e2ed18f0",
   "metadata": {},
   "source": [
    "These are unary element-wise function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "id": "0afec200-f377-432b-a260-ae5a0c5ce794",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.816496580927726"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "integers = Vector.new([1, 2, 3, nil])\n",
    "integers.stddev"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "id": "2e40ac09-cb7f-4978-87e8-53f84f16f7c7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Unbiased standard deviation\n",
    "integers.sd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "id": "e6158e3b-4af8-467c-a355-8e9f2e579548",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.6666666666666666"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "integers.variance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "id": "d64d39f2-d979-49f1-9946-65890f40d646",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Unbiased variance\n",
    "integers.var"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25023f5a-798a-40a5-ab84-a6615602f747",
   "metadata": {},
   "source": [
    "## 41. negate (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00ddf322-ef50-40a1-86a6-22bf3d43f007",
   "metadata": {},
   "source": [
    "These are unary element-wise function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "id": "ab5a357a-e98c-40a1-9b89-0b38645e416f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=3):0x000000000000f794>\n",
       "[-1.0, 2.0, -3.0]\n"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double = Vector.new([1.0, -2, 3])\n",
    "double.negate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "id": "8a06c856-d61c-4752-a296-1fa207ffd9a1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=3):0x000000000000f7a8>\n",
       "[-1.0, 2.0, -3.0]\n"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same as #negate\n",
    "-double"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b145724-d165-4ef3-8a06-2948dd0c7dbb",
   "metadata": {},
   "source": [
    "## 42. round (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b780c2f3-935c-4b2f-b18a-b277cf7c24b7",
   "metadata": {},
   "source": [
    "Otions for `#round`;\n",
    "\n",
    "- `:n-digits` The number of digits to show.\n",
    "- `round_mode` Specify rounding mode.\n",
    "\n",
    "This is a unary element-wise function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "id": "e7a069b0-3547-4cd2-a2f0-0740f186b191",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f7bc>\n",
       "[15.15, 2.5, 3.5, -4.5, -5.5]\n"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double = RedAmber::Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "id": "5ee84b24-8830-4788-a404-d5e1cca22abf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f7d0>\n",
       "[15.0, 2.0, 4.0, -4.0, -6.0]\n"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "id": "20adb1ad-473c-4245-b959-7848c239fb76",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f7e4>\n",
       "[15.0, 2.0, 4.0, -4.0, -6.0]\n"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(mode: :half_to_even)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "id": "d2777ad8-2c24-48e4-8f5f-77403e3109ea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f7f8>\n",
       "[16.0, 3.0, 4.0, -5.0, -6.0]\n"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(mode: :towards_infinity)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "id": "a8ab2735-74cb-4cfe-a5a2-61bfa90c72ac",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f80c>\n",
       "[15.0, 3.0, 4.0, -4.0, -5.0]\n"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(mode: :half_up)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "id": "3575481c-40ed-405f-a69c-7581d4dce2cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f820>\n",
       "[15.0, 2.0, 3.0, -4.0, -5.0]\n"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(mode: :half_towards_zero)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "id": "a86e4c5c-aced-4a88-b692-4e26b90f1653",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f834>\n",
       "[15.0, 3.0, 4.0, -5.0, -6.0]\n"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(mode: :half_towards_infinity)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "id": "73f51bab-ff46-4b99-96a5-8c6547ad9d35",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f848>\n",
       "[15.0, 3.0, 3.0, -5.0, -5.0]\n"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(mode: :half_to_odd)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "id": "a12c684c-4a63-4dac-a81b-969978812a24",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f85c>\n",
       "[15.0, 2.0, 4.0, -4.0, -6.0]\n"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(n_digits: 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "id": "17370f2b-0957-411b-8145-56aa9fc956ac",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f870>\n",
       "[15.2, 2.5, 3.5, -4.5, -5.5]\n"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(n_digits: 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "id": "53072cff-b28b-4672-b30a-8ca37562bc21",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f884>\n",
       "[20.0, 0.0, 0.0, -0.0, -10.0]\n"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.round(n_digits: -1)"
   ]
  },
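  {
   "cell_type": "markdown",
   "id": "sketch-round-tie-breaking-plain-ruby",
   "metadata": {},
   "source": [
    "The tie-breaking rules above can be compared with plain Ruby, as a rough analogy only. This sketch uses Ruby's built-in `Float#round` with its `half:` option and does not call RedAmber or Arrow. Pairing `half: :even`, `:up` and `:down` with `:half_to_even`, `:half_towards_infinity` and `:half_towards_zero` is an assumption based on the outputs shown above.\n",
    "\n",
    "```ruby\n",
    "# Plain-Ruby analogy only; no RedAmber or Arrow call is made here.\n",
    "values = [15.15, 2.5, 3.5, -4.5, -5.5]\n",
    "\n",
    "# Ties go to the nearest even integer (compare `mode: :half_to_even`).\n",
    "half_to_even = values.map { |v| v.round(half: :even) }\n",
    "\n",
    "# Ties go away from zero (compare `mode: :half_towards_infinity`).\n",
    "half_away_from_zero = values.map { |v| v.round(half: :up) }\n",
    "\n",
    "# Ties go towards zero (compare `mode: :half_towards_zero`).\n",
    "half_towards_zero = values.map { |v| v.round(half: :down) }\n",
    "\n",
    "p half_to_even\n",
    "p half_away_from_zero\n",
    "p half_towards_zero\n",
    "```\n",
    "\n",
    "Unlike `Vector#round`, `Float#round` without a digits argument returns Integers, so the printed values have no decimal part."
   ]
  },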
  {
   "cell_type": "markdown",
   "id": "51dedfce-51c7-4e5b-b890-a90ad9cf7596",
   "metadata": {},
   "source": [
    "## 43. and/or (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2c4869b-6ebf-476c-b2fd-a4b9c0638dc5",
   "metadata": {},
   "source": [
    "RedAmber selects `and_kleene`/`or_kleene` as the default `&`/`|` methods.\n",
    "\n",
    "These are binary element-wise functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "id": "2d4f5853-1ed9-4d8b-87a9-b5c1faac5fae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=9):0x000000000000f898>\n",
       "[true, false, nil, false, false, false, nil, false, nil]\n"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bool_self  = Vector.new([true, true, true, false, false, false, nil, nil, nil])\n",
    "bool_other = Vector.new([true, false, nil, true, false, nil, true, false, nil])\n",
    "\n",
    "bool_self & bool_other  # same as bool_self.and_kleene(bool_other)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "id": "236c9733-8d45-467e-b288-e6c18b9c39d2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=9):0x000000000000f8ac>\n",
       "[true, false, nil, true, false, nil, true, false, nil]\n"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Ruby's primitive `&&`\n",
    "bool_self && bool_other"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "id": "4e984a9c-7d9c-465d-bf26-0c685dedd4bf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=9):0x000000000000f8c0>\n",
       "[true, false, nil, false, false, nil, nil, nil, nil]\n"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Arrow's default `and`\n",
    "bool_self.and_org(bool_other)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "id": "0120ebf5-355d-41f5-83d5-49b9802f337b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=9):0x000000000000f8d4>\n",
       "[true, true, true, true, false, nil, true, nil, nil]\n"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bool_self | bool_other  # same as bool_self.or_kleene(bool_other)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "id": "24ceee23-79df-4fcd-afd8-f3839a087785",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=9):0x000000000000f8e8>\n",
       "[true, true, true, false, false, false, nil, nil, nil]\n"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Ruby's primitive `||`\n",
    "bool_self || bool_other"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "id": "c152d04b-71a0-4b18-acd1-b5ab9e413d00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=9):0x000000000000f8fc>\n",
       "[true, true, nil, true, false, nil, nil, nil, nil]\n"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Arrow's default `or`\n",
    "bool_self.or_org(bool_other)"
   ]
  },
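  {
   "cell_type": "markdown",
   "id": "sketch-kleene-logic-plain-ruby",
   "metadata": {},
   "source": [
    "The Kleene rules behind `&` and `|` can be sketched in plain Ruby, treating `nil` as an unknown value. The helper names `kleene_and` and `kleene_or` are hypothetical and are not part of RedAmber; the sketch only mirrors the truth tables seen in the outputs above.\n",
    "\n",
    "```ruby\n",
    "# Plain-Ruby sketch of three-valued (Kleene) logic; nil means unknown.\n",
    "# Hypothetical helpers, not RedAmber's implementation.\n",
    "def kleene_and(a, b)\n",
    "  return false if a == false || b == false # a single false decides the result\n",
    "  return nil if a.nil? || b.nil?           # otherwise an unknown stays unknown\n",
    "\n",
    "  true\n",
    "end\n",
    "\n",
    "def kleene_or(a, b)\n",
    "  return true if a == true || b == true # a single true decides the result\n",
    "  return nil if a.nil? || b.nil?        # otherwise an unknown stays unknown\n",
    "\n",
    "  false\n",
    "end\n",
    "\n",
    "bool_self  = [true, true, true, false, false, false, nil, nil, nil]\n",
    "bool_other = [true, false, nil, true, false, nil, true, false, nil]\n",
    "\n",
    "and_result = bool_self.zip(bool_other).map { |a, b| kleene_and(a, b) }\n",
    "or_result  = bool_self.zip(bool_other).map { |a, b| kleene_or(a, b) }\n",
    "\n",
    "p and_result\n",
    "p or_result\n",
    "```\n",
    "\n",
    "By contrast, the `and_org`/`or_org` outputs above return `nil` whenever either operand is `nil`."
   ]
  },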
  {
   "cell_type": "markdown",
   "id": "beede237-c5ed-4e12-a432-ec7e4546d786",
   "metadata": {},
   "source": [
    "## 44. is_finite/is_nan/is_nil/is_na (Vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77418efd-c0d7-4d63-a7db-2d43fafd386e",
   "metadata": {},
   "source": [
    "These are unary element-wise functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "id": "19558f9e-fdc4-46e5-90d0-724e4e8fbd8e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000f910>\n",
       "[3.141592653589793, Infinity, -Infinity, NaN, nil]\n"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double = Vector.new([Math::PI, Float::INFINITY, -Float::INFINITY, Float::NAN, nil])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "id": "d90a7168-1f87-4363-9589-c1f161babc7d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=5):0x000000000000f924>\n",
       "[true, false, false, false, nil]\n"
      ]
     },
     "execution_count": 137,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.is_finite"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "id": "7d88049b-695f-4b0c-a105-8fb5797a58b1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=5):0x000000000000f938>\n",
       "[false, true, true, false, nil]\n"
      ]
     },
     "execution_count": 138,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.is_inf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "id": "7d86a7b5-84bf-4031-9811-4076281920cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=5):0x000000000000f94c>\n",
       "[false, false, false, true, true]\n"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.is_na"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "id": "d562f826-7a37-4c57-8f92-777555987246",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=5):0x000000000000f960>\n",
       "[false, false, false, false, true]\n"
      ]
     },
     "execution_count": 140,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.is_nil"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "id": "e460dc6b-e48f-4462-9ce8-aa6069ebae27",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:boolean, size=5):0x000000000000f974>\n",
       "[true, true, true, true, false]\n"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "double.is_valid"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cca75eb-f0e8-4f85-89cb-3601512e76b0",
   "metadata": {},
   "source": [
    "## 45. Prime-th rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "id": "0751e820-a22d-45b5-9005-df523d2353be",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <68 x 9 vectors> <table><tr><th>index</th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>2</td><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>3</td><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>5</td><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193</td><td>3450</td><td>female</td><td>2007</td></tr><tr><td>7</td><td>Adelie</td><td>Torgersen</td><td>38.9</td><td>17.8</td><td>181</td><td>3625</td><td>female</td><td>2007</td></tr><tr><td colspan='9'>&#8942;</td></tr><tr><td>317</td><td>Gentoo</td><td>Biscoe</td><td>49.4</td><td>15.8</td><td>216</td><td>4925</td><td>male</td><td>2009</td></tr><tr><td>331</td><td>Gentoo</td><td>Biscoe</td><td>50.5</td><td>15.2</td><td>216</td><td>5000</td><td>female</td><td>2009</td></tr><tr><td>337</td><td>Gentoo</td><td>Biscoe</td><td>44.5</td><td>15.7</td><td>217</td><td>4875</td><td><i>(nil)</i></td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 68 x 9 Vectors, 0x000000000000f988>\n",
       "      index species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "   <uint16> <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       " 1        2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       " 2        3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       " 3        5 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       " 4        7 Adelie   Torgersen           38.9          17.8               181 ...     2007\n",
       " 5       11 Adelie   Torgersen           37.8          17.1               186 ...     2007\n",
       " :        : :        :                      :             :                 : ...        :\n",
       "66      317 Gentoo   Biscoe              49.4          15.8               216 ...     2009\n",
       "67      331 Gentoo   Biscoe              50.5          15.2               216 ...     2009\n",
       "68      337 Gentoo   Biscoe              44.5          15.7               217 ...     2009\n"
      ]
     },
     "execution_count": 142,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# prime-th rows ... Don't ask me what it means.\n",
    "require 'prime'\n",
    "penguins.assign_left(:index, Vector.new(penguins.indices) + 1) # since 0.2.0\n",
    "        .slice { Vector.new(Prime.each(size).to_a) - 1 }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9e8de1a-ad8f-4fdc-a65c-4d3db7123530",
   "metadata": {},
   "source": [
    "## 46. Slice by Enumerator"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32dd53a3-a822-4ae1-afe2-b5aa2bfbd3e3",
   "metadata": {},
   "source": [
    "`DataFrame#slice` also accepts an Enumerator as its argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "id": "b2a118fa-f3c0-4f31-9b45-6db27ccbebe6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <35 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>37.8</td><td>17.1</td><td>186</td><td>3300</td><td><i>(nil)</i></td><td>2007</td></tr><tr><td>Adelie</td><td>Biscoe</td><td>37.8</td><td>18.3</td><td>174</td><td>3400</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Dream</td><td>39.5</td><td>16.7</td><td>178</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>48.5</td><td>15.0</td><td>219</td><td>4850</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.5</td><td>15.2</td><td>216</td><td>5000</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>46.8</td><td>14.3</td><td>215</td><td>4850</td><td>female</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 35 x 8 Vectors, 0x000000000000f99c>\n",
       "   species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "   <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       " 1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       " 2 Adelie   Torgersen           37.8          17.1               186 ...     2007\n",
       " 3 Adelie   Biscoe              37.8          18.3               174 ...     2007\n",
       " 4 Adelie   Dream               39.5          16.7               178 ...     2007\n",
       " 5 Adelie   Dream               36.5          18.0               182 ...     2007\n",
       " : :        :                      :             :                 : ...        :\n",
       "33 Gentoo   Biscoe              48.5          15.0               219 ...     2009\n",
       "34 Gentoo   Biscoe              50.5          15.2               216 ...     2009\n",
       "35 Gentoo   Biscoe              46.8          14.3               215 ...     2009\n"
      ]
     },
     "execution_count": 143,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select every 10 samples\n",
    "penguins.slice(0.step by: 10, to: 340)"
   ]
  },
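  {
   "cell_type": "markdown",
   "id": "sketch-step-enumerator-plain-ruby",
   "metadata": {},
   "source": [
    "`0.step by: 10, to: 340` is ordinary Ruby, so the indices it yields can be inspected without RedAmber. A small sketch (the variable names `enum` and `indices` are only for illustration):\n",
    "\n",
    "```ruby\n",
    "# Build the same kind of enumerator that is passed to `slice` above.\n",
    "enum = 0.step(by: 10, to: 340)\n",
    "\n",
    "# Materialize the row indices it yields: 0, 10, 20, ... up to 340.\n",
    "indices = enum.to_a\n",
    "\n",
    "puts indices.size\n",
    "p indices.first(3)\n",
    "p indices.last\n",
    "```\n",
    "\n",
    "The enumerator yields 35 indices, which matches the 35 rows of the sliced DataFrame."
   ]
  },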
  {
   "cell_type": "markdown",
   "id": "db312c2c-3a7c-4765-bfad-b3313b173a79",
   "metadata": {},
   "source": [
    "## 47. Output mode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "714ed8df-5aa3-4ac4-8b0d-6390aff73c8c",
   "metadata": {},
   "source": [
    "The output mode of `#inspect` and `#to_iruby` is Table mode by default. If you prefer TDR mode, set the environment variable `RED_AMBER_OUTPUT_MODE` to `\"TDR\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "id": "a721804b-006e-44c6-8d38-885eae747eaa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <344 x 8 vectors> <table><tr><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th><th>year</th></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181</td><td>3750</td><td>male</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186</td><td>3800</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195</td><td>3250</td><td>female</td><td>2007</td></tr><tr><td>Adelie</td><td>Torgersen</td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td><i>(nil)</i></td><td>2007</td></tr><tr><td colspan='8'>&#8942;</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>50.4</td><td>15.7</td><td>222</td><td>5750</td><td>male</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>45.2</td><td>14.8</td><td>212</td><td>5200</td><td>female</td><td>2009</td></tr><tr><td>Gentoo</td><td>Biscoe</td><td>49.9</td><td>16.1</td><td>213</td><td>5400</td><td>male</td><td>2009</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f2bc>\n",
       "    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year\n",
       "    <string> <string>        <double>      <double>           <uint8> ... <uint16>\n",
       "  1 Adelie   Torgersen           39.1          18.7               181 ...     2007\n",
       "  2 Adelie   Torgersen           39.5          17.4               186 ...     2007\n",
       "  3 Adelie   Torgersen           40.3          18.0               195 ...     2007\n",
       "  4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007\n",
       "  5 Adelie   Torgersen           36.7          19.3               193 ...     2007\n",
       "  : :        :                      :             :                 : ...        :\n",
       "342 Gentoo   Biscoe              50.4          15.7               222 ...     2009\n",
       "343 Gentoo   Biscoe              45.2          14.8               212 ...     2009\n",
       "344 Gentoo   Biscoe              49.9          16.1               213 ...     2009\n"
      ]
     },
     "execution_count": 144,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ENV['RED_AMBER_OUTPUT_MODE'] = 'Table' # or nil (default)\n",
    "penguins  # Almost same as `puts penguins.to_s` in any mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "id": "e4c9f70c-a4b1-4a81-bbc4-e9b14a6b6cb0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f2bc>\n",
      "Vectors : 5 numeric, 3 strings\n",
      "# key                type   level data_preview\n",
      "1 :species           string     3 {\"Adelie\"=>152, \"Chinstrap\"=>68, \"Gentoo\"=>124}\n",
      "2 :island            string     3 {\"Torgersen\"=>52, \"Biscoe\"=>168, \"Dream\"=>124}\n",
      "3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils\n",
      " ... 5 more Vectors ...\n",
      "\n"
     ]
    }
   ],
   "source": [
    "ENV['RED_AMBER_OUTPUT_MODE'] = 'TDR'\n",
    "p penguins; nil # Almost same as `penguins.tdr` in any mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "id": "2786e9a7-e321-43c5-b56e-9f2ca9d62f8b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RedAmber::DataFrame : 344 x 8 Vectors\n",
       "Vectors : 5 numeric, 3 strings\n",
       "# key                type   level data_preview\n",
       "1 :species           string     3 {\"Adelie\"=>152, \"Chinstrap\"=>68, \"Gentoo\"=>124}\n",
       "2 :island            string     3 {\"Torgersen\"=>52, \"Biscoe\"=>168, \"Dream\"=>124}\n",
       "3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils\n",
       "4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils\n",
       "5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils\n",
       "6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils\n",
       "7 :sex               string     3 {\"male\"=>168, \"female\"=>165, nil=>11}\n",
       "8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}\n"
      ]
     },
     "execution_count": 146,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "id": "b00c858b-b14a-492b-bc22-d6a707bcc1ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"Table\""
      ]
     },
     "execution_count": 147,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ENV['RED_AMBER_OUTPUT_MODE'] = 'Table'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6781904-c014-4ef0-86c5-8a758ca3ca1c",
   "metadata": {},
   "source": [
    "## 48. Empty key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38912e38-bddd-47cc-adfd-5c4f780636ea",
   "metadata": {},
   "source": [
    "An empty key `:\"\"` is automatically renamed to `:unnamed1`.\n",
    "\n",
    "If `:unnamed1` is already in use, `:unnamed1.succ` (that is, `:unnamed2`) is used instead.\n",
    "\n",
    "(Since 0.1.8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "id": "13569004-bb23-45fa-8d11-fe5f367641a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 2 vectors> <table><tr><th>unnamed2</th><th>unnamed1</th></tr><tr><td>1</td><td>3</td></tr><tr><td>2</td><td>4</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000000f9b0>\n",
       "  unnamed2 unnamed1\n",
       "   <uint8>  <uint8>\n",
       "1        1        3\n",
       "2        2        4\n"
      ]
     },
     "execution_count": 148,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = DataFrame.new(\"\": [1, 2], unnamed1: [3, 4])"
   ]
  },
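  {
   "cell_type": "markdown",
   "id": "sketch-unnamed-key-succ-plain-ruby",
   "metadata": {},
   "source": [
    "The renaming rule relies on Ruby's `Symbol#succ`. The helper below, `next_unnamed_key`, is a hypothetical plain-Ruby sketch of that rule and is not RedAmber's actual implementation.\n",
    "\n",
    "```ruby\n",
    "# Hypothetical helper: pick the first :unnamedN key that is not taken yet.\n",
    "def next_unnamed_key(existing_keys)\n",
    "  candidate = :unnamed1\n",
    "  # Symbol#succ turns :unnamed1 into :unnamed2, and so on.\n",
    "  candidate = candidate.succ while existing_keys.include?(candidate)\n",
    "  candidate\n",
    "end\n",
    "\n",
    "free_key  = next_unnamed_key([])\n",
    "taken_key = next_unnamed_key([:unnamed1])\n",
    "\n",
    "p free_key\n",
    "p taken_key\n",
    "```\n",
    "\n",
    "With `:unnamed1` already present, the sketch picks `:unnamed2`, as in the DataFrame above."
   ]
  },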
  {
   "cell_type": "markdown",
   "id": "a4343343-d59a-4b71-b08f-4b737a6d4258",
   "metadata": {},
   "source": [
    "## 49. Grouping"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ddf0937-7b0e-4ee6-8116-8e78c308e76f",
   "metadata": {},
   "source": [
    "`DataFrame#group` takes a group key as an argument and creates an instance of the `Group` class.\n",
    "\n",
    "Inspecting a `Group` object shows the count of each unique element of the group key.\n",
    "\n",
    "(Since 0.1.7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "id": "ee602e52-7988-4fab-b5e3-c466acf01c98",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Group:0x000000000000f9c4\n",
       "{:species=>{\"Adelie\"=>152, \"Chinstrap\"=>68, \"Gentoo\"=>124}}>"
      ]
     },
     "execution_count": 149,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group = penguins.group(:species)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a48ab0e9-60e3-4c07-a27b-9c53783318c1",
   "metadata": {},
   "source": [
    "An instance of the `Group` class provides summary functions as methods.\n",
    "\n",
    "Each of them returns a DataFrame whose summarized columns are named in the `function(key)` style."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "id": "20b23ada-b895-4921-b57b-8d46b451e494",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 8 vectors> <table><tr><th>species</th><th>count(island)</th><th>count(bill_length_mm)</th><th>count(bill_depth_mm)</th><th>count(flipper_length_mm)</th><th>count(body_mass_g)</th><th>count(sex)</th><th>count(year)</th></tr><tr><td>Adelie</td><td>152</td><td>151</td><td>151</td><td>151</td><td>151</td><td>146</td><td>152</td></tr><tr><td>Chinstrap</td><td>68</td><td>68</td><td>68</td><td>68</td><td>68</td><td>68</td><td>68</td></tr><tr><td>Gentoo</td><td>124</td><td>123</td><td>123</td><td>123</td><td>123</td><td>119</td><td>124</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 8 Vectors, 0x000000000000f9d8>\n",
       "  species   count(island) count(bill_length_mm) count(bill_depth_mm) ... count(year)\n",
       "  <string>        <int64>               <int64>              <int64> ...     <int64>\n",
       "1 Adelie              152                   151                  151 ...         152\n",
       "2 Chinstrap            68                    68                   68 ...          68\n",
       "3 Gentoo              124                   123                  123 ...         124\n"
      ]
     },
     "execution_count": 150,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group.count"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "584d8fa1-1e70-4f31-bdc2-5d4956a28cb5",
   "metadata": {},
   "source": [
    "If the count results are identical across the columns, they are aggregated into a single `:count` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "id": "e6936488-9f23-47bd-8492-537c5be1afb3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>species</th><th>count</th></tr><tr><td>Adelie</td><td>151</td></tr><tr><td>Chinstrap</td><td>68</td></tr><tr><td>Gentoo</td><td>123</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f9ec>\n",
       "  species     count\n",
       "  <string>  <int64>\n",
       "1 Adelie        151\n",
       "2 Chinstrap      68\n",
       "3 Gentoo        123\n"
      ]
     },
     "execution_count": 151,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.pick(:species, :bill_length_mm, :bill_depth_mm).group(:species).count"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c90b59b-918c-492c-ac5b-46b62b31d136",
   "metadata": {},
   "source": [
    "Grouping key comes first (leftmost) in the columns."
   ]
  },
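  {
   "cell_type": "markdown",
   "id": "sketch-group-count-plain-ruby",
   "metadata": {},
   "source": [
    "The per-column counts above differ between columns, which is consistent with `nil` values not being counted (this reading is an assumption drawn from the outputs). A plain-Ruby sketch of that idea follows, using toy rows invented for illustration rather than the penguins data.\n",
    "\n",
    "```ruby\n",
    "# Toy rows invented for illustration; not taken from the penguins dataset.\n",
    "rows = [\n",
    "  { species: 'Adelie', bill_length_mm: 39.1 },\n",
    "  { species: 'Adelie', bill_length_mm: nil },\n",
    "  { species: 'Gentoo', bill_length_mm: 46.1 },\n",
    "  { species: 'Gentoo', bill_length_mm: 50.0 }\n",
    "]\n",
    "\n",
    "# Group the rows by the key, then count the non-nil values in each group.\n",
    "counts = rows.group_by { |row| row[:species] }\n",
    "             .transform_values { |group| group.count { |row| !row[:bill_length_mm].nil? } }\n",
    "\n",
    "p counts\n",
    "```\n",
    "\n",
    "Here the `nil` bill length is skipped, so the Adelie group is counted once even though it has two rows."
   ]
  },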
  {
   "cell_type": "markdown",
   "id": "21c8a6fa-3f95-41ea-817f-ffc1011e8df9",
   "metadata": {},
   "source": [
    "## 50. Grouping with a block"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db89c8b4-8189-410d-abe9-eac8f440065c",
   "metadata": {},
   "source": [
    "`DataFrame#group` also takes a block, in which we can specify multiple functions.\n",
    "\n",
    "The block is evaluated in the context of the `Group` instance, so summary functions can be called without a receiver.\n",
    "\n",
    "(Since 0.1.8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "id": "913f576b-ec86-4e94-af05-7c656ea24cc2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 3 vectors> <table><tr><th>species</th><th>count</th><th>mean(body_mass_g)</th></tr><tr><td>Adelie</td><td>152</td><td>3700.662251655629</td></tr><tr><td>Chinstrap</td><td>68</td><td>3733.0882352941176</td></tr><tr><td>Gentoo</td><td>124</td><td>5076.016260162602</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000fa00>\n",
       "  species     count mean(body_mass_g)\n",
       "  <string>  <int64>          <double>\n",
       "1 Adelie        152           3700.66\n",
       "2 Chinstrap      68           3733.09\n",
       "3 Gentoo        124           5076.02\n"
      ]
     },
     "execution_count": 152,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.group(:species) { [count(:species), mean(:body_mass_g)] }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6dc3206-d327-4242-9908-253ad4a74759",
   "metadata": {},
   "source": [
    "`Group#summarize` accepts the same block as `DataFrame#group`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "id": "67c7fc55-7b30-469c-bd0c-cda5732863fe",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 7 vectors> <table><tr><th>species</th><th>count</th><th>mean(bill_length_mm)</th><th>mean(bill_depth_mm)</th><th>mean(flipper_length_mm)</th><th>mean(body_mass_g)</th><th>mean(year)</th></tr><tr><td>Adelie</td><td>152</td><td>38.79139072847684</td><td>18.346357615894032</td><td>189.95364238410596</td><td>3700.662251655629</td><td>2008.0131578947369</td></tr><tr><td>Chinstrap</td><td>68</td><td>48.83382352941177</td><td>18.420588235294115</td><td>195.8235294117647</td><td>3733.0882352941176</td><td>2007.9705882352941</td></tr><tr><td>Gentoo</td><td>124</td><td>47.504878048780476</td><td>14.982113821138206</td><td>217.1869918699187</td><td>5076.016260162602</td><td>2008.0806451612902</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 7 Vectors, 0x000000000000fa14>\n",
       "  species     count mean(bill_length_mm) mean(bill_depth_mm) ... mean(year)\n",
       "  <string>  <int64>             <double>            <double> ...   <double>\n",
       "1 Adelie        152                38.79               18.35 ...    2008.01\n",
       "2 Chinstrap      68                48.83               18.42 ...    2007.97\n",
       "3 Gentoo        124                 47.5               14.98 ...    2008.08\n"
      ]
     },
     "execution_count": 153,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group.summarize { [count(:species), mean] }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88682c8d-0ab6-4cee-8e59-5b6ad0ae5dac",
   "metadata": {},
   "source": [
    "## 51. Vector#shift"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6dd1bc0-7146-4884-aac5-9874e829235e",
   "metadata": {},
   "source": [
    "`Vector#shift(amount = 1, fill: nil)`\n",
    "\n",
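    "Shifts the vector's values by the specified `amount`. A positive `amount` shifts towards the end and a negative one towards the beginning. The vacated positions are filled with the value `fill`.\n",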
    "\n",
    "(Since 0.1.8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "id": "013f2db6-3e1d-481f-a908-57605729b51d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000fa28>\n",
       "[nil, 1, 2, 3, 4]\n"
      ]
     },
     "execution_count": 154,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector = RedAmber::Vector.new([1, 2, 3, 4, 5])\n",
    "vector.shift"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "id": "7625acd7-d6a0-4775-b5e0-ca87f95f4f28",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:uint8, size=5):0x000000000000fa3c>\n",
       "[3, 4, 5, nil, nil]\n"
      ]
     },
     "execution_count": 155,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector.shift(-2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "id": "34a9ac2a-2e3f-44bc-8ba7-c4487dc3528e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "#<RedAmber::Vector(:double, size=5):0x000000000000fa50>\n",
       "[NaN, 1.0, 2.0, 3.0, 4.0]\n"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector.shift(fill: Float::NAN)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39f305a6-5718-4e81-b08f-564fa93861cc",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 52. From the Pandas cookbook (if-then)\n",
    "\n",
    "https://pandas.pydata.org/docs/user_guide/cookbook.html#if-then"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f1d41dd-714a-4c68-af3b-a9080847f833",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]}\n",
    ")\n",
    "df.loc[df.AAA >= 5, \"BBB\"] = -1\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "0    4   10  100\n",
    "1    5   -1   50\n",
    "2    6   -1  -30\n",
    "3    7   -1  -50\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "id": "24774ccc-8f0f-4ce4-9ba0-bebed8781c38",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>4</td><td>10</td><td>100</td></tr><tr><td>5</td><td>-1</td><td>50</td></tr><tr><td>6</td><td>-1</td><td>-30</td></tr><tr><td>7</td><td>-1</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fa64>\n",
       "      AAA    BBB    CCC\n",
       "  <uint8> <int8> <int8>\n",
       "1       4     10    100\n",
       "2       5     -1     50\n",
       "3       6     -1    -30\n",
       "4       7     -1    -50\n"
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Red Amber\n",
    "df = DataFrame.new(\n",
    "  \"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]  # You can omit {}\n",
    ")\n",
    "\n",
    "df.assign(BBB: df[:BBB].replace(df[:AAA] >= 5, -1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c76ca2e3-e239-440b-923b-dfabb1a1c007",
   "metadata": {},
   "source": [
    "If you want to replace both :BBB and :CCC ;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "id": "3f97227b-cbee-4515-b76d-3514401967d9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>4</td><td>10</td><td>100</td></tr><tr><td>5</td><td>-1</td><td>-2</td></tr><tr><td>6</td><td>-1</td><td>-2</td></tr><tr><td>7</td><td>-1</td><td>-2</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fa78>\n",
       "      AAA    BBB    CCC\n",
       "  <uint8> <int8> <int8>\n",
       "1       4     10    100\n",
       "2       5     -1     -2\n",
       "3       6     -1     -2\n",
       "4       7     -1     -2\n"
      ]
     },
     "execution_count": 158,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.assign do\n",
    "  replacer = v(:AAA) >= 5  # Boolean Vector\n",
    "  {\n",
    "    BBB: v(:BBB).replace(replacer, -1),\n",
    "    CCC: v(:CCC).replace(replacer, -2)\n",
    "  }\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bf560f9-4393-4ed5-9994-24f8a93470fb",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 53. From the Pandas cookbook (Splitting)\n",
    "Split a frame with a boolean criterion\n",
    "\n",
    "https://pandas.pydata.org/docs/user_guide/cookbook.html#splitting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcf06cad-4a34-45e2-b8f8-8422914eb319",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]}\n",
    ")\n",
    "df[df.AAA <= 5]\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "0    4   10  100\n",
    "1    5   20   50\n",
    "\n",
    "df[df.AAA > 5]\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "2    6   30  -30\n",
    "3    7   40  -50\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "id": "b08e74d4-aba8-4fb5-a815-5d5384e92f81",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>4</td><td>10</td><td>100</td></tr><tr><td>5</td><td>20</td><td>50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fa8c>\n",
       "      AAA     BBB    CCC\n",
       "  <uint8> <uint8> <int8>\n",
       "1       4      10    100\n",
       "2       5      20     50\n"
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Red Amber\n",
    "df = DataFrame.new(\n",
    "  # You can omit {}\n",
    "  \"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]\n",
    ")\n",
    "\n",
    "df.slice(df[:AAA] <= 5)\n",
    "# df[df[:AAA] <= 5] # is also OK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "id": "caa72796-ff7e-4275-849f-04698114ee08",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>6</td><td>30</td><td>-30</td></tr><tr><td>7</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000faa0>\n",
       "      AAA     BBB    CCC\n",
       "  <uint8> <uint8> <int8>\n",
       "1       6      30    -30\n",
       "2       7      40    -50\n"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.remove(df[:AAA] <= 5)\n",
    "# df.slice(df[:AAA] > 5) # do the same thing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba7588e9-bbac-4547-a56c-3eea9f819460",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 54. From the Pandas cookbook (Building criteria)\n",
    "Split a frame with a boolean criterion\n",
    "\n",
    "https://pandas.pydata.org/docs/user_guide/cookbook.html#building-criteria"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6d01b08-1af7-47b0-a9e4-f27ab41fe24e",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]}\n",
    ")\n",
    "\n",
    "# and\n",
    "df.loc[(df[\"BBB\"] < 25) & (df[\"CCC\"] >= -40), \"AAA\"]\n",
    "\n",
    "# returns a series =>\n",
    "0    4\n",
    "1    5\n",
    "Name: AAA, dtype: int64\n",
    "\n",
    "# or\n",
    "df.loc[(df[\"BBB\"] > 25) | (df[\"CCC\"] >= -40), \"AAA\"]\n",
    "\n",
    "# returns a series =>\n",
    "0    4\n",
    "1    5\n",
    "2    6\n",
    "3    7\n",
    "Name: AAA, dtype: int64\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "id": "46066e96-91e5-4a96-9840-7e4ce6f06818",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 1 vector> <table><tr><th>AAA</th></tr><tr><td>4</td></tr><tr><td>5</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 1 Vector, 0x000000000000fab4>\n",
       "      AAA\n",
       "  <uint8>\n",
       "1       4\n",
       "2       5\n"
      ]
     },
     "execution_count": 161,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Red Amber\n",
    "df = DataFrame.new(\n",
    "  # You can omit {}\n",
    "  \"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]\n",
    ")\n",
    "\n",
    "df.slice( (df[:BBB] < 25) & (df[:CCC] >= 40) ).pick(:AAA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "id": "dc8304c2-be28-420e-b2d5-a4b636eaac8b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 1 vector> <table><tr><th>AAA</th></tr><tr><td>4</td></tr><tr><td>5</td></tr><tr><td>6</td></tr><tr><td>7</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 1 Vector, 0x000000000000fac8>\n",
       "      AAA\n",
       "  <uint8>\n",
       "1       4\n",
       "2       5\n",
       "3       6\n",
       "4       7\n"
      ]
     },
     "execution_count": 162,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.slice( (df[:BBB] > 25) | (df[:CCC] >= 40) ).pick(:AAA)\n",
    "# df[ (df[:BBB] > 25) | (df[:CCC] >= 40) ][:AAA)] # also OK"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79616705-f497-4bb4-ad1d-5ee93c0093ce",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# or (with assignment)\n",
    "df.loc[(df[\"BBB\"] > 25) | (df[\"CCC\"] >= 75), \"AAA\"] = 0.1\n",
    "df\n",
    "\n",
    "# returns a dataframe =>\n",
    "   AAA  BBB  CCC\n",
    "0  0.1   10  100\n",
    "1  5.0   20   50\n",
    "2  0.1   30  -30\n",
    "3  0.1   40  -50\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "id": "7ab3b044-5a0f-4a38-8a42-aed1228b6462",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>0.1</td><td>10</td><td>100</td></tr><tr><td>5.0</td><td>20</td><td>50</td></tr><tr><td>0.1</td><td>30</td><td>-30</td></tr><tr><td>0.1</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fadc>\n",
       "       AAA     BBB    CCC\n",
       "  <double> <uint8> <int8>\n",
       "1      0.1      10    100\n",
       "2      5.0      20     50\n",
       "3      0.1      30    -30\n",
       "4      0.1      40    -50\n"
      ]
     },
     "execution_count": 163,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# df.assign(AAA: df[:AAA].replace((df[:BBB] > 25) | (df[:CCC] >= 75), 0.1)) # by one liner\n",
    "\n",
    "booleans = (df[:BBB] > 25) | (df[:CCC] >= 75)\n",
    "replaced = df[:AAA].replace(booleans, 0.1)\n",
    "df.assign(AAA: replaced)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cd4a47c-f619-462b-94c0-4c488761d5b0",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Select rows with data closest to certain value using argsort\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]}\n",
    ")\n",
    "aValue = 43.0\n",
    "df.loc[(df.CCC - aValue).abs().argsort()]\n",
    "\n",
    "# returns a dataframe =>\n",
    "   AAA  BBB  CCC\n",
    "1    5   20   50\n",
    "0    4   10  100\n",
    "2    6   30  -30\n",
    "3    7   40  -50\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "id": "13cb1d45-2d13-4708-ad76-57efd72e609b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>5</td><td>20</td><td>50</td></tr><tr><td>4</td><td>10</td><td>100</td></tr><tr><td>6</td><td>30</td><td>-30</td></tr><tr><td>7</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000faf0>\n",
       "      AAA     BBB    CCC\n",
       "  <uint8> <uint8> <int8>\n",
       "1       5      20     50\n",
       "2       4      10    100\n",
       "3       6      30    -30\n",
       "4       7      40    -50\n"
      ]
     },
     "execution_count": 164,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a_value = 43\n",
    "df[(df[:CCC] - a_value).abs.sort_indexes]\n",
    "# df.slice (df[:CCC] - a_value).abs.sort_indexes # also OK"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9330117b-40e0-4574-8900-7857622daad4",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Dynamically reduce a list of criteria using a binary operators\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]}\n",
    ")\n",
    "Crit1 = df.AAA <= 5.5\n",
    "Crit2 = df.BBB == 10.0\n",
    "Crit3 = df.CCC > -40.0\n",
    "AllCrit = Crit1 & Crit2 & Crit3\n",
    "\n",
    "import functools\n",
    "\n",
    "CritList = [Crit1, Crit2, Crit3]\n",
    "AllCrit = functools.reduce(lambda x, y: x & y, CritList)\n",
    "df[AllCrit]\n",
    "\n",
    "# returns a dataframe =>\n",
    "   AAA  BBB  CCC\n",
    "0    4   10  100\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "id": "40336e62-d411-4655-8ec5-8d30876ada47",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <1 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>4</td><td>10</td><td>100</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000fb04>\n",
       "      AAA     BBB    CCC\n",
       "  <uint8> <uint8> <int8>\n",
       "1       4      10    100\n"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "crit1 = df[:AAA] <= 5.5\n",
    "crit2 = df[:BBB] == 10.0\n",
    "crit3 = df[:CCC] >= -40.0\n",
    "df[crit1 & crit2 & crit3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62e03375-573d-4368-a26e-1be3a4d58cf8",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 55. From the Pandas cookbook (Dataframes)\n",
    "\n",
    "https://pandas.pydata.org/docs/user_guide/cookbook.html#dataframes"
   ]
  },
  {
   "cell_type": "raw",
   "id": "55cee55f-7caf-423c-863a-187f96fa3072",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Using both row labels and value conditionals\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]}\n",
    ")\n",
    "df[(df.AAA <= 6) & (df.index.isin([0, 2, 4]))] \n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "0    4   10  100\n",
    "2    6   30  -30\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 166,
   "id": "272396f6-f689-43d5-ba34-e7c6ac0e1c28",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>4</td><td>10</td><td>100</td></tr><tr><td>6</td><td>30</td><td>-30</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fb18>\n",
       "      AAA     BBB    CCC\n",
       "  <uint8> <uint8> <int8>\n",
       "1       4      10    100\n",
       "2       6      30    -30\n"
      ]
     },
     "execution_count": 166,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Red Amber\n",
    "df = DataFrame.new(\n",
    "  \"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]\n",
    ")\n",
    "\n",
    "df[(df[:AAA] <= 6) & df.indices.map { |i| [0, 2, 4].include? i }]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26bfea62-12e4-4a01-976e-6184ecafa2fd",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Use loc for label-oriented slicing and iloc positional slicing GH2904\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [4, 5, 6, 7], \"BBB\": [10, 20, 30, 40], \"CCC\": [100, 50, -30, -50]},\n",
    "    index=[\"foo\", \"bar\", \"boo\", \"kar\"],\n",
    ")\n",
    "\n",
    "# There are 2 explicit slicing methods, with a third general case\n",
    "# 1. Positional-oriented (Python slicing style : exclusive of end)\n",
    "# 2. Label-oriented (Non-Python slicing style : inclusive of end)\n",
    "# 3. General (Either slicing style : depends on if the slice contains labels or positions)\n",
    "\n",
    "df.loc[\"bar\":\"kar\"]  # Label\n",
    "# returns =>\n",
    "     AAA  BBB  CCC\n",
    "bar    5   20   50\n",
    "boo    6   30  -30\n",
    "kar    7   40  -50\n",
    "\n",
    "# Generic\n",
    "df[0:3]\n",
    "# returns =>\n",
    "     AAA  BBB  CCC\n",
    "foo    4   10  100\n",
    "bar    5   20   50\n",
    "boo    6   30  -30\n",
    "\n",
    "df[\"bar\":\"kar\"]\n",
    "# returns =>\n",
    "     AAA  BBB  CCC\n",
    "bar    5   20   50\n",
    "boo    6   30  -30\n",
    "kar    7   40  -50\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "id": "ccabc137-33d3-47e8-ad09-88a285765380",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 4 vectors> <table><tr><th>index</th><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>foo</td><td>4</td><td>10</td><td>100</td></tr><tr><td>bar</td><td>5</td><td>20</td><td>50</td></tr><tr><td>boo</td><td>6</td><td>30</td><td>-30</td></tr><tr><td>kar</td><td>7</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 4 Vectors, 0x000000000000fb2c>\n",
       "  index        AAA     BBB    CCC\n",
       "  <string> <uint8> <uint8> <int8>\n",
       "1 foo            4      10    100\n",
       "2 bar            5      20     50\n",
       "3 boo            6      30    -30\n",
       "4 kar            7      40    -50\n"
      ]
     },
     "execution_count": 167,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Red Amber does not have row index. Use a new column as indexes.\n",
    "labeled = df.assign_left(index: %w[foo bar boo kar])\n",
    "# labeled = df.assign(index: %w[foo bar boo kar]).pick { [keys[-1], keys[0...-1]] } # until v0.1.8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "id": "f0871131-725e-4e33-a3cc-1fccd610a4b2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>index</th><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>bar</td><td>5</td><td>20</td><td>50</td></tr><tr><td>boo</td><td>6</td><td>30</td><td>-30</td></tr><tr><td>kar</td><td>7</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000fb40>\n",
       "  index        AAA     BBB    CCC\n",
       "  <string> <uint8> <uint8> <int8>\n",
       "1 bar            5      20     50\n",
       "2 boo            6      30    -30\n",
       "3 kar            7      40    -50\n"
      ]
     },
     "execution_count": 168,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "labeled[1..3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "id": "dffc55f0-481e-4076-aa3f-89f1294655b9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 4 vectors> <table><tr><th>index</th><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>bar</td><td>5</td><td>20</td><td>50</td></tr><tr><td>boo</td><td>6</td><td>30</td><td>-30</td></tr><tr><td>kar</td><td>7</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 4 Vectors, 0x000000000000fb54>\n",
       "  index        AAA     BBB    CCC\n",
       "  <string> <uint8> <uint8> <int8>\n",
       "1 bar            5      20     50\n",
       "2 boo            6      30    -30\n",
       "3 kar            7      40    -50\n"
      ]
     },
     "execution_count": 169,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "labeled.slice do\n",
    "  v = v(:index)\n",
    "  v.index(\"bar\")..v.index(\"kar\")\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d72b067c-d15d-499b-8d2a-8b79f5287e97",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.\n",
    "df2 = pd.DataFrame(data=data, index=[1, 2, 3, 4])  # Note index starts at 1.\n",
    "\n",
    "df2.iloc[1:3]  # Position-oriented\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "2    5   20   50\n",
    "3    6   30  -30\n",
    "\n",
    "df2.loc[1:3]  # Label-oriented\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "1    4   10  100\n",
    "2    5   20   50\n",
    "3    6   30  -30\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "id": "f023a23a-ea1d-415c-b395-195801351433",
   "metadata": {},
   "outputs": [],
   "source": [
    "# RedAmber only have an implicit integer index 0...size,\n",
    "# does not happen any ambiguity unless you create a new column and use it for indexes :-)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cada26c-a37a-4eff-bfa5-66635d278671",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Using inverse operator (~) to take the complement of a mask\n",
    "df[~((df.AAA <= 6) & (df.index.isin([0, 2, 4])))]\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "1    5   20   50\n",
    "3    7   40  -50\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "id": "29b81efe-5b9b-4640-89b1-422ae94cf01d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>5</td><td>20</td><td>50</td></tr><tr><td>7</td><td>40</td><td>-50</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000fb68>\n",
       "      AAA     BBB    CCC\n",
       "  <uint8> <uint8> <int8>\n",
       "1       5      20     50\n",
       "2       7      40    -50\n"
      ]
     },
     "execution_count": 171,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# RedAmber offers #! method for boolean Vector.\n",
    "df[!((df[:AAA] <= 6) & df.indices.map { |i| [0, 2, 4].include? i })]\n",
    "\n",
    "# or\n",
    "# df[((df[:AAA] <= 6) & df.indices.map { |i| [0, 2, 4].include? i }).invert]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fb7103f-00ed-4e74-81e5-7d480004e681",
   "metadata": {},
   "source": [
    "If you have `nil` in your data, consider #primitive_invert for consistent result. See example #26."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1780372-b566-41f5-84d6-6213e3f9efa7",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 56. From the Pandas cookbook (New columns)\n",
    "\n",
    "https://pandas.pydata.org/docs/user_guide/cookbook.html#new-columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f5f9d8b-7550-44f6-9f79-64f940f5000a",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Efficiently and dynamically creating new columns using applymap\n",
    "df = pd.DataFrame({\"AAA\": [1, 2, 1, 3], \"BBB\": [1, 1, 2, 2], \"CCC\": [2, 1, 3, 1]})\n",
    "df\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC\n",
    "0    1    1    2\n",
    "1    2    1    1\n",
    "2    1    2    3\n",
    "3    3    2    1\n",
    "\n",
    "source_cols = df.columns  # Or some subset would work too\n",
    "new_cols = [str(x) + \"_cat\" for x in source_cols]\n",
    "categories = {1: \"Alpha\", 2: \"Beta\", 3: \"Charlie\"}\n",
    "df[new_cols] = df[source_cols].applymap(categories.get)\n",
    "df\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB  CCC  AAA_cat BBB_cat  CCC_cat\n",
    "0    1    1    2    Alpha   Alpha     Beta\n",
    "1    2    1    1     Beta   Alpha    Alpha\n",
    "2    1    2    3    Alpha    Beta  Charlie\n",
    "3    3    2    1  Charlie    Beta    Alpha\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "id": "265d63e6-0c01-4080-8d5b-c3153be595a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 3 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th></tr><tr><td>1</td><td>1</td><td>2</td></tr><tr><td>2</td><td>1</td><td>1</td></tr><tr><td>1</td><td>2</td><td>3</td></tr><tr><td>3</td><td>2</td><td>1</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 3 Vectors, 0x000000000000fb7c>\n",
       "      AAA     BBB     CCC\n",
       "  <uint8> <uint8> <uint8>\n",
       "1       1       1       2\n",
       "2       2       1       1\n",
       "3       1       2       3\n",
       "4       3       2       1\n"
      ]
     },
     "execution_count": 172,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# RedAmber\n",
    "df = DataFrame.new({\"AAA\": [1, 2, 1, 3], \"BBB\": [1, 1, 2, 2], \"CCC\": [2, 1, 3, 1]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "id": "4be751d7-69a8-4400-b094-9932bf3d577b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <4 x 6 vectors> <table><tr><th>AAA</th><th>BBB</th><th>CCC</th><th>AAA_cat</th><th>BBB_cat</th><th>CCC_cat</th></tr><tr><td>1</td><td>1</td><td>2</td><td>Alpha</td><td>Alpha</td><td>Beta</td></tr><tr><td>2</td><td>1</td><td>1</td><td>Beta</td><td>Alpha</td><td>Alpha</td></tr><tr><td>1</td><td>2</td><td>3</td><td>Alpha</td><td>Beta</td><td>Charlie</td></tr><tr><td>3</td><td>2</td><td>1</td><td>Charlie</td><td>Beta</td><td>Alpha</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 4 x 6 Vectors, 0x000000000000fb90>\n",
       "      AAA     BBB     CCC AAA_cat  BBB_cat  CCC_cat\n",
       "  <uint8> <uint8> <uint8> <string> <string> <string>\n",
       "1       1       1       2 Alpha    Alpha    Beta\n",
       "2       2       1       1 Beta     Alpha    Alpha\n",
       "3       1       2       3 Alpha    Beta     Charlie\n",
       "4       3       2       1 Charlie  Beta     Alpha\n"
      ]
     },
     "execution_count": 173,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "categories = {1 => \"Alpha\", 2 => \"Beta\", 3 => \"Charlie\"}\n",
    "\n",
    "# Creating a Hash from keys\n",
    "df.assign do\n",
    "  keys.each_with_object({}) do |key, h|\n",
    "    h[\"#{key}_cat\"] = v(key).to_a.map { |x| categories[x] }\n",
    "  end\n",
    "end\n",
    "\n",
    "# Creating an Array from vectors, from v0.2.0\n",
    "df.assign do\n",
    "  vectors.map do |v|\n",
    "    [\"#{v.key}_cat\", v.to_a.map { |x| categories[x] } ]\n",
    "  end\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa8214e5-a897-406e-bb5e-95ad9fea0cdd",
   "metadata": {},
   "source": [
    "```python\n",
    "# by Python Pandas\n",
    "# Keep other columns when using min() with groupby\n",
    "df = pd.DataFrame(\n",
    "    {\"AAA\": [1, 1, 1, 2, 2, 2, 3, 3], \"BBB\": [2, 1, 3, 4, 5, 1, 2, 3]}\n",
    ")\n",
    "df\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB\n",
    "0    1    2\n",
    "1    1    1\n",
    "2    1    3\n",
    "3    2    4\n",
    "4    2    5\n",
    "5    2    1\n",
    "6    3    2\n",
    "7    3    3\n",
    "\n",
    "# Method 1 : idxmin() to get the index of the minimums\n",
    "df.loc[df.groupby(\"AAA\")[\"BBB\"].idxmin()]\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB\n",
    "1    1    1\n",
    "5    2    1\n",
    "6    3    2\n",
    "\n",
    "# Method 2 : sort then take first of each\n",
    "df.sort_values(by=\"BBB\").groupby(\"AAA\", as_index=False).first()\n",
    "\n",
    "# returns =>\n",
    "   AAA  BBB\n",
    "0    1    1\n",
    "1    2    1\n",
    "2    3    2\n",
    "\n",
    "# Notice the same results, with the exception of the index.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 174,
   "id": "5dd5f7dc-21a3-4397-9ef6-c5bc630ff858",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <8 x 2 vectors> <table><tr><th>AAA</th><th>BBB</th></tr><tr><td>1</td><td>2</td></tr><tr><td>1</td><td>1</td></tr><tr><td>1</td><td>3</td></tr><tr><td>2</td><td>4</td></tr><tr><td>2</td><td>5</td></tr><tr><td>2</td><td>1</td></tr><tr><td>3</td><td>2</td></tr><tr><td>3</td><td>3</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 8 x 2 Vectors, 0x000000000000fba4>\n",
       "      AAA     BBB\n",
       "  <uint8> <uint8>\n",
       "1       1       2\n",
       "2       1       1\n",
       "3       1       3\n",
       "4       2       4\n",
       "5       2       5\n",
       ":       :       :\n",
       "7       3       2\n",
       "8       3       3\n"
      ]
     },
     "execution_count": 174,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# RedAmber\n",
    "df = DataFrame.new(AAA: [1, 1, 1, 2, 2, 2, 3, 3], BBB: [2, 1, 3, 4, 5, 1, 2, 3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "id": "000c1632-8faf-407f-bb7d-e583ef573442",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <3 x 2 vectors> <table><tr><th>AAA</th><th>min(BBB)</th></tr><tr><td>1</td><td>1</td></tr><tr><td>2</td><td>1</td></tr><tr><td>3</td><td>2</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000fbb8>\n",
       "      AAA min(BBB)\n",
       "  <uint8>  <uint8>\n",
       "1       1        1\n",
       "2       2        1\n",
       "3       3        2\n"
      ]
     },
     "execution_count": 175,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.group(:AAA).min\n",
    "\n",
    "# Add `.rename { [keys[-1], :BBB] }` if you want."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5aa5b4ab-2fb3-4a22-804e-55d6351dd427",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 57. Summary/describe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "id": "610be94e-b7ce-43f5-a5c1-ddef745d6bac",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 9 vectors> <table><tr><th>variables</th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>median</th><th>75%</th><th>max</th></tr><tr><td>bill_length_mm</td><td>342</td><td>43.92192982456141</td><td>5.4595837139265315</td><td>32.1</td><td>39.225</td><td>44.382000000000005</td><td>48.5</td><td>59.6</td></tr><tr><td>bill_depth_mm</td><td>342</td><td>17.151169590643274</td><td>1.9747931568167814</td><td>13.1</td><td>15.6</td><td>17.32</td><td>18.7</td><td>21.5</td></tr><tr><td>flipper_length_mm</td><td>342</td><td>200.91520467836258</td><td>14.061713679356888</td><td>172.0</td><td>190.0</td><td>197.0</td><td>213.0</td><td>231.0</td></tr><tr><td>body_mass_g</td><td>342</td><td>4201.754385964912</td><td>801.9545356980955</td><td>2700.0</td><td>3550.0</td><td>4031.5</td><td>4750.0</td><td>6300.0</td></tr><tr><td>year</td><td>344</td><td>2008.0290697674418</td><td>0.8183559254837041</td><td>2007.0</td><td>2007.0</td><td>2008.0</td><td>2009.0</td><td>2009.0</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 9 Vectors, 0x000000000000fbcc>\n",
       "  variables            count     mean      std      min      25%   median ...      max\n",
       "  <dictionary>      <uint16> <double> <double> <double> <double> <double> ... <double>\n",
       "1 bill_length_mm         342    43.92     5.46     32.1    39.23    44.38 ...     59.6\n",
       "2 bill_depth_mm          342    17.15     1.97     13.1     15.6    17.32 ...     21.5\n",
       "3 flipper_length_mm      342   200.92    14.06    172.0    190.0    197.0 ...    231.0\n",
       "4 body_mass_g            342  4201.75   801.95   2700.0   3550.0   4031.5 ...   6300.0\n",
       "5 year                   344  2008.03     0.82   2007.0   2007.0   2008.0 ...   2009.0\n"
      ]
     },
     "execution_count": 176,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.summary\n",
    "# or\n",
    "penguins.describe"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "982a735e-c887-4c15-be82-4f3a28138fa7",
   "metadata": {},
   "source": [
    "## 58. Quantile"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0832e36-7281-4d3c-a35d-5a89aecc341e",
   "metadata": {},
   "source": [
    "`Vector#quantile(prob)` returns quantile at probability `prob`.\n",
    "\n",
    "(Since 0.2.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 177,
   "id": "88c6ed5d-b41e-4c17-b93f-47ce15702974",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17.3"
      ]
     },
     "execution_count": 177,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins[:bill_depth_mm].quantile # default value is prob = 0.5"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07368c10-65d4-4081-ace7-12bebca493c2",
   "metadata": {},
   "source": [
    "`Vector#quantiles` accepts an Array for multiple quantiles. Returns a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 178,
   "id": "3cbac413-fc36-42a2-abd3-1343e3b88467",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <2 x 2 vectors> <table><tr><th>probs</th><th>quantiles</th></tr><tr><td>0.05</td><td>13.9</td></tr><tr><td>0.95</td><td>20.0</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000000fbe0>\n",
       "     probs quantiles\n",
       "  <double>  <double>\n",
       "1     0.05      13.9\n",
       "2     0.95      20.0\n"
      ]
     },
     "execution_count": 178,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins[:bill_depth_mm].quantiles([0.05, 0.95])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54800864-c33d-4fec-818b-1a12a7e4d015",
   "metadata": {},
   "source": [
    "## 59. Transpose"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "517281cf-3f29-4725-816c-5a866e9cc9cc",
   "metadata": {},
   "source": [
    "`DataFrame#transpose` creates transposed DataFrame for wide type dataframe.\n",
    "\n",
    "(Since 0.2.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 179,
   "id": "a2fe66cb-e86d-402e-8724-eced2420e3d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>Year</th><th>Audi</th><th>BMW</th><th>BMW_MINI</th><th>Mercedes-Benz</th><th>VW</th></tr><tr><td>2021</td><td>22535</td><td>35905</td><td>18211</td><td>51722</td><td>35215</td></tr><tr><td>2020</td><td>22304</td><td>35712</td><td>20196</td><td>57041</td><td>36576</td></tr><tr><td>2019</td><td>24222</td><td>46814</td><td>23813</td><td>66553</td><td>46794</td></tr><tr><td>2018</td><td>26473</td><td>50982</td><td>25984</td><td>67554</td><td>51961</td></tr><tr><td>2017</td><td>28336</td><td>52527</td><td>25427</td><td>68221</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000fbf4>\n",
       "     Year    Audi     BMW BMW_MINI Mercedes-Benz      VW\n",
       "  <int64> <int64> <int64>  <int64>       <int64> <int64>\n",
       "1    2021   22535   35905    18211         51722   35215\n",
       "2    2020   22304   35712    20196         57041   36576\n",
       "3    2019   24222   46814    23813         66553   46794\n",
       "4    2018   26473   50982    25984         67554   51961\n",
       "5    2017   28336   52527    25427         68221   49040\n"
      ]
     },
     "execution_count": 179,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import_cars = RedAmber::DataFrame.load('../test/entity/import_cars.tsv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "id": "b4ef2b48-33de-4982-b082-1966749fcf65",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>name</th><th>2021</th><th>2020</th><th>2019</th><th>2018</th><th>2017</th></tr><tr><td>Audi</td><td>22535</td><td>22304</td><td>24222</td><td>26473</td><td>28336</td></tr><tr><td>BMW</td><td>35905</td><td>35712</td><td>46814</td><td>50982</td><td>52527</td></tr><tr><td>BMW_MINI</td><td>18211</td><td>20196</td><td>23813</td><td>25984</td><td>25427</td></tr><tr><td>Mercedes-Benz</td><td>51722</td><td>57041</td><td>66553</td><td>67554</td><td>68221</td></tr><tr><td>VW</td><td>35215</td><td>36576</td><td>46794</td><td>51961</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000fc08>\n",
       "  name              2021     2020     2019     2018     2017\n",
       "  <dictionary>  <uint16> <uint16> <uint32> <uint32> <uint32>\n",
       "1 Audi             22535    22304    24222    26473    28336\n",
       "2 BMW              35905    35712    46814    50982    52527\n",
       "3 BMW_MINI         18211    20196    23813    25984    25427\n",
       "4 Mercedes-Benz    51722    57041    66553    67554    68221\n",
       "5 VW               35215    36576    46794    51961    49040\n"
      ]
     },
     "execution_count": 180,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import_cars.transpose"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "556e5c2d-bb01-4e16-9bf6-bdfd302d5b2a",
   "metadata": {},
   "source": [
    "You can specify index column by option `:key` even if it is at the middle of the original DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 181,
   "id": "31edb594-93fb-493f-8036-14e8273596ed",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>Audi</th><th>BMW</th><th>Year</th><th>BMW_MINI</th><th>Mercedes-Benz</th><th>VW</th></tr><tr><td>22535</td><td>35905</td><td>2021</td><td>18211</td><td>51722</td><td>35215</td></tr><tr><td>22304</td><td>35712</td><td>2020</td><td>20196</td><td>57041</td><td>36576</td></tr><tr><td>24222</td><td>46814</td><td>2019</td><td>23813</td><td>66553</td><td>46794</td></tr><tr><td>26473</td><td>50982</td><td>2018</td><td>25984</td><td>67554</td><td>51961</td></tr><tr><td>28336</td><td>52527</td><td>2017</td><td>25427</td><td>68221</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000fc1c>\n",
       "     Audi     BMW    Year BMW_MINI Mercedes-Benz      VW\n",
       "  <int64> <int64> <int64>  <int64>       <int64> <int64>\n",
       "1   22535   35905    2021    18211         51722   35215\n",
       "2   22304   35712    2020    20196         57041   36576\n",
       "3   24222   46814    2019    23813         66553   46794\n",
       "4   26473   50982    2018    25984         67554   51961\n",
       "5   28336   52527    2017    25427         68221   49040\n"
      ]
     },
     "execution_count": 181,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = import_cars.pick { [keys[1..2], keys[0], keys[3..]] }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 182,
   "id": "ffa7cecc-5298-49d6-b483-b2b5cf2e1820",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>name</th><th>2021</th><th>2020</th><th>2019</th><th>2018</th><th>2017</th></tr><tr><td>Audi</td><td>22535</td><td>22304</td><td>24222</td><td>26473</td><td>28336</td></tr><tr><td>BMW</td><td>35905</td><td>35712</td><td>46814</td><td>50982</td><td>52527</td></tr><tr><td>BMW_MINI</td><td>18211</td><td>20196</td><td>23813</td><td>25984</td><td>25427</td></tr><tr><td>Mercedes-Benz</td><td>51722</td><td>57041</td><td>66553</td><td>67554</td><td>68221</td></tr><tr><td>VW</td><td>35215</td><td>36576</td><td>46794</td><td>51961</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000fc30>\n",
       "  name              2021     2020     2019     2018     2017\n",
       "  <dictionary>  <uint16> <uint16> <uint32> <uint32> <uint32>\n",
       "1 Audi             22535    22304    24222    26473    28336\n",
       "2 BMW              35905    35712    46814    50982    52527\n",
       "3 BMW_MINI         18211    20196    23813    25984    25427\n",
       "4 Mercedes-Benz    51722    57041    66553    67554    68221\n",
       "5 VW               35215    36576    46794    51961    49040\n"
      ]
     },
     "execution_count": 182,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.transpose(key: :Year)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c0e6fd9-2355-43dc-bfdb-254fdbd405fb",
   "metadata": {},
   "source": [
    "## 60. To_long"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8970bf89-6841-44aa-8faa-8c2e97842a8d",
   "metadata": {},
   "source": [
    "`DataFrame#to_long(*keep_keys)` reshapes wide DataFrame to a longer DataFrame.\n",
    "\n",
    "- Parameter `keep_keys` specifies the key names to keep.\n",
    "\n",
    "(Since 0.2.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "id": "f188cf35-3364-45aa-ad3c-1e079ed7b1a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>Year</th><th>Audi</th><th>BMW</th><th>BMW_MINI</th><th>Mercedes-Benz</th><th>VW</th></tr><tr><td>2021</td><td>22535</td><td>35905</td><td>18211</td><td>51722</td><td>35215</td></tr><tr><td>2020</td><td>22304</td><td>35712</td><td>20196</td><td>57041</td><td>36576</td></tr><tr><td>2019</td><td>24222</td><td>46814</td><td>23813</td><td>66553</td><td>46794</td></tr><tr><td>2018</td><td>26473</td><td>50982</td><td>25984</td><td>67554</td><td>51961</td></tr><tr><td>2017</td><td>28336</td><td>52527</td><td>25427</td><td>68221</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000fc44>\n",
       "     Year    Audi     BMW BMW_MINI Mercedes-Benz      VW\n",
       "  <int64> <int64> <int64>  <int64>       <int64> <int64>\n",
       "1    2021   22535   35905    18211         51722   35215\n",
       "2    2020   22304   35712    20196         57041   36576\n",
       "3    2019   24222   46814    23813         66553   46794\n",
       "4    2018   26473   50982    25984         67554   51961\n",
       "5    2017   28336   52527    25427         68221   49040\n"
      ]
     },
     "execution_count": 183,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import_cars = RedAmber::DataFrame.load('../test/entity/import_cars.tsv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 184,
   "id": "bee776ae-7a82-41aa-8a31-c53af6b9ad9f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <25 x 3 vectors> <table><tr><th>Year</th><th>name</th><th>value</th></tr><tr><td>2021</td><td>Audi</td><td>22535</td></tr><tr><td>2021</td><td>BMW</td><td>35905</td></tr><tr><td>2021</td><td>BMW_MINI</td><td>18211</td></tr><tr><td>2021</td><td>Mercedes-Benz</td><td>51722</td></tr><tr><td colspan='3'>&#8942;</td></tr><tr><td>2017</td><td>BMW_MINI</td><td>25427</td></tr><tr><td>2017</td><td>Mercedes-Benz</td><td>68221</td></tr><tr><td>2017</td><td>VW</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 25 x 3 Vectors, 0x000000000000fc58>\n",
       "       Year name             value\n",
       "   <uint16> <dictionary>  <uint32>\n",
       " 1     2021 Audi             22535\n",
       " 2     2021 BMW              35905\n",
       " 3     2021 BMW_MINI         18211\n",
       " 4     2021 Mercedes-Benz    51722\n",
       " 5     2021 VW               35215\n",
       " :        : :                    :\n",
       "23     2017 BMW_MINI         25427\n",
       "24     2017 Mercedes-Benz    68221\n",
       "25     2017 VW               49040\n"
      ]
     },
     "execution_count": 184,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import_cars.to_long(:Year)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "771aac21-e739-4eea-9d75-cc99f691f87e",
   "metadata": {},
   "source": [
    "- Option `:name` : key of the column which is come **from key names**.\n",
    "- Option `:value` : key of the column which is come **from values**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 185,
   "id": "b7f7aeab-d545-4a28-82ce-db844029cc9c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <25 x 3 vectors> <table><tr><th>Year</th><th>Manufacturer</th><th>Num_of_imported</th></tr><tr><td>2021</td><td>Audi</td><td>22535</td></tr><tr><td>2021</td><td>BMW</td><td>35905</td></tr><tr><td>2021</td><td>BMW_MINI</td><td>18211</td></tr><tr><td>2021</td><td>Mercedes-Benz</td><td>51722</td></tr><tr><td colspan='3'>&#8942;</td></tr><tr><td>2017</td><td>BMW_MINI</td><td>25427</td></tr><tr><td>2017</td><td>Mercedes-Benz</td><td>68221</td></tr><tr><td>2017</td><td>VW</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 25 x 3 Vectors, 0x000000000000fc6c>\n",
       "       Year Manufacturer  Num_of_imported\n",
       "   <uint16> <dictionary>         <uint32>\n",
       " 1     2021 Audi                    22535\n",
       " 2     2021 BMW                     35905\n",
       " 3     2021 BMW_MINI                18211\n",
       " 4     2021 Mercedes-Benz           51722\n",
       " 5     2021 VW                      35215\n",
       " :        : :                           :\n",
       "23     2017 BMW_MINI                25427\n",
       "24     2017 Mercedes-Benz           68221\n",
       "25     2017 VW                      49040\n"
      ]
     },
     "execution_count": 185,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44361299-695f-4aef-8847-9b8a9dddc3d1",
   "metadata": {},
   "source": [
    "## 61. To_wide"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fa18ac8-99e1-4d55-bed4-cc02469496b5",
   "metadata": {},
   "source": [
    "`DataFrame#to_wide(*keep_keys)` reshapes long DataFrame to a wider DataFrame.\n",
    "\n",
    "- Option `:name` : key of the column which will be expanded **to key name**.\n",
    "- Option `:value` : key of the column which will be expanded **to values**.\n",
    "\n",
    "(Since 0.2.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 186,
   "id": "1d59c42f-c5ec-4df2-9fbe-592092ad2f8b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>Year</th><th>Audi</th><th>BMW</th><th>BMW_MINI</th><th>Mercedes-Benz</th><th>VW</th></tr><tr><td>2021</td><td>22535</td><td>35905</td><td>18211</td><td>51722</td><td>35215</td></tr><tr><td>2020</td><td>22304</td><td>35712</td><td>20196</td><td>57041</td><td>36576</td></tr><tr><td>2019</td><td>24222</td><td>46814</td><td>23813</td><td>66553</td><td>46794</td></tr><tr><td>2018</td><td>26473</td><td>50982</td><td>25984</td><td>67554</td><td>51961</td></tr><tr><td>2017</td><td>28336</td><td>52527</td><td>25427</td><td>68221</td><td>49040</td></tr></table>"
      ],
      "text/plain": [
       "#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000fc80>\n",
       "      Year     Audi      BMW BMW_MINI Mercedes-Benz       VW\n",
       "  <uint16> <uint16> <uint16> <uint16>      <uint32> <uint16>\n",
       "1     2021    22535    35905    18211         51722    35215\n",
       "2     2020    22304    35712    20196         57041    36576\n",
       "3     2019    24222    46814    23813         66553    46794\n",
       "4     2018    26473    50982    25984         67554    51961\n",
       "5     2017    28336    52527    25427         68221    49040\n"
      ]
     },
     "execution_count": 186,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import_cars.to_long(:Year).to_wide"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 187,
   "id": "84b7909a-200c-4aec-b192-25016399a7c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import_cars.to_long(:Year).to_wide(name: :name, value: :value)\n",
    "# is also OK"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Ruby 3.1.2",
   "language": "ruby",
   "name": "ruby"
  },
  "language_info": {
   "file_extension": ".rb",
   "mimetype": "application/x-ruby",
   "name": "ruby",
   "version": "3.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}