README.md in bmg-0.21.5 vs README.md in bmg-0.23.0

- old
+ new

@@ -1,26 +1,34 @@
-# Bmg, a relational algebra (Alf's successor)!
+# Bmg, a relational algebra
 
-Bmg is a relational algebra implemented as a Ruby library. It implements the
+Bmg is a [relational algebra](https://www.relational-algebra.dev/) implemented as a Ruby library. It implements the
 [Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
 paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
 
 Bmg can be used to query relations in memory, from various files, SQL databases,
 and any data source that can be seen as serving relations. Cross-data-source
 joins are supported, as with Alf. For differences with Alf, see the dedicated
 section further down this README.
 
+## Links
+
+* Documentation can be found at https://www.relational-algebra.dev/
+* Contribute to that documentation on GitHub: https://github.com/enspirit/bmg-website
+
 ## Outline
 
 * [Example](#example)
 * [Where are base relations coming from?](#where-are-base-relations-coming-from)
   * [Memory relations](#memory-relations)
   * [Connecting to SQL databases](#connecting-to-sql-databases)
-  * [Reading files (csv, Excel, text)](#reading-files-csv-excel-text)
+  * [Reading data files](#reading-data-files-json-csv-yaml-text-xls--xlsx)
   * [Connecting to Redis databases](#connecting-to-redis-databases)
   * [Your own relations](#your-own-relations)
+* [The Database abstraction](#the-database-abstraction)
 * [List of supported operators](#supported-operators)
+* [List of supported predicates](#supported-predicates)
+* [List of supported summaries](#supported-summaries)
 * [How is this different?](#how-is-this-different)
   * [... from similar libraries](#-from-similar-libraries)
   * [... from Alf](#-from-alf)
 * [Contribute](#contribute)
 * [License](#license)
@@ -115,37 +123,42 @@
 #   [:sid, :name, :status]
 #   :suppliers_in
 #   {:array=>false})
 ```
 
-### Reading files (csv, Excel, text)
+### Reading data files (json, csv, yaml, text, xls & xlsx)
 
 Bmg provides simple adapters to read files and reach Relationland as soon as
 possible.
 
-#### CSV files
+#### JSON files
 
 ```ruby
-csv_options = { col_sep: ",", quote_char: '"' }
-r = Bmg.csv("path/to/a/file.csv", csv_options)
+r = Bmg.json("path/to/a/file.json")
 ```
 
-Options are directly transmitted to `::CSV.new`, check Ruby's standard
-library.
+The JSON file is expected to contain a list of tuples that all share the same
+heading (i.e. the same attribute names).
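For illustration, a JSON file that meets this expectation would presumably hold an array of objects with identical keys. The minimal sketch below checks that condition using only Ruby's standard `json` library. The helper `same_heading?` is hypothetical and not part of Bmg.

```ruby
require 'json'

# Hypothetical helper (not part of Bmg): true when the parsed JSON is an
# array of objects that all have exactly the same set of attribute names.
def same_heading?(tuples)
  return false unless tuples.is_a?(Array) && tuples.all? { |t| t.is_a?(Hash) }

  tuples.map { |t| t.keys.sort }.uniq.size <= 1
end

good = JSON.parse('[{"sid": "S1", "city": "London"}, {"city": "Paris", "sid": "S2"}]')
bad  = JSON.parse('[{"sid": "S1", "city": "London"}, {"sid": "S2"}]')

puts same_heading?(good) # => true
puts same_heading?(bad)  # => false
```

Key order does not matter here, since a heading is a set of attribute names.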
 
-#### Excel files
+#### YAML files
 
-You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
-read `.xls` and `.xlsx` files with Bmg.
+```ruby
+r = Bmg.yaml("path/to/a/file.yaml")
+```
 
+The YAML file is expected to contain a list of tuples that all share the same
+heading (i.e. the same attribute names).
+
+#### CSV files
+
 ```ruby
-roo_options = { skip: 1 }
-r = Bmg.excel("path/to/a/file.xls", roo_options)
+csv_options = { col_sep: ",", quote_char: '"' }
+r = Bmg.csv("path/to/a/file.csv", csv_options)
 ```
 
-Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
-documentation.
+Options are passed directly to `::CSV.new`; see Ruby's standard library
+documentation. If you don't provide any, Bmg uses `headers: true` (hence
+assuming that attribute names are provided on the first line), and makes a
+best effort to infer the column separator.
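Bmg's actual separator heuristic is not described here. As a sketch of one plausible approach, assuming nothing about Bmg's internals, the code below counts a few candidate separators on the header line, keeps the most frequent one, and then parses with `headers: true` using Ruby's standard `csv` library. `CANDIDATE_SEPARATORS` and `guess_col_sep` are hypothetical names.

```ruby
require 'csv'

# Hypothetical candidate list; Bmg's real heuristic may differ.
CANDIDATE_SEPARATORS = [",", ";", "\t", "|"].freeze

# Picks the candidate occurring most often on the header (first) line.
# When none occurs, max_by keeps the first candidate, i.e. ",".
def guess_col_sep(text)
  header = text.lines.first.to_s
  CANDIDATE_SEPARATORS.max_by { |sep| header.count(sep) }
end

data = "sid;name;city\nS1;Smith;London\nS2;Jones;Paris\n"
sep  = guess_col_sep(data)

# headers: true makes the first line provide the attribute names.
tuples = CSV.parse(data, col_sep: sep, headers: true).map(&:to_h)

p sep # => ";"
tuples.each { |t| p t }
```

Looking only at the header line is a deliberate simplification: a separator character that also appears inside quoted header names could mislead it.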
 
 #### Text files
 
 There is also a straightforward way to read text files and convert lines to
 tuples.
@@ -171,10 +184,23 @@
 ```
 
 In this scenario, non-matching lines are skipped. The `:line` attribute is still
 used, so that the relation has at least one candidate key (so to speak).
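The text-file API itself is not shown in this excerpt, so the plain-Ruby sketch below only mimics the behaviour just described; `text_tuples` and its `pattern` argument are hypothetical. Each matching line becomes a tuple holding its 1-based `:line` number (the candidate key) plus one attribute per named capture of the regexp, and non-matching lines are dropped.

```ruby
# Hypothetical helper (not Bmg's API): converts text lines into tuples.
# Each tuple keeps its 1-based :line number, acting as a candidate key,
# plus one attribute per named capture of +pattern+.
def text_tuples(text, pattern)
  text.each_line.with_index(1).filter_map do |raw, lineno|
    match = pattern.match(raw.chomp)
    next unless match # non-matching lines are skipped

    { line: lineno }.merge(match.named_captures.transform_keys(&:to_sym))
  end
end

log = "2024-01-01 INFO started\nnoise without a level\n2024-01-02 WARN disk low\n"
tuples = text_tuples(log, /\A(?<date>\S+) (?<level>[A-Z]+) (?<message>.*)\z/)
tuples.each { |t| p t }
```

Line 2 does not match and disappears, yet the remaining tuples keep their original line numbers (1 and 3), so `:line` still identifies each tuple uniquely.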
 
+#### Excel files
+
+You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
+read `.xls` and `.xlsx` files with Bmg.
+
+```ruby
+roo_options = { skip: 1 }
+r = Bmg.excel("path/to/a/file.xls", roo_options)
+```
+
+Options are passed directly to `Roo::Spreadsheet.open`; see roo's
+documentation.
+
 ### Connecting to Redis databases
 
 Bmg currently requires `bmg-redis` and `redis >= 4.6` to connect
 to Redis databases. You also need to require `bmg/redis`.
 
@@ -238,10 +264,62 @@
 (e.g. `_restrict`).
 
 Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
 example. Get in touch with the team if you need help.
 
+## The Database abstraction
+
+The previous section focused on obtaining *relations*. In practice, you frequently
+deal with a collection of relations, hence a *database*:
+
+* A SQL database with multiple tables
+* A list of data files, all in the same folder
+* An Excel file with several sheets
+
+Bmg supports a simple Database abstraction that serves those relations by name.
+A database can also easily be dumped back to a data folder of JSON or CSV files,
+or to an .xlsx file with multiple sheets.
+
+### Connecting to a SQL Database
+
+For a SQL database, connected with Sequel:
+
+```ruby
+db = Bmg::Database.sequel(Sequel.connect('...'))
+db.suppliers # yields a Bmg::Relation over the `suppliers` table
+```
+
+### Connecting to data files in the same folder
+
+Data files all in the same folder can be seen as a very basic form of database,
+and served as such. Bmg supports `json`, `csv` and `yaml` files:
+
+```ruby
+db = Bmg::Database.data_folder('./my-database')
+db.suppliers # yields a Bmg::Relation over the `suppliers.(json,csv,yml)` file
+```
+
+Bmg supports files in different formats in the same folder. When files with the
+same basename exist, JSON is preferred over YAML, which is preferred over CSV.
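The lookup rule above can be sketched with Ruby's standard library alone. The helper `resolve_relation_file` is hypothetical, and the exact extension list (`.json`, then `.yml`/`.yaml`, then `.csv`) is an assumption; Bmg's own resolution code may differ.

```ruby
require 'tmpdir'

# Assumed extensions, in order of preference: JSON, then YAML, then CSV.
PREFERRED_EXTENSIONS = %w[.json .yml .yaml .csv].freeze

# Hypothetical helper: path of the file serving relation +name+ in +folder+,
# or nil when no supported file exists.
def resolve_relation_file(folder, name)
  PREFERRED_EXTENSIONS
    .map { |ext| File.join(folder, "#{name}#{ext}") }
    .find { |path| File.file?(path) }
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "suppliers.csv"), "sid,name\nS1,Smith\n")
  File.write(File.join(dir, "suppliers.yml"), "- sid: S1\n  name: Smith\n")
  File.write(File.join(dir, "parts.csv"), "pid,name\nP1,Nut\n")

  puts File.basename(resolve_relation_file(dir, "suppliers")) # => suppliers.yml
  puts File.basename(resolve_relation_file(dir, "parts"))     # => parts.csv
  p resolve_relation_file(dir, "shipments")                    # => nil
end
```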
+
+### Dumping a Database instance
+
+As a data folder:
+
+```ruby
+db = Bmg::Database.sequel(Sequel.connect('...'))
+db.to_data_folder('path/to/folder', :json)
+```
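Dumping to a data folder presumably amounts to writing one file per relation, named after it, which is the layout the data-folder reader above expects. The sketch below does that with Ruby's standard `json` library over a plain Hash mapping relation names to arrays of tuples; `dump_to_json_folder` is hypothetical and says nothing about how `to_data_folder` is actually implemented.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes each relation as <folder>/<name>.json,
# holding an array of tuples (JSON objects).
def dump_to_json_folder(relations, folder)
  FileUtils.mkdir_p(folder)
  relations.each do |name, tuples|
    File.write(File.join(folder, "#{name}.json"), JSON.pretty_generate(tuples))
  end
end

database = {
  suppliers: [{ sid: "S1", name: "Smith", city: "London" }],
  parts:     [{ pid: "P1", name: "Nut", color: "Red" }]
}

Dir.mktmpdir do |dir|
  dump_to_json_folder(database, dir)
  puts Dir.children(dir).sort.inspect # => ["parts.json", "suppliers.json"]
end
```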
+
+As an .xlsx file (any existing file is overwritten, as modifying existing files
+is not supported):
+
+```ruby
+require 'bmg/xlsx'
+db.to_xlsx('path/to/file.xlsx')
+```
+
 ## Supported operators
 
 ```ruby
 r.allbut([:a, :b, ...])                      # remove specified attributes
 r.autowrap(split: '_')                       # structure a flat relation, split: '_' is the default
@@ -258,10 +336,11 @@
 r.join(right, :a => :x, :b => :y, ...)       # join after right reversed renaming
 r.left_join(right, [:a, :b, ...], {...})     # left join with optional default right tuple
 r.left_join(right, {:a => :x, ...}, {...})   # left join after right reversed renaming
 r.matching(right, [:a, :b, ...])             # semi join, aka where exists
 r.matching(right, :a => :x, :b => :y, ...)   # semi join, after right reversed renaming
+r.minus(right)                               # set difference
 r.not_matching(right, [:a, :b, ...])         # inverse semi join, aka where not exists
 r.not_matching(right, :a => :x, ...)         # inverse semi join, after right reversed renaming
 r.page([[:a, :asc], ...], 12, page_size: 10) # paging, using an explicit ordering
 r.prefix(:foo_, but: [:a, ...])              # prefix kind of renaming
 r.project([:a, :b, ...])                     # keep specified attributes only
@@ -274,13 +353,74 @@
 r.transform(&:to_s)                          # similar, but Proc-driven
 r.transform(:foo => :upcase, ...)            # specific-attrs transformation
 r.transform([:to_s, :upcase])                # chain-transformation
 r.ungroup([:a, :b, ...])                     # ungroup relation-valued attributes within parent tuple
 r.ungroup(:a)                                # shortcut over ungroup([:a])
-r.union(right)                               # relational union
+r.union(right)                               # set union
 r.unwrap([:a, :b, ...])                      # merge tuple-valued attributes within parent tuple
 r.unwrap(:a)                                 # shortcut over unwrap([:a])
 r.where(predicate)                           # alias for restrict(predicate)
+```
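Since relations are sets of tuples, `minus` and `union` follow set semantics: a tuple appears at most once in the result. The sketch below illustrates those semantics on plain arrays of hashes with Ruby's `Set`, assuming both operands share the same heading. It is not how Bmg evaluates either operator.

```ruby
require 'set'

# Relations modelled as arrays of tuples (Hashes) sharing one heading.
left  = [{ sid: "S1", city: "London" }, { sid: "S2", city: "Paris" }]
right = [{ sid: "S2", city: "Paris" }, { sid: "S3", city: "Paris" }]

# Set union: tuples present in either operand, each kept once.
union = (left.to_set | right.to_set).to_a

# Set difference: tuples of left that do not appear in right.
minus = (left.to_set - right.to_set).to_a

p union.map { |t| t[:sid] }.sort # => ["S1", "S2", "S3"]
p minus.map { |t| t[:sid] }      # => ["S1"]
```

The shared tuple for `S2` shows up once in the union and is removed by the difference, because two Hashes with equal contents are equal set members.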
+
+## Supported predicates
+
+The usual operators are supported and map to their expected SQL equivalents:
+
+```ruby
+Predicate.eq                                 # =
+Predicate.neq                                # <>
+Predicate.lt                                 # <
+Predicate.lte                                # <=
+Predicate.gt                                 # >
+Predicate.gte                                # >=
+Predicate.in                                 # SQL's IN
+Predicate.is_null                            # SQL's IS NULL
+```
+
+See the [Predicate gem](https://github.com/enspirit/predicate) for a more
+complete list.
+
+Note: predicates that implement specific Ruby algorithms or patterns are
+not compiled to SQL (and more generally not delegated to underlying database
+servers).
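The note above reflects a general design split: a predicate built from standard comparison operators can be rendered as SQL, whereas one backed by arbitrary Ruby code cannot, so it has to be evaluated in memory once tuples are fetched. The sketch below mimics that split with a tiny hypothetical predicate model (`Cmp`, `RubyPred`, `to_sql`, `satisfied?`). It does not use the Predicate gem, and its SQL rendering is deliberately naive (string literals are quoted, but nothing is parameter-bound).

```ruby
# Hypothetical predicate model, for illustration only (not the Predicate gem).
Cmp      = Struct.new(:op, :attr, :value) # comparison with a SQL equivalent
RubyPred = Struct.new(:block)             # arbitrary Ruby logic, no SQL equivalent

SQL_OPS  = { eq: "=", neq: "<>", lt: "<", lte: "<=", gt: ">", gte: ">=" }.freeze
RUBY_OPS = { eq: :==, neq: :!=, lt: :<, lte: :<=, gt: :>, gte: :>= }.freeze

# Naive literal rendering; production code should bind parameters instead.
def sql_literal(value)
  value.is_a?(String) ? "'#{value.gsub("'", "''")}'" : value.to_s
end

# Returns a SQL condition, or nil when the predicate cannot be delegated.
def to_sql(pred)
  return nil unless pred.is_a?(Cmp)

  "#{pred.attr} #{SQL_OPS.fetch(pred.op)} #{sql_literal(pred.value)}"
end

# In-memory evaluation works for both kinds of predicates.
def satisfied?(pred, tuple)
  case pred
  when Cmp      then tuple[pred.attr].public_send(RUBY_OPS.fetch(pred.op), pred.value)
  when RubyPred then pred.block.call(tuple)
  end
end

by_city     = Cmp.new(:eq, :city, "London")
palindromic = RubyPred.new(->(t) { t[:name].downcase == t[:name].downcase.reverse })

puts to_sql(by_city)                                     # => city = 'London'
p to_sql(palindromic)                                    # => nil
p satisfied?(palindromic, { name: "Anna", city: "Paris" }) # => true
```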
+
+## Supported summaries
+
+The `summarize` operator receives the list of grouping attributes and a Hash of
+`attr: summarizer` pairs, e.g.
+
+```ruby
+r.summarize([:city], {
+  how_many: :count,        # same as how_many: Bmg::Summarizer.count
+  status: :max,            # same as   status: Bmg::Summarizer.max(:status)
+  min_status: Bmg::Summarizer.min(:status)
+})
+```
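Conceptually, `summarize` groups tuples by the given attributes and computes one aggregate per summarizer within each group. The plain-Ruby sketch below reproduces what the example above computes on a small array of hashes, for intuition only; the sample data is made up and Bmg does not evaluate `summarize` this way.

```ruby
# Made-up sample data, modelled as an array of tuples.
suppliers = [
  { sid: "S1", city: "London", status: 20 },
  { sid: "S2", city: "Paris",  status: 10 },
  { sid: "S3", city: "Paris",  status: 30 },
  { sid: "S4", city: "London", status: 20 }
]

# Group by :city, then compute each summary within a group.
summary = suppliers.group_by { |t| t[:city] }.map do |city, group|
  statuses = group.map { |t| t[:status] }
  {
    city: city,
    how_many: group.size,    # :count
    status: statuses.max,    # :max, applied to the :status attribute
    min_status: statuses.min # Bmg::Summarizer.min(:status)
  }
end

summary.each { |t| p t }
```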
+
+The following summarizers are available and translated to SQL:
+
+```ruby
+Bmg::Summarizer.count                      # count the number of tuples
+Bmg::Summarizer.distinct(:a)               # collect distinct values (as an array)
+Bmg::Summarizer.distinct_count(:a)         # count of distinct values
+Bmg::Summarizer.min(:a)                    # min value for attribute :a
+Bmg::Summarizer.max(:a)                    # max value
+Bmg::Summarizer.sum(:a)                    # sum :a's values
+Bmg::Summarizer.avg(:a)                    # average
+```
+
+The following summarizers are implemented in Ruby (they are supported when
+querying SQL databases, but not compiled to SQL):
+
+```ruby
+Bmg::Summarizer.collect(:a)                # collect :a's values (as an array)
+Bmg::Summarizer.concat(:a, opts: { ... })  # concat :a's values (opts, e.g. {between: ','})
+Bmg::Summarizer.first(:a, order: ...)      # smallest seen :a's value according to a tuple ordering
+Bmg::Summarizer.last(:a, order: ...)       # largest seen :a's value according to a tuple ordering
+Bmg::Summarizer.variance(:a)               # variance
+Bmg::Summarizer.stddev(:a)                 # standard deviation
+Bmg::Summarizer.percentile(:a, nth)        # (continuous) nth percentile
+Bmg::Summarizer.percentile_disc(:a, nth)   # discrete nth percentile
+Bmg::Summarizer.value_by(:a, :by => :b)    # { :b => :a } as a Hash
 ```
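The names `percentile` and `percentile_disc` suggest the same distinction SQL draws between `PERCENTILE_CONT` and `PERCENTILE_DISC`: the continuous variant interpolates linearly between neighbouring values, while the discrete one always returns a value present in the data. The sketch below uses those textbook definitions and assumes `nth` is on a 0 to 100 scale; Bmg's exact conventions (scale, interpolation, handling of nils) may differ.

```ruby
# Continuous percentile (assumed 0-100 scale): linear interpolation
# between the two values surrounding the fractional rank.
def percentile_cont(values, nth)
  raise ArgumentError, "no values" if values.empty?

  sorted = values.sort
  rank   = (nth / 100.0) * (sorted.size - 1)
  lower  = sorted[rank.floor]
  upper  = sorted[rank.ceil]
  lower + (rank - rank.floor) * (upper - lower)
end

# Discrete percentile: the first value (in sorted order) whose cumulative
# share of the data reaches nth percent; always a value from the input.
def percentile_disc(values, nth)
  raise ArgumentError, "no values" if values.empty?

  sorted = values.sort
  index  = ((nth / 100.0) * sorted.size).ceil - 1
  sorted[index.clamp(0, sorted.size - 1)]
end

statuses = [40, 10, 30, 20]
p percentile_cont(statuses, 50) # => 25.0
p percentile_disc(statuses, 50) # => 20
```

With four values, the median under the continuous definition falls halfway between 20 and 30, whereas the discrete definition picks 20, the first value covering half of the data.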
 
 ## How is this different?
 
 ### ... from similar libraries?