README.md in bmg-0.21.5 vs README.md in bmg-0.23.0
- old
+ new
@@ -1,26 +1,34 @@
-# Bmg, a relational algebra (Alf's successor)!
+# Bmg, a relational algebra
-Bmg is a relational algebra implemented as a Ruby library. It implements the
+Bmg is a [relational algebra](https://www.relational-algebra.dev/) implemented as a Ruby library. It implements the
[Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
Bmg can be used to query relations in memory, from various files, SQL databases,
and any data source that can be seen as serving relations. Cross data-sources
joins are supported, as with Alf. For differences with Alf, see a section
further down this README.
+## Links
+
+* Documentation can be found at https://www.relational-algebra.dev/
+* Contribute to that documentation on github: https://github.com/enspirit/bmg-website
+
## Outline
* [Example](#example)
* [Where are base relations coming from?](#where-are-base-relations-coming-from)
* [Memory relations](#memory-relations)
* [Connecting to SQL databases](#connecting-to-sql-databases)
- * [Reading files (csv, Excel, text)](#reading-files-csv-excel-text)
+ * [Reading data files](#reading-data-files-json-csv-yaml-text-xls--xlsx)
* [Connecting to Redis databases](#connecting-to-redis-databases)
* [Your own relations](#your-own-relations)
+* [The Database abstraction](#the-database-abstraction)
* [List of supported operators](#supported-operators)
+* [List of supported predicates](#supported-predicates)
+* [List of supported summaries](#supported-summaries)
* [How is this different?](#how-is-this-different)
* [... from similar libraries](#-from-similar-libraries)
* [... from Alf](#-from-alf)
* [Contribute](#contribute)
* [License](#license)
@@ -115,37 +123,42 @@
# [:sid, :name, :status]
# :suppliers_in
# {:array=>false})
```
-### Reading files (csv, Excel, text)
+### Reading data files (json, csv, yaml, text, xls & xlsx)
Bmg provides simple adapters to read files and reach Relationland as soon as
possible.
-#### CSV files
+#### JSON files
```ruby
-csv_options = { col_sep: ",", quote_char: '"' }
-r = Bmg.csv("path/to/a/file.csv", csv_options)
+r = Bmg.json("path/to/a/file.json")
```
-Options are directly transmitted to `::CSV.new`, check Ruby's standard
-library.
+The json file is expected to contain tuples of same heading.
-#### Excel files
+#### YAML files
-You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
-read `.xls` and `.xlsx` files with Bmg.
+```ruby
+r = Bmg.yaml("path/to/a/file.yaml")
+```
+The yaml file is expected to contain tuples of same heading.
+
+#### CSV files
+
```ruby
-roo_options = { skip: 1 }
-r = Bmg.excel("path/to/a/file.xls", roo_options)
+csv_options = { col_sep: ",", quote_char: '"' }
+r = Bmg.csv("path/to/a/file.csv", csv_options)
```
-Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
-documentation.
+Options are directly transmitted to `::CSV.new`, check Ruby's standard
+library. If you don't provide them, `Bmg` uses `headers: true` (hence making
+then assumption that attributes names are provided on first line), and makes a
+best effort to infer the column separator.
#### Text files
There is also a straightforward way to read text files and convert lines to
tuples.
@@ -171,10 +184,23 @@
```
In this scenario, non matching lines are skipped. The `:line` attribute keeps
being used to have at least one candidate key (so to speak).
+#### Excel files
+
+You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
+read `.xls` and `.xlsx` files with Bmg.
+
+```ruby
+roo_options = { skip: 1 }
+r = Bmg.excel("path/to/a/file.xls", roo_options)
+```
+
+Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
+documentation.
+
### Connecting to Redis databases
Bmg currently requires `bmg-redis` and `redis >= 4.6` to connect
to Redis databases. You also need to require `bmg/redis`.
@@ -238,10 +264,62 @@
(e.g. `_restrict`).
Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
example. Keep in touch with the team if you need some help.
+## The Database abstraction
+
+The previous section focused on obtaining *relations*. In practice you frequently
+have a collection of relations hence a *database*:
+
+* A SQL database with multiple tables
+* A list of data files, all in the same folder
+* An excel file with various sheets
+
+Bmg supports a simple Datbabase abstraction that serves those relations "by name",
+in a simple way. A database can also be easily dumped back to a data folder of
+json or csv files, or as simple xlsx files with multiple sheets.
+
+### Connecting to a SQL Database
+
+For a SQL database, connected with Sequel:
+
+```
+db = Bmg::Database.sequel(Sequel.connect('...'))
+db.suppliers # yields a Bmg::Relation over the `suppliers` table
+```
+
+### Connecting to data files in the same folder
+
+Data files all in the same folder can be seen as a very basic form of database,
+and served as such. Bmg supports `json`, `csv` and `yaml` files:
+
+```
+db = Bmg::Database.data_folder('./my-database')
+db.suppliers # yields a Bmg::Relation over the `suppliers.(json,csv,yml)` file
+```
+
+Bmg supports files in different formats in the same folder. When files with the
+same basename exist, json is prefered over yaml, which is prefered over csv.
+
+### Dumping a Database instance
+
+As a data folder:
+
+```
+db = Bmg::Database.sequel(Sequel.connect('...'))
+db.to_data_folder('path/to/folder', :json)
+```
+
+As an .xlsx file (any existing file will be erased, we don't support modifying
+existing files):
+
+```
+require 'bmg/xlsx'
+db.to_xlsx('path/to/file.xlsx')
+```
+
## Supported operators
```ruby
r.allbut([:a, :b, ...]) # remove specified attributes
r.autowrap(split: '_') # structure a flat relation, split: '_' is the default
@@ -258,10 +336,11 @@
r.join(right, :a => :x, :b => :y, ...) # join after right reversed renaming
r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple
r.left_join(right, {:a => :x, ...}, {...}) # left join after right reversed renaming
r.matching(right, [:a, :b, ...]) # semi join, aka where exists
r.matching(right, :a => :x, :b => :y, ...) # semi join, after right reversed renaming
+r.minus(right) # set difference
r.not_matching(right, [:a, :b, ...]) # inverse semi join, aka where not exists
r.not_matching(right, :a => :x, ...) # inverse semi join, after right reversed renaming
r.page([[:a, :asc], ...], 12, page_size: 10) # paging, using an explicit ordering
r.prefix(:foo_, but: [:a, ...]) # prefix kind of renaming
r.project([:a, :b, ...]) # keep specified attributes only
@@ -274,13 +353,74 @@
r.transform(&:to_s) # similar, but Proc-driven
r.transform(:foo => :upcase, ...) # specific-attrs tranformation
r.transform([:to_s, :upcase]) # chain-transformation
r.ungroup([:a, :b, ...]) # ungroup relation-valued attributes within parent tuple
r.ungroup(:a) # shortcut over ungroup([:a])
-r.union(right) # relational union
+r.union(right) # set union
r.unwrap([:a, :b, ...]) # merge tuple-valued attributes within parent tuple
r.unwrap(:a) # shortcut over unwrap([:a])
r.where(predicate) # alias for restrict(predicate)
+```
+
+## Supported Predicates
+
+Usual operators are supported and map to their SQL equivalent as expected:
+
+```ruby
+Predicate.eq # =
+Predicate.neq # <>
+Predicate.lt # <
+Predicate.lte # <=
+Predicate.gt # >
+Predicate.gte # >=
+Predicate.in # SQL's IN
+Predicate.is_null # SQL's IS NULL
+```
+
+See the [Predicate gem](https://github.com/enspirit/predicate) for a more
+complete list.
+
+Note: predicates that implement specific Ruby algorithms or patterns are
+not compiled to SQL (and more generally not delegated to underlying database
+servers).
+
+## Supported Summaries
+
+The `summarize` operator receives a list of `attr: summarizer` pairs, e.g.
+
+```ruby
+r.summarize([:city], {
+ how_many: :count, # same as how_many: Bmg::Summarizer.count
+ status: :max, # same as status: Bmg::Summarizer.max(:status)
+ min_status: Bmg::Summarizer.min(:status)
+})
+```
+
+The following summarizers are available and translated to SQL:
+
+```ruby
+Bmg::Summarizer.count # count the number of tuples
+Bmg::Summarizer.distinct(:a) # collect distinct values (as an array)
+Bmg::Summarizer.distinct_count(:a) # count of distinct values
+Bmg::Summarizer.min(:a) # min value for attribute :a
+Bmg::Summarizer.max(:a) # max value
+Bmg::Summarizer.sum(:a) # sum :a's values
+Bmg::Summarizer.avg(:a) # average
+```
+
+The following summarizers are implemented in Ruby (they are supported when
+querying SQL databases, but not compiled to SQL):
+
+```ruby
+Bmg::Summarizer.collect(:a) # collect :a's values (as an array)
+Bmg::Summarizer.concat(:a, opts: { ... }) # concat :a's values (opts, e.g. {between: ','})
+Bmg::Summarizer.first(:a, order: ...) # smallest seen a:'s value according to a tuple ordering
+Bmg::Summarizer.last(:a, order: ...) # largest seen a:'s value according to a tuple ordering
+Bmg::Summarizer.variance(:a) # variance
+Bmg::Summarizer.stddev(:a) # standard deviation
+Bmg::Summarizer.percentile(:a, nth) # (continuous) nth percentile
+Bmg::Summarizer.percentile_disc(:a, nth) # discrete nth percentile
+Bmg::Summarizer.value_by(:a, :by => :b) # { :b => :a } as a Hash
```
## How is this different?
### ... from similar libraries?