README.md in bmg-0.17.8 vs README.md in bmg-0.18.0
- old
+ new
@@ -1,19 +1,33 @@
# Bmg, a relational algebra (Alf's successor)!
+[![Build Status](https://travis-ci.com/enspirit/bmg.svg?branch=master)](https://travis-ci.com/enspirit/bmg)
+
Bmg is a relational algebra implemented as a ruby library. It implements the
[Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
-paradigm contributed with Alf a few years ago.
+paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
-Like Alf, Bmg can be used to query relations in memory, from various files,
-SQL databases, and any data sources that can be seen as serving relations.
-Cross data-sources joins are supported, as with Alf.
+Bmg can be used to query relations in memory, from various files, SQL databases,
+and any data source that can be seen as serving relations. Cross data-sources
+joins are supported, as with Alf. For differences with Alf, see a section
+further down this README.
-Unlike Alf, Bmg does not make any core ruby extension and exposes the
-object-oriented syntax only (not Alf's functional one). Bmg implementation is
-also much simpler, and make its easier to implement user-defined relations.
+## Outline
+* [Example](#example)
+* [Where are base relations coming from?](#where-are-base-relations-coming-from)
+ * [Memory relations](#memory-relations)
+ * [Connecting to SQL databases](#connecting-to-sql-databases)
+ * [Reading files (csv, excel, text)](#reading-files-csv-excel-text)
+ * [Your own relations](#your-own-relations)
+* [List of supported operators](#supported-operators)
+* [How is this different?](#how-is-this-different)
+ * [... from similar libraries](#-from-similar-libraries)
+ * [... from Alf](#-from-alf)
+* [Contribute](#contribute)
+* [License](#license)
+
## Example
```ruby
require 'bmg'
require 'json'
@@ -25,97 +39,180 @@
{ sid: "S4", name: "Clark", status: 20, city: "London" },
{ sid: "S5", name: "Adams", status: 30, city: "Athens" }
])
by_city = suppliers
- .restrict(Predicate.neq(status: 30))
+ .exclude(status: 30)
.extend(upname: ->(t){ t[:name].upcase })
.group([:sid, :name, :status], :suppliers_in)
puts JSON.pretty_generate(by_city)
# [{...},...]
```
-## Connecting to a SQL database
+## Where are base relations coming from?
-Bmg requires `sequel >= 3.0` to connect to SQL databases.
+Bmg sees relations as sets/enumerable of symbolized Ruby hashes. The following
+sections show you how to get them in the first place, to enter Relationland.
+### Memory relations
+
+If you have an Array of Hashes -- in fact any Enumerable -- you can easily get
+a Relation using either `Bmg::Relation.new` or `Bmg.in_memory`.
+
```ruby
+# this...
+r = Bmg::Relation.new [{id: 1}, {id: 2}]
+
+# is the same as this...
+r = Bmg.in_memory [{id: 1}, {id: 2}]
+
+# entire algebra is available on `r`
+```
+
+### Connecting to SQL databases
+
+Bmg currently requires `sequel >= 3.0` to connect to SQL databases. You also
+need to require `bmg/sequel`.
+
+```ruby
require 'sqlite3'
require 'bmg'
require 'bmg/sequel'
+```
-DB = Sequel.connect("sqlite://suppliers-and-parts.db")
+Then `Bmg.sequel` serves relations for tables of your SQL database:
+```ruby
+DB = Sequel.connect("sqlite://suppliers-and-parts.db")
suppliers = Bmg.sequel(:suppliers, DB)
+```
+The entire algebra is available on those relations. As long as you keep using
+operators that can be translated to SQL, results remain SQL-able:
+
+```ruby
big_suppliers = suppliers
- .restrict(Predicate.neq(status: 30))
+ .exclude(status: 30)
+ .project([:sid, :name])
puts big_suppliers.to_sql
-# SELECT `t1`.`sid`, `t1`.`name`, `t1`.`status`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+# SELECT `t1`.`sid`, `t1`.`name` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+```
-puts JSON.pretty_generate(big_suppliers)
-# [{...},...]
+Operators not translatable to SQL are available too (such as `group` below).
+Bmg fallbacks to memory operators for them, but remains capable of pushing some
+operators down the tree as illustrated below (the restriction on `:city` is
+pushed to the SQL server):
+
+```ruby
+Bmg.sequel(:suppliers, sequel_db)
+ .project([:sid, :name, :city])
+ .group([:sid, :name], :suppliers_in)
+ .restrict(city: ["Paris", "London"])
+ .debug
+
+# (group
+# (sequel SELECT `t1`.`sid`, `t1`.`name`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`city` IN ('Paris', 'London')))
+# [:sid, :name, :status]
+# :suppliers_in
+# {:array=>false})
```
-## How is this different from similar libraries?
+### Reading files (csv, excel, text)
-1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
- etc.) do not implement a genuine relational algebra: their support for
- chaining relational operators is limited (yielding errors or wrong SQL
- queries). Bmg **always** allows chaining operators. If it does not, it's
- a bug. In other words, the following query is 100% valid:
+Bmg provides simple adapters to read files and reach Relationland as soon as
+possible.
- relation
- .restrict(...) # aka where
- .union(...)
- .summarize(...) # aka group by
- .restrict(...)
+#### CSV files
-2. Bmg supports in memory relations, json relations, csv relations, SQL
- relations and so on. It's not tight to SQL generation, and supports
- queries accross multiple data sources.
+```ruby
+csv_options = { col_sep: ",", quote_char: '"' }
+r = Bmg.csv("path/to/a/file.csv", csv_options)
+```
-3. Bmg makes a best effort to optimize queries, simplifying both generated
- SQL code (low-level accesses to datasources) and in-memory operations.
+Options are directly transmitted to `::CSV.new`, check ruby's standard
+library.
-4. Bmg supports various *structuring* operators (group, image, autowrap,
- autosummarize, etc.) and allows building 'non flat' relations.
+#### Excel files
-## How is this different from Alf?
+You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
+read `.xls` and `.xlsx` files with Bmg.
-1. Bmg's implementation is much simpler than Alf, and uses no ruby core
- extention.
+```ruby
+roo_options = { skip: 1 }
+r = Bmg.excel("path/to/a/file.xls", roo_options)
+```
-2. We are confident using Bmg in production. Systematic inspection of query
- plans is suggested though. Alf was a bit too experimental to be used on
- (critical) production systems.
+Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
+documentation.
-2. Alf exposes a functional syntax, command line tool, restful tools and
- many more. Bmg is limited to the core algebra, main Relation abstraction
- and SQL generation.
+#### Text files
-3. Bmg is less strict regarding conformance to relational theory, and
- may actually expose non relational features (such as support for null,
- left_join operator, etc.). Sharp tools hurt, use them with great care.
+There is also a straightforward way to read text files and convert lines to
+tuples.
-4. Bmg does not yet implement all operators documented on try-alf.org, even
- if we plan to eventually support them all.
+```ruby
+r = Bmg.text_file("path/to/a/file.txt")
+r.type.attrlist
+# => [:line, :text]
+```
-5. Bmg has a few additional operators that prove very useful on real
- production use cases: prefix, suffix, autowrap, autosummarize, left_join,
- rxmatch, etc.
+Without options tuples will have `:line` and `:text` attributes, the former
+being the line number (starting at 1) and the latter being the line itself
+(stripped).
+The are a couple of options (see `Bmg::Reader::Textfile`). The most useful one
+is the use a of a Regexp with named captures to automatically extract
+attributes:
+
+```ruby
+r = Bmg.text_file("path/to/a/file.txt", parse: /GET (?<url>([^\s]+))/)
+r.type.attrlist
+# => [:line, :url]
+```
+
+In this scenario, non matching lines are skipped. The `:line` attribute keeps
+being used to have at least one candidate key (so to speak).
+
+### Your own relations
+
+As noted earlier, Bmg has a simple relation interface where you only have to
+provide an iteration of symbolized tuples.
+
+```ruby
+class MyRelation
+ include Bmg::Relation
+
+ def each
+ yield(id: 1, name: "Alf", year: 2014)
+ yield(id: 2, name: "Bmg", year: 2018)
+ end
+end
+
+MyRelation.new
+ .restrict(Predicate.gt(:year, 2015))
+ .allbut([:year])
+```
+
+As shown, creating adapters on top of various data source is straighforward.
+Adapters can also participate to query optimization (such as pushing
+restrictions down the tree) by overriding the underscored version of operators
+(e.g. `_restrict`).
+
+Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
+example. Keep in touch with the team if you need some help.
+
## Supported operators
```ruby
r.allbut([:a, :b, ...]) # remove specified attributes
r.autowrap(split: '_') # structure a flat relation, split: '_' is the default
r.autosummarize([:a, :b, ...], x: :sum) # (experimental) usual summarizers supported
r.constants(x: 12, ...) # add constant attributes (sometimes useful in unions)
r.extend(x: ->(t){ ... }, ...) # add computed attributes
+r.exclude(predicate) # shortcut for restrict(!predicate)
r.group([:a, :b, ...], :x) # relation-valued attribute from attributes
r.image(right, :x, [:a, :b, ...]) # relation-valued attribute from another relation
r.join(right, [:a, :b, ...]) # natural join on a join key
r.join(right, :a => :x, :b => :y, ...) # natural join after right reversed renaming
r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple
@@ -135,16 +232,97 @@
t.transform(:to_s) # all-attrs transformation
t.transform(&:to_s) # similar, but Proc-driven
t.transform(:foo => :upcase, ...) # specific-attrs tranformation
t.transform([:to_s, :upcase]) # chain-transformation
r.union(right) # relational union
+r.where(predicate) # alias for restrict(predicate)
```
-## Who is behind Bmg?
+## How is this different?
-Bernard Lambeau (bernard@klaro.cards) is Alf & Bmg main engineer & maintainer.
+### ... from similar libraries?
+1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
+ etc.) do not implement a genuine relational algebra. Their support for
+ chaining relational operators is thus limited (restricting your expression
+ power and/or raising errors and/or outputting wrong or counterintuitive
+ SQL code). Bmg **always** allows chaining operators. If it does not, it's
+ a bug.
+
+ For instance the expression below is 100% valid in Bmg. The last where
+ clause applies to the result of the summarize (while SQL requires a `HAVING`
+ clause, or a `SELECT ... FROM (SELECT ...) r`).
+
+ ```ruby
+ relation
+ .where(...)
+ .union(...)
+ .summarize(...) # aka group by
+ .where(...)
+ ```
+
+2. Bmg supports in memory relations, json relations, csv relations, SQL
+ relations and so on. It's not tight to SQL generation, and supports
+ queries accross multiple data sources.
+
+3. Bmg makes a best effort to optimize queries, simplifying both generated
+ SQL code (low-level accesses to datasources) and in-memory operations.
+
+4. Bmg supports various *structuring* operators (group, image, autowrap,
+ autosummarize, etc.) and allows building 'non flat' relations.
+
+5. Bmg can use full ruby power when that helps (e.g. regular expressions in
+ WHERE clauses or ruby code in EXTEND clauses). This may prevent Bmg from
+ delegating work to underlying data sources (e.g. SQL server) and should
+ therefore be used with care though.
+
+### ... from Alf?
+
+If you use Alf (or used it in the past), below are the main differences between
+Bmg and Alf. Bmg has NOT been written to be API-compatible with Alf and will
+probably never be.
+
+1. Bmg's implementation is much simpler than Alf and uses no ruby core
+ extention.
+
+2. We are confident using Bmg in production. Systematic inspection of query
+ plans is advised though. Alf was a bit too experimental to be used on
+ (critical) production systems.
+
+3. Alf exposes a functional syntax, command line tool, restful tools and
+ many more. Bmg is limited to the core algebra, main Relation abstraction
+ and SQL generation.
+
+4. Bmg is less strict regarding conformance to relational theory, and
+ may actually expose non relational features (such as support for null,
+ left_join operator, etc.). Sharp tools hurt, use them with care.
+
+5. Unlike Alf::Relation instances of Bmg::Relation capture query-trees, not
+ values. Currently two instances `r1` and `r2` are not equal even if they
+ define the same mathematical relation. As a consequence joining on
+ relation-valued attributes does not work as expected in Bmg until further
+ notice.
+
+6. Bmg does not implement all operators documented on try-alf.org, even if
+ we plan to eventually support most of them.
+
+7. Bmg has a few additional operators that prove very useful on real
+ production use cases: prefix, suffix, autowrap, autosummarize, left_join,
+ rxmatch, etc.
+
+8. Bmg optimizes queries and compiles them to SQL on the fly, while Alf was
+ building an AST internally first. Strictly speaking this makes Bmg less
+ powerful than Alf since optimizations cannot be turned off for now.
+
+## Contribute
+
+Please use github issues and pull requests for all questions, bug reports,
+and contributions. Don't hesitate to get in touch with us with an early code
+spike if you plan to add non trivial features.
+
+## Licence
+
+This software is distributed by Enspirit SRL under a MIT Licence. Please
+contact Bernard Lambeau (blambeau@gmail.com) with any question.
+
Enspirit (https://enspirit.be) and Klaro App (https://klaro.cards) are both
actively using and contributing to the library.
-
-Feel free to contact us for help, ideas and/or contributions. Please use github
-issues and pull requests if possible if code is involved.