README.md in bmg-0.21.5 vs README.md in bmg-0.23.0

- old
+ new

@@ -1,26 +1,34 @@ -# Bmg, a relational algebra (Alf's successor)! +# Bmg, a relational algebra -Bmg is a relational algebra implemented as a Ruby library. It implements the +Bmg is a [relational algebra](https://www.relational-algebra.dev/) implemented as a Ruby library. It implements the [Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen) paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago. Bmg can be used to query relations in memory, from various files, SQL databases, and any data source that can be seen as serving relations. Cross data-sources joins are supported, as with Alf. For differences with Alf, see a section further down this README. +## Links + +* Documentation can be found at https://www.relational-algebra.dev/ +* Contribute to that documentation on github: https://github.com/enspirit/bmg-website + ## Outline * [Example](#example) * [Where are base relations coming from?](#where-are-base-relations-coming-from) * [Memory relations](#memory-relations) * [Connecting to SQL databases](#connecting-to-sql-databases) - * [Reading files (csv, Excel, text)](#reading-files-csv-excel-text) + * [Reading data files](#reading-data-files-json-csv-yaml-text-xls--xlsx) * [Connecting to Redis databases](#connecting-to-redis-databases) * [Your own relations](#your-own-relations) +* [The Database abstraction](#the-database-abstraction) * [List of supported operators](#supported-operators) +* [List of supported predicates](#supported-predicates) +* [List of supported summaries](#supported-summaries) * [How is this different?](#how-is-this-different) * [... from similar libraries](#-from-similar-libraries) * [... from Alf](#-from-alf) * [Contribute](#contribute) * [License](#license) @@ -115,37 +123,42 @@ # [:sid, :name, :status] # :suppliers_in # {:array=>false}) ``` -### Reading files (csv, Excel, text) +### Reading data files (json, csv, yaml, text, xls & xlsx) Bmg provides simple adapters to read files and reach Relationland as soon as possible. -#### CSV files +#### JSON files ```ruby -csv_options = { col_sep: ",", quote_char: '"' } -r = Bmg.csv("path/to/a/file.csv", csv_options) +r = Bmg.json("path/to/a/file.json") ``` -Options are directly transmitted to `::CSV.new`, check Ruby's standard -library. +The json file is expected to contain tuples of same heading. -#### Excel files +#### YAML files -You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to -read `.xls` and `.xlsx` files with Bmg. +```ruby +r = Bmg.yaml("path/to/a/file.yaml") +``` +The yaml file is expected to contain tuples of same heading. + +#### CSV files + ```ruby -roo_options = { skip: 1 } -r = Bmg.excel("path/to/a/file.xls", roo_options) +csv_options = { col_sep: ",", quote_char: '"' } +r = Bmg.csv("path/to/a/file.csv", csv_options) ``` -Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's -documentation. +Options are directly transmitted to `::CSV.new`, check Ruby's standard +library. If you don't provide them, `Bmg` uses `headers: true` (hence making +then assumption that attributes names are provided on first line), and makes a +best effort to infer the column separator. #### Text files There is also a straightforward way to read text files and convert lines to tuples. @@ -171,10 +184,23 @@ ``` In this scenario, non matching lines are skipped. The `:line` attribute keeps being used to have at least one candidate key (so to speak). +#### Excel files + +You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to +read `.xls` and `.xlsx` files with Bmg. + +```ruby +roo_options = { skip: 1 } +r = Bmg.excel("path/to/a/file.xls", roo_options) +``` + +Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's +documentation. + ### Connecting to Redis databases Bmg currently requires `bmg-redis` and `redis >= 4.6` to connect to Redis databases. You also need to require `bmg/redis`. @@ -238,10 +264,62 @@ (e.g. `_restrict`). Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an example. Keep in touch with the team if you need some help. +## The Database abstraction + +The previous section focused on obtaining *relations*. In practice you frequently +have a collection of relations hence a *database*: + +* A SQL database with multiple tables +* A list of data files, all in the same folder +* An excel file with various sheets + +Bmg supports a simple Datbabase abstraction that serves those relations "by name", +in a simple way. A database can also be easily dumped back to a data folder of +json or csv files, or as simple xlsx files with multiple sheets. + +### Connecting to a SQL Database + +For a SQL database, connected with Sequel: + +``` +db = Bmg::Database.sequel(Sequel.connect('...')) +db.suppliers # yields a Bmg::Relation over the `suppliers` table +``` + +### Connecting to data files in the same folder + +Data files all in the same folder can be seen as a very basic form of database, +and served as such. Bmg supports `json`, `csv` and `yaml` files: + +``` +db = Bmg::Database.data_folder('./my-database') +db.suppliers # yields a Bmg::Relation over the `suppliers.(json,csv,yml)` file +``` + +Bmg supports files in different formats in the same folder. When files with the +same basename exist, json is prefered over yaml, which is prefered over csv. + +### Dumping a Database instance + +As a data folder: + +``` +db = Bmg::Database.sequel(Sequel.connect('...')) +db.to_data_folder('path/to/folder', :json) +``` + +As an .xlsx file (any existing file will be erased, we don't support modifying +existing files): + +``` +require 'bmg/xlsx' +db.to_xlsx('path/to/file.xlsx') +``` + ## Supported operators ```ruby r.allbut([:a, :b, ...]) # remove specified attributes r.autowrap(split: '_') # structure a flat relation, split: '_' is the default @@ -258,10 +336,11 @@ r.join(right, :a => :x, :b => :y, ...) # join after right reversed renaming r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple r.left_join(right, {:a => :x, ...}, {...}) # left join after right reversed renaming r.matching(right, [:a, :b, ...]) # semi join, aka where exists r.matching(right, :a => :x, :b => :y, ...) # semi join, after right reversed renaming +r.minus(right) # set difference r.not_matching(right, [:a, :b, ...]) # inverse semi join, aka where not exists r.not_matching(right, :a => :x, ...) # inverse semi join, after right reversed renaming r.page([[:a, :asc], ...], 12, page_size: 10) # paging, using an explicit ordering r.prefix(:foo_, but: [:a, ...]) # prefix kind of renaming r.project([:a, :b, ...]) # keep specified attributes only @@ -274,13 +353,74 @@ r.transform(&:to_s) # similar, but Proc-driven r.transform(:foo => :upcase, ...) # specific-attrs tranformation r.transform([:to_s, :upcase]) # chain-transformation r.ungroup([:a, :b, ...]) # ungroup relation-valued attributes within parent tuple r.ungroup(:a) # shortcut over ungroup([:a]) -r.union(right) # relational union +r.union(right) # set union r.unwrap([:a, :b, ...]) # merge tuple-valued attributes within parent tuple r.unwrap(:a) # shortcut over unwrap([:a]) r.where(predicate) # alias for restrict(predicate) +``` + +## Supported Predicates + +Usual operators are supported and map to their SQL equivalent as expected: + +```ruby +Predicate.eq # = +Predicate.neq # <> +Predicate.lt # < +Predicate.lte # <= +Predicate.gt # > +Predicate.gte # >= +Predicate.in # SQL's IN +Predicate.is_null # SQL's IS NULL +``` + +See the [Predicate gem](https://github.com/enspirit/predicate) for a more +complete list. + +Note: predicates that implement specific Ruby algorithms or patterns are +not compiled to SQL (and more generally not delegated to underlying database +servers). + +## Supported Summaries + +The `summarize` operator receives a list of `attr: summarizer` pairs, e.g. + +```ruby +r.summarize([:city], { + how_many: :count, # same as how_many: Bmg::Summarizer.count + status: :max, # same as status: Bmg::Summarizer.max(:status) + min_status: Bmg::Summarizer.min(:status) +}) +``` + +The following summarizers are available and translated to SQL: + +```ruby +Bmg::Summarizer.count # count the number of tuples +Bmg::Summarizer.distinct(:a) # collect distinct values (as an array) +Bmg::Summarizer.distinct_count(:a) # count of distinct values +Bmg::Summarizer.min(:a) # min value for attribute :a +Bmg::Summarizer.max(:a) # max value +Bmg::Summarizer.sum(:a) # sum :a's values +Bmg::Summarizer.avg(:a) # average +``` + +The following summarizers are implemented in Ruby (they are supported when +querying SQL databases, but not compiled to SQL): + +```ruby +Bmg::Summarizer.collect(:a) # collect :a's values (as an array) +Bmg::Summarizer.concat(:a, opts: { ... }) # concat :a's values (opts, e.g. {between: ','}) +Bmg::Summarizer.first(:a, order: ...) # smallest seen a:'s value according to a tuple ordering +Bmg::Summarizer.last(:a, order: ...) # largest seen a:'s value according to a tuple ordering +Bmg::Summarizer.variance(:a) # variance +Bmg::Summarizer.stddev(:a) # standard deviation +Bmg::Summarizer.percentile(:a, nth) # (continuous) nth percentile +Bmg::Summarizer.percentile_disc(:a, nth) # discrete nth percentile +Bmg::Summarizer.value_by(:a, :by => :b) # { :b => :a } as a Hash ``` ## How is this different? ### ... from similar libraries?