README.md in alf-0.12.2 vs README.md in alf-0.13.0
- old
+ new
@@ -1,866 +1,196 @@
-# Alf - Relational Algebra at your fingertips (version 0.10.0)
+# Alf
-[![Build Status](https://secure.travis-ci.org/blambeau/alf.png)](http://travis-ci.org/blambeau/alf)
-[![Dependency Status](https://gemnasium.com/blambeau/alf.png)](https://gemnasium.com/blambeau/alf)
+Relational Algebra at your fingertips
-## Description
+[![Build Status](https://secure.travis-ci.org/alf-tool/alf.png)](http://travis-ci.org/alf-tool/alf)
+[![Dependency Status](https://gemnasium.com/alf-tool/alf.png)](https://gemnasium.com/alf-tool/alf)
-### What & Why
+## Links
-Alf brings the relational algebra both in Shell and in Ruby. In Shell, because
-manipulating any relation-like data source should be as straightforward as a
-one-liner. In Ruby, because I've never understood why programming languages
-provide data structures like arrays, hashes, sets, trees and graphs but not
-_relations_... Let's stop the segregation ;-)
+* [Official documentation](http://blambeau.github.com/alf)
+* [Source and Issues](http://github.com/alf-tool/alf)
+* [Relational basics as needed](http://www.revision-zero.org/relational-basics-2)
-### Install
+## What & Why
- % [sudo] gem install alf [fastercsv, ...]
- % alf --help
+Alf brings the relational algebra both in Shell and in Ruby. In Shell, because
+manipulating any relation-like data source should be as straightforward as a one-liner.
+In Ruby, because I've never understood why programming languages provide data structures
+like arrays, hashes, sets, trees and graphs but not _relations_...
-### Bundler & Require
+## Shell Example
- # API is not considered stable enough for now, please use
- gem "alf", "= 0.10.0"
-
- # The following should not break your code, but is a bit less safe,
- # until 1.0.0 has been reached
- gem "alf", "~> 0.10.0"
+ % alf --examples show suppliers
-### Links
-
-* http://blambeau.github.com/alf
-* http://rubydoc.info/gems/alf
-* http://github.com/blambeau/alf
-* http://rubygems.org/gems/alf
-
-### Quick overview
-
-Alf is a commandline tool and Ruby library to manipulate data with all the power
-of a truly relational algebra approach. Objectives behind Alf are manifold:
-
-* Pragmatically, Alf aims at being a useful commandline executable for manipulating
- relational-like data: database records, csv files, or **whatever can be interpreted
- as (the physical encoding of) a relation**. See 'alf --help' for the list of
- available commands and implemented relational operators.
-
- % alf restrict suppliers -- "city == 'London'" | alf join cities
-
-* Alf is also a 100% Ruby relational algebra implementation shipped with a simple
- to use, powerful, functional DSL for compiling and evaluating relational queries.
- Alf is not limited to simple scalar values, but admits values of arbitrary
- complexity (under a few requirements about their implementation, see next
- section). See 'alf --help' as well as .alf files in the examples directory
- for syntactic examples.
-
- Alf.lispy.evaluate {
- (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
- }
-
- In addition to this functional syntax, Alf comes bundled with an in-memory
- Relation data structure that provides an object-oriented way of manipulating
- relations in simplest cases:
-
- suppliers = Alf::Relation[
- {:sid => 'S1', :name => 'Smith', :status => 20, :city => 'London'},
- {:sid => 'S2', :name => 'Jones', :status => 10, :city => 'Paris'},
- {:sid => 'S3', :name => 'Blake', :status => 30, :city => 'Paris'},
- {:sid => 'S4', :name => 'Clark', :status => 20, :city => 'London'},
- {:sid => 'S5', :name => 'Adams', :status => 30, :city => 'Athens'},
- ]
- cities = ...
- puts suppliers.restrict(lambda{ city == 'London' }).join(cities)
-
-* Alf is also an educational tool that I've written to draw people's attention
- to the little-known relational theory (ill-represented by SQL). The tool
- is largely inspired from **Tutorial D**, the tutorial language of Chris Date and
- Hugh Darwen in their books, more specifically in
- {http://www.thethirdmanifesto.com/ *The Third Manifesto* (TTM)}.
- However, Alf only provides an overview of the relational _algebra_ defined
- there (Alf is neither a relational _database_, nor a relational _language_).
- I hope that people (especially talented developers) will be sufficiently
- enticed by features shown here to open that book, read it more deeply, and
- implement new stuff around Date & Darwen's vision. Have a look at the result of
- the following query for the kind of things that you'll never ever have in SQL
- (see also 'alf help quota', 'alf help wrap', 'alf help group', ...):
-
- % alf --text summarize supplies -- sid -- total "sum{ qty }" which "collect{ pid }"
-
-* Last, but not least, Alf is an attempt to help me test some research ideas and
- communicate about them with people that already know (all or part of) the TTM
- vision of relational theory. These people include members of the TTM mailing
- list as well as other people implementing some of the TTM ideas (see
- {https://github.com/dkubb/veritas Dan Kubb's Veritas project} for example). For
- this reason, specific features and/or operators are mine, should be considered
- 'research work in progress', and used with care because not necessarily in
- conformance with the TTM.
-
- % alf --text quota supplies -- sid -- qty -- pos "count()"
-
-## Overview of relational theory
-
-We quickly recall relational theory in this section, as described in the TTM
-book. Readers not familiar with Date and Darwen's vision of relational theory
-should probably read this section, even if fluent in SQL. Others may probably
-skip this section. A quick test?
-
-> _A relation is a value, precisely a set of tuples, which are themselves values.
- Therefore, a relation is immutable, not ordered, does not contain duplicates,
- and does not have null/nil attributes._
-
-Familiar? Skip. Otherwise, read on.
-
-### The example database
-
-This README file shows a lot of examples built on top of the following suppliers
-& parts database (almost identical to the original version in C. J. Date's database
-books). By default, the alf command line is wired to this embedded example. All
-examples shown here should therefore work immediately, if you want to reproduce
-them!
-
- % alf show database
-
- +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
- | :suppliers | :parts | :cities | :supplies |
- +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
- | +------+-------+---------+--------+ | +------+-------+--------+------------+--------+ | +----------+----------+ | +------+------+------+ |
- | | :sid | :name | :status | :city | | | :pid | :name | :color | :weight | :city | | | :city | :country | | | :sid | :pid | :qty | |
- | +------+-------+---------+--------+ | +------+-------+--------+------------+--------+ | +----------+----------+ | +------+------+------+ |
- | | S1 | Smith | 20 | London | | | P1 | Nut | Red | 12.0000000 | London | | | London | England | | | S1 | P1 | 300 | |
- | | S2 | Jones | 10 | Paris | | | P2 | Bolt | Green | 17.0000000 | Paris | | | Paris | France | | | S1 | P2 | 200 | |
- | | S3 | Blake | 30 | Paris | | | P3 | Screw | Blue | 17.0000000 | Oslo | | | Athens | Greece | | | S1 | P3 | 400 | |
- | | S4 | Clark | 20 | London | | | P4 | Screw | Red | 14.0000000 | London | | | Brussels | Belgium | | | S1 | P4 | 200 | |
- | | S5 | Adams | 30 | Athens | | | P5 | Cam | Blue | 12.0000000 | Paris | | +----------+----------+ | | S1 | P5 | 100 | |
- | +------+-------+---------+--------+ | | P6 | Cog | Red | 19.0000000 | London | | | | S1 | P6 | 100 | |
- | | +------+-------+--------+------------+--------+ | | | S2 | P1 | 300 | |
- | | | | | S2 | P2 | 400 | |
- | | | | | S3 | P2 | 200 | |
- | | | | | S4 | P2 | 200 | |
- | | | | | S4 | P4 | 300 | |
- | | | | | S4 | P5 | 400 | |
- | | | | +------+------+------+ |
- +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
-
-Many people think that relational databases are necessarily 'flat', that they are
-necessarily limited to simple scalar values put in two dimension tables. This is
-wrong; most SQL databases are indeed 'flat', but _relations_ (in the mathematical
-sense of the relational theory) are not! Look, **the example above is a relation!**;
-that 'contains' other relations as particular values, which, in turn, could
-'contain' relations or any other 'simple' or more 'complex' value... This is not
-"flat" at all, after all :-)
-
-### Types and Values
-
-To understand what a relation is exactly, one needs to remember elementary
-notions of set theory and the concepts of _type_ and _value_.
-
-* A _type_ is a finite set of values; it is not particularly ordered and, being
-a set, it never contains two values that are equal (any type is necessarily
-accompanied by an equality operator, denoted here by '==').
-
-* A _value_ is **immutable** (you cannot 'change' a value, in any way), has no
-localization in time and space, and is always typed (that is, it is always
-accompanied by some identification of the type it belongs to).
-
-As you can see, _type_ and _value_ are not the same concepts as _class_ and
-_object_, which you are probably more familiar with. Alf considers that the
-latter are _implementations_ of the former. Alf assumes _valid_ implementations
-(equality and hash methods must be correct) and _valid_ usage (objects used for
-representing values are kept immutable in practice). Alf _assumes_ this, but
-does not _enforce_ it: it is your responsibility to use Alf in conformance with
-these preconditions. That being said, if you want **arrays, colors, ranges, or
-whatever in your relations**, just do it! You can even join on them, restrict on
-them, summarize on them, and so on:
-
- % alf extend suppliers -- chars "name.chars.to_a" | alf --text restrict -- "chars.last == 's'"
-
- +------+-------+---------+--------+-----------------+
- | :sid | :name | :status | :city | :chars |
- +------+-------+---------+--------+-----------------+
- | S2 | Jones | 10 | Paris | [J, o, n, e, s] |
- | S5 | Adams | 30 | Athens | [A, d, a, m, s] |
- +------+-------+---------+--------+-----------------+
-
-A last, very important word about values. **Null/nil is not a value**. Strictly
-speaking therefore, you may not use null/nil inside your data files or datasources
-representing relations. That being said, Alf provides specific support for handling
-them, because they appear in today's databases in practice and because Alf aims at
-being a tool that helps you tackle _practical_ problems. See the section titled
-"What is Alf exactly?" later.
-
-### Tuples and Relations
-
-Tuples (aka records) and relations are values as well, which explains why you
-can have them inside relations!
-
-* Logically speaking, a tuple is a set of (attribute name, attribute value)
- pairs. Moreover, it does not contain two attributes with the same name and is
- **not particularly ordered**. Also, **a tuple is a _value_, and is therefore
- immutable**. Last, but not least, a tuple **does not admit nulls/nils**. Tuples
- in Alf are simply implemented with ruby hashes, taken as tuple implementations.
- Not all hashes are valid tuple implementations, of course (those containing nil
- are not, for example). Alf _assumes_ valid tuples, but does not _enforce_ this
- precondition. It's up to you to use Alf the right way! No support is or will
- ever be provided for ordering tuple attributes. However, as hashes are ordered
- in Ruby 1.9, Alf implements a best effort strategy to keep a friendly ordering
- when rendering tuples and relations. This is a very good practical reason for
- migrating to ruby 1.9 if not already done!
-
- {:sid => "S1", :name => "Smith", :status => 20, :city => "London"}
-
-* A _relation_ is a set of tuples. Being a set, a relation **never contains
- duplicates** (unlike SQL that works on bags, not on sets) and is **not
- particularly ordered**. Moreover, all tuples of a relation must have the same
- _heading_, that is, the same set of attribute (name, type) pairs. Also, **a
- relation is a _value_, is therefore immutable** and **does not admit null/nil**.
-
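The tuple and relation definitions above can be illustrated with a minimal plain-Ruby sketch. It assumes tuples are modelled as frozen Hashes and a relation as a `Set` of them; the `tuple` helper is hypothetical and this is not Alf's own `Relation` class.

```ruby
require 'set'

# Hypothetical helper: a tuple is sketched as a frozen Hash (a value, never mutated).
def tuple(attrs)
  attrs.freeze
end

smith = tuple(:sid => 'S1', :name => 'Smith', :city => 'London')
jones = tuple(:sid => 'S2', :name => 'Jones', :city => 'Paris')

# Same (name, value) pairs written in another order: still the same tuple,
# because Hash equality ignores attribute ordering.
smith_again = tuple(:city => 'London', :name => 'Smith', :sid => 'S1')

# A relation sketched as a Set of tuples.
suppliers = Set[smith, jones, smith_again]
reordered = Set[jones, smith]

puts suppliers.size          # the duplicate tuple collapses, leaving 2
puts suppliers == reordered  # tuple ordering does not matter: true
```

A `Set` of Hashes is used here only because it gives duplicate removal and order-independent equality for free, which are the two properties the definitions insist on.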
-Alf is mainly an implementation of relational algebra (see section below). The
-implemented operators consider any Iterator of tuples as a potentially valid
-operand. In addition, Alf provides a Relation ruby class that acts as an
-in-memory data structure that provides an Object-Oriented API to call operators
-(see "Interfacing Alf in Ruby" below).
-
-### Relational Algebra
-
-In classical algebra, you can make computations like <code>(5 + 2) - 3</code>.
-In relational algebra, you can do similar things with relations. Alf uses a
-prefix, functional programming-oriented syntax for algebra expressions:
-
- (minus (union :suppliers, xxx), yyy)
-
-All relational operators take relation operands in input and return a relation
-as output. We say that the relational algebra is _closed_ under its operators.
-In practice, it means that operands may always be sub-expressions, **always**.
-
- (minus (union (restrict :suppliers, lambda{ zzz }), xxx), yyy)
-
-In shell, the closure property means that you can pipe alf invocations the way
-you want! The same query, in shell:
-
- alf restrict suppliers -- "zzz" | alf union xxx | alf minus yyy
-
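The closure property can be sketched in plain Ruby. The three functions below are hypothetical stand-ins for `restrict`, `union` and `minus` (they are not Alf's implementations); each takes Sets of Hashes and returns a Set of Hashes, so any call can serve as the operand of another. The `extra` and `excluded` relations are made up for the example.

```ruby
require 'set'

# Hypothetical stand-ins: every operator takes relations and returns a relation.
def restrict(rel, &pred)
  rel.select(&pred).to_set
end

def union(left, right)
  left | right
end

def minus(left, right)
  left - right
end

suppliers = Set[
  {:sid => 'S1', :city => 'London'},
  {:sid => 'S2', :city => 'Paris'},
  {:sid => 'S3', :city => 'Paris'}
]
extra    = Set[{:sid => 'S4', :city => 'London'}]
excluded = Set[{:sid => 'S1', :city => 'London'}]

# Operands are sub-expressions, nested exactly like the expression above.
result = minus(union(restrict(suppliers) { |t| t[:city] == 'London' }, extra), excluded)
p result
```

Because every function returns the same kind of value it accepts, nesting needs no adapters, which is the same reason the shell invocations can be piped freely.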
-## What is Alf exactly?
-
-*The Third Manifesto* defines a series of prescriptions, proscriptions and very
-strong suggestions for designing a truly relational _language_, called a _D_,
-as an alternative to SQL for managing relational databases. This is far beyond
-my objective with Alf, as it does not touch on database issues at all (persistence,
-transactions, and so on) and does not actually define a programming language (only
-a small functional ruby DSL).
-
-Alf must simply be interpreted as a ruby library implementing (a variant of)
-Date and Darwen's relational algebra. This library is designed as a set of operator
-implementations, that work as tuple iterators taking other tuple iterators as
-input. Under the pre-condition that you provide them _valid_ tuple iterators as
-input (no duplicates, no nil, + other preconditions on an operator basis), the
-result is a valid iterator as well. Unless explicitly stated otherwise, any
-behavior observed when not respecting these preconditions, even an interesting
-behavior, is not guaranteed and might change with tiny version changes (see
-section about versioning policy at the end of this file).
-
-### The command line utility
-
- #
- # Provided that suppliers and cities are valid relation representations
- # [something similar]
- #
- % alf restrict suppliers -- "city == 'London'" | alf join cities
-
- # the resulting stream is a valid relation representation in the output
- # stream format that you have selected (.rash by default). It can therefore
- # be piped to another alf shell invocation, or saved to a file and re-read
- # later (under the assumption that input and output data formats match, of
- # course). [Something similar about responsibility and bug].
-
-If you take a look at .alf example files, you'll find functional ruby expressions
-like the following (called Lispy expressions):
-
- % cat examples/operators/minus.alf
-
- # Give all suppliers, except those living in Paris
- (minus :suppliers,
- (restrict :suppliers, lambda{ city == 'Paris' }))
-
- # This is a contrived example for illustrating minus, as the
- # following is equivalent
- (restrict :suppliers, lambda{ city != 'Paris' })
-
-You can simply execute such expressions with the alf command line itself (the
-three following invocations return the same result):
-
- % alf examples/operators/minus.alf | alf show
- % alf show minus
- % alf -e "(restrict :suppliers, lambda{ city != 'Paris' })" | alf show
-
-Symbols are magically resolved from the environment, which is wired to the
-examples by default. See the dedicated sections below to update this behavior
-to your needs.
-
-### The algebra compiler
-
- #
- # Provided that :suppliers and :cities are valid relation representations
- # (under the responsibility shared by you and the Reader and Environment
- # subclasses you use -- see later), then,
- #
- op = Alf.lispy.compile {
- (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
- }
-
- # op is a thread-safe Enumerable of tuples, that can be taken as a valid
- # relation representation. It can therefore be used as the input operand
- # of any other expression. This is under Alf's responsibility, and any
- # failure must be considered a bug!
-
-### The Relation data structure
-
-In addition, Alf is bundled with an in-memory Relation data structure that
-provides a more abstract API for manipulating relations in simple cases (the
-rules are the same about pre and post-conditions):
-
- # The query above can be done as follows. Note that relations are always
- # loaded in memory here!
- suppliers = Alf::Relation[ ... ]
- cities = Alf::Relation[ ... ]
- suppliers.restrict(lambda{ city == 'London' }).
- join(cities)
- # => Alf::Relation[ ... ]
-
-All relational operators have an instance method equivalent on the Alf::Relation
-class. Semantically, the receiver object is simply the first operand of the
-functional call, as illustrated above.
-
-### Where do relations come from?
-
-Relation literals can simply be written as follows:
-
- suppliers = Alf::Relation[
- {:sid => 'S1', :name => 'Smith', :status => 20, :city => 'London'},
- {:sid => 'S2', :name => 'Jones', :status => 10, :city => 'Paris'},
- {:sid => 'S3', :name => 'Blake', :status => 30, :city => 'Paris'},
- {:sid => 'S4', :name => 'Clark', :status => 20, :city => 'London'},
- {:sid => 'S5', :name => 'Adams', :status => 30, :city => 'Athens'},
- ]
-
-Environment classes serve datasets (see later) that always have a to_rel method
-for obtaining in-memory relations:
-
- env = Alf::Environment.examples
- env.dataset(:suppliers).to_rel
- # => Alf::Relation[ ... ]
-
-Compiled expressions always have a to_rel method that allows obtaining an
-in-memory relation:
-
- op = Alf.lispy.compile {
- (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
- }
- op.to_rel
- # => Alf::Relation[...]
-
-Lispy provides an 'evaluate' method which is precisely equivalent to the chain
-above. Therefore:
-
- rel = Alf.lispy.evaluate {
- (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
- }
- # => Alf::Relation[...]
-
-### Algebra is closed under its operators!
-
-Of course, from the closure property of a relational algebra (which states that
-operators work on relations and return relations), you can use a sub-expression
-*every time* a relational operand is expected, every time:
-
- # Compute the total qty supplied in each country together with the subset
- # of products shipped there. Only consider suppliers that have a status
- # greater than 10, however.
- (summarize \
- (join \
- (join (restrict :suppliers, lambda{ status > 10 }),
- :supplies),
- :cities),
- [:country],
- :which => Agg::collect(:pid),
- :total => Agg::sum{ qty })
-
-Of course, complex queries quickly become unreadable that way. But you can always
-split complex tasks into simpler ones:
-
- kept_suppliers = (restrict :suppliers, lambda{ status > 10 })
- with_countries = (join kept_suppliers, :cities)
- supplying = (join with_countries, :supplies)
- (summarize supplying,
- [:country],
- :which => Agg::collect(:pid),
- :total => Agg::sum{ qty })
-
-And here is the result!
-
- +------+--------+--------------------------+
- | :sid | :total | :which |
- +------+--------+--------------------------+
- | S1 | 1300 | [P1, P2, P3, P4, P5, P6] |
- | S2 | 700 | [P1, P2] |
- | S3 | 200 | [P2] |
- | S4 | 900 | [P2, P4, P5] |
- +------+--------+--------------------------+
-
-### Reference API
-
-For now, the Ruby API is documented in the commandline help itself (a cheatsheet
-or something will be provided as soon as possible). For example, you'll find the
-allowed syntaxes for RESTRICT as follows:
-
- % alf help restrict
-
- ...
- API & EXAMPLE
-
- # Restrict to suppliers with status greater than 20
- (restrict :suppliers, lambda{ status > 20 })
-
- # Restrict to suppliers that live in London
- (restrict :suppliers, lambda{ city == 'London' })
- ...
-
-### Coping with non-relational data sources (nil, duplicates, etc.)
-
-Alf aims at being a tool that helps you tackle practical problems, and
-denormalized and/or noisy data is one of them. Missing values occur. Duplicates
-abound in SQL databases lacking primary keys, and so on. Using Alf's relational
-operators on such inputs is not a good idea, because it is a strong precondition
-violation. This is not because relational theory is weak, but because extending
-it to handle null/nil and duplicates correctly has been proven at best a nightmare,
-and at worst a mess. As a practical exercise, try to extend classical algebra
-with versions of +, -, * and / that handle nil in such a way that the resulting
-theory is sound and still looks intuitive! Then do it on boolean algebra with
-_and_, _or_ and _not_. Then, add null/nil to classical set theory. Classical
-algebra, boolean algebra, and set theory are important building blocks behind
-relational algebra because almost all of its operators are defined on top of
-them...
-
-So what? The approach chosen in Alf to handle this conflict is very pragmatic.
-First of all, Alf implements a best effort strategy -- where possible -- to
-remain friendly in presence of null/nil on attributes that have no influence on
-an operator's job. For example, the query below will certainly fail if _status_
-is null/nil, but it probably won't fail if any other attribute is nil.
-
- % alf restrict suppliers -- "status > 10"
-
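Why a nil _status_ breaks this restriction while a nil elsewhere does not can be shown with a plain-Ruby sketch. The `high_status` helper is a hypothetical stand-in for the restriction above, and the tuples are made up; it says nothing about how Alf evaluates the predicate internally.

```ruby
suppliers = [
  {:sid => 'S1', :name => 'Smith', :status => 20},
  {:sid => 'S2', :name => nil,     :status => 30}  # nil on an attribute the predicate ignores
]
unknown_status = [{:sid => 'S3', :name => 'Blake', :status => nil}]

# Hypothetical stand-in for: restrict ... "status > 10"
def high_status(tuples)
  tuples.select { |t| t[:status] > 10 }
end

# nil on :name is harmless here, since the predicate never looks at it.
kept = high_status(suppliers)

# nil on :status makes the comparison itself fail: nil has no '>' method.
failure = begin
  high_status(unknown_status)
  nil
rescue NoMethodError => e
  e
end

p kept.map { |t| t[:sid] }
p failure.class
```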
-This best-effort strategy is not enough, and strictly speaking, must be considered
-unsound (for example, it strongly hurts optimization possibilities). Therefore,
-I strongly encourage you to go a step further: **if relational operators want
-true relations as input, please, give them!**. For this, Alf also provides a few
-non-relational operators in addition to relational ones. Those operators must be
-interpreted as "pre-relational" operators, in the sense that they help obtaining
-valid relation representations from invalid ones. Provided that you use them
-correctly, their output can safely be used as input of a relational operator.
-You'll find,
-
-* <code>alf autonum</code> -- ensure no duplicates by generating a unique attribute
-* <code>alf compact</code> -- brute-force duplicates removal
-* <code>alf defaults</code> -- replace nulls/nil by valid values, on an attribute basis
-
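What these three pre-relational operators are meant to achieve can be sketched in plain Ruby on Arrays of Hashes. The helpers below are hypothetical and only mirror the one-line descriptions above; their names, signatures and the `:autonum` attribute name are assumptions, not Alf's actual code or options.

```ruby
# Hypothetical helper: add a unique number to each tuple, so no two are equal.
def autonum(tuples, as = :autonum)
  tuples.each_with_index.map { |t, i| t.merge(as => i) }
end

# Hypothetical helper: brute-force duplicate removal.
def compact(tuples)
  tuples.uniq
end

# Hypothetical helper: replace nil (or missing) attributes by the given defaults.
def defaults(tuples, values)
  tuples.map do |t|
    values.merge(t) { |_key, default, actual| actual.nil? ? default : actual }
  end
end

raw = [
  {:sid => 'S1', :status => 20},
  {:sid => 'S1', :status => 20},  # duplicate
  {:sid => 'S2', :status => nil}  # unknown status
]

cleaned  = defaults(compact(raw), :status => 0)
numbered = autonum(raw)

p cleaned
p numbered
```

Once the input has gone through such a step, it contains neither duplicates nor nils and satisfies the preconditions that the relational operators assume.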
-Play the game, it's easy!
-
-- _Give id, name and status of suppliers whose status is greater than 10_
-- Hey man, we don't know the status for all suppliers! What about these cases?
-- _Ignore them_
-- No problem dude!
-
- % alf defaults --strict suppliers -- sid '' name '' status 0 | alf restrict -- "status > 10"
-
-### Alf is duck-typed
-
-The relational theory is often considered under a statically-typed point
-of view. When considering tuples and relations, for example, the notion of
-_heading_, a set of (name,type) pairs, is central. For example, a heading for
-a supplier tuple/relation could be:
-
- {:sid => String, :name => Name, :status => Integer, :city => String}
-
-Most relational operators have preconditions in terms of the headings of their
-operands. For example, _minus_ and _union_ require their operands to have same
-heading, while _rename_ requires renamed attributes to exist in operand's
-heading, and so on. Given an expression in relational algebra, it is always
-possible to compute the heading of the resulting relation, by statically
-analyzing the whole query expression in the light of a catalog of typed
-operators. This way, a tool can check that a query is statically valid, i.e.
-that it respects operator preconditions. While this approach has the major
-advantage of allowing strong optimizations, it also has a few drawbacks (as
-the need to know the heading of used datasources in advance) and is difficult to
-marry with dynamically-typed languages like Ruby. Therefore, Alf takes another
-approach, which is similar to duck-typing. In essence, this approach can be
-summarized as follows:
-
-- _You have the responsibility of not violating operators' preconditions. If you
- respect them, Alf has the responsibility of returning correct results._
-- No problem dude!
-
-## More about the shell command line
-
- % alf --help
-
-The help command will display the list of available operators. Each of them is
-completely described with 'alf help OPERATOR'. They all have a similar invocation
-syntax in shell:
-
- % alf operator operands... -- args...
-
-For example, try the following:
-
- # display suppliers that live in Paris
- % alf restrict suppliers -- "city == 'Paris'"
-
- # join suppliers and cities (no args here)
- % alf join suppliers cities
-
-### Recognized data streams/files (.rash files)
-
-For educational purposes, 'suppliers' and 'cities' inputs are magically resolved
-as denoting the files examples/operators/suppliers.rash and
-examples/operators/cities.rash, respectively. You'll find other data files:
-parts.rash, supplies.rash that are resolved magically as well and with which you
-can play. For non-educational purposes, operands may always be explicit files,
-or you can force the folder in which datasource files have to be found:
-
- # The following invocations are equivalent
- % alf restrict /tmp/foo.rash -- "..."
- % alf --env=/tmp restrict foo -- "..."
-
-A .rash file is simply a file in which each line is a ruby Hash, intended to
-represent a tuple. Under theory-driven preconditions, a .rash file can be seen
-as a valid (straightforward but useful) physical representation of a relation!
-When used in shell, alf dumps query results in the .rash format by default,
-which makes it possible to pipe invocations! Indeed, unary operators read their
-operand from standard input if it is not specified as a command argument. For example, the
-invocation below is equivalent to the one given above.
-
- # display suppliers that live in Paris
- % cat examples/operators/suppliers.rash | alf restrict -- "city == 'Paris'"
-
-Similarly, when only one operand is present in invocations of binary operators,
-they read their left operand from standard input. Therefore, the join given in
-previous section can also be written as follows:
-
- % cat examples/operators/suppliers.rash | alf join cities
-
-The relational algebra is _closed_ under its operators, which means that these
-operators take relations as operands and return a relation. Therefore operator
-invocations can be nested, that is, operands can be other relational expressions.
-When you use alf in a shell, it simply means that you can pipe operators as you
-want:
-
- % alf show --rash suppliers | alf join cities | alf restrict -- "status > 10"
-
-### Obtaining a friendly output
-
-The show command (which is **not** a relational operator) can be used to obtain
-a more friendly output:
-
- # it renders a text table by default
- % alf show [--text] suppliers
-
+------+-------+---------+--------+
| :sid | :name | :status | :city |
+------+-------+---------+--------+
| S1 | Smith | 20 | London |
| S2 | Jones | 10 | Paris |
| S3 | Blake | 30 | Paris |
| S4 | Clark | 20 | London |
| S5 | Adams | 30 | Athens |
+------+-------+---------+--------+
- # and reads from standard input without argument!
- % alf restrict suppliers -- "city == 'Paris'" | alf show
+ % alf --examples group suppliers -- size name status -- in_that_city
- +------+-------+---------+-------+
- | :sid | :name | :status | :city |
- +------+-------+---------+-------+
- | S2 | Jones | 10 | Paris |
- | S3 | Blake | 30 | Paris |
- +------+-------+---------+-------+
+ +--------+----------------------------+
+ | :city | :in_that_city |
+ +--------+----------------------------+
+ | London | +------+-------+---------+ |
+ | | | :sid | :name | :status | |
+ | | +------+-------+---------+ |
+ | | | S1 | Smith | 20 | |
+ | | | S4 | Clark | 20 | |
+ | | +------+-------+---------+ |
+ | Paris | +------+-------+---------+ |
+ | | | :sid | :name | :status | |
+ | | +------+-------+---------+ |
+ | | | S2 | Jones | 10 | |
+ | | | S3 | Blake | 30 | |
+ | | +------+-------+---------+ |
+ | Athens | +------+-------+---------+ |
+ | | | :sid | :name | :status | |
+ | | +------+-------+---------+ |
+ | | | S5 | Adams | 30 | |
+ | | +------+-------+---------+ |
+ +--------+----------------------------+
-Other formats can be obtained (see 'alf help show'). For example, you can generate
-a .yaml file, as follows:
+## Ruby Example
- % alf restrict suppliers -- "city == 'Paris'" | alf show --yaml
+ # Let's get the same database in ruby
+ db = Alf.examples
-### Executing .alf files
+ # Group suppliers by city
+ grouped = db.query{
+ group(:suppliers, [:sid, :name, :status], :in_that_city)
+ }
+ # => same result as in shell
-You'll also find .alf files in the examples folder, that contain more complex
-examples in the Ruby functional syntax (see section below).
+ # Let's make some computations on the sub-relations
+ db.query{
+ extend(grouped, how_many: ->{ in_that_city.count },
+ avg_status: ->{ in_that_city.avg{ status } })
+ }
+ # +--------+----------------------------+-----------+-------------+
+ # | :city | :in_that_city | :how_many | :avg_status |
+ # +--------+----------------------------+-----------+-------------+
+ # | London | +------+-------+---------+ | 2 | 20.000 |
+ # | | | :sid | :name | :status | | | |
+ # | | +------+-------+---------+ | | |
+ # | | | S1 | Smith | 20 | | | |
+ # | | | S4 | Clark | 20 | | | |
+ # | | +------+-------+---------+ | | |
+ # | Paris | +------+-------+---------+ | 2 | 20.000 |
+ # | | | :sid | :name | :status | | | |
+ # | | +------+-------+---------+ | | |
+ # | | | S2 | Jones | 10 | | | |
+ # | | | S3 | Blake | 30 | | | |
+ # | | +------+-------+---------+ | | |
+ # | Athens | +------+-------+---------+ | 1 | 30.000 |
+ # | | | :sid | :name | :status | | | |
+ # | | +------+-------+---------+ | | |
+ # | | | S5 | Adams | 30 | | | |
+ # | | +------+-------+---------+ | | |
+ # +--------+----------------------------+-----------+-------------+
- % cat examples/operators/group.alf
- #!/usr/bin/env alf
- (group :supplies, [:pid, :qty], :supplying)
+ # Now observe that the same result can also be expressed as follows (and can be
+ # optimized more easily)
+ summarized = db.query{
+ summary = summarize(:suppliers, [ :city ], how_many: count, avg_status: avg{ status })
+ join(grouped, summary)
+ }
-You can simply execute these files with alf directly as follows:
+ # Oh, and of course...
+ require 'json'
+ puts summarized.to_json
+ # [{"city":"London","in_that_city":[{"sid":"S1","name":"Smith","status":20},{"sid":"S4"...
- # the following works, as well as the shortcut 'alf show group'
- % alf examples/operators/group.alf | alf show
-
- +------+-----------------+
- | :sid | :supplying |
- +------+-----------------+
- | S1 | +------+------+ |
- | | | :pid | :qty | |
- | | +------+------+ |
- | | | P1 | 300 | |
- | | | P2 | 200 | |
- ...
-
-Mimicking the ruby executable, the following invocation is also possible:
+## Install, bundler, require
- % alf -e "(restrict :suppliers, lambda{ city == 'Paris' })"
+ % [sudo] gem install alf [fastercsv, ...]
+ % alf --help
-where the argument is a relational expression in Alf's Lispy dialect, which
-is detailed in the next section.
+ # API is not considered stable enough for now, please use
+ gem "alf", "= 0.13.0"
-## More about Alf in Ruby
+ # The following should not break your code, but is a bit less safe,
+ # until 1.0.0 has been reached
+ gem "alf", "~> 0.13.0"
-### Calling commands 'ala' shell
-
-For simple cases, the easiest way of using Alf in ruby is probably to mimic
-what you have in shell:
-
- % alf restrict suppliers -- "city == 'Paris'"
-
-Then, in ruby
-
- #
- # 1. create an engine on an environment (see section about environments later)
- # 2. run a command
- # 3. op is a thread-safe enumerable of tuples (see the Lispy section below)
- #
- lispy = Alf.lispy(Alf::Environment.examples)
- op = lispy.run(['restrict', 'suppliers', '--', "city == 'Paris'"])
-
-If this kind of API is not sufficiently expressive for you, you'll have to learn
-the APIs deeper, and use the Lispy functional style that Alf provides, which can
-be compiled and used as explained in the next section.
-
-### Compiler vs. Relation data structure
-
-The compilers allow you to manipulate algebra expressions. Just obtain a Lispy
-instance on an environment and you're ready:
-
- #
- # Expressions can simply be compiled as illustrated below. We use the
- # examples environment here, see the dedicated section later about other
- # available environments.
- #
- lispy = Alf.lispy(Alf::Environment.examples)
- london_suppliers = lispy.compile do
- (restrict :suppliers, lambda{ city == 'London' })
- end
-
- #
- # Returned operator is an enumerable of ruby hashes. Provided that datasets
- # offered by the environment (:suppliers here) can be enumerated more than
- # once, the operator may be used multiple times and is even thread safe!
- #
- london_suppliers.each do |tuple|
- # tuple is a ruby Hash
- end
-
- #
- # Now, maybe you want to reuse op in a larger query, for example
- # by projecting on the city attribute... Here is how this can be
- # done:
- #
- projection = (project london_suppliers, [:city])
-
-Note that the examples above manipulate algebra operators, not relations per se.
-This means that equality and other such operators, that operate on relation
-_values_, do not operate correctly here:
-
- projection == Alf::Relation[{:city => 'London'}]
- # => nil
-
-In contrast, you can use such operators when operating on true relation values:
-
- projection.to_rel == Alf::Relation[{:city => 'London'}]
- # => true
-
-### Using/Implementing other Environments
-
-An Environment instance is passed as first argument of <code>Alf.lispy</code>
-and is responsible for resolving named datasets. A base class Environment::Folder
-is provided with the Alf distribution, with a factory method on the Environment
-class itself.
-
- env = Alf::Environment.folder("path/to/a/folder")
-
-An environment built that way will look for .rash and .alf files in the specified
-folder and sub-folders. I'll of course strongly consider any contribution
-implementing the Environment contract on top of SQL or NoSQL databases or anything
-that can be useful to manipulate with relational algebra. Such contributions can
-be added to the project directly. A base template would look like:
-
- class Foo < Alf::Environment
-
- #
- # You should at least implement the _dataset_ method that resolves a
- # name (a Symbol instance) to an Enumerable of tuples (typically a
- # Reader). See Alf::Environment for exact contract details.
- #
- def dataset(name)
- end
-
- end
-
-Read more about Environment's API so as to let your environment be recognized
-in shell (--env=...) on rubydoc.info
-
-### Adding file decoders, aka Readers
-
-Environments should not be confused with Readers (see Reader class and its
-subclasses). While the former resolve named datasets, the latter decode files
-and/or other resources as tuple enumerables. Environments typically serve Reader
-instances in response to dataset resolving.
-
-Reader implementations decoding .rash and .alf files are provided in the main
-alf.rb file. It's relatively easy to implement the Reader contract by extending
-the Reader class and implementing an each method. Once again, contributions are
-very welcome in lib/alf/reader (.csv files, .log files, and so on). A basic
-template for this is as follows:
-
- class Bar < Alf::Reader
-
- #
- # You should at least implement each, see Alf::Reader which provides a
- # base implementation and a few tools
- #
- def each
- # [...]
- end
-
- # By registering it, the Folder environment will automatically
- # recognize and decode .bar files correctly!
- Alf::Reader.register(:bar, [".bar"], self)
-
- end
-
-### Adding outputters, aka Renderers
-
-Similarly, you can contribute renderers to output relations in html, or whatever
-format you would consider interesting. See the Renderer class, and consider the
-following template for contributions in lib/alf/renderer
-
- class Glim < Alf::Renderer
-
- #
- # You should at least implement the execute method that renders tuples
- # given in _input_ (an Enumerable of tuples) on the output buffer
- # and returns the latter. See Alf::Renderer for the exact contract
- # details.
- #
- def execute(output = $stdout)
- # [...]
- output
- end
-
-
- # By registering it, the output options of 'alf show' will
- # automatically provide your --glim contribution
- Alf::Renderer.register(:glim, "as a .glim file", self)
-
- end
-
## Related Work & Tools
-- You should certainly have a look at the Third Manifesto website: {http://www.thethirdmanifesto.com/}
-- Why not reading the {http://www.dcs.warwick.ac.uk/~hugh/TTM/DBE-Chapter01.pdf
- third manifesto paper} itself?
-- Also have a look at {http://www.dcs.warwick.ac.uk/~hugh/TTM/Projects.html other
- implementation projects}, especially {http://dbappbuilder.sourceforge.net/Rel.php Rel}
- which provides an implementation of the **Tutorial D** language.
-- {https://github.com/dkubb/veritas Dan Kubb's Veritas} project is worth considering
+- You should certainly have a look at the
+ [Third Manifesto website](http://www.thethirdmanifesto.com/).
+- Why not read the
+ [Third Manifesto paper](http://www.dcs.warwick.ac.uk/~hugh/TTM/DBE-Chapter01.pdf) itself?
+- Also have a look at
+ [other implementation projects](http://www.dcs.warwick.ac.uk/~hugh/TTM/Projects.html),
+ especially [Rel](http://dbappbuilder.sourceforge.net/Rel.php), which provides an
+ implementation of the **Tutorial D** language.
+- [Dan Kubb's Veritas](https://github.com/dkubb/veritas) project is worth considering
also in the Ruby community. While very similar to Alf in providing a pure ruby
algebra implementation, Veritas mostly provides a framework for manipulating
- and statically analyzing algebra expressions so as to be able to
- {https://github.com/dkubb/veritas-optimizer optimize them} and
- {https://github.com/dkubb/veritas-sql-generator compile them to SQL}. We are
- working together with Dan Kubb to see how Alf and Veritas could be closer from
- each other in the future, if not in their codebase, at least in using the very
- same terminology for the same concepts.
-
+ and statically analyzing algebra expressions so as to be able to
+ [optimize them](https://github.com/dkubb/veritas-optimizer) and
+ [compile them to SQL](https://github.com/dkubb/veritas-sql-generator).
+
## Contributing
-### Alf is open source
+You know the rules:
-You know the rules:
-
* The code is on github https://github.com/blambeau/alf
* Please report any problem or bug in the issue tracker on github
* Don't hesitate to fork and send me a pull request for any contribution/idea!
Alf is distributed under an MIT licence. Please let me know if it does not fit
your needs and I'll see what I can do!
-### Internals -- Tribute to Sinatra
+## Roadmap
-Alf's code style is very inspired from what I've found in Sinatra when looking
-at its internals a few months ago. Alf, as Sinatra, is mostly implemented in a
-single file, lib/alf.rb. Everything is there except specific third-party contributions
-(in lib/alf/...). You'll need an editor or IDE that supports code folding/unfolding.
-Then, follow the guide:
-
-1. Fold everything but the Alf module.
-2. Main concepts, first level of abstraction, should fit on the screen
-3. Unfold the concept you're interested in, and return to the previous bullet
-
-### Roadmap
-
Below is what I've imagined about Alf's future. However, this is to be interpreted
as my own wish list, while I would love hearing yours instead.
-- Towards 1.0.0, I would like to stabilize and document Alf public APIs as well
+- Towards 1.0.0, I would like to stabilize and document Alf public APIs as well
as internals (a few concepts are still unstable there). Alf also has a certain
number of limitations that are worth overcoming for version 1.0.0. The latter
- include the semantically wrong way of applying joins on sub-relations, the
+ include the semantically wrong way of applying joins on sub-relations, the
impossibility to use Lispy expressions on sub-relations in extend, and the error
management which is unspecific and unfriendly so far.
-- I would also like to start collecting Reader, Renderer and Environment
- contributions for common data sources (SQL, NoSQL, CSV, LOGS) and output
- formats (HTML, XML, JSON). Contributions could either be developed as separate
- gem projects or distributed with Alf's gem and source code; I still need to
+- I would also like to start collecting Reader, Renderer and Connection
+ contributions for common data sources (SQL, NoSQL, CSV, LOGS) and output
+ formats (HTML, XML, JSON). Contributions could either be developed as separate
+ gem projects or distributed with Alf's gem and source code; I still need to
decide the exact policy (suggestions are more than welcome here)
- Alf will remain a practical tool before everything else. In the middle term,
I would like to complete the set of available operators (relational and non-
- relational ones). Some of them will be operators described in D & D books
- while others will be new suggestions of mine.
-- In the long term Alf should be able to avoid loading tuples in memory (under
- a certain number of conditions on datasources) for almost all queries.
-- Without targeting a fast tool at all, I would also like Alf to provide a basic
- optimizer that would be able to push equality restrictions down and materialize
- sub-expressions used more than once in with expressions.
+ relational ones). Some of them will be operators described in D & D books
+ while others will be new suggestions of mine.
+- In the long term Alf should be able to avoid loading tuples in memory (under
+ a certain number of conditions on datasources) for almost all queries.
+- Without targeting a fast tool at all, I would also like Alf to provide a basic
+ optimizer that would be able to push equality restrictions down and materialize
+ sub-expressions used more than once in with expressions.
-### Versioning policy
+## Versioning policy
-Alf respects {http://semver.org/ semantic versioning}, which means that it has
+Alf respects [semantic versioning](http://semver.org/), which means that it has
a X.Y.Z version number and follows a few rules.
-- The public API is made of the commandline tool, the Lispy dialect and the
- Relation datastructure. This API will become stable with version 1.0.0 in a
+- The public API is made of the commandline tool, the Lispy dialect and the
+ Relation datastructure. This API will become stable with version 1.0.0 in a
near future.
-- Currently, version 1.0.0 **has not been reached**. It means that **anything
- may change at any time**. Best effort will be done to upgrade Y when backward
+- Currently, version 1.0.0 **has not been reached**. This means that **anything
+ may change at any time**. A best effort will be made to increase Y when backward
incompatible changes occur.
- Once 1.0.0 is reached, the following rules will apply:
- - Backward compatible bug fixes will increase Z.
- - New features and enhancements that do not break backward compatibility of
+ - Backward compatible bug fixes will increase Z.
+ - New features and enhancements that do not break backward compatibility of
the public API will increase the Y number.
- - Non backward compatible changes of the public API will increase the X
+ - Non backward compatible changes of the public API will increase the X
number.
-All classes and modules but Alf module, the Lispy DSL and Alf::Relation are part
+All classes and modules except the Alf module, the Lispy DSL and Alf::Relation are part
of the private API and may change at any time. A best-effort strategy is followed
to avoid breaking internals on tiny (Z) version increases, especially extension
-points like Reader and Renderer.
-
-## Enjoy Alf!
-
-- No problem dude!
+points like Reader and Renderer.
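The bump rules above can be summarized as a small function. This is a sketch only: `next_version` is a hypothetical helper, not part of Alf, and the way it treats fixes and features before 1.0.0 is an assumption, since the policy only promises a best-effort Y increase for incompatible changes at that stage.

```ruby
require "rubygems"

# Hypothetical helper illustrating the versioning rules; not part of Alf.
# `change` is one of :fix, :feature or :breaking.
def next_version(current, change)
  x, y, z = Gem::Version.new(current).segments
  segments =
    case change
    when :fix     then [x, y, z + 1]   # backward compatible bug fix: Z
    when :feature then [x, y + 1, 0]   # compatible enhancement: Y
    when :breaking
      # Before 1.0.0 an incompatible change increases Y (best effort);
      # from 1.0.0 onwards it increases X.
      x.zero? ? [x, y + 1, 0] : [x + 1, 0, 0]
    else
      raise ArgumentError, "unknown change type: #{change.inspect}"
    end
  segments.join(".")
end

puts next_version("0.13.0", :breaking)
puts next_version("1.2.3", :breaking)
```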