README in citrus-1.2.1 vs README in citrus-1.2.2
- old
+ new
@@ -5,62 +5,322 @@
Parsing Expressions for Ruby
Citrus is a compact and powerful parsing library for Ruby that combines the
elegance and expressiveness of the language with the simplicity and power of
-parsing expression grammars.
+parsing expressions.
-Citrus grammars look very much like Treetop grammars but take a completely
-different approach. Instead of generating parsers from your grammars, Citrus
-evaluates grammars and rules in memory as Ruby modules. In fact, you can even
-define your grammars as Ruby modules in the first place, entirely skipping the
-parsing/evaluation step.
-Terminals are represented as either strings or regular expressions. Support for
-sequences, choices, labels, repetition, and lookahead (both positive and
-negative) are all included, as well as character classes and the dot-matches-
-anything symbol.
+ ** Installation **
-To try it out, fire up an IRB session from the root of the project and run one
-of the examples.
- $ irb -Ilib
- > require 'citrus'
- => true
- > Citrus.load 'examples/calc'
- => [Calc]
- > match = Calc.parse '1 + 5'
- => #<Citrus::Match ...
- > match.value
- => 6
+Via RubyGems:
-Be sure to try requiring `citrus/debug' (instead of just `citrus') if you'd like
-some better visualization of the match results.
+ $ sudo gem install citrus
-The code base is very small and it's well-documented and tested, so it should be
-fairly easy to understand for anyone who is familiar with parsing expressions.
+From a local copy:
+ $ git clone git://github.com/mjijackson/citrus.git
+ $ cd citrus
+ $ rake package && sudo rake install
- ** Links **
+ ** Background **
-http://pdos.csail.mit.edu/~baford/packrat/
-http://en.wikipedia.org/wiki/Parsing_expression_grammar
-http://treetop.rubyforge.org/index.html
+In order to be able to use Citrus effectively, you must first understand the
+difference between syntax and semantics. Syntax is a set of rules that govern
+the way letters and punctuation may be used in a language. For example, English
+syntax dictates that proper nouns should start with a capital letter and that
+sentences should end with a period.
- ** Installation **
+Semantics are the rules by which meaning may be derived in a language. For
+example, as you read a book you are able to make some sense of the particular
+way in which words on a page are combined to form thoughts and express ideas
+because you understand what the words themselves mean and you can understand
+what they mean collectively.
+Computers use a similar process when interpreting code. First, the code must be
+parsed into recognizable symbols or tokens. These tokens may then be passed to
+an interpreter which is responsible for forming actual instructions from them.
-Via RubyGems:
+Citrus is a pure Ruby library that allows you to perform both lexical analysis
+and semantic interpretation quickly and easily. Using Citrus you can write
+powerful parsers that are simple to understand and easy to create and maintain.
- $ sudo gem install citrus
+In Citrus, there are three main types of objects: rules, grammars, and matches.
-From a local copy:
+== Rules
- $ git clone git://github.com/mjijackson/citrus.git
- $ cd citrus
- $ rake package && sudo rake install
+A rule is an object that specifies some matching behavior on a string. There are
+two types of rules: terminals and non-terminals. Terminals can be either Ruby
+strings or regular expressions that specify some input to match. For example, a
+terminal created from the string "end" would match any sequence of the
+characters "e", "n", and "d", in that order. A terminal created from a regular
+expression uses Ruby's regular expression engine to attempt to create a match.
+
+Non-terminals are rules that may contain other rules but do not themselves match
+directly on the input. For example, a Repeat is a non-terminal that may contain
+one other rule that will try and match a certain number of times. Several other
+types of non-terminals are available that will be discussed later.
+
+Rule objects may also have semantic information associated with them in the form
+of Ruby modules. These modules contain methods that will be used to extend any
+match objects created by the rule with which they are associated.
+
+== Grammars
+
+A grammar is a container for rules. Usually the rules in a grammar collectively
+form a complete specification for some language, or a well-defined subset
+thereof.
+
+A Citrus grammar is really just a souped-up Ruby module. These modules may be
+included in other grammar modules in the same way that Ruby modules are normally
+used. This property allows you to divide a complex grammar into reusable pieces
+that may be combined dynamically at runtime. Any grammar rule with the same name
+as a rule in an included grammar may access that rule with a mechanism similar
+to Ruby's super keyword.
+
+== Matches
+
+Matches are created by rule objects when they match on the input. A match
+contains the string of text that made up the match as well as its offset in the
+original input string. During a parse, matches are arranged in a tree structure
+where any match may contain any number of other matches. This structure is
+determined by the way in which the rule that generated each match is used in the
+grammar.
+
+For example, a match that is created from a non-terminal rule that contains
+several other terminals will likewise contain several matches, one for each
+terminal.
+
+Match objects may be extended with semantic information in the form of methods.
+These methods can interpret the text of a match using the wealth of information
+available to them including the text of the match, its position in the input,
+and any submatches.
+
+
+ ** Syntax **
+
+
+The most straightforward way to compose a Citrus grammar is to use Citrus' own
+custom grammar syntax. This syntax borrows heavily from Ruby, so it should
+already be familiar to Ruby programmers.
+
+== Terminals
+
+Terminals may be represented by a string or a regular expression. Both follow
+the same rules as Ruby string and regular expression literals.
+
+ 'abc'
+ "abc\n"
+ /\xFF/
+
+Character classes and the dot (match anything) symbol are supported as well for
+compatibility with other parsing expression implementations.
+
+ [a-z0-9] # match any lowercase letter or digit
+ [\x00-\xFF] # match any octet
+ . # match anything, even new lines
+
+== Repetition
+
+Quantifiers may be used after any expression to specify a number of times it
+must match. The universal form of a quantifier is N*M where N is the minimum and
+M is the maximum number of times the expression may match.
+
+ 'abc'1*2 # match "abc" a minimum of one, maximum
+ # of two times
+ 'abc'1* # match "abc" at least once
+ 'abc'*2 # match "abc" a maximum of twice
+
+The + and ? operators are supported as well for the common cases of 1* and *1
+respectively.
+
+ 'abc'+ # match "abc" at least once
+ 'abc'? # match "abc" a maximum of once
+
+== Lookahead
+
+Both positive and negative lookahead are supported in Citrus. Use the & and !
+operators to indicate that an expression either should or should not match. In
+neither case is any input consumed.
+
+ &'a' 'b' # match a "b" preceded by an "a"
+ !'a' 'b' # match a "b" that is not preceded by an "a"
+ !'a' . # match any character except for "a"
+
+== Sequences
+
+Sequences of expressions may be separated by a space to indicate that the rules
+should match in that order.
+
+ 'a' 'b' 'c' # match "a", then "b", then "c"
+ 'a' [0-9] # match "a", then a numeric digit
+
+== Choices
+
+Ordered choice is indicated by a vertical bar that separates two expressions.
+Note that any operator binds more tightly than the bar.
+
+ 'a' | 'b' # match "a" or "b"
+ 'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
+
+== Super
+
+When including a grammar inside another, all rules in the child that have the
+same name as a rule in the parent also have access to the super keyword to
+invoke the parent rule.
+
+== Labels
+
+Match objects may be referred to by a different name than the rule that
+originally generated them. Labels are created by placing the label and a colon
+immediately preceding any expression.
+
+ chars:/[a-z]+/ # the characters matched by the regular
+ # expression may be referred to as "chars"
+ # in a block method
+
+
+ ** Example **
+
+
+Below is an example of a simple grammar that is able to parse strings of
+integers separated by any amount of white space and a + symbol.
+
+ grammar Addition
+ rule additive
+ number plus (additive | number)
+ end
+
+ rule number
+ [0-9]+ space
+ end
+
+ rule plus
+ '+' space
+ end
+
+ rule space
+ [ \t]*
+ end
+ end
+
+Several things to note about the above example:
+
+* Grammar and rule declarations end with the "end" keyword
+* A Sequence of rules is created by separating expressions with a space
+* Likewise, ordered choice is represented with a vertical bar
+* Parentheses may be used to override the natural binding order
+* Rules may refer to other rules in their own definitions simply by using the
+ other rule's name
+* Any expression may be followed by a quantifier
+
+== Interpretation
+
+The grammar above is able to parse simple mathematical expressions such as "1+2"
+and "1 + 2+3", but it does not have enough semantic information to be able to
+actually interpret these expressions.
+
+At this point, when the grammar parses a string it generates a tree of Match
+objects. Each match is created by a rule. A match will know what text it
+contains, its offset in the original input, and what submatches it contains.
+
+Submatches are created whenever a rule contains another rule. For example, in
+the grammar above the number rule matches a string of digits followed by white
+space. Thus, a match generated by the number rule will contain two submatches.
+
+We can use Ruby's block syntax to create a module that will be attached to these
+matches when they are created and is used to lazily extend them when we want to
+interpret them. The following example shows one way to do this.
+
+ grammar Addition
+ rule additive
+ (number plus term) {
+ def value
+ number.value + term.value
+ end
+ }
+ end
+
+ rule term
+ (additive | number) {
+ def value
+ first.value
+ end
+ }
+ end
+
+ rule number
+ ([0-9]+ space) {
+ def value
+ text.strip.to_i
+ end
+ }
+ end
+
+ rule plus
+ '+' space
+ end
+
+ rule space
+ [ \t]*
+ end
+ end
+
+In this version of the grammar the additive rule has been refactored to use the
+term rule. This makes it a little cleaner to define our semantic blocks. It's
+easiest to explain what is going on here by starting with the lowest level
+block, which is defined within the number rule.
+
+The semantic block associated with the number rule defines one method, value.
+This method will be present on all matches that result from this rule. Inside
+this method, we can see that the value of a number match is determined to be
+its text value, stripped of white space and converted to an integer.
+
+Similarly, the block that is applied to term matches also defines a value
+method. However, this method works a bit differently. Since a term matches an
+additive or a number a term match will contain one submatch, the match that
+resulted from either additive or number. The first method retrieves the first
+submatch. So, the value of a term is determined to be the value of its first
+submatch.
+
+Finally, the additive rule also extends its matches with a value method. Here,
+the value of an additive is determined to be the values of its number and term
+matches added together using Ruby's addition operator.
+
+Since additive is the first rule defined in the grammar, any match that results
+from parsing a string with this grammar will have a value method that can be
+used to recursively calculate the collective value of the entire match tree.
+
+To give it a try, save the code for the Addition grammar in a file called
+addition.citrus. Next, assuming you have the Citrus gem installed, try the
+following sequence of commands in a terminal.
+
+ $ irb
+ > require 'citrus'
+ => true
+ > Citrus.load 'addition'
+ => [Addition]
+ > m = Addition.parse '1 + 2 + 3'
+ => #<Citrus::Match ...
+ > m.value
+ => 6
+
+Congratulations! You just ran your first piece of Citrus code.
+
+Take a look at examples/calc.citrus for an example of a calculator that is able
+to parse and evaluate more complex mathematical expressions.
+
+
+ ** Links **
+
+
+http://mjijackson.com/citrus
+http://pdos.csail.mit.edu/~baford/packrat/
+http://en.wikipedia.org/wiki/Parsing_expression_grammar
+http://treetop.rubyforge.org/index.html
** License **