README in citrus-2.2.2 vs README in citrus-2.3.0

- old
+ new

@@ -68,35 +68,34 @@
 Rule objects may also have semantic information associated with them in the form
 of Ruby modules. Rules use these modules to extend the matches they create.
 
 ## Grammars
 
-A grammar is a container for rules. Usually the rules in a grammar collectively
-form a complete specification for some language, or a well-defined subset
-thereof.
+A [Grammar](api/classes/Citrus/Grammar.html) is a container for rules. Usually
+the rules in a grammar collectively form a complete specification for some
+language, or a well-defined subset thereof.
 
 A Citrus grammar is really just a souped-up Ruby
 [module](http://ruby-doc.org/core/classes/Module.html). These modules may be
 included in other grammar modules in the same way that Ruby modules are normally
 used. This property allows you to divide a complex grammar into more manageable,
-reusable pieces that may be combined at runtime. Any grammar rule with the same
-name as a rule in an included grammar may access that rule with a mechanism
-similar to Ruby's super keyword.
+reusable pieces that may be combined at runtime. Any rule with the same name as
+a rule in an included grammar may access that rule with a mechanism similar to
+Ruby's `super` keyword.
 
 ## Matches
 
-Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html) is actually a
-[String](http://ruby-doc.org/core/classes/String.html) object with some extra
-information attached such as the name(s) of the rule(s) from which it was
-generated and any submatches it may contain.
+A [Match](api/classes/Citrus/Match.html) object represents a successful
+recognition of some piece of the input. Matches are created by rule objects during a parse.
 
-During a parse, matches are arranged in a tree structure where any match may
-contain any number of other matches. This structure is determined by the way in
-which the rule that generated each match is used in the grammar. For example, a
-match that is created from a non-terminal rule that contains several other
-terminals will likewise contain several matches, one for each terminal.
+Matches are arranged in a tree structure where any match may contain any number
+of other matches. Each match contains information about its own subtree. The
+structure of the tree is determined by the way in which the rule that generated
+each match is used in the grammar. For example, a match that is created from a
+nonterminal rule that contains several other terminals will likewise contain
+several matches, one for each terminal. However, this is an implementation
+detail and should be relatively transparent to the user.
 
 Match objects may be extended with semantic information in the form of methods.
 These methods should provide various interpretations for the semantic value of a
 match.
 
@@ -128,10 +127,13 @@
 Also, strings may use backticks instead of quotes to indicate that they should
 match in a case-insensitive manner.
 
     `abc`         # match "abc" in any case
 
+Besides case sensitivity, case-insensitive strings have the same behavior as
+double quoted strings.
+
 See [Terminal](api/classes/Citrus/Terminal.html) and
 [StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.
 
 ## Repetition
 
@@ -170,10 +172,13 @@
 that does not match a given expression.
 
     ~'a'          # match all characters until an "a"
     ~/xyz/        # match all characters until /xyz/ matches
 
+When using this operator (the tilde), at least one character must be consumed
+for the rule to succeed.
+
 See [AndPredicate](api/classes/Citrus/AndPredicate.html),
 [NotPredicate](api/classes/Citrus/NotPredicate.html), and
 [ButPredicate](api/classes/Citrus/ButPredicate.html) for more information.
 
 ## Sequences
@@ -199,29 +204,29 @@
 tightly than the vertical bar. A full chart of operators and their respective
 levels of precedence is below.
 
 See [Choice](api/classes/Citrus/Choice.html) for more information.
 
+## Grouping
+
+As is common in many programming languages, parentheses may be used to override
+the normal binding order of operators. In the following example parentheses are
+used to make the vertical bar between `'b'` and `'c'` bind tighter than the
+space between `'a'` and `'b'`.
+
+    'a' ('b' | 'c')   # match "a", then "b" or "c"
+
 ## Labels
 
 Match objects may be referred to by a different name than the rule that
-originally generated them. Labels are created by placing the label and a colon
+originally generated them. Labels are added by placing the label and a colon
 immediately preceding any expression.
 
     chars:/[a-z]+/    # the characters matched by the regular expression
                       # may be referred to as "chars" in an extension
                       # method
 
-See [Label](api/classes/Citrus/Label.html) for more information.
-
-## Grouping
-
-As is common in many programming languages, parentheses may be used to override
-the normal binding order of operators.
-
-    'a' ('b' | 'c')   # match "a", then "b" or "c"
-
 ## Extensions
 
 Extensions may be specified using either "module" or "block" syntax. When using
 module syntax, specify the name of a module that is used to extend match objects
 in between less than and greater than symbols.
@@ -229,53 +234,70 @@
     [a-z0-9]5*9 <CouponCode>  # match a string that consists of any lower
                               # cased letter or digit between 5 and 9
                               # times and extend the match with the
                               # CouponCode module
 
-Additionally, extensions may be specified inline using curly braces. Inside the
-curly braces you may embed method definitions that will be used to extend match
-objects.
+Additionally, extensions may be specified inline using curly braces. When using
+this method, the code inside the curly braces may be invoked by calling the
+`value` method on the match object.
 
-    # match any digit and return its integer value when calling the
-    # #value method on the match object
-    [0-9] {
-      def value
-        to_i
-      end
-    }
+    [0-9] { to_i }    # match any digit and return its integer value when
+                      # calling the #value method on the match object
 
+Note that when using the inline block method you may also specify arguments in
+between vertical bars immediately following the opening curly brace, just like
+in Ruby blocks.
+
 ## Super
 
 When including a grammar inside another, all rules in the child that have the
 same name as a rule in the parent also have access to the `super` keyword to
 invoke the parent rule.
 
+    grammar Number
+      def number
+        [0-9]+
+      end
+    end
+
+    grammar FloatingPoint
+      include Number
+
+      rule number
+        super ('.' super)?
+      end
+    end
+
+In the example above, the `FloatingPoint` grammar includes `Number`. Both have a
+rule named `number`, so `FloatingPoint#number` has access to `Number#number` by
+means of using `super`.
+
 See [Super](api/classes/Citrus/Super.html) for more information.
 
 ## Precedence
 
 The following table contains a list of all Citrus symbols and operators and
 their precedence. A higher precedence indicates tighter binding.
 
 Operator  | Name                      | Precedence
 --------- | ------------------------- | ----------
-''        | String (single quoted)    | 6
-""        | String (double quoted)    | 6
-``        | String (case insensitive) | 6
-[]        | Character class           | 6
-.         | Dot (any character)       | 6
-//        | Regular expression        | 6
-()        | Grouping                  | 6
-*         | Repetition (arbitrary)    | 5
-+         | Repetition (one or more)  | 5
-?         | Repetition (zero or one)  | 5
-&         | And predicate             | 4
-!         | Not predicate             | 4
-~         | But predicate             | 4
-:         | Label                     | 4
-<>        | Extension (module name)   | 3
-{}        | Extension (literal)       | 3
+''        | String (single quoted)    | 7
+""        | String (double quoted)    | 7
+``        | String (case insensitive) | 7
+[]        | Character class           | 7
+.         | Dot (any character)       | 7
+//        | Regular expression        | 7
+()        | Grouping                  | 7
+*         | Repetition (arbitrary)    | 6
++         | Repetition (one or more)  | 6
+?         | Repetition (zero or one)  | 6
+&         | And predicate             | 5
+!         | Not predicate             | 5
+~         | But predicate             | 5
+<>        | Extension (module name)   | 4
+{}        | Extension (literal)       | 4
+:         | Label                     | 3
 e1 e2     | Sequence                  | 2
 e1 | e2   | Ordered choice            | 1
 
 # Example
 
@@ -286,28 +308,28 @@
 
     grammar Addition
       rule additive
         number plus (additive | number)
       end
-
+
       rule number
         [0-9]+ space
       end
-
+
       rule plus
         '+' space
       end
-
+
       rule space
         [ \t]*
       end
     end
 
 Several things to note about the above example:
 
 * Grammar and rule declarations end with the `end` keyword
-* A Sequence of rules is created by separating expressions with a space
+* A sequence of rules is created by separating expressions with a space
 * Likewise, ordered choice is represented with a vertical bar
 * Parentheses may be used to override the natural binding order
 * Rules may refer to other rules in their own definitions simply by using the
   other rule's name
 * Any expression may be followed by a quantifier
@@ -324,62 +346,58 @@
 
 Submatches are created whenever a rule contains another rule. For example, in
 the grammar above `number` matches a string of digits followed by white space.
 Thus, a match generated by this rule will contain two submatches.
 
-We can define methods inside a set of curly braces that will be used to extend
-matches when they are created. This works in similar fashion to using Ruby's
+We can define a method inside a set of curly braces that will be used to extend
+a particular rule's matches. This works in similar fashion to using Ruby's
 blocks. Let's extend the `Addition` grammar using this technique.
 
     grammar Addition
       rule additive
         (number plus term:(additive | number)) {
-          def value
-            number.value + term.value
-          end
+          number.value + term.value
         }
       end
-
+
       rule number
         ([0-9]+ space) {
-          def value
-            strip.to_i
-          end
+          to_i
         }
       end
-
+
       rule plus
         '+' space
       end
-
+
       rule space
         [ \t]*
       end
     end
 
 In this version of the grammar we have added two semantic blocks, one each for
-the additive and number rules. These blocks contain methods that will be present
-on all match objects that result from matches of those particular rules. It's
+the `additive` and `number` rules. These blocks contain code that we can
+execute by calling `value` on match objects that result from those rules. It's
 easiest to explain what is going on here by starting with the lowest level
-block, which is defined within the number rule.
+block, which is defined within `number`.
 
-The semantic block associated with the number rule defines one method, `value`.
-Inside this method, we can see that the value of a number match is determined to
-be its text value, stripped of white space and converted to an integer.
-[Remember](background.html) that matches are simply strings, so the `strip`
-method in this case is actually
-[String#strip](http://ruby-doc.org/core/classes/String.html#M000820).
+Inside this block we see a call to another method, namely `to_i`. When called in
+the context of a match object, methods that are not defined may be called on a
+match's internal string object via `method_missing`. Thus, the call to `to_i`
+should return the integer value of the match.
 
-The `additive` rule also extends its matches with a `value` method. Notice the
-use of the `term` label within the rule definition. This label allows the match
-that is created by either the additive or the number rule to be retrieved using
-the `term` label. The value of an additive is determined to be the values of its
-`number` and `term` matches added together using Ruby's addition operator.
+Similarly, matches created by `additive` will also have a `value` method. Notice
+the use of the `term` label within the rule definition. This label allows the
+match that is created by the choice between `additive` and `number` to be
+retrieved using the `term` method. The value of an additive match is determined
+to be the values of its `number` and `term` matches added together using Ruby's
+addition operator.
 
-Since additive is the first rule defined in the grammar, any match that results
-from parsing a string with this grammar will have a `value` method that can be
-used to recursively calculate the collective value of the entire match tree.
+Since `additive` is the first rule defined in the grammar, any match that
+results from parsing a string with this grammar will have a `value` method that
+can be used to recursively calculate the collective value of the entire match
+tree.
 
 To give it a try, save the code for the `Addition` grammar in a file called
 addition.citrus. Next, assuming you have the Citrus
 [gem](https://rubygems.org/gems/citrus) installed, try the following sequence of
 commands in a terminal.
@@ -406,32 +424,76 @@
 Take a look at
 [examples/calc.citrus](http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus)
 for an example of a calculator that is able to parse and evaluate more complex
 mathematical expressions.
 
-## Implicit Value
+## Additional Methods
 
-It is very common for a grammar to only have one interpretation for a given
-symbol. For this reason, you may find yourself writing a `value` method for
-every rule in your grammar. Because this can be tedious, Citrus allows you to
-omit defining such a method if you choose. For example, the `additive` and
-`number` rules from the simple calculator example above could also be written
-as:
+If you need more than just a `value` method on your match object, you can attach
+additional methods as well. There are two ways to do this. The first lets you
+define additional methods inline in your semantic block. This block will be used
+to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682). Using the
+`Addition` example above, we might refactor the `additive` rule to look like
+this:
 
     rule additive
       (number plus term:(additive | number)) {
-        number.value + term.value
+        def lhs
+          number.value
+        end
+
+        def rhs
+          term.value
+        end
+
+        def value
+          lhs + rhs
+        end
       }
     end
 
-    rule number
-      ([0-9]+ space) {
-        strip.to_i
-      }
+Now, in addition to having a `value` method, matches that result from the
+`additive` rule will have a `lhs` and a `rhs` method as well. Although not
+particularly useful in this example, this technique can be useful when unit
+testing more complex rules. For example, using this method you might make the
+following assertions in a unit test:
+
+    match = Addition.parse('1 + 4')
+    assert_equal(1, match.lhs)
+    assert_equal(4, match.rhs)
+    assert_equal(5, match.value)
+
+If you would like to abstract away the code in a semantic block, simply create
+a separate Ruby module (in another file) that contains the extension methods you
+want and use the angle bracket notation to indicate that a rule should use that
+module when extending matches.
+
+To demonstrate this method with the above example, in a Ruby file you would
+define the following module.
+
+    module Additive
+      def lhs
+        number.value
+      end
+
+      def rhs
+        term.value
+      end
+
+      def value
+        lhs + rhs
+      end
     end
 
-Since no method name is explicitly specified in the semantic blocks, they may be
-called using the `value` method.
+Then, in your Citrus grammar file the rule definition would look like this:
+
+    rule additive
+      (number plus term:(additive | number)) <Additive>
+    end
+
+This method of defining extensions can help keep your grammar files cleaner.
+However, you do need to make sure that your extension modules are already loaded
+before using `Citrus.load` to load your grammar file.
 
 # Testing