README.md in yoga-0.2.0 vs README.md in yoga-0.2.1

- old
+ new

@@ -1,36 +1,258 @@ # Yoga +[![Build Status][build-status]][build-status-link] [![Coverage Status][coverage-status]][coverage-status-link] -Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/yoga`. To experiment with that code, run `bin/console` for an interactive prompt. +A helper for your Ruby parsers. This adds helper methods to make parsing +(and scanning!) easier and more structured. If you're looking for an LALR +parser generator, that isn't this. This is designed to help you construct +Recursive Descent parsers - which are solely LL(k). If you want an LALR parser +generator, see [_Antelope_](https://github.com/medcat/antelope) or +[Bison](https://www.gnu.org/software/bison/). -TODO: Delete this and the text above, and describe your gem +Yoga requires [Mixture](https://github.com/medcat/mixture) for parser node +attributes. However, the use of the parser nodes included with Yoga are +completely optional. ## Installation Add this line to your application's Gemfile: ```ruby -gem 'yoga' +gem "yoga" ``` And then execute: $ bundle -Or install it yourself as: +## Usage - $ gem install yoga +To begin your parser, you will first have to create a scanner. A scanner +takes the source text and generates "tokens." These tokens are abstract +representations of the source text of the document. For example, for the +text `class A do`, you could have the tokens `:class`, `:CNAME`, and `:do`. +The actual names of the tokens are completely up to you. These token names +are later used in the parser to set up expectations - for example, for the +definition of a class, you could expect a `:class`, `:CNAME`, and a `:do` +token. -## Usage +Essentially, the scanner breaks up the text into usable, bite-sized pieces +for the parser to chomp on. Here's what scanner may look like: -TODO: Write usage instructions here +```ruby +module MyLanguage + class Scanner + # All of the behavior from Yoga for scanners. This provides the + # `match/2` method, the `call/0` method, the `match_line/1` method, + # the `location/1` method, and the `emit/2` method. The major ones that + # are used are the `match/2`, the `call/0`, and the `match_line/1` + # methods. + include Yoga::Scanner -## Development + # This must be implemented. This is called for the next token. This + # should only return a Token, or true. + def scan + # Match with a string value escapes the string, then turns it into a + # regular expression. + match("[") || match("]") || + # Match with a symbol escapes the symbol, and turns it into a regular + # expression, suffixing it with `symbol_negative_assertion`. This is + # to prevent issues with identifiers and keywords. + match(:class) || match(:func) || + # With a regular expression, it's matched exactly. However, a token + # name is highly recommended. + match(/[a-z][a-zA-Z0-9_]*[!?=]?/, :IDENT) + end + end +end +``` -After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. +And that's it! You now have a fully functioning scanner. In order to use it, +all you have to do is this: -To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org). +```ruby +source = "class alpha [func a []]" +MyLanguage::Scanner.new(source).call # => #<Enumerable ...> +``` +Note that `Scanner#call` returns an enumerable. `#call` is aliased as `#each`. +What this means is that tokens aren't generated until they're requested by the +parser - each token is generated from the source incrementally. If you want +to retrieve all of the tokens immediately, you have to first convert it into +a string, or perform some other operation on the enumerable (since it isn't +lazy): + +```ruby +MyLanguage::Scanner.new(source).call.to_a # => [...] +``` + +The scanner also automatically adds location information to all of the tokens. +This is handled automatically by `match/2` and `emit/2` - the only issue being +that all regular expressions **must not** include a newline. Newlines should +be matched with `match_line/1`; if lines must be emitted as a token, you can +pass the kind of token to emit to `match_line/1` using the `kind:` keyword. + +You may notice that all of the tokens have `<anon>` set as the location's file. +This is the default location, which is provided to the initializer: + +```ruby +MyLanguage::Scanner.new(source, "foo").call.first.location.to_s # => "foo:1.1-6" +``` + +Parsers are a little bit more complicated. Before we can pull up the parser, +let's define a grammar and some node classes. + +``` +; This is the grammar. +<root> = *<statement> +<statement> = <expression> ';' +<expression> = <expression> <op> <expression> +<expression> /= <int> ; here, <int> is defined by the scanner. +<op> = '+' / '-' / '*' / '/' / '^' / '%' / '=' +``` + +```ruby +module MyLanguage + class Parser + class Root < Yoga::Node + # An attribute on the node. This is required for Yoga nodes since the + # update syntax requires them. The type for the attribute is optional. + attribute :statements, type: [Yoga::Node] + end + + class Expression < Yoga::Node + end + + class Operation < Expression + attribute :operator, type: ::Symbol + attribute :left, type: Expression + attribute :right, type: Expression + end + + class Literal < Expression + attribute :value, type: ::Integer + end + end +end +``` + +With those out of the way, let's take a look at the parser itself. + +```ruby +module MyLanguage + class Parser + # This provides all of the parser helpers. This is the same as adding + # `Yoga::Parser::Helpers` as an include statement as well. + include Yoga::Parser + + # Like the `scan/0` method on the scanner, this must be implemented. This + # is the entry point for the parser. However, public usage should use the + # `call/0` method. This should return a node of some sort. + def parse_root + # This "collects" a series of nodes in sequence. It iterates until it + # reaches the `:EOF` token (in this case). The first parameter to + # collect is the "terminating token," and can be any value that + # `expect/1` or `peek?/1` accepts. The second, optional parameter to + # collect is the "joining token," and is required between each node. + # We're not using the semicolon as a joining token because that is + # required for _all_ statements. The joining token can be used for + # things like argument lists. The parameter can be any value that + # `expect/1` or `peek?/1` accepts. + children = collect(:EOF) { parse_statement } + + # "Unions" the location of all of the statements in the list. + location = children.map(&:location).inject(:union) + Parser::Root.new(statements: children, location: location) + end + + # Parses a statement. This is the same as the <statement> rule as above. + def parse_statement + expression = parse_expression + # This says that the next token should be a semicolon. If the next token + # isn't, it throws an error with a detailed error message, denoting + # what was expected (in this case, a semicolon), what was given, and + # where the error was located in the source file. + expect(:";") + + expression + end + + + # A switch statement, essentially. This is defined beforehand to make it + # _faster_ (not really; it's just useful). The first parameter to the + # switch function is the name of the switch. This is used later to + # actually perform the switch; it is also used to define a first set with + # the allowed tokens for the switch. The second parameter defines a key + # value pair. The keys are the tokens that are allowed; a symbol or an + # array of symbols can be used. The value is the block or the method that + # is executed upon encountering that token. + switch(:Operation, + "=": proc { |left| parse_operation(:"=", left) }, + "+": proc { |left| parse_operation(:"+", left) }, + "-": proc { |left| parse_operation(:"-", left) }, + "*": proc { |left| parse_operation(:"*", left) }, + "/": proc { |left| parse_operation(:"/", left) }, + "^": proc { |left| parse_operation(:"^", left) }, + "%": proc { |left| parse_operation(:"%", left) }) + + def parse_expression + # Parse a literal. All expressions must contain a literal of some sort; + # we're just going to use a numeric literal here. + left = parse_expression_literal + + # Whenever the `.switch` function is called, it creates a + # "first set" that can be used like this. The first set consists of + # a set of tokens that are allowed for the switch statement. In this + # case, it just makes sure that the next token is an operator. If it + # is, it parses it as an operation. + if peek?(first(:Operation)) + # Uses the switch defined below. If a token is found as a key, its + # block is executed; otherwise, it errors, giving a detailed error of + # what was expected. + switch(:Operation, left) + else + left + end + end + + def parse_operation(op, left) + token = expect(op) + right = parse_expression + + Parser::Operation.new(left: left, op: op, right: right, location: + left.location | op.location | right.location) + end + + def parse_expression_literal + token = expect(:NUMERIC) + Parser::Literal.new(value: token.value, location: token.location) + end + end +end +``` + +This parser can then be used as such: + +```ruby +source = "a = 2;\nb = a + 2;\n" +scanner = MyLanguage::Scanner.new(source).call +MyLanguage::Parser.new(scanner).call # => #<MyLanguage::Parser::Root ...> +``` + +That's about it! If you have any questions, you can email me at +<jeremy.rodi@medcat.me>, open an issue, or do what you like. + +For more documentation, see [the Documentation][documentation] - Yoga has a +requirement of 100% documentation. + ## Contributing -Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/yoga. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct. +Bug reports and pull requests are welcome on GitHub at +<https://github.com/medcat/yoga>. This project is intended to be a safe, +welcoming space for collaboration, and contributors are expected to adhere to +the [Contributor Covenant](http://contributor-covenant.org) code of conduct. +[build-status]: https://travis-ci.org/medcat/yoga.svg?branch=master +[documentation]: http://www.rubydoc.info/github/medcat/yoga/master +[coverage-status]: https://coveralls.io/repos/github/medcat/yoga/badge.svg?branch=master +[build-status-link]: https://travis-ci.org/medcat/yoga +[coverage-status-link]: https://coveralls.io/github/medcat/yoga?branch=master