= panini
Panini is a flexible toolkit that enables you to generate sentences from a context-free grammar, also known as a CFG.
== CFG Background
Informally, a context-free grammar consists of a set of productions rules where a _nonterminal_ on the right
hand side of the produces a string of terminals and nonterminals on the right hand side. Like this:
S -> AB
A -> a
B -> b
I the above example, _S_, _A_, and _B_ are all nonterminals. _a_ and _b_ are terminals. Furthermore, the nonterminal _S_ is the _start_ _symbol_ for this CFG. By applying the productions as follows:
S (start symbol)
AB (apply S -> AB)
aB (apply A -> a)
ab (apply B -> b)
The sentence _ab_ is generated. In fact, this is the only sentence this grammar can produce! By adding one additional production to the grammar:
S -> ASB
The grammar may now potentially create an infinite number of sentences. They will all have the form of _a_i_b_i where _i_ > 1. Here is one more example derivation:
S (start symbol)
ASB (apply S -> ASB)
aSB (apply A -> a)
aSb (apply B -> b)
aaSbb (apply S -> ASB)
aaABbb (apply S -> AB)
aaaBbb (apply A -> a)
aaabbb (apply B -> b)
You learn more about CFGs, you can reference the CFG article[http://en.wikipedia.org/wiki/Context-free_grammar] on Wikipedia.
== Getting Started With Panini
=== Defining a Grammar
Defining a grammar is easy. Create a grammar object, add some nonterminals and then add the productions to those nonterminals.
Here's how the grammar from above is defined:
grammar = Panini::Grammar.new
nt_s = grammar.add_nonterminal
nt_a = grammar.add_nonterminal
nt_b = grammar.add_nonterminal
n_s.add_production([n_a, n_b]) # S -> AB
n_s.add_production([n_a, n_s, n_b]) # S -> ASB
n_a.add_production(['a']) # A -> 'a'
n_b.add_production(['b']) # A -> 'b'
=== Start Symbols
In order to derive sentences, the grammar needs a start symbol. Any nonterminal in the grammar can be used as the start symbol. If a start symbol is not explicitly set, then the first nonterminal added to the grammar is used.
grammar = Panini::Grammar.new
nt_0 = grammar.add_nonterminal
nt_1 = grammar.add_nonterminal
grammar.start = nt_1
=== Derivators
Derivators are objects that take a Panini::Grammar and then apply the rules to generate a sentence. Creating the sentences
from the grammar can be tricky, and certain derivation strategies may be better for some grammars. There are currently two main derivators.
==== Panini::DerivationStrategy::RandomDampened
This strategy creates random sentences given a grammar. It employs a dampening factor to keep the computation of the sentence from blowing up.
derivator = Panini::DerivationStrategy::RandomDampened.new(grammar)
This will return a different sentence each time it is called. It may (and probably will) return the same sentence multiple times.
==== Panini::DerivationStrategy::Exhaustive
This strategy is used to exhaustively create all of the sentences that may be created by a grammar.
derivator = Panini::DerivationStrategy::RandomDampened.new(grammar, length_limit)
This will return a new sentence with each call. If there are no additional sentences to be created it will return nil. The same sentence may be returned multiple times if the grammar can derive the sentence in multiple ways.
You can optionally pass a limit for the size of sentences to be generated.
=== Generating a Sentence
To generate a sentence, call the derivator's sentence method like thus:
derivator.sentence -> ['a', 'a', 'b', 'b']
You will get a new sentence (depending on the grammar) with every call:
derivator.sentence -> ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
== Example
In this example, we create a grammar that generates mathematical expressions.
# ================
# = Nonterminals =
# ================
expression = grammar.add_nonterminal("EXPR")
term = grammar.add_nonterminal("TERM")
factor = grammar.add_nonterminal("FACT")
identifier = grammar.add_nonterminal("ID")
number = grammar.add_nonterminal("NUM")
# =============
# = Terminals =
# =============
expression.add_production([term, '+', term])
expression.add_production([term, '-', term])
expression.add_production([term])
term.add_production([factor, '*', term])
term.add_production([factor, '/', term])
term.add_production([factor])
factor.add_production([identifier])
factor.add_production([number])
factor.add_production(['(', expression, ')'])
('a'..'z').each do |v|
identifier.add_production([v])
end
(0..100).each do |n|
number.add_production([n])
end
# ===============================================
# = Choose a strategy and create some sentences =
# ===============================================
deriver = Panini::DerivationStrategy::RandomDampened.new(grammar)
10.times do
puts "#{deriver.sentence.join(' ')}"
end
== Contributing to panini
Contributions to this gem are appreciated!
* Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
* Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it
* Fork the project
* Start a feature/bugfix branch
* Commit and push until you are happy with your contribution
* Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
* Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
== To Do
=== Features
* Detect invalid grammars
* Weighted productions.
* Arbitrary start symbol.
* Support Enumerator!
* DSL or string based grammar definitions
* Purdom Derivator?
* Actions
=== Examples
* Natural language
* XML
* JSON
* Address
* Tree/Flower (PS?)
* Simulated user actions
== Copyright
Copyright (c) 2011 Matthew Bellantoni. See LICENSE.txt for further details.