/home/verequus/Arbeit/ANTLR/code/antlr/main/runtime/CSharp/Sources/Antlr3.Runtime/bin/Debug/net-2.0/Antlr3.Runtime
A character stream (an ICharStream) that loads
and caches the contents of its underlying file fully during
object construction
This looks very much like an ANTLRReaderStream or an ANTLRInputStream
but it is a special case. Since we know the exact size of the file to
load, we can avoid lots of data copying and buffer resizing.
Initializes a new instance of the ANTLRFileStream class
Initializes a new instance of the ANTLRFileStream class for the
specified file name
Initializes a new instance of the ANTLRFileStream class for the
specified file name and encoding
Fully qualified name of the stream's underlying file
Gets the file name of this ANTLRFileStream's underlying file
Loads and buffers the specified file to be used as this
ANTLRFileStream's source
File to load
Encoding to apply to file
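For illustration, a minimal usage sketch (MyLexer, MyParser, and the prog start rule are hypothetical names standing in for what ANTLR would generate from a grammar; the encoding overload takes a System.Text.Encoding):

    // Load and cache the whole file up front, then lex and parse from it.
    ICharStream input = new ANTLRFileStream("expr.txt");
    // or, with an explicit encoding:
    // ICharStream input = new ANTLRFileStream("expr.txt", System.Text.Encoding.UTF8);
    MyLexer lexer = new MyLexer(input);                       // hypothetical generated lexer
    CommonTokenStream tokens = new CommonTokenStream(lexer);  // buffers the tokens
    MyParser parser = new MyParser(tokens);                   // hypothetical generated parser
    parser.prog();                                            // hypothetical start rule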
A pretty quick ICharStream that uses a character array
directly as its underlying source.
Initializes a new instance of the ANTLRStringStream class
Initializes a new instance of the ANTLRStringStream class for the
specified string. This copies data from the string to a local
character array
Initializes a new instance of the ANTLRStringStream class for the
specified character array. This is the preferred constructor as
no data is copied
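A small sketch of the two construction styles (assuming the char[] overload mirrors the Java runtime's (data, count) signature):

    // Copies the string's characters into a local buffer:
    ICharStream s1 = new ANTLRStringStream("a = b + c;");
    // Preferred when you already own a char[] buffer; no data is copied:
    char[] data = "a = b + c;".ToCharArray();
    ICharStream s2 = new ANTLRStringStream(data, data.Length);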
The data for the stream
How many characters are actually in the buffer?
Index in our array for the next char (0..n-1)
Current line number within the input (1..n)
The index of the character relative to the beginning of the
line (0..n-1)
Tracks the depth of nested Mark() calls
A list of CharStreamState objects that tracks the stream state
(i.e. line, charPositionInLine, and p) that can change as you
move through the input stream. Indexed from 1..markDepth.
A null is kept at index 0. Created upon first call to Mark().
Track the last Mark() call result value for use in Rewind().
What is the name or source of this char stream?
Current line position in stream.
Current character position on the current line of the stream
(i.e. column position)
Returns the size of the stream
Resets the stream so that it is in the same state it was
when the object was created *except* the data array is not
touched.
Advances the read position of the stream. Updates line and column state
Return lookahead characters at the specified offset from the current read position.
The lookahead offset can be negative.
Return the current input symbol index 0..n where n indicates the
last symbol has been read. The index is the index of the char to
be returned from LA(1).
Returns the size of the stream
Seeks to the specified position.
Consume ahead until p==index; can't just set p=index as we must
update line and charPositionInLine.
A stripped-down version of org.antlr.misc.BitSet that is just
good enough to handle runtime requirements such as FOLLOW sets
for automatic error recovery.
Construct a bitset of size one word (64 bits)
Construction from a static array of ulongs
Construction from a list of integers
Construct a bitset given the size
The size of the bitset in bits
We will often need to do a mod operator (i mod nbits).
It turns out that, for powers of two, this mod operation is the
same as a bitwise AND with (nbits-1). Since mod is slow, we use a
precomputed mod mask to do the mod instead.
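To make the trick concrete, a plain C# sketch (illustrative, not the runtime's code): for a power-of-two word size, i % nbits and i & (nbits - 1) agree:

    const int nbits = 64;           // bits per ulong word; a power of two
    const int modMask = nbits - 1;  // precomputed once
    int i = 201;
    int slow = i % nbits;           // 9, uses a division
    int fast = i & modMask;         // 9, a single AND
    System.Diagnostics.Debug.Assert(slow == fast);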
The actual data bits
return "this | a" in a new set
Or this element into this set (grow as necessary to accommodate)
Grows the set to a larger number of bits.
element that must fit in set
return how much space is being used by the bits array not
how many actually have member bits on.
Sets the size of a set.
how many words the new set should be
A source of characters for an ANTLR lexer
The current line in the character stream (ANTLR tracks the
line information automatically). To support rewinding character
streams, we are able to [re-]set the line.
The index of the character relative to the beginning of the
line (0..n-1). To support rewinding character streams, we are
able to [re-]set the character position.
Get the ith character of lookahead. This is usually the same as
LA(i). This will be used for labels in the generated lexer code.
I'd prefer to return a char here type-wise, but it's probably
better to be 32-bit clean and be consistent with LA.
This is primarily a useful interface for action code (just make sure
actions don't use this on streams that don't support it).
For infinite streams, you don't need this.
This is the complete state of a stream.
When walking ahead with cyclic DFA for syntactic predicates, we
need to record the state of the input stream (char index, line,
etc...) so that we can rewind the state after scanning ahead.
Index into the char stream of next lookahead char
What line number is the scanner at before processing buffer[p]?
What char position 0..n-1 in line is scanner before processing buffer[p]?
A Token object like we'd use in ANTLR 2.x; has an actual string created
and associated with this object. These objects are needed for imaginary
tree nodes that have payload objects. We need to create a Token object
that has a string; the tree node will point at this token. CommonToken
has indexes into a char stream and hence cannot be used to introduce
new strings.
What token number is this from 0..n-1 tokens
We need to be able to change the text once in a while. If
this is non-null, then getText should return this. Note that
start/stop are not affected by changing this.
What token number is this from 0..n-1 tokens; < 0 implies invalid index
The char position into the input buffer where this token starts
The char position into the input buffer where this token stops
A DFA implemented as a set of transition tables.
Any state that has a semantic predicate edge is special; those states are
generated with if-then-else structures in a SpecialStateTransition()
which is generated by cyclicDFA template.
There are at most 32767 states (16-bit signed short). Could get away with byte
sometimes but would have to generate different types and the simulation code too.
As a point of reference, the Tokens rule DFA for the lexer in the Java grammar
sample has approximately 326 states.
Which recognizer encloses this DFA? Needed to check backtracking
From the input stream, predict what alternative will succeed using this
DFA (representing the covering regular approximation to the underlying CFL).
Input stream
Return an alternative number 1..n. Throw an exception upon error.
A hook for debugging interface
The recognizer did not match anything for a (..)+ loop.
Used for remote debugger deserialization
A semantic predicate failed during validation. Validation of predicates
occurs when normally parsing the alternative just like matching a token.
Disambiguating predicate evaluation occurs when we hoist a predicate into
a prediction decision.
Used for remote debugger deserialization
A simple stream of integers. This is useful when all we care about is the char
or token type sequence (such as for interpretation).
Returns the size of the entire stream.
Only makes sense for streams that buffer everything up probably,
but might be useful to display the entire stream or for testing.
This value includes a single EOF.
Where are you getting symbols from? Normally, implementations will
pass the buck all the way to the lexer who can ask its input stream
for the file name or whatever.
Get int at current input pointer + i ahead (where i=1 is next int)
Negative indexes are allowed. LA(-1) is previous token (token just matched).
LA(-i) where i is before first token should yield -1, invalid char or EOF.
Tell the stream to start buffering if it hasn't already.
Executing Rewind(Mark()) on a stream should not affect the input position.
The Lexer tracks line/col info as well as input index so its markers are
not pure input indexes. Same for tree node streams.
Return a marker that can be passed to Rewind(marker)
to return to the current position.
This could be the current input position, a value returned
from Index(), or some other marker.
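A sketch of the Rewind(Mark()) invariant on a concrete stream (any IIntStream implementation should behave the same way):

    ICharStream input = new ANTLRStringStream("hello");
    int before = input.LA(1);    // 'h'
    int marker = input.Mark();   // remember this position
    input.Consume();
    input.Consume();             // LA(1) is now 'l'
    input.Rewind(marker);        // return to the marked position
    System.Diagnostics.Debug.Assert(input.LA(1) == before);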
Return the current input symbol index 0..n where n indicates the
last symbol has been read. The index is the symbol about to be
read not the most recently read symbol.
Resets the stream so that the next call to Index()
would return marker.
The marker will usually be the value returned by Index() but
it doesn't have to be. It's just a marker to indicate what
state the stream was in. This is essentially calling
Release(marker) and Seek(marker).
If there are other markers created after the specified marker,
this routine must unroll them like a stack. Assumes the state the
stream was in when this marker was created.
Rewind to the input position of the last marker.
Used currently only after a cyclic DFA and just before starting
a sem/syn predicate to get the input position back to the start
of the decision. Do not "pop" the marker off the state. Mark(i)
and Rewind(i) should balance still. It is like invoking
Rewind(last marker) but it should not "pop" the marker off.
It's like Seek(last marker's input position).
You may want to commit to a backtrack but don't want to force the
stream to keep bookkeeping objects around for a marker that is
no longer necessary. This will have the same behavior as
Rewind(marker) except it releases resources without
the backward seek.
This must throw away resources for all markers back to the marker
argument. So if you're nested 5 levels of Mark(), and then Release(2)
you have to release resources for depths 2..5.
Set the input cursor to the position indicated by index. This is
normally used to seek ahead in the input stream.
No buffering is required to do this unless you know your stream
will use seek to move backwards such as when backtracking.
This is different from rewind in its multi-directional requirement
and in that its argument is strictly an input cursor (index).
For char streams, seeking forward must update the stream state such
as line number. For seeking backwards, you will presumably be
backtracking using the Mark()/Rewind()
mechanism that restores state, and so this method does not need to
update state when seeking backwards.
Currently, this method is only used for efficient backtracking using
memoization, but in the future it may be used for incremental parsing.
The index is 0..n-1. A seek to position i means that LA(1) will return
the ith symbol. So, seeking to 0 means LA(1) will return the first
element in the stream.
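A sketch of that indexing contract, using a string stream as a concrete example:

    ICharStream input = new ANTLRStringStream("abc");
    input.Consume();
    input.Consume();     // the read position is now 2; LA(1) == 'c'
    input.Seek(0);       // back to the first symbol
    System.Diagnostics.Debug.Assert(input.LA(1) == 'a');
    input.Seek(2);       // seek ahead again
    System.Diagnostics.Debug.Assert(input.LA(1) == 'c');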
Returns the size of the entire stream.
Only makes sense for streams that buffer everything up probably,
but might be useful to display the entire stream or for testing.
This value includes a single EOF.
Used for remote debugger deserialization
Used for remote debugger deserialization
Used for remote debugger deserialization
A mismatched char or Token or tree node.
Used for remote debugger deserialization
Used for remote debugger deserialization
A parser for TokenStreams. Parser grammars result in a subclass
of this.
Set the token stream and reset the parser
Rules that return more than a single value must return an object
containing all the values. Besides the properties defined in
RuleLabelScope.PredefinedRulePropertiesScope there may be user-defined
return values. This class simply defines the minimum properties that
are always defined and methods to access the others that might be
available depending on output option such as template and tree.
Note text is not an actual property of the return value, it is computed
from start and stop using the input stream's ToString() method. I
could add a ctor to this so that we can pass in and store the input
stream, but I'm not sure we want to do that. It would seem to be undefined
to get the .text property anyway if the rule matches tokens from multiple
input streams.
I do not use getters for fields of objects that are used simply to
group values such as this aggregate.
Return the start token or tree
Return the stop token or tree
The root of the ANTLR exception hierarchy.
To avoid English-only error messages and to generally make things
as flexible as possible, these exceptions are not created with strings,
but rather the information necessary to generate an error. Then
the various reporting methods in Parser and Lexer can be overridden
to generate a localized error message. For example, MismatchedToken
exceptions are built with the expected token type.
So, don't expect getMessage() to return anything.
You can access the stack trace, which means that you can compute the
complete trace of rules from the start symbol. This gives you considerable
context information with which to generate useful error messages.
ANTLR generates code that throws exceptions upon recognition error and
also generates code to catch these exceptions in each rule. If you
want to quit upon first error, you can turn off the automatic error
handling mechanism using rulecatch action, but you still need to
override methods mismatch and recoverFromMismatchSet.
In general, the recognition exceptions can track where in a grammar a
problem occurred and/or what the expected input was. While the parser
knows its state (such as the current input symbol and line info), that
state can change before the exception is reported, so the current token
index is computed and stored at exception time. From this info, you can
perhaps print an entire line of input, not just a single token, for example.
Better to just say the recognizer had a problem and then let the parser
figure out a fancy report.
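As a sketch of that approach, a parser subclass can rebuild messages from the exception's structured fields rather than rely on fixed English text. MyParser is a hypothetical generated parser; GetErrorMessage is assumed to be the overridable hook from BaseRecognizer in this runtime, and the exact member names on the exception classes (Expecting, Token) should be checked against your version:

    public class MyQuietParser : MyParser   // hypothetical generated parser
    {
        public MyQuietParser(ITokenStream input) : base(input) { }

        public override string GetErrorMessage(RecognitionException e, string[] tokenNames)
        {
            MismatchedTokenException mte = e as MismatchedTokenException;
            if (mte != null)
            {
                // Built from the expected token type, so it can be localized freely.
                return "expected " + tokenNames[mte.Expecting]
                     + " but found '" + e.Token.Text + "'";
            }
            return base.GetErrorMessage(e, tokenNames);
        }
    }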
Used for remote debugger deserialization
What input stream did the error occur in?
What is the index of the token/char we were looking at when the error occurred?
The current Token when an error occurred. Since not all streams
can retrieve the ith Token, we have to track the Token object.
[Tree parser] Node with the problem.
The current char when an error occurred. For lexers.
Track the line at which the error occurred in case this is
generated from a lexer. We need to track this since the
unexpected char doesn't carry the line info.
If you are parsing a tree node stream, you will encounter some
imaginary nodes w/o line/col info. We now search backwards looking
for most recent token with line/col info, but notify getErrorHeader()
that info is approximate.
Returns the input stream in which the error occurred
Returns the token/char index in the stream when the error occurred
Returns the current Token when the error occurred (for parsers
although a tree parser might also set the token)
Returns the [tree parser] node where the error occurred (for tree parsers).
Returns the current char when the error occurred (for lexers)
Returns the character position in the line when the error
occurred (for lexers)
Returns the line at which the error occurred (for lexers)
Returns the token type or char of the unexpected input element
Rules can return start/stop info as well as possible trees and templates
Return the start token or tree
Return the stop token or tree
Has a value potentially if output=AST;
Has a value potentially if output=template;
Don't use StringTemplate type to avoid dependency on ST assembly
imaginary tree navigation type; traverse "get child" link
imaginary tree navigation type; finish with a child list
All tokens go to the parser (unless skip() is called in that rule)
on a particular "channel". The parser tunes to a particular channel
so that whitespace etc... can go to the parser on a "hidden" channel.
Anything on different channel than DEFAULT_CHANNEL is not parsed by parser.
In an action, a lexer rule can set token to this SKIP_TOKEN and ANTLR
will avoid creating a token for this symbol and try to fetch another.
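To make the channel model concrete: a lexer rule routes whitespace off-channel in the grammar, and the parser's token stream then skips it automatically. A sketch (the grammar line is shown as a comment; lexer is a hypothetical generated lexer):

    // In the grammar: WS : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    IToken t = tokens.LT(1);   // next DEFAULT_CHANNEL token; hidden tokens are skipped
    // The hidden tokens are still buffered in the stream, so tools that need the
    // original whitespace or comments back (e.g. source rewriting) can recover them.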
A source of tokens must provide a sequence of tokens via NextToken()
and also must reveal its source of characters; CommonToken's text is
computed from a CharStream; it only stores indices into the char stream.
Errors from the lexer are never passed to the parser. Either you want
to keep going or you do not upon token recognition error. If you do not
want to continue lexing then you do not want to continue parsing. Just
throw an exception not under RecognitionException and Java will naturally
toss you all the way out of the recognizers. If you want to continue
lexing then you should not throw an exception to the parser--it has already
requested a token. Keep lexing until you get a valid one. Just report
errors and keep going, looking for a valid token.
Where are you getting tokens from? Normally, the implementation will simply
ask the lexer's input stream.
Returns a Token object from the input stream (usually a CharStream).
Does not fail/return upon lexing error; just keeps chewing on the
characters until it gets a good one; errors are not passed through
to the parser.
We were expecting a token but it was not found. The current token
is actually what we wanted next. Used for tree node errors too.
Used for remote debugger deserialization
A node representing an erroneous token range in the token stream
An extra token while parsing a TokenStream.
Used for remote debugger deserialization
Returns a string representation of this IList.
The string representation is a list of the collection's elements in the order
they are returned by its IEnumerator, enclosed in square brackets ("[]").
The separator is a comma followed by a space i.e. ", ".
Collection whose string representation will be returned
A string representation of the specified collection or "null"
Returns a string representation of this IDictionary.
The string representation is a list of the collection's elements in the order
they are returned by its IEnumerator, enclosed in curly brackets ("{}").
The separator is a comma followed by a space i.e. ", ".
Dictionary whose string representation will be returned
A string representation of the specified dictionary or "null"
A Hashtable-backed dictionary that enumerates Keys and Values in
insertion order.
Stack abstraction that also supports the IList interface
Adds an element to the top of the stack list.
Removes the element at the top of the stack list and returns it.
The element at the top of the stack.
Returns the element at the top of the stack list without removing it.
The element at the top of the stack.
A generic tree implementation with no payload. You must subclass to
actually have any user data. ANTLR v3 uses a list of children approach
instead of the child-sibling approach in v2. A flat tree (a list) is
an empty node whose children represent the list. An empty, but
non-null node is called "nil".
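A small sketch of the flat-list idea using CommonTree, the concrete subclass this runtime ships (the parameterless CommonTree constructor and the CommonToken(type, text) constructor are assumed to mirror the Java runtime's; the token type is an arbitrary illustrative number):

    CommonTree list = new CommonTree();    // no payload: a nil node
    list.AddChild(new CommonTree(new CommonToken(4, "a")));   // type 4: hypothetical
    list.AddChild(new CommonTree(new CommonToken(4, "b")));
    // list.IsNil is true; the tree is a flat sequence, not a rooted tree.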
Creating a new node from an existing node does nothing for BaseTree,
as there are no fields other than the children list, which cannot
be copied as the children are not considered part of this node.
Get this node's internal list of children. Manipulating the list
directly is not a supported operation (i.e. you do so at your own risk)
BaseTree doesn't track child indexes.
BaseTree doesn't track parent pointers.
Add t as child of this node.
Warning: if t has no children, but child does and child isNil then
this routine moves children to t via t.children = child.children;
i.e., without copying the array.
Add all elements of kids list as children of this node
Delete children from start to stop and replace with t even if t is
a list (nil-root tree).
Number of children can increase or decrease.
For huge child lists, inserting children can force walking rest of
children to set their childindex; could be slow.
Override in a subclass to change the impl of children list
Set the parent and child index values for all children of t
Walk upwards looking for ancestor with this token type.
Walk upwards and get first ancestor with this token type.
Return a list of all ancestors of this node. The first node of
list is the root and the last is the parent of this node.
Print out a whole tree not just a node
Force base classes to override and say how a node (not a tree)
should look as text
A TreeAdaptor that works with any Tree implementation
A map of tree nodes to unique IDs.
Next available unique ID.
Create tree node that holds the start and stop tokens associated
with an error.
If you specify your own kind of tree nodes, you will likely have to
override this method. CommonTree returns Token.INVALID_TOKEN_TYPE
if there is no token payload, but you might have to set the token type
for a different node type.
You don't have to subclass CommonErrorNode; you will likely need to
subclass your own tree node class to avoid class cast exception.
This is generic in the sense that it will work with any kind of
tree (not just the ITree interface). It invokes the adaptor routines
not the tree node routines to do the construction.
Add a child to the tree t. If child is a flat tree (a list), make all
in list children of t.
Warning: if t has no children, but child does and child isNil
then you can decide it is ok to move children to t via
t.children = child.children; i.e., without copying the array.
Just make sure that this is consistent with how the user will build
ASTs.
If oldRoot is a nil root, just copy or move the children to newRoot.
If not a nil root, make oldRoot a child of newRoot.
old=^(nil a b c), new=r yields ^(r a b c)
old=^(a b c), new=r yields ^(r ^(a b c))
If newRoot is a nil-rooted single child tree, use the single
child as the new root node.
old=^(nil a b c), new=^(nil r) yields ^(r a b c)
old=^(a b c), new=^(nil r) yields ^(r ^(a b c))
If oldRoot was null, it's ok, just return newRoot (even if isNil).
old=null, new=r yields r
old=null, new=^(nil r) yields ^(nil r)
Return newRoot. Throw an exception if newRoot is not a
simple node or nil root with a single child node--it must be a root
node. If newRoot is ^(nil x) return x as newRoot.
Be advised that it's ok for newRoot to point at oldRoot's
children; i.e., you don't have to copy the list. We are
constructing these nodes so we should have this control for
efficiency.
Transform ^(nil x) to x and nil to null
For identifying trees. How to identify nodes so we can say "add node
to a prior node"?
System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode() is
not available in .NET 1.0. It is "broken/buggy" in .NET 1.1
(for multi-appdomain scenarios).
We track the uniqueness of node IDs ourselves manually (since the
ANTLR v3.1 release) using hashtables. Even though it is expensive,
we create a hashtable with all tree nodes in it, as this is only
for debugging.
Tell me how to create a token for use with imaginary token nodes.
For example, there is probably no input symbol associated with imaginary
token DECL, but you need to create it as a payload or whatever for
the DECL node as in ^(DECL type ID).
If you care what the token payload objects' type is, you should
override this method and any other createToken variant.
Tell me how to create a token for use with imaginary token nodes.
For example, there is probably no input symbol associated with imaginary
token DECL, but you need to create it as a payload or whatever for
the DECL node as in ^(DECL type ID).
This is a variant of createToken where the new token is derived from
an actual real input token. Typically this is for converting '{'
tokens to BLOCK etc... You'll see
r : lc='{' ID+ '}' -> ^(BLOCK[$lc] ID+) ;
If you care what the token payload objects' type is, you should
override this method and any other createToken variant.
Who is the parent node of this node; if null, implies node is root.
If your node type doesn't handle this, it's ok but the tree rewrites
in tree parsers need this functionality.
What index is this node in the child list? Range: 0..n-1
If your node type doesn't handle this, it's ok but the tree rewrites
in tree parsers need this functionality.
Replace from start to stop child index of parent with t, which might
be a list. Number of children may be different after this call.
If parent is null, don't do anything; must be at root of overall tree.
Can't replace whatever points to the parent externally. Do nothing.
A tree node that is a wrapper for a Token object.
After 3.0 release while building tree rewrite stuff, it became clear
that computing parent and child index is very difficult and cumbersome.
Better to spend the space in every tree node. If you don't want these
extra fields, it's easy to cut them out in your own BaseTree subclass.
What token indexes bracket all tokens associated with this node
and below?
A single token is the payload
Who is the parent node of this node; if null, implies node is root
What index is this node in the child list? Range: 0..n-1
A TreeAdaptor that works with any Tree implementation. It provides
really just factory methods; all the work is done by BaseTreeAdaptor.
If you would like to have different tokens created than ClassicToken
objects, you need to override this and then set the parser tree adaptor to
use your subclass.
To get your parser to build nodes of a different type, override
Create(Token), ErrorNode(), and to be safe, YourTreeClass.DupNode().
DupNode() is called to duplicate nodes during rewrite operations.
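A minimal sketch of such an override (MyTree is a hypothetical CommonTree subclass; the Create(IToken) signature is assumed to match this runtime's CommonTreeAdaptor factory method):

    public class MyTreeAdaptor : CommonTreeAdaptor
    {
        public override object Create(IToken payload)
        {
            return new MyTree(payload);   // build MyTree nodes instead of CommonTree
        }
    }

Then hand the adaptor to the parser before parsing, e.g. parser.TreeAdaptor = new MyTreeAdaptor(); (the exact property name on a generated parser may differ).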
Duplicate a node. This is part of the factory;
override if you want another kind of node to be built.
I could use reflection to prevent having to override this
but reflection is slow.
Create an imaginary token from a type and text
Tell me how to create a token for use with imaginary token nodes.
For example, there is probably no input symbol associated with imaginary
token DECL, but you need to create it as a payload or whatever for
the DECL node as in ^(DECL type ID).
If you care what the token payload objects' type is, you should
override this method and any other createToken variant.
Create an imaginary token, copying the contents of a previous token
Tell me how to create a token for use with imaginary token nodes.
For example, there is probably no input symbol associated with imaginary
token DECL, but you need to create it as a payload or whatever for
the DECL node as in ^(DECL type ID).
This is a variant of createToken where the new token is derived from
an actual real input token. Typically this is for converting '{'
tokens to BLOCK etc... You'll see
r : lc='{' ID+ '}' -> ^(BLOCK[$lc] ID+) ;
If you care what the token payload objects' type is, you should
override this method and any other createToken variant.
track start/stop token for subtree root created for a rule
Track start/stop token for subtree root created for a rule.
Only works with Tree nodes. For rules that match nothing,
seems like this will yield start=i and stop=i-1 in a nil node.
Might be useful info so I'll not force to be i..i.
What is the Token associated with this node?
If you are not using CommonTree, then you must override this in your
own adaptor.
A buffered stream of tree nodes. Nodes can be from a tree of ANY kind.
This node stream sucks all nodes out of the tree specified in the
constructor during construction and makes pointers into the tree
using an array of Object pointers. The stream necessarily includes
pointers to DOWN and UP and EOF nodes.
This stream knows how to mark/release for backtracking.
This stream is most suitable for tree interpreters that need to
jump around a lot or for tree parsers requiring speed (at cost of memory).
There is some duplicated functionality here with UnBufferedTreeNodeStream
but just in bookkeeping, not tree walking etc...
The complete mapping from stream index to tree node. This buffer
includes pointers to DOWN, UP, and EOF nodes.
It is built upon ctor invocation. The elements are type Object
as we don't know what the trees look like. Load upon first need of
the buffer so we can set token types of interest for reverseIndexing.
Slows us down a wee bit to do all of the if p==-1 testing everywhere though.
Pull nodes from which tree?
IF this tree (root) was created from a token stream, track it
What tree adaptor was used to build these trees
Reuse same DOWN, UP navigation nodes unless this is true
The index into the nodes list of the current node (next node
to consume). If -1, nodes array not filled yet.
Track the last mark() call result value for use in rewind().
Stack of indexes used for push/pop calls
Where is this stream pulling nodes from? This is not the name, but
the object that provides node objects.
Expensive to compute so I won't bother doing the right thing.
This method only returns how much input has been seen so far. So,
after parsing, it returns the true size.
Walk tree with depth-first-search and fill nodes buffer.
Don't add DOWN, UP nodes if it is a list (t is nil).
Returns the stream index for the specified node in the range 0..n-1,
or -1 if the node is not found.
As we flatten the tree, we use UP, DOWN nodes to represent
the tree structure. When debugging we need unique nodes
so instantiate new ones when uniqueNavigationNodes is true.
Look backwards k nodes
Make stream jump to a new location, saving old location.
Switch back with pop().
Seek back to previous index saved during last Push() call.
Return top of stack (return index).
Record the current state of the tree walk which includes
the current node and stack state.
Rewind the current state of the tree walk to the state it
was in when Mark() was called and it returned marker. Also,
wipe out the lookahead which will force reloading a few nodes
but it is better than making a copy of the lookahead buffer
upon Mark().
Consume() ahead until we hit index. Can't just jump ahead--must
spit out the navigation nodes.
Expensive to compute so I won't bother doing the right thing.
This method only returns how much input has been seen so far. So,
after parsing, it returns the true size.
Used for testing, just return the token type stream
Debugging
What does a tree look like? ANTLR has a number of support classes
such as CommonTreeNodeStream that work on these kinds of trees. You
don't have to make your trees implement this interface, but if you do,
you'll be able to use more support code.
NOTE: When constructing trees, ANTLR can build any kind of tree; it can
even use Token objects as trees if you add a child list to your tokens.
This is a tree node without any payload; just navigation and factory stuff.
This node is what child index? 0..n-1
Indicates the node is a nil node but may still have children, meaning
the tree is a flat list.
Return a token type; needed for tree parsing
In case we don't have a token payload, what is the line for errors?
What is the smallest token index (indexing from 0) for this node
and its children?
What is the largest token index (indexing from 0) for this node
and its children?
Is there a node above with token type ttype?
Walk upwards and get first ancestor with this token type.
Return a list of all ancestors of this node. The first node of
list is the root and the last is the parent of this node.
Set (or reset) the parent and child index values for all children
Add t as a child to this node. If t is null, do nothing. If t
is nil, add all children of t to this node's children.
Tree to add
Set ith child (0..n-1) to t; t must be non-null and non-nil node
Delete children from start to stop and replace with t even if t is
a list (nil-root tree). The number of children can increase or decrease.
For huge child lists, inserting children can force walking rest of
children to set their childindex; could be slow.
How to create and navigate trees. Rather than have a separate factory
and adaptor, I've merged them. Makes sense to encapsulate.
This takes the place of the tree construction code generated in the
generated code in 2.x and the ASTFactory.
I do not need to know the type of a tree at all so they are all
generic Objects. This may increase the amount of typecasting needed. :(
Create a tree node from a Token object; for CommonTree-type trees,
the token just becomes the payload.
This is the most common create call. Override if you want another kind of node to be built.
Duplicate a single tree node
Override if you want another kind of node to be built.
Duplicate tree recursively, using DupNode() for each node
Return a nil node (an empty but non-null node) that can hold
a list of elements as the children. If you want a flat tree (a list)
use "t=adaptor.nil(); t.AddChild(x); t.AddChild(y);"
Return a tree node representing an error. This node records the
tokens consumed during error recovery. The start token indicates the
input symbol at which the error was detected. The stop token indicates
the last symbol consumed during recovery.
You must specify the input stream so that the erroneous text can
be packaged up in the error node. The exception could be useful
to some applications; default implementation stores ptr to it in
the CommonErrorNode.
This only makes sense during token parsing, not tree parsing.
Tree parsing should happen only when parsing and tree construction
succeed.
Is tree considered a nil node used to make lists of child nodes?
Add a child to the tree t. If child is a flat tree (a list), make all
in list children of t.
Warning: if t has no children, but child does and child isNil then you
can decide it is ok to move children to t via t.children = child.children;
i.e., without copying the array. Just make sure that this is consistent
with how the user will build ASTs. Do nothing if t or child is null.
This is for construction and I'm not sure it's completely general for
a tree's addChild method to work this way. Make sure you differentiate
between your tree's addChild and this parser tree construction addChild
if it's not ok to move children to t with a simple assignment.
If oldRoot is a nil root, just copy or move the children to newRoot.
If not a nil root, make oldRoot a child of newRoot.
old=^(nil a b c), new=r yields ^(r a b c)
old=^(a b c), new=r yields ^(r ^(a b c))
If newRoot is a nil-rooted single child tree, use the single
child as the new root node.
old=^(nil a b c), new=^(nil r) yields ^(r a b c)
old=^(a b c), new=^(nil r) yields ^(r ^(a b c))
If oldRoot was null, it's ok, just return newRoot (even if isNil).
old=null, new=r yields r
old=null, new=^(nil r) yields ^(nil r)
Return newRoot. Throw an exception if newRoot is not a
simple node or nil root with a single child node--it must be a root
node. If newRoot is ^(nil x) return x as newRoot.
Be advised that it's ok for newRoot to point at oldRoot's
children; i.e., you don't have to copy the list. We are
constructing these nodes so we should have this control for
efficiency.
Given the root of the subtree created for this rule, post process
it to do any simplifications or whatever you want. A required
behavior is to convert ^(nil singleSubtree) to singleSubtree
as the setting of start/stop indexes relies on a single non-nil root
for non-flat trees.
Flat trees such as for lists like "idlist : ID+ ;" are left alone
unless there is only one ID. For a list, the start/stop indexes
are set in the nil node.
This method is executed after all rule tree construction and right
before SetTokenBoundaries().
For identifying trees. How to identify nodes so we can say "add node
to a prior node"?
Even BecomeRoot is an issue. Ok, we could:
- Number the nodes as they are created?
- Use the original framework-assigned hashcode that's unique
across instances of a given type.
WARNING: This is usually implemented either as IL making a
non-virtual call to object.GetHashCode() or via a call to
System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode().
Both have issues, especially on .NET 1.x and Mono.
Create a node for newRoot and make it the root of oldRoot.
If oldRoot is a nil root, just copy or move the children to newRoot.
If not a nil root, make oldRoot a child of newRoot.
Return node created for newRoot.
Create a new node derived from a token, with a new token type.
This is invoked from an imaginary node ref on right side of a
rewrite rule as IMAG[$tokenLabel].
This should invoke createToken(Token).
Same as Create(tokenType,fromToken) except set the text too.
This is invoked from an imaginary node ref on right side of a
rewrite rule as IMAG[$tokenLabel, "IMAG"].
This should invoke createToken(Token).
Create a new node derived from a token, with a new token type.
This is invoked from an imaginary node ref on right side of a
rewrite rule as IMAG["IMAG"].
This should invoke createToken(int,String).
For tree parsing, I need to know the token type of a node
Node constructors can set the type of a node
Node constructors can set the text of a node
Return the token object from which this node was created.
Currently used only for printing an error message. The error
display routine in BaseRecognizer needs to display where in the
input the error occurred. If your tree implementation does not
store information that can lead you to the token, you can create
a token filled with the appropriate information and pass that back.
Where are the bounds in the input token stream for this node and
all children?
Each rule that creates AST nodes will call this
method right before returning. Flat trees (i.e., lists) will
still usually have a nil root node just to hold the children list.
That node would contain the start/stop indexes then.
Get the token start index for this subtree; return -1 if no such index
Get the token stop index for this subtree; return -1 if no such index
Get a child 0..n-1 node
Set ith child (0..n-1) to t; t must be non-null and non-nil node
Remove ith child and shift children down from right.
How many children? If 0, then this is a leaf node
Who is the parent node of this node; if null, implies node is root.
If your node type doesn't handle this, it's ok but the tree rewrites
in tree parsers need this functionality.
What index is this node in the child list? Range: 0..n-1
If your node type doesn't handle this, it's ok but the tree rewrites
in tree parsers need this functionality.
Replace from start to stop child index of parent with t, which might
be a list. Number of children may be different after this call.
If parent is null, don't do anything; must be at root of overall tree.
Can't replace whatever points to the parent externally. Do nothing.
A stream of tree nodes, accessing nodes from a tree of some kind
Where is this stream pulling nodes from? This is not the name, but
the object that provides node objects.
TODO: do we really need this?
Get the ITokenStream from which this stream's Tree was created
(may be null)
If the tree associated with this stream was created from a
TokenStream, you can specify it here. Used to do rule $text
attribute in tree parser. Optional unless you use tree parser
rule text attribute or output=template and rewrite=true options.
What adaptor can tell me how to interpret/navigate nodes and trees.
E.g., get text of a node.
As we flatten the tree, we use UP, DOWN nodes to represent
the tree structure. When debugging we need unique nodes
so we have to instantiate new ones. When doing normal tree
parsing, it's slow and a waste of memory to create unique
navigation nodes. The default should be false.
Get a tree node at an absolute index i; 0..n-1.
If you don't want to buffer up nodes, then this method makes no
sense for you.
Get tree node at current input pointer + i ahead where i=1 is next node.
i<0 indicates nodes in the past. So LT(-1) is previous node, but
implementations are not required to provide results for k < -1.
LT(0) is undefined. For i>=n, return null.
Return null for LT(0) and any index that results in an absolute address
that is negative.
This is analogous to the LT() method of the TokenStream, but this
returns a tree node instead of a token. Makes code gen identical
for both parser and tree grammars. :)
Return the text of all nodes from start to stop, inclusive.
If the stream does not buffer all the nodes then it can still
walk recursively from start until stop. You can always return
null or "" too, but users should not access $ruleLabel.text in
an action of course in that case.
Replace from start to stop child index of parent with t, which might
be a list. Number of children may be different after this call.
The stream is notified because it is walking the tree and might need
to know you are monkeying with the underlying tree. Also, it might be
able to modify the node stream to avoid restreaming for future phases.
If parent is null, don't do anything; must be at root of overall tree.
Can't replace whatever points to the parent externally. Do nothing.
A record of the rules used to Match a token sequence. The tokens
end up as the leaves of this tree and rule nodes are the interior nodes.
This really adds no functionality, it is just an alias for CommonTree
that is more meaningful (specific) and holds a String to display for a node.
Emit a token and all hidden nodes before it. The EOF node holds all
hidden tokens after the last real token.
Print out the leaves of this tree, which means printing the original
input back out.
A parser for a stream of tree nodes. "tree grammars" result in a subclass
of this. All the error reporting and recovery is shared with Parser via
the BaseRecognizer superclass.
Set the input stream
Reset the parser
Match '.' in tree parser.
Matching '.' in a tree parser has special meaning: skip the node, or
the entire tree if the node has children. If there are children, scan
until the corresponding UP node.
We have DOWN/UP nodes in the stream that have no line info, so we
override; plus, we want to alter the exception type. Don't try to
recover from tree parser errors inline...
Prefix the error message with the grammar name because the message is
always intended for the programmer; the parser built the input tree,
not the user.
Tree parsers parse nodes; they usually have a token object as
payload. Set the exception token and do the default behavior.
This is identical to the ParserRuleReturnScope except that
the start property is a tree node and not a Token object
when you are parsing trees. To be generic the tree node types
have to be Object :(
First node or root node of tree matched for this rule.
Return the start token or tree
A proxy debug event listener that forwards events over a socket to
debugger (or any other listener) using a simple text-based protocol;
one event per line.
ANTLRWorks listens on server socket with a
RemoteDebugEventSocketListener instance. These two objects must therefore
be kept in sync. New events must be handled on both sides of socket.
Almost certainly the recognizer will have an adaptor set, but
we don't know how to cast it (Parser or TreeParser) to get
the adaptor field. Must be set with a constructor. :(
Create a normal parser except wrap the token stream in a debug
proxy that fires consume events.
Who to notify when events in the parser occur.
Used to differentiate between fixed lookahead and cyclic DFA decisions
while profiling.
Provide a new debug event listener for this parser. Notify the
input stream too that it should send events to this listener.
Track the last Mark() call result value for use in Rewind().
consume all initial off-channel tokens
All debugging events that a recognizer can trigger.
I did not create a separate AST debugging interface as it would create
lots of extra classes and DebugParser has a dbg var defined, which makes
it hard to change to ASTDebugEventListener. I looked hard at this issue
and it is easier to understand as one monolithic event interface for all
possible events. Hopefully, adding ST debugging stuff won't be bad. Leave
for future. 4/26/2006.
The parser has just entered a rule. No decision has been made about
which alt is predicted. This is fired AFTER init actions have been
executed. Attributes are defined and available etc...
The grammarFileName allows composite grammars to jump around among
multiple grammar files.
Because rules can have lots of alternatives, it is very useful to
know which alt you are entering. This is 1..n for n alts.
This is the last thing executed before leaving a rule. It is
executed even if an exception is thrown. This is triggered after
error reporting and recovery have occurred (unless the exception is
not caught in this rule). This implies an "exitAlt" event.
The grammarFileName allows composite grammars to jump around among
multiple grammar files.
Track entry into any (...) subrule or other EBNF construct
Every decision, fixed k or arbitrary, has an enter/exit event
so that a GUI can easily track what LT/Consume events are
associated with prediction. You will see a single enter/exit
subrule but multiple enter/exit decision events, one for each
loop iteration.
An input token was consumed; matched by any kind of element.
Trigger after the token was matched by things like Match(), MatchAny().
An off-channel input token was consumed.
Trigger after the token was matched by things like Match(), MatchAny()
(unless, of course, the hidden token is the first thing in the input stream).
Somebody (anybody) looked ahead. Note that this actually gets
triggered by both LA and LT calls. The debugger will want to know
which Token object was examined. Like ConsumeToken, this indicates
what token was seen at that depth. A remote debugger cannot look
ahead into a file it doesn't have so LT events must pass the token
even if the info is redundant.
The parser is going to look arbitrarily ahead; mark this location,
the token stream's marker is sent in case you need it.
After an arbitrarily long lookahead as with a cyclic DFA (or with
any backtrack), this informs the debugger that stream should be
rewound to the position associated with marker.
Rewind to the input position of the last marker.
Used currently only after a cyclic DFA and just
before starting a sem/syn predicate to get the
input position back to the start of the decision.
Do not "pop" the marker off the state. Mark(i)
and Rewind(i) should balance still.
To watch a parser move through the grammar, the parser needs to
inform the debugger what line/charPos it is passing in the grammar.
For now, this does not know how to switch from one grammar to the
other and back for island grammars etc...
This should also allow breakpoints because the debugger can stop
the parser whenever it hits this line/pos.
A recognition exception occurred such as NoViableAltException. I made
this a generic event so that I can alter the exception hierarchy later
without having to alter all the debug objects.
Upon error, the stack of enter rule/subrule must be properly unwound.
If no viable alt occurs it is within an enter/exit decision, which
also must be rewound. Even the rewind for each mark must be unwound.
In the C# target this is pretty easy using try/finally, if a bit
ugly in the generated code. The rewind is generated in DFA.Predict()
actually so no code needs to be generated for that. For languages
w/o this "finally" feature (C++?), the target implementor will have
to build an event stack or something.
Across a socket for remote debugging, only the RecognitionException
data fields are transmitted. The token object or whatever that
caused the problem was the last object referenced by LT. The
immediately preceding LT event should hold the unexpected Token or
char.
Here is a sample event trace for grammar:
b : C ({;}A|B) // {;} is there to prevent A|B becoming a set
| D
;
The sequence for this rule (with no viable alt in the subrule) for
input 'c c' (there are 3 tokens) is:
Commence
LT(1)
EnterRule b
Location 7 1
enter decision 3
LT(1)
exit decision 3
enterAlt1
Location 7 5
LT(1)
ConsumeToken [c/<4>,1:0]
Location 7 7
EnterSubRule 2
enter decision 2
LT(1)
LT(1)
RecognitionException NoViableAltException 2 1 2
exit decision 2
ExitSubRule 2
BeginResync
LT(1)
ConsumeToken [c/<4>,1:1]
LT(1)
EndResync
LT(-1)
ExitRule b
Terminate
Indicates the recognizer is about to consume tokens to resynchronize
the parser. Any Consume events from here until the recovered event
are not part of the parse--they are dead tokens.
Indicates that the recognizer has finished consuming tokens in order
to resynchronize. There may be multiple BeginResync/EndResync pairs
before the recognizer comes out of errorRecovery mode (in which
multiple errors are suppressed). This will be useful
in a GUI where you probably want to grey out tokens that are consumed
but not matched to anything in the grammar. Anything between
a BeginResync/EndResync pair was tossed out by the parser.
A semantic predicate was evaluated with this result and action text
Announce that parsing has begun. Not technically useful except for
sending events over a socket. A GUI for example will launch a thread
to connect and communicate with a remote parser. The thread will want
to notify the GUI when a connection is made. ANTLR parsers
trigger this upon entry to the first rule (the ruleLevel is used to
figure this out).
Parsing is over; successfully or not. Mostly useful for telling
remote debugging listeners that it's time to quit. When the rule
invocation level goes to zero at the end of a rule, we are done
parsing.
Input for a tree parser is an AST, but we know nothing for sure
about a node except its type and text (obtained from the adaptor).
This is the analog of the ConsumeToken method. Again, the ID is
usually the node's hashCode, so it only works if hashCode is
not overridden. If the type is UP or DOWN, then
the ID is not really meaningful as it's fixed--there is
just one UP node and one DOWN navigation node.
The tree parser looked ahead. If the type is UP or DOWN,
then the ID is not really meaningful as it's fixed--there is
just one UP node and one DOWN navigation node.
Announce the creation of a nil node
A nil was created (even nil nodes have a unique ID...
they are not "null" per se). As of 4/28/2006, this
seems to be uniquely triggered when starting a new subtree
such as when entering a subrule in automatic mode and when
building a tree in rewrite mode.
If you are receiving this event over a socket via
RemoteDebugEventSocketListener then only t.ID is set.
Upon syntax error, recognizers bracket the error with an error node
if they are building ASTs.
Announce a new node built from token elements such as type etc...
If you are receiving this event over a socket via
RemoteDebugEventSocketListener then only t.ID, type,
text are set.
Announce a new node built from an existing token.
If you are receiving this event over a socket via
RemoteDebugEventSocketListener then only node.ID
and token.tokenIndex are set.
Make a node the new root of an existing root.
Note: the newRootID parameter is possibly different
than the TreeAdaptor.BecomeRoot() newRoot parameter.
In our case, it will always be the result of calling
TreeAdaptor.BecomeRoot() and not root_n or whatever.
The listener should assume that this event occurs
only when the current subrule (or rule) subtree is
being reset to newRootID.
If you are receiving this event over a socket via
RemoteDebugEventSocketListener then only IDs are set.
Make childID a child of rootID.
If you are receiving this event over a socket via
RemoteDebugEventSocketListener then only IDs are set.
Set the start/stop token index for a subtree root or node
If you are receiving this event over a socket via
RemoteDebugEventSocketListener then only IDs are set.
A TreeAdaptor proxy that fires debugging events to a DebugEventListener
delegate and uses the TreeAdaptor delegate to do the actual work. All
AST events are triggered by this adaptor; no code gen changes are needed
in generated rules. Debugging events are triggered *after* invoking
tree adaptor routines.
Trees created with actions in rewrite actions like "-> ^(ADD {foo} {bar})"
cannot be tracked as they might not use the adaptor to create foo, bar.
The debug listener has to deal with tree node IDs for which it did
not see a CreateNode event. A single <unknown> node is sufficient even
if it represents a whole tree.
^(A B C): emit create A, create B, add child, ...
Global constants
A strongly-typed resource class, for looking up localized strings, etc.
Returns the cached ResourceManager instance used by this class.
Overrides the current thread's CurrentUICulture property for all
resource lookups using this strongly typed resource class.
Debug any tree node stream. The constructor accepts the stream
and a debug listener. As node stream calls come in, debug events
are triggered.
Track the last mark() call result value for use in rewind().
It is normally this object that instructs the node stream to
create unique nav nodes, but to satisfy interface, we have to
define it. It might be better to ignore the parameter but
there might be a use for it later, so I'll leave it.
A blank listener that does nothing; useful for real classes so
they don't have to have lots of blank methods and are less
sensitive to updates to debug interface.
Version of ANTLR (dictates events)
Track the last token index we saw during a consume. If same, then
set a flag that we have a problem.
Create a thread to listen to the remote running recognizer
Print out (most of) the events... Useful for debugging, testing...
Broadcast debug events to multiple listeners.
Lets you debug and still use the event mechanism to build
parse trees etc...
Not thread-safe. Don't add events in one thread while parser
fires events in another.
Add another listener to broadcast events too.
Not thread-safe. Don't add events in one thread while parser
fires events in another.
A simple event repeater (proxy) that delegates all functionality to
the listener sent into the ctor.
Useful if you want to listen in on a few debug events w/o
interrupting the debugger. Just subclass the repeater and override
the methods you want to listen in on. Remember to call the method
in this class so the event will continue on to the original recipient.
Create a normal parser except wrap the token stream in a debug
proxy that fires consume events.
Who to notify when events in the parser occur.
Used to differentiate between fixed lookahead and cyclic DFA decisions
while profiling.
Provide a new debug event listener for this parser. Notify the
input stream too that it should send events to this listener.
This parser listener tracks rule entry/exit and token matches
to build a simple parse tree using ParseTree nodes.
What kind of node to create. You might want to override this,
so I factored out node creation here.
If backtracking or in a cyclic DFA, we don't want to add nodes to the tree
Using the debug event interface, track what is happening in the parser
and record statistics about the runtime.
Because I may change the stats, I need to track that for later
computations to be consistent.
Track memoization
This is not part of standard debug interface but is triggered by
profiling. Code gen inserts an override for this method in the
recognizer, which triggers this method.
The parser is in a decision if the decision depth > 0. This works
for backtracking also, which can have nested decisions.
Track refs to lookahead if in a fixed/nonfixed decision.
Track backtracking decisions. You'll see a fixed or cyclic decision
and then a backtrack.
enter rule
...
enter decision
LA and possibly consumes (for cyclic DFAs)
begin backtrack level
mark m
rewind m
end backtrack level, success
exit decision
...
exit rule
Successful or not, track how much lookahead synpreds use
Get num hidden tokens between i..j inclusive
The default tracer mimics the traceParser behavior of ANTLR 2.x.
This listens for debugging events from the parser and implies
that you cannot debug and trace at the same time.
Stats routines needed by profiler etc...
Note that these routines return 0.0 if no values exist in X[],
which is not "correct", but it is useful so I don't generate NaN
in my output
Compute the sample (unbiased estimator) standard deviation
The computation follows:
Computing Standard Deviations: Accuracy
Tony F. Chan and John Gregg Lewis, Stanford University
Communications of the ACM, Volume 22, Number 9, September 1979
The "two-pass" method from the paper; supposed to have better
numerical properties than the textbook summation/sqrt. To me
this looks like the textbook method, but I ain't no numerical
methods guy.
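A plain C# sketch of the two-pass method as described (illustrative, not the runtime's code):

    static double SampleStdDev(double[] x)
    {
        if (x.Length < 2)
            return 0.0;                        // mirror the 0.0-instead-of-NaN convention
        double mean = 0.0;
        foreach (double v in x) mean += v;     // pass 1: sample mean
        mean /= x.Length;
        double sum = 0.0;
        foreach (double v in x)
            sum += (v - mean) * (v - mean);    // pass 2: squared deviations
        return System.Math.Sqrt(sum / (x.Length - 1));  // unbiased (n-1) estimator
    }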
Compute the sample mean
A minimal ANTLR3 error [message] manager with the ST bits
Return the first non-ErrorManager code location for generating messages
Current exception
Build and navigate trees with this object. Must know about the names
of tokens so you have to pass in a map or array of token names (from which
this class can build the map). I.e., Token DECL means nothing unless the
class can translate it to a token type.
In order to create nodes and navigate, this class needs a TreeAdaptor.
This class can build a token type -> node index for repeated use or for
iterating over the various nodes with a particular type.
This class works in conjunction with the TreeAdaptor rather than moving
all this functionality into the adaptor. An adaptor helps build and
navigate trees using methods. This class helps you do it with string
patterns like "(A B C)". You can create a tree from that pattern or
match subtrees against it.
When using %label:TOKENNAME in a tree for parse(), we must track the label.
This adaptor creates TreePattern objects for use during scan()
Compute a Map<String, Integer> that is an inverted index of
tokenNames (which maps int token types to names).
Using the map of token names to token types, return the type.
Walk the entire tree and make a node name to nodes mapping.
For now, use recursion, but a later nonrecursive version may be
more efficient. Returns Map<Integer, List> where the List is
of your AST node type. The Integer is the token type of the node.
TODO: save this index so that find and visit are faster
Do the work for index
Return a List of tree nodes with token type ttype
Return a List of subtrees matching pattern
Visit every ttype node in t, invoking the visitor.
This is a quicker
version of the general visit(t, pattern) method. The labels arg
of the visitor action method is never set (it's null) since using
a token type rather than a pattern doesn't let us set a label.
Do the recursive work for visit
For all subtrees that match the pattern, execute the visit action.
The implementation uses the root node of the pattern in combination
with visit(t, ttype, visitor) so nil-rooted patterns are not allowed.
Patterns with wildcard roots are also not allowed.
Given a pattern like (ASSIGN %lhs:ID %rhs:.) with optional labels
on the various nodes and '.' (dot) as the node/subtree wildcard,
return true if the pattern matches and fill the labels Map with
the labels pointing at the appropriate nodes. Return false if
the pattern is malformed or the tree does not match.
If a node specifies a text arg in pattern, then that must match
for that node in t.
TODO: what's a better way to indicate bad pattern? Exceptions are a hassle
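Continuing the sketch above, matching with labels might look like this (the exact labels collection type is an assumption):
IDictionary labels = new Hashtable();
if (wizard.Parse(t, "(ASSIGN %lhs:ID %rhs:.)", labels))
{
    object lhs = labels["lhs"];  // the ID node
    object rhs = labels["rhs"];  // whatever subtree matched '.'
}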
Do the work for Parse(). Check to see if the t2 pattern fits the
structure and token types in t1. Check text if the pattern has
text arguments on nodes. Fill labels map with pointers to nodes
in tree matched against nodes in pattern with labels.
Create a tree or node from the indicated tree pattern that closely
follows ANTLR tree grammar tree element syntax:
(root child1 ... child2).
You can also just pass in a node: ID
Any node can have a text argument: ID[foo]
(notice there are no quotes around foo--it's clear it's a string).
nil is a special name meaning "give me a nil node". Useful for
making lists: (nil A B C) is a list of A B C.
Compare t1 and t2; return true if token types/text, structure match exactly.
The trees are examined in their entirety so that (A B) does not match
(A B C) nor (A (B C)).
TODO: allow them to pass in a comparator
TODO: have a version that is nonstatic so it can use instance adaptor
I cannot rely on the tree node's equals() implementation as I place
no constraints at all on the node types or interfaces, etc.
Compare type, structure, and text of two trees, assuming adaptor in
this instance of a TreeWizard.
The tree pattern to lex like "(A B C)"
Index into input string
Current char
How long is the pattern in chars?
Set when token type is ID or ARG (name mimics Java's StreamTokenizer)
Queues up nodes matched on left side of -> in a tree parser. This is
the analog of RewriteRuleTokenStream for normal parsers.
Create a stream with one element
Create a stream, but feed off an existing list
Create a stream, but feed off an existing list
Base class for all exceptions thrown during AST rewrite construction.
This signifies a case where the cardinality of two or more elements
in a subrule is different: (ID INT)+ where |ID|!=|INT|
Returns the line at which the error occurred (for lexers)
No elements within a (...)+ in a rewrite rule
Ref to ID or expr but no tokens in ID stream or subtrees in expr stream
A generic list of elements tracked in an alternative to be used in
a -> rewrite rule. We need to subclass to fill in the next() method,
which returns either an AST node wrapped around a token payload or
an existing subtree.
Once you start next()ing, do not try to add more elements. It will
break the cursor tracking I believe.
TODO: add mechanism to detect/puke on modification after reading from stream
Create a stream with one element
Create a stream, but feed off an existing list
Create a stream, but feed off an existing list
Cursor 0..n-1. If singleElement!=null, cursor is 0 until you next(),
which bumps it to 1 meaning no more elements.
Track single elements w/o creating a list. Upon 2nd add, alloc list
The list of tokens or subtrees we are tracking
Tracks whether a node or subtree has been used in a stream
Once a node or subtree has been used in a stream, it must be dup'd
from then on. Streams are reset after subrules so that the streams
can be reused in future subrules. So, reset must set a dirty bit.
If dirty, then next() always returns a dup.
The element or stream description; usually has name of the token or
rule reference that this list tracks. Can include rulename too, but
the exception would track that info.
Reset the condition of this stream so that it appears we have
not consumed any of its elements. Elements themselves are untouched.
Once we reset the stream, any future use will need duplicates. Set
the dirty bit.
Return the next element in the stream.
Do the work of getting the next element, making sure that
it's a tree node or subtree.
Deal with the optimization of single-element list versus
list of size > 1. Throw an exception if the stream is
empty or we're out of elements and size>1.
Ensure stream emits trees; tokens must be converted to AST nodes.
AST nodes can be passed through unmolested.
Create a stream with one element
Create a stream, but feed off an existing list
Create a stream, but feed off an existing list
This delegate is used to factor out some common code.
The object to be processed
Treat next element as a single node even if it's a subtree.
This is used instead of next() when the result has to be a
tree root node. Also prevents us from duplicating recently-added
children; e.g., ^(type ID)+ adds ID to type and then 2nd iteration
must dup the type node, but ID has been added.
Referencing a rule result twice is ok; dup entire tree as
we can't be adding trees as root; e.g., expr expr.
This method contains the common code of two other methods, which
differed in only one function call.
The delegate holding the chosen function
The required object
Tests whether the object to be returned requires duplication;
returns true if it does, false otherwise.
Return the next element in the stream.
If out of elements, throw an exception unless Count==1.
If Count is 1, then return elements[0].
Return a duplicate node/subtree if stream is out of
elements and Count==1.
If we've already used the element, dup (dirty bit set).
When constructing trees, sometimes we need to dup a token or AST
subtree. Dup'ing a token means just creating another AST node
around it. For trees, you must call the adaptor.dupTree()
unless the element is for a tree root; then it must be a node dup.
Create a stream with one element
Create a stream, but feed off an existing list
Create a stream, but feed off an existing list
Get next token from stream and make a node for it.
ITreeAdaptor.Create() returns an object, so no further restrictions possible.
Don't convert to a tree unless they explicitly call NextTree().
This way we can do hetero tree nodes in rewrite.
A stream of tree nodes, accessing nodes from a tree of ANY kind.
No new nodes should be created in tree during the walk. A small buffer
of tokens is kept to efficiently and easily handle LT(i) calls, though
the lookahead mechanism is fairly complicated.
For tree rewriting during tree parsing, this must also be able
to replace a set of children without "losing its place".
That part is not yet implemented. Will permit a rule to return
a different tree and have it stitched into the output tree probably.
When walking ahead with cyclic DFA or for syntactic predicates,
we need to record the state of the tree node stream. This
class wraps up the current state of the UnBufferedTreeNodeStream.
Calling Mark() will push another of these on the markers stack.
Record state of the nodeStack
Record state of the indexStack
Reuse same DOWN, UP navigation nodes unless this is true
Pull nodes from which tree?
If this tree (root) was created from a token stream, track it.
What tree adaptor was used to build these trees
As we walk down the nodes, we must track parent nodes so we know
where to go after walking the last child of a node. When visiting
a child, push current node and current index.
Track which child index you are visiting for each node we push.
TODO: pretty inefficient...use int[] when you have time
Which node are we currently visiting?
Which node did we visit last? Used for LT(-1) calls.
Which child are we currently visiting? If -1 we have not visited
this node yet; next Consume() request will set currentIndex to 0.
What node index did we just consume? i=0..n-1 for n node trees.
IntStream.next is hence 1 + this value. Size will be same.
Buffer tree node stream for use with LT(i). This list grows
to fit new lookahead depths, but Consume() wraps like a circular
buffer.
lookahead[head] is the first symbol of lookahead, LT(1).
Add new lookahead at lookahead[tail]. tail wraps around at the
end of the lookahead buffer so tail could be less than head.
Calls to Mark() may be nested so we have to track a stack of them.
The marker is an index into this stack. This is a List<TreeWalkState>.
Indexed from 1..markDepth. A null is kept at index 0. It is created
upon first call to Mark().
tracks how deep Mark() calls are nested
Track the last Mark() call result value for use in Rewind().
Where is this stream pulling nodes from? This is not the name, but
the object that provides node objects.
Expensive to compute; recursively walk tree to find size;
include navigation nodes and EOF. Reuse functionality
in CommonTreeNodeStream as we only really use this
for testing.
Navigates to the next node found during a depth-first walk of root.
Also, adds these nodes and DOWN/UP imaginary nodes into the lookahead
buffer as a side-effect. Normally side-effects are bad, but because
we can Emit many tokens for every MoveNext() call, it's pretty hard to
use a single return value for that. We must add these tokens to
the lookahead buffer.
This routine does *not* cause the 'Current' property to ever return the
DOWN/UP nodes; those are only returned by the LT() method.
Ugh. This mechanism is much more complicated than a recursive
solution, but it's the only way to provide nodes on-demand instead
of walking once completely through and buffering up the nodes. :(
Get tree node at current input pointer + i ahead where i=1 is next node.
i < 0 indicates nodes in the past. So -1 is previous node and -2 is
two nodes ago. LT(0) is undefined. For i>=n, return null.
Return null for LT(0) and any index that results in an absolute address
that is negative.
This is analogous to the LT() method of the TokenStream, but this
returns a tree node instead of a token. Makes code gen identical
for both parser and tree grammars. :)
Make sure we have at least k symbols in lookahead buffer
Add a node to the lookahead buffer. Add at lookahead[tail].
If tail+1 == head, then we must create a bigger buffer
and copy all the nodes over plus reset head, tail. After
this method, LT(1) will be lookahead[0].
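A minimal sketch of that grow-and-copy step (field names here are illustrative, not the runtime's actual members):
private object[] lookahead = new object[8];
private int head, tail;
private void AddLookahead(object node)
{
    // Grow before the slot at tail would collide with head.
    if ((tail + 1) % lookahead.Length == head)
    {
        object[] bigger = new object[2 * lookahead.Length];
        int n = 0;
        // Copy in logical order so LT(1) ends up at bigger[0].
        for (int i = head; i != tail; i = (i + 1) % lookahead.Length)
            bigger[n++] = lookahead[i];
        lookahead = bigger;
        head = 0;
        tail = n;
    }
    lookahead[tail] = node;
    tail = (tail + 1) % lookahead.Length;
}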
Record the current state of the tree walk which includes
the current node and stack state as well as the lookahead
buffer.
Rewind the current state of the tree walk to the state it
was in when Mark() was called and it returned marker. Also,
wipe out the lookahead which will force reloading a few nodes
but it is better than making a copy of the lookahead buffer
upon Mark().
Consume() ahead until we hit index. Can't just jump ahead--must
spit out the navigation nodes.
Expensive to compute; recursively walk tree to find size;
include navigation nodes and EOF. Reuse functionality
in CommonTreeNodeStream as we only really use this
for testing.
As we flatten the tree, we use UP, DOWN nodes to represent
the tree structure. When debugging we need unique nodes
so instantiate new ones when uniqueNavigationNodes is true.
Walk upwards looking for a node with more children to walk.
Print out the entire tree including DOWN/UP nodes. Uses
a recursive walk. Mostly useful for testing as it yields
the token types, not the text.
TODO: not sure this is what we want for trees.
A character stream that loads and caches the contents of its
underlying stream fully during object construction.
Useful for reading from stdin and for specifying file encodings, etc.
Initializes a new instance of the ANTLRInputStream class
Initializes a new instance of the ANTLRInputStream class for the
specified stream
Initializes a new instance of the ANTLRInputStream class for the
specified stream and encoding
Initializes a new instance of the ANTLRInputStream class for the
specified stream and initial data buffer size
Initializes a new instance of the ANTLRInputStream class for the
specified stream, encoding and initial data buffer size
Initializes a new instance of the ANTLRInputStream class for the
specified stream, encoding, initial data buffer size, and read
buffer size
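For example, reading UTF-8 text from stdin (a sketch; check your runtime version for the exact overload set):
using System;
using System.Text;
using Antlr.Runtime;
ICharStream input = new ANTLRInputStream(
    Console.OpenStandardInput(), Encoding.UTF8);
// MyLexer is a hypothetical generated lexer:
// MyLexer lexer = new MyLexer(input);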
An ANTLRStringStream that caches all the input from a TextReader. It
behaves just like a plain ANTLRStringStream, but manages the buffer
manually to avoid unnecessary data copying.
If you need encoding, use ANTLRInputStream.
Initializes a new instance of the ANTLRReaderStream class
Initializes a new instance of the ANTLRReaderStream class for the
specified TextReader
Initializes a new instance of the ANTLRReaderStream class for the
specified TextReader and initial data buffer size
Initializes a new instance of the ANTLRReaderStream class for the
specified TextReader, initial data buffer size, and read buffer size
Default size (in characters) of the buffer used for IO reads
Initial size (in characters) of the data cache
Loads and buffers the contents of the specified reader to be
used as this ANTLRReaderStream's source
A generic recognizer that can handle recognizers generated from
lexer, parser, and tree grammars. This is all the parsing
support code essentially; most of it is error recovery stuff and
backtracking.
An externalized representation of the - shareable - internal state of
this lexer, parser or tree parser.
The state of a lexer, parser, or tree parser is collected into
external state objects so that the state can be shared. This sharing
is needed to have one grammar import others and share the same error
variables and other state variables. It's a kind of explicit multiple
inheritance via delegation of methods and shared state.
Get number of recognition errors (lexer, parser, tree parser). Each
recognizer tracks its own count, so parser and lexer each have a
separate count. Does not count the spurious errors found between
an error and the next valid token match
See also ReportError()
For debugging and other purposes, might want the grammar name.
Have ANTLR generate an implementation for this property.
For debugging and other purposes, might want the source name.
Have ANTLR provide a hook for this property.
The source name
Used to print out token names like ID during debugging and
error reporting. The generated parsers implement a method
that overrides this to point to their string[] tokenNames.
Return whether or not a backtracking attempt failed.
Reset the parser's state. Subclasses must rewind the input stream.
Match current input symbol against ttype. Attempt
single token insertion or deletion error recovery. If
that fails, throw MismatchedTokenException.
To turn off single token insertion or deletion error
recovery, override RecoverFromMismatchedToken() and have it
throw an exception. See TreeParser.RecoverFromMismatchedToken().
This way any error in a rule will cause an exception and
immediate exit from rule. Rule would recover by resynchronizing
to the set of symbols that can follow rule ref.
Match the wildcard (i.e., any input symbol)
Report a recognition problem.
This method sets errorRecovery to indicate the parser is recovering,
not parsing. Once in recovery mode, no errors are generated.
To get out of recovery mode, the parser must successfully Match
a token (after a resync). So it will go:
1. error occurs
2. enter recovery mode, report error
3. consume until token found in resynch set
4. try to resume parsing
5. next Match() will reset errorRecovery mode
If you override, make sure to update syntaxErrors if you care about that.
What error message should be generated for the various exception types?
Not very object-oriented code, but I like having all error message generation
within one method rather than spread among all of the exception classes. This
also makes it much easier for the exception handling because the exception
classes do not have to have pointers back to this object to access utility
routines and so on. Also, changing the message for an exception type would be
difficult because you would have to subclass the exception, but then somehow get
ANTLR to make those kinds of exception objects instead of the default.
This looks weird, but trust me--it makes the most sense in terms of flexibility.
For grammar debugging, you will want to override this to add more information
such as the stack frame with GetRuleInvocationStack(e, this.GetType().Fullname)
and, for no viable alts, the decision description and state etc...
Override this to change the message generated for one or more exception types.
What is the error header, normally line/character position information?
How should a token be displayed in an error message? The default
is to display just the text, but during development you might
want to have a lot of information spit out. Override in that case
to use t.ToString() (which, for CommonToken, dumps everything about
the token). This is better than forcing you to override a method in
your token objects because you don't have to go modify your lexer
so that it creates a new type.
Override this method to change where error messages go
Recover from an error found on the input stream. This is
for NoViableAlt and mismatched symbol exceptions. If you enable
single token insertion and deletion, this will usually not
handle mismatched symbol exceptions but there could be a mismatched
token that the Match() routine could not recover from.
A hook to listen in on the token consumption during error recovery.
The DebugParser subclasses this to fire events to the listener.
Attempt to Recover from a single missing or extra token.
EXTRA TOKEN
LA(1) is not what we are looking for. If LA(2) has the right token,
however, then assume LA(1) is some extra spurious token. Delete it
and use LA(2) as if we were doing a normal Match(), which advances the
input.
MISSING TOKEN
If current token is consistent with what could come after
ttype, then it is ok to "insert" the missing token, else throw an
exception. For example, input "i=(3;" is clearly missing the
')'. When the parser returns from the nested call to expr, it
will have this call chain:
stat -> expr -> atom
and it will be trying to Match the ')' at this point in the
derivation:
=> ID '=' '(' INT ')' ('+' atom)* ';'
^
Match() will see that ';' doesn't Match ')' and report a
mismatched token error. To Recover, it sees that LA(1)==';'
is in the set of tokens that can follow the ')' token
reference in rule atom. It can assume that you forgot the ')'.
Not currently used
Consume tokens until one matches the given token set
Returns List <String> of the rules in your parser instance
leading up to a call to this method. You could override if
you want more details such as the file/line info of where
in the parser source code a rule is invoked.
This is very useful for error messages and for context-sensitive
error recovery.
A more general version of GetRuleInvocationStack where you can
pass in, for example, a RecognitionException to get its rule
stack trace. This routine is shared with all recognizers, hence,
static.
TODO: move to a utility class or something; weird having lexer call this
A convenience method for use most often with template rewrites.
Convert a List<Token> to List<String>
Given a rule number and a start token index number, return
MEMO_RULE_UNKNOWN if the rule has not parsed input starting from
start index. If this rule has parsed input starting from the
start index before, then return where the rule stopped parsing.
It returns the index of the last token matched by the rule.
For now we use a hashtable and just the slow Object-based one.
Later, we can make a special one for ints and also one that
tosses out data after we commit past input position i.
Has this rule already parsed input at the current index in the
input stream? Return the stop token index or MEMO_RULE_UNKNOWN.
If we attempted but failed to parse properly before, return
MEMO_RULE_FAILED.
This method has a side-effect: if we have seen this input for
this rule and successfully parsed before, then seek ahead to
1 past the stop token matched for this rule last time.
Record whether or not this rule parsed the input at this position
successfully. Use a standard hashtable for now.
Return how many rule/input-index pairs there are in total.
TODO: this includes synpreds. :(
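A hedged sketch of how a generated rule method might consult these tables (RULE_INDEX and member casing are assumptions, not guaranteed to match your runtime version):
public void rule()
{
    // If we parsed this rule at this position before, bail out;
    // AlreadyParsedRule seeks 1 past the old stop token as a side-effect.
    if (state.backtracking > 0 && AlreadyParsedRule(input, RULE_INDEX))
        return;
    int start = input.Index();
    try
    {
        // ... normal matching for this rule ...
    }
    finally
    {
        // Record success or failure for this start index.
        if (state.backtracking > 0)
            Memoize(input, RULE_INDEX, start);
    }
}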
Factor out what to do upon token mismatch so tree parsers can behave
differently. Override and call RecoverFromMismatchedToken()
to get single token insertion and deletion. Use this to turn off
single token insertion and deletion. Override mismatchRecover
to call this instead.
TODO: fix this comment, mismatchRecover doesn't exist, for example
Compute the context-sensitive FOLLOW set for current rule.
This is set of token types that can follow a specific rule
reference given a specific call chain. You get the set of
viable tokens that can possibly come next (lookahead depth 1)
given the current call chain. Contrast this with the
definition of plain FOLLOW for rule r:
FOLLOW(r)={x | S=>*alpha r beta in G and x in FIRST(beta)}
where x in T* and alpha, beta in V*; T is set of terminals and
V is the set of terminals and nonterminals. In other words,
FOLLOW(r) is the set of all tokens that can possibly follow
references to r in *any* sentential form (context). At
runtime, however, we know precisely which context applies as
we have the call chain. We may compute the exact (rather
than covering superset) set of following tokens.
For example, consider grammar:
stat : ID '=' expr ';' // FOLLOW(stat)=={EOF}
| "return" expr '.'
;
expr : atom ('+' atom)* ; // FOLLOW(expr)=={';','.',')'}
atom : INT // FOLLOW(atom)=={'+',')',';','.'}
| '(' expr ')'
;
The FOLLOW sets are all inclusive whereas context-sensitive
FOLLOW sets are precisely what could follow a rule reference.
For input "i=(3);", here is the derivation:
stat => ID '=' expr ';'
=> ID '=' atom ('+' atom)* ';'
=> ID '=' '(' expr ')' ('+' atom)* ';'
=> ID '=' '(' atom ')' ('+' atom)* ';'
=> ID '=' '(' INT ')' ('+' atom)* ';'
=> ID '=' '(' INT ')' ';'
At the "3" token, you'd have a call chain of
stat -> expr -> atom -> expr -> atom
What can follow that specific nested ref to atom? Exactly ')'
as you can see by looking at the derivation of this specific
input. Contrast this with the FOLLOW(atom)={'+',')',';','.'}.
You want the exact viable token set when recovering from a
token mismatch. Upon token mismatch, if LA(1) is member of
the viable next token set, then you know there is most likely
a missing token in the input stream. "Insert" one by just not
throwing an exception.
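A sketch of the combining step, modeled on the Java runtime's combineFollows(); the field and method names here are assumptions:
protected BitSet CombineFollows(bool exact)
{
    BitSet followSet = new BitSet();
    // Walk the follow stack from the most recent rule invocation down.
    for (int i = state.followingStackPointer; i >= 0; i--)
    {
        BitSet localFollowSet = state.following[i];
        followSet.OrInPlace(localFollowSet);
        if (exact)
        {
            if (localFollowSet.Member(Token.EOR_TOKEN_TYPE))
            {
                // EOR means "can see past this rule"; drop the marker
                // itself except at the root of the stack.
                if (i > 0)
                    followSet.Remove(Token.EOR_TOKEN_TYPE);
            }
            else
            {
                break;  // cannot see past this rule's follow set
            }
        }
    }
    return followSet;
}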
Match needs to return the current input symbol, which gets put
into the label for the associated token ref; e.g., x=ID. Token
and tree parsers need to return different objects. Rather than test
for input stream type or change the IntStream interface, I use
a simple method to ask the recognizer to tell me what the current
input symbol is.
This is ignored for lexers.
Conjure up a missing token during error recovery.
The recognizer attempts to recover from single missing
symbols. But, actions might refer to that missing symbol.
For example, x=ID {f($x);}. The action clearly assumes
that there has been an identifier matched previously and that
$x points at that token. If that token is missing, but
the next token in the stream is what we want, we assume that
this token is missing and we keep going. Because we
have to return some token to replace the missing token,
we have to conjure one up. This method gives the user control
over the tokens returned for missing tokens. Mostly,
you will want to create something special for identifier
tokens. For literals such as '{' and ',', the default
action in the parser or tree parser works. It simply creates
a CommonToken of the appropriate type. The text will be the token.
If you change what tokens must be created by the lexer,
override this method to create the appropriate tokens.
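For example, a hedged override that conjures a friendlier identifier token (MyParser, the ID constant, and the override signature are illustrative):
public class MyParser : Parser
{
    private const int ID = 4;  // hypothetical token type constant

    public MyParser(ITokenStream input) : base(input) { }

    protected override object GetMissingSymbol(IIntStream input,
        RecognitionException e, int expectedTokenType, BitSet follow)
    {
        // Make identifiers stand out; everything else gets the default.
        if (expectedTokenType == ID)
            return new CommonToken(ID, "<missing-identifier>");
        return base.GetMissingSymbol(input, e, expectedTokenType, follow);
    }
}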
Push a rule's follow set using our own hardcoded stack
The most common stream of tokens is one where every token is buffered up
and tokens are prefiltered for a certain channel (the parser will only
see these tokens and cannot change the filter channel number during the
parse).
TODO: how to access the full token stream? How to track all tokens matched per rule?
Record every single token pulled from the source so we can reproduce
chunks of it later.
Map&lt;tokentype, channel&gt; to override some Tokens' channel numbers
Set&lt;tokentype&gt;; discard any tokens with this type
Skip tokens on any channel but this one; this is how we skip whitespace...
By default, track all incoming tokens
Track the last Mark() call result value for use in Rewind().
The index into the tokens list of the current token (next token
to consume). p==-1 indicates that the tokens list is empty
Gets or sets the token source for this stream (i.e. the source
that supplies the stream with Token objects).
Setting the token source resets the stream.
Get the kth token from the current position, 1..n, where k=1 is the
first symbol of lookahead.
Return absolute token i; ignore which channel the tokens are on;
that is, count all tokens not just on-channel tokens.
Move the input pointer to the next incoming token. The stream
must become active with LT(1) available. Consume() simply
moves the input pointer so that LT(1) points at the next
input symbol. Consume at least one token.
Walk past any token not on the channel the parser is listening to.
Load all tokens from the token source and put in tokens.
This is done upon first LT request because you might want to
set some token type / channel overrides before filling buffer.
Given a starting index, return the index of the first on-channel
token.
A simple filter mechanism whereby you can tell this token stream
to force all tokens of type ttype to be on channel.
For example,
when interpreting, we cannot exec actions so we need to tell
the stream to force all WS and NEWLINE to be a different, ignored
channel.
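For example (a sketch; the WS and NEWLINE constants come from your generated lexer, and the exact method name is an assumption):
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.SetTokenTypeChannel(WS, 99);       // hide all whitespace...
tokens.SetTokenTypeChannel(NEWLINE, 99);  // ...and newlines on channel 99
// From here on, LT()/LA() skip the channel-99 tokens entirely.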
Given a start and stop index, return a List of all tokens in
the token type BitSet. Return null if no tokens were found. This
method looks at both on and off channel tokens.
Look backwards k on-channel tokens
The set of fields needed by an abstract recognizer to recognize input
and recover from errors
As a separate state object, it can be shared among multiple grammars;
e.g., when one grammar imports another.
These fields are publicly visible but the actual state pointer per
parser is protected.
Tracks the set of token types that can follow any rule invocation.
Stack grows upwards. When it hits the max, it grows 2x in size
and keeps going.
This is true when we see an error and before having successfully
matched a token. Prevents generation of more than one error message
per error.
The index into the input stream where the last error occurred.
This is used to prevent infinite loops where an error is found
but no token is consumed during recovery...another error is found,
ad nauseam. This is a failsafe mechanism to guarantee that at least
one token/tree node is consumed for two errors.
In lieu of a return value, this indicates that a rule or token
has failed to match. Reset to false upon valid token match.
Did the recognizer encounter a syntax error? Track how many.
If 0, no backtracking is going on. Safe to exec actions etc...
If >0 then it's the level of backtracking.
An array[size num rules] of Map<Integer,Integer> that tracks
the stop token index for each rule.
ruleMemo[ruleIndex] is the memoization table for ruleIndex.
For key ruleStartIndex, you get back the stop token for
associated rule or MEMO_RULE_FAILED.
This is only used if rule memoization is on (which it is by default).
Token object normally returned by NextToken() after matching lexer rules.
The goal of all lexer rules/methods is to create a token object.
This is an instance variable as multiple rules may collaborate to
create a single token. nextToken will return this object after
matching lexer rule(s). If you subclass to allow multiple token
emissions, then set this to the last token to be matched or
something nonnull so that the auto token emit mechanism will not
emit another token.
What character index in the stream did the current token start at?
Needed, for example, to get the text for current token. Set at
the start of nextToken.
The line on which the first character of the token resides
The character position of first character within the line
The channel number for the current token
The token type for the current token
You can set the text for the current token to override what is in
the input char buffer. Use setText() or set this instance var directly.
The line number on which this token was matched; line=1..n
The index of the first character relative to the beginning of the line 0..n-1
An index from 0..n-1 of the token object in the input stream
This must be valid in order to use the ANTLRWorks debugger.
The text of the token
When setting the text, it might be a NOP such as for the CommonToken,
which doesn't have string pointers, just indexes into a char buffer.
A stream of tokens accessing tokens from a TokenSource
Where is this stream pulling tokens from? This is not the name, but
the object that provides Token objects.
Get Token at current input pointer + i ahead (where i=1 is next
Token).
i < 0 indicates tokens in the past. So -1 is previous token and -2 is
two tokens ago. LT(0) is undefined. For i>=n, return Token.EOFToken.
Return null for LT(0) and any index that results in an absolute address
that is negative.
Get a token at an absolute index i; 0..n-1. This is really only
needed for profiling and debugging and token stream rewriting.
If you don't want to buffer up tokens, then this method makes no
sense for you. Naturally you can't use the rewrite stream feature.
I believe DebugTokenStream can easily be altered to not use
this method, removing the dependency.
Return the text of all tokens from start to stop, inclusive.
If the stream does not buffer all the tokens then it can just
return "" or null; Users should not access $ruleLabel.text in
an action of course in that case.
Because the user is not required to use a token with an index stored
in it, we must provide a means for two token objects themselves to
indicate the start/end location. Most often this will just delegate
to the other toString(int,int). This is also parallel with
the TreeNodeStream.toString(Object,Object).
A lexer is a recognizer that draws input symbols from a character stream.
Lexer grammars result in a subclass of this object. A Lexer object
uses simplified Match() and error recovery mechanisms in the interest
of speed.
Where is the lexer drawing characters from?
Set the char stream and reset the lexer
What is the index of the current character of lookahead?
Gets or sets the 'lexeme' for the current token.
The getter returns the text matched so far for the current token or any
text override.
The setter sets the complete text of this token. It overrides/wipes any
previous changes to the text.
Return a token from this source; i.e., Match a token on the char stream.
Instruct the lexer to skip creating a token for current lexer rule and
look for another token. NextToken() knows to keep looking when a lexer
rule finishes with token set to SKIP_TOKEN. Recall that if token==null
at end of any token rule, it creates one for you and emits it.
This is the lexer entry point that sets instance var 'token'
Currently does not support multiple emits per nextToken invocation
for efficiency reasons. Subclass and override this method and
nextToken (to push tokens into a list and pull from that list rather
than a single variable as this implementation does).
The standard method called to automatically emit a token at the
outermost lexical rule. The token object should point into the
char buffer start..stop. If there is a text override in 'text',
use that to set the token's text.
Override this method to emit custom Token objects.
If you are building trees, then you should also override
Parser or TreeParser.getMissingSymbol().
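A hedged sketch of such an override (MyToken is hypothetical, and the state member names may differ by runtime version):
public override IToken Emit()
{
    // Build a custom token spanning the current rule's characters.
    MyToken t = new MyToken(input, state.type, state.channel,
        state.tokenStartCharIndex, CharIndex - 1);
    t.Line = state.tokenStartLine;
    t.CharPositionInLine = state.tokenStartCharPositionInLine;
    Emit(t);  // hand the finished token to the standard machinery
    return t;
}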
Lexers can normally Match any char in their vocabulary after matching
a token, so do the easy thing and just kill a character and hope
it all works out. You can instead use the rule invocation stack
to do sophisticated error recovery if you are in a Fragment rule.
Useful for dumping out the input stream after doing some
augmentation or other manipulations.
You can insert stuff, Replace, and delete chunks. Note that the
operations are done lazily--only if you convert the buffer to a
String. This is very efficient because you are not moving data around
all the time. As the buffer of tokens is converted to strings, the
ToString() method(s) check to see if there is an operation at the
current index. If so, the operation is done and then normal String
rendering continues on the buffer. This is like having multiple Turing
machine instruction streams (programs) operating on a single input tape. :)
Since the operations are done lazily at ToString-time, operations do not
screw up the token index values. That is, an insert operation at token
index i does not change the index values for tokens i+1..n-1.
Because operations never actually alter the buffer, you may always get
the original token stream back without undoing anything. Since
the instructions are queued up, you can easily simulate transactions and
roll back any changes if there is an error just by removing instructions.
For example,
CharStream input = new ANTLRFileStream("input");
TLexer lex = new TLexer(input);
TokenRewriteStream tokens = new TokenRewriteStream(lex);
T parser = new T(tokens);
parser.startRule();
Then in the rules, you can execute:
IToken t, u;
...
input.InsertAfter(t, "text to put after t");
input.InsertAfter(u, "text after u");
Console.Out.WriteLine(tokens.ToString());
Actually, you have to cast the 'input' to a TokenRewriteStream. :(
You can also have multiple "instruction streams" and get multiple
rewrites from a single pass over the input. Just name the instruction
streams and use that name again when printing the buffer. This could be
useful for generating a C file and also its header file--all from the
same buffer:
tokens.InsertAfter("pass1", t, "text to put after t");}
tokens.InsertAfter("pass2", u, "text after u");}
System.out.println(tokens.ToString("pass1"));
System.out.println(tokens.ToString("pass2"));
If you don't use named rewrite streams, a "default" stream is used as
the first example shows.
What index into rewrites List are we?
Token buffer index.
Execute the rewrite operation by possibly adding to the buffer.
Return the index of the next token to operate on.
I'm going to try replacing range from x..y with (y-x)+1 ReplaceOp
instructions.
You may have multiple, named streams of rewrite operations.
I'm calling these things "programs."
Maps String (name) -> rewrite (IList)
Map String (program name) -> Integer index
Rollback the instruction stream for a program so that
the indicated instruction (via instructionIndex) is no
longer in the stream. UNTESTED!
Reset the program so that no instructions exist
Return a map from token index to operation.
We need to combine operations and report invalid operations (like
overlapping replaces that are not completely nested). Inserts to the
same index need to be combined, etc. Here are the cases:
I.i.u I.j.v leave alone, nonoverlapping
I.i.u I.i.v combine: Iivu
R.i-j.u R.x-y.v | i-j in x-y delete first R
R.i-j.u R.i-j.v delete first R
R.i-j.u R.x-y.v | x-y in i-j ERROR
R.i-j.u R.x-y.v | boundaries overlap ERROR
I.i.u R.x-y.v | i in x-y delete I
I.i.u R.x-y.v | i not in x-y leave alone, nonoverlapping
R.x-y.v I.i.u | i in x-y ERROR
R.x-y.v I.x.u R.x-y.uv (combine, delete I)
R.x-y.v I.i.u | i not in x-y leave alone, nonoverlapping
I.i.u = insert u before op @ index i
R.x-y.u = replace x-y indexed tokens with u
First we need to examine replaces. For any replace op:
1. wipe out any insertions before op within that range.
2. Drop any replace op before that is contained completely within
that range.
3. Throw exception upon boundary overlap with any previous replace.
Then we can deal with inserts:
1. for any inserts to same index, combine even if not adjacent.
2. for any prior replace with same left boundary, combine this
insert with replace and delete this replace.
3. throw exception if index in same range as previous replace
Don't actually delete; make op null in list. Easier to walk list.
Later we can throw as we add to index -> op map.
Note that I.2 R.2-2 will wipe out I.2 even though, technically, the
inserted stuff would be before the replace range. But, if you
add tokens in front of a method body '{' and then delete the method
body, I think the stuff before the '{' you added should disappear too.
Get all operations before an index of a particular kind
How to execute code for node t when a visitor visits node t. Execute
Pre() before visiting children and execute Post() after visiting children.
Execute an action before visiting children of t. Return t or
a rewritten t. Children of returned value will be visited.
Execute an action after visiting children of t. Return t or
a rewritten t. It is up to the visitor to decide what to do
with the return value.
Do a depth first walk of a tree, applying pre() and post() actions
as we discover and finish nodes.
Visit every node in tree t and trigger an action for each node
before/after having visited all of its children.
Execute both actions even if t has no children.
If a child visit yields a new child, it can update its
parent's child list or just return the new child. The
child update code works even if the child visit alters its parent
and returns the new tree.
Return result of applying post action to this node.
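For example, a minimal action that traces entry and exit of every node (interface and method names follow the descriptions above; treat this as a sketch):
using System;
using Antlr.Runtime.Tree;
public class TraceAction : ITreeVisitorAction
{
    public object Pre(object t)
    {
        Console.WriteLine("enter: " + t);
        return t;  // return t unchanged, or a rewritten node
    }
    public object Post(object t)
    {
        Console.WriteLine("exit:  " + t);
        return t;
    }
}
// Usage: new TreeVisitor(adaptor).Visit(tree, new TraceAction());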