TaskJuggler::TextParser

encoding: UTF-8

TokenDoc.rb — The TaskJuggler III Project Management Software

              by Chris Schlaeger <chris@linux.com>

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

encoding: UTF-8

State.rb — The TaskJuggler III Project Management Software

              by Chris Schlaeger <chris@linux.com>

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

The TextParser implements a somewhat modified LL(1) parser. It uses a dynamically compiled state machine. Dynamically means, that the syntax can be extended during the parse process. This allows support for languages that can extend their syntax during the parse process. The TaskJuggler syntax is such an beast.

This class is just a base class. A complete parser would derive from this class and implement the rule set and the functions _nextToken()_ and _returnToken()_. It also needs to set the array variables to declare all variables ($SOMENAME) that the scanner may deliver.

To describe the syntax the functions TextParser#pattern, TextParser#optional and TextParser#repeatable can be used. When the rule set is changed during parsing, TextParser#updateParserTables must be called to make the changes effective. The parser can also document the syntax automatically. To document a pattern, the functions TextParser#doc, TextParser#descr, TextParser#also and TextParser#arg can be used.

In contrast to conventional LL grammars, we use a slightly improved syntax descriptions. Repeated patterns are not described by recursive call but we use a repeat flag for syntax rules that consists of repeatable patterns. This removes the need for recursion elimination when compiling the state machine and makes the syntax a lot more readable. However, it adds a bit more complexity to the state machine. Optional patterns are described by a rule flag, not by adding an empty pattern.

To start parsing the input the function TextParser#parse needs to be called with the name of the start rule.

Attributes

rules[R]

messageHandler[R]

Public Class Methods

new(messageHandler) click to toggle source

Create a new TextParser object.

    # File lib/TextParser.rb, line 78
78:     def initialize(messageHandler)
79:       # The message handler will collect all error messages.
80:       @messageHandler = messageHandler
81:       # This Hash will store the ruleset that the parser is operating on.
82:       @rules = { }
83:       # Array to hold the token types that the scanner can return.
84:       @variables = []
85:       # An list of token types that are not allowed in the current context.
86:       # For performance reasons we use a hash with the token as key. The value
87:       # is irrelevant.
88:       @blockedVariables = {}
89:       # The currently processed rule.
90:       @cr = nil
91: 
92:       @states = {}
93:       # The stack used by the FSM.
94:       @stack = nil
95:     end

Public Instance Methods

error(id, text, sfi = nil, data = nil) click to toggle source

     # File lib/TextParser.rb, line 220
220:     def error(id, text, sfi = nil, data = nil)
221:       sfi ||= sourceFileInfo
222:       if @scanner
223:         # The scanner has some more context information, so we pass the error
224:         # on to the TextScanner.
225:         @scanner.error(id, text, sfi, data)
226:       else
227:         @messageHandler.error(id, text, sfi, data)
228:       end
229:     end

initRules() click to toggle source

Call all methods that start with ‘rule_’ to initialize the rules.

     # File lib/TextParser.rb, line 112
112:     def initRules
113:       methods.each do |m|
114:         if m[0, 5] == 'rule_'
115:           # Create a new rule with the suffix of the function name as name.
116:           newRule(m[5..1])
117:           # Call the function.
118:           send(m)
119:         end
120:       end
121:     end

limitTokenSet(tokenSet) click to toggle source

Limit the allowed tokens of the scanner to the subset passed by the tokenSet Array.

     # File lib/TextParser.rb, line 99
 99:     def limitTokenSet(tokenSet)
100:       return unless tokenSet
101: 
102:       # Create a copy of all supported variables.
103:       blockedVariables = @variables.dup
104:       # Then delete all that are in the limited set.
105:       blockedVariables.delete_if { |v| tokenSet.include?(v) }
106:       # And convert the list into a Hash for faster lookups.
107:       @blockedVariables = {}
108:       blockedVariables.each { |v| @blockedVariables[v] = true }
109:     end

newRule(name) click to toggle source

Add a new rule to the rule set. name must be a unique identifier. The function also sets the class variable @cr to the new rule. Subsequent calls to TextParser#pattern, TextParser#optional or TextParser#repeatable will then implicitely operate on the most recently added rule.

     # File lib/TextParser.rb, line 128
128:     def newRule(name)
129:       # Use a symbol instead of a String.
130:       name = name.intern
131:       raise "Fatal Error: Rule #{name} already exists" if @rules.has_key?(name)
132: 
133:       if block_given?
134:         saveCr = @cr
135:         @rules[name] = @cr = TextParser::Rule.new(name)
136:         yield
137:         @cr = saveCr
138:       else
139:         @rules[name] = @cr = TextParser::Rule.new(name)
140:       end
141:     end

optional() click to toggle source

Identify the patterns of the most recently added rule as optional syntax elements.

     # File lib/TextParser.rb, line 160
160:     def optional
161:       @cr.setOptional
162:     end

parse(ruleName) click to toggle source

To parse the input this function needs to be called with the name of the rule to start with. It returns the result of the processing function of the top-level parser rule that was specified by ruleName. In case of an error, the result is false.

     # File lib/TextParser.rb, line 197
197:     def parse(ruleName)
198:       @stack = []
199:       @@expectedTokens = []
200:       begin
201:         result = parseFSM(@rules[ruleName])
202:       rescue TjException => msg
203:         if msg.message && !msg.message.empty?
204:           @messageHandler.critical('parse', msg.message)
205:         end
206:         return false
207:       end
208: 
209:       result
210:     end

pattern(tokens, func = nil) click to toggle source

Add a new pattern to the most recently added rule. tokens is an array of strings that specify the syntax elements of the pattern. Each token must start with an character that identifies the type of the token. The following types are supported.

! a reference to another rule
$ a variable token as delivered by the scanner
_ a literal token.

func is a Proc object that is called whenever the parser has completed the processing of this rule.

     # File lib/TextParser.rb, line 154
154:     def pattern(tokens, func = nil)
155:       @cr.addPattern(TextParser::Pattern.new(tokens, func))
156:     end

repeatable() click to toggle source

Identify the patterns of the most recently added rule as repeatable syntax elements.

     # File lib/TextParser.rb, line 166
166:     def repeatable
167:       @cr.setRepeatable
168:     end

sourceFileInfo() click to toggle source

Return the SourceFileInfo of the TextScanner at the beginning of the currently processed TextParser::Rule. Or return nil if we don’t have a current position.

     # File lib/TextParser.rb, line 215
215:     def sourceFileInfo
216:       return @scanner.sourceFileInfo if @stack.nil? || @stack.length <= 1
217:       @stack.last.firstSourceFileInfo
218:     end

updateParserTables() click to toggle source

This function needs to be called whenever new rules or patterns have been added and before the next call to TextParser#parse. It’s perfectly ok to call this function from within a parse() call as long as the states that are currently on the stack have not been modified.

     # File lib/TextParser.rb, line 174
174:     def updateParserTables
175:       saveFsmStack
176:       # Invalidate some cached data.
177:       @rules.each_value { |rule| rule.flushCache }
178:       @states = {}
179:       # Generate the parser states for all patterns of all rules.
180:       @rules.each_value do |rule|
181:         rule.generateStates.each do |s|
182:           @states[[ s.rule, s.pattern, s.index ]] = s
183:         end
184:         checkRule(rule)
185:       end
186:       # Compute the transitions between the generated states.
187:       @states.each_value do |state|
188:         state.addTransitions(@states, @rules)
189:       end
190:       restoreFsmStack
191:     end

warning(id, text, sfi = nil, data = nil) click to toggle source

     # File lib/TextParser.rb, line 231
231:     def warning(id, text, sfi = nil, data = nil)
232:       sfi ||= sourceFileInfo
233:       if @scanner
234:         # The scanner has some more context information, so we pass the
235:         # warning on to the TextScanner.
236:         @scanner.warning(id, text, sfi, data)
237:       else
238:         @messageHandler.warning(id, text, sfi, data)
239:       end
240:     end

Private Instance Methods

checkRule(rule) click to toggle source

     # File lib/TextParser.rb, line 244
244:     def checkRule(rule)
245:       if rule.patterns.empty?
246:         raise "Rule #{rule.name} must have at least one pattern"
247:       end
248: 
249:       rule.patterns.each do |pat|
250:         pat.each do |type, name|
251:           if type == :variable
252:             if @variables.index(name).nil?
253:               error('unsupported_token',
254:                     "The token #{name} is not supported here.")
255:             end
256:           elsif type == :reference
257:             if @rules[name].nil?
258:               raise "Fatal Error: Reference to unknown rule #{name} in " +
259:                     "pattern '#{pat}' of rule #{rule.name}"
260:             end
261:           end
262:         end
263:       end
264:     end

dumpStack() click to toggle source

     # File lib/TextParser.rb, line 405
405:     def dumpStack
406:       #puts "Stack level #{@stack.length}"
407:       @stack.each do |sl|
408:         print "#{@stack.index(sl)}: "
409:         sl.each do |v|
410:           if v.is_a?(Array)
411:             begin
412:               print "[#{v.join('|')}]|"
413:             rescue
414:               print "[#{v[0].class}...]|"
415:             end
416:           else
417:             begin
418:               print "#{v}|"
419:             rescue
420:               print v.class
421:             end
422:           end
423:         end
424:         print " -> #{sl.state ? sl.state.to_s(true) : 'nil'} #{sl.function.nil? ? '' : '(Called)'}"
425:         puts ""
426:       end
427:     end

finishPattern(token) click to toggle source

     # File lib/TextParser.rb, line 355
355:     def finishPattern(token)
356:       #dumpStack
357:       # To finish a pattern we need to pop the StackElement with the token
358:       # values from the stack.
359:       stackEntry = @stack.pop
360:       if stackEntry.nil? || @stack.empty?
361:         # Check if we have reached the bottom of the stack.
362:         token = getNextToken unless token
363:         if token[0] == :endOfText
364:           # If the token is the end of the top-level file, we're done. We push
365:           # back the StackEntry since it holds the overall result of the
366:           # parsing.
367:           @stack.push(stackEntry)
368:           return true
369:         end
370:         # If it's not the EOF token, we found a token that violates the syntax
371:         # rules.
372:         error('unexpctd_token', "Unexpected token '#{token[1]}' found. " +
373:               "Expecting one of " +
374:               "#{stackEntry.state.expectedTokens.join(', ')}")
375:       end
376:       # Memorize if the rule for this pattern was repeatable. Then we will
377:       # store the result of the pattern in an Array.
378:       ruleIsRepeatable = stackEntry.state.rule.repeatable
379: 
380:       state = stackEntry.state
381:       result = nil
382:       if state.pattern.function
383:         # Make the token values and their SourceFileInfo available.
384:         @val = stackEntry.val
385:         @sourceFileInfo = stackEntry.sourceFileInfo
386:         # Now call the pattern action to compute the value of the pattern.
387:         begin
388:           result = state.pattern.function.call
389:         rescue AttributeOverwrite
390:           @scanner.warning('attr_overwrite', $!.to_s)
391:         end
392:       end
393: 
394:       # We use the SourceFileInfo of the first token of the pattern to store
395:       # it with the result of the pattern.
396:       firstSourceFileInfo = stackEntry.firstSourceFileInfo
397:       # Store the result at the correct position into the next lower level of
398:       # the stack.
399:       stackEntry = @stack.last
400:       stackEntry.insert(stackEntry.state.index, result,
401:                         firstSourceFileInfo, ruleIsRepeatable)
402:       false
403:     end

getNextToken() click to toggle source

     # File lib/TextParser.rb, line 452
452:     def getNextToken
453:       token = nextToken
454:       #Log << "Token: [#{token[0]}][#{token[1]}]"
455:       if @blockedVariables[token[0]]
456:         error('unsupported_token',
457:               "The token #{token[1]} is not supported in this context.",
458:               token[2])
459:       end
460:       token
461:     end

parseFSM(rule) click to toggle source

     # File lib/TextParser.rb, line 266
266:     def parseFSM(rule)
267:       unless (state = @states[[ rule, nil, 0 ]])
268:         error("no_start_state", "No start state for rule #{rule.name} found")
269:       end
270:       @stack = [ TextParser::StackElement.new(nil, state) ]
271: 
272:       loop do
273:         if state.transitions.empty?
274:           # The final states of each pattern have no pre-compiled transitions.
275:           # For such a state, we don't need to get a new token.
276:           transition = token = nil
277:         else
278:           transition = state.transition(token = getNextToken)
279:         end
280: 
281:         if transition
282:           # Shift: This for normal state transitions. This may be from one
283:           # token of a pattern to the next token of the same pattern, to the
284:           # start of a new pattern or a loop-back to the start of a pattern of
285:           # the same rule. The transition tells us what state we have to
286:           # process next.
287:           state = transition.state
288: 
289:           # If we have looped-back we need to finish the pattern first. Final
290:           # tokens of repeatable rules do have transitions!
291:           finishPattern(token) if transition.loopBack
292: 
293:           # Transitions that enter rules generate states which we need to
294:           # resume at when a rule has been completely processed. We push this
295:           # list of states on the @stack.
296:           stackElement = @stack.last
297:           first = true
298:           transition.stateStack.each do |s|
299:             if first && s.pattern == stackElement.state.pattern
300:               # The first state in the list may just be another state of the
301:               # current pattern. In this case, we already have the
302:               # StackElement on the @stack. We only need to update the State
303:               # for the current StackElement.
304:               stackElement.state = s
305:             else
306:               # For other patterns, we just push a new StackElement onto the
307:               # @stack.
308:               @stack.push(TextParser::StackElement.new(nil, s))
309:             end
310:             first = false
311:           end
312: 
313:           if state.index == 0
314:             # If we have just started with a new pattern (or loop-ed back) we
315:             # need to push a new StackEntry onto the @stack. The StackEntry
316:             # stores the result of the pattern and keeps the State that we
317:             # need to return to in case we jump to other patterns from this
318:             # pattern.
319:             function = state.index == state.pattern.tokens.length - 1 ?
320:                        state.pattern.function : nil
321:             @stack.push(TextParser::StackElement.new(state.pattern.function,
322:                                                      state))
323:           end
324: 
325:           # Store the token value in the result Array.
326:           @stack.last.insert(state.index, token[1], token[2], false)
327:         else
328:           # Reduce: We've reached the end of a rule. There is no pre-compiled
329:           # transition available. The current token, if we have one, is of no
330:           # use to us during this state. We just return it to the scanner. The
331:           # next state is determined by the first matching state from the
332:           # @stack.
333:           if state.noReduce
334:             # Only states that finish a rule may trigger a reduce operation.
335:             # Other states have the noReduce flag set. If a reduce for such a
336:             # state is triggered, we found a token that is not supported by
337:             # the syntax rules.
338:             error("no_reduce",
339:                   "Unexpected token '#{token[1]}' found. " +
340:                   "Expecting one of " +
341:                   "#{@stack.last.state.expectedTokens.join(', ')}")
342:           end
343:           returnToken(token) if token
344:           if finishPattern(token)
345:             # Accept: We're done with parsing.
346:             break
347:           end
348:           state = @stack.last.state
349:         end
350:       end
351: 
352:       @stack[0].val[0]
353:     end

restoreFsmStack() click to toggle source

Convert the FSM stack state entries from [ rule, pattern, index ] into the respective State objects again.

     # File lib/TextParser.rb, line 442
442:     def restoreFsmStack
443:       return unless @stack
444: 
445:       @stack.each do |s|
446:         next unless (state = @states[s.state])
447:         raise "Stack restore failed. Cannot find state" unless state
448:         s.state = state
449:       end
450:     end

saveFsmStack() click to toggle source

Convert the FSM stack state entries from State objects into [ rule, pattern, index ] equivalents.

     # File lib/TextParser.rb, line 431
431:     def saveFsmStack
432:       return unless @stack
433: 
434:       @stack.each do |s|
435:         next unless (st = s.state)
436:         s.state = [ st.rule, st.pattern, st.index ]
437:       end
438:     end

Home Classes Methods

In Files

Parent

Namespace

Methods

Files