Parent

Class Index [+]

Quicksearch

TaskJuggler::TextParser

The TextParser implements a regular LALR parser. But it uses a recursive rule traversor instead of the more commonly found state machine generated by yacc-like tools. Since stack depths is not really an issue with a Ruby implementation this approach has one big advantage. The syntax of the parser can be modified during parsing. This allows support for languages that can extend themself. The TaskJuggler syntax is such an beast. Traditional yacc generated parsers would fail with such a syntax.

This class is just a base class. A complete parser would derive from this class and implement the rule set and the functions _nextToken()_ and _returnToken()_. It also needs to set the array variables to declare all variables ($SOMENAME) that the scanner may deliver.

To describe the syntax the functions TextParser#pattern, TextParser#optional and TextParser#repeatable can be used. When the rule set is changed during parsing, TextParser#updateParserTables must be called to make the changes effective. The parser can also document the syntax automatically. To document a pattern, the functions TextParser#doc, TextParser#descr, TextParser#also and TextParser#arg can be used.

To start parsing the input the function TextParser#parse needs to be called with the name of the start rule.


encoding: UTF-8

Rule.rb — The TaskJuggler III Project Management Software

Copyright © 2006, 2007, 2008, 2009, 2010 by Chris Schlaeger

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.


encoding: UTF-8

TokenDoc.rb — The TaskJuggler III Project Management Software

Copyright © 2006, 2007, 2008, 2009, 2010 by Chris Schlaeger

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.


encoding: UTF-8

StackElement.rb — The TaskJuggler III Project Management Software

Copyright © 2006, 2007, 2008, 2009, 2010 by Chris Schlaeger

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

Attributes

rules[R]
messageHandler[R]

Public Class Methods

new(messageHandler) click to toggle source

Create a new TextParser object.

    # File lib/TextParser.rb, line 71
71:     def initialize(messageHandler)
72:       # The message handler will collect all error messages.
73:       @messageHandler = messageHandler
74:       # This Hash will store the ruleset that the parser is operating on.
75:       @rules = { }
76:       # Array to hold the token types that the scanner can return.
77:       @variables = []
78:       # An list of token types that are not allowed in the current context.
79:       @badVariables = []
80:       # The currently processed rule.
81:       @cr = nil
82:     end

Public Instance Methods

error(id, text, sfi = nil, data = nil) click to toggle source
     # File lib/TextParser.rb, line 197
197:     def error(id, text, sfi = nil, data = nil)
198:       sfi ||= sourceFileInfo
199:       if @scanner
200:         # The scanner has some more context information, so we pass the error
201:         # on to the TextScanner.
202:         @scanner.error(id, text, sfi, data)
203:       else
204:         @messageHandler.error(id, text, sfi, data)
205:       end
206:     end
initRules() click to toggle source

Call all methods that start with ‘rule_’ to initialize the rules.

     # File lib/TextParser.rb, line 94
 94:     def initRules
 95:       methods.each do |m|
 96:         if m[0, 5] == 'rule_'
 97:           # Create a new rule with the suffix of the function name as name.
 98:           newRule(m[5..1])
 99:           # Call the function.
100:           send(m)
101:         end
102:       end
103:     end
limitTokenSet(tokenSet) click to toggle source

Limit the allowed tokens of the scanner to the subset passed by the tokenSet Array.

    # File lib/TextParser.rb, line 86
86:     def limitTokenSet(tokenSet)
87:       return unless tokenSet
88: 
89:       @badVariables = @variables.dup
90:       @badVariables.delete_if { |v| tokenSet.include?(v) }
91:     end
matchingRules(keyword) click to toggle source
     # File lib/TextParser.rb, line 188
188:     def matchingRules(keyword)
189:       matches = []
190:       @rules.each do |name, rule|
191:         patIdx = rule.matchingPatternIndex('_' + keyword)
192:         matches << [ rule, patIdx ] if patIdx
193:       end
194:       matches
195:     end
newRule(name) click to toggle source

Add a new rule to the rule set. name must be a unique identifier. The function also sets the class variable @cr to the new rule. Subsequent calls to TextParser#pattern, TextParser#optional or TextParser#repeatable will then implicitely operate on the most recently added rule.

     # File lib/TextParser.rb, line 110
110:     def newRule(name)
111:       raise "Fatal Error: Rule #{name} already exists" if @rules.has_key?(name)
112: 
113:       if block_given?
114:         saveCr = @cr
115:         @rules[name] = @cr = TextParser::Rule.new(name)
116:         yield
117:         @cr = saveCr
118:       else
119:         @rules[name] = @cr = TextParser::Rule.new(name)
120:       end
121:     end
optional() click to toggle source

Identify the patterns of the most recently added rule as optional syntax elements.

     # File lib/TextParser.rb, line 140
140:     def optional
141:       @cr.setOptional
142:     end
parse(ruleName) click to toggle source

To parse the input this function needs to be called with the name of the rule to start with. It returns the result of the processing function of the top-level parser rule that was specified by ruleName. In case of an error, the result is false.

     # File lib/TextParser.rb, line 164
164:     def parse(ruleName)
165:       @stack = []
166:       @@expectedTokens = []
167:       updateParserTables
168:       begin
169:         result = parseRuleR(@rules[ruleName])
170:       rescue TjException => msg
171:         if msg.message && !msg.message.empty?
172:           @messageHandler.critical('parse', msg.message)
173:         end
174:         return false
175:       end
176: 
177:       result
178:     end
pattern(tokens, func = nil) click to toggle source

Add a new pattern to the most recently added rule. tokens is an array of strings that specify the syntax elements of the pattern. Each token must start with an character that identifies the type of the token. The following types are supported.

  • ! a reference to another rule

  • $ a variable token as delivered by the scanner

  • _ a literal token.

func is a Proc object that is called whenever the parser has completed the processing of this rule.

     # File lib/TextParser.rb, line 134
134:     def pattern(tokens, func = nil)
135:       @cr.addPattern(TextParser::Pattern.new(tokens, func))
136:     end
repeatable() click to toggle source

Identify the patterns of the most recently added rule as repeatable syntax elements.

     # File lib/TextParser.rb, line 146
146:     def repeatable
147:       @cr.setRepeatable
148:     end
sourceFileInfo() click to toggle source

Return the SourceFileInfo of the TextScanner at the beginning of the currently processed TextParser::Rule. Or return nil if we don’t have a current position.

     # File lib/TextParser.rb, line 183
183:     def sourceFileInfo
184:       return nil if @stack.nil? || @stack.empty?
185:       @stack.last.sourceFileInfo[0]
186:     end
updateParserTables() click to toggle source

This function needs to be called whenever new rules or patterns have been added and before the next call to TextParser#parse.

     # File lib/TextParser.rb, line 152
152:     def updateParserTables
153:       @rules.each_value { |rule| rule.transitions = {} }
154:       @rules.each_value do |rule|
155:         getTransitions(rule)
156:         checkRule(rule)
157:       end
158:     end
warning(id, text, sfi = nil, data = nil) click to toggle source
     # File lib/TextParser.rb, line 208
208:     def warning(id, text, sfi = nil, data = nil)
209:       sfi ||= sourceFileInfo
210:       if @scanner
211:         # The scanner has some more context information, so we pass the
212:         # warning on to the TextScanner.
213:         @scanner.warning(id, text, sfi, data)
214:       else
215:         @messageHandler.warning(id, text, sfi, data)
216:       end
217:     end

Private Instance Methods

checkRule(rule) click to toggle source
     # File lib/TextParser.rb, line 277
277:     def checkRule(rule)
278:       if rule.patterns.empty?
279:         raise "Rule #{rule.name} must have at least one pattern"
280:       end
281: 
282:       rule.patterns.each do |pat|
283:         pat.each do |tok|
284:           type = tok[0]
285:           token = tok[1..1]
286:           if type == $$
287:             if @variables.index(token).nil?
288:               error('unsupported_token',
289:                     "The token #{token} is not supported here.", token[2])
290:             end
291:           elsif type == !!
292:             if @rules[token].nil?
293:               raise "Fatal Error: Reference to unknown rule #{token} in " +
294:                     "pattern '#{pat}' of rule #{rule.name}"
295:             end
296:           end
297:         end
298:       end
299:     end
findPattern(rule, token, repeatMode) click to toggle source
     # File lib/TextParser.rb, line 537
537:     def findPattern(rule, token, repeatMode)
538:       # The scanner cannot differentiate between keywords and identifiers.  So
539:       # whenever an identifier is returned we have to see if we have a
540:       # matching keyword first. If none is found, then look for normal
541:       # identifiers.
542:       if token[0] == 'ID'
543:         if (patIdx = rule.matchingPatternIndex('_' + token[1])).nil?
544:           patIdx = rule.matchingPatternIndex("$ID")
545:         end
546:       elsif token[0] == 'LITERAL'
547:         patIdx = rule.matchingPatternIndex('_' + token[1])
548:       elsif token[0] == false
549:         patIdx = rule.matchingPatternIndex('.')
550:       else
551:         patIdx = rule.matchingPatternIndex('$' + token[0])
552:       end
553: 
554:       # If no matching pattern is found for the token we have to check if the
555:       # rule is optional or we are in repeat mode. If this is the case, return
556:       # the token back to the scanner. Otherwise, we have found a token we
557:       # cannot handle at this point.
558:       if patIdx.nil?
559:         # Append the list of expected tokens to the @@expectedToken array.
560:         # This may be used in a later rule to provide more details when an
561:         # error occured.
562:         rule.transitions.each do |transition|
563:           keys = transition.keys
564:           keys.collect! { |key| key[1..1] }
565:           @@expectedTokens += keys
566:           @@expectedTokens.sort!
567:         end
568: 
569:         unless rule.optional?(@rules) || repeatMode
570:           error('unexpctd_token',
571:                 (token[0] != false ?
572:                  "Unexpected token '#{token[1]}' of type " +
573:                  "'#{token[0]}'. " :
574:                  "Unexpected end of file in #{@scanner.fileName}. ") +
575:                 (@@expectedTokens.length > 1 ?
576:                  "Expecting one of #{@@expectedTokens.join(', ')}" :
577:                  "Expecting #{@@expectedTokens[0]}"), token[2])
578:         end
579:         returnToken(token)
580:         return nil
581:       end
582: 
583:       rule.pattern(patIdx)
584:     end
getNextToken() click to toggle source
     # File lib/TextParser.rb, line 526
526:     def getNextToken
527:       token = nextToken
528:       #Log << "Token: [#{token[0]}][#{token[1]}]"
529:       if @badVariables.include?(token[0])
530:         error('unsupported_token',
531:               "The token #{token[1]} is not supported in this context.",
532:               token[2])
533:       end
534:       token
535:     end
getTransitions(rule) click to toggle source

getTransitions recursively determines all possible target tokens that the rule matches. A target token can either be a fixed token (prefixed with _), a variable token (prefixed with $) or an end token (just a .). The list of found target tokens is stored in the transitions list of the rule. For each rule pattern we store the transitions for this pattern in a token -> rule hash.

     # File lib/TextParser.rb, line 227
227:     def getTransitions(rule)
228:       # If we have processed this rule before we can just return a copy
229:       # of the transitions of this rule. This avoids endless recursions.
230:       return rule.transitions.dup unless rule.transitions.empty?
231: 
232:       rule.transitions = []
233:       rule.patterns.each do |pat|
234:         allTokensOptional = true
235:         transitions = { }
236:         pat.each do |token|
237:           tokenId = token[1..1]
238:           if token[0] == !!
239:             unless @rules.has_key?(tokenId)
240:               raise "Fatal Error: Unknown reference to #{tokenId} in pattern " +
241:                     "#{pat} + of rule #{rule.name}"
242:             end
243:             refRule = @rules[tokenId]
244:             # If the referenced rule describes optional content, we need to look
245:             # at the next token as well.
246:             res = getTransitions(@rules[tokenId])
247:             allTokensOptional = false unless refRule.optional?(@rules)
248:             # Combine the hashes for each pattern into a single hash
249:             res.each do |pat_i|
250:               pat_i.each { |tok, r| transitions[tok] = r }
251:             end
252:           elsif '_$.'.include?(token[0])
253:             transitions[token] = rule
254:             allTokensOptional = false
255:           else
256:             raise 'Fatal Error: Illegal token type specifier used for token' +
257:                   ": #{tokenId}"
258:           end
259:           break unless allTokensOptional
260:         end
261:         # Make sure that we only have one possible transition for each
262:         # target.
263:         transitions.each do |key, value|
264:           rule.transitions.each do |trans|
265:             if trans.has_key?(key)
266:               rule.dump
267:               raise "Fatal Error: Rule #{rule.name} has ambiguous " +
268:                     "transitions for target #{key}"
269:             end
270:           end
271:         end
272:         rule.transitions << transitions
273:       end
274:       rule.transitions.dup
275:     end
parseRuleNR(rule) click to toggle source

This function processes the input starting with the syntax description of rule. It’s implemented as an unrolled recursion. It recursively iterates over the rule tree as controlled by the input file. This version is not limited by the size of the system stack. So far, I’m not aware of any project that is too large for the system stack. Since the recursive version parseRuleR is about 8% faster and has cleaner code, we use that by default.

     # File lib/TextParser.rb, line 387
387:     def parseRuleNR(rule)
388:       elementIdx = 0
389:       recursionResult = nil
390:       # These flags are used to managed the control flow to and from the
391:       # recursion point.
392:       recur = resume = false
393:       # The stack that holds the context for the recursion levels. It's either
394:       # just a rule to start a new recursion or an Array of state variables.
395:       recursionStack = [ rule ]
396:       begin
397:         # Pop the top entry from the recursion stack.
398:         se = recursionStack.pop
399:         if se.is_a?(Array)
400:           # We have essentially finished a recursion level and need to get
401:           # back to the place where we started the recursion. First, we need
402:           # to restore the state again.
403:           rule, pattern, elementIdx, result, repeatMode, sfi = se
404:           #Log << "Recursion loop started in resume mode for rule #{rule.name}"
405:           # Now jump to the recursion point without doing anything else.
406:           resume = true
407:         else
408:           # Start a new recursion level. The rule tells us how to interpret
409:           # the input text.
410:           rule = se
411:           #Log.enter('parseRuleNR', "Parsing with rule #{rule.name}")
412:           resume = false
413:         end
414: 
415:         unless resume
416:           result = rule.repeatable ? TextParserResultArray.new : nil
417:           # Rules can be marked 'repeatable'. This flag will be set to true
418:           # after the first iteration has been completed.
419:           repeatMode = false
420:         end
421: 
422:         loop do
423:           unless resume
424:             # At the beginning of a rule we need a token from the input to
425:             # determine which pattern of the rule needs to be processed.
426:             token = getNextToken
427: 
428:             break unless (pattern = findPattern(rule, token, repeatMode))
429:             # The @stack will store the resulting value of each element in the
430:             # pattern.
431:             @stack << TextParser::StackElement.new(pattern.function)
432: 
433:             # Once we've found the right pattern, we need to process each
434:             # element.
435:             elementIdx = 0
436:           end
437: 
438:           elementCount = pattern.length
439:           while elementIdx < elementCount
440:             element = pattern[elementIdx]
441:             # Separate the type and token text for pattern element.
442:             elType = element[0]
443:             elToken = element[1..1]
444:             if elType == !!
445:               unless resume
446:                 # The element is a reference to another rule. Return the token
447:                 # if we still have one and continue with the referenced rule.
448:                 if token
449:                   sfi = token[2]
450:                   returnToken(token)
451:                   token = nil
452:                 else
453:                   sfi = nil
454:                 end
455:                 # This is where the recursion would happen. Instead, we push
456:                 # the state variables and then the next rule onto the
457:                 # recursion stack.
458:                 recursionStack.push([ rule, pattern, elementIdx, result,
459:                                       repeatMode, sfi ])
460:                 recursionStack.push(@rules[elToken])
461:                 # Now terminate all but the outer loops without doing anything
462:                 # else.
463:                 recur = true
464:                 break
465:               else
466:                 # We're back right after where the recursion started. Store
467:                 # the result and turn resume mode off again.
468:                 @stack.last.store(recursionResult, sfi)
469:                 resume = false
470:               end
471:             else
472:               # In case the element is a keyword or variable we have to get a
473:               # new token if we don't have one anymore.
474:               token = getNextToken unless token
475: 
476:               processNormalElements(elType, elToken, token)
477: 
478:               # The token has been consumed. Reset the variable.
479:               token = nil
480:               @@expectedTokens = []
481:             end
482:             elementIdx += 1
483:           end # of pattern while loop
484: 
485:           # Skip the rest of the loop in recur mode.
486:           break if recur
487: 
488:           elementIdx = 0
489: 
490:           # Once the complete pattern has been processed we call the
491:           # processing function for this pattern to operate on the value
492:           # array. Then pop the entry for this rule from the stack. The
493:           # called function will use @val and @sourceFileInfo to retrieve
494:           # data from the parser.
495:           @val = @stack.last.val
496:           @sourceFileInfo = @stack.last.sourceFileInfo
497:           res = @stack.last.function ? @stack.last.function.call : nil
498:           @stack.pop
499: 
500:           # If the rule is not repeatable we can store the result and break
501:           # the outer loop to exit the function.
502:           unless rule.repeatable
503:             result = res
504:             break
505:           end
506: 
507:           # Otherwise we append the result to the result array and turn repeat
508:           # mode on.
509:           result << res
510:           # We have completed the first iteration. Set the repeat mode flag to
511:           # indicate that further iterations are already re-runs.
512:           repeatMode = true
513:         end # of rule processing loop
514: 
515:         if recur
516:           recur = false
517:         else
518:           #Log.exit('parseRuleNR', "Finished rule #{rule.name}")
519:           recursionResult = result
520:         end
521:       end while !recursionStack.empty?
522: 
523:       return result
524:     end
parseRuleR(rule) click to toggle source

This function processes the input starting with the syntax description of rule. It recursively calls this function whenever the syntax description contains the reference to another rule. This recursive version has cleaner code and is about 8% faster than parseRuleNR.

     # File lib/TextParser.rb, line 306
306:     def parseRuleR(rule)
307:       #Log.enter('parseRuleR', "Parsing with rule #{rule.name}")
308:       #puts "Parsing with rule #{rule.name}"
309:       result = rule.repeatable ? TextParserResultArray.new : nil
310:       # Rules can be marked 'repeatable'. This flag will be set to true after
311:       # the first iteration has been completed.
312:       repeatMode = false
313:       loop do
314:         # At the beginning of a rule we need a token from the input to determine
315:         # which pattern of the rule needs to be processed.
316:         token = getNextToken
317: 
318:         return result unless (pattern = findPattern(rule, token, repeatMode))
319:         # The @stack will store the resulting value of each element in the
320:         # pattern.
321:         @stack << TextParser::StackElement.new(pattern.function)
322: 
323:         pattern.each do |element|
324:           # Separate the type and token text for pattern element.
325:           elType = element[0]
326:           elToken = element[1..1]
327:           if elType == !!
328:             # The element is a reference to another rule. Return the token if
329:             # we still have one and continue with the referenced rule.
330:             unless token.nil?
331:               sfi = token[2]
332:               returnToken(token)
333:               token = nil
334:             else
335:               sfi = nil
336:             end
337:             @stack.last.store(parseRuleR(@rules[elToken]), sfi)
338:             #Log << "Resuming rule #{rule.name}"
339:             #puts "Resuming rule #{rule.name}"
340:           else
341:             # In case the element is a keyword or variable we have to get a new
342:             # token if we don't have one anymore.
343:             token = getNextToken unless token
344: 
345:             processNormalElements(elType, elToken, token)
346: 
347:             # The token has been consumed. Reset the variable.
348:             token = nil
349:             @@expectedTokens = []
350:           end
351:         end
352: 
353:         # Once the complete pattern has been processed we call the processing
354:         # function for this pattern to operate on the value array. Then pop the
355:         # entry for this rule from the stack.
356:         @val = @stack.last.val
357:         @sourceFileInfo = @stack.last.sourceFileInfo
358:         res = nil
359:         res = @stack.last.function.call unless @stack.last.function.nil?
360:         @stack.pop
361: 
362:         # If the rule is not repeatable we can store the result and break the
363:         # outer loop to exit the function.
364:         unless rule.repeatable
365:           result = res
366:           break
367:         end
368: 
369:         # Otherwise we append the result to the result array and turn repeat
370:         # mode on.
371:         result << res
372:         repeatMode = true
373:       end
374: 
375:       #Log.exit('parseRuleR', "Finished rule #{rule.name}")
376:       #puts "Finished rule #{rule.name}"
377:       return result
378:     end
processNormalElements(elType, elToken, token) click to toggle source

Handle the elements that don’t trigger a recursion.

     # File lib/TextParser.rb, line 587
587:     def processNormalElements(elType, elToken, token)
588:       if elType == __
589:         # If the element requires a keyword the token must match this
590:         # keyword.
591:         if elToken != token[1]
592:           text = "'#{elToken}' expected but found " +
593:                  "'#{token[1]}' (#{token[0]})."
594:           unless @@expectedTokens.empty?
595:             text = "#{@@expectedTokens.join(', ')} or " + text
596:           end
597:           error('spec_keywork_expctd', text, token[2])
598:         end
599:         @stack.last.store(elToken, token[2])
600:       elsif elType == ..
601:         if token[0..1] != [ '.', '<END>' ]
602:           error('end_expected',
603:                 "Found garbage at expected end of text: #{token[1]}\n" +
604:                 "If you see this in the middle of your text, you probably " +
605:                 "have closed your context too early.", token[2])
606:         end
607:       else
608:         # The token must match the expected variable type.
609:         if token[0] != elToken
610:           text = "'#{elToken}' expected but found " +
611:                  "'#{token[1]}' (#{token[0]})."
612:           unless @@expectedTokens.empty?
613:             text = "#{@@expectedTokens.join(', ')} or " + text
614:           end
615:           error('spec_token_expctd', text, token[2])
616:         end
617:         # If the element is a variable store the value of the token.
618:         @stack.last.store(token[1], token[2])
619:       end
620:     end

Disabled; run with --debug to generate this.

[Validate]

Generated with the Darkfish Rdoc Generator 1.1.6.