encoding: UTF-8
Copyright © 2006, 2007, 2008, 2009, 2010, 2011
by Chris Schlaeger <chris@linux.com>
This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.
encoding: UTF-8
Copyright © 2006, 2007, 2008, 2009, 2010, 2011
by Chris Schlaeger <chris@linux.com>
This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.
The TextParser implements a somewhat modified LL(1) parser. It uses a dynamically compiled state machine. Dynamically means, that the syntax can be extended during the parse process. This allows support for languages that can extend their syntax during the parse process. The TaskJuggler syntax is such an beast.
This class is just a base class. A complete parser would derive from this class and implement the rule set and the functions _nextToken()_ and _returnToken()_. It also needs to set the array variables to declare all variables ($SOMENAME) that the scanner may deliver.
To describe the syntax the functions TextParser#pattern, TextParser#optional and TextParser#repeatable can be used. When the rule set is changed during parsing, TextParser#updateParserTables must be called to make the changes effective. The parser can also document the syntax automatically. To document a pattern, the functions TextParser#doc, TextParser#descr, TextParser#also and TextParser#arg can be used.
In contrast to conventional LL grammars, we use a slightly improved syntax descriptions. Repeated patterns are not described by recursive call but we use a repeat flag for syntax rules that consists of repeatable patterns. This removes the need for recursion elimination when compiling the state machine and makes the syntax a lot more readable. However, it adds a bit more complexity to the state machine. Optional patterns are described by a rule flag, not by adding an empty pattern.
To start parsing the input the function TextParser#parse needs to be called with the name of the start rule.
Create a new TextParser object.
# File lib/TextParser.rb, line 78 78: def initialize(messageHandler) 79: # The message handler will collect all error messages. 80: @messageHandler = messageHandler 81: # This Hash will store the ruleset that the parser is operating on. 82: @rules = { } 83: # Array to hold the token types that the scanner can return. 84: @variables = [] 85: # An list of token types that are not allowed in the current context. 86: # For performance reasons we use a hash with the token as key. The value 87: # is irrelevant. 88: @blockedVariables = {} 89: # The currently processed rule. 90: @cr = nil 91: 92: @states = {} 93: # The stack used by the FSM. 94: @stack = nil 95: end
# File lib/TextParser.rb, line 220 220: def error(id, text, sfi = nil, data = nil) 221: sfi ||= sourceFileInfo 222: if @scanner 223: # The scanner has some more context information, so we pass the error 224: # on to the TextScanner. 225: @scanner.error(id, text, sfi, data) 226: else 227: @messageHandler.error(id, text, sfi, data) 228: end 229: end
Call all methods that start with ‘rule_’ to initialize the rules.
# File lib/TextParser.rb, line 112 112: def initRules 113: methods.each do |m| 114: if m[0, 5] == 'rule_' 115: # Create a new rule with the suffix of the function name as name. 116: newRule(m[5..1]) 117: # Call the function. 118: send(m) 119: end 120: end 121: end
Limit the allowed tokens of the scanner to the subset passed by the tokenSet Array.
# File lib/TextParser.rb, line 99 99: def limitTokenSet(tokenSet) 100: return unless tokenSet 101: 102: # Create a copy of all supported variables. 103: blockedVariables = @variables.dup 104: # Then delete all that are in the limited set. 105: blockedVariables.delete_if { |v| tokenSet.include?(v) } 106: # And convert the list into a Hash for faster lookups. 107: @blockedVariables = {} 108: blockedVariables.each { |v| @blockedVariables[v] = true } 109: end
Add a new rule to the rule set. name must be a unique identifier. The function also sets the class variable @cr to the new rule. Subsequent calls to TextParser#pattern, TextParser#optional or TextParser#repeatable will then implicitely operate on the most recently added rule.
# File lib/TextParser.rb, line 128 128: def newRule(name) 129: # Use a symbol instead of a String. 130: name = name.intern 131: raise "Fatal Error: Rule #{name} already exists" if @rules.has_key?(name) 132: 133: if block_given? 134: saveCr = @cr 135: @rules[name] = @cr = TextParser::Rule.new(name) 136: yield 137: @cr = saveCr 138: else 139: @rules[name] = @cr = TextParser::Rule.new(name) 140: end 141: end
Identify the patterns of the most recently added rule as optional syntax elements.
# File lib/TextParser.rb, line 160 160: def optional 161: @cr.setOptional 162: end
To parse the input this function needs to be called with the name of the rule to start with. It returns the result of the processing function of the top-level parser rule that was specified by ruleName. In case of an error, the result is false.
# File lib/TextParser.rb, line 197 197: def parse(ruleName) 198: @stack = [] 199: @@expectedTokens = [] 200: begin 201: result = parseFSM(@rules[ruleName]) 202: rescue TjException => msg 203: if msg.message && !msg.message.empty? 204: @messageHandler.critical('parse', msg.message) 205: end 206: return false 207: end 208: 209: result 210: end
Add a new pattern to the most recently added rule. tokens is an array of strings that specify the syntax elements of the pattern. Each token must start with an character that identifies the type of the token. The following types are supported.
! a reference to another rule
$ a variable token as delivered by the scanner
_ a literal token.
func is a Proc object that is called whenever the parser has completed the processing of this rule.
# File lib/TextParser.rb, line 154 154: def pattern(tokens, func = nil) 155: @cr.addPattern(TextParser::Pattern.new(tokens, func)) 156: end
Identify the patterns of the most recently added rule as repeatable syntax elements.
# File lib/TextParser.rb, line 166 166: def repeatable 167: @cr.setRepeatable 168: end
Return the SourceFileInfo of the TextScanner at the beginning of the currently processed TextParser::Rule. Or return nil if we don’t have a current position.
# File lib/TextParser.rb, line 215 215: def sourceFileInfo 216: return @scanner.sourceFileInfo if @stack.nil? || @stack.length <= 1 217: @stack.last.firstSourceFileInfo 218: end
This function needs to be called whenever new rules or patterns have been added and before the next call to TextParser#parse. It’s perfectly ok to call this function from within a parse() call as long as the states that are currently on the stack have not been modified.
# File lib/TextParser.rb, line 174 174: def updateParserTables 175: saveFsmStack 176: # Invalidate some cached data. 177: @rules.each_value { |rule| rule.flushCache } 178: @states = {} 179: # Generate the parser states for all patterns of all rules. 180: @rules.each_value do |rule| 181: rule.generateStates.each do |s| 182: @states[[ s.rule, s.pattern, s.index ]] = s 183: end 184: checkRule(rule) 185: end 186: # Compute the transitions between the generated states. 187: @states.each_value do |state| 188: state.addTransitions(@states, @rules) 189: end 190: restoreFsmStack 191: end
# File lib/TextParser.rb, line 231 231: def warning(id, text, sfi = nil, data = nil) 232: sfi ||= sourceFileInfo 233: if @scanner 234: # The scanner has some more context information, so we pass the 235: # warning on to the TextScanner. 236: @scanner.warning(id, text, sfi, data) 237: else 238: @messageHandler.warning(id, text, sfi, data) 239: end 240: end
# File lib/TextParser.rb, line 244 244: def checkRule(rule) 245: if rule.patterns.empty? 246: raise "Rule #{rule.name} must have at least one pattern" 247: end 248: 249: rule.patterns.each do |pat| 250: pat.each do |type, name| 251: if type == :variable 252: if @variables.index(name).nil? 253: error('unsupported_token', 254: "The token #{name} is not supported here.") 255: end 256: elsif type == :reference 257: if @rules[name].nil? 258: raise "Fatal Error: Reference to unknown rule #{name} in " + 259: "pattern '#{pat}' of rule #{rule.name}" 260: end 261: end 262: end 263: end 264: end
# File lib/TextParser.rb, line 405 405: def dumpStack 406: #puts "Stack level #{@stack.length}" 407: @stack.each do |sl| 408: print "#{@stack.index(sl)}: " 409: sl.each do |v| 410: if v.is_a?(Array) 411: begin 412: print "[#{v.join('|')}]|" 413: rescue 414: print "[#{v[0].class}...]|" 415: end 416: else 417: begin 418: print "#{v}|" 419: rescue 420: print v.class 421: end 422: end 423: end 424: print " -> #{sl.state ? sl.state.to_s(true) : 'nil'} #{sl.function.nil? ? '' : '(Called)'}" 425: puts "" 426: end 427: end
# File lib/TextParser.rb, line 355 355: def finishPattern(token) 356: #dumpStack 357: # To finish a pattern we need to pop the StackElement with the token 358: # values from the stack. 359: stackEntry = @stack.pop 360: if stackEntry.nil? || @stack.empty? 361: # Check if we have reached the bottom of the stack. 362: token = getNextToken unless token 363: if token[0] == :endOfText 364: # If the token is the end of the top-level file, we're done. We push 365: # back the StackEntry since it holds the overall result of the 366: # parsing. 367: @stack.push(stackEntry) 368: return true 369: end 370: # If it's not the EOF token, we found a token that violates the syntax 371: # rules. 372: error('unexpctd_token', "Unexpected token '#{token[1]}' found. " + 373: "Expecting one of " + 374: "#{stackEntry.state.expectedTokens.join(', ')}") 375: end 376: # Memorize if the rule for this pattern was repeatable. Then we will 377: # store the result of the pattern in an Array. 378: ruleIsRepeatable = stackEntry.state.rule.repeatable 379: 380: state = stackEntry.state 381: result = nil 382: if state.pattern.function 383: # Make the token values and their SourceFileInfo available. 384: @val = stackEntry.val 385: @sourceFileInfo = stackEntry.sourceFileInfo 386: # Now call the pattern action to compute the value of the pattern. 387: begin 388: result = state.pattern.function.call 389: rescue AttributeOverwrite 390: @scanner.warning('attr_overwrite', $!.to_s) 391: end 392: end 393: 394: # We use the SourceFileInfo of the first token of the pattern to store 395: # it with the result of the pattern. 396: firstSourceFileInfo = stackEntry.firstSourceFileInfo 397: # Store the result at the correct position into the next lower level of 398: # the stack. 399: stackEntry = @stack.last 400: stackEntry.insert(stackEntry.state.index, result, 401: firstSourceFileInfo, ruleIsRepeatable) 402: false 403: end
# File lib/TextParser.rb, line 452 452: def getNextToken 453: token = nextToken 454: #Log << "Token: [#{token[0]}][#{token[1]}]" 455: if @blockedVariables[token[0]] 456: error('unsupported_token', 457: "The token #{token[1]} is not supported in this context.", 458: token[2]) 459: end 460: token 461: end
# File lib/TextParser.rb, line 266 266: def parseFSM(rule) 267: unless (state = @states[[ rule, nil, 0 ]]) 268: error("no_start_state", "No start state for rule #{rule.name} found") 269: end 270: @stack = [ TextParser::StackElement.new(nil, state) ] 271: 272: loop do 273: if state.transitions.empty? 274: # The final states of each pattern have no pre-compiled transitions. 275: # For such a state, we don't need to get a new token. 276: transition = token = nil 277: else 278: transition = state.transition(token = getNextToken) 279: end 280: 281: if transition 282: # Shift: This for normal state transitions. This may be from one 283: # token of a pattern to the next token of the same pattern, to the 284: # start of a new pattern or a loop-back to the start of a pattern of 285: # the same rule. The transition tells us what state we have to 286: # process next. 287: state = transition.state 288: 289: # If we have looped-back we need to finish the pattern first. Final 290: # tokens of repeatable rules do have transitions! 291: finishPattern(token) if transition.loopBack 292: 293: # Transitions that enter rules generate states which we need to 294: # resume at when a rule has been completely processed. We push this 295: # list of states on the @stack. 296: stackElement = @stack.last 297: first = true 298: transition.stateStack.each do |s| 299: if first && s.pattern == stackElement.state.pattern 300: # The first state in the list may just be another state of the 301: # current pattern. In this case, we already have the 302: # StackElement on the @stack. We only need to update the State 303: # for the current StackElement. 304: stackElement.state = s 305: else 306: # For other patterns, we just push a new StackElement onto the 307: # @stack. 308: @stack.push(TextParser::StackElement.new(nil, s)) 309: end 310: first = false 311: end 312: 313: if state.index == 0 314: # If we have just started with a new pattern (or loop-ed back) we 315: # need to push a new StackEntry onto the @stack. The StackEntry 316: # stores the result of the pattern and keeps the State that we 317: # need to return to in case we jump to other patterns from this 318: # pattern. 319: function = state.index == state.pattern.tokens.length - 1 ? 320: state.pattern.function : nil 321: @stack.push(TextParser::StackElement.new(state.pattern.function, 322: state)) 323: end 324: 325: # Store the token value in the result Array. 326: @stack.last.insert(state.index, token[1], token[2], false) 327: else 328: # Reduce: We've reached the end of a rule. There is no pre-compiled 329: # transition available. The current token, if we have one, is of no 330: # use to us during this state. We just return it to the scanner. The 331: # next state is determined by the first matching state from the 332: # @stack. 333: if state.noReduce 334: # Only states that finish a rule may trigger a reduce operation. 335: # Other states have the noReduce flag set. If a reduce for such a 336: # state is triggered, we found a token that is not supported by 337: # the syntax rules. 338: error("no_reduce", 339: "Unexpected token '#{token[1]}' found. " + 340: "Expecting one of " + 341: "#{@stack.last.state.expectedTokens.join(', ')}") 342: end 343: returnToken(token) if token 344: if finishPattern(token) 345: # Accept: We're done with parsing. 346: break 347: end 348: state = @stack.last.state 349: end 350: end 351: 352: @stack[0].val[0] 353: end
Convert the FSM stack state entries from [ rule, pattern, index ] into the respective State objects again.
# File lib/TextParser.rb, line 442 442: def restoreFsmStack 443: return unless @stack 444: 445: @stack.each do |s| 446: next unless (state = @states[s.state]) 447: raise "Stack restore failed. Cannot find state" unless state 448: s.state = state 449: end 450: end
Convert the FSM stack state entries from State objects into [ rule, pattern, index ] equivalents.
# File lib/TextParser.rb, line 431 431: def saveFsmStack 432: return unless @stack 433: 434: @stack.each do |s| 435: next unless (st = s.state) 436: s.state = [ st.rule, st.pattern, st.index ] 437: end 438: end
Disabled; run with --debug to generate this.
Generated with the Darkfish Rdoc Generator 1.1.6.