The TextScanner class is an abstract text scanner with support for nested include files and text macros. The tokenizer will operate on rules that must be provided by a derived class. The scanner is modal. Each mode operates only with the subset of token patterns that are assigned to the current mode. The current line is tracked accurately and can be used for error reporting. The scanner can operate on Strings or Files.
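Create a new instance of TextScanner. masterFile must be a String that contains either the name of the file to start with or the text itself. messageHandler is a MessageHandler that is used for error messages. tokenPatterns is an Array of token descriptions. Each entry has the form [ type, regExp, mode, postProc ] and is passed on to addPattern; mode defaults to :tjp and postProc is optional. defaultMode is the mode the scanner starts in.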
```ruby
# File lib/TextScanner.rb, line 188
def initialize(masterFile, messageHandler, tokenPatterns, defaultMode)
  @masterFile = masterFile
  @messageHandler = messageHandler
  # This table contains all macros that may be expanded when found in the
  # text.
  @macroTable = MacroTable.new(messageHandler)
  # The currently processed IO object.
  @cf = nil
  # This Array stores the currently processed nested files. It's an Array
  # of Arrays. The nested Array consists of 2 elements, the IO object and
  # the @tokenBuffer.
  @fileStack = []
  # This flag is set if we have reached the end of a file. We only know
  # that the file is really finished when the next token is requested, so
  # we have to remember it in this flag.
  @finishLastFile = false
  # True if the scanner operates on a buffer.
  @fileNameIsBuffer = false
  # A SourceFileInfo of the start of the currently processed token.
  @startOfToken = nil
  # Line number correction for error messages.
  @lineDelta = 0
  # Lists of regexps that describe the detectable tokens. The Arrays are
  # grouped by mode.
  @patternsByMode = { }
  # The currently active scanner mode.
  @scannerMode = nil
  # Points to the currently active pattern set as defined by the mode.
  @activePatterns = nil

  tokenPatterns.each do |pat|
    type = pat[0]
    regExp = pat[1]
    mode = pat[2] || :tjp
    postProc = pat[3]
    addPattern(type, regExp, mode, postProc)
  end
  self.mode = defaultMode
end
```
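The constructor expects tokenPatterns as an Array of small Arrays. The sketch below shows one plausible table in that layout and mirrors the unpacking loop, including the :tjp fallback when no mode is given. The token names, the regexps and the :macroCall mode are hypothetical examples; only :tjp appears in the source above.

```ruby
# Hypothetical token table in the layout the constructor expects:
# [ type, regexp, mode(s), postProc ]; mode and postProc are optional.
TOKEN_PATTERNS = [
  [ nil,      /\s+/ ],                        # nil type: matched but ignored
  [ :ID,      /[a-zA-Z_]\w*/ ],               # mode falls back to :tjp
  [ :INTEGER, /\d+/, [ :tjp, :macroCall ],    # active in two (assumed) modes
    ->(type, str) { [ type, str.to_i ] } ],   # post-processor converts the value
  [ :STRING,  /"[^"]*"/, :macroCall ]
]

# Mirror of the loop in initialize: unpack each entry and apply the default mode.
def normalize_patterns(patterns)
  patterns.map do |pat|
    type, regexp, mode, post_proc = pat   # missing entries become nil
    [ type, regexp, mode || :tjp, post_proc ]
  end
end

normalize_patterns(TOKEN_PATTERNS).each do |type, re, mode, _post_proc|
  puts "#{type.inspect} #{re.inspect} -> #{mode.inspect}"
end
```

Keeping the mode optional lets most patterns stay short while a few mode-specific ones spell out their modes explicitly.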
Add a Macro to the macro translation table.
```ruby
# File lib/TextScanner.rb, line 440
def addMacro(macro)
  @macroTable.add(macro)
end
```
Add a new pattern to the scanner. type is either nil for tokens that will be ignored, or an identifier that will be returned with each token of this type. regExp is the Regexp that describes the token. mode identifies the scanner mode(s) in which the pattern is active: a single mode is given directly, multiple modes are given as an Array. postProc is an optional method reference that is called after the token has been detected. It receives the type and the matching String and returns both, possibly modified, in an Array.
```ruby
# File lib/TextScanner.rb, line 236
def addPattern(type, regExp, mode, postProc = nil)
  if mode.is_a?(Array)
    mode.each do |m|
      # The pattern is active in multiple modes
      @patternsByMode[m] = [] unless @patternsByMode.include?(m)
      @patternsByMode[m] << [ type, regExp, postProc ]
    end
  else
    # The pattern is only active in one specific mode.
    @patternsByMode[mode] = [] unless @patternsByMode.include?(mode)
    @patternsByMode[mode] << [ type, regExp, postProc ]
  end
end
```
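The per-mode grouping can be sketched as a stand-alone class. PatternTable is hypothetical and not part of TaskJuggler; it assumes the same [ type, regExp, postProc ] entry layout. Array(mode) turns a single mode into a one-element Array, so the two branches of addPattern collapse into one, and a Hash default block creates the per-mode Array on first use.

```ruby
# Hypothetical stand-alone version of the mode-grouped pattern storage.
class PatternTable
  def initialize
    # Create an empty pattern list the first time a mode is referenced.
    @patterns_by_mode = Hash.new { |hash, mode| hash[mode] = [] }
  end

  # Register a pattern for one mode or for every mode in an Array.
  def add(type, regexp, mode, post_proc = nil)
    Array(mode).each do |m|
      @patterns_by_mode[m] << [ type, regexp, post_proc ]
    end
  end

  # Return the patterns of a mode, or nil if no pattern was added for it.
  # Hash#fetch does not trigger the default block, so lookups stay side-effect free.
  def for_mode(mode)
    @patterns_by_mode.fetch(mode, nil)
  end
end

table = PatternTable.new
table.add(:ID, /\w+/, :tjp)
table.add(:SPACE, /\s+/, [ :tjp, :macroCall ])  # :macroCall is an assumed mode name
p table.for_mode(:tjp).map(&:first)        # => [:ID, :SPACE]
p table.for_mode(:macroCall).map(&:first)  # => [:SPACE]
p table.for_mode(:unknown)                 # => nil
```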
Finish processing and reset all data structures.
```ruby
# File lib/TextScanner.rb, line 278
def close
  unless @fileNameIsBuffer
    Log.startProgressMeter("Reading file #{@masterFile}")
    Log.stopProgressMeter
  end
  @fileStack = []
  @cf = @tokenBuffer = nil
end
```
Call this function to report any errors related to the parsed input.
```ruby
# File lib/TextScanner.rb, line 469
def error(id, text, sfi = nil, data = nil)
  message(:error, id, text, sfi, data)
end
```
Expand a macro and inject it into the input stream. prefix is any string that was found right before the macro call. We have to inject it before the expanded macro. args is an Array of Strings. The first is the macro name, the rest are the parameters.
```ruby
# File lib/TextScanner.rb, line 453
def expandMacro(prefix, args)
  # Get the expanded macro from the @macroTable.
  macro, text = @macroTable.resolve(args, sourceFileInfo)
  unless macro && text
    error('undefined_macro', "Undefined macro '#{args[0]}' called")
  end

  # If the expanded macro is empty, we can ignore it.
  return if text == ''

  unless @cf.injectMacro(macro, args, prefix + text)
    error('macro_stack_overflow', "Too many nested macro calls.")
  end
end
```
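To make the injection step concrete, here is a self-contained sketch of a String-backed stream that inserts prefix plus expanded text at the current read position and refuses to nest beyond a limit. InjectableStream, MACROS, expand_macro and the limit of 20 are all hypothetical; the real stream handle and MacroTable are not shown in this page, and the sketch only mirrors the order of steps in expandMacro (resolve, skip empty text, inject, report overflow).

```ruby
# Hypothetical stand-in for the input stream's macro injection. Expanded text
# is inserted at the read position; each active expansion remembers where its
# text ends so it can be dropped from the stack once fully consumed.
class InjectableStream
  MAX_MACRO_DEPTH = 20 # assumed limit, not taken from TaskJuggler

  attr_reader :macro_stack

  def initialize(text)
    @buffer = text.dup
    @pos = 0
    @macro_stack = []
  end

  # Returns false instead of injecting when the nesting limit is reached.
  def inject_macro(name, args, text)
    return false if @macro_stack.size >= MAX_MACRO_DEPTH
    @buffer.insert(@pos, text)
    # Inserting at @pos moves the end of every enclosing expansion back.
    @macro_stack.each { |entry| entry[:end] += text.length }
    @macro_stack << { name: name, args: args, end: @pos + text.length }
    true
  end

  # Consume up to count characters and pop expansions that are fully read.
  def read(count)
    chunk = @buffer[@pos, count]
    @pos += chunk.length
    @macro_stack.pop while !@macro_stack.empty? && @macro_stack.last[:end] <= @pos
    chunk
  end

  def rest
    @buffer[@pos..-1]
  end
end

MACROS = { 'greeting' => 'Hello' } # hypothetical macro table

# Mirrors the steps of expandMacro: resolve, skip empty text, inject.
def expand_macro(stream, prefix, args)
  text = MACROS[args[0]]
  raise "Undefined macro '#{args[0]}' called" unless text
  return if text.empty?
  unless stream.inject_macro(args[0], args, prefix + text)
    raise 'Too many nested macro calls.'
  end
end

stream = InjectableStream.new(' world')
expand_macro(stream, 'Say: ', [ 'greeting' ])
puts stream.rest # prints "Say: Hello world"
```

Tracking the end offset of each expansion is one way to know when a macro call is finished; the actual handle may track this differently.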
Return the name of the currently processed file. If we are working on a text buffer, the text will be returned.
```ruby
# File lib/TextScanner.rb, line 334
def fileName
  @cf ? @cf.fileName : @masterFile
end
```
Continue processing with a new file specified by includeFileName. When this file is finished, processing continues in the old file right after the point where the new file was included. A relative includeFileName is interpreted relative to the directory of the including file. The method returns the resulting name of the included file.
```ruby
# File lib/TextScanner.rb, line 291
def include(includeFileName, sfi)
  if includeFileName[0] != '/'
    # If the included file is not an absolute name, we interpret the file
    # name relative to the including file.
    pathOfCallingFile = @fileStack.last[0].dirname
    path = pathOfCallingFile.empty? ? '' : pathOfCallingFile + '/'
    includeFileName = path + includeFileName
  end

  # Try to detect recursive inclusions. This will not work if files are
  # accessed via filesystem links.
  @fileStack.each do |entry|
    if includeFileName == entry[0].fileName
      error('include_recursion',
            "Recursive inclusion of #{includeFileName} detected", sfi)
    end
  end

  # Save @tokenBuffer in the record of the parent file.
  @fileStack.last[1] = @tokenBuffer unless @fileStack.empty?
  @tokenBuffer = nil
  @finishLastFile = false

  # Open the new file and push the handle on the @fileStack.
  begin
    @fileStack << [ (@cf = FileStreamHandle.new(includeFileName)), nil ]
    Log << "Parsing file #{includeFileName}"
  rescue StandardError
    error('bad_include', "Cannot open include file #{includeFileName}", sfi)
  end

  # Return the name of the included file.
  includeFileName
end
```
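The first two steps of include, resolving a relative name and rejecting recursion, can be sketched with plain file names. resolve_include is a hypothetical helper; the file stack here is simply an Array of names (assumed non-empty), and File.dirname returning "." stands in for the empty dirname case of the original. Names are compared as strings, so, as noted in the code above, two paths that reach the same file through a link are not recognized as identical.

```ruby
# Hypothetical helper: resolve an include name against the including file
# (the last entry of file_stack) and refuse recursive inclusion.
def resolve_include(include_name, file_stack)
  unless include_name.start_with?('/')
    dir = File.dirname(file_stack.last)
    # File.dirname returns "." for a bare file name; keep such names as-is.
    include_name = File.join(dir, include_name) unless dir == '.'
  end
  if file_stack.include?(include_name)
    raise ArgumentError, "Recursive inclusion of #{include_name} detected"
  end
  include_name
end

stack = [ 'projects/main.tjp' ]
stack << resolve_include('resources.tji', stack)
p stack # => ["projects/main.tjp", "projects/resources.tji"]
```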
Return true if the Macro name has been added already.
```ruby
# File lib/TextScanner.rb, line 445
def macroDefined?(name)
  @macroTable.include?(name)
end
```
Switch the scanner to another mode. From then on, the scanner only detects patterns of newMode. Switching to a mode without any patterns raises an exception.
```ruby
# File lib/TextScanner.rb, line 252
def mode=(newMode)
  #puts "**** New mode: #{newMode}"
  @activePatterns = @patternsByMode[newMode]
  raise "Undefined mode #{newMode}" unless @activePatterns
  @scannerMode = newMode
end
```
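The effect of a mode switch is easiest to see on a tiny modal scanner built on Ruby's StringScanner. ModalScanner, its mode names and patterns are hypothetical. One design choice differs from mode= above: the sketch validates the new mode before touching any state, so a failed switch leaves the previous pattern set active.

```ruby
require 'strscan'

# Hypothetical two-mode scanner: the same input is tokenized differently
# depending on the active mode.
class ModalScanner
  def initialize(text, patterns_by_mode, mode)
    @scanner = StringScanner.new(text)
    @patterns_by_mode = patterns_by_mode
    self.mode = mode
  end

  # Refuse to switch to a mode without patterns; check before assigning.
  def mode=(new_mode)
    patterns = @patterns_by_mode[new_mode]
    raise "Undefined mode #{new_mode}" unless patterns
    @active = patterns
    @mode = new_mode
  end

  # Return [type, text] for the first active pattern that matches, or nil.
  def next_token
    @active.each do |type, re|
      text = @scanner.scan(re)
      return [ type, text ] if text
    end
    nil
  end
end

PATTERNS = {
  word:  [ [ :WORD, /[a-z]+/ ], [ :OTHER, /./ ] ],
  chars: [ [ :CHAR, /./ ] ]
}
s = ModalScanner.new('abc def', PATTERNS, :word)
p s.next_token # => [:WORD, "abc"]
s.mode = :chars
p s.next_token # => [:CHAR, " "]
p s.next_token # => [:CHAR, "d"]
```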
Return the next token from the input stream. The result is an Array with 3 entries: the token type, the token String and the SourceFileInfo where the token started.
```ruby
# File lib/TextScanner.rb, line 353
def nextToken
  # If we have a pushed-back token, return that first.
  unless @tokenBuffer.nil?
    res = @tokenBuffer
    @tokenBuffer = nil
    return res
  end

  if @finishLastFile
    # The previously processed file has now really been processed to
    # completion. Close it and remove the corresponding entry from the
    # @fileStack.
    @finishLastFile = false
    #Log << "Completed file #{@cf.fileName}"
    @cf.close if @cf
    @fileStack.pop

    if @fileStack.empty?
      # We are done with the top-level file now.
      @cf = @tokenBuffer = nil
      @finishLastFile = true
      return [ :endOfText, '<EOT>', @startOfToken ]
    else
      # Continue parsing the file that included the current file.
      @cf, tokenBuffer = @fileStack.last
      Log << "Parsing file #{@cf.fileName} ..."
      # If we have a left over token from previously processing this file,
      # return it now.
      if tokenBuffer
        @finishLastFile = true if tokenBuffer[0] == :eof
        return tokenBuffer
      end
    end
  end

  # Start processing characters from the input.
  @startOfToken = sourceFileInfo
  loop do
    match = nil
    begin
      @activePatterns.each do |type, re, postProc|
        if (match = @cf.scan(re))
          if match == :scannerEOF
            # We've found the end of an input file. Return a special token
            # that describes the end of a file.
            @finishLastFile = true
            return [ :eof, '<END>', @startOfToken ]
          end

          raise "#{re} matches empty string" if match.empty?
          # If we have a post processing method, call it now. It may modify
          # the type or the found token String.
          type, match = postProc.call(type, match) if postProc

          break if type.nil? # Ignore certain tokens with nil type.

          return [ type, match, @startOfToken ]
        end
      end
    rescue ArgumentError
      error('scan_encoding_error', $!.to_s)
    end

    if match.nil?
      if @cf.eof?
        error('unexpected_eof',
              "Unexpected end of file found")
      else
        error('no_token_match',
              "Unexpected characters found: '#{@cf.peek(10)}...'")
      end
    end
  end
end
```
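The core of the matching loop in nextToken can be reduced to a single in-memory input without includes, macros or source positions. tokenize and TOKEN_RULES are hypothetical; the sketch keeps the essential rules of the loop above: the first matching pattern wins, empty matches are a bug, a post-processor may rewrite the token, nil-typed tokens are skipped, unmatched input is an error, and the end of input yields an :eof token.

```ruby
require 'strscan'

# Simplified sketch of the nextToken matching loop for one String input.
# Each pattern entry is [type, regexp, postProc]; postProc is optional.
def tokenize(text, patterns)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    matched = patterns.any? do |type, re, post_proc|
      match = scanner.scan(re)
      next false unless match
      raise "#{re.inspect} matches empty string" if match.empty?
      type, match = post_proc.call(type, match) if post_proc
      tokens << [ type, match ] unless type.nil? # nil type: ignore token
      true
    end
    raise "Unexpected characters found: '#{scanner.peek(10)}...'" unless matched
  end
  tokens << [ :eof, '<END>' ]
end

TOKEN_RULES = [
  [ nil,      /\s+/ ],
  [ :INTEGER, /\d+/, ->(type, str) { [ type, str.to_i ] } ],
  [ :ID,      /[a-z]+/ ]
]
p tokenize('task 42', TOKEN_RULES)
# => [[:ID, "task"], [:INTEGER, 42], [:eof, "<END>"]]
```

The real method returns one token per call instead of collecting all of them, which is what makes the one-token pushback of returnToken possible.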
Start the processing. If fileNameIsBuffer is true, the scanner operates on the String passed as masterFile; otherwise masterFile is treated as the name of a file.
```ruby
# File lib/TextScanner.rb, line 262
def open(fileNameIsBuffer = false)
  @fileNameIsBuffer = fileNameIsBuffer
  if fileNameIsBuffer
    @fileStack = [ [ @cf = BufferStreamHandle.new(@masterFile), nil ] ]
  else
    begin
      @fileStack = [ [ @cf = FileStreamHandle.new(@masterFile), nil ] ]
    rescue StandardError
      error('open_file', "Cannot open file #{@masterFile}")
    end
  end
  @masterPath = @cf.dirname + '/'
  @tokenBuffer = nil
end
```
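The buffer-or-file decision in open can be illustrated with standard library streams. open_stream is a hypothetical helper that uses StringIO and File in place of BufferStreamHandle and FileStreamHandle, whose interfaces are not documented here. Open failures are turned into an IOError with a readable message, roughly in the spirit of the 'open_file' error above.

```ruby
require 'stringio'

# Hypothetical helper: return a String-backed or File-backed stream.
# A File opened here must be closed by the caller.
def open_stream(master, file_name_is_buffer = false)
  if file_name_is_buffer
    StringIO.new(master)
  else
    File.open(master, 'r')
  end
rescue SystemCallError => e
  raise IOError, "Cannot open file #{master}: #{e.message}"
end

io = open_stream("line one\nline two\n", true)
puts io.gets # prints "line one"
```

Because both StringIO and File respond to the same reading methods, the scanner code after open does not need to know which kind of input it is working on.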
Push a token back so that the next nextToken() call returns it again. Only one token can be pushed back between two nextToken() calls; pushing back a second one raises an error.
```ruby
# File lib/TextScanner.rb, line 430
def returnToken(token)
  #Log << "-> Returning Token: [#{token[0]}][#{token[1]}]"
  unless @tokenBuffer.nil?
    $stderr.puts @tokenBuffer
    raise "Fatal Error: Cannot return more than 1 token in a row"
  end
  @tokenBuffer = token
end
```
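The interplay of nextToken and returnToken is a classic one-token lookahead. The sketch wraps a plain Array of tokens in a hypothetical TokenSource class; a parser can read a token, decide it belongs to a different rule, and hand it back so the next read returns it again.

```ruby
# Minimal sketch of one-token pushback; TokenSource is hypothetical.
class TokenSource
  def initialize(tokens)
    @tokens = tokens.dup
    @buffer = nil
  end

  # Return the pushed-back token first, then the remaining input.
  def next_token
    if @buffer
      token, @buffer = @buffer, nil
      return token
    end
    @tokens.shift || [ :endOfText, '<EOT>' ]
  end

  # Only a single token may be pushed back before the next read.
  def return_token(token)
    raise 'Cannot return more than 1 token in a row' if @buffer
    @buffer = token
  end
end

src = TokenSource.new([ [ :ID, 'task' ], [ :ID, 'foo' ] ])
tok = src.next_token  # [:ID, "task"]
src.return_token(tok) # e.g. the parser wants to try another rule
p src.next_token      # => [:ID, "task"]
p src.next_token      # => [:ID, "foo"]
```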
Return the SourceFileInfo for the current processing position.
```ruby
# File lib/TextScanner.rb, line 327
def sourceFileInfo
  @cf ? SourceFileInfo.new(fileName, @cf.lineNo - @lineDelta, 0) :
        SourceFileInfo.new(@masterFile, 0, 0)
end
```
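How the stream handle computes lineNo is not shown on this page. As one possible approach, the sketch derives a 1-based line number by counting newlines before the StringScanner position and then subtracts a correction, like the lineNo - @lineDelta expression above. SourcePos and source_pos are hypothetical; the column is fixed at 0 as in sourceFileInfo.

```ruby
require 'strscan'

# Hypothetical position record standing in for SourceFileInfo.
SourcePos = Struct.new(:file, :line, :column)

# Count the newlines consumed so far to get a 1-based line number,
# then apply the line correction.
def source_pos(file, scanner, line_delta = 0)
  line_no = scanner.string[0, scanner.pos].count("\n") + 1
  SourcePos.new(file, line_no - line_delta, 0)
end

sc = StringScanner.new("a\nb\nc")
sc.scan(/a\nb\n/)
p source_pos('main.tjp', sc).line    # => 3
p source_pos('main.tjp', sc, 1).line # => 2
```

Counting from the start of the string on every call is simple but linear in the input size; a real scanner would more likely keep a running counter.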
Forward an error or warning to the MessageHandler. If the message is generated while a macro is being expanded, the macro call history is reported first, innermost call first.

```ruby
# File lib/TextScanner.rb, line 479
def message(type, id, text, sfi, data)
  unless text.empty?
    line = @cf ? @cf.line : nil
    sfi ||= sourceFileInfo

    if @cf && !@cf.macroStack.empty?
      @messageHandler.info('macro_stack', 'Macro call history:', nil)

      @cf.macroStack.reverse_each do |entry|
        macro = entry.macro
        # args[0] is the macro name; list only the parameters.
        args = entry.args[1..-1]
        args.collect! { |a| '"' + a + '"' }
        @messageHandler.info('macro_stack',
                             " ${#{macro.name} #{args.join(' ')}}",
                             macro.sourceFileInfo)
      end
    end

    case type
    when :error
      @messageHandler.error(id, text, sfi, line, data)
    when :warning
      @messageHandler.warning(id, text, sfi, line, data)
    else
      raise "Unknown message type #{type}"
    end
  end
end
```
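The macro call history lines can be produced without a MessageHandler. MacroInfo, MacroCall and macro_history are hypothetical stand-ins for the entries on the macro stack, assuming each entry exposes macro.name and args, where args[0] is the macro name and the remaining elements are the parameters. The stack is walked in reverse so the innermost call is listed first.

```ruby
# Hypothetical stand-ins for the entries on the macro stack.
MacroInfo = Struct.new(:name)
MacroCall = Struct.new(:macro, :args)

# Build one history line per active macro call, innermost first,
# with the parameters (args without the leading macro name) quoted.
def macro_history(stack)
  stack.reverse_each.map do |entry|
    params = entry.args[1..-1].map { |a| '"' + a + '"' }
    " ${#{entry.macro.name} #{params.join(' ')}}"
  end
end

outer = MacroCall.new(MacroInfo.new('wrap'), [ 'wrap', 'a' ])
inner = MacroCall.new(MacroInfo.new('greet'), [ 'greet', 'Bob', 'Ann' ])
puts macro_history([ outer, inner ])
# prints:
#  ${greet "Bob" "Ann"}
#  ${wrap "a"}
```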
Generated with the Darkfish Rdoc Generator 1.1.6.