The TextScanner class scans text files and chops them into tokens for use by a parser. Files can be nested: a file can include another file.
Create a new instance of TextScanner. masterFile must be a String that either contains the name of the file to start with or the text itself. messageHandler is a MessageHandler that is used for error messages.
# File lib/TextScanner.rb, line 158
def initialize(masterFile, messageHandler)
  @masterFile = masterFile
  @messageHandler = messageHandler
  # This table contains all macros that may be expanded when found in the
  # text.
  @macroTable = MacroTable.new(messageHandler)
  # This Array stores the currently processed nested files. It's an Array
  # of Arrays. The nested Array consists of 3 elements, the @cf,
  # @tokenBuffer and the @pos of the file.
  @fileStack = []
  # This Array stores the currently processed nested macros.
  @macroStack = []
  # In certain situation we want to ignore Macro replacement and this flag
  # is set to true.
  @ignoreMacros = false
  @fileNameIsBuffer = false
end
Add a Macro to the macro translation table.
# File lib/TextScanner.rb, line 347
def addMacro(macro)
  @macroTable.add(macro)
end
Finish processing and reset all data structures.
# File lib/TextScanner.rb, line 194
def close
  unless @fileNameIsBuffer
    Log.startProgressMeter("Reading file #{@masterFile}")
    Log.stopProgressMeter
  end
  @fileStack = []
  @cf = @tokenBuffer = nil
end
Call this function to report any errors related to the parsed input.
# File lib/TextScanner.rb, line 373
def error(id, text, property = nil)
  message('error', id, text, property)
end
# File lib/TextScanner.rb, line 356
def expandMacro(args)
  macro, text = @macroTable.resolve(args, sourceFileInfo)
  return if text == ''

  if @macroStack.length > 20
    error('macro_stack_overflow', "Too many nested macro calls.")
  end
  @macroStack << [ macro, args ]
  # Mark end of macro with a 0 element
  @cf.charBuffer << 0
  text.reverse.each_utf8_char do |c|
    @cf.charBuffer << c
  end
  @cf.line = ''
end
Return the name of the currently processed file. If we are working on a text buffer, the text will be returned.
# File lib/TextScanner.rb, line 247
def fileName
  @cf ? @cf.fileName : @masterFile
end
Continue processing with a new file specified by fileName. When this file is finished, we will continue in the old file after the location where we started with the new file.
# File lib/TextScanner.rb, line 206
def include(fileName)
  if fileName[0] != '/'
    if @fileStack.empty?
      path = @masterPath == './' ? '' : @masterPath
    else
      pathOfCallingFile = @fileStack.last[0].dirname
      path = pathOfCallingFile.empty? ? '' : pathOfCallingFile + '/'
      @fileStack.last[1, 2] = [ @tokenBuffer, @pos ]
    end
    # If the included file is not an absolute name, we interpret the file
    # name relative to the including file.
    fileName = path + fileName
  end

  @tokenBuffer = nil

  # Check the current file stack to find recursions.
  if @pos && fileName == @pos.fileName
    error('include_recursion', "Recursive inclusion of #{fileName} detected")
  else
    @fileStack.each do |entry|
      if fileName == entry[0].fileName
        error('include_recursion', "Recursive inclusion of #{fileName} " +
              "detected")
      end
    end
  end
  begin
    @fileStack << [ (@cf = FileStreamHandle.new(fileName)), nil, nil ]
  rescue StandardError
    error('bad_include', "Cannot open include file #{fileName}")
  end
end
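The two ideas in include, resolving a relative name against the including file and refusing a file that is already open, can be tried in isolation. The following is a minimal sketch, not the TextScanner code: resolve_include and recursive_include? are hypothetical helpers, the file names are made up, and the file stack is modelled as a plain Array of names.

```ruby
# Hypothetical helper: interpret a relative include name relative to the
# directory of the including file. Absolute names are used unchanged.
def resolve_include(including_file, included_name)
  return included_name if included_name.start_with?('/')

  dir = File.dirname(including_file)
  dir == '.' ? included_name : File.join(dir, included_name)
end

# Hypothetical helper: an include is recursive if the file is already
# somewhere on the stack of currently open files.
def recursive_include?(open_files, file_name)
  open_files.include?(file_name)
end

stack = ['projects/main.tjp']
inc = resolve_include(stack.last, 'resources.tji')
puts inc                                # projects/resources.tji
puts recursive_include?(stack, inc)     # false
stack << inc
# The included file tries to include itself.
puts recursive_include?(stack, resolve_include(stack.last, 'resources.tji'))  # true
```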
Return true if the Macro name has been added already.
# File lib/TextScanner.rb, line 352
def macroDefined?(name)
  @macroTable.include?(name)
end
# File lib/TextScanner.rb, line 381
def message(type, id, text, property)
  unless text.empty?
    message = Message.new(id, type, text + "\n" + line.to_s,
                          property, nil, sourceFileInfo)
    @messageHandler.send(message)

    until @macroStack.empty?
      macro, args = @macroStack.pop
      args.collect! { |a| '"' + a + '"' }
      message = Message.new('macro_stack', 'info',
                            " #{macro.name} #{args.join(' ')}", nil, nil,
                            macro.sourceFileInfo)
      @messageHandler.send(message)
    end
  end

  # An empty string signals an already reported error
  raise TjException.new, '' if type == 'error'
end
Scan for the next token in the input stream and return it. The result will be an Array of the form [ TokenType, TokenValue ].
# File lib/TextScanner.rb, line 265
def nextToken
  # If we have a pushed-back token, return that first.
  unless @tokenBuffer.nil?
    res = @tokenBuffer
    @tokenBuffer = nil
    @pos = @tokenBufferPos
    return res
  end

  # Start processing characters from the input.
  @startOfToken = SourceFileInfo.new(fileName, lineNo, columnNo)
  token = [ '.', '<END>' ]
  while c = nextChar
    case c
    when ' ', "\n", "\t"
      if (tok = readBlanks(c))
        token = tok
        break
      end
      @startOfToken = SourceFileInfo.new(fileName, lineNo, columnNo)
    when '#'
      skipComment
      @startOfToken = SourceFileInfo.new(fileName, lineNo, columnNo)
    when '/'
      skipCPlusPlusComments
      @startOfToken = SourceFileInfo.new(fileName, lineNo, columnNo)
    when '0'..'9'
      token = readNumber(c)
      break
    when "'"
      token = readString(c)
      break
    when '"'
      token = readString(c)
      break
    when '-'
      token = handleDash
      break
    when '!'
      if (c = nextChar) == '='
        token = [ 'LITERAL', '!=' ]
      else
        returnChar(c)
        token = [ 'LITERAL', '!' ]
      end
      break
    when 'a'..'z', 'A'..'Z', '_'
      token = readId(c)
      break
    when '<', '>', '='
      token = readOperator(c)
      break
    when '['
      token = readMacro
      break
    when nil
      # We've reached an end of file or buffer
      break
    else
      str = ""
      str << c
      token = [ 'LITERAL', str ]
      break
    end
  end
  @pos = @startOfToken
  return token
end
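A caller typically pulls [ TokenType, TokenValue ] pairs until the end marker arrives. The sketch below shows such a loop under stated assumptions: FakeScanner is a hypothetical stand-in that hands out prepared tokens instead of scanning text, and it treats [ '.', '<END>' ] as the end marker because that is the default token in the listing above.

```ruby
# Hypothetical stand-in for a scanner: returns prepared tokens, then the
# end marker [ '.', '<END>' ] once the input is exhausted.
class FakeScanner
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def nextToken
    @tokens.shift || ['.', '<END>']
  end
end

# Collect all tokens up to, but not including, the end marker.
def collect_tokens(scanner)
  result = []
  loop do
    type, value = scanner.nextToken
    break if type == '.'

    result << [type, value]
  end
  result
end

scanner = FakeScanner.new([['ID', 'task'], ['ID', 'plan'], ['STRING', 'Planning']])
collect_tokens(scanner).each { |type, value| puts "#{type}: #{value}" }
```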
Start the processing. If fileNameIsBuffer is true, we operate on a String, else on a File.
# File lib/TextScanner.rb, line 178
def open(fileNameIsBuffer = false)
  @fileNameIsBuffer = fileNameIsBuffer
  if fileNameIsBuffer
    @fileStack = [ [ @cf = BufferStreamHandle.new(@masterFile), nil, nil ] ]
  else
    begin
      @fileStack = [ [ @cf = FileStreamHandle.new(@masterFile), nil, nil ] ]
    rescue StandardError
      raise TjException.new, "Cannot open file #{@masterFile}"
    end
  end
  @masterPath = @cf.dirname + '/'
  @tokenBuffer = @pos = @lastPos = nil
end
Push a token back so that the next nextToken() call returns it again. Only 1 token can be returned before the next nextToken() call.
# File lib/TextScanner.rb, line 336
def returnToken(token)
  unless @tokenBuffer.nil?
    $stderr.puts @tokenBuffer
    raise "Fatal Error: Cannot return more than 1 token in a row"
  end
  @tokenBuffer = token
  @tokenBufferPos = @pos
  @pos = @lastPos
end
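The one-token limit follows from keeping a single buffer slot rather than a list. A minimal sketch of that idea, assuming a hypothetical OneTokenPushback class that reads from a prepared Array and leaves out the position tracking done by the real method:

```ruby
# Hypothetical class: a token source with exactly one pushback slot.
class OneTokenPushback
  def initialize(tokens)
    @tokens = tokens.dup
    @buffer = nil
  end

  # Return the pushed-back token first, if there is one.
  def next_token
    unless @buffer.nil?
      token = @buffer
      @buffer = nil
      return token
    end
    @tokens.shift
  end

  # The slot must be empty; a second push-back in a row is an error.
  def return_token(token)
    raise 'Cannot return more than 1 token in a row' unless @buffer.nil?

    @buffer = token
  end
end

src = OneTokenPushback.new([['ID', 'task'], ['STRING', 'Plan']])
first = src.next_token
src.return_token(first)        # look-ahead: put it back
puts src.next_token.inspect    # the same token again
puts src.next_token.inspect    # then the following one
```

A parser can use this to peek at the next token, decide which rule applies, and then hand the token back.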
Return SourceFileInfo for the current processing position.
# File lib/TextScanner.rb, line 241
def sourceFileInfo
  @pos ? @pos.dup : SourceFileInfo.new(fileName, lineNo, columnNo)
end
# File lib/TextScanner.rb, line 894
def errorEOF(no, token)
  error("eof_in_istring#{no}",
        "Unexpected end of file in string '#{token[0,20]}...' " +
        "starting in line #{@startOfToken.lineNo}.")
end
# File lib/TextScanner.rb, line 524
def handleDash
  if (c1 = nextChar) == '8'
    if (c2 = nextChar) == '<'
      if (c3 = nextChar) == '-'
        return readIndentedString
      else
        returnChar(c3)
        returnChar(c2)
        returnChar(c1)
      end
    else
      returnChar(c2)
      returnChar(c1)
    end
  else
    returnChar(c1)
  end
  return [ 'LITERAL', ' - ']
end
This function is called by the scanner to get the next character. It features a buffer that can hold any number of returned characters; the character returned last is read first. When it has reached the end of the master file it returns nil.
# File lib/TextScanner.rb, line 406
def nextChar
  # We've started to find the next token. @pos no longer marks the
  # position of the current token. Since we also store the EOF position in
  # @pos, don't reset it if we have processed all files completely.
  if @pos && @cf
    @lastPos = @pos
    @pos = nil
  end

  if (c = nextCharI) == '$' && !@ignoreMacros
    # Double $ are reduced to a single $.
    return c if (c = nextCharI) == '$'

    # Macros start with $( or ${. All other $. are ignored.
    if c != '(' && c != '{'
      returnChar(c)
      return '$'
    end

    @ignoreMacros = true
    returnChar(c)
    macroParser = MacroParser.new(self, @messageHandler)
    begin
      macroParser.parse('macroCall')
    rescue TjException
    end
    @ignoreMacros = false
    return nextCharI
  else
    return c
  end
end
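The character push-back and the handling of a doubled $ can be illustrated without the rest of the scanner. This is a simplified sketch under stated assumptions: CharSource and read_all are hypothetical names, macro calls are not expanded (a $ is always passed through), and line bookkeeping is left out.

```ruby
# Hypothetical class: a character source with a push-back stack.
class CharSource
  def initialize(text)
    @chars = text.chars
    @returned = []   # the most recently returned character is read first
  end

  def return_char(c)
    @returned.push(c) unless c.nil?
  end

  # Serve pushed-back characters before reading new input.
  def raw_char
    @returned.empty? ? @chars.shift : @returned.pop
  end

  # '$$' is reduced to a single '$'. Any other '$' is passed through and
  # the character after it is pushed back for the next call.
  def next_char
    c = raw_char
    return c unless c == '$'

    following = raw_char
    return '$' if following == '$'

    return_char(following)
    '$'
  end
end

# Hypothetical helper: read a whole text through next_char.
def read_all(text)
  src = CharSource.new(text)
  out = String.new
  while (c = src.next_char)
    out << c
  end
  out
end

puts read_all('cost: 5$$ or $5')   # cost: 5$ or $5
```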
# File lib/TextScanner.rb, line 439
def nextCharI
  # This can only happen when a previous call already returned nil.
  return nil if @cf.nil?

  c = nil
  # If there are characters in the return buffer process them first.
  # Otherwise get next character from input stream.
  unless @cf.charBuffer.empty?
    c = @cf.charBuffer.pop
    @cf.lineNo -= 1 if c == "\n" && !@macroStack.empty?
    while !@cf.charBuffer.empty? && @cf.charBuffer[-1] == 0
      @cf.charBuffer.pop
      @macroStack.pop
    end
  else
    # If EOF has been reached, try the parent file until even the master
    # file has been processed completely.
    if (c = @cf.getc).nil?
      # Save current position so an EOF related error can be properly
      # reported.
      @pos = sourceFileInfo

      @cf.close
      @fileStack.pop
      if @fileStack.empty?
        # We are done with the top-level file now.
        @cf = @tokenBuffer = nil
      else
        @cf, @tokenBuffer, @lastPos = @fileStack.last
        Log << "Parsing file #{@cf.fileName} ..."
        # We have been called by nextToken() already, so we can't just
        # restore @tokenBuffer and be done. We need to feed the token text
        # back into the charBuffer and return the first character.
        if @tokenBuffer
          @tokenBuffer[1].reverse.each_utf8_char do |ch|
            @cf.charBuffer.push(ch)
          end
          @tokenBuffer = nil
        end
      end
      return nil
    end
  end
  unless c.nil?
    @cf.lineNo += 1 if c == "\n"
    @cf.line = "" if @cf.line[-1] == ?\n
    @cf.line << c
  end
  c
end
# File lib/TextScanner.rb, line 544
def readBlanks(c)
  loop do
    if c == ' '
      if (c2 = nextChar) == '-'
        # Special case for the dash between period dates. It must be
        # surrounded by blanks.
        if (c3 = nextChar) == ' '
          return [ 'LITERAL', ' - ']
        end
        returnChar(c3)
      end
      returnChar(c2)
    elsif c != "\n" && c != "\t"
      returnChar(c)
      return nil
    end
    c = nextChar
  end
end
# File lib/TextScanner.rb, line 598
def readDate(token)
  year = token.to_i
  if year < 1970 || year > 2030
    raise TjException.new, "Year must be between 1970 and 2030"
  end

  month = readDigits.to_i
  if month < 1 || month > 12
    raise TjException.new, "Month must be between 1 and 12"
  end
  if nextChar != '-'
    raise TjException.new, "Corrupted date"
  end

  day = readDigits.to_i
  if day < 1 || day > 31
    raise TjException.new, "Day must be between 1 and 31"
  end

  if (c = nextChar) != '-'
    returnChar(c)
    return [ 'DATE', TjTime.local(year, month, day) ]
  end

  hour = readDigits.to_i
  if hour < 0 || hour > 23
    raise TjException.new, "Hour must be between 0 and 23"
  end

  if nextChar != ':'
    raise TjException.new, "Corrupted time. ':' expected."
  end

  minutes = readDigits.to_i
  if minutes < 0 || minutes > 59
    raise TjException.new, "Minutes must be between 0 and 59"
  end

  if (c = nextChar) == ':'
    seconds = readDigits.to_i
    if seconds < 0 || seconds > 59
      raise TjException.new, "Seconds must be between 0 and 59"
    end
  else
    seconds = 0
    returnChar(c)
  end

  if (c = nextChar) != '-'
    returnChar(c)
    return [ 'DATE', TjTime.local(year, month, day, hour, minutes, seconds) ]
  end

  if (c = nextChar) == '-'
    delta = 1
  elsif c == '+'
    # A positive offset means local time is ahead of UTC.
    delta = -1
  else
    # An actual time zone name
    tz = readId(c)[1]
    oldTz = ENV['TZ']
    ENV['TZ'] = tz
    timeVal = TjTime.local(year, month, day, hour, minutes, seconds)
    ENV['TZ'] = oldTz
    if timeVal.to_a[9] != tz
      raise TjException.new, "Unknown time zone #{tz}"
    end
    return [ 'DATE', timeVal ]
  end

  utcDiff = readDigits
  utcHour = utcDiff[0, 2].to_i
  if utcHour < 0 || utcHour > 23
    raise TjException.new, "Hour must be between 0 and 23"
  end
  utcMin = utcDiff[2, 2].to_i
  if utcMin < 0 || utcMin > 59
    raise TjException.new, "Minutes must be between 0 and 59"
  end

  [ 'DATE', TjTime.gm(year, month, day, hour, minutes, seconds) +
            delta * ((utcHour * 3600) + utcMin * 60) ]
end
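The arithmetic of the final branch, a wall-clock time plus a numeric UTC offset, can be checked with Ruby's standard Time class. This is a sketch, not the scanner's code: to_utc is a hypothetical helper, Time stands in for TjTime, and it assumes the usual convention that a + offset means local time is ahead of UTC.

```ruby
# Hypothetical helper: convert a wall-clock time with a numeric offset
# such as '+0100' or '-0130' to UTC.
def to_utc(year, month, day, hour, min, sec, offset)
  sign = offset[0]
  digits = offset[1..-1]
  # Assumption: '+' means ahead of UTC, so the offset is subtracted.
  delta = sign == '+' ? -1 : 1
  utc_hour = digits[0, 2].to_i
  utc_min = digits[2, 2].to_i
  Time.gm(year, month, day, hour, min, sec) +
    delta * (utc_hour * 3600 + utc_min * 60)
end

t = to_utc(2010, 6, 1, 12, 0, 0, '+0200')
puts t.hour   # 10
```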
Read only decimal digits and return them as a String.
# File lib/TextScanner.rb, line 865
def readDigits
  token = ""
  while ('0'..'9') === (c = nextChar)
    token << c
  end
  # Make sure that we have read at least one digit.
  if token == ""
    raise TjException.new, "Digit (0 - 9) expected"
  end
  # Push back the non-digit that terminated the digits.
  returnChar(c)
  token
end
# File lib/TextScanner.rb, line 801
def readId(c)
  token = ""
  token << c
  while (c = nextChar) &&
        (('a'..'z') === c || ('A'..'Z') === c || ('0'..'9') === c ||
         c == '_')
    token << c
  end
  if c == ':'
    return [ 'ID_WITH_COLON', token ]
  elsif c == '.'
    token << c
    loop do
      token += readIdentifier
      break if (c = nextChar) != '.'
      token += '.'
    end
    returnChar c

    return [ 'ABSOLUTE_ID', token ]
  else
    returnChar c
    return [ 'ID', token ]
  end
end
# File lib/TextScanner.rb, line 879
def readIdentifier(noDigit = true)
  token = ""
  while (c = nextChar) &&
        (('a'..'z') === c || ('A'..'Z') === c ||
         (!noDigit && (('0'..'9') === c)) || c == '_')
    token << c
    noDigit = false
  end
  returnChar(c)
  if token == ""
    raise TjException.new, "Identifier expected"
  end
  token
end
# File lib/TextScanner.rb, line 698
def readIndentedString
  state = 0
  indent = ''
  token = ''
  while true
    case state
    when 0 # Determining indent
      # Skip trailing spaces and tabs.
      while (c = nextChar) == ' ' || c == "\t" do
        # empty on purpose
      end
      if c != "\n"
        error('junk_after_cut',
              'The cut mark -8<- must be immediately followed by a ' +
              'line break.')
      end
      while (c = nextChar) == ' ' || c == "\t"
        indent << c
      end
      @startOfToken = SourceFileInfo.new(fileName, lineNo, columnNo)
      returnChar(c)
      state = 1
    when 1 # reading '-' or first content line character
      if (c = nextChar) == '-'
        state = 3
      elsif c.nil?
        errorEOF(1, token)
      elsif c == "\n"
        token << c
        state = 6
      else
        token << c
        state = 2
      end
    when 2 # reading content line
      # The '->8-' is only valid if no other content preceded it on this
      # line.
      onlyBlanks = true
      while (c = nextChar) != "\n" && !(c == '-' && onlyBlanks)
        onlyBlanks = false if c != ' ' && c != "\t"
        errorEOF(2, token) if c.nil?
        token << c
      end
      if c == '-'
        # we may have found the start of '->8-'
        state = 3
      else
        token << c
        state = 6
      end
    when 3 # reading '>' of '->8-'
      if (c = nextChar) == '>'
        state = 4
      else
        errorEOF(3, token) if c.nil?
        token << '-'
        token << c
        state = 2
      end
    when 4 # reading '8' of '->8-'
      if (c = nextChar) == '8'
        state = 5
      else
        errorEOF(4, token) if c.nil?
        token << c
        state = 2
      end
    when 5 # reading '-' of '->8-'
      if (c = nextChar) == '-'
        return [ 'STRING', token ]
      else
        errorEOF(5, token) if c.nil?
        token << c
        state = 2
      end
    when 6 # reading indentation
      state = 1
      indent.each_utf8_char do |ci|
        if ci != (c = nextChar)
          if c == '-'
            state = 3
            break
          elsif c == "\n"
            returnChar(c)
            break
          else
            warning('bad_indent',
                    "Not all lines of string have same indentation. " +
                    "The first line of the string determines the " +
                    "indentation for all subsequent lines of the same " +
                    "string.")
            token << c
            state = 2
            break
          end
        end
      end
    else
      raise "State machine error"
    end
  end
end
# File lib/TextScanner.rb, line 827
def readMacro
  # We can deal with ']' inside of the macro as long as each ']' is
  # preceded by a corresponding '['.
  token = ''
  bracketLevel = 1
  while bracketLevel > 0
    case (c = nextCharI)
    when nil
      error('unterminated_macro', "Unterminated macro #{token}")
    when '['
      bracketLevel += 1
    when ']'
      bracketLevel -= 1
    end
    token << c unless bracketLevel == 0
  end
  return [ 'MACRO', token ]
end
# File lib/TextScanner.rb, line 564
def readNumber(c)
  token = ""
  token << c
  while ('0'..'9') === (c = nextChar)
    token << c
  end
  if c == '-'
    return readDate(token)
  elsif c == '.'
    frac = readDigits

    return [ 'FLOAT', token.to_f + frac.to_f / (10.0 ** frac.length) ]
  elsif c == ':'
    hours = token.to_i
    mins = readDigits.to_i
    if hours < 0 || hours > 24
      raise TjException.new, "Hour must be between 0 and 23"
    end
    if mins < 0 || mins > 59
      raise TjException.new, "Minutes must be between 0 and 59"
    end
    if hours == 24 && mins != 0
      raise TjException.new, "Time may not be larger than 24:00"
    end

    # Return time as seconds of day since midnight.
    return [ 'TIME', hours * 60 * 60 + mins * 60 ]
  else
    returnChar(c)
  end

  [ 'INTEGER', token.to_i ]
end
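The values carried by TIME and FLOAT tokens come from two small computations that can be checked by hand. The following sketch restates them outside the scanner; time_to_seconds and float_from_parts are hypothetical helpers, and ArgumentError stands in for TjException.

```ruby
# Hypothetical helper: seconds since midnight for an HH:MM value, with the
# same range checks as the listing (24:00 is the largest allowed time).
def time_to_seconds(hours, mins)
  raise ArgumentError, 'Hour must be between 0 and 24' if hours < 0 || hours > 24
  raise ArgumentError, 'Minutes must be between 0 and 59' if mins < 0 || mins > 59
  raise ArgumentError, 'Time may not be larger than 24:00' if hours == 24 && mins != 0

  hours * 60 * 60 + mins * 60
end

# Hypothetical helper: combine the digit strings before and after the
# decimal point. The fraction is scaled by 10 to the number of its digits.
def float_from_parts(int_digits, frac_digits)
  int_digits.to_f + frac_digits.to_f / (10.0**frac_digits.length)
end

puts time_to_seconds(9, 30)        # 34200
puts float_from_parts('3', '25')   # 3.25
```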
Read operators of logical expressions.
# File lib/TextScanner.rb, line 847
def readOperator(c)
  case c
  when '='
    return [ 'LITERAL', '=' ]
  when '>'
    return [ 'LITERAL', '>=' ] if (c = nextChar) == '='
    returnChar(c)
    return [ 'LITERAL', '>' ]
  when '<'
    return [ 'LITERAL', '<=' ] if (c = nextChar) == '='
    returnChar(c)
    return [ 'LITERAL', '<' ]
  else
    raise "Unsupported operator #{c}"
  end
end
# File lib/TextScanner.rb, line 682
def readString(terminator)
  token = ""
  while (c = nextChar) && c != terminator
    if c == "\\"
      # Terminators can be used as regular characters when prefixed by a \.
      if (c = nextChar) && c != terminator
        # \ followed by non-terminator. Just add both.
        token << "\\"
      end
    end
    token << c
  end

  [ 'STRING', token ]
end
# File lib/TextScanner.rb, line 490
def returnChar(c)
  return if @cf.nil?

  @cf.line.chop! if c
  @cf.charBuffer << c
  @cf.lineNo -= 1 if c == "\n" && @macroStack.empty?
end
# File lib/TextScanner.rb, line 507
def skipCPlusPlusComments
  if (c = nextChar) == '*'
    # /* */ style multi-line comment
    @ignoreMacros = true
    begin
      while (c = nextChar) != '*'
      end
    end until (c = nextChar) == '/'
    @ignoreMacros = false
  elsif c == '/'
    # // style single line comment
    skipComment
  else
    error('bad_comment', "'/' or '*' expected after start of comment")
  end
end
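The escape rule in readString is easy to misread: a backslash is dropped only when it precedes the terminator, and kept in front of anything else. A minimal sketch of that rule, assuming a hypothetical read_string helper that consumes an Array of characters and, unlike the listing, simply stops if the input ends right after a backslash:

```ruby
# Hypothetical helper: read characters up to the terminator. chars is an
# Array of single-character Strings positioned just after the opening quote.
def read_string(chars, terminator)
  token = String.new
  while (c = chars.shift) && c != terminator
    if c == '\\'
      c = chars.shift
      break if c.nil?   # assumption: stop quietly at end of input

      # Backslash followed by a non-terminator: keep both characters.
      token << '\\' if c != terminator
    end
    token << c
  end
  token
end

# Single-quoted Ruby literal, so the backslashes below are literal characters.
input = 'say \"hi\" \n" rest'
puts read_string(input.chars, '"')   # say "hi" \n
```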