-=RubyLexer 0.6.2=-

RubyLexer is a lexer library for Ruby, written in Ruby. My goal with RubyLexer was to create a lexer for Ruby that's complete and correct; all code that Ruby itself accepts should be lexed correctly by RubyLexer as well. Just enough parsing capability is included to give RubyLexer the context it needs to tokenize correctly in all cases. (This turned out to be more parsing than I had thought or wanted to take on at first.)

Other Ruby lexers exist, but most are inadequate. For instance, irb has its own little lexer, as does (I believe) RDoc, and so do all the IDEs that can colorize Ruby code. I've seen several stand-alone libraries as well. All or almost all suffer from the same problem: they skip the hard parts of lexing. RubyLexer handles the hard things, like complicated strings, the ambiguous nature of some punctuation characters and keywords in Ruby, and distinguishing methods from local variables.

RubyLexer is not particularly clean code. As I progressed in writing it, I learned a little about how these things are supposed to be done: the lexer is not supposed to have any state of its own; instead, it gets whatever it needs to know from the parser. As a stand-alone lexer, RubyLexer maintains quite a lot of state. Every instance variable in the RubyLexer class is some sort of lexer state. Most of the complication and ugly code in RubyLexer is in maintaining or using this state.

For information about using RubyLexer in your program, please see howtouse.txt.

For my notes on the testing of RubyLexer, see testing.txt.

If you have any questions, comments, problems, or new feature requests, or just want to figure out how to make it work for what you need to do, contact me:
  rubylexer _at_ inforadical.net

RubyLexer is a RubyForge project. RubyForge is another good place to send your bug reports or whatever:
  http://rubyforge.org/projects/rubylexer/

(There aren't any bugs filed against RubyLexer there yet, but don't be afraid that your report will get lonely.)

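To make the point about distinguishing methods from local variables concrete, here is a small sketch in plain Ruby. It does not use RubyLexer's API at all, and the name `value` is made up for illustration. It only shows why a lexer has to track which identifiers are local variables: the same characters tokenize differently depending on that.

```ruby
# Plain Ruby only; nothing here calls RubyLexer.
# `value` is a hypothetical name, used first as a method, then as a local variable.

def value(arg = nil)
  arg.nil? ? 100 : "got #{arg.inspect}"
end

# `value` is not yet a local variable, so it is lexed as a method call, and
# a `/` or `%` that follows a space starts a regex or a string literal.
regex_arg  = value /2/       # same as value(/2/)
string_arg = value %(3)      # same as value("3")

value = 100                  # from here on, `value` is a local variable

# The same characters after a local variable are binary operators.
divided   = value /2/ 1      # same as (value / 2) / 1
remainder = value %(3)       # same as value % 3

puts regex_arg, string_arg, divided, remainder
```

Ruby may print an "ambiguous first argument" warning for the two method-call lines when run with -w, which is a fair summary of the problem a stand-alone lexer faces.
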
Status:

RubyLexer can correctly lex all legal Ruby 1.8 code that I've been able to find on my Debian system. It can also handle (most of) my catalog of nasty test cases (in testdata/p.rb). At this point, new bugs are almost exclusively found by my home-grown test code, rather than Ruby code gathered 'from the wild'. A largish sample of Ruby recently tested for the first time (that is, Rubyx) had _0_ lex errors. (And this is not the only example.) There are a number of issues I know about and plan to fix, but it seems that Ruby coders don't write code complex enough to trigger them very often. Although incomplete, RubyLexer is nevertheless better than many existing ad-hoc lexers. For instance, RubyLexer can correctly distinguish all cases of the different uses of the following operators, depending on context:

  %        can be modulus operator or start of fancy string
  /        can be division operator or start of regex
  * & + -  can be unary or binary operator
  []       can be for array literal or index method
  <<       can be here document or left shift operator (or in class<