Lexical analysis is performed by obtaining a tokenizer of the appropriate class and calling @tokenize@ on it, passing the text to be tokenized. Each token is yielded to the associated block as it is discovered. {{{lang=ruby,number=true,caption=Tokenizing a Ruby script require 'syntax' tokenizer = Syntax.load "ruby" tokenizer.tokenize( File.read( "program.rb" ) ) do |token| puts token puts " group: #{token.group}" puts " instruction: #{token.instruction}" end }}} If you need finer control over the process, you can use the lower-level API: {{{lang=ruby,number=true,caption=Tokenizing a Ruby script via step require 'syntax' tokenizer = Syntax.load "ruby" tokenizer.start( File.read( "program.rb" ) ) do |token| puts token puts " group: #{token.group}" puts " instruction: #{token.instruction}" end tokenizer.step tokenizer.step ... tokenizer.finish }}} In this case, each time @#step@ is invoked, it results in tokens being consumed and yielded to the block. However, a single step may result in multiple tokens being detected and yielded--there is no way to guarantee a single token at a time, unless the corresponding syntax module was written to work that way. For efficiency, the existing modules will yield multiple tokens when processing (for instance) strings, regular expressions, and heredocs.