Within a tokenizer, you have access to a rich set of methods for scanning the text. These methods correspond to the methods of the StringScanner class (e.g., @scan@, @scan_until@, @bol?@, etc.).
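To get a feel for these methods, here is a plain @StringScanner@ from Ruby's standard library (@require 'strscan'@), used outside of any tokenizer. It shows only the standard-library behavior; the tokenizer's methods of the same names are assumed to behave the same way, as the correspondence above suggests.

{{{lang=ruby,number=true,caption=Scanning with StringScanner
require 'strscan'

# Ruby's standard StringScanner; a tokenizer offers methods of the same
# names (scan, scan_until, bol?, peek, ...) for walking its input.
s = StringScanner.new("let x = 42\nputs x")

at_line_start = s.bol?       # true: position 0 starts a line
lookahead = s.peek(4)        # "let ": looks ahead without advancing
keyword = s.scan(/\w+/)      # "let": scan matches only at the current position
upto_eq = s.scan_until(/=/)  # " x =": everything up to and including the match
s.skip(/\s+/)                # consume the space after "="
number = s.scan(/\d+/)       # "42"
missed = s.scan(/\d+/)       # nil: next char is "\n", position is unchanged
s.scan(/\n/)
new_line = s.bol?            # true: the scanner now sits at the start of line 2
rest = s.rest                # "puts x"
}}}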

Additionally, the capture groups of the most recent regexp match (from @scan@, etc.) can be obtained via @subgroup@, which takes as its parameter the number of the group you want to query.
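The sketch below assumes that @subgroup(n)@ mirrors @StringScanner#[]@, in keeping with the StringScanner correspondence described above; it uses the standard-library method directly. Group 0 is the whole match, and the groups always refer to the most recent match, so query them right after the @scan@ that produced them.

{{{lang=ruby,number=true,caption=Capture groups of the last match
require 'strscan'

s = StringScanner.new("width=640 height=480")

# Scan one key=value pair; the regexp has two capture groups.
s.scan(/(\w+)=(\d+)/)
whole = s[0]   # "width=640": group 0 is the entire match
key   = s[1]   # "width"
value = s[2]   # "640"

# Groups refer to the most recent match: a failed scan clears them.
s.scan(/\d+/)  # nil, because the next character is a space
stale = s[1]   # nil
}}}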

Tokenizing proceeds as follows:

# Identify a token (using @#peek@, @#scan@, etc.).
# Start a new token group (using @#start_group@, passing the symbol for the group and optionally any text you want to seed the group with).
# Append text to the current group, either with additional calls to @#start_group@ using the same group or with @#append@ (which simply takes the text to append to the current group).
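The three steps above can be modeled with a toy class. @ToyTokenizer@ below is hypothetical and self-contained, not part of the syntax library; it only assumes what the steps describe: starting a different group closes the pending token, while starting the same group again (or calling @append@) extends it.

{{{lang=ruby,number=true,caption=Toy model of groups
require 'strscan'

# Hypothetical stand-in for Syntax::Tokenizer, written only to illustrate
# the start_group/append protocol; it is not the library's implementation.
class ToyTokenizer
  def initialize(text)
    @scanner = StringScanner.new(text)
    @tokens = []       # finished [group, text] pairs
    @group = nil       # group currently being built
    @chunk = "".dup    # text accumulated so far for @group
  end

  # Scanning is delegated to StringScanner.
  def scan(re)
    @scanner.scan(re)
  end

  def peek(n)
    @scanner.peek(n)
  end

  # A different group closes the pending token; the same group extends it.
  def start_group(group, text = nil)
    flush unless group == @group
    @group = group
    @chunk << text if text
  end

  # Append text to whatever group is current.
  def append(text)
    @chunk << text
  end

  def tokens
    flush
    @tokens
  end

  private

  def flush
    @tokens << [@group, @chunk] unless @chunk.empty?
    @chunk = "".dup
  end
end

t = ToyTokenizer.new("# hi\n\n")
if t.peek(1) == "#"                      # 1. identify a token
  t.start_group(:comment, t.scan(/#/))   # 2. start a group, seeded with "#"
  t.append(t.scan(/[^\n]*/))             # 3. append the rest of the line
end
2.times { t.start_group(:normal, t.scan(/\n/)) } # same group twice: one token
p t.tokens  # => [[:comment, "# hi"], [:normal, "\n\n"]]
}}}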

Instead of @#start_group@, you can also use @#start_region@, which begins a new region for the given group, and @#end_region@, which closes the region.
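One way to picture regions is as brackets around other tokens. The hypothetical @RegionSketch@ class below (not the library's code) records a region as explicit open and close markers around ordinary group tokens, which a renderer could, for instance, turn into a nested HTML element. How the real tokenizer represents regions internally is an assumption this sketch does not settle.

{{{lang=ruby,number=true,caption=Regions as open/close markers
# Hypothetical sketch of regions: explicit open/close markers wrapped around
# ordinary group tokens. It models the idea behind start_region/end_region
# only; it is not the library's implementation.
class RegionSketch
  attr_reader :events

  def initialize
    @events = []       # [:token | :open | :close, group, text] triples
    @group = nil
    @chunk = "".dup
  end

  def start_group(group, text = nil)
    flush unless group == @group
    @group = group
    @chunk << text if text
  end

  # Flush pending text, then mark the beginning of a region for +group+.
  def start_region(group, text = nil)
    flush
    @events << [:open, group, text]
  end

  # Flush pending text, then mark the end of the region for +group+.
  def end_region(group, text = nil)
    flush
    @events << [:close, group, text]
  end

  def finish
    flush
    @events
  end

  private

  def flush
    @events << [:token, @group, @chunk] unless @chunk.empty?
    @chunk = "".dup
  end
end

r = RegionSketch.new
r.start_group(:normal, "x = ")
r.start_region(:string, '"')   # e.g. a string literal opens a region
r.start_group(:string, "hi")
r.start_group(:escape, '\n')   # a different group nested inside the region
r.end_region(:string, '"')
r.finish
}}}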

Here is an example of a very, very simple tokenizer that simply extracts words and numbers from the text:

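{{{lang=ruby,number=true,caption=Simple tokenizer
require 'syntax'

# A minimal tokenizer: #step is called repeatedly until the input is consumed.
class SimpleTokenizer < Syntax::Tokenizer
  def step
    if (digits = scan(/\d+/))
      # A run of digits becomes (or extends) a :digits token.
      start_group :digits, digits
    elsif (words = scan(/\w+/))
      # Any other run of word characters becomes a :words token.
      start_group :words, words
    else
      # Everything else, one character at a time. The /m flag lets "."
      # match a newline too; without it, scan returns nil on "\n" and
      # the scanner never advances past it.
      start_group :normal, scan(/./m)
    end
  end
end
}}}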
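Running @SimpleTokenizer@ requires the syntax library. As a library-free approximation, the hypothetical @simple_tokens@ helper below applies the same three-way branch as @step@, except that it uses @/./m@ so a newline is consumed as well, and it merges consecutive text of the same group, assuming that is what repeated @start_group@ calls with one group do (step 3 above).

{{{lang=ruby,number=true,caption=Library-free approximation
require 'strscan'

# Hypothetical helper, not part of the syntax library: mimics SimpleTokenizer
# by applying the same branches and merging adjacent same-group text.
def simple_tokens(text)
  s = StringScanner.new(text)
  tokens = []
  until s.eos?
    if (digits = s.scan(/\d+/))
      group, chunk = :digits, digits
    elsif (words = s.scan(/\w+/))
      group, chunk = :words, words
    else
      group, chunk = :normal, s.scan(/./m)   # /m so "\n" is consumed too
    end
    if tokens.last && tokens.last[0] == group
      tokens.last[1] << chunk                # same group: extend the token
    else
      tokens << [group, chunk.dup]           # new group: start a new token
    end
  end
  tokens
end

p simple_tokens("Route 66, exit 9b\n")
}}}

Note how @9b@ splits into a @:digits@ and a @:words@ token, because the digits branch is tried first. Swapping the two branches would turn every number into a @:words@ token, since @\w@ also matches digits.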
