String

This is an extension and modification of the standard String class. We do a lot of UTF-8 character processing in the parser. Ruby 1.8 does not have good enough UTF-8 support, and Ruby 1.9 handles UTF-8 characters only as Strings, which is very inefficient compared to representing them as Fixnum objects. Some of these hacks can be removed once we support only Ruby 1.9.

Public Instance Methods

<<(obj)

Replacement for the built-in << operator that also accepts Fixnum character values above 255, which hold multi-byte UTF-8 characters.

    # File lib/UTF8String.rb, line 59
    def <<(obj)
      if obj.is_a?(String) || (obj < 256)
        # In this case we can use the built-in concat.
        concat(obj)
      else
        # UTF-8 characters have a maximum length of 4 bytes and no byte is 0.
        mask = 0xFF000000
        pos = 3
        while pos >= 0
          # Use the built-in concat operator for each byte.
          concat((obj & mask) >> (8 * pos)) if (obj & mask) != 0
          # Move mask and position to the next byte.
          mask = mask >> 8
          pos -= 1
        end
        # Return the receiver, like the built-in <<, so calls can be chained.
        self
      end
    end
Also aliased as: old_double_left_angle
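
The byte-splitting branch can be illustrated outside of String. This is a minimal sketch, assuming (as the loop suggests) that the Integer holds the raw UTF-8 bytes of one character with the first byte in the most significant position. The helper name packed_utf8_bytes is hypothetical and not part of lib/UTF8String.rb.

```ruby
# Hypothetical helper, not part of lib/UTF8String.rb.
# Splits an Integer holding up to four packed UTF-8 bytes into an array of
# bytes, most significant byte first, skipping zero bytes.
def packed_utf8_bytes(packed)
  bytes = []
  3.downto(0) do |pos|
    byte = (packed >> (8 * pos)) & 0xFF
    bytes << byte unless byte.zero?
  end
  bytes
end

# 0xE282AC holds the three UTF-8 bytes of the euro sign (U+20AC).
bytes = packed_utf8_bytes(0xE282AC)
text = bytes.pack('C*').force_encoding('UTF-8')
```

Skipping zero bytes is safe here only because no byte of a non-NUL UTF-8 character is 0, which is the same assumption the method's comment states.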

each_utf8_char()

Iterate over the String, calling the block once for each UTF-8 character. This implementation looks more awkward but is noticeably faster than the commonly suggested regexp-based implementations.

    # File lib/UTF8String.rb, line 28
    def each_utf8_char
      c = ''
      length = 0
      each_byte do |b|
        c << b
        if length > 0
          # subsequent unicode byte
          if (length -= 1) == 0
            # end of unicode character reached
            yield c
            c = ''
          end
        elsif (b & 0xC0) == 0xC0
          # first unicode byte: its leading 1 bits give the total sequence
          # length, so start at -1 to count only the bytes that follow
          length = -1
          while (b & 0x80) != 0
            length += 1
            b = b << 1
          end
        else
          # ASCII character
          yield c
          c = ''
        end
      end
    end
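
The lead-byte branch relies on how UTF-8 stores the sequence length in the first byte: the number of leading 1 bits equals the total number of bytes in the character. A minimal sketch of that rule in isolation follows. The helper name utf8_sequence_length is hypothetical, and the sketch assumes it is only called with ASCII or lead bytes, not continuation bytes.

```ruby
# Hypothetical helper, not part of lib/UTF8String.rb.
# Returns the total number of bytes in the UTF-8 sequence that starts with
# lead_byte, found by counting the leading 1 bits of that byte.
def utf8_sequence_length(lead_byte)
  # 0xxxxxxx is a single-byte (ASCII) character.
  return 1 if (lead_byte & 0x80).zero?

  ones = 0
  mask = 0x80
  # Walk from the top bit down until the first 0 bit.
  while (lead_byte & mask) != 0
    ones += 1
    mask >>= 1
  end
  ones
end

lengths = [0x41, 0xC3, 0xE2, 0xF0].map { |b| utf8_sequence_length(b) }
```

The number of continuation bytes still to read is this value minus one, which is the counter that each_utf8_char decrements.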

length_utf8()

Return the number of UTF-8 characters in the String. We don’t override the built-in length() method here because we don’t know who else relies on it or for what purpose.

    # File lib/UTF8String.rb, line 80
    def length_utf8
      len = 0
      each_utf8_char { |c| len += 1 }
      len
    end
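
For comparison, the same count can be obtained without building the intermediate character strings. The sketch below is a different technique from the method above, not the author's: it counts every byte that is not a continuation byte (10xxxxxx). The helper name utf8_char_count is hypothetical, and the result is only meaningful for valid UTF-8 input.

```ruby
# Hypothetical helper, not part of lib/UTF8String.rb.
# Counts UTF-8 characters by counting every byte that is not a
# continuation byte (bit pattern 10xxxxxx).
def utf8_char_count(str)
  str.each_byte.count { |b| (b & 0xC0) != 0x80 }
end

sample = "na\u00EFve \u20AC"   # "naïve €": 7 characters stored in 10 bytes
count = utf8_char_count(sample)
```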

old_double_left_angle(obj)

Alias for: <<

old_reverse()

Alias for: reverse

reverse()

UTF-8 aware version of reverse that replaces the built-in one.

    # File lib/UTF8String.rb, line 89
    def reverse
      a = []
      each_utf8_char { |c| a << c }
      a.reverse.join
    end
Also aliased as: old_reverse
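
The reason a UTF-8 aware reverse is needed can be shown with a short comparison. This sketch assumes Ruby 1.9 or later and uses the built-in each_char as a stand-in for each_utf8_char. It illustrates the problem and is not code from lib/UTF8String.rb.

```ruby
word = "caf\u00E9"   # "café"; the é is stored as the two bytes 0xC3 0xA9

# Byte-wise reversal puts the continuation byte 0xA9 before its lead byte
# 0xC3, so the result is no longer valid UTF-8.
bytewise = word.bytes.reverse.pack('C*').force_encoding('UTF-8')

# Character-wise reversal keeps each multi-byte sequence intact.
charwise = word.each_char.to_a.reverse.join
```

Collecting whole characters into an array first, as the method above does, is what keeps the bytes of each character in their original order.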

to_quoted_printable()

Return the String encoded as quoted-printable (MIME), with line endings converted to CRLF.

    # File lib/UTF8String.rb, line 100
    def to_quoted_printable
      [self].pack('M').gsub(/\n/, "\r\n")
    end
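
A standalone sketch of the same expression shows what the encoding does to non-ASCII bytes and to the equals sign. The function name quoted_printable_crlf is hypothetical. Array#pack with the 'M' directive is the standard Ruby quoted-printable encoder.

```ruby
# Hypothetical standalone version of the expression used above.
# pack('M') produces quoted-printable text with LF line endings;
# the gsub then turns each LF into CRLF.
def quoted_printable_crlf(text)
  [text].pack('M').gsub(/\n/, "\r\n")
end

encoded = quoted_printable_crlf("caf\u00E9\n")   # é becomes =C3=A9
escaped = quoted_printable_crlf("a=b\n")         # = becomes =3D
```

If the input does not end with a newline, pack('M') appends a soft line break ("=" followed by a newline), so the result ends with CRLF in that case too.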


Generated with the Darkfish Rdoc Generator 1.1.6.