Sha256: ada454ea6ca5f318ff9606a1461436b22600943a93564930d572c6a9073589b8

Contents?: true

Size: 1.58 KB

Versions: 6

Compression:

Stored size: 1.58 KB

Contents

= A lightweight UTF8-aware String class meant for use with Ruby 1.8.x

The scanning code is implemented in C (would love a Java version as well, if any of you want to help with that).

At the moment, this gem is tested on 1.8.7, 1.9.2 and Rubinius 1.2.1dev - it may work on others but ymmv.

== String::UTF8 Example

The String::UTF8#[] API isn't fully supported yet, and may never support regular expressions. I'll update this readme as I make progress.

  # String (on 1.8)
  str = "你好世界"
  str.length # 12
  str[0] # 228
  str.chars.to_a.inspect # ["\344", "\275", "\240", "\345", "\245", "\275", "\344", "\270", "\226", "\347", "\225", "\214"]


  # String::UTF8 (on 1.8)
  utf8_str = "你好世界".as_utf8
  utf8_str.length # 4
  utf8.chars.to_a.inspect # => ["你", "好", "世", "界"]

  utf8[0]        # => "你"
  utf8_str[0, 3] # => "你好世"
  utf8_str[0..3] # => "你好世界"

== StringScanner::UTF8 Example

The full StringScanner API hasn't been made UTF8-aware yet. I may try and finish it later, but the getch method is all I need at the moment.

  # StringScanner (on 1.8)
  scanner = StringScanner.new("你好世界")
  scanner.getch # => "\344"

  # StringScanner::UTF8
  require 'utf8/string_scanner'

  # NOTE: we could have used scanner.as_utf8 instead
  utf8_scanner = StringScanner::UTF8.new("你好世界")
  utf8_scanner.getch # => "你"
  utf8_scanner.getch # => "好"
  utf8_scanner.getch # => "世"
  utf8_scanner.reset
  utf8_scanner.getch # => "你"

== Disclaimer

I'm only planning on making the methods I need UTF8-aware, if you need others I welcome pull requests :)

Version data entries

6 entries across 6 versions & 1 rubygems

Version Path
utf8-0.1.7 README.rdoc
utf8-0.1.6 README.rdoc
utf8-0.1.5 README.rdoc
utf8-0.1.4 README.rdoc
utf8-0.1.3 README.rdoc
utf8-0.1.2 README.rdoc