README.md in characteristics-0.5.2 vs README.md in characteristics-0.6.0
- old
+ new
@@ -5,11 +5,11 @@
- Is a character valid according to its encoding?
- Is a character assigned?
- Is a character a special control character?
- Could a character be invisible (blank)?
-The [unibits](https://github.com/janlelis/unibits) gem makes use of this data to visualize it accordingliy.
+The [unibits](https://github.com/janlelis/unibits) and [uniscribe](https://github.com/janlelis/uniscribe) gems makes use of this data to visualize it accordingliy.
## Setup
Add to your `Gemfile`:
@@ -18,16 +18,17 @@
```
## Usage
```ruby
-char_info = Characteristics.new(character)
+char_info = Characteristics.create(character)
char_info.valid? # => true / false
char_info.unicode? # => true / false
char_info.assigned? # => true / false
char_info.control? # => true / false
char_info.blank? # => true / false
+char_info.separator? # => true / false
char_info.format? # => true / false
```
## Types of Encodings
@@ -36,21 +37,21 @@
- **:unicode** Unicode familiy of multi-byte encodings
- *UTF-X*
- **:byte** Known single-byte encoding
- *ISO-8859-X*, *Windows-125X*, *IBMX*, *CP85X*, *macX*, *TIS-620*, *Windows-874*, *KOI-X*
- **:ascii** 7-Bit ASCII
- - *US-ASCII*
+ - *US-ASCII*, *GB1988*
- **:binary** Arbitrary string
- *ASCII-8BIT*
Other encodings are not supported, yet.
## Predicates
### `valid?`
-Validness is determined by Ruby's String#valid_encoding?
+Validness is determined by Ruby's `String#valid_encoding?`
### `unicode?`
`true` for Unicode encodings (`UTF-X`)
@@ -64,19 +65,22 @@
- For other byte based encodings, a character is considered assigned if it is not on the exception list included in this library. C0 control characters (and `\x7F`) are always considered assigned. C1 control characters are treated as assigned, if the encoding generally does not assign characters in the C1 region.
- For Unicode, the general category is considered
### `blank?`
-The library includes a list of characters that might not be rendered visually. This list does not include unassigned codepoints, control characters (except for `\t`, `\n`, `\v`, `\f`, `\r`), or special formatting characters (right-to-left marker, variation selectors, etc).
+The library includes a list of characters that might not be rendered visually. This list does not include unassigned codepoints, control characters (except for `\t`, `\n`, `\v`, `\f`, `\r`, and `\u{85}` in Unicode), or special formatting characters (right-to-left markers, variation selectors, etc).
+### `separator?`
+
+Returns true if character is considered a separator. All separators also return true for the `blank?` check. In Unicode, the following characters are separators: `\n`, `\v`, `\f`, `\r`, `\u{85}` (next line), `\u{2028}` (line separator), and `\u{2029}` (paragraph separator)
+
### `format?`
This flag is `true` only for special formatting characters, which are not control characters, like Right-to-left marks. In Unicode, this means codepoints with the General Category of **Cf**.
## Todo
- Support all non-dummy encodings that Ruby supports
-- Complete test matrix
## MIT License
Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.