README.md in characteristics-0.2.0 vs README.md in characteristics-0.3.0
- old
+ new
@@ -24,20 +24,21 @@
char_info.valid? # => true / false
char_info.unicode? # => true / false
char_info.assigned? # => true / false
char_info.control? # => true / false
char_info.blank? # => true / false
+char_info.format? # => true / false
```
## Types of Encodings
This library knows of four different kinds of encodings:
- **:unicode** Unicode family of multibyte encodings (*UTF-X*)
- **:ascii** 7-Bit ASCII (*US-ASCII*)
- **:binary** Arbitrary string (*ASCII-8BIT*)
-- **:byte** Known byte encoding (*ISO-8859-X*, *Windows-125X*)
+- **:byte** Known single-byte encoding (*ISO-8859-X*, *Windows-125X*, *IBMX*, *CP85X*, *macX*, *TIS-620*, *Windows-874*, *KOI-X*)
Other encodings are not yet supported.
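As a rough illustration of the four kinds listed above, the following sketch groups Ruby `Encoding` objects by name. The helper `encoding_kind` and its name patterns are assumptions made for this example; they are not the library's API and cover only part of the **:byte** list.

```ruby
# Hypothetical helper, not part of the characteristics gem.
# Returns one of :unicode, :ascii, :binary, :byte, or nil if unsupported.
def encoding_kind(encoding)
  return :ascii  if encoding == Encoding::US_ASCII
  return :binary if encoding == Encoding::BINARY # alias of ASCII-8BIT
  return :unicode if encoding.name.start_with?("UTF-")

  # Only a subset of the single-byte families named above (assumption)
  return :byte if encoding.name.match?(/\A(?:ISO-8859-|Windows-125)/)

  nil
end

encoding_kind(Encoding::UTF_8)        # => :unicode
encoding_kind(Encoding::US_ASCII)     # => :ascii
encoding_kind(Encoding::BINARY)       # => :binary
encoding_kind(Encoding::Windows_1252) # => :byte
encoding_kind(Encoding::Shift_JIS)    # => nil
```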
## Predicates
@@ -49,20 +50,24 @@
`true` for Unicode encodings (`UTF-X`)
### `control?`
-Control characters are codepoints in the [C0, delete, or C1 control character range](https://en.wikipedia.org/wiki/C0_and_C1_control_codes).
+Control characters are codepoints in the [C0, delete, or C1 control character range](https://en.wikipedia.org/wiki/C0_and_C1_control_codes). Characters in this range of [IBM codepage 437](https://en.wikipedia.org/wiki/Code_page_437) based encodings are always treated as control characters.
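The ranges in question are C0 (`0x00` to `0x1F`), delete (`0x7F`), and C1 (`0x80` to `0x9F`). A minimal sketch of such a range check follows; `control_codepoint?` and `CONTROL_RANGES` are hypothetical names for this example, and the sketch says nothing about how the library handles individual encodings.

```ruby
# Hypothetical helper, not part of the characteristics gem.
# C0, delete, and C1 control code ranges
CONTROL_RANGES = [0x00..0x1F, 0x7F..0x7F, 0x80..0x9F].freeze

# True if the given integer codepoint falls into one of the ranges
def control_codepoint?(codepoint)
  CONTROL_RANGES.any? { |range| range.cover?(codepoint) }
end

control_codepoint?("\t".ord) # => true  (C0)
control_codepoint?(0x7F)     # => true  (delete)
control_codepoint?(0x85)     # => true  (C1)
control_codepoint?("a".ord)  # => false
```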
### `assigned?`
- All valid ASCII and BINARY characters are considered assigned
- For other byte-based encodings, a character is considered assigned if it is not on the exception list included in this library. C0 control characters (and `\x7F`) are always considered assigned. C1 control characters are treated as assigned if the encoding generally does not assign characters in the C1 region.
- For Unicode, the general category is considered
### `blank?`
The library includes a list of characters that might not be rendered visually. This list does not include unassigned codepoints, control characters (except for `\t`, `\n`, `\v`, `\f`, `\r`), or special formatting characters (right-to-left mark, variation selectors, etc.).
+
+### `format?`
+
+This flag is `true` only for special formatting characters that are not control characters, such as right-to-left marks. In Unicode, this means codepoints with the General Category **Cf**.
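For Unicode strings, the **Cf** category can also be tested with Ruby's built-in `\p{Cf}` regexp property. The sketch below shows the idea; `format_char?` is a hypothetical helper for this example and is not how the library is known to implement the predicate.

```ruby
# Hypothetical helper, not part of the characteristics gem.
# True if the single-character UTF-8 string has General Category Cf
def format_char?(char)
  char.match?(/\A\p{Cf}\z/)
end

format_char?("\u200F") # => true  (right-to-left mark)
format_char?("\u0000") # => false (control character, category Cc)
format_char?("a")      # => false
```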
## Todo
- Support all non-dummy encodings that Ruby supports
- Complete test matrix