# unibits | Reveal the Unicode [![[version]](https://badge.fury.io/rb/unibits.svg)](http://badge.fury.io/rb/unibits) [![[travis]](https://travis-ci.org/janlelis/unibits.svg)](https://travis-ci.org/janlelis/unibits)

Ruby library and CLI command that visualizes various Unicode and ASCII/single byte encodings in the terminal:

- Makes analyzing encodings easier
- Helps you with debugging strings
- Supports **UTF-8**, **UTF-16LE**/**UTF-16BE**, **UTF-32LE**/**UTF-32BE**, **ISO-8859-X**, **Windows-125X**, **IBMX**, **CP85X**, **macX**, **TIS-620**/**Windows-874**, **KOI8-R**/**KOI8-U**, arbitrary **BINARY** data, and 7-Bit **ASCII**
- Highlights invalid/special/blank bytes/characters/codepoints

## Color Coding

Each byte of the given string is highlighted using the following mechanism (characters -> codepoints):

- Red for invalid bytes
- Orange for unassigned bytes/characters
- Blue for control characters
- Light blue for blanks
- Non-control formatting characters in pink
- Random color for all other characters

## Setup

Make sure you have Ruby installed and installing gems works properly.
Then do:

```
$ gem install unibits
```

## Usage

Pass the string to debug to unibits:

### From CLI

```
$ unibits "🌫 Idio﻿syncrätic ℜսᖯʏ"
```

### From Ruby

```ruby
require 'unibits/kernel_method'
unibits "🌫 Idio﻿syncrätic ℜսᖯʏ"
```

### Advanced Options

`unibits` accepts the following options:

- *encoding (e)*: The encoding of the given string (uses the string's default encoding if none given)
- *convert (c)*: An encoding the string should be converted to before visualizing it
- *stats*: Whether to show a short stats header (default: `true`); deactivate it on the CLI with `--no-stats`
- *wide-ambiguous*: Treat characters of ambiguous width as 2 spaces instead of 1 ([more info](https://github.com/janlelis/unicode-display_width))
- *width (w)*: Set a custom column width; if not set, *unibits* retrieves it from the terminal or falls back to 80

## Output of Different Valid Encodings

### UTF-8

CLI: `$ unibits -e utf-8 -c utf-8 "🌫 Idio﻿syncrätic ℜսᖯʏ"`

Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-8'`

![Screenshot UTF-8](/screenshots/utf-8.png?raw=true "UTF-8")

### UTF-16LE

CLI: `$ unibits -e utf-8 -c utf-16le "🌫 Idio﻿syncrätic ℜսᖯʏ"`

Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16le'`

![Screenshot UTF-16LE](/screenshots/utf-16le.png?raw=true "UTF-16LE")

### UTF-16BE

CLI: `$ unibits -e utf-8 -c utf-16be "🌫 Idio﻿syncrätic ℜսᖯʏ"`

Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16be'`

![Screenshot UTF-16BE](/screenshots/utf-16be.png?raw=true "UTF-16BE")

### UTF-32LE

CLI: `$ unibits -e utf-8 -c utf-32le "🌫 Idio﻿syncrätic ℜսᖯʏ"`

Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32le'`

![Screenshot UTF-32LE](/screenshots/utf-32le.png?raw=true "UTF-32LE")

### UTF-32BE

CLI: `$ unibits -e utf-8 -c utf-32be "🌫 Idio﻿syncrätic ℜսᖯʏ"`

Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32be'`

![Screenshot
UTF-32BE](/screenshots/utf-32be.png?raw=true "UTF-32BE")

### BINARY

CLI: `$ unibits -e binary "🌫 Idio﻿syncrätic ℜսᖯʏ"`

Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'binary'`

![Screenshot BINARY](/screenshots/binary.png?raw=true "BINARY")

### ASCII

CLI: `$ unibits -e utf-8 -c ascii "ascii"`

Ruby: `unibits "ASCII String", encoding: 'utf-8', convert: 'ascii'`

![Screenshot ASCII](/screenshots/ascii.png?raw=true "ASCII")

## Invalid Encodings

### UTF-8

Example in Ruby: `unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"`

![Screenshot invalid UTF-8](/screenshots/utf-8.invalid.png?raw=true "Invalid UTF-8")

### ASCII

Example in Ruby: `unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'ascii'`

![Screenshot invalid ASCII](/screenshots/ascii.invalid.png?raw=true "Invalid ASCII")

### BINARY

Not possible to produce invalid binary strings

## Notes

Also see

- [Ruby's Encoding class](https://ruby-doc.org/core/Encoding.html)
- [Characteristics gem](https://github.com/janlelis/characteristics)
- [UTF-8 (Wikipedia)](https://en.wikipedia.org/wiki/UTF-8#Description)
- [UTF-16 (Wikipedia)](https://en.wikipedia.org/wiki/UTF-16#Description)
- [UTF-32 (Wikipedia)](https://en.wikipedia.org/wiki/UTF-32)
- [Difference between BINARY and ASCII](http://idiosyncratic-ruby.com/56-us-ascii-8bit.html)
- [Unicode Micro Libraries for Ruby](https://github.com/janlelis/unicode-x)

Lots of thanks to @damienklinnert for the motivation and inspiration required to build this! 🎆

Copyright (C) 2017 Jan Lelis. Released under the MIT license.
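
The *encoding* and *convert* options and the invalid UTF-8 example from this README can be roughly illustrated with Ruby's core `String` methods. This is a sketch only and does not call unibits or reflect its internals. It assumes that *encoding* corresponds to reinterpreting the existing bytes (`force_encoding`) and *convert* to transcoding them (`encode`). The helper name `describe_bytes` is hypothetical.

```ruby
# Plain-Ruby sketch; it does not use unibits itself.
# describe_bytes is a hypothetical helper, not part of any library.
def describe_bytes(string)
  {
    encoding: string.encoding,
    valid: string.valid_encoding?,
    hex: string.bytes.map { |byte| format('%02X', byte) }.join(' ')
  }
end

text = "\u00E4" # "ä"; \u escapes always produce a UTF-8 string

# Assumed analogue of the encoding option:
# same bytes, different interpretation.
as_binary = text.dup.force_encoding(Encoding::BINARY)

# Assumed analogue of the convert option:
# the bytes are transcoded into the target encoding.
as_utf16le = text.encode(Encoding::UTF_16LE)

# A lone continuation byte (0x80) is not valid UTF-8.
broken = "unexpected \x80".dup.force_encoding(Encoding::UTF_8)

puts describe_bytes(text)[:hex]        # C3 A4
puts describe_bytes(as_binary)[:hex]   # C3 A4
puts describe_bytes(as_utf16le)[:hex]  # E4 00
puts describe_bytes(broken)[:valid]    # false
```

Reinterpreting as binary never yields an invalid string, which matches the note above that invalid binary strings cannot be produced.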