README.md in unicode-emoji-1.1.0 vs README.md in unicode-emoji-2.0.0

- old
+ new

@@ -1,14 +1,14 @@
-# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
+# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
 A small Ruby library which provides Unicode Emoji data and regexes. Also includes a categorized list of recommended Emoji.
-Emoji version: **11.0**
+Emoji version: **12.0** (February 2019)
-Supported Rubies: **2.5**, **2.4**, **2.3**
+Supported Rubies: **2.6**, **2.5**, **2.4**, **2.3**
 If you are stuck on an older Ruby version, check out the latest [0.9 version](https://rubygems.org/gems/unicode-emoji/versions/0.9.3) of this gem.
 ## Gemfile
@@ -18,11 +18,11 @@
 ## Usage
 ### Regex
-Five Emoji regexes are included, which are compiled out of various Emoji Unicode data.
+The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.
 ```ruby
 require "unicode/emoji"
 string = "String which contains all kinds of emoji:
@@ -38,20 +38,68 @@
 "
 string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
 ```
+#### Main Regexes
+
+Matches (non-textual) Emoji of all kinds:
+
 Regex | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences, but restricts ZWJ and TAG sequences to recommended sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
+`Unicode::Emoji::REGEX` | **Use this if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵` | `😴︎`, `▶`, `🏻`
+##### Picking the Right Emoji Regex
+
+- Usually you just want `REGEX` (RGI set)
+- If you want broader matching (e.g. more sub-regions), choose `REGEX_VALID`
+- If you also want to match invalid sequences, use `REGEX_WELL_FORMED`
+
+Please see [the standard](http://www.unicode.org/reports/tr51/#Emoji_Sets) for details.
+
+Property | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed)
+---------|-----------------------------|-----------------------|----------------------------------
+Region "🇵🇹" | Yes | Yes | Yes
+Region "🇵🇵" | No | No | Yes
+Tag Sequence "🏴󠁧󠁢󠁳󠁣󠁴󠁿" | Yes | Yes | Yes
+Tag Sequence "🏴󠁧󠁢󠁡󠁧󠁢󠁿" | No | Yes | Yes
+Tag Sequence "😴󠁧󠁢󠁡󠁡󠁡󠁿" | No | No | Yes
+ZWJ Sequence "🤾🏽‍♀️" | Yes | Yes | Yes
+ZWJ Sequence "🤠‍🤢" | No | Yes | Yes
+
 More info about valid vs. recommended Emoji in this [blog article on Emojipedia](http://blog.emojipedia.org/unicode-behind-the-curtain/).
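
The two "Region" rows in the table above can be illustrated without the gem. The following is a minimal sketch in plain Ruby, not the gem's implementation: `WELL_FORMED_FLAG`, `SAMPLE_VALID_REGIONS`, `region_code`, `well_formed_flag?`, and `valid_flag?` are hypothetical names, and the region list is a deliberately tiny stand-in for the real Unicode region data. It assumes that a well-formed flag is any pair of regional indicator codepoints, while a valid flag additionally has to spell a known region code.

```ruby
# Regional indicator symbols run from U+1F1E6 (for "A") to U+1F1FF (for "Z").
RI_BASE = 0x1F1E6

# Well-formed: exactly two regional indicators, whatever they spell.
WELL_FORMED_FLAG = /\A[\u{1F1E6}-\u{1F1FF}]{2}\z/

# Hypothetical, deliberately tiny sample; the real region data is much larger.
SAMPLE_VALID_REGIONS = %w[DE JP PT].freeze

# Translate a pair of regional indicators into its two-letter code.
def region_code(flag)
  flag.codepoints.map { |cp| cp - RI_BASE + "A".ord }.pack("U*")
end

def well_formed_flag?(string)
  WELL_FORMED_FLAG.match?(string)
end

# Valid: well-formed and spelling a region code from the sample list.
def valid_flag?(string)
  well_formed_flag?(string) && SAMPLE_VALID_REGIONS.include?(region_code(string))
end

pt_flag = "\u{1F1F5}\u{1F1F9}" # regional indicators P + T
pp_flag = "\u{1F1F5}\u{1F1F5}" # regional indicators P + P

region_code(pt_flag)       # => "PT"
well_formed_flag?(pp_flag) # => true
valid_flag?(pp_flag)       # => false
valid_flag?(pt_flag)       # => true
```

This mirrors the table: the "PP" pair is only well-formed, while the "PT" pair also passes the stricter check.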
+#### Singleton Regexes
+
+Matches only simple one-codepoint (+ optional variation selector) Emoji:
+
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
+
+#### Include Textual Emoji
+
+By default, textual Emoji (emoji characters with a text variation selector or those that have a default text presentation) are not matched by the default regexes. However, if you wish to match them too, you can include them by appending the `_INCLUDE_TEXT` suffix to the regex name:
+
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶` | `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `😴︎`, `▶` | `🏻`, `🇵🇵`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶` | `🏻`
+
+#### Partial Regexes
+
+Matches potential Emoji parts (often, this is not what you want):
+
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
+
+
 ### List
 Use `Unicode::Emoji::LIST` or the list method to get a grouped (and ordered) list of Emoji:
 ```ruby
@@ -63,10 +111,12 @@
 Unicode::Emoji.list("Food & Drink", "food-asian")
 => ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🍡", "\u{1F95F}", "\u{1F960}", "\u{1F961}"]
 ```
+Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
+
 A markdown file with all recommended Emoji can be found [in this gist](https://gist.github.com/janlelis/72f9be1f0ecca07372c64cf13894b801).
 ### Properties
 Allows you to access the codepoint data from Unicode's [emoji-data.txt](http://unicode.org/Public/emoji/11.0/emoji-data.txt) file:
@@ -85,7 +135,7 @@
 - Ruby gem which displays [Emoji sequence names](https://github.com/janlelis/unicode-sequence_name)
 - Part of [unicode-x](https://github.com/janlelis/unicode-x)
 ## MIT
-- Copyright (C) 2017, 2018 Jan Lelis <http://janlelis.com>. Released under the MIT license.
+- Copyright (C) 2017-2019 Jan Lelis <http://janlelis.com>. Released under the MIT license.
 - Unicode data: http://www.unicode.org/copyright.html#Exhibit1