README.md in unicode-emoji-1.1.0 vs README.md in unicode-emoji-2.0.0
- old
+ new
@@ -1,14 +1,14 @@
-# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
+# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
A small Ruby library which provides Unicode Emoji data and regexes.
Also includes a categorized list of recommended Emoji.
-Emoji version: **11.0**
+Emoji version: **12.0** (February 2019)
-Supported Rubies: **2.5**, **2.4**, **2.3**
+Supported Rubies: **2.6**, **2.5**, **2.4**, **2.3**
If you are stuck on an older Ruby version, check out the latest [0.9 version](https://rubygems.org/gems/unicode-emoji/versions/0.9.3) of this gem.
## Gemfile
@@ -18,11 +18,11 @@
## Usage
### Regex
-Five Emoji regexes are included, which are compiled out of various Emoji Unicode data.
+The gem includes several Emoji regexes, which are compiled from various Unicode Emoji data sources.
```ruby
require "unicode/emoji"
string = "String which contains all kinds of emoji:
@@ -38,20 +38,68 @@
"
string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
```
+#### Main Regexes
+
+Matches (non-textual) Emoji of all kinds:
+
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences, but restrict ZWJ and TAG sequences to recommended sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
+`Unicode::Emoji::REGEX` | **Use this if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵` | `😴︎`, `▶`, `🏻`
+##### Picking the Right Emoji Regex
+
+- Usually you just want `REGEX` (RGI set)
+- If you want broader matching (e.g. more sub-regions), choose `REGEX_VALID`
+- If you also want to match invalid sequences, use `REGEX_WELL_FORMED`
+
+Please see [the standard](http://www.unicode.org/reports/tr51/#Emoji_Sets) for details.
+
+Property | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed)
+---------|-----------------------------|-----------------------|----------------------------------
+Region "🇵🇹" | Yes | Yes | Yes
+Region "🇵🇵" | No | No | Yes
+Tag Sequence "🏴󠁧󠁢󠁳󠁣󠁴󠁿" | Yes | Yes | Yes
+Tag Sequence "🏴󠁧󠁢󠁡󠁧󠁢󠁿" | No | Yes | Yes
+Tag Sequence "🏴󠁧󠁢󠁡󠁡󠁡󠁿" | No | No | Yes
+ZWJ Sequence "🤾🏽‍♀️" | Yes | Yes | Yes
+ZWJ Sequence "🤠‍🤢" | No | Yes | Yes
+
More info about valid vs. recommended Emoji in this [blog article on Emojipedia](http://blog.emojipedia.org/unicode-behind-the-curtain/).
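To make the region rows of the comparison table concrete, here is a minimal sketch in plain Ruby that does not use the gem's regexes. The `flag` helper, `WELL_FORMED_FLAG`, `KNOWN_REGIONS`, and `VALID_FLAG` are hypothetical names chosen for illustration, and the three-entry region list is a deliberately tiny stand-in for real region data. The gem's own patterns are compiled from Unicode data and may be built differently.

```ruby
# Regional indicator symbols for A..Z occupy U+1F1E6..U+1F1FF.
RI_A = 0x1F1E6

# Hypothetical helper: build a flag sequence from a two-letter code.
def flag(code)
  code.upcase.each_char.map { |c| RI_A + (c.ord - "A".ord) }.pack("U*")
end

# Well-formed: any pair of regional indicators, whether assigned or not.
WELL_FORMED_FLAG = /[\u{1F1E6}-\u{1F1FF}]{2}/

# Valid: only pairs that spell a known region code (illustrative subset only).
KNOWN_REGIONS = %w[PT DE JP].freeze
VALID_FLAG = Regexp.union(KNOWN_REGIONS.map { |code| flag(code) })

flag("PT").match?(WELL_FORMED_FLAG) # => true
flag("PT").match?(VALID_FLAG)       # => true
flag("PP").match?(WELL_FORMED_FLAG) # => true
flag("PP").match?(VALID_FLAG)       # => false
```

The same layering applies to tag and ZWJ sequences: each stricter set is a subset of the looser one.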
+#### Singleton Regexes
+
+Matches only simple one-codepoint (+ optional variation selector) Emoji:
+
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
+
+#### Include Textual Emoji
+
+By default, the regexes above do not match textual Emoji (Emoji characters followed by a text variation selector, or those that default to text presentation). To match them as well, use the variants with the `_INCLUDE_TEXT` suffix:
+
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶` | `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `😴︎`, `▶` | `🏻`, `🇵🇵`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶` | `🏻`
+
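The difference between textual and non-textual Emoji often comes down to an invisible trailing variation selector. The following sketch, in plain Ruby and independent of the gem, shows how the two selectors can be told apart. `requested_presentation` and the two constants are hypothetical names for illustration; the helper only inspects an explicit selector and does not know a character's default presentation.

```ruby
TEXT_SELECTOR  = "\u{FE0E}" # VARIATION SELECTOR-15: requests text presentation
EMOJI_SELECTOR = "\u{FE0F}" # VARIATION SELECTOR-16: requests emoji presentation

# Hypothetical helper: report which presentation a string explicitly requests.
def requested_presentation(str)
  if str.end_with?(TEXT_SELECTOR)
    :text
  elsif str.end_with?(EMOJI_SELECTOR)
    :emoji
  else
    :default # no selector: the character's default presentation applies
  end
end

play = "\u{25B6}" # BLACK RIGHT-POINTING TRIANGLE

requested_presentation(play + EMOJI_SELECTOR) # => :emoji
requested_presentation(play + TEXT_SELECTOR)  # => :text
requested_presentation(play)                  # => :default
```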
+#### Partial Regexes
+
+Matches potential Emoji parts (often, this is not what you want):
+
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
+
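Why matching single codepoints is rarely what you want can be seen with keycap sequences, where a plain digit is only one part of the Emoji. This is a minimal sketch in plain Ruby; `KEYCAP` is a hand-written, hypothetical pattern for illustration and not one of the gem's constants.

```ruby
# A keycap sequence is a base character (digit, "#" or "*"),
# followed by U+FE0F and U+20E3 (COMBINING ENCLOSING KEYCAP).
KEYCAP = /[0-9#*]\u{FE0F}\u{20E3}/

keycap_two = "2\u{FE0F}\u{20E3}"
text = "Press #{keycap_two} or just 2"

# Matching whole sequences finds only the keycap Emoji.
text.scan(KEYCAP).length # => 1

# Matching the bare component also picks up the ordinary digit.
text.scan(/[0-9]/)       # => ["2", "2"]
```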
+
### List
Use `Unicode::Emoji::LIST` or the list method to get a grouped (and ordered) list of Emoji:
```ruby
@@ -63,10 +111,12 @@
Unicode::Emoji.list("Food & Drink", "food-asian")
=> ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🍡", "\u{1F95F}", "\u{1F960}", "\u{1F961}"]
```
+Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
+
A markdown file with all recommended Emoji can be found [in this gist](https://gist.github.com/janlelis/72f9be1f0ecca07372c64cf13894b801).
### Properties
Allows you to access the codepoint data from Unicode's [emoji-data.txt](http://unicode.org/Public/emoji/11.0/emoji-data.txt) file:
@@ -85,7 +135,7 @@
- Ruby gem which displays [Emoji sequence names](https://github.com/janlelis/unicode-sequence_name)
- Part of [unicode-x](https://github.com/janlelis/unicode-x)
## MIT
-- Copyright (C) 2017, 2018 Jan Lelis <http://janlelis.com>. Released under the MIT license.
+- Copyright (C) 2017-2019 Jan Lelis <http://janlelis.com>. Released under the MIT license.
- Unicode data: http://www.unicode.org/copyright.html#Exhibit1