README.md in unicode-emoji-3.6.0 vs README.md in unicode-emoji-3.7.0
- old
+ new
@@ -1,32 +1,29 @@
# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)
-Provides Unicode Emoji data and regexes, incorporating the latest Unicode and Emoji standards.
+Provides regular expressions to find Emoji in strings, incorporating the latest Unicode and Emoji standards.
-Also includes a categorized list of recommended Emoji.
+Additional features:
+- A categorized list of recommended Emoji
+- Emoji property lookup for specific codepoints (Emoji_Modifier, Emoji_Presentation, etc.)
+
Emoji version: **16.0** (September 2024)
CLDR version (used for sub-region flags): **45** (April 2024)
-Supported Rubies: **3.x**
-
-No longer supported Rubies, but might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**
-
-If you are stuck on an older Ruby version, checkout the latest [0.9 version](https://rubygems.org/gems/unicode-emoji/versions/0.9.3) of this gem.
-
## Gemfile
```ruby
gem "unicode-emoji"
```
## Usage
### Regex
-The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.
+The gem includes multiple Emoji regexes, which are compiled out of various Emoji Unicode data sources.
```ruby
require "unicode/emoji"
string = "String which contains all kinds of emoji:
@@ -44,50 +41,52 @@
string.scan(Unicode::Emoji::REGEX) # => ["๐ด", "โถ๏ธ", "๐๐ฝ", "๐ต๐น", "๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ", "2๏ธโฃ", "๐คพ๐ฝโโ๏ธ"]
```
#### Main Regexes
-Matches (non-textual) Emoji of all kinds:
+There are multiple levels of Emoji detection:
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX` | **Use this if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *recommended* Emoji sequences | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐คพ๐ฝโโ๏ธ` | `๐ด๏ธ`, `โถ`, `๐ป`, `๐ต๐ต`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐ค โ๐คข`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *valid* Emoji sequences | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข` | `๐ด๏ธ`, `โถ`, `๐ป`, `๐ต๐ต`
-`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *well-formed* Emoji sequences | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`, `๐ต๐ต` | `๐ด๏ธ`, `โถ`, `๐ป`
+`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐คพ๐ฝโโ๏ธ` | `๐ด๏ธ`, `โถ`, `๐ป`, `๐ต๐ต`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐ค โ๐คข`, `1`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข` | `๐ด๏ธ`, `โถ`, `๐ป`, `๐ต๐ต`, `1`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`, `๐ต๐ต` | `๐ด๏ธ`, `โถ`, `๐ป`, `1`
+`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`, `๐ต๐ต`, `๐ด๏ธ`, `โถ`, `๐ป`, `1` |
##### Picking the Right Emoji Regex
- Usually you just want `REGEX` (RGI set)
-- If you want broader matching (e.g. more sub-regions), choose `REGEX_VALID`
-- If you even want to match for invalid sequences, too, use `REGEX_WELL_FORMED`
+- If you want broader matching (any ZWJ sequences, more sub-region flags), choose `REGEX_VALID`
+- Even broader is `REGEX_WELL_FORMED`, which will also match any region flag and any tag sequence
+- And then there is `REGEX_POSSIBLE`, a quick check for possible Emoji that might produce false positives, as [suggested in the Unicode Standard](https://www.unicode.org/reports/tr51/#EBNF_and_Regex)
-Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for details.
+Property | Escaped | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed) | `REGEX_POSSIBLE`
+---------|---------|-----------------------------|-----------------------|-----------------------------------|-----------------
+Region "🇵🇹" | `\u{1F1F5 1F1F9}` | Yes | Yes | Yes | Yes
+Region "🇵🇵" | `\u{1F1F5 1F1F5}` | No | No | Yes | Yes
+Tag Sequence "๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ" | `\u{1F3F4 E0067 E0062 E0073 E0063 E0074 E007F}` | Yes | Yes | Yes | Yes
+Tag Sequence "๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ" | `\u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F}` | No | Yes | Yes | Yes
+Tag Sequence "๐ด๓ ง๓ ข๓ ก๓ ก๓ ก๓ ฟ" | `\u{1F634 E0067 E0062 E0061 E0061 E0061 E007F}` | No | No | Yes | Yes
+ZWJ Sequence "🤾🏽‍♀️" | `\u{1F93E 1F3FD 200D 2640 FE0F}` | Yes | Yes | Yes | Yes
+ZWJ Sequence "🤠‍🤢" | `\u{1F920 200D 1F922}` | No | Yes | Yes | Yes
-Property | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed)
----------|-----------------------------|-----------------------|----------------------------------
-Region "๐ต๐น" | Yes | Yes | Yes
-Region "๐ต๐ต" | No | No | Yes
-Tag Sequence "๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ" | Yes | Yes | Yes
-Tag Sequence "๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ" | No | Yes | Yes
-Tag Sequence "๐ด๓ ง๓ ข๓ ก๓ ก๓ ก๓ ฟ" | No | No | Yes
-ZWJ Sequence "๐คพ๐ฝโโ๏ธ" | Yes | Yes | Yes
-ZWJ Sequence "๐ค โ๐คข" | No | Yes | Yes
+Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for more details, examples, and explanations.
-More info about valid vs. recommended Emoji in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).
+More info about valid vs. recommended Emoji can also be found in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).
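The comparison table above can be checked mechanically. The following is a minimal sketch, not part of the gem: it rebuilds three of the sample sequences from their escaped form and reports which regex levels match the whole string. `SAMPLES`, `LEVELS`, and `levels_matching` are illustrative names, and the `require` is wrapped in a `rescue` only so the snippet still loads where the gem is not installed.

```ruby
# Sample sequences, written with the escapes from the table above.
SAMPLES = {
  "region PT"   => "\u{1F1F5 1F1F9}",
  "region PP"   => "\u{1F1F5 1F1F5}",
  "non-RGI ZWJ" => "\u{1F920 200D 1F922}",
}.freeze

# Regex constants documented in this README, ordered from strictest to loosest.
LEVELS = %i[REGEX REGEX_VALID REGEX_WELL_FORMED REGEX_POSSIBLE].freeze

# Returns the names of all levels whose regex matches the entire string.
# `namespace` is any module that defines the constants listed in LEVELS.
def levels_matching(string, namespace)
  LEVELS.select do |name|
    # Anchor the regex so that partial matches inside the string do not count.
    /\A(?:#{namespace.const_get(name)})\z/.match?(string)
  end
end

begin
  require "unicode/emoji"
  SAMPLES.each do |label, string|
    puts "#{label}: #{levels_matching(string, Unicode::Emoji).join(', ')}"
  end
rescue LoadError
  warn "unicode-emoji is not installed; skipping the comparison"
end
```

Anchoring with `\A` and `\z` matters here: without it, a looser regex could report a match on just one part of a longer sequence.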
#### Singleton Regexes
Matches only simple one-codepoint (+ optional variation selector) Emoji:
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `๐ด`, `โถ๏ธ` | `๐ด๏ธ`, `โถ`, `๐ป`, `๐๐ฝ`, `๐ต๐น`, `๐ต๐ต`,`2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`
-`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `๐ด๏ธ`, `โถ` | `๐ด`, `โถ๏ธ`, `๐ป`, `๐๐ฝ`, `๐ต๐น`, `๐ต๐ต`,`2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `๐ด`, `โถ๏ธ` | `๐ด๏ธ`, `โถ`, `๐ป`, `๐๐ฝ`, `๐ต๐น`, `๐ต๐ต`,`2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`, `1`
+`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `๐ด๏ธ`, `โถ` | `๐ด`, `โถ๏ธ`, `๐ป`, `๐๐ฝ`, `๐ต๐น`, `๐ต๐ต`,`2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`, `1`
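Digits and a bare U+25B6 show up in the non-match columns because a raw property check is too coarse: both carry the Unicode `Emoji` property, but neither has `Emoji_Presentation`. The sketch below illustrates this with the `\p{...}` properties built into Ruby's regex engine, not with this gem's constants. `naive_class` is a hypothetical helper, and the gem's own regexes apply further rules (variation selectors, components) on top.

```ruby
# Built-in Unicode properties of Ruby's regex engine, not constants of this gem.
HAS_EMOJI_PROPERTY = /\A\p{Emoji}\z/
EMOJI_BY_DEFAULT   = /\A\p{Emoji_Presentation}\z/

# Classifies a single codepoint by its raw Emoji properties only.
def naive_class(char)
  return :emoji_presentation if EMOJI_BY_DEFAULT.match?(char)
  return :text_presentation  if HAS_EMOJI_PROPERTY.match?(char)

  :not_emoji
end

# Digit one, U+25B6 without a variation selector, U+1F634, and a plain letter.
["1", "\u{25B6}", "\u{1F634}", "a"].each do |char|
  puts format("U+%04X %s", char.ord, naive_class(char))
end
```

A check based on `\p{Emoji}` alone would therefore treat every digit as Emoji, which is the kind of false positive the stricter regexes are meant to avoid.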
#### Include Textual Emoji
-By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes. However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
+By default, textual Emoji (emoji characters with a text variation selector or those that have a default text presentation) are not matched by the default regexes (except `REGEX_POSSIBLE`). To match them too, append the `_INCLUDE_TEXT` suffix to the regex name:
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ด๏ธ`, `โถ` | `๐ป`, `๐ต๐ต`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐ค โ๐คข`
`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `๐ด`, `โถ๏ธ`, `๐๐ฝ`, `๐ต๐น`, `2๏ธโฃ`, `๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ`, `๐ด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ`, `๐คพ๐ฝโโ๏ธ`, `๐ค โ๐คข`, `๐ด๏ธ`, `โถ` | `๐ป`, `๐ต๐ต`
@@ -125,15 +124,15 @@
=> ["๐ฑ", "๐", "๐", "๐", "๐", "๐", "๐", "๐ ", "๐ข", "๐ฃ", "๐ค", "๐ฅ", "๐ฅฎ", "๐ก", "๐ฅ", "๐ฅ ", "๐ฅก"]
```
Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
-A list of all Emoji can be found at [character.construction](https://character.construction).
+A list of all Emoji (generated from this gem) can be found at [character.construction/emoji](https://character.construction/emoji).
### Properties
-Allows you to access the codepoint data form Unicode's [emoji-data.txt](https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt) file:
+Allows you to access the codepoint data from Unicode's [emoji-data.txt](https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt) file:
```ruby
require "unicode/emoji"
Unicode::Emoji.properties "โ" # => ["Emoji", "Emoji_Modifier_Base"]
@@ -141,10 +140,10 @@
## Also See
- [Unicode® Technical Standard #51](https://www.unicode.org/reports/tr51/)
- [Emoji categories](https://unicode.org/emoji/charts/emoji-ordering.html)
-- Ruby gem which displays [Emoji sequence names](https://github.com/janlelis/unicode-sequence_name) (here [as website](https://character.construction/name))
+- Ruby gem which displays [Emoji sequence names](https://github.com/janlelis/unicode-sequence_name) ([as website](https://character.construction/name))
- Part of [unicode-x](https://github.com/janlelis/unicode-x)
## MIT
- Copyright (C) 2017-2024 Jan Lelis <https://janlelis.com>. Released under the MIT license.