README.md in unicode-emoji-3.7.0 vs README.md in unicode-emoji-3.8.0
- old
+ new
@@ -1,120 +1,163 @@
# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)
-Provides regular expressions to find Emoji in strings, incorporating the latest Unicode and Emoji standards.
+Provides regular expressions to find Emoji in strings, incorporating the latest Unicode / Emoji standards.
Additional features:
-- A categorized list of recommended Emoji
+- A categorized list of Emoji (RGI: Recommended for General Interchange)
- Retrieve Emoji properties info about specific codepoints (Emoji_Modifier, Emoji_Presentation, etc.)
Emoji version: **16.0** (September 2024)
-CLDR version (used for sub-region flags): **45** (April 2024)
+CLDR version (used for sub-region flags): **46** (October 2024)
## Gemfile
```ruby
gem "unicode-emoji"
```
-## Usage
+## Usage – Regex Matching
-### Regex
-
The gem includes multiple Emoji regexes, which are compiled out of various Emoji Unicode data sources.
```ruby
require "unicode/emoji"
-string = "String which contains all kinds of emoji:
+string = "String which contains all types of Emoji sequences:
- Singleton Emoji: 😴
- Textual singleton Emoji with Emoji variation: ▶️
- Emoji with skin tone modifier: 🛌🏽
- Region flag: 🇵🇹
- Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
- Keycap sequence: 2️⃣
- Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
-
"
string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
```
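To see what these sequence types look like at the codepoint level, here is a minimal sketch that uses only core Ruby and none of the gem's constants. The ZWJ and region escapes are the ones listed in this README; the keycap escape (digit, U+FE0F, U+20E3) and the helper name `hex_codepoints` are assumptions made for illustration.

```ruby
# Core Ruby only; no gem required for this inspection.
SEQUENCES = {
  "keycap"      => "2\u{FE0F}\u{20E3}",
  "region flag" => "\u{1F1F5 1F1F9}",
  "ZWJ (FQE)"   => "\u{1F93E 1F3FD 200D 2640 FE0F}",
  "ZWJ (MQE)"   => "\u{1F93E 1F3FD 200D 2640}",
}.freeze

# Hypothetical helper: format a string as space-separated hex codepoints.
def hex_codepoints(string)
  string.codepoints.map { |cp| format("%04X", cp) }.join(" ")
end

SEQUENCES.each do |label, sequence|
  puts "#{label.ljust(12)} #{hex_codepoints(sequence)}"
end
```

The fully-qualified and minimally-qualified ZWJ sequences differ only in the trailing U+FE0F, which is why they look identical in most fonts but are treated differently by the regexes.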
-#### Main Regexes
+Depending on your exact use case, you can choose between multiple levels of Emoji detection:
-There are multiple levels of Emoji detection:
+### Main Regexes
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `1`
-`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵` | `😴︎`, `▶`, `🏻`, `1`
-`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` |
+`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵` | `😴︎`, `▶`, `🏻`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits (except for: unqualified keycap sequences) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`
-##### Picking the Right Emoji Regex
+#### Include Text Emoji
-- Usually you just want `REGEX` (RGI set)
-- If you want broader matching (any ZJW sequences, more sub-region flags), choose `REGEX_VALID`
-- Even brolader is `REGEX_WELL_FORMED`, which will also match any region flag and any tag sequence
-- And then there is `REGEX_POSSIBLE` , which is a quick check for possible Emoji, which might contain false positives, [suggested in the Unicode Standard](https://www.unicode.org/reports/tr51/#EBNF_and_Regex)
+By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes (except in `REGEX_POSSIBLE`). However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
-Property | Escaped | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed) | `REGEX_POSSIBLE`
----------|---------|-----------------------------|-----------------------|-----------------------------------|-----------------
-Region "🇵🇹" | `\u{1F1F5 1F1F9}` | Yes | Yes | Yes | Yes
-Region "🇵🇵" | `\u{1F1F5 1F1F5}` | No | No | Yes | Yes
-Tag Sequence "🏴󠁧󠁢󠁳󠁣󠁴󠁿" | `\u{1F3F4 E0067 E0062 E0073 E0063 E0074 E007F}` | Yes | Yes | Yes | Yes
-Tag Sequence "🏴󠁧󠁢󠁡󠁧󠁢󠁿" | `\u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F}` | No | Yes | Yes | Yes
-Tag Sequence "😴󠁧󠁢󠁡󠁡󠁡󠁿" | `\u{1F634 E0067 E0062 E0061 E0061 E0061 E007F}` | No | No | Yes | Yes
-ZWJ Sequence "🤾🏽‍♀️" | `\u{1F93E 1F3FD 200D 2640 FE0F}` | Yes | Yes | Yes | Yes
-ZWJ Sequence "🤠‍🤢" | `\u{1F920 200D 1F922}` | No | Yes | Yes | Yes
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣` | `🤾🏽‍♀`, `🏌‍♂️`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `😴︎`, `▶`, `1⃣` | `🏻`, `🇵🇵`, `1`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `1⃣` | `🏻`, `1`
-Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for more details, examples, explanations.
+#### Minimally-qualified and Unqualified Sequences
-More info about valid vs. recommended Emoji can also be found in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).
+Regex | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` | `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+[List of MQE and UQE Emoji sequences](https://character.construction/unqualified-emoji)
+
#### Singleton Regexes
Matches only simple one-codepoint (+ optional variation selector) Emoji:
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `1`
-`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
-#### Include Textual Emoji
+Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text)
-By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes (except in `REGEX_POSSIBLE`). However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
+While `REGEX_BASIC` is part of the above regexes, `REGEX_TEXT` is only included in the `*_INCLUDE_TEXT` or `*_UQE` variants.
-Regex | Description | Example Matches | Example Non-Matches
-------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶` | `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `😴︎`, `▶` | `🏻`, `🇵🇵`
-`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶` | `🏻`
+### Comparison
-#### Extended Pictographic Regex
+1) Fully-qualified RGI Emoji ZWJ sequence
+2) Minimally-qualified RGI Emoji ZWJ sequence (lacks Emoji Presentation Selectors, but not in the first Emoji character)
+3) Unqualified RGI Emoji ZWJ sequence (lacks Emoji Presentation Selector, including in the first Emoji character). Unqualified Emoji include all basic Emoji in Text Presentation (see column 11/12).
+4) Non-RGI Emoji ZWJ sequence
+5) Valid Region made from a pair of Regional Indicators
+6) Any Region made from a pair of Regional Indicators
+7) RGI Flag Emoji Tag Sequences (England, Scotland, Wales)
+8) Valid Flag Emoji Tag Sequences (any known subdivision)
+9) Any Emoji Tag Sequences (any tag sequence with any base)
+10) Basic Default Emoji Presentation Characters or Text characters with Emoji Presentation Selector
+11) Basic Default Text Presentation Characters or Basic Emoji with Text Presentation Selector
+12) Non-Emoji (unqualified) keycap
+Regex | 1 RGI/FQE | 2 RGI/MQE | 3 RGI/UQE | 4 Non-RGI | 5 Valid Region | 6 Any Region | 7 RGI Tag | 8 Valid Tag | 9 Any Tag | 10 Basic Emoji | 11 Basic Text | 12 Text Keycap
+-|-|-|-|-|-|-|-|-|-|-|-|-
+REGEX | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌
+REGEX INCLUDE TEXT | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅
+REGEX INCLUDE MQE | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌
+REGEX INCLUDE MQE UQE | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅
+REGEX VALID | ✅ | ✅ | (✅)¹ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌
+REGEX VALID INCLUDE TEXT | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅
+REGEX WELL FORMED | ✅ | ✅ | (✅)¹ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌
+REGEX WELL FORMED INCLUDE TEXT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
+REGEX POSSIBLE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌
+REGEX BASIC | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌
+REGEX TEXT | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅
+
+¹ Matches all unqualified Emoji, except for textual singleton Emoji (see columns 11, 12)
+
+See [spec files](/spec) for detailed examples about which regex matches which kind of Emoji.
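The difference between columns 5 and 6 (valid region vs. any region) can be sketched in plain Ruby. This is only an illustration of the idea, not how the gem builds its regexes: `ANY_REGION`, `KNOWN_REGION_CODES`, `region_code`, and `valid_region?` are hypothetical names, and the allowlist is a deliberately tiny stand-in for the real region data.

```ruby
# Any pair of Regional Indicator codepoints (U+1F1E6..U+1F1FF), cf. column 6.
ANY_REGION = /\A[\u{1F1E6}-\u{1F1FF}]{2}\z/

# Hypothetical, incomplete allowlist standing in for known regions (column 5).
KNOWN_REGION_CODES = %w[PT DE JP].freeze

# Translate a regional-indicator pair into its two-letter code, e.g. "PT".
def region_code(flag)
  flag.codepoints.map { |cp| (cp - 0x1F1E6 + "A".ord).chr(Encoding::UTF_8) }.join
end

# A flag is "valid" here only if it is well-formed AND its code is known.
def valid_region?(flag)
  flag.match?(ANY_REGION) && KNOWN_REGION_CODES.include?(region_code(flag))
end

["\u{1F1F5 1F1F9}", "\u{1F1F5 1F1F5}"].each do |flag|
  puts "#{region_code(flag)}: any=#{flag.match?(ANY_REGION)} valid=#{valid_region?(flag)}"
end
```

Both inputs are well-formed pairs, but only the first maps to a code in the sample allowlist, which mirrors why an unknown pair is matched by the well-formed regexes and not by the stricter ones.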
+
+### Picking the Right Emoji Regex
+
+- Usually you just want `REGEX` (recommended Emoji set, RGI)
+- Use `REGEX_INCLUDE_MQE` or `REGEX_INCLUDE_MQE_UQE` if you want to catch Emoji sequences with missing Variation Selectors.
+- If you want broader matching (any ZWJ sequences, more sub-region flags), choose `REGEX_VALID`
+- If you need to match any region flag and any tag sequence, choose `REGEX_WELL_FORMED`
+- Use the `_INCLUDE_TEXT` suffix with any of the above base regexes, if you want to also match basic textual Emoji
+- Finally, there is `REGEX_POSSIBLE`, a simplified test for possible Emoji that is comparable to `REGEX_WELL_FORMED*`. It may produce false positives, but the regex is less complex and is [suggested in the Unicode standard itself](https://www.unicode.org/reports/tr51/#EBNF_and_Regex) as a first check.
+
+### Examples
+
+Desc | Emoji | Escaped | `REGEX` (RGI/FQE) | `REGEX_INCLUDE_MQE` (RGI/MQE) | `REGEX_VALID` | `REGEX_WELL_FORMED` / `REGEX_POSSIBLE`
+-----|-------|---------|---------------|-----------------------|-----------------------------------|-----------------
+RGI ZWJ Sequence | 🤾🏽‍♀️ | `\u{1F93E 1F3FD 200D 2640 FE0F}` | ✅ | ✅ | ✅ | ✅
+RGI ZWJ Sequence MQE | 🤾🏽‍♀ | `\u{1F93E 1F3FD 200D 2640}` | ❌ | ✅ | ✅ | ✅
+Valid ZWJ Sequence, Non-RGI | 🤠‍🤢 | `\u{1F920 200D 1F922}` | ❌ | ❌ | ✅ | ✅
+Known Region | 🇵🇹 | `\u{1F1F5 1F1F9}` | ✅ | ✅ | ✅ | ✅
+Unknown Region | 🇵🇵 | `\u{1F1F5 1F1F5}` | ❌ | ❌ | ❌ | ✅
+RGI Tag Sequence | 🏴󠁧󠁢󠁳󠁣󠁴󠁿 | `\u{1F3F4 E0067 E0062 E0073 E0063 E0074 E007F}` | ✅ | ✅ | ✅ | ✅
+Valid Tag Sequence | 🏴󠁧󠁢󠁡󠁧󠁢󠁿 | `\u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F}` | ❌ | ❌ | ✅ | ✅
+Well-formed Tag Sequence | 😴󠁧󠁢󠁡󠁡󠁡󠁿 | `\u{1F634 E0067 E0062 E0061 E0061 E0061 E007F}` | ❌ | ❌ | ❌ | ✅
+
+Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for more details, examples, explanations.
+
+More info about valid vs. recommended Emoji can also be found in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).
+
+### Extended Pictographic Regex
+
`Unicode::Emoji::REGEX_PICTO` matches single codepoints with the **Extended_Pictographic** property. For example, it will match `✀` BLACK SAFETY SCISSORS.
`Unicode::Emoji::REGEX_PICTO_NO_EMOJI` matches single codepoints with the **Extended_Pictographic** property, but excludes Emoji characters.
See [character.construction/picto](https://character.construction/picto) for a list of all non-Emoji pictographic characters.
-#### Partial Regexes
+### Partial Regexes
-Matches potential Emoji parts (often, this is not what you want):
+`Unicode::Emoji::REGEX_ANY`, same as `\p{Emoji}`. Deprecated: Will be removed or renamed in the future.
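Since this regex is described as equivalent to `\p{Emoji}`, a quick way to see why it matches Emoji parts rather than complete Emoji is to test the property directly in core Ruby. The constant names below are illustrative only.

```ruby
# Core Ruby only: test the Emoji property on individual codepoints.
EMOJI_PROPERTY = /\p{Emoji}/

PARTS = {
  "digit 2"            => "2",
  "skin tone modifier" => "\u{1F3FB}",
  "regional indicator" => "\u{1F1F5}",
  "zero width joiner"  => "\u{200D}",
  "variation selector" => "\u{FE0F}",
}.freeze

PARTS.each do |label, char|
  puts "#{label}: #{char.match?(EMOJI_PROPERTY)}"
end
```

Digits, modifiers, and single regional indicators carry the property even though none of them is a complete Emoji on its own, while joiners and variation selectors do not.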
-Regex | Description | Example Matches | Example Non-Matches
-------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please not that this will match Emoji-parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
+## Usage – List
+Use `Unicode::Emoji::LIST` or the **list** method to get an ordered and categorized list of Emoji:
-### List
-
-Use `Unicode::Emoji::LIST` or the list method to get a grouped (and ordered) list of Emoji:
-
```ruby
Unicode::Emoji.list.keys
# => ["Smileys & Emotion", "People & Body", "Component", "Animals & Nature", "Food & Drink", "Travel & Places", "Activities", "Objects", "Symbols", "Flags"]
Unicode::Emoji.list("Food & Drink").keys
@@ -122,16 +165,16 @@
Unicode::Emoji.list("Food & Drink", "food-asian")
# => ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🥮", "🍡", "🥟", "🥠", "🥡"]
```
-Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
+Please note that categories might change with future versions of the Emoji standard, although this has not happened often.
A list of all Emoji (generated from this gem) can be found at [character.construction/emoji](https://character.construction/emoji).
-### Properties
+## Usage – Properties Data
-Allows you to access the codepoint data form Unicode's [emoji-data.txt](https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt) file:
+Allows you to access the codepoint data for a single character from Unicode's [emoji-data.txt](https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt) file:
```ruby
require "unicode/emoji"
Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]