README.md in unicode-emoji-3.8.0 vs README.md in unicode-emoji-4.0.0
- old
+ new
@@ -1,8 +1,9 @@
# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)
-Provides regular expressions to find Emoji in strings, incorporating the latest Unicode / Emoji standards.
+Provides various sophisticated regular expressions to work with Emoji in strings,
+incorporating the latest Unicode / Emoji standards.
Additional features:
- A categorized list of Emoji (RGI: Recommended for General Interchange)
- Retrieve Emoji properties info about specific codepoints (Emoji_Modifier, Emoji_Presentation, etc.)
@@ -24,16 +25,17 @@
```ruby
require "unicode/emoji"
string = "String which contains all types of Emoji sequences:
-- Singleton Emoji: 😴
-- Textual singleton Emoji with Emoji variation: ▶️
+- Basic Emoji: 😴
+- Textual Emoji with Emoji variation (VS16): ▶️
- Emoji with skin tone modifier: 🛌🏽
- Region flag: 🇵🇹
- Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
- Keycap sequence: 2️⃣
+- Skin tone modifier: 🏻
- Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
"
string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
```
@@ -42,44 +44,44 @@
### Main Regexes
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵` | `😴︎`, `▶`, `🏻`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits (except for: unqualified keycap sequences) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`
+`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) Basic Emoji and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🏻` | `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) Basic Emoji and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🏻` | `😴︎`, `▶`, `🇵🇵`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) Basic Emoji and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `🏻` | `😴︎`, `▶`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, all kinds of Emoji sequences, and even non-Emoji singleton components like digits. Only exception: Unqualified keycap sequences are not matched | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`
#### Include Text Emoji
By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes (except in `REGEX_POSSIBLE`). However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣` | `🤾🏽‍♀`, `🏌‍♂️`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`
-`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `😴︎`, `▶`, `1⃣` | `🏻`, `🇵🇵`, `1`
-`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `1⃣` | `🏻`, `1`
+`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣`, `🏻` | `🤾🏽‍♀`, `🏌‍♂️`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `😴︎`, `▶`, `1⃣`, `🏻` | `🇵🇵`, `1`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶`, `1⃣`, `🏻` | `1`
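
To make the "textual" distinction above more concrete, here is a minimal sketch that does not use the gem at all. It only looks at a trailing variation selector (VS15 = U+FE0E requests text, VS16 = U+FE0F requests Emoji) and otherwise falls back to Ruby's native `\p{EPres}` property. The method name `presentation_of` and the constant names are made up for this illustration, and the sketch assumes its input is a single Emoji character with an optional variation selector.

```ruby
TEXT_SELECTOR  = "\u{FE0E}" # VS15: request text presentation
EMOJI_SELECTOR = "\u{FE0F}" # VS16: request Emoji presentation

# Hypothetical helper: which presentation does this single Emoji ask for?
# An explicit variation selector wins; otherwise the default presentation
# is taken from Ruby's native Emoji_Presentation property.
def presentation_of(str)
  return :text  if str.end_with?(TEXT_SELECTOR)
  return :emoji if str.end_with?(EMOJI_SELECTOR)

  str.match?(/\A\p{EPres}/) ? :emoji : :text
end

presentation_of("\u{25B6}")          # ▶ without selector (default text presentation)
presentation_of("\u{25B6}\u{FE0F}")  # ▶ followed by VS16
presentation_of("\u{1F634}")         # 😴 without selector (default Emoji presentation)
presentation_of("\u{1F634}\u{FE0E}") # 😴 followed by VS15
```

Strings for which this helper returns `:text` correspond to what the tables call textual Emoji, which is why the default regexes skip them unless an `_INCLUDE_TEXT` variant is used.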
#### Minimally-qualified and Unqualified Sequences
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` | `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏻` | `🏌‍♂️`, `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🏻` | `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
[List of MQE and UQE Emoji sequences](https://character.construction/unqualified-emoji)
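
The difference between fully-qualified, minimally-qualified and unqualified sequences is invisible in most fonts, so it can help to look at the raw codepoints. The following sketch is plain Ruby and does not use the gem; it spells out "man golfing" in the three qualification states, as an illustration of where the Emoji variation selector (VS16, U+FE0F) is present or missing. The helper name `hex_codepoints` is made up for this example.

```ruby
# Hypothetical helper: format a string's codepoints as U+XXXX for inspection
def hex_codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# "Man golfing": golfer, (VS16), ZWJ, male sign, (VS16)
fully_qualified     = "\u{1F3CC}\u{FE0F}\u{200D}\u{2642}\u{FE0F}"
minimally_qualified = "\u{1F3CC}\u{FE0F}\u{200D}\u{2642}" # VS16 missing after the first part
unqualified         = "\u{1F3CC}\u{200D}\u{2642}\u{FE0F}" # VS16 missing on the first part

[fully_qualified, minimally_qualified, unqualified].each do |sequence|
  puts hex_codepoints(sequence).join(" ")
end

# Dropping every VS16 leaves the same underlying sequence in all three cases
stripped = [fully_qualified, minimally_qualified, unqualified].map do |sequence|
  sequence.delete("\u{FE0F}")
end
puts stripped.uniq.size # prints 1
```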
#### Singleton Regexes
Matches only simple one-codepoint (+ optional variation selector) Emoji:
Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
-`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) Basic Emoji, but no sequences at all | `😴`, `▶️`, `🏻` | `😴︎`, `▶`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
-Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text)
+Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text). The `REGEX_BASIC` regex also matches visual Emoji components (skin tone modifiers and hair components).
While `REGEX_BASIC` is part of the above regexes, `REGEX_TEXT` is only included in the `*_INCLUDE_TEXT` or `*_UQE` variants.
### Comparison
@@ -138,20 +140,28 @@
Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for more details, examples, explanations.
More info about valid vs. recommended Emoji can also be found in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).
-### Extended Pictographic Regex
+### Emoji Property Regexes
+Ruby includes native regex Emoji properties, as listed in the following table. You can also opt-in to use the `*_PROP_*` regexes to get the Emoji support level of this gem (instead of Ruby's).
+
+Gem Regex (`Unicode::Emoji`'s Emoji support level) | Native Regex (Ruby's Emoji support level)
+---------------------------------------------------|------------------------------------------
+`Unicode::Emoji::REGEX_PROP_EMOJI` | `/\p{Emoji}/`
+`Unicode::Emoji::REGEX_PROP_MODIFIER` | `/\p{EMod}/`
+`Unicode::Emoji::REGEX_PROP_MODIFIER_BASE` | `/\p{EBase}/`
+`Unicode::Emoji::REGEX_PROP_COMPONENT` | `/\p{EComp}/`
+`Unicode::Emoji::REGEX_PROP_PRESENTATION` | `/\p{EPres}/`
+
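
As a rough illustration of the native column of this table, the following sketch probes a few characters against Ruby's built-in property classes. It does not use the gem, the helper name `native_emoji_props` is made up for this example, and results for newer Emoji depend on the Unicode version your Ruby ships with.

```ruby
# Ruby's native Emoji property classes, keyed by their short names
NATIVE_EMOJI_PROPS = {
  "Emoji" => /\p{Emoji}/,
  "EMod"  => /\p{EMod}/,
  "EBase" => /\p{EBase}/,
  "EComp" => /\p{EComp}/,
  "EPres" => /\p{EPres}/,
}.freeze

# Hypothetical helper: names of all native properties matching somewhere in str
def native_emoji_props(str)
  NATIVE_EMOJI_PROPS.select { |_name, regex| str.match?(regex) }.keys
end

native_emoji_props("1")         # => ["Emoji", "EComp"] (digits are keycap components)
native_emoji_props("a")         # => []
native_emoji_props("\u{1F3FB}") # light skin tone modifier
native_emoji_props("\u{1F634}") # sleeping face
```

Note that `\p{Emoji}` matching a plain digit is the reason the main regexes above are usually a better fit for finding Emoji in text than the raw property classes.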
+#### Extended Pictographic Regex
+
`Unicode::Emoji::REGEX_PICTO` matches single codepoints with the **Extended_Pictographic** property. For example, it will match `✀` BLACK SAFETY SCISSORS.
`Unicode::Emoji::REGEX_PICTO_NO_EMOJI` matches single codepoints with the **Extended_Pictographic** property, but excludes Emoji characters.
See [character.construction/picto](https://character.construction/picto) for a list of all non-Emoji pictographic characters.
-
-### Partial Regexes
-
-`Unicode::Emoji::REGEX_ANY`, same as `\p{Emoji}`. Deprecated: Will be removed or renamed in the future.
## Usage – List
Use `Unicode::Emoji::LIST` or the **list** method to get an ordered and categorized list of Emoji: