README.md in unicode-emoji-3.8.0 vs README.md in unicode-emoji-4.0.0

- old
+ new

@@ -1,8 +1,9 @@
 # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)

-Provides regular expressions to find Emoji in strings, incorporating the latest Unicode / Emoji standards.
+Provides various sophisticated regular expressions to work with Emoji in strings,
+incorporating the latest Unicode / Emoji standards.

 Additional features:

 - A categorized list of Emoji (RGI: Recommended for General Interchange)
 - Retrieve Emoji properties info about specific codepoints (Emoji_Modifier, Emoji_Presentation, etc.)
@@ -24,16 +25,17 @@
 ```ruby
 require "unicode/emoji"

 string = "String which contains all types of Emoji sequences:

-- Singleton Emoji: 😴
-- Textual singleton Emoji with Emoji variation: ▶️
+- Basic Emoji: 😴
+- Textual Emoji with Emoji variation (VS16): ▶️
 - Emoji with skin tone modifier: 🛌🏽
 - Region flag: 🇵🇹
 - Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
 - Keycap sequence: 2️⃣
+- Skin tone modifier: 🏻
 - Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
 "

 string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
 ```
@@ -42,44 +44,44 @@
 ### Main Regexes

 Regex | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` ,`🏌‍♂️`, `🤠‍🤒` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`,`🏌‍♂️` , `🤠‍🤒`, `🇵🇵` | `😴︎`, `▶`, `🏻`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits (except for: unqualified keycap sequences) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`
+`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) Basic Emoji and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🏻` | `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) Basic Emoji and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` ,`🏌‍♂️`, `🤠‍🤒`, `🏻` | `😴︎`, `▶`, `🇵🇵`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) Basic Emoji and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`,`🏌‍♂️` , `🤠‍🤒`, `🇵🇵`, `🏻` | `😴︎`, `▶`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, all kinds of Emoji sequences, and even non-Emoji singleton components like digits. Only exception: Unqualified keycap sequences are not matched | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`

 #### Include Text Emoji

 By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes (except in `REGEX_POSSIBLE`). However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:

 Regex | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣` | `🤾🏽‍♀`, `🏌‍♂️`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`
-`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `😴︎`, `▶`, `1⃣` | `🏻`, `🇵🇵`, `1`
-`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `🇵🇵`, `😴︎`, `▶`, `1⃣` | `🏻`, `1`
+`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣` , `🏻`| `🤾🏽‍♀`, `🏌‍♂️`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `😴︎`, `▶`, `1⃣` , `🏻` | `🇵🇵`, `1`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `🇵🇵`, `😴︎`, `▶`, `1⃣` , `🏻` | `1`

 #### Minimally-qualified and Unqualified Sequences

 Regex | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` | `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏻` | `🏌‍♂️`, `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🏻` | `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤒`, `1`, `1⃣`

 [List of MQE and UQE Emoji sequences](https://character.construction/unqualified-emoji)

 #### Singleton Regexes

 Matches only simple one-codepoint (+ optional variation selector) Emoji:

 Regex | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `1`
-`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `1`
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) Basic Emoji, but no sequences at all | `😴`, `▶️`, `🏻` | `😴︎`, `▶`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `1`
+`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤒`, `1`

-Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text)
+Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text). The `REGEX_BASIC` regex also matches visual Emoji components (skin tone modifiers and hair components). While `REGEX_BASIC` is part of the above regexes, `REGEX_TEXT` is only included in the `*_INCLUDE_TEXT` or `*_UQE` variants.

 ### Comparison
@@ -138,20 +140,28 @@
 Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for more details, examples, explanations.

 More info about valid vs. recommended Emoji can also be found in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).

-### Extended Pictographic Regex
+### Emoji Property Regexes

+Ruby includes native regex Emoji properties, as listed in the following table. You can also opt-in to use the `*_PROP_*` regexes to get the Emoji support level of this gem (instead of Ruby's).
+
+Gem Regex (`Unicode::Emoji`'s Emoji support level) | Native Regex (Ruby's Emoji support level)
+---------------------------------------------------|------------------------------------------
+`Unicode::Emoji::REGEX_PROP_EMOJI` | `/\p{Emoji}/`
+`Unicode::Emoji::REGEX_PROP_MODIFIER` | `/\p{EMod}/`
+`Unicode::Emoji::REGEX_PROP_MODIFIER_BASE` | `/\p{EBase}/`
+`Unicode::Emoji::REGEX_PROP_COMPONENT` | `/\p{EComp}/`
+`Unicode::Emoji::REGEX_PROP_PRESENTATION` | `/\p{EPres}/`
+
+#### Extended Pictographic Regex
+
 `Unicode::Emoji::REGEX_PICTO` matches single codepoints with the **Extended_Pictographic** property. For example, it will match `✀` BLACK SAFETY SCISSORS.

 `Unicode::Emoji::REGEX_PICTO_NO_EMOJI` matches single codepoints with the **Extended_Pictographic** property, but excludes Emoji characters.

 See [character.construction/picto](https://character.construction/picto) for a list of all non-Emoji pictographic characters.
-
-### Partial Regexes
-
-`Unicode::Emoji::REGEX_ANY`, same as `\p{Emoji}`. Deprecated: Will be removed or renamed in the future.

 ## Usage – List

 Use `Unicode::Emoji::LIST` or the **list** method to get a ordered and categorized list of Emoji:
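The new **Emoji Property Regexes** table in 4.0.0 pairs each `REGEX_PROP_*` constant with a native Ruby property class. The sketch below uses only core Ruby to show what those native classes report for some of the README's sample characters. It does not need the gem, and the results depend on the Unicode version bundled with your Ruby. It uses the long property names (`Emoji_Modifier`, `Emoji_Modifier_Base`, and so on) rather than the short aliases `EMod`, `EBase`, `EComp` and `EPres` from the table. `PROPERTIES` and `emoji_properties` are hypothetical, illustrative names and are not part of `unicode-emoji`.

```ruby
# Illustrative only: PROPERTIES and emoji_properties are hypothetical names,
# not part of the unicode-emoji gem. They rely on Ruby's built-in (Onigmo)
# Unicode property classes, so no gem is required.
PROPERTIES = {
  emoji:         /\p{Emoji}/,
  presentation:  /\p{Emoji_Presentation}/,
  modifier:      /\p{Emoji_Modifier}/,
  modifier_base: /\p{Emoji_Modifier_Base}/,
  component:     /\p{Emoji_Component}/
}.freeze

# Returns the names of all properties whose regex matches the given character,
# in the order they are listed in PROPERTIES.
def emoji_properties(char)
  PROPERTIES.select { |_name, regex| char.match?(regex) }.keys
end

samples = {
  "sleeping face"   => "\u{1F634}", # 😴
  "play button"     => "\u25B6",    # ▶ (text presentation by default)
  "light skin tone" => "\u{1F3FB}", # 🏻
  "handball"        => "\u{1F93E}", # 🤾
  "digit one"       => "1"
}

samples.each do |label, char|
  puts format("%-16s U+%04X %p", label, char.ord, emoji_properties(char))
end
```

The output helps explain some of the tables. The digit `1` carries the Emoji and Emoji_Component properties, which is why only `REGEX_POSSIBLE` matches it. The bare `▶` lacks Emoji_Presentation, which is why it counts as textual unless a VS16 follows it.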
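`REGEX_BASIC` matches non-textual singletons and `REGEX_TEXT` matches textual ones, so the split between them comes down to presentation. UTS #51 defines the rules:

- A trailing VS16 (U+FE0F) requests emoji presentation.
- A trailing VS15 (U+FE0E) requests text presentation.
- Without a selector, the character's Emoji_Presentation property decides.

The hypothetical helper `presentation_of` below sketches those rules for a single codepoint plus an optional selector. It is a simplification under that assumption, not how `unicode-emoji` builds its regexes. Unlike the gem's `REGEX_TEXT` (which, per the tables, does not match `1`), it does not special-case components such as digits.

```ruby
# Hypothetical helper, not part of unicode-emoji: a simplified version of the
# UTS #51 presentation rules for one codepoint plus an optional selector.
VS15 = 0xFE0E # VARIATION SELECTOR-15: request text presentation
VS16 = 0xFE0F # VARIATION SELECTOR-16: request emoji presentation

def presentation_of(str)
  base, selector, *rest = str.codepoints
  return nil if base.nil? || !rest.empty? # only one codepoint (+ selector) supported

  base_char = base.chr(Encoding::UTF_8)
  return nil unless base_char.match?(/\p{Emoji}/) # not an emoji character at all

  case selector
  when VS16 then :emoji
  when VS15 then :text
  when nil  then base_char.match?(/\p{Emoji_Presentation}/) ? :emoji : :text
  end # any other second codepoint falls through and returns nil
end

["\u{1F634}", "\u{1F634}\uFE0E", "\u25B6", "\u25B6\uFE0F"].each do |s|
  codepoints = s.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
  puts "#{codepoints} => #{presentation_of(s).inspect}"
end
```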