README.md in regexp-examples-0.4.0 vs README.md in regexp-examples-0.4.1
- old
+ new
@@ -7,12 +7,13 @@
This method generates a list of (some\*) strings that will match the given regular expression
\* If the regex has an infinite number of possible strings that match it, such as `/a*b+c{2,}/`,
or a huge number of possible matches, such as `/.\w/`, then only a subset of these will be listed.
-For more detail on this, see [configuration options](#configuration_options).
+For more detail on this, see [configuration options](#configuration-options).
+
## Usage
```ruby
/a*/.examples #=> ['', 'a', 'aa']
/ab+/.examples #=> ['ab', 'abb', 'abbb']
@@ -39,14 +40,34 @@
* Control characters, e.g. `/\ca/`, `/\cZ/`, `/\C-9/`
* Escape sequences, e.g. `/\x42/`, `/\x3D/`, `/\x5word/`, `/#{"\x80".force_encoding("ASCII-8BIT")}/`
* Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
* **Arbitrarily complex combinations of all the above!**
-## Not-Yet-Supported syntax
+## Bugs and Not-Yet-Supported syntax
-* Options, e.g. `/pattern/i`, `/foo.*bar/m` - Using options will currently just be ignored, e.g. `/test/i.examples` will NOT include `"TEST"`
+* Backreferences are replaced by the _first_ occurrence of the group, not the _last_ (as it should be). This is quite a rare occurrence, but for example:
+ * `/(a|b){2} \1/.examples` incorrectly includes: `"ba b"` rather than the correct: `"ba a"`
+* Options, e.g. `/pattern/i`, `/foo.*bar/m` - Using options will currently just be ignored, for example:
+ * `/test/i.examples` will NOT include `"TEST"`
+ * `/white space/x.examples` will not strip out the whitespace from the pattern, i.e. this incorrectly returns `["white space"]` rather than `["whitespace"]`
+
+* Nested character classes, and the use of set intersection ([See here](http://www.ruby-doc.org/core-2.2.0/Regexp.html#class-Regexp-label-Character+Classes) for the official documentation on this.) For example:
+ * `/[[abc]]/.examples` (which _should_ return `["a", "b", "c"]`)
+ * `/[[a-d]&&[c-f]]/.examples` (which _should_ return: `["c", "d"]`)
+
+* Extended groups are not yet supported, such as:
+ * Including comments inside the pattern, i.e. `/(?#...)/`
+ * Conditional capture groups, such as `/(group1) (?(1)yes|no)/`
+ * Options toggling, i.e. `/(?imx)/`, `/(?-imx)/`, `/(?imx: re)/` and `/(?-imx: re)/`
+
+* Possessive quantifiers, i.e. `/.?+/`, `/.*+/`, `/.++/`
+
+* The patterns: `/\10/` ... `/\77/` should match the octal representation of their character code, if there is no nth grouped subexpression. For example, `/\10/.examples` should return `["\x08"]`. Funnily enough, I did not think of this when writing my regexp parser.
+
+Full documentation on all the various other obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE).
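
As a reference point for the items above, the target behaviour can be checked against Ruby's own regexp engine. The sketch below uses only core `Regexp` matching and makes no calls into this gem; the `\A`/`\z` anchors are added so that each test is an exact match, and the names `EXPECTED` and `matches?` are purely illustrative.

```ruby
# Each entry: [pattern, a string that should match, a string that should not].
# This exercises Ruby's built-in engine only, NOT regexp-examples.
EXPECTED = [
  [/\A(a|b){2} \1\z/, 'ba a', 'ba b'],              # backreference holds the LAST capture
  [/\Atest\z/i, 'TEST', 'toast'],                   # case-insensitive option
  [/\Awhite space\z/x, 'whitespace', 'white space'],# extended mode ignores pattern whitespace
  [/\A[[abc]]\z/, 'b', 'd'],                        # nested character class
  [/\A[[a-d]&&[c-f]]\z/, 'c', 'a'],                 # set intersection
  [/\A\10\z/, "\x08", '10']                         # octal escape (no 10th group exists)
]

# Hypothetical helper: true if the pattern matches the string.
def matches?(pattern, string)
  !(pattern =~ string).nil?
end

EXPECTED.each do |pattern, good, bad|
  puts format('%-24s %-14s => %-5s %-14s => %s',
              pattern.inspect,
              good.inspect, matches?(pattern, good),
              bad.inspect, matches?(pattern, bad))
end
```

Any string that `.examples` returns for these patterns should ideally satisfy the same checks, which makes a table like this a convenient starting point for regression tests.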
+
Using any of the following will raise a RegexpExamples::UnsupportedSyntax exception (until such time as they are implemented!):
* POSIX bracket expressions, e.g. `/[[:alnum:]]/`, `/[[:space:]]/`
* Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character"), `/\p{^Ll}/` ("Not a lowercase letter")
* Subexpression calls, e.g. `/(?<name> ... \g<name>* )/` (Note: These could get _really_ ugly to implement, and may even be impossible, so I highly doubt it's worth the effort!)
@@ -62,11 +83,10 @@
* [Anchors](http://ruby-doc.org/core-2.2.0/Regexp.html#class-Regexp-label-Anchors) (`\b`, `\B`, `\G`, `^`, `\A`, `$`, `\z`, `\Z`), e.g. `/\bword\b/`, `/line1\n^line2/`
* However, a special case has been made to allow `^` and `\A` at the start of a pattern; and to allow `$`, `\z` and `\Z` at the end of a pattern. In such cases, the characters are effectively just ignored.
(Note: Backreferences are not really "regular" either, but I got these to work with a bit of hackery!)
-<a name="configuration_options"/>
## Configuration Options
When generating examples, the gem uses 2 configurable values to limit how many examples are listed:
* `max_repeater_variance` (default = `2`) restricts how many examples to return for each repeater. For example:
@@ -87,28 +107,20 @@
```ruby
/a*/.examples(max_repeater_variance: 5) #=> ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa']
/[F-X]/.examples(max_group_results: 10) #=> ['F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
```
-**_WARNING_**: Choosing huge numbers, along with a "complex" regex, could easily cause your system to freeze!
+_**WARNING**: Choosing huge numbers, along with a "complex" regex, could easily cause your system to freeze!_
For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, digits and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1 _billion_) examples!
In other words, think twice before playing around with this config!
A more sensible use case might be, for example, to generate one random 1-4 digit string:
`/\d{1,4}/.examples(max_repeater_variance: 3, max_group_results: 10).sample(1)`
(Note: I may develop a much more efficient way to "generate one example" in a later release of this gem.)
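
The size of the result set quoted above can be sanity-checked in plain Ruby before choosing large config values. This is a minimal sketch that does not call the gem; the constant names are illustrative, and it assumes ASCII input, where `\w` matches letters, digits and underscore.

```ruby
# The ASCII characters matched by \w: a-z, A-Z, 0-9 and "_".
WORD_CHARS = [*'a'..'z', *'A'..'Z', *'0'..'9', '_'].freeze

# Confirm against Ruby's own engine that every one of them matches \w.
ALL_MATCH = WORD_CHARS.all? { |char| char =~ /\A\w\z/ }

# /\w{5}/ has one independent choice per position, so the total is size**5.
COMBINATIONS = WORD_CHARS.size**5

puts "word characters: #{WORD_CHARS.size}"
puts "all match \\w:    #{ALL_MATCH}"
puts "5-char strings:  #{COMBINATIONS}"
```

The same multiplication applies to any fixed-length pattern, so a quick estimate like this is a cheap way to decide whether a given `max_group_results` value is safe.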
-
-## Known Bugs
-
-There are a few obscure bugs that have yet to be resolved:
-
-* Various (weird!) legal patterns do not get parsed correctly, such as `/[[wtf]]/.examples` - To solve this, I'll probably have to dig deep into the Ruby source code and imitate the actual Regex parser more closely.
-
-* Backreferences are replaced by the _first_ occurrence of the group, not the _last_ (as it should be). This is quite a rare occurrence, but for example: `/(a|b){2} \1/.examples` incorrectly includes: `"ba b"` rather than the correct: `"ba a"`
## Installation
Add this line to your application's Gemfile: