README.md in bloomed-1.0.0 vs README.md in bloomed-1.1.0
- old
+ new
@@ -1,9 +1,9 @@
# Bloomed
Troy Hunt's brilliant haveibeenpwned.com lets you download SHA1s of 517,238,891 real-world passwords previously exposed in data breaches. This list is comprehensive but huge: 11GB compressed.
-Using a bloom filter we can reduce the size down to files measured in MBs.
+Using a bloom filter we can reduce the size down to files measured in MBs.
You can even keep the bloom filter in memory in your web app or API. This is great if you don't want to send the passwords your users enter to an external service for lookup.
This gem lets you control the trade-off between memory size and precision. False positives will occur (that's the nature of bloom filters), but you control how often they occur and how many of the pwned passwords go into your filter, starting from the most pwned.
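To illustrate why a bloom filter is so much smaller than the raw list, and why it can only produce false positives (never false negatives), here is a minimal sketch. It is not Bloomed's implementation: `MiniBloom`, its double-hashing scheme, and the assumption that entries are uppercase SHA1 hex digests are all hypothetical choices made for this example.

```ruby
require 'digest'

# Hypothetical minimal bloom filter for illustration; not Bloomed's internals.
class MiniBloom
  def initialize(bits:, hashes:)
    @bits = bits                       # m: number of bits in the filter
    @hashes = hashes                   # k: bit positions set per entry
    @array = "\0".b * ((bits + 7) / 8) # packed bit array, 8 bits per byte
  end

  # Add a password given as an uppercase SHA1 hex digest.
  def add(sha1_hex)
    positions(sha1_hex).each do |i|
      @array.setbyte(i / 8, @array.getbyte(i / 8) | (1 << (i % 8)))
    end
    self
  end

  # true: probably in the set (may be a false positive); false: definitely not.
  def include?(sha1_hex)
    positions(sha1_hex).all? { |i| @array.getbyte(i / 8)[i % 8] == 1 }
  end

  def memory_size_bytes
    @array.bytesize
  end

  private

  # Double hashing: derive k positions from two 64-bit slices of the digest.
  def positions(sha1_hex)
    h1 = sha1_hex[0, 16].to_i(16)
    h2 = sha1_hex[16, 16].to_i(16) | 1 # "| 1" keeps the step nonzero
    Array.new(@hashes) { |j| (h1 + j * h2) % @bits }
  end
end

sha1 = ->(pw) { Digest::SHA1.hexdigest(pw).upcase }
filter = MiniBloom.new(bits: 1_000, hashes: 7)
filter.add(sha1.('password123'))
filter.include?(sha1.('password123')) # => true
```

Only the bit array is stored, never the hashes themselves, which is where the size reduction comes from; the cost is that unrelated inputs can occasionally hit only set bits.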
@@ -27,11 +27,11 @@
### Quick start
```ruby
require 'bloomed'
-pw=Bloomed::PW.new
+pw = Bloomed::PW.new
pw.pwned? "password123"
# => true
```
### Using lower precision / lower memory consumption
@@ -39,32 +39,50 @@
There are two parameters that can be varied: `top` and `false_positive_probability`.
```ruby
require 'bloomed'
-pw=Bloomed::PW.new(top: 100000, false_positive_probability: 0.01) # 136 kb memory
+pw = Bloomed::PW.new(top: 100000, false_positive_probability: 0.01) # 136 KB of memory
pw.pwned? "password123"
# => true
```
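How `top` and `false_positive_probability` drive memory use can be estimated with the textbook sizing formulas for an optimally tuned bloom filter: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below assumes `top` maps to n and `false_positive_probability` to p; `bloom_size` is a hypothetical helper. The gem's actual sizes may differ from this estimate depending on its implementation, so use `memory_size_bytes` for exact numbers.

```ruby
# Textbook sizing for a bloom filter holding n items at false-positive rate p.
# bloom_size is a hypothetical helper; Bloomed's real sizes may differ.
def bloom_size(n, p)
  bits = (-n * Math.log(p) / Math.log(2)**2).ceil       # m
  hashes = [(bits.to_f / n * Math.log(2)).round, 1].max # k
  { bits: bits, hashes: hashes, bytes: (bits / 8.0).ceil }
end

[[100_000, 0.01], [100_000, 0.001], [1_000_000, 0.01]].each do |n, p|
  s = bloom_size(n, p)
  puts format('top=%-9d p=%-6g %9d bytes, k=%d', n, p, s[:bytes], s[:hashes])
end
```

Memory grows linearly with `top` but only logarithmically as `false_positive_probability` shrinks, which is why a tenfold precision gain costs far less than a tenfold increase in `top`.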
### Using higher precision / higher memory consumption
To keep the gem size small, it only ships with dumps up to 253 KB in size.
-To generate a larger, optimized bloom filter for pwned passwords, please download pwned-passwords-ordered-by-count.7z from https://haveibeenpwned.com and extract `pwned-passwords-ordered-by-count.txt` to the current dir.
+To generate all combinations of `top` and `false_positive_probability` bloom filters for pwned passwords, run:
-Once you have the `pwned-passwords-ordered-by-count.txt` file in place, you can run the following to generate the filter and cache it for later (on your machine.)
+```
+rake seed\[all\]
+```
+
+This downloads the source 7-Zip archive of pwned passwords, unpacks it into the current dir, and writes the generated bloom filters to the `lib/dump` dir relative to the gem's installation path.
+
+Note: You'll need to `brew install curl p7zip` on macOS and `apt-get install curl p7zip` on Linux.
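Because the source list is ordered by count, selecting the `top` most-pwned passwords amounts to reading the first `top` lines. The sketch below assumes one `SHA1:COUNT` line per password, most-pwned first; `each_top_hash` is a hypothetical helper, not part of the gem or its rake tasks, and the sample data is made up.

```ruby
require 'digest'
require 'stringio'

# Hypothetical helper: yields the SHA1 of each of the first `top` entries,
# assuming one "SHA1:COUNT" line per password, most-pwned first.
def each_top_hash(io, top)
  return enum_for(__method__, io, top) unless block_given?

  io.each_line.with_index do |line, i|
    break if i >= top
    sha1, _count = line.chomp.split(':', 2)
    yield sha1.upcase
  end
end

# Tiny stand-in for the real dump; the counts are made up.
lines = { 'password' => 3, '123456' => 2, 'qwerty' => 1 }.map do |pw, count|
  "#{Digest::SHA1.hexdigest(pw).upcase}:#{count}\n"
end
top2 = each_top_hash(StringIO.new(lines.join), 2).to_a
```

Streaming line by line keeps memory flat regardless of file size; only the filter itself grows with `top`.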
+
+If you want more precise control over where the cache files are placed, you can seed all variants in the current dir instead:
+
+```
+rake seed_here\[all\]
+```
+
+But be aware that it will take a long time!
+
+Once you have the massive 22GB text files available, you can generate binary cache files using the exact precision you want.
+
```ruby
require 'bloomed'
-pw=Bloomed::PW.new(top: 1E8, false_positive_probability: 0.0001) # 247 Mb! memory
-pw.pwned? "password123"
-=> true
+pw = Bloomed::PW.new(top: 1E9, false_positive_probability: 0.0001)
```
-### The cache
+Warning! This seeds all the passwords and will take a loooong time the first time. Even with a binary cache file in place, loading it will take a lot of time and memory.
-The cache is stored in the `dumps` dir inside `dirname $(gem which bloomed)`.
+For deployment scenarios where you don't want the server to run `rake seed`, you can override the cache directory by passing a `cache_dir` argument to the constructor:
+
+```ruby
+require 'bloomed'
+pw = Bloomed::PW.new(top: 1E8, false_positive_probability: 0.0001, cache_dir: '/var/lib/bloomed')
+```
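A binary cache file only needs to hold the filter's packed bit array, keyed by the parameters that produced it. The sketch below assumes exactly that: one file per `top` and `false_positive_probability` pair inside a `cache_dir`. The file naming scheme and the `cache_path`, `write_cache`, and `read_cache` helpers are hypothetical and do not describe Bloomed's actual cache format.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical cache layout: one binary file per (top, false_positive_probability) pair.
def cache_path(cache_dir, top, fpp)
  File.join(cache_dir, "pwned-top#{Integer(top)}-fpp#{fpp}.bloom")
end

# Persist the packed bit array; binwrite skips newline and encoding conversion.
def write_cache(path, bit_string)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, bit_string)
end

# Return the cached bit array, or nil if the filter still has to be seeded.
def read_cache(path)
  File.exist?(path) ? File.binread(path) : nil
end

Dir.mktmpdir do |dir|
  path = cache_path(File.join(dir, 'bloomed'), 1E8, 0.0001)
  read_cache(path)                     # => nil (not seeded yet)
  write_cache(path, "\xFF\x00\x81".b)
  read_cache(path) == "\xFF\x00\x81".b # => true
end
```

Shipping pre-built files into a directory like this lets a server skip seeding entirely and just read the bits back at boot.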
### Size of the in memory bloom filter
The filter size can vary a lot. Use `Bloomed::PW#memory_size_bytes` to get the exact size.