README.md in seo_cache-0.2.0 vs README.md in seo_cache-0.3.0

- old
+ new

@@ -1,10 +1,17 @@
 # SeoCache
 
 Cache dedicated for SEO with Javascript rendering :fire:
 
+## Purpose
+
+Google's credo is: don't waste my bot's time!
+
+So, to reduce Googlebot crawling time, let's serve HTML files from a dedicated cache.
+
+This cache is suitable for static pages (generated or not), but not for pages that require a signed-in user.
+
 ## Installation
 
 Add this line to your application's Gemfile:
 
 ```ruby
@@ -17,46 +24,102 @@
 Or install it yourself as:
 
     $ gem install seo_cache
 
-Install chrome driver on your device
+Install Chromium or Chrome on your device (the chromedriver will be downloaded automatically).
 
-## How it works
+Declare the middleware, for instance in `config/initializers/seo_cache.rb`:
 
-Specific cache for bots to optimize time to first byte and render Javascript on server side.
+```ruby
+require 'seo_cache'
 
-Options:
+# See options below
 
-Choose a cache mode (`disk` or `memory`):
+Rails.application.config.middleware.use SeoCache::Middleware
+```
 
+## Options
+
+Chrome path (**required**):
+
+    SeoCache.chrome_path = Rails.env.development? ? '/usr/bin/chromium-browser' : '/usr/bin/chromium'
+
+Choose a cache mode (`memory` (default) or `disk`):
+
     SeoCache.cache_mode = 'memory'
 
-If cache on disk, specify the cache path (e.g. `Rails.root.join('public', 'seo_cache')`):
+Disk cache path (required for the disk cache):
 
-    SeoCache.disk_cache_path = nil
+    SeoCache.disk_cache_path = Rails.root.join('public', 'seo_cache')
+
+Redis URL (required for the memory cache):
+
+    SeoCache.redis_url = "redis://localhost:6379/"
+
+Redis namespace prefix:
+
+    SeoCache.redis_namespace = '_my_project:seo_cache'
+
+Specific log file (if you want to log missed-cache URLs):
+
+    SeoCache.logger_path = Rails.root.join('log', 'seo_cache.log')
+
+Enable logging of missed-cache URLs:
+
+    SeoCache.log_missed_cache = true
+
 URLs to blacklist:
 
-    SeoCache.blacklist_urls = []
+    SeoCache.blacklist_urls = %w[^/assets/.* ^/admin.*]
+
+Params to blacklist:
+
+    SeoCache.blacklist_params = %w[page]
+
 URLs to whitelist:
 
     SeoCache.whitelist_urls = []
 
-Query params un URl to blacklist:
+Parameter to add manually to the URL to force page caching, if you want to cache a specific URL (e.g. `https://<my_website>/?_seo_cache_=true`):
 
-    SeoCache.blacklist_params = []
+    SeoCache.force_cache_url_param = '_seo_cache_'
+
+URL extensions to ignore when caching (a default list is already defined):
+
+    SeoCache.extensions_to_ignore = [<your_list>]
+
+List of bot user agents (a default list is already defined):
+
+    SeoCache.crawler_user_agents = [<your_list>]
+
+Parameter added to the URL when generating the page, to avoid infinite rendering (override it only if this name is already used in your app):
+
+    SeoCache.prerender_url_param = '_prerender_'
+
+Be aware that JS will be rendered twice: once by the server-side rendering and once by the client. For React this is not a problem, but jQuery plugins can duplicate elements in the page (check for redundancy).
+
 ## Automatic caching
 
-To automate cache, create a cron rake task which called:
+To automate caching, create a rake task (e.g. in `lib/tasks/populate_seo_cache.rake`) to be called by cron:
 
 ```ruby
-SeoCache::PopulateCache.new('https://<your-domain-name>', paths_to_cache).new.perform
+namespace :MyProject do
+
+  desc 'Populate cache for SEO'
+  task populate_seo_cache: :environment do |_task, _args|
+    require 'seo_cache/populate_cache'
+
+    # Public paths to cache, e.g. the paths listed in your sitemap
+    paths_to_cache = public_paths_like_sitemap
+
+    SeoCache::PopulateCache.new('https://<your-domain-name>', paths_to_cache).perform
+  end
+end
 ```
 
+You can pass the `force_cache: true` option to `SeoCache::PopulateCache` to overwrite existing cache data.
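A minimal sketch of a one-off run with this option, for instance via `rails runner` (assumptions: `force_cache:` is passed as a keyword argument to `PopulateCache.new`, and the paths below are placeholders):

```ruby
# Sketch: rebuild the cache for a few pages, overwriting existing entries.
# Assumption: `force_cache:` is accepted as a keyword argument of PopulateCache.new.
require 'seo_cache/populate_cache'

paths_to_cache = ['/', '/about', '/contact'] # placeholder paths, e.g. taken from your sitemap

SeoCache::PopulateCache.new('https://<your-domain-name>', paths_to_cache, force_cache: true).perform
```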
+
 ## Server
 
 If you use disk caching, add to your Nginx configuration:
 
 ```
@@ -86,9 +149,81 @@
   if (-f $document_root/seo_cache/$uri) {
     rewrite (.*) /seo_cache/$1 break;
   }
 }
 ```
+
+## Heroku case
+
+If you use a Heroku server, you can't store files on dynos, but you have two alternatives:
+
+- Use the memory mode
+
+- Use a second (dedicated) server to store the HTML files, combined with Nginx.
+
+To intercept bot requests, use the following middleware in Rails.
+
+In `config/initializers`, create a new file:
+
+```ruby
+require 'bot_redirector'
+
+if Rails.env.production?
+  Rails.application.config.middleware.insert_before ActionDispatch::Static, BotRedirector
+end
+```
+
+Then, in the `lib` directory for instance, handle the request:
+
+```ruby
+require 'net/http'
+
+class BotRedirector
+  CRAWLER_USER_AGENTS = ['googlebot', 'yahoo', 'bingbot', 'baiduspider', 'facebookexternalhit', 'twitterbot', 'rogerbot', 'linkedinbot', 'embedly', 'bufferbot', 'quora link preview', 'showyoubot', 'outbrain', 'pinterest/0.', 'developers.google.com/+/web/snippet', 'www.google.com/webmasters/tools/richsnippets', 'slackbot', 'vkShare', 'W3C_Validator', 'redditbot', 'Applebot', 'WhatsApp', 'flipboard', 'tumblr', 'bitlybot', 'SkypeUriPreview', 'nuzzel', 'Discordbot', 'Google Page Speed', 'Qwantify'].freeze
+
+  IGNORE_URLS = [
+    '/robots.txt'
+  ].freeze
+
+  def initialize(app)
+    @app = app
+  end
+
+  def call(env)
+    if env['HTTP_USER_AGENT'].present? && CRAWLER_USER_AGENTS.any? { |crawler_user_agent| env['HTTP_USER_AGENT'].downcase.include?(crawler_user_agent.downcase) }
+      begin
+        request = Rack::Request.new(env)
+
+        return @app.call(env) if IGNORE_URLS.any? { |ignore_url| request.fullpath.downcase =~ /^#{ignore_url.downcase}/ }
+
+        # Fetch the cached page from the dedicated SEO server
+        url = URI.parse(ENV['SEO_SERVER'] + request.fullpath)
+        headers = {
+          'User-Agent'      => env['HTTP_USER_AGENT'],
+          'Accept-Encoding' => 'gzip'
+        }
+        req = Net::HTTP::Get.new(url.request_uri, headers)
+        # req.basic_auth(ENV['SEO_USER_ID'], ENV['SEO_PASSWD']) # if authentication mechanism
+        http = Net::HTTP.new(url.host, url.port)
+        http.use_ssl = true if url.scheme == 'https'
+        response = http.request(req)
+
+        if response['Content-Encoding'] == 'gzip'
+          response.body = ActiveSupport::Gzip.decompress(response.body)
+          response['Content-Length'] = response.body.length
+          response.delete('Content-Encoding')
+        end
+
+        return [response.code.to_i, { 'Content-Type' => response.header['Content-Type'] }, [response.body]]
+      rescue => error
+        Rails.logger.error("[bot_redirection] #{error.message}")
+
+        @app.call(env)
+      end
+    else
+      @app.call(env)
+    end
+  end
+end
+```
+
+If you use a second server, all links in your HTML files must be relative, to avoid multi-domain links.
 
 ## Inspiration
 
 Inspired by [prerender gem](https://github.com/prerender/prerender_rails).
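Putting the options documented in this new version together, a `config/initializers/seo_cache.rb` could look like the sketch below (all values are placeholders taken from the examples above; adapt them to your setup):

```ruby
# config/initializers/seo_cache.rb — sketch combining the options documented above.
# Every value here is a placeholder; adjust paths, Redis URL and namespace to your app.
require 'seo_cache'

SeoCache.chrome_path      = Rails.env.development? ? '/usr/bin/chromium-browser' : '/usr/bin/chromium'
SeoCache.cache_mode       = 'memory'
SeoCache.redis_url        = 'redis://localhost:6379/'
SeoCache.redis_namespace  = '_my_project:seo_cache'
SeoCache.logger_path      = Rails.root.join('log', 'seo_cache.log')
SeoCache.log_missed_cache = true
SeoCache.blacklist_urls   = %w[^/assets/.* ^/admin.*]
SeoCache.blacklist_params = %w[page]

Rails.application.config.middleware.use SeoCache::Middleware
```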