README.md in seo_cache-0.2.0 vs README.md in seo_cache-0.3.0
- old
+ new
@@ -1,10 +1,17 @@
# SeoCache
Cache dedicated for SEO with Javascript rendering :fire:
+## Purpose
+Google's credo is: don't waste my bot's time!
+
+So, to reduce Googlebot crawling time, let's serve pre-rendered HTML files from a dedicated cache.
+
+This cache is suitable for static pages (generated or not) but not for logged-in, user-specific pages.
+
## Installation
Add this line to your application's Gemfile:
```ruby
@@ -17,46 +24,102 @@
Or install it yourself as:
$ gem install seo_cache
-Install chrome driver on your device
+Install Chromium or Chrome on your device (the matching chromedriver will be downloaded automatically).
-## How it works
+Declare the middleware. For instance in `config/initializers/seo_cache.rb`:
-Specific cache for bots to optimize time to first byte and render Javascript on server side.
+```ruby
+require 'seo_cache'
-Options:
+# See options below
-Choose a cache mode (`disk` or `memory`):
+Rails.application.config.middleware.use SeoCache::Middleware
+```
+## Options
+
+Chrome path (**required**):
+
+ SeoCache.chrome_path = Rails.env.development? ? '/usr/bin/chromium-browser' : '/usr/bin/chromium'
+
+Choose a cache mode (`memory` (default) or `disk`):
+
SeoCache.cache_mode = 'memory'
-If cache on disk, specify the cache path (e.g. `Rails.root.join('public', 'seo_cache')`):
+Disk cache path (required if disk cache):
- SeoCache.disk_cache_path = nil
+ SeoCache.disk_cache_path = Rails.root.join('public', 'seo_cache')
+
+Redis URL (required if memory cache):
+ SeoCache.redis_url = "redis://localhost:6379/"
+
+Redis prefix:
+
+ SeoCache.redis_namespace = '_my_project:seo_cache'
+
+Specific log file (if you want to log missed cache urls):
+
+ SeoCache.logger_path = Rails.root.join('log', 'seo_cache.log')
+
+Activate missed cache urls:
+
+ SeoCache.log_missed_cache = true
+
URLs to blacklist:
- SeoCache.blacklist_urls = []
+ SeoCache.blacklist_urls = %w[^/assets/.* ^/admin.*]
+
+Params to blacklist:
+
+ SeoCache.blacklist_params = %w[page]
+
URLs to whitelist:
SeoCache.whitelist_urls = []
-Query params un URl to blacklist:
+Parameter to add manually to the URL to force page caching, if you want to cache a specific URL (e.g. `https://<my_website>/?_seo_cache_=true`):
- SeoCache.blacklist_params = []
+ SeoCache.force_cache_url_param = '_seo_cache_'
+
+URL extensions to ignore when caching (a default list is already defined):
+ SeoCache.extensions_to_ignore = [<your_list>]
+
+List of bot user agents (a default list is already defined):
+
+ SeoCache.crawler_user_agents = [<your_list>]
+
+Parameter added to the URL when generating the page, to avoid infinite rendering (override it only if this name is already used by your app):
+
+ SeoCache.prerender_url_param = '_prerender_'
+
+Be aware that JS will be rendered twice: once by the server and once by the client. For React this is not a problem, but with jQuery plugins it can duplicate elements in the page (you have to check for redundancy).
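+
+Putting it together, here is a sketch of a full initializer combining the options above (disk mode, with the example paths shown earlier; adapt every path and pattern to your project):
+
+```ruby
+# config/initializers/seo_cache.rb
+require 'seo_cache'
+
+SeoCache.chrome_path      = Rails.env.development? ? '/usr/bin/chromium-browser' : '/usr/bin/chromium'
+SeoCache.cache_mode       = 'disk'
+SeoCache.disk_cache_path  = Rails.root.join('public', 'seo_cache')
+SeoCache.blacklist_urls   = %w[^/assets/.* ^/admin.*]
+SeoCache.logger_path      = Rails.root.join('log', 'seo_cache.log')
+SeoCache.log_missed_cache = true
+
+Rails.application.config.middleware.use SeoCache::Middleware
+```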
+
## Automatic caching
-To automate cache, create a cron rake task which called:
+To automate caching, create a cron rake task (e.g. in `lib/tasks/populate_seo_cache.rake`):
```ruby
-SeoCache::PopulateCache.new('https://<your-domain-name>', paths_to_cache).new.perform
+namespace :MyProject do
+
+  desc 'Populate cache for SEO'
+  task populate_seo_cache: :environment do |_task, _args|
+    require 'seo_cache/populate_cache'
+
+    # Placeholder: the list of public paths to cache (e.g. the paths from your sitemap)
+    paths_to_cache = public_paths_like_sitemap
+
+    SeoCache::PopulateCache.new('https://<your-domain-name>', paths_to_cache).perform
+  end
+end
```
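+
+You can then run this task manually or from your cron, for instance:
+
+ $ bundle exec rake MyProject:populate_seo_cache
+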
+You can add the `force_cache: true` option to `SeoCache::PopulateCache` to overwrite existing cache data.
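+
+For instance (a sketch, assuming the option is passed to the constructor alongside the host and paths):
+
+```ruby
+SeoCache::PopulateCache.new('https://<your-domain-name>', paths_to_cache, force_cache: true).perform
+```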
+
## Server
If you use disk caching, add to your Nginx configuration:
```
@@ -86,9 +149,81 @@
 if (-f $document_root/seo_cache/$uri) {
     rewrite (.*) /seo_cache/$1 break;
 }
}
```
+
+## Heroku case
+
+If you use a Heroku server, you can't store files on dynos. You have two alternatives:
+
+- Use the memory mode (see the sketch just after this list).
+
+- Use a second, dedicated server to store the HTML files, combined with Nginx.
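+
+For the first alternative, here is a minimal sketch, assuming a Redis add-on that exposes the `REDIS_URL` environment variable (the namespace is a placeholder):
+
+```ruby
+# config/initializers/seo_cache.rb (memory mode on Heroku)
+SeoCache.cache_mode      = 'memory'
+SeoCache.redis_url       = ENV['REDIS_URL']
+SeoCache.redis_namespace = '_my_project:seo_cache'
+```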
+
+For the second alternative, to intercept bot requests in your Rails app, use the following middleware:
+
+In `config/initializers`, create a new file:
+
+```ruby
+require 'bot_redirector'
+
+if Rails.env.production?
+  Rails.application.config.middleware.insert_before ActionDispatch::Static, BotRedirector
+end
+```
+
+Then, in the `lib` directory for instance, create `bot_redirector.rb` to handle the request:
+
+```ruby
+# Rack middleware: forwards bot requests to the server hosting the cached HTML pages.
+require 'net/http'
+require 'uri'
+
+class BotRedirector
+  CRAWLER_USER_AGENTS = ['googlebot', 'yahoo', 'bingbot', 'baiduspider', 'facebookexternalhit', 'twitterbot', 'rogerbot', 'linkedinbot', 'embedly', 'bufferbot', 'quora link preview', 'showyoubot', 'outbrain', 'pinterest/0.', 'developers.google.com/+/web/snippet', 'www.google.com/webmasters/tools/richsnippets', 'slackbot', 'vkShare', 'W3C_Validator', 'redditbot', 'Applebot', 'WhatsApp', 'flipboard', 'tumblr', 'bitlybot', 'SkypeUriPreview', 'nuzzel', 'Discordbot', 'Google Page Speed', 'Qwantify'].freeze
+
+  IGNORE_URLS = [
+    '/robots.txt'
+  ].freeze
+
+  def initialize(app)
+    @app = app
+  end
+
+  def call(env)
+    if env['HTTP_USER_AGENT'].present? && CRAWLER_USER_AGENTS.any? { |crawler_user_agent| env['HTTP_USER_AGENT'].downcase.include?(crawler_user_agent.downcase) }
+      begin
+        request = Rack::Request.new(env)
+
+        return @app.call(env) if IGNORE_URLS.any? { |ignore_url| request.fullpath.downcase =~ /^#{ignore_url.downcase}/ }
+
+        # SEO_SERVER is the base URL of the server hosting the cached HTML files
+        url = URI.parse(ENV['SEO_SERVER'] + request.fullpath)
+        headers = {
+          'User-Agent' => env['HTTP_USER_AGENT'],
+          'Accept-Encoding' => 'gzip'
+        }
+        req = Net::HTTP::Get.new(url.request_uri, headers)
+        # req.basic_auth(ENV['SEO_USER_ID'], ENV['SEO_PASSWD']) # if authentication mechanism
+        http = Net::HTTP.new(url.host, url.port)
+        http.use_ssl = true if url.scheme == 'https'
+        response = http.request(req)
+        if response['Content-Encoding'] == 'gzip'
+          response.body = ActiveSupport::Gzip.decompress(response.body)
+          response['Content-Length'] = response.body.length
+          response.delete('Content-Encoding')
+        end
+
+        return [response.code.to_i, { 'Content-Type' => response.header['Content-Type'] }, [response.body]]
+      rescue => error
+        Rails.logger.error("[bot_redirection] #{error.message}")
+
+        @app.call(env)
+      end
+    else
+      @app.call(env)
+    end
+  end
+end
+```
+
+If you use a second server, all links in your HTML files must be relative, to avoid multi-domain links.
## Inspiration
Inspired by [prerender gem](https://github.com/prerender/prerender_rails).