# CrawlerDetect ![Build](https://github.com/loadkpi/crawler_detect/workflows/build/badge.svg?branch=master) [![Gem Version](https://badge.fury.io/rb/crawler_detect.svg)](https://badge.fury.io/rb/crawler_detect)

## About

**CrawlerDetect** is a Ruby port of the PHP class [CrawlerDetect](https://github.com/JayBizzle/Crawler-Detect). It detects bots/crawlers/spiders via the user agent and other HTTP headers, and can currently recognize thousands of them.

### Why CrawlerDetect?

Compared with other popular bot-detection gems:

| | CrawlerDetect | Voight-Kampff | Browser |
|--|--|--|--|
| Number of bot patterns | >1000 | ~280 | ~280 |
| Number of HTTP headers checked | 10 | 1 | 1 |
| Bot-list updates *(first half of 2018)* | 14 | 1 | 7 |

To remain up to date, this gem does not accept any crawler data updates; PRs that edit the crawler data should be offered to the original [JayBizzle/CrawlerDetect](https://github.com/JayBizzle/Crawler-Detect) project.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'crawler_detect'
```

## Basic Usage

```ruby
CrawlerDetect.is_crawler?("Bot user agent")
# => true
```

Or if you need the crawler name:

```ruby
detector = CrawlerDetect.new("Googlebot/2.1 (http://www.google.com/bot.html)")
detector.is_crawler?
# => true
detector.crawler_name
# => "Googlebot"
```

## Rack::Request extension

**Optionally** you can add extra methods to `request`:

```ruby
request.is_crawler?
# => false
request.crawler_name
# => nil
```

Using `request.is_crawler?` is more flexible than `CrawlerDetect.is_crawler?` because it automatically checks 10 HTTP headers, not just `HTTP_USER_AGENT`. The only thing you have to do is add the `Rack::CrawlerDetect` middleware:

### Rails

```ruby
class Application < Rails::Application
  # ...
  config.middleware.use Rack::CrawlerDetect
end
```

### Rack

```ruby
use Rack::CrawlerDetect
```

## Configuration

In some cases you may want to use your own whitelist, blacklist, or list of HTTP headers to check for the user agent. This is possible via `CrawlerDetect::Config`. For example, you may have an initializer like this:

```ruby
CrawlerDetect.setup! do |config|
  config.raw_headers_path    = File.expand_path("crawlers/MyHeaders.json", __dir__)
  config.raw_crawlers_path   = File.expand_path("crawlers/MyCrawlers.json", __dir__)
  config.raw_exclusions_path = File.expand_path("crawlers/MyExclusions.json", __dir__)
end
```

Make sure your files are valid JSON. For more information, look at [the raw files](https://github.com/loadkpi/crawler_detect/tree/master/lib/crawler_detect/library/raw) which are used by default.
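Following the format of the bundled raw files (flat JSON arrays of regular-expression strings), a custom `MyCrawlers.json` might look like the sketch below. The file name and patterns here are illustrative only, not part of the gem:

```json
[
  "MyCompanyBot\\/\\d+\\.\\d+",
  "internal-health-check"
]
```

Each entry is treated as a regular expression, so escape special characters as you would in any JSON-embedded pattern.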
## License

MIT License