Kabutops
========

Installation
------------

Install it with RubyGems:

```bash
gem install kabutops
```

Or add it to your Gemfile:

```ruby
gem 'kabutops'
```

Basic example
-------------

Create **fruit_crawler.rb**:

```ruby
require 'sidekiq'
require 'kabutops'

class FruitCrawler < Kabutops::Crawler
  include Sidekiq::Worker

  # Resources to crawl; each hash is one collection entry.
  collection (1..5).map { |id|
    {
      id: id,
      url: "https://www.example.com/fruits/#{id}",
    }
  }.shuffle

  proxy '127.0.0.1', 8181 # example proxy host and port (ports go up to 65535)
  cache true

  elasticsearch do
    index :books
    document :book

    data do
      id :var, :id                     # copied from the collection entry
      url :var, :url
      some_attr :css, 'h1.bookTitle'   # CSS selector
      grape :lambda, ->(page) {        # custom extraction from the parsed page
        page.css('h3.fruit').text.split(',').first
      }

      nested_attr do
        apple :css, 'h1.bookTitle'
        banana :xpath, '//table/tr/td[1]' # XPath positions start at 1
      end
    end
  end

  callback do |resource, page|
    # receives the resource and its page; left empty in this example
  end
end

FruitCrawler.crawl!
```

Run it with Sidekiq:

```bash
bundle exec sidekiq -r ./fruit_crawler.rb -c 10
```

This example crawls the specified URLs in parallel (here with 10 Sidekiq threads) and stores the results in the Elasticsearch index `books` as `book` documents. One document will look something like this:

```json
{
  "id": "...",
  "url": "...",
  "some_attr": "...",
  "grape": "...",
  "nested_attr": {
    "apple": "...",
    "banana": "..."
  }
}
```
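The README does not describe how a `data` block turns into a stored document. Below is a minimal, stdlib-only sketch of one way such a DSL *could* work. It assumes each attribute call records an extractor type plus its argument, and that a call with a block becomes a nested hash. `DataSchema`, `build` and `resolve` are hypothetical names, not Kabutops API. Only `:var` rules are resolved, because the other extractor types would need a fetched page.

```ruby
require 'json'

# Hypothetical helper, NOT part of Kabutops: it only mimics how a
# `data do ... end` block could be read into a nested rule hash.
class DataSchema
  attr_reader :rules

  # Evaluate the block with a fresh schema as `self` and return its rules.
  def self.build(&block)
    schema = new
    schema.instance_eval(&block)
    schema.rules
  end

  # Fill in what is known without a page: `:var` rules copy a key from the
  # collection entry; other extractor types stay as "<type>" placeholders.
  def self.resolve(rules, entry)
    rules.transform_values do |rule|
      if rule.is_a?(Hash)
        resolve(rule, entry)
      elsif rule.first == :var
        entry.fetch(rule[1])
      else
        "<#{rule.first}>"
      end
    end
  end

  def initialize
    @rules = {}
  end

  # Each unknown call declares one attribute (name => extractor args);
  # a call with a block declares a nested group instead.
  def method_missing(name, *args, &block)
    @rules[name] = block ? DataSchema.build(&block) : args
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Same attribute layout as the FruitCrawler example (lambda simplified).
rules = DataSchema.build do
  id :var, :id
  url :var, :url
  some_attr :css, 'h1.bookTitle'
  grape :lambda, ->(page) { page }
  nested_attr do
    apple :css, 'h1.bookTitle'
    banana :xpath, '//table/tr/td[1]'
  end
end

entry = { id: 1, url: 'https://www.example.com/fruits/1' }
doc = DataSchema.resolve(rules, entry)
puts JSON.pretty_generate(doc)
```

Running it prints a hash with the same shape as the example document, with `<css>`, `<lambda>` and `<xpath>` standing in for values that would come from the crawled page.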