Kabutops
========

Installation
------------

You can install it via gem

```bash
gem install kabutops
```

Or you can put it in your Gemfile

```ruby
gem 'kabutops'
```

Basic example
-------------

Create **fruit_crawler.rb**.

```ruby
require 'kabutops'

class FruitCrawler < Kabutops::Crawler
  include Sidekiq::Worker

  collection (1..5).map { |id|
               {
                 id: id,
                 url: "https://www.example.com/fruits/#{id}",
               }
             }.shuffle
  proxy '127.0.0.1', 8118
  cache true

  elasticsearch do
    index :books
    document :book

    data do
      id :var, :id
      url :var, :url
      some_attr :css, 'h1.bookTitle'
      grape :lambda, ->(page) {
        # take the text of the matched nodes before splitting it
        page.css('h3.fruit').text.split(',').first
      }

      nested_attr do
        apple :css, 'h1.bookTitle'
        banana :xpath, '//table/tr/td[1]'
      end
    end
  end

  callback do |resource, page|
  end
end

FruitCrawler.crawl!
```

Run it via sidekiq

```bash
bundle exec sidekiq -r ./fruit_crawler.rb -c 10
```

This example crawls the specified URLs in parallel and stores each result
in the Elasticsearch index named books as a book document.
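
The list of URLs comes from the expression passed to `collection` in the crawler. As a rough sketch in plain Ruby (no Kabutops required), the same expression is wrapped here in a hypothetical helper named `fruit_collection`, which is not part of the gem, to show the array of hashes it produces. How Kabutops consumes this array internally is not shown here; the `:id` and `:url` keys appear to be what `id :var, :id` and `url :var, :url` refer to.

```ruby
# Hypothetical helper, for illustration only: it evaluates the same
# expression that FruitCrawler passes to `collection`.
def fruit_collection(ids = 1..5)
  ids.map { |id|
    {
      id: id,
      url: "https://www.example.com/fruits/#{id}",
    }
  }.shuffle # randomize the crawl order
end

# Print each resource; the order differs between runs because of shuffle.
fruit_collection.each do |item|
  puts "#{item[:id]} -> #{item[:url]}"
end
```

Each element is an ordinary hash, so a collection could presumably also be built from a file or a database query, as long as it ends up in this shape.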

One document will look something like this

```json
{
  "id": "...",
  "url": "...",
  "some_attr": "...",
  "grape": "...",
  "nested_attr": {
    "apple": "...",
    "banana": "..."
  }
}
```
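
To make the mapping from the `data` block to this document shape more concrete, here is a minimal sketch in plain Ruby that uses neither Kabutops nor Elasticsearch. The helper `example_document` and all placeholder values are assumptions made for illustration; only the key names and the nesting come from the example above.

```ruby
require 'json'

# Hypothetical helper, not part of Kabutops: builds a hash with the same
# keys and nesting as the document described above.
def example_document(id, url)
  {
    id: id,
    url: url,
    some_attr: 'placeholder title',
    # mirrors the lambda in the crawler: keep the text before the first comma
    grape: 'placeholder, text'.split(',').first,
    nested_attr: {
      apple: 'placeholder title',
      banana: 'placeholder cell'
    }
  }
end

doc = example_document(1, 'https://www.example.com/fruits/1')
puts JSON.pretty_generate(doc)
```

The `nested_attr do ... end` block in the crawler corresponds to the nested hash here, which serializes to a nested JSON object.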
