# PageByPage

Scrape a site page by page according to a URL pattern, and return an array of the `Nokogiri::XML::Element` nodes you want.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'page_by_page'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install page_by_page

## Usage

If you know the page number pattern, use `fetch`:

```ruby
nodes = PageByPage.fetch do
  url 'https://book.douban.com/subject/25846075/comments/hot?p=<%= n %>'
  selector '.comment-item'
  # from 2
  # step 2
  # to 100
  # interval 3
  # threads 4
  # no_progress
  # header Cookie: 'douban-fav-remind=1'
end
```
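The `<%= n %>` placeholder in the URL is ERB syntax. As a rough sketch of how such a pattern could expand into concrete page URLs, assuming the page number is exposed as `n` and that `from`, `step`, and `to` describe an arithmetic sequence of page numbers (the helper `expand_url` is hypothetical and not part of the gem):

```ruby
require 'erb'

# Hypothetical helper, not part of page_by_page:
# render a URL pattern with the page number bound to `n`.
def expand_url(pattern, n)
  ERB.new(pattern).result_with_hash(n: n)
end

pattern = 'https://book.douban.com/subject/25846075/comments/hot?p=<%= n %>'

# Assumption: from 2, step 4, to 10 would visit pages 2, 6 and 10.
page_numbers = 2.step(10, 4).to_a
urls = page_numbers.map { |n| expand_url(pattern, n) }
```

Here `urls` holds one URL per page number, each ending in `p=2`, `p=6`, and `p=10` respectively.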

If you don't know the pattern but the page has a link to the next page, use `jump`:

```ruby
nodes = PageByPage.jump do
  start 'https://book.douban.com/subject/25846075/comments/hot'
  iterate '.comment-paginator li:nth-child(3) a'
  selector '.comment-item'
  # to 100
  # interval 3
  # no_progress
  # header Cookie: 'douban-fav-remind=1'
end
```
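The control flow behind this kind of "follow the next link" scraping can be sketched in plain Ruby. This is only an illustration under assumptions, not the gem's implementation: `follow_pages` is a hypothetical helper, and an in-memory hash stands in for real HTTP requests and Nokogiri parsing.

```ruby
# Hypothetical sketch: keep following "next" links until there is
# none left or the optional page limit is reached.
def follow_pages(start_url, limit: nil)
  items = []
  url = start_url
  page = 1
  while url && (limit.nil? || page <= limit)
    # The block returns [items found on the page, url of the next page or nil].
    page_items, next_url = yield(url)
    items.concat(page_items)
    url = next_url
    page += 1
  end
  items
end

# In-memory stand-in for a site: url => [items on the page, next page url]
site = {
  '/comments'     => [%w[a b], '/comments?p=2'],
  '/comments?p=2' => [%w[c],   '/comments?p=3'],
  '/comments?p=3' => [%w[d e], nil]
}

all_items = follow_pages('/comments') { |url| site.fetch(url) }
first_two = follow_pages('/comments', limit: 2) { |url| site.fetch(url) }
```

With a real site, the block would fetch the page, collect the nodes matching `selector`, and read the `href` of the element matching `iterate`.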

You can also pass the parameters as keyword arguments instead of a block:

```ruby
nodes = PageByPage.fetch(
  url: 'https://book.douban.com/subject/25846075/comments/hot?p=<%= n %>',
  selector: '.comment-item',
  # from: 2,
  # step: 2,
  # to: 100,
  # interval: 3,
  # threads: 4,
  # no_progress: true,
  # header: {Cookie: 'douban-fav-remind=1'}
)
```
