# PageByPage

Scrape a site page by page, following a URL pattern, and get back an array of the `Nokogiri::XML::Element` nodes you select.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'page_by_page'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install page_by_page

## Usage

If you know the page-number pattern, use `fetch`:

```ruby
nodes = PageByPage.fetch do
  url 'https://book.douban.com/subject/25846075/comments/hot?p=<%= n %>'
  selector '.comment-item'
  # from 2
  # step 2
  # to 100
  # interval 3
  # threads 4
  # no_progress
  # header Cookie: 'douban-fav-remind=1'
end
```
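
The `<%= n %>` placeholder in the URL is ERB syntax. The following is a rough sketch of how such a pattern could expand into concrete page URLs, using only Ruby's standard `erb` library. It assumes the page number is exposed to the template as `n` and that `from`, `step`, and `to` describe an inclusive range; neither is confirmed by this README, and the example.com URL is a stand-in.

```ruby
require 'erb'

# Assumption: the url option is rendered as an ERB template with `n` bound
# to the current page number. This is an illustration, not the gem's code.
template = ERB.new('https://example.com/comments?p=<%= n %>')

# Hypothetical values standing in for the `from`, `step`, and `to` options.
from = 2
step = 2
to = 8

# Walk the page numbers and render one URL per page.
urls = from.step(to, step).map { |n| template.result_with_hash(n: n) }

puts urls
```

With these values the sketch produces URLs for pages 2, 4, 6, and 8.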

If you don't know the pattern but the page has a link to the next page, use `jump`:

```ruby
nodes = PageByPage.jump do
  start 'https://book.douban.com/subject/25846075/comments/hot'
  iterate '.comment-paginator li:nth-child(3) a'
  selector '.comment-item'
  # to 100
  # interval 3
  # no_progress
  # header Cookie: 'douban-fav-remind=1'
end
```
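
To make the difference from `fetch` concrete, here is a minimal sketch of the "follow the next link" idea against an in-memory stand-in for a site. The `SITE` hash, `jump_sketch`, and the reading of `to` as a page limit are all hypothetical; the gem itself locates the next link with the `iterate` CSS selector, and nothing here is its actual implementation.

```ruby
# Hypothetical in-memory "site": each page lists its items and the URL of
# the next page (nil on the last page).
SITE = {
  '/comments'     => { items: %w[a b], next: '/comments?p=2' },
  '/comments?p=2' => { items: %w[c d], next: '/comments?p=3' },
  '/comments?p=3' => { items: %w[e],   next: nil }
}.freeze

# Collect items page by page, following each page's next link until there
# is none, or until `to` pages have been visited (assumed meaning of `to`).
def jump_sketch(site, start, to: nil)
  items = []
  url = start
  page = 1
  while url && (to.nil? || page <= to)
    current = site.fetch(url)
    items.concat(current[:items])
    url = current[:next]
    page += 1
  end
  items
end

all_items = jump_sketch(SITE, '/comments')
first_two_pages = jump_sketch(SITE, '/comments', to: 2)
```

Unlike the `fetch` case, the pages cannot be requested in parallel here, because each URL is only known after the previous page has been read.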

You can also pass the parameters as keyword arguments instead of a block:

```ruby
nodes = PageByPage.fetch(
  url: 'https://book.douban.com/subject/25846075/comments/hot?p=<%= n %>',
  selector: '.comment-item',
  # from: 2,
  # step: 2,
  # to: 100,
  # interval: 3,
  # threads: 4,
  # no_progress: true,
  # header: {Cookie: 'douban-fav-remind=1'}
)
```

Note that `lazy_fetch` returns an Enumerator instead of an Array, so results are loaded lazily as you consume them:

```ruby
nodes = PageByPage.lazy_fetch(
  #...
)
```
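
The practical effect of getting an Enumerator can be seen with plain Ruby. The sketch below does not call the gem: `fake_fetch_page` is a hypothetical stand-in for a network request, and it records which pages were actually requested when only a few items are consumed.

```ruby
# Records which pages the fake fetcher was asked for.
fetched = []

# Hypothetical stand-in for a network request: returns two items per page.
fake_fetch_page = lambda do |n|
  fetched << n
  ["item #{n}a", "item #{n}b"]
end

# An Enumerator that fetches the next page only when more items are needed.
pages = Enumerator.new do |yielder|
  n = 1
  loop do
    fake_fetch_page.call(n).each { |item| yielder << item }
    n += 1
  end
end

# Taking three items forces only the first two pages to be fetched.
first_three = pages.first(3)

puts first_three.inspect
puts fetched.inspect
```

An eager version would have to request every page before returning anything, which is why the lazy form suits cases where you stop as soon as you find what you need.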
