Sha256: 64107be3bd55ba30f38b948f0b68ee5178b35fbc48e8656112f17391765a3b08

Contents?: true

Size: 1.13 KB

Versions: 1

Compression:

Stored size: 1.13 KB

Contents

require "nokogiri"
require "open-uri"

require 'harvestman/version'
require 'harvestman/crawler'

module Harvestman
  # Public: Crawl a website. You can visit similar URLs (eg: pages in a search
  # result) by passing an optional argument.
  #
  # url   - A String containing the url to be crawled.
  # pages - Zero or more Strings that will replace a * in the
  #         base url. Note: this does not need to be an Array.
  # type  - Optional. You can use a "plain" (default) or "fast" crawler.
  #         Fast mode uses threads for performance.
  #
  # Example: Crawl Etsy.com, printing the title and price of each item in
  #          pages 1, 2 and 3 of the Electronics category.
  #
  # Harvestman.crawl 'http://www.etsy.com/browse/vintage-category/electronics/*', (1..3) do
  #   css "div.listing-hover" do
  #     title = css "div.title a"
  #     price = css "span.listing-price"
  #
  #     puts "* #{title} (#{price})"
  #   end
  # end
  #
  # Returns nothing.
  def self.crawl(url, pages = nil, type = :fast, &block)
    crawler = Harvestman::Crawler.new(url, pages, type)
    if block_given?
      crawler.crawl(&block)
    end
  end
end

Version data entries

1 entries across 1 versions & 1 rubygems

Version Path
harvestman-0.1.1 lib/harvestman.rb