Sha256: e082101e5924103020191a484ba3d61b05ce9a515f696c6f554b9f957672e172

Contents?: true

Size: 823 Bytes

Versions: 3

Compression:

Stored size: 823 Bytes

Contents

module Dhalang
  # Provides functionality for scraping webpages.
  class Scraper
    SCRIPT_PATH = File.expand_path('../js/html-scraper.js', __FILE__).freeze
    private_constant :SCRIPT_PATH

    # Scrapes the full HTML content of the page at the given url.
    #
    # @param  [String] url      Url to scrape.
    # @param  [Hash]   options  User configurable options.
    #
    # @return [String] Scraped HTML content.
    def self.html(url, options = {})
      UrlUtils.validate(url)
      temp_file = FileUtils.create_temp_file("html")
      begin
        configuration = Configuration.new(options, url, temp_file.path, "html")
        NodeScriptInvoker.execute_script(SCRIPT_PATH, configuration)
        html = IO.read(temp_file.path)
      ensure
        FileUtils.delete(temp_file)
      end
      return html
    end
  end
end
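
The method above follows a common pattern: create a temp file, let an external step write into it, read the result back, and clean up in `ensure`. A minimal standalone sketch of that pattern is given here, using only Ruby's standard library. `scrape_via_temp_file` and `FAKE_RENDERER` are hypothetical names introduced for illustration; the lambda stands in for the Node script and is not part of Dhalang.

```ruby
require 'tempfile'

# Hypothetical stand-in for the external Node script: it writes some HTML
# for the given url to output_path. Not part of Dhalang.
FAKE_RENDERER = lambda do |url, output_path|
  File.write(output_path, "<html><body>#{url}</body></html>")
end

# Mirrors the shape of Scraper.html: create a temp file, let an external
# step fill it, read it back, and always remove the file afterwards.
def scrape_via_temp_file(url, renderer: FAKE_RENDERER)
  temp_file = Tempfile.new(['scrape', '.html'])
  temp_file.close # The renderer writes by path, so release our handle first.
  begin
    renderer.call(url, temp_file.path)
    File.read(temp_file.path) # Value of the begin block; ensure does not change it.
  ensure
    temp_file.unlink # Runs whether or not the renderer raised.
  end
end

html = scrape_via_temp_file('https://example.com')
puts html
```

Because `ensure` runs on both the normal and the error path, the temp file is removed even when the rendering step raises. With the gem itself installed, the equivalent call would presumably be `Dhalang::Scraper.html(url)`, per the signature shown above.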

Version data entries

3 entries across 3 versions and 1 gem

Version Path
Dhalang-0.7.2 lib/Scraper.rb
Dhalang-0.7.1 lib/Scraper.rb
Dhalang-0.7.0 lib/Scraper.rb