husc ==== [![Build Status](https://api.travis-ci.org/AjxLab/husc.svg?branch=master)](https://travis-ci.org/AjxLab/husc) [![Gem Version](https://badge.fury.io/rb/husc.svg)](https://rubygems.org/gems/husc/) [![MIT License](http://img.shields.io/badge/license-MIT-blue.svg?style=flat)](LICENSE.txt) A simple crawling utility for Ruby. ## Description This project enables site crawling and data extraction with xpath and css selectors. You can also send forms such as text data, files, and checkboxes. ## Requirement - Ruby 2.3 or above ## Usage ### Description of Instance Methods name | Description -----------|---------------------------------------------- send | Set the value you want to submit to the form. submit | Submit form. css | Get node by css selector. xpath | Get node by xpath. attr | Get node's attribute. inner_text | Get node's inner text. ### Simple Example ```ruby require 'husc' url = 'http://www.example.com/' doc = Husc.new(url) # access another url doc.get('another url') # get current url doc.url # get current site's html doc.html # get tags as dict doc.tables ``` ### Scraping Example ```ruby # search for nodes by css selector # tag : css('name') # class : css('.name') # id : css('#name') doc.css('div') doc.css('.main-text') doc.css('#tadjs') # search for nodes by xpath doc.xpath('//*[@id="top"]/div[1]') # other example doc.css('div').css('a')[2].attr('href') # => string object doc.css('p').inner_text() # => string object # You do not need to specify "[]" to access the first index ``` ### Submitting Form Example 1. Specify target node's attribute 2. Specify value(int or str) / check(bool) / file_name(str) 3. Call submit() with form attribute specified ```ruby # login doc.send(id:'id attribute', value:'value to send') doc.send(id:'id attribute', value:'value to send') doc.submit(id:'id attribute') # submit # post file doc.send(id:'id attribute', file_name:'target file name') # checkbox doc.send(id:'id attribute', check:true) # check doc.send(id:'id attribute', check:false) # uncheck # button click doc.send(id:'id attribute', button:true) # click doc.send(id:'id attribute', button:false) # unclick # example of specify other attribute doc.send(name:'name attribute', value:'hello') doc.send(class:'class attribute', value:100) ``` ## Installation ```sh $ gem install husc ``` ## Contributing Bug reports and pull requests are welcome on GitHub at [https://github.com/AjxLab/husc](https://github.com/AjxLab/husc).