README.md in husc-0.1.0 vs README.md in husc-0.1.1

- old
+ new

@@ -1,9 +1,9 @@
-Crawler
+Husc
 =======

-Script for crawling in Ruby
+A simple crawling utility for Ruby.

 ## Description
 This project enables site crawling and data extraction with xpath and css selectors. You can also send forms such as text data, files, and checkboxes.

@@ -14,31 +14,76 @@
 ## Usage
 ### Simple Example
 ```ruby
-require './rbcrawl.rb'
+require 'husc'

 url = 'http://www.example.com/'
-doc = RbCrawl.new(url)
+doc = Husc(url)

-# Search for nodes by css
+# access another url
+doc.get('another url')
+
+# get current url
+doc.url
+
+# get current site's html
+doc.html
+
+# get <table> tags as dict
+doc.tables
+# ex) doc.tables['予約・お問い合わせ'] => 050-5596-6465
+```
+
+### Scraping Example
+```ruby
+# search for nodes by css selector
+# tag   : css('name')
+# class : css('.name')
+# id    : css('#name')
 doc.css('div')
 doc.css('.main-text')
 doc.css('#tadjs')

-# Search for nodes by xpath
+# search for nodes by xpath
 doc.xpath('//*[@id="top"]/div[1]')

-# Others
-doc.css('div').css('a')[2].attr('href')
-doc.css('p').innerText()
-doc.tables # -> Table Tag to Dict
-
+# other example
+doc.css('div').css('a')[2].attr('href') # => string object
+doc.css('p').innerText() # => string object
 # You do not need to specify "[]" to access the first index
 ```

+### Submitting Form Example
+1. Specify target node's attribute
+2. Specify value(int or str) / check(bool) / file_name(str)
+3. call submit() with form attribute specified
+```ruby
+# login
+doc.send(id:'id attribute', value:'value to send')
+doc.send(id:'id attribute', value:'value to send')
+doc.submit(id:'id attribute') # submit
+# post file
+doc.send(id:'id attribute', file_name:'target file name')
+
+# checkbox
+doc.send(id:'id attribute', check:True) # check
+doc.send(id:'id attribute', check:False) # uncheck
+
+# example of specify other attribute
+doc.send(name:'name attribute', value:'hello')
+doc.send(class:'class attribute', value:100)
+```
+
+
+
+

 ## Installation
 ```sh
 $ gem install husc
-```
\ No newline at end of file
+```
+
+
+## Contributing
+Bug reports and pull requests are welcome on GitHub at [https://github.com/AjxLab/PyCrawl](https://github.com/AjxLab/PyCrawl).
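
Note on the checkbox lines added in 0.1.1: they pass `check:True` and `check:False`, but Ruby's boolean literals are lowercase `true` and `false`. A capitalized `True` is parsed as a constant reference and raises `NameError` unless something defines that constant. The sketch below illustrates this without the gem or network access. `FormStub` is a hypothetical stand-in, not husc's implementation; it only assumes the keyword-argument shape of `doc.send` shown in the diff (one locator such as `id:`, `name:`, or `class:`, plus one of `value:`, `check:`, or `file_name:`).

```ruby
# Hypothetical stand-in for a husc document; NOT the gem's implementation.
# It only records what would be sent, mirroring the keyword style in the README.
class FormStub
  attr_reader :fields

  def initialize
    @fields = {}
  end

  # Takes exactly one locator (id:, name:, or class:) and exactly one payload
  # (value:, check:, or file_name:), following the three README steps.
  def send(value: nil, check: nil, file_name: nil, **locator)
    raise ArgumentError, 'give exactly one of id:, name:, class:' unless locator.size == 1

    # compact drops nil entries only, so check: false is kept
    payload = { value: value, check: check, file_name: file_name }.compact
    raise ArgumentError, 'give exactly one of value:, check:, file_name:' unless payload.size == 1

    @fields[locator.to_a.first] = payload
    self
  end
end

doc = FormStub.new
doc.send(id: 'user', value: 'alice')
doc.send(name: 'agree', check: true)         # lowercase literal: check
doc.send(class: 'newsletter', check: false)  # lowercase literal: uncheck

# A capitalized True is a constant lookup, which fails in plain Ruby.
capitalized_true_error =
  begin
    Object.const_get(:True)
    nil
  rescue NameError => e
    e.class
  end
```

The locator is collected with `**locator` because `class` is a reserved word and is awkward to declare as a named keyword parameter, while `class:` still works at the call site.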