README.md in husc-0.1.0 vs README.md in husc-0.1.1
- old
+ new
@@ -1,9 +1,9 @@
-Crawler
+Husc
=======
-Script for crawling in Ruby
+A simple crawling utility for Ruby.
## Description
This project enables site crawling and data extraction with xpath and css selectors. You can also send forms such as text data, files, and checkboxes.
@@ -14,31 +14,76 @@
## Usage
### Simple Example
```ruby
-require './rbcrawl.rb'
+require 'husc'
url = 'http://www.example.com/'
-doc = RbCrawl.new(url)
+doc = Husc(url)
-# Search for nodes by css
+# access another url
+doc.get('another url')
+
+# get current url
+doc.url
+
+# get current site's html
+doc.html
+
+# get <table> tags as a Hash
+doc.tables
+# e.g. doc.tables['予約・お問い合わせ'] # => '050-5596-6465' (the key means "Reservations / Inquiries")
+```
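
As a rough illustration of what "table tags as a Hash" could look like, the sketch below maps each header cell to its data cell. It assumes a tiny, well-formed snippet with one `<th>`/`<td>` pair per row and uses a regular expression only for brevity. `table_to_hash` is a hypothetical helper, not part of Husc, and the gem's actual behavior may differ.

```ruby
# Hypothetical helper (not part of Husc): turn simple <th>/<td> rows into a Hash.
# Assumes every row has exactly one <th> followed by one <td>, with no attributes.
def table_to_hash(html)
  # scan returns an array of [header, value] pairs, one per matching row
  html.scan(%r{<tr>\s*<th>(.*?)</th>\s*<td>(.*?)</td>\s*</tr>}m).to_h
end

# Made-up sample markup for illustration
sample = <<~HTML
  <table>
    <tr><th>Phone</th><td>050-5596-6465</td></tr>
    <tr><th>Hours</th><td>9:00-18:00</td></tr>
  </table>
HTML

info = table_to_hash(sample)
puts info['Phone']
```

A real page needs a proper HTML parser; the regular expression here only keeps the sketch self-contained.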
+
+### Scraping Example
+```ruby
+# search for nodes by css selector
+# tag : css('name')
+# class : css('.name')
+# id : css('#name')
doc.css('div')
doc.css('.main-text')
doc.css('#tadjs')
-# Search for nodes by xpath
+# search for nodes by xpath
doc.xpath('//*[@id="top"]/div[1]')
-# Others
-doc.css('div').css('a')[2].attr('href')
-doc.css('p').innerText()
-doc.tables # -> Table Tag to Dict
-
+# other examples
+doc.css('div').css('a')[2].attr('href') # => string object
+doc.css('p').innerText() # => string object
# You can omit "[]" when accessing the first element
```
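
The three CSS selector forms listed above (tag, class, id) each have an XPath equivalent, which may help when choosing between `css` and `xpath`. The sketch below shows that correspondence in plain Ruby. `simple_css_to_xpath` is a hypothetical name used only for illustration, it handles just these three simple forms, and it is not a description of how Husc resolves selectors internally.

```ruby
# Hypothetical helper (not part of Husc): translate a simple CSS selector
# (tag, .class, or #id) into an equivalent XPath expression.
def simple_css_to_xpath(selector)
  case selector
  when /\A#([\w-]+)\z/
    # id selector: match any element whose id attribute equals the name
    "//*[@id='#{Regexp.last_match(1)}']"
  when /\A\.([\w-]+)\z/
    # class selector: match the name as a whole word within the class attribute
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' #{Regexp.last_match(1)} ')]"
  when /\A[\w-]+\z/
    # tag selector: match every element with that tag name
    "//#{selector}"
  else
    raise ArgumentError, "unsupported selector: #{selector}"
  end
end

puts simple_css_to_xpath('div')
puts simple_css_to_xpath('.main-text')
puts simple_css_to_xpath('#tadjs')
```

Anything beyond these three forms (combinators, attribute selectors, pseudo-classes) is deliberately rejected to keep the sketch small.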
+### Submitting Form Example
+1. Specify the target node's attribute
+2. Specify value (int or str), check (bool), or file_name (str)
+3. Call submit() with the form's attribute specified
+```ruby
+# login
+doc.send(id:'id attribute', value:'value to send')
+doc.send(id:'id attribute', value:'value to send')
+doc.submit(id:'id attribute') # submit
+# post file
+doc.send(id:'id attribute', file_name:'target file name')
+
+# checkbox
+doc.send(id:'id attribute', check:true) # check
+doc.send(id:'id attribute', check:false) # uncheck
+
+# examples of specifying other attributes
+doc.send(name:'name attribute', value:'hello')
+doc.send(class:'class attribute', value:100)
+```
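
For intuition about what a submitted form carries, the sketch below encodes a few field values the way a browser would for a URL-encoded form, using Ruby's standard `URI.encode_www_form`. The field names and the `encode_form` helper are hypothetical, and treating a checked box as `on` and dropping an unchecked one is the usual browser convention rather than something the README states about Husc.

```ruby
require 'uri'

# Hypothetical helper (not part of Husc): build a URL-encoded form body.
# true  -> checkbox checked, sent as "on"
# false -> checkbox unchecked, omitted from the body
# other -> sent as its string form
def encode_form(fields)
  pairs = fields.each_with_object([]) do |(name, value), acc|
    next if value == false
    acc << [name.to_s, value == true ? 'on' : value.to_s]
  end
  URI.encode_www_form(pairs)
end

# Made-up field names for illustration
body = encode_form('user' => 'alice', 'age' => 100, 'remember' => true, 'news' => false)
puts body
```

File uploads are not covered here, since they use multipart encoding instead of a URL-encoded body.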
+
+
+
+
## Installation
```sh
$ gem install husc
-```
\ No newline at end of file
+```
+
+
+## Contributing
+Bug reports and pull requests are welcome on GitHub at [https://github.com/AjxLab/PyCrawl](https://github.com/AjxLab/PyCrawl).