Sha256: b748e35745d0ddaec8d9b4826728948afc64d0a2be3b6558e7aab8d41a9e1706
Contents?: true
Size: 915 Bytes
Versions: 2
Compression:
Stored size: 915 Bytes
Contents
= Anemone

Anemone is a web spider framework that can spider a domain and collect useful
information about the pages it visits. It is versatile, allowing you to write
your own specialized spider tasks quickly and easily.

See http://anemone.rubyforge.org for more information.

== Features
* Multi-threaded design for high performance
* Tracks 301 HTTP redirects to understand a page's aliases
* Built-in BFS algorithm for determining page depth
* Allows exclusion of URLs based on regular expressions
* Choose the links to follow on each page with focus_crawl()
* HTTPS support
* Records response time for each page
* CLI program can list all pages in a domain, calculate page depths, and more

== Examples
See the scripts under the <tt>lib/anemone/cli</tt> directory for examples of
several useful Anemone tasks.

== Requirements
* nokogiri

== Optional
* fizx-robots (required if obey_robots_txt is set to true)
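To illustrate what breadth-first page depth and regex-based URL exclusion mean in practice, here is a minimal, self-contained sketch. It does not use Anemone and is not Anemone's implementation: the <tt>LINKS</tt> graph and the <tt>page_depths</tt> method are hypothetical names invented for this example, and an in-memory hash stands in for pages that a real spider would fetch over HTTP.

```ruby
# Hypothetical in-memory link graph standing in for fetched pages.
# Keys are page URLs, values are the links found on that page.
LINKS = {
  '/'            => ['/about', '/blog', '/admin'],
  '/about'       => ['/'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/blog/post-1' => ['/'],
  '/admin'       => ['/admin/secret']
}.freeze

# Breadth-first walk from +start+. Returns a hash of { url => depth },
# where depth is the smallest number of links needed to reach the page.
# URLs matching any regular expression in +skip+ are never visited.
def page_depths(links, start, skip: [])
  depths = { start => 0 }
  queue = [start]
  until queue.empty?
    url = queue.shift
    links.fetch(url, []).each do |link|
      next if depths.key?(link)                        # already reached at a smaller or equal depth
      next if skip.any? { |pattern| link =~ pattern }  # excluded by a regex
      depths[link] = depths[url] + 1
      queue << link
    end
  end
  depths
end

depths = page_depths(LINKS, '/', skip: [%r{\A/admin}])
depths.each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed in first-in, first-out order, each page is first reached along a shortest path, so the recorded depth is the minimum link distance from the start page. Excluding <tt>/admin</tt> also keeps <tt>/admin/secret</tt> out of the result here, since in this sample graph no other page links to it.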
Version data entries
2 entries across 2 versions & 1 rubygems
Version | Path |
---|---|
anemone-0.2.2 | README.rdoc |
anemone-0.2.1 | README.rdoc |