Sha256: b748e35745d0ddaec8d9b4826728948afc64d0a2be3b6558e7aab8d41a9e1706

Contents?: true

Size: 915 Bytes

Versions: 2

Compression:

Stored size: 915 Bytes

Contents

= Anemone

Anemone is a web spider framework that can spider a domain and collect useful
information about the pages it visits. It is versatile, allowing you to
write your own specialized spider tasks quickly and easily.

See http://anemone.rubyforge.org for more information.

== Features
* Multi-threaded design for high performance
* Tracks 301 HTTP redirects to understand a page's aliases
* Built-in BFS algorithm for determining page depth
* Allows exclusion of URLs based on regular expressions
* Choose the links to follow on each page with focus_crawl()
* HTTPS support
* Records response time for each page
* CLI program can list all pages in a domain, calculate page depths, and more

== Examples
See the scripts under the <tt>lib/anemone/cli</tt> directory for examples of several useful Anemone tasks.

== Requirements
* nokogiri

== Optional
* fizx-robots (required if obey_robots_txt is set to true)

Version data entries

2 entries across 2 versions & 1 rubygems

Version Path
anemone-0.2.2 README.rdoc
anemone-0.2.1 README.rdoc