
# Krawler

Simple little command-line web crawler. Use it to find 404s or 500s on your site.
I use it to warm caches. Multi-threading is enabled for faster crawling.

## Installation

Install:

    gem install krawler

## Usage

From the command line:

    $ krawl http://localhost:3000/

Options:

    -e, --exclude regex              Exclude matching paths
    -s, --sub-restrict               Restrict to sub paths of base url
    -c, --concurrent count           Crawl with count number of concurrent connections
    -r, --randomize                  Randomize crawl path

Examples:

Restrict crawling to sub-paths of `/public`:

    $ krawl http://localhost:3000/public -s

Restrict crawling to paths that do not match `/^\/api\//`:

    $ krawl http://localhost:3000/ -e "^\/api\/"

Crawl with 4 concurrent crawlers. Make sure your server is capable of handling
concurrent requests:

    $ krawl http://production.server -c 4

Randomize the crawl path. Helpful when you have a lot of links and get bored of watching
the same crawl path over and over:

    $ krawl http://localhost:3000/ -r
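For illustration, the filtering that the `-e` and `-s` options describe can be sketched in a few lines of plain Ruby using only the standard library. This is a hypothetical sketch, not krawler's actual implementation; `collect_links` and its parameters are invented names:

```ruby
require 'uri'

# Hypothetical sketch of a crawler's link-filtering step:
# resolve each href against the base URL, stay on the same host,
# then apply filters analogous to -e (exclude regex) and
# -s (restrict to sub-paths of the base URL).
def collect_links(base_url, hrefs, exclude: nil, sub_restrict: false)
  base = URI(base_url)
  hrefs.map { |href| base.merge(href) }                    # resolve relative links
       .select { |uri| uri.host == base.host }             # stay on-site
       .reject { |uri| exclude && uri.path =~ exclude }    # -e behavior
       .select { |uri| !sub_restrict || uri.path.start_with?(base.path) } # -s behavior
       .map(&:to_s)
       .uniq
end

links = collect_links('http://localhost:3000/public',
                      ['/public/a', '/public/a', '/api/x', 'http://other.com/'],
                      exclude: /^\/api\//, sub_restrict: true)
# links == ["http://localhost:3000/public/a"]
```

Duplicate, off-site, excluded, and out-of-scope links are all dropped, which is the behavior the flags above imply.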


## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request
