Spanx
=====

[![Build status](https://secure.travis-ci.org/wanelo/spanx.png)](http://travis-ci.org/wanelo/spanx)

Spank down IP spam: IP-based rate limiting for web applications behind an HTTP server such as nginx or Apache.

Spanx integrates into any web application simply by monitoring one or more HTTP server access log files
in real time (think Apache/nginx access.log). Spanx is built on top of the Pause gem, a simple Redis-based rate
limiter.

The basic flow is as follows:

* Spanx tails the access.log file(s),
* parses the IP address out of each request, and
* maintains a tally of request counts per IP and per time slice.
* Spanx can then detect when one or more IPs exceed the configured rate limiting thresholds
  (multiple thresholds are supported).
  * When such an IP is detected, Spanx immediately writes it to a block-list file (suitable for consumption by nginx or
    Apache, in a format such as "deny 127.0.0.1;"), and then
  * executes a pre-configured command, presumed to reload the HTTP server configuration (such as sending HUP to nginx) and activate the new blocking rules.
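
The flow above can be sketched in plain Ruby. This is only an illustration of the idea (tally per IP and time slice, compare against several thresholds, emit `deny` lines), not Spanx's actual implementation. The slice size, the threshold values, and the method names `record`, `offenders` and `block_list` are all assumptions made for the example.

```ruby
# Illustrative sketch only; names and numbers are assumptions, not Spanx's API.

SLICE_SECONDS = 60 # assumed size of one time slice

# Assumed thresholds: at most max_count requests within period_seconds.
THRESHOLDS = [
  { period_seconds: 60,  max_count: 3 },
  { period_seconds: 300, max_count: 5 }
].freeze

# Build an empty tally: tally[ip][slice_start] => request count.
def new_tally
  Hash.new { |hash, ip| hash[ip] = Hash.new(0) }
end

# Count one request for the IP in the slice that contains the timestamp.
def record(tally, ip, timestamp)
  slice_start = timestamp - (timestamp % SLICE_SECONDS)
  tally[ip][slice_start] += 1
end

# Return the IPs that exceed at least one threshold as of `now`.
def offenders(tally, now)
  tally.select do |_ip, slices|
    THRESHOLDS.any? do |threshold|
      recent = slices.select { |slice_start, _n| slice_start > now - threshold[:period_seconds] }
      recent.values.inject(0, :+) > threshold[:max_count]
    end
  end.keys
end

# Render the block list in the "deny <ip>;" format mentioned above.
def block_list(ips)
  ips.sort.map { |ip| "deny #{ip};" }.join("\n")
end

tally = new_tally
now = 1_000_020
4.times { record(tally, "10.0.0.1", now) } # exceeds 3 requests per 60 seconds
2.times { record(tally, "10.0.0.2", now) } # stays under both thresholds
puts block_list(offenders(tally, now))     # prints: deny 10.0.0.1;
```

In the real tool the tally lives in Redis rather than in process memory, which is what allows several watchers to feed a single analyzer.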

Spanx additionally supports a regular expression based whitelist file, which can be used to exclude certain log lines
from consideration (for example, requests from Googlebot, matched on User-Agent).
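
As a rough sketch of how such a whitelist might be applied, the snippet below compiles newline-separated patterns and drops any log line that matches one of them. The helper names `load_whitelist` and `filter_lines`, the patterns, and the sample log lines are made up for illustration; Spanx's own code may differ.

```ruby
# Illustrative sketch; helper names and sample data are hypothetical.

# Compile one Regexp per non-blank line of the whitelist text.
def load_whitelist(text)
  text.each_line.map(&:strip).reject(&:empty?).map { |pattern| Regexp.new(pattern) }
end

# Keep only the log lines that match none of the whitelist patterns.
def filter_lines(lines, whitelist)
  lines.reject { |line| whitelist.any? { |regexp| regexp =~ line } }
end

whitelist = load_whitelist(<<'PATTERNS')
Googlebot
^127\.0\.0\.1\b
PATTERNS

log_lines = [
  '66.249.66.1 - - [10/Oct/2012:13:55:36 -0700] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
  '203.0.113.9 - - [10/Oct/2012:13:55:37 -0700] "GET /login HTTP/1.1" 200 128 "-" "curl/7.24"'
]

# Only the second line survives and would be counted against its IP.
puts filter_lines(log_lines, whitelist)
```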

### Design

Spanx can be integrated as part of your application, or can run as a standalone Ruby app.  Spanx requires Ruby
1.9.3, and it uses Ruby threads to work on a few things in parallel.

Spanx has two main components:

1. *watcher* is a process that monitors HTTP server log files, and periodically updates Redis with the most recent counts.
   The watcher also writes out the blocked IP file if blocked IPs are found in the Redis database.

2. *analyzer* is a process that reads up-to-date information on IP addresses from Redis and analyzes it. If any IPs
   exceeding the rate limit are found, it writes them to the Redis DB with an expiration TTL set.

If you have only one web server, you can run both watcher and analyzer as a single ruby process.

If you have multiple web servers, you need to run watcher on each server, and analyzer only once (somewhere).

### Alerts

Besides writing IPs out to a block list file, Spanx supports notifiers that are called when a new IP
is blocked.  Currently supported are an audit log notifier (which writes that information to a log file), a Campfire
chat notifier (which prints IP blocking information into your Campfire chat room), and an email notifier. It is
very easy to write additional notifiers.
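
To give a feel for what a notifier could look like, here is a purely hypothetical sketch. The interface Spanx actually expects is not described in this README, so the class name `IoNotifier`, the `publish` method, and its arguments are all assumptions; check the built-in notifiers in the source before writing a real one.

```ruby
require "stringio"

# Hypothetical notifier shape, for illustration only. The method name and
# arguments are assumptions, not Spanx's documented interface.
class IoNotifier
  def initialize(io)
    @io = io
  end

  # Assumed hook: called once for each newly blocked IP.
  def publish(ip, reason)
    @io.puts "blocked #{ip}: #{reason}"
  end
end

# Write to an in-memory buffer here; a real notifier might post to chat or email.
buffer = StringIO.new
IoNotifier.new(buffer).publish("10.0.0.1", "4 requests in 60s")
puts buffer.string
```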

## Installation

Add this line to your application's Gemfile:

    gem 'spanx'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install spanx

### Dependencies

Spanx uses the Pause gem to persist state. Pause depends on Redis to save state and to perform set logic on the data it collects.

## Usage

Spanx has a single executable with several sub-commands. In practice, multiple commands will
be run concurrently to do all of the necessary calculations.

Configuration can be provided via a YAML file (see example), and/or via command line options. Not
all configuration can be set via the command line. If an option is provided in both the YAML file and on the
command line, the command line value is used.
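
The precedence rule can be illustrated with a small sketch using Ruby's standard `yaml` and `optparse` libraries. The `-f` and `-b` flags mirror the `watch` options documented in this README, but the config keys `access_log` and `block_file` and the `load_config` helper are assumptions for the example; consult the example config file for the real key names.

```ruby
require "yaml"
require "optparse"

# Merge YAML settings with command line options; command line values win.
# The config keys used here are hypothetical.
def load_config(yaml_text, argv)
  file_config = YAML.load(yaml_text) || {}

  cli_config = {}
  parser = OptionParser.new do |opts|
    opts.on("-f", "--file ACCESS_LOG") { |value| cli_config["access_log"] = value }
    opts.on("-b", "--block_file BLOCK_FILE") { |value| cli_config["block_file"] = value }
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched

  # Hash#merge keeps the argument's value when a key exists in both hashes.
  file_config.merge(cli_config)
end

yaml_text = "access_log: /var/log/nginx/access.log\nblock_file: /tmp/from_yaml.conf\n"
config = load_config(yaml_text, ["-b", "/tmp/from_cli.conf"])
puts config["access_log"] # taken from the YAML file
puts config["block_file"] # overridden on the command line
```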


### watch

This command watches an HTTP server log file and writes blocked IPs out to the specified block file.

```bash
Usage: [bundle exec] spanx watch [options]
    -f, --file ACCESS_LOG            Apache/nginx access log file to scan continuously
    -z, --analyze                    Analyze IPs also (as opposed to running `spanx analyze` in another process)
    -b, --block_file BLOCK_FILE      Output file to store NGINX block list
    -c, --config CONFIG              Path to config file (YML) (required)
    -d, --daemonize                  Detach from TTY and run as a daemon
    -g, --debug                      Log to STDOUT status of execution and some time metrics
    -r, --run <shell command>        Shell command to run anytime blocked ip file changes, for example "sudo pkill -HUP nginx"
    -w, --whitelist WHITELIST        File with newline separated reg exps, to exclude lines from access log
    -h, --help                       Show this message
```

### analyze

Analyzes IPs found by the `watch` command. If an IP exceeds its maximum count for a time
period check (as set in the config file), the IP is written into Redis with a TTL defined by the
period check.

```bash
Usage: [bundle exec] spanx analyze [options]
    -a, --audit AUDIT_FILE           Historical record of IP blocking decisions
    -c, --config CONFIG              Path to config file (YML) (required)
    -d, --daemonize
    -g, --debug                      Log status to STDOUT
    -h, --help                       Show this message
```

### disable

Disables IP blocking. Note that this only affects the writing of block
files, not IP tracking or analysis. This command requires a connection
to Redis, and thus the same config file used by `analyze` and `watch`.

```bash
Usage: [bundle exec] spanx disable [options]
    -c, --config CONFIG              Path to config file (YML) (required)
    -g, --debug                      Log status to STDOUT
    -h, --help                       Show this message
```

### enable

Re-enables IP blocking if it was disabled. As with `disable`, the config file is
required to connect to Redis.

```bash
Usage: [bundle exec] spanx enable [options]
    -c, --config CONFIG              Path to config file (YML) (required)
    -g, --debug                      Log status to STDOUT
    -h, --help                       Show this message
```

### flush

Removes the persisted data about current IP blocks. Use this
when you want to clear all data about current blocks without (or in
addition to) disabling the blocker.

```bash
Usage: [bundle exec] spanx flush [options]
    -c, --config CONFIG              Path to config file (YML) (required)
    -g, --debug                      Log status to STDOUT
    -h, --help                       Show this message
```

## Examples

If you have only one load balancer, you may want to centralize all work into a single process, like so:

```bash
 $ spanx watch -w /path/to/whitelist -c /path/to/spanx.conf.yml -z -d
```

With multiple load balancers, this may not be desirable. Every host needs to process its own access
log, but only one host needs to analyze the IP traffic.

```bash
 lb1 $ spanx watch -c spanx.conf.yml -r "sudo pkill -HUP nginx" --debug >> /var/log/spanx.watch.log 2>&1 &
 lb2 $ spanx watch -c spanx.conf.yml -r "sudo pkill -HUP nginx" --debug >> /var/log/spanx.watch.log 2>&1 &

 lb2 $ spanx analyze -c spanx.conf.yml -a spanx.audit.log --debug >> /var/log/spanx.analyze.log 2>&1 &
```

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request

## Maintainers

Konstantin Gredeskoul (@kigster) and Eric Saxby (@sax) at Wanelo, Inc (http://github.com/wanelo)

(c) 2012, All rights reserved.