# Anschel ![Version](https://img.shields.io/gem/v/anschel.svg?style=flat-square)

Logstash-like for moving events from Kafka into Elasticsearch.


## Usage, Configuration &c.

### Installation

Download the jarfile from the [GitHub releases page](https://github.com/sczizzo/anschel/releases)
and run like so:

    $ java -jar anschel-x.y.z.jar

### Usage

Just call for help!

    $ java -jar anschel-x.y.z.jar help
    Commands:
      anschel agent           # Run application
      anschel art             # Show application art
      anschel help [COMMAND]  # Describe available commands or one specific command
      anschel version         # Show application version

Probably you're most interested in the `agent` command:

    $ java -jar anschel-x.y.z.jar help agent
    Usage:
      anschel agent

    Options:
      -c, [--config=CONFIG]        # Path to primary configuration file
                                   # Default: /etc/anschel.json
      -l, [--log=LOG]              # Log to file instead of STDOUT
      -v, [--debug], [--no-debug]  # Enable DEBUG-level logging
      -z, [--trace], [--no-trace]  # Enable TRACE-level logging (overrides DEBUG)

    Run application


### Configuration

It's kinda like a JSON version of the Logstash config language:

    {
      // Comments like this are allowed, so long as they've a line to themself

      // If Anschel fails, state will be stored for safekeeping
      "store": "/tmp/anschel.db",

      // How often to report stats (in seconds)
      "stats_interval": 30,

      // Anschel requires JRuby, and some libraries log via Log4j
      "log4j": {
        "path": "/path/to/anschel4j.log",
        "pattern": "[%d] %p %m (%c)%n"
      },

      // Specify any number of inputs
      "input": [

        // Kafka is the primary input; see the `jruby-kafka` homepage for
        // more details: https://github.com/joekiller/jruby-kafka
        {
          "kind": "kafka",
          "queue_size": 2000,
          "zk_connect": "localhost:2181",
          "zk_connect_timeout": 6000,
          "zk_session_timeout": 6000,
          "group_id": "anschel",
          "topic_id": "franz",
          "reset_beginning": null,
          "auto_offset_reset": "smallest",
          "consumer_restart_on_error": true,
          "auto_commit_interval": 1000,
          "rebalance_max_retries": 4,
          "rebalance_backoff_ms": 2000,
          "socket_timeout_ms": 30000,
          "socket_receive_buffer_bytes": 65536,
          "fetch_message_max_bytes": 1048576,
          "auto_commit_enable": true,
          "queued_max_message_chunks": 10,
          "fetch_min_bytes": 1,
          "fetch_wait_max_ms": 100,
          "refresh_leader_backoff_ms": 200,
          "consumer_timeout_ms": -1,
          "consumer_restart_sleep_ms": 0
        }
      ],

      // Specify any number of outputs
      "output": [

        // Elasticsearch is the primary output; see the `elasticsearch-ruby`
        // homepage for more: https://github.com/elastic/elasticsearch-ruby
        {
          "kind": "elasticsearch",
          "queue_size": 2000,
          "bulk_size": 200,
          "hosts": [ "localhost:9200" ],
          "randomize_hosts": true,
          "reload_connections": true,
          "reload_on_failure": true,
          "sniffer_timeout": 5
        }
      ],

      // Just like Logstash, Anschel has a notion of filters
      "filter": {

        // "_before" is a special set of filters executed on every event before
        // the main type-specific filters are applied
        "_before": [
          {
            "gsub": {
              "field": "type",
              "match": "-.*",
              "replace": ""
            }
          }
        ],

        // Each type is allowed any number of filters; see `lib/anschel/filter`
        // for configuration options and available filters
        "some-type": [
          {
            "scan": {
              "field": "message",
              "pattern": "[A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}",
              "target": "guids"
            }
          }
        ],

        // Any "_after" filters will be applied to all events, like "_before"
        // (but, you know, like after the main filters)
        "_after": [
          {
            "index": {}
          }
        ]
      }
    }


### Operation

You might deploy Anschel with Upstart. Here's a minimal config:

    #!upstart
    description "anschel"

    console log

    start on startup
    stop on shutdown

    exec java -jar anschel-x.y.z.jar \
      --config /etc/anschel.json --log /var/log/anschel.log


## Changelog

### v1.0

_In development_

- Stability and usage improvements (v0.7)
- Allow multiple input and output configurations (v0.6)
- Support for RabbitMQ input (v0.5)
- Support for file (device) output (v0.4)
- Intial implementation of the Kafka-to-Elasticsearch pipeline