# Anschel ![Version](https://img.shields.io/gem/v/anschel.svg?style=flat-square)

A Logstash-like agent for moving events from Kafka into Elasticsearch.


## Usage, Configuration &c.

### Installation

Download the jarfile from the [GitHub releases page](https://github.com/sczizzo/anschel/releases)
and run it like so:

    $ java -jar anschel-x.y.z.jar


### Usage

Just call for help!

    $ java -jar anschel-x.y.z.jar help
    Commands:
      anschel agent           # Run application
      anschel art             # Show application art
      anschel help [COMMAND]  # Describe available commands or one specific command
      anschel version         # Show application version

You're probably most interested in the `agent` command:

    $ java -jar anschel-x.y.z.jar help agent
    Usage:
      anschel agent

    Options:
      -c, [--config=CONFIG]        # Path to primary configuration file
                                   # Default: /etc/anschel.json
      -l, [--log=LOG]              # Log to file instead of STDOUT
      -v, [--debug], [--no-debug]  # Enable DEBUG-level logging
      -z, [--trace], [--no-trace]  # Enable TRACE-level logging (overrides DEBUG)

    Run application


### Configuration

It's kinda like a JSON version of the Logstash config language:

    {
      // If Anschel fails, state will be stored for safekeeping
      "store": "/tmp/anschel.db",

      // How often to report stats (in seconds)
      "stats_interval": 30,

      // Anschel requires JRuby, and some libraries log via Log4j
      "log4j": {
        "path": "/path/to/anschel4j.log",
        "pattern": "[%d] %p %m (%c)%n"
      },

      // Specify any number of inputs
      "input": [

        // Kafka is the primary input; see the `jruby-kafka` homepage for
        // more details: https://github.com/joekiller/jruby-kafka
        {
          "kind": "kafka",
          "queue_size": 2000,
          "zk_connect": "localhost:2181",
          "zk_connect_timeout": 6000,
          "zk_session_timeout": 6000,
          "group_id": "anschel",
          "topic_id": "franz",
          "reset_beginning": null,
          "auto_offset_reset": "smallest",
          "consumer_restart_on_error": true,
          "auto_commit_interval": 1000,
          "rebalance_max_retries": 4,
          "rebalance_backoff_ms": 2000,
          "socket_timeout_ms": 30000,
          "socket_receive_buffer_bytes": 65536,
          "fetch_message_max_bytes": 1048576,
          "auto_commit_enable": true,
          "queued_max_message_chunks": 10,
          "fetch_min_bytes": 1,
          "fetch_wait_max_ms": 100,
          "refresh_leader_backoff_ms": 200,
          "consumer_timeout_ms": -1,
          "consumer_restart_sleep_ms": 0
        }
      ],

      // Specify any number of outputs
      "output": [

        // Elasticsearch is the primary output; see the `elasticsearch-ruby`
        // homepage for more: https://github.com/elastic/elasticsearch-ruby
        {
          "kind": "elasticsearch",
          "queue_size": 2000,
          "bulk_size": 200,
          "hosts": [ "localhost:9200" ],
          "randomize_hosts": true,
          "reload_connections": true,
          "reload_on_failure": true,
          "sniffer_timeout": 5
        }
      ],

      // Just like Logstash, Anschel has a notion of filters
      "filter": {

        // "_before" is a special set of filters executed on every event before
        // the main type-specific filters are applied
        "_before": [
          {
            "gsub": {
              "field": "type",
              "match": "-.*",
              "replace": ""
            }
          }
        ],

        // Each type is allowed any number of filters; see `lib/anschel/filter`
        // for configuration options and available filters
        "some-type": [
          {
            "scan": {
              "field": "message",
              "pattern": "[A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}",
              "target": "guids"
            }
          }
        ],

        // Any "_after" filters will be applied to all events, like "_before"
        // (but, you know, like after the main filters)
        "_after": [
          {
            "index": {}
          }
        ]
      }
    }


### Operation

You might deploy Anschel with Upstart. Here's a minimal config:

    #!upstart
    description "anschel"

    console log

    start on startup
    stop on shutdown

    exec java -jar anschel-x.y.z.jar \
      --config /etc/anschel.json --log /var/log/anschel.log


## Changelog

### v1.0

_In development_

- Allow multiple input and output configurations (v0.6)
- Support for RabbitMQ input (v0.5)
- Support for file (device) output (v0.4)
- Initial implementation of the Kafka-to-Elasticsearch pipeline
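
## Filter Sketch

To make the `gsub` and `scan` entries in the sample configuration a little more concrete, here is a rough Ruby sketch of what such filters might do to a single event. The helper names `gsub_filter` and `scan_filter`, the shape of the event hash, and the sample values are all hypothetical; Anschel's real filters live in `lib/anschel/filter` and may well behave differently (for instance, in how matches are stored in the target field).

```ruby
# Hypothetical stand-ins for the sample filters; NOT Anschel's actual code.

# Same pattern as the "scan" filter in the sample configuration.
GUID_PATTERN = "[A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}"

# Mimics {"gsub": {"field": ..., "match": ..., "replace": ...}}:
# rewrite one field by regex substitution.
def gsub_filter(event, field:, match:, replace:)
  event.merge(field => event[field].gsub(Regexp.new(match), replace))
end

# Mimics {"scan": {"field": ..., "pattern": ..., "target": ...}}:
# collect every match of the pattern into a new field (assumed to be an array).
def scan_filter(event, field:, pattern:, target:)
  event.merge(target => event[field].scan(Regexp.new(pattern)))
end

# An illustrative event; the field values are made up.
event = {
  "type"    => "app-production",
  "message" => "request 123e4567-e89b-12d3-a456-426614174000 finished"
}

event = gsub_filter(event, field: "type", match: "-.*", replace: "")
event = scan_filter(event, field: "message", pattern: GUID_PATTERN, target: "guids")

puts event["type"]           # app
puts event["guids"].inspect  # ["123e4567-e89b-12d3-a456-426614174000"]
```

Note that the `gsub` rule drops everything from the first hyphen onward, so a `type` of `app-production` becomes `app` in this sketch.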