# Splunk input plugin for Embulk

A simple plugin that runs a one-off Splunk query and emits the results.

This plugin uses Splunk's `table` command to return results efficiently and flexibly.

If you want more flexibility, you can add `_raw` as a table field and then use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the JSON column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.

# _time and this plugin

This plugin expects and requires `_time`. If you do not include `_time` in your list of columns, the plugin adds it automatically.

It is possible to rename or reformat `_time` in the query in such a way that this plugin fails or produces unexpected results. We recommend that you do not alter `_time` in the query unless you know what you're doing. If you need to do something esoteric with `_time`, create another field to work with in your Splunk query.

In addition, the `_time` column is treated as a string, but only because we couldn't get the plugin to work with timestamps. We'd welcome a pull request to fix this issue.

## Overview

- **Plugin type**: input
- **Resume supported**: yes
- **Cleanup supported**: no
- **Guess supported**: no

## Configuration

- **type**: splunk
- **scheme**: HTTP scheme for using the Splunk API (string, default: https)
- **host**: host of your Splunk server (string, required)
- **username**: Splunk username (string, required)
- **password**: Splunk password (string, required)
- **port**: Splunk API port (integer, default: 8089)
- **max_results**: API flag to limit results returned. Set to zero for theoretically no limit. However, Splunk server config will generally limit this to 50,000.
  Setting this to a non-zero value causes the plugin to keep fetching results in batches of `max_results` (pagination) (integer, default: 50000)
- **query**: the query you wish to run. It should be prefixed with "search" (string, required)
- **earliest_time**: the earliest time for the Splunk search (string, default: nil, which is unbounded)
- **latest_time**: the latest time for the Splunk search (string, default: nil, which is unbounded)
- **incremental**: whether to resume the next search from the last result time (boolean, default: false)
- **table**: array of columns to include in the results (array, default: [])

### Earliest and latest times

Splunk's required date format is `%Y-%m-%dT%H:%M:%S.%L%:z`, and this is the format required for `earliest_time` and `latest_time`.

In addition, Splunk relative time modifiers are also accepted, such as `-1d@d`.

For more information, see the [Splunk documentation](https://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/SearchTimeModifiers).

### Incremental loads

Incremental support is basic. The logic is:

- always rely on the `_time` field in Splunk
- determine the latest `_time` in the search
- use the latest `_time` as `earliest_time` in the next run

### Number of returned results

The default Splunk API limits results to 100. In this plugin, the limit is not set, so it is possible to generate very large result sets. To limit the number of results, use the `head` or `tail` command in your query.

## Examples

Remember that queries must be prefixed with the `search` command or they are unlikely to work. See examples below.

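### Incremental load

The incremental logic described above could be configured roughly as follows. This is a minimal sketch, not a tested configuration: the host, credentials and column names are placeholders reused from the other examples, and the initial `earliest_time` of `-1d@d` is only an illustrative starting point. How the plugin reconciles a configured `earliest_time` with the stored latest `_time` is not described in this README, so check the behavior on a small query first.

```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  # Assumed starting point for the first run only (illustrative value)
  earliest_time: -1d@d
  # Resume the next search from the latest _time seen in this run
  incremental: true
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"}
```

Embulk normally carries state between runs through a config diff file, for example `embulk run config.yml -c diff.yml`. The sketch assumes the plugin records the latest `_time` there.
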
### Unbounded time range

```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  table:
    # We treat time as a string, only because we can't get timestamp + format to work
    - {name: "_time", type: "string"}
```

### Relative time range

```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  earliest_time: -1m@m
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"}
    - {name: "bar", type: "long"}
```

### Absolute time range

```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: search index="main"
  earliest_time: 2017-01-18T19:23:08.237+11:00
  latest_time: 2018-01-18T19:23:08.237+11:00
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"}
    - {name: "bar", type: "long"}
```

### Max results

The query below is expected to return 100 rows, but `max_results` is set to 10. This causes the plugin to loop 10 times, returning 10 results each time. In the end, you will receive the full 100 events.

```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  max_results: 10
  query: search index="main" | head 100
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"}
    - {name: "bar", type: "long"}
```

### Complex Searches

For those unfamiliar with YAML, `>` or `|` introduces a multiline string. In Splunk, the pipe operator is also used to chain multi-step processing. For non-trivial Splunk queries, use the YAML `|` or `>` alongside Splunk pipes to make queries easier to read.

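As a rough illustration of how the two YAML styles differ, the sketch below assumes standard YAML block scalar semantics. The keys `literal_style` and `folded_style` are hypothetical names used only for this comparison; they are not plugin options.

```yaml
# Hypothetical keys, for illustration only; they are not plugin options.
literal_style: |
  search index="main"
  | head 10
# Parsed value keeps the line break:
#   "search index=\"main\"\n| head 10\n"

folded_style: >
  search index="main"
  | head 10
# Parsed value folds the line break into a space:
#   "search index=\"main\" | head 10\n"
```

With `|` the query reaches the plugin with its line breaks intact, while `>` joins the lines into a single line. The complete plugin configurations below use each style in turn.
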
```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: |
    search index="main"
    | eval foo=bar
    | where like(bar, "%baz%")
    | head 100
  earliest_time: 2017-01-18T19:23:08.237+11:00
  latest_time: 2018-01-18T19:23:08.237+11:00
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"} # Uses foo from the above query
```

Or with the greater than symbol:

```yaml
in:
  type: splunk
  host: splunk.example.com
  username: splunk_user
  password: abc123
  port: 8089
  query: >
    search index="main"
    | eval foo=bar
    | where like(bar, "%baz%")
    | head 100
  earliest_time: 2017-01-18T19:23:08.237+11:00
  latest_time: 2018-01-18T19:23:08.237+11:00
  table:
    - {name: "_time", type: "string"}
    - {name: "foo", type: "string"} # Uses foo from the above query
```

## Build

```
$ rake
```