:plugin: elasticsearch
:type: input
:default_codec: json

///////////////////////////////////////////
START - GENERATED VARIABLES, DO NOT EDIT!
///////////////////////////////////////////
:version: %VERSION%
:release_date: %RELEASE_DATE%
:changelog_url: %CHANGELOG_URL%
:include_path: ../../../../logstash/docs/include
///////////////////////////////////////////
END - GENERATED VARIABLES, DO NOT EDIT!
///////////////////////////////////////////

[id="plugins-{type}s-{plugin}"]

=== Elasticsearch input plugin

include::{include_path}/plugin_header.asciidoc[]

==== Description

Read from an Elasticsearch cluster, based on search query results.
This is useful for replaying test logs, reindexing, etc.
You can periodically schedule ingestion using a cron syntax 
(see `schedule` setting) or run the query one time to load
data into Logstash.

Example:
[source,ruby]
    input {
      # Read all documents from Elasticsearch matching the given query
      elasticsearch {
        hosts => "localhost"
        query => '{ "query": { "match": { "statuscode": 200 } }, "sort": [ "_doc" ] }'
      }
    }

This would create an Elasticsearch query with the following format:
[source,json]
    curl 'http://localhost:9200/logstash-*/_search?&scroll=1m&size=1000' -d '{
      "query": {
        "match": {
          "statuscode": 200
        }
      },
      "sort": [ "_doc" ]
    }'


==== Scheduling

Input from this plugin can be scheduled to run periodically according to a specific
schedule. This scheduling syntax is powered by https://github.com/jmettraux/rufus-scheduler[rufus-scheduler].
The syntax is cron-like with some extensions specific to Rufus (e.g. timezone support ).

Examples:

|==========================================================
| `* 5 * 1-3 *`               | will execute every minute of 5am every day of January through March.
| `0 * * * *`                 | will execute on the 0th minute of every hour every day.
| `0 6 * * * America/Chicago` | will execute at 6:00am (UTC/GMT -5) every day.
|==========================================================


Further documentation describing this syntax can be found
https://github.com/jmettraux/rufus-scheduler#parsing-cronlines-and-time-strings[here].


[id="plugins-{type}s-{plugin}-auth"]
==== Authentication

Authentication to a secure Elasticsearch cluster is possible using _one_ of the following options:

* <<plugins-{type}s-{plugin}-user>> AND <<plugins-{type}s-{plugin}-password>>
* <<plugins-{type}s-{plugin}-cloud_auth>>
* <<plugins-{type}s-{plugin}-api_key>>


[id="plugins-{type}s-{plugin}-options"]
==== Elasticsearch Input Configuration Options

This plugin supports the following configuration options plus the <<plugins-{type}s-{plugin}-common-options>> described later.

[cols="<,<,<",options="header",]
|=======================================================================
|Setting |Input type|Required
| <<plugins-{type}s-{plugin}-api_key>> |<<password,password>>|No
| <<plugins-{type}s-{plugin}-ca_file>> |a valid filesystem path|No
| <<plugins-{type}s-{plugin}-cloud_auth>> |<<password,password>>|No
| <<plugins-{type}s-{plugin}-cloud_id>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-connect_timeout_seconds>> | <<number,number>>|No
| <<plugins-{type}s-{plugin}-docinfo>> |<<boolean,boolean>>|No
| <<plugins-{type}s-{plugin}-docinfo_fields>> |<<array,array>>|No
| <<plugins-{type}s-{plugin}-docinfo_target>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-hosts>> |<<array,array>>|No
| <<plugins-{type}s-{plugin}-index>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-password>> |<<password,password>>|No
| <<plugins-{type}s-{plugin}-proxy>> |<<uri,uri>>|No
| <<plugins-{type}s-{plugin}-query>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-request_timeout_seconds>> | <<number,number>>|No
| <<plugins-{type}s-{plugin}-schedule>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-scroll>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-size>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-slices>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-ssl>> |<<boolean,boolean>>|No
| <<plugins-{type}s-{plugin}-socket_timeout_seconds>> | <<number,number>>|No
| <<plugins-{type}s-{plugin}-user>> |<<string,string>>|No
|=======================================================================

Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
input plugins.

&nbsp;

[id="plugins-{type}s-{plugin}-api_key"]
===== `api_key`

  * Value type is <<password,password>>
  * There is no default value for this setting.

Authenticate using Elasticsearch API key. Note that this option also requires enabling the `ssl` option.

Format is `id:api_key` where `id` and `api_key` are as returned by the Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html[Create API key API].

[id="plugins-{type}s-{plugin}-ca_file"]
===== `ca_file` 

  * Value type is <<path,path>>
  * There is no default value for this setting.

SSL Certificate Authority file in PEM encoded format, must also include any chain certificates as necessary.

[id="plugins-{type}s-{plugin}-cloud_auth"]
===== `cloud_auth`

  * Value type is <<password,password>>
  * There is no default value for this setting.

Cloud authentication string ("<username>:<password>" format) is an alternative for the `user`/`password` pair.

For more info, check out the https://www.elastic.co/guide/en/logstash/current/connecting-to-cloud.html#_cloud_auth[Logstash-to-Cloud documentation]

[id="plugins-{type}s-{plugin}-cloud_id"]
===== `cloud_id`

  * Value type is <<string,string>>
  * There is no default value for this setting.

Cloud ID, from the Elastic Cloud web console. If set `hosts` should not be used.

For more info, check out the https://www.elastic.co/guide/en/logstash/current/connecting-to-cloud.html#_cloud_id[Logstash-to-Cloud documentation]

[id="plugins-{type}s-{plugin}-connect_timeout_seconds"]
===== `connect_timeout_seconds`

  * Value type is <<number,number>>
  * Default value is `10`

The maximum amount of time, in seconds, to wait while establishing a connection to Elasticsearch.
Connect timeouts tend to occur when Elasticsearch or an intermediate proxy is overloaded with requests and has exhausted its connection pool.

[id="plugins-{type}s-{plugin}-docinfo"]
===== `docinfo` 

  * Value type is <<boolean,boolean>>
  * Default value is `false`

If set, include Elasticsearch document information such as index, type, and
the id in the event.

It might be important to note, with regards to metadata, that if you're
ingesting documents with the intent to re-index them (or just update them)
that the `action` option in the elasticsearch output wants to know how to
handle those things. It can be dynamically assigned with a field
added to the metadata.

Example
[source, ruby]
    input {
      elasticsearch {
        hosts => "es.production.mysite.org"
        index => "mydata-2018.09.*"
        query => '{ "query": { "query_string": { "query": "*" } } }'
        size => 500
        scroll => "5m"
        docinfo => true
      }
    }
    output {
      elasticsearch {
        index => "copy-of-production.%{[@metadata][_index]}"
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"
      }
    }

If set, you can use metadata information in the <<plugins-{type}s-{plugin}-add_field>> common option.

Example
[source, ruby]
    input {
      elasticsearch {
        docinfo => true
        add_field => {
          identifier => %{[@metadata][_index]}:%{[@metadata][_type]}:%{[@metadata][_id]}"
        }
      }
    }


[id="plugins-{type}s-{plugin}-docinfo_fields"]
===== `docinfo_fields` 

  * Value type is <<array,array>>
  * Default value is `["_index", "_type", "_id"]`

If document metadata storage is requested by enabling the `docinfo` option, this
option lists the metadata fields to save in the current event. See
{ref}/mapping-fields.html[Meta-Fields] in the Elasticsearch documentation for
more information.

[id="plugins-{type}s-{plugin}-docinfo_target"]
===== `docinfo_target` 

  * Value type is <<string,string>>
  * Default value is `"@metadata"`

If document metadata storage is requested by enabling the `docinfo`
option, this option names the field under which to store the metadata
fields as subfields.

[id="plugins-{type}s-{plugin}-hosts"]
===== `hosts` 

  * Value type is <<array,array>>
  * There is no default value for this setting.

List of one or more Elasticsearch hosts to use for querying. Each host
can be either IP, HOST, IP:port, or HOST:port. The port defaults to
9200.

[id="plugins-{type}s-{plugin}-index"]
===== `index` 

  * Value type is <<string,string>>
  * Default value is `"logstash-*"`

The index or alias to search. See
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html[Multi Indices documentation]
in the Elasticsearch documentation for more information on how to reference
multiple indices.


[id="plugins-{type}s-{plugin}-password"]
===== `password` 

  * Value type is <<password,password>>
  * There is no default value for this setting.

The password to use together with the username in the `user` option
when authenticating to the Elasticsearch server. If set to an empty
string authentication will be disabled.

[id="plugins-{type}s-{plugin}-proxy"]
===== `proxy`

* Value type is <<uri,uri>>
* There is no default value for this setting.

Set the address of a forward HTTP proxy.
An empty string is treated as if proxy was not set, this is useful when using
environment variables e.g. `proxy => '${LS_PROXY:}'`.

[id="plugins-{type}s-{plugin}-query"]
===== `query` 

  * Value type is <<string,string>>
  * Default value is `'{ "sort": [ "_doc" ] }'`

The query to be executed. Read the
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html[Elasticsearch query DSL documentation]
for more information.

[id="plugins-{type}s-{plugin}-request_timeout_seconds"]
===== `request_timeout_seconds`

  * Value type is <<number,number>>
  * Default value is `60`

The maximum amount of time, in seconds, for a single request to Elasticsearch.
Request timeouts tend to occur when an individual page of data is very large, such as when it contains large-payload
documents and/or the <<plugins-{type}s-{plugin}-size>> has been specified as a large value.

[id="plugins-{type}s-{plugin}-schedule"]
===== `schedule` 

  * Value type is <<string,string>>
  * There is no default value for this setting.

Schedule of when to periodically run statement, in Cron format
for example: "* * * * *" (execute query every minute, on the minute)

There is no schedule by default. If no schedule is given, then the statement is run
exactly once.

[id="plugins-{type}s-{plugin}-scroll"]
===== `scroll` 

  * Value type is <<string,string>>
  * Default value is `"1m"`

This parameter controls the keepalive time in seconds of the scrolling
request and initiates the scrolling process. The timeout applies per
round trip (i.e. between the previous scroll request, to the next).

[id="plugins-{type}s-{plugin}-size"]
===== `size` 

  * Value type is <<number,number>>
  * Default value is `1000`

This allows you to set the maximum number of hits returned per scroll.

[id="plugins-{type}s-{plugin}-slices"]
===== `slices`

  * Value type is <<number,number>>
  * There is no default value.
  * Sensible values range from 2 to about 8.

In some cases, it is possible to improve overall throughput by consuming multiple
distinct slices of a query simultaneously using
https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#slice-scroll[sliced scrolls],
especially if the pipeline is spending significant time waiting on Elasticsearch
to provide results.

If set, the `slices` parameter tells the plugin how many slices to divide the work
into, and will produce events from the slices in parallel until all of them are done
scrolling.

NOTE: The Elasticsearch manual indicates that there can be _negative_ performance
      implications to both the query and the Elasticsearch cluster when a scrolling
      query uses more slices than shards in the index.

If the `slices` parameter is left unset, the plugin will _not_ inject slice
instructions into the query.

[id="plugins-{type}s-{plugin}-ssl"]
===== `ssl` 

  * Value type is <<boolean,boolean>>
  * Default value is `false`

If enabled, SSL will be used when communicating with the Elasticsearch
server (i.e. HTTPS will be used instead of plain HTTP).

[id="plugins-{type}s-{plugin}-socket_timeout_seconds"]
===== `socket_timeout_seconds`

  * Value type is <<number,number>>
  * Default value is `60`

The maximum amount of time, in seconds, to wait on an incomplete response from Elasticsearch while no additional data has been appended.
Socket timeouts usually occur while waiting for the first byte of a response, such as when executing a particularly complex query.

[id="plugins-{type}s-{plugin}-user"]
===== `user` 

  * Value type is <<string,string>>
  * There is no default value for this setting.

The username to use together with the password in the `password`
option when authenticating to the Elasticsearch server. If set to an
empty string authentication will be disabled.


[id="plugins-{type}s-{plugin}-common-options"]
include::{include_path}/{type}.asciidoc[]

:default_codec!: