:plugin: file
:type: input
:default_codec: plain

///////////////////////////////////////////
START - GENERATED VARIABLES, DO NOT EDIT!
///////////////////////////////////////////
:version: %VERSION%
:release_date: %RELEASE_DATE%
:changelog_url: %CHANGELOG_URL%
:include_path: ../../../../logstash/docs/include
///////////////////////////////////////////
END - GENERATED VARIABLES, DO NOT EDIT!
///////////////////////////////////////////

[id="plugins-{type}s-{plugin}"]

=== File input plugin

include::{include_path}/plugin_header.asciidoc[]

==== Description

Stream events from files, normally by tailing them in a manner
similar to `tail -0F` but optionally reading them from the
beginning.

Normally, logging will add a newline to the end of each line written.
By default, each event is assumed to be one line
and a line is taken to be the text before a newline character.
If you would like to join multiple log lines into one event,
you'll want to use the multiline codec.
The plugin loops between discovering new files and processing
each discovered file. Discovered files have a lifecycle, they start off
in the "watched" or "ignored" state. Other states in the lifecycle are:
"active", "closed" and "unwatched"

By default, a window of 4095 files is used to limit the number of file handles in use.
The processing phase has a number of stages:

* Checks whether "closed" or "ignored" files have changed in size since last time and
if so puts them in the "watched" state.
* Selects enough "watched" files to fill the available space in the window, these files
are made "active".
* The active files are opened and read, each file is read from the last known position
to the end of current content (EOF) by default.

In some cases it is useful to be able to control which files are read first, sorting,
and whether files are read completely or banded/striped.
Complete reading is *all of* file A then file B then file C and so on.
Banded or striped reading is *some of* file A then file B then file C and so on looping around
to file A again until all files are read. Banded reading is specified by changing
<<plugins-{type}s-{plugin}-file_chunk_count>> and perhaps <<plugins-{type}s-{plugin}-file_chunk_size>>.
Banding and sorting may be useful if you want some events from all files to appear
in Kibana as early as possible.

The plugin has two modes of operation, Tail mode and Read mode.

===== Tail mode

In this mode the plugin aims to track changing files and emit new content as it's
appended to each file. In this mode, files are seen as a never ending stream of
content and EOF has no special significance. The plugin always assumes that
there will be more content. When files are rotated, the smaller or zero size is
detected, the current position is reset to zero and streaming continues.
A delimiter must be seen before the accumulated characters can be emitted as a line.

===== Read mode

In this mode the plugin treats each file as if it is content complete, that is,
a finite stream of lines and now EOF is significant. A last delimiter is not
needed because EOF means that the accumulated characters can be emitted as a line.
Further, EOF here means that the file can be closed and put in the "unwatched"
state - this automatically frees up space in the active window. This mode also
makes it possible to process compressed files as they are content complete.
Read mode also allows for an action to take place after processing the file completely.

In the past attempts to simulate a Read mode while still assuming infinite streams
was not ideal and a dedicated Read mode is an improvement.

==== Reading from remote network volumes

The file input is not tested on remote filesystems such as NFS, Samba, s3fs-fuse, etc. These
remote filesystems typically have behaviors that are very different from local filesystems and
are therefore unlikely to work correctly when used with the file input.

==== Tracking of current position in watched files

The plugin keeps track of the current position in each file by
recording it in a separate file named sincedb. This makes it
possible to stop and restart Logstash and have it pick up where it
left off without missing the lines that were added to the file while
Logstash was stopped.

By default, the sincedb file is placed in the data directory of Logstash
with a filename based on the filename patterns being watched (i.e. the `path` option).
Thus, changing the filename patterns will result in a new sincedb file being used and
any existing current position state will be lost. If you change your patterns
with any frequency it might make sense to explicitly choose a sincedb path
with the `sincedb_path` option.

A different `sincedb_path` must be used for each input. Using the same
path will cause issues. The read checkpoints for each input must be
stored in a different path so the information does not override.

Sincedb records can now be expired meaning that read positions of older files
will not be remembered after a certain time period. File systems may need to reuse
inodes for new content. Ideally, we would not use the read position of old content,
but we have no reliable way to detect that inode reuse has occurred. This is more
relevant to Read mode where a great many files are tracked in the sincedb.
Bear in mind though, if a record has expired, a previously seen file will be read again.

Sincedb files are text files with four (< v5.0.0), five or six columns:

. The inode number (or equivalent).
. The major device number of the file system (or equivalent).
. The minor device number of the file system (or equivalent).
. The current byte offset within the file.
. The last active timestamp (a floating point number)
. The last known path that this record was matched to (for
old sincedb records converted to the new format, this is blank.

On non-Windows systems you can obtain the inode number of a file
with e.g. `ls -li`.

==== File rotation in Tail mode

File rotation is detected and handled by this input, regardless of
whether the file is rotated via a rename or a copy operation. To
support programs that write to the rotated file for some time after
the rotation has taken place, include both the original filename and
the rotated filename (e.g. /var/log/syslog and /var/log/syslog.1) in
the filename patterns to watch (the `path` option). Note that the
rotated filename will be treated as a new file so if
`start_position` is set to 'beginning' the rotated file will be
reprocessed.

With the default value of `start_position` ('end') any messages
written to the end of the file between the last read operation prior
to the rotation and its reopening under the new name (an interval
determined by the `stat_interval` and `discover_interval` options)
will not get picked up.

[id="plugins-{type}s-{plugin}-options"]
==== File Input Configuration Options

This plugin supports the following configuration options plus the <<plugins-{type}s-{plugin}-common-options>> described later.

[NOTE]
Duration settings can be specified in text form e.g. "250 ms", this string will be converted into
decimal seconds. There are quite a few supported natural and abbreviated durations,
see <<string_duration,string duration>> for the details.

[cols="<,<,<",options="header",]
|=======================================================================
|Setting |Input type|Required
| <<plugins-{type}s-{plugin}-close_older>> |<<number,number>> or <<string_duration,string duration>>|No
| <<plugins-{type}s-{plugin}-delimiter>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-discover_interval>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-exclude>> |<<array,array>>|No
| <<plugins-{type}s-{plugin}-file_chunk_count>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-file_chunk_size>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-file_completed_action>> |<<string,string>>, one of `["delete", "log", "log_and_delete"]`|No
| <<plugins-{type}s-{plugin}-file_completed_log_path>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-file_sort_by>> |<<string,string>>, one of `["last_modified", "path"]`|No
| <<plugins-{type}s-{plugin}-file_sort_direction>> |<<string,string>>, one of `["asc", "desc"]`|No
| <<plugins-{type}s-{plugin}-ignore_older>> |<<number,number>> or <<string_duration,string duration>>|No
| <<plugins-{type}s-{plugin}-max_open_files>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-mode>> |<<string,string>>, one of `["tail", "read"]`|No
| <<plugins-{type}s-{plugin}-path>> |<<array,array>>|Yes
| <<plugins-{type}s-{plugin}-sincedb_clean_after>> |<<number,number>> or <<string_duration,string duration>>|No
| <<plugins-{type}s-{plugin}-sincedb_path>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-sincedb_write_interval>> |<<number,number>> or <<string_duration,string duration>>|No
| <<plugins-{type}s-{plugin}-start_position>> |<<string,string>>, one of `["beginning", "end"]`|No
| <<plugins-{type}s-{plugin}-stat_interval>> |<<number,number>> or <<string_duration,string duration>>|No
|=======================================================================

Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
input plugins.

&nbsp;

[id="plugins-{type}s-{plugin}-close_older"]
===== `close_older`

  * Value type is <<number,number>> or <<string_duration,string duration>>
  * Default value is `"1 hour"`

The file input closes any files that were last read the specified
duration (seconds if a number is specified) ago.
This has different implications depending on if a file is being tailed or
read. If tailing, and there is a large time gap in incoming data the file
can be closed (allowing other files to be opened) but will be queued for
reopening when new data is detected. If reading, the file will be closed
after closed_older seconds from when the last bytes were read.
This setting is retained for backward compatibility if you upgrade the
plugin to 4.1.0+, are reading not tailing and do not switch to using Read mode.

[id="plugins-{type}s-{plugin}-delimiter"]
===== `delimiter`

  * Value type is <<string,string>>
  * Default value is `"\n"`

set the new line delimiter, defaults to "\n". Note that when reading compressed files
this setting is not used, instead the standard Windows or Unix line endings are used.

[id="plugins-{type}s-{plugin}-discover_interval"]
===== `discover_interval`

  * Value type is <<number,number>>
  * Default value is `15`

How often we expand the filename patterns in the `path` option to discover new files to watch.
This value is a multiple to `stat_interval`, e.g. if `stat_interval` is "500 ms" then new files
files could be discovered every 15 X 500 milliseconds - 7.5 seconds.
In practice, this will be the best case because the time taken to read new content needs to be factored in.

[id="plugins-{type}s-{plugin}-exclude"]
===== `exclude`

  * Value type is <<array,array>>
  * There is no default value for this setting.

Exclusions (matched against the filename, not full path). Filename
patterns are valid here, too. For example, if you have
[source,ruby]
    path => "/var/log/*"

In Tail mode, you might want to exclude gzipped files:
[source,ruby]
    exclude => "*.gz"

[id="plugins-{type}s-{plugin}-file_chunk_count"]
===== `file_chunk_count`

  * Value type is <<number,number>>
  * Default value is `4611686018427387903`

When combined with the `file_chunk_size`, this option sets how many chunks (bands or stripes)
are read from each file before moving to the next active file.
For example, a `file_chunk_count` of 32 and a `file_chunk_size` 32KB will process the next 1MB from each active file.
As the default is very large, the file is effectively read to EOF before moving to the next active file.

[id="plugins-{type}s-{plugin}-file_chunk_size"]
===== `file_chunk_size`

  * Value type is <<number,number>>
  * Default value is `32768` (32KB)

File content is read off disk in blocks or chunks and lines are extracted from the chunk.
See <<plugins-{type}s-{plugin}-file_chunk_count>> to see why and when to change this setting
from the default.

[id="plugins-{type}s-{plugin}-file_completed_action"]
===== `file_completed_action`

  * Value can be any of: `delete`, `log`, `log_and_delete`
  * The default is `delete`.

When in `read` mode, what action should be carried out when a file is done with.
If 'delete' is specified then the file will be deleted. If 'log' is specified
then the full path of the file is logged to the file specified in the
`file_completed_log_path` setting. If `log_and_delete` is specified then
both above actions take place.

[id="plugins-{type}s-{plugin}-file_completed_log_path"]
===== `file_completed_log_path`

  * Value type is <<string,string>>
  * There is no default value for this setting.

Which file should the completely read file paths be appended to. Only specify
this path to a file when `file_completed_action` is 'log' or 'log_and_delete'.
IMPORTANT: this file is appended to only - it could become very large. You are
responsible for file rotation.

[id="plugins-{type}s-{plugin}-file_sort_by"]
===== `file_sort_by`

  * Value can be any of: `last_modified`, `path`
  * The default is `last_modified`.

Which attribute of a "watched" file should be used to sort them by.
Files can be sorted by modified date or full path alphabetic.
Previously the processing order of the discovered and therefore
"watched" files was OS dependent.

[id="plugins-{type}s-{plugin}-file_sort_direction"]
===== `file_sort_direction`

  * Value can be any of: `asc`, `desc`
  * The default is `asc`.

Select between ascending and descending order when sorting "watched" files.
If oldest data first is important then the defaults of `last_modified` + `asc` are good.
If newest data first is more important then opt for `last_modified` + `desc`.
If you use special naming conventions for the file full paths then perhaps
`path` + `asc` will help to control the order of file processing.

[id="plugins-{type}s-{plugin}-ignore_older"]
===== `ignore_older`

  * Value type is <<number,number>> or <<string_duration,string duration>>
  * There is no default value for this setting.

When the file input discovers a file that was last modified
before the specified duration (seconds if a number is specified), the file is ignored.
After it's discovery, if an ignored file is modified it is no
longer ignored and any new data is read. By default, this option is
disabled. Note this unit is in seconds.

[id="plugins-{type}s-{plugin}-max_open_files"]
===== `max_open_files`

  * Value type is <<number,number>>
  * There is no default value for this setting.

What is the maximum number of file_handles that this input consumes
at any one time. Use close_older to close some files if you need to
process more files than this number. This should not be set to the
maximum the OS can do because file handles are needed for other
LS plugins and OS processes.
A default of 4095 is set in internally.

[id="plugins-{type}s-{plugin}-mode"]
===== `mode`

  * Value can be either `tail` or `read`.
  * The default value is `tail`.

What mode do you want the file input to operate in. Tail a few files or
read many content-complete files. Read mode now supports gzip file processing.
If "read" is specified then the following other settings are ignored:

. `start_position` (files are always read from the beginning)
. `close_older` (files are automatically 'closed' when EOF is reached)

If "read" is specified then the following settings are heeded:

. `ignore_older` (older files are not processed)
. `file_completed_action` (what action should be taken when the file is processed)
. `file_completed_log_path` (which file should the completed file path be logged to)

[id="plugins-{type}s-{plugin}-path"]
===== `path`

  * This is a required setting.
  * Value type is <<array,array>>
  * There is no default value for this setting.

The path(s) to the file(s) to use as an input.
You can use filename patterns here, such as `/var/log/*.log`.
If you use a pattern like `/var/log/**/*.log`, a recursive search
of `/var/log` will be done for all `*.log` files.
Paths must be absolute and cannot be relative.

You may also configure multiple paths. See an example
on the {logstash-ref}/configuration-file-structure.html#array[Logstash configuration page].

[id="plugins-{type}s-{plugin}-sincedb_clean_after"]
===== `sincedb_clean_after`

  * Value type is <<number,number>> or <<string_duration,string duration>>
  * The default value for this setting is "2 weeks".
  * If a number is specified then it is interpreted as *days* and can be decimal e.g. 0.5 is 12 hours.

The sincedb record now has a last active timestamp associated with it.
If no changes are detected in a tracked file in the last N days its sincedb
tracking record expires and will not be persisted.
This option helps protect against the inode recycling problem.
Filebeat has a {filebeat-ref}/faq.html#inode-reuse-issue[FAQ about inode recycling].

[id="plugins-{type}s-{plugin}-sincedb_path"]
===== `sincedb_path`

  * Value type is <<string,string>>
  * There is no default value for this setting.

Path of the sincedb database file (keeps track of the current
position of monitored log files) that will be written to disk.
The default will write sincedb files to `<path.data>/plugins/inputs/file`
NOTE: it must be a file path and not a directory path

[id="plugins-{type}s-{plugin}-sincedb_write_interval"]
===== `sincedb_write_interval`

  * Value type is <<number,number>> or <<string_duration,string duration>>
  * Default value is `"15 seconds"`

How often (in seconds) to write a since database with the current position of
monitored log files.

[id="plugins-{type}s-{plugin}-start_position"]
===== `start_position`

  * Value can be any of: `beginning`, `end`
  * Default value is `"end"`

Choose where Logstash starts initially reading files: at the beginning or
at the end. The default behavior treats files like live streams and thus
starts at the end. If you have old data you want to import, set this
to 'beginning'.

This option only modifies "first contact" situations where a file
is new and not seen before, i.e. files that don't have a current
position recorded in a sincedb file read by Logstash. If a file
has already been seen before, this option has no effect and the
position recorded in the sincedb file will be used.

[id="plugins-{type}s-{plugin}-stat_interval"]
===== `stat_interval`

  * Value type is <<number,number>> or <<string_duration,string duration>>
  * Default value is `"1 second"`

How often (in seconds) we stat files to see if they have been modified.
Increasing this interval will decrease the number of system calls we make,
but increase the time to detect new log lines.
[NOTE]
Discovering new files and checking whether they have grown/or shrunk occurs in a loop.
This loop will sleep for `stat_interval` seconds before looping again. However, if files
have grown, the new content is read and lines are enqueued.
Reading and enqueuing across all grown files can take time, especially if
the pipeline is congested. So the overall loop time is a combination of the
`stat_interval` and the file read time.

[id="plugins-{type}s-{plugin}-common-options"]
include::{include_path}/{type}.asciidoc[]

:default_codec!:

[id="string_duration"]
// Move this to the includes when we make string durations available generally.
==== String Durations

Format is `number` `string` and the space between these is optional.
So "45s" and "45 s" are both valid.
[TIP]
Use the most suitable duration, for example, "3 days" rather than "72 hours".

===== Weeks
Supported values: `w` `week` `weeks`, e.g. "2 w", "1 week", "4 weeks".

===== Days
Supported values: `d` `day` `days`, e.g. "2 d", "1 day", "2.5 days".

===== Hours
Supported values: `h` `hour` `hours`, e.g. "4 h", "1 hour", "0.5 hours".

===== Minutes
Supported values: `m` `min` `minute` `minutes`, e.g. "45 m", "35 min", "1 minute", "6 minutes".

===== Seconds
Supported values: `s` `sec` `second` `seconds`, e.g. "45 s", "15 sec", "1 second", "2.5 seconds".

===== Milliseconds
Supported values: `ms` `msec` `msecs`, e.g. "500 ms", "750 msec", "50 msecs
[NOTE]
`milli` `millis` and `milliseconds` are not supported

===== Microseconds
Supported values: `us` `usec` `usecs`, e.g. "600 us", "800 usec", "900 usecs"
[NOTE]
`micro` `micros` and `microseconds` are not supported