# Grok Parser for Fluentd [![Build Status](https://travis-ci.org/fluent/fluent-plugin-grok-parser.svg?branch=master)](https://travis-ci.org/fluent/fluent-plugin-grok-parser)

This is a Fluentd plugin to enable Logstash's Grok-like parsing logic.

## Requirements

| fluent-plugin-grok-parser | fluentd    | ruby   |
|---------------------------|------------|--------|
| >= 1.0.0                  | >= v0.14.0 | >= 2.1 |
| < 1.0.0                   | >= v0.12.0 | >= 1.9 |

## What's Grok?

Grok is a macro to simplify and reuse regexes, originally developed by [Jordan Sissel](http://github.com/jordansissel).

This is a partial implementation of Grok's grammar that should meet most needs.

## How It Works

You can use it wherever you would use the `format` parameter to parse text. The following example extracts the first IP address that matches in the log.

```aconf
<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_pattern %{IP:ip_address}
  </parse>
</source>
```

You can also use Fluentd v0.12 style:

```aconf
<source>
  @type tail
  path /path/to/log
  tag grokked_log
  format grok
  grok_pattern %{IP:ip_address}
</source>
```

**If you want to try multiple grok patterns and use the first matched one**, you can use the following syntax:

```aconf
<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      pattern %{COMBINEDAPACHELOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      pattern %{IP:ip_address}
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>
```

You can also use Fluentd v0.12 style:

```aconf
<source>
  @type tail
  path /path/to/log
  tag grokked_log
  format grok
  <grok>
    pattern %{COMBINEDAPACHELOG}
    time_format "%d/%b/%Y:%H:%M:%S %z"
  </grok>
  <grok>
    pattern %{IP:ip_address}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>
```

### Multiline support

You can parse multiline text.

```aconf
<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
    multiline_start_regexp /^[^\s]/
  </parse>
</source>
```

You can also use Fluentd v0.12 style:

```aconf
<source>
  @type tail
  path /path/to/log
  format multiline_grok
  grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
  multiline_start_regexp /^[^\s]/
  tag grokked_log
</source>
```

You can use multiple grok patterns to parse your data.

```aconf
<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    <grok>
      pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
    </grok>
  </parse>
</source>
```

You can also use Fluentd v0.12 style:

```aconf
<source>
  @type tail
  path /path/to/log
  format multiline_grok
  <grok>
    pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
  </grok>
  tag grokked_log
</source>
```

When no pattern matches, Fluentd keeps accumulating data in the buffer indefinitely while it waits for a complete record.

You can use this parser without `multiline_start_regexp` when you know your data structure perfectly.

## Configurations

**time_format**

The format of the time field.

**grok_pattern**

The grok pattern. You cannot specify multiple grok patterns with this parameter.

**custom_pattern_path**

Path to the file that includes custom grok patterns.

**grok_failure_key**

The key under which the grok failure reason is stored. Default is `nil`.

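
As an illustrative sketch, a parser filter along the following lines would store the failure reason under a `grokfailure` key. The `<label @dummy>` routing, the `message1` key name, the `reserve_data` and `reserve_time` settings, the `%{PATH:path}` pattern, and the `stdout` output are all assumptions, chosen to line up with the dummy input and sample events shown next.

```aconf
<label @dummy>
  <filter dummy.log>
    @type parser
    # assumption: parse the "message1" field and keep the original fields
    key_name message1
    reserve_data true
    reserve_time true
    <parse>
      @type grok
      # assumption: name of the key that receives the failure reason
      grok_failure_key grokfailure
      <grok>
        pattern %{PATH:path}
      </grok>
    </parse>
  </filter>
  <match dummy.log>
    @type stdout
  </match>
</label>
```

With a setup like this, records that match `%{PATH:path}` gain a `path` field, while records that match no pattern gain the `grokfailure` field instead.
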
```aconf
<source>
  @type dummy
  @label @dummy
  dummy [
    { "message1": "no grok pattern matched!", "prog": "foo" },
    { "message1": "/", "prog": "bar" }
  ]
  tag dummy.log
</source>
```

This generates the following events:

```
2016-11-28 13:07:08.009131727 +0900 dummy.log: {"message1":"no grok pattern matched!","prog":"foo","message":"no grok pattern matched!","grokfailure":"No grok pattern matched"}
2016-11-28 13:07:09.010400923 +0900 dummy.log: {"message1":"/","prog":"bar","path":"/"}
```

**grok/pattern**

Section for grok patterns. You can use multiple grok patterns with multiple `<grok>` sections.

```aconf
<grok>
  pattern %{IP:ipaddress}
</grok>
```

**multiline_start_regexp**

The regexp that matches the beginning of a multiline record. This applies only to `multiline_grok`.

## How to write Grok patterns

Grok patterns look like `%{PATTERN_NAME:name}` where `:name` is optional. If `name` is provided, then it becomes a named capture. So, for example, if you have the grok pattern

```
%{IP} %{HOST:host}
```

it matches

```
127.0.0.1 foo.example
```

but only extracts "foo.example" as `{"host": "foo.example"}`.

Please see `patterns/*` for the patterns that are supported out of the box.

## How to add your own Grok pattern

You can add your own Grok patterns by creating your own Grok file and telling the plugin to read it. This is what the `custom_pattern_path` parameter is for.

```aconf
<source>
  @type tail
  path /path/to/log
  <parse>
    @type grok
    grok_pattern %{MY_SUPER_PATTERN}
    custom_pattern_path /path/to/my_pattern
  </parse>
</source>
```

`custom_pattern_path` can be either a directory or a file. If it's a directory, the plugin reads all the files in it.

## FAQs

### 1. How can I convert types of the matched patterns like Logstash's Grok?

Although every parsed field has type `string` by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

```
grok_pattern %{GROK_PATTERN:NAME:TYPE}...
```

e.g.,

```
grok_pattern %{INT:foo:integer}
```

Fields without a specified type are parsed as the default `string` type.

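
To make the typing rule concrete, here is a small hypothetical parse section that mixes typed and untyped captures. The field names `client`, `status`, and `duration` are made up for illustration and do not come from the plugin.

```aconf
<parse>
  @type grok
  # client has no type and stays a string; status and duration are converted
  grok_pattern %{IP:client} %{INT:status:integer} %{NUMBER:duration:float}
</parse>
```

Under these assumptions, a line such as `192.0.2.10 200 0.35` should yield a record like `{"client":"192.0.2.10","status":200,"duration":0.35}`.
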

The supported types are:

* `string`
* `bool`
* `integer` ("int" would NOT work!)
* `float`
* `time`
* `array`

For the `time` and `array` types, there is an optional 4th field after the type name. For the `time` type, you can specify a time format like you would in `time_format`.

For the `array` type, this extra field specifies the delimiter (the default is ","). For example, if a field called "item\_ids" contains the value "3,4,5", `types item_ids:array` parses it as ["3", "4", "5"]. Alternatively, if the value is "Adam|Alice|Bob", `types item_ids:array:|` parses it as ["Adam", "Alice", "Bob"].

Here is a sample config using the Grok parser with `in_tail` and inline type annotations:

```aconf
<source>
  @type tail
  path /path/to/log
  format grok
  grok_pattern %{INT:user_id:integer} paid %{NUMBER:paid_amount:float}
  tag payment
</source>
```

## Notice

If you want to use this plugin with Fluentd v0.12.x, use a version of this plugin earlier than v1.0.0 (see the Requirements table).

See also: [Plugin Management | Fluentd](http://docs.fluentd.org/articles/plugin-management#ldquomdashgemfilerdquo-option)

## License

Apache 2.0 License