README.md in fluent-plugin-viaq_data_model-0.0.5 vs README.md in fluent-plugin-viaq_data_model-0.0.6

- old
+ new

@@ -27,10 +27,35 @@
You cannot set the `@timestamp` field in a Fluentd `record_transformer` filter.
The plugin allows you to use some other field e.g. `time` and have that
"moved" to a top level field called `@timestamp`.
+* Converts systemd and json-file logs to ViaQ data model format
+
+Doing this conversion in a `record_transformer` with embedded ruby code is very
+resource intensive. The ViaQ plugin can convert common input formats such as
+Kubernetes `json-file`, `/var/log/messages`, and systemd `journald` into their
+corresponding ViaQ `_default_`, `systemd`, `kubernetes`, and
+`pipeline_metadata` namespaced fields. The `pipeline_metadata` will be added
+to all records, regardless of tag. Use the `pipeline_type` parameter to
+specify which part of the pipeline this is, `collector` or `normalizer`.
+The ViaQ data model conversion will only be applied to matching `tag`s
+specified in a `formatter` section.
+
+* Creates Elasticsearch index names or prefixes
+
+You can create either a full Elasticsearch index name for the record (to be
+used with the `fluent-plugin-elasticsearch` `target_index_key` parameter), or
+create an index name prefix (missing the date/timestamp part of the index
+name - to be used with `logstash_prefix_key`). In order to use this, create an
+`elasticsearch_index_name` section, and specify the `tag` to match, and the
+`name_type` type of index name to create. By default, a prefix name will be
+stored in the `viaq_index_prefix` field in the record, and a full name will be
+stored in the `viaq_index_name` field. Configure
+`elasticsearch_index_name_field` or `elasticsearch_index_prefix_field` to use a
+different field name.
+
## Configuration

NOTE: All fields are Optional - no required fields.

See `filter-viaq_data_model.conf` for an example filter configuration.
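As a minimal sketch of how the plugin is wired into a pipeline (the tag pattern here is an assumption for illustration; the README's own `filter-viaq_data_model.conf` is the authoritative example):

    <filter **>
      @type viaq_data_model
      pipeline_type collector
    </filter>

All of the parameters described below go inside such a `filter` block.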
@@ -68,11 +93,54 @@
  * NOTE: This field must be present in the `default_keep_fields` or
    `extra_keep_fields` if `use_undefined true`
* `dest_time_name` - string - default `@timestamp`
  * This is the name of the top level field to hold the time value. The value
    is taken from the value of the `src_time_name` field.
+* `formatter` - a formatter for a well known common data model source
+  * `type` - one of the well known sources
+    * `sys_journal` - a record read from the systemd journal
+    * `k8s_journal` - a Kubernetes container record read from the systemd
+      journal - should have `CONTAINER_NAME`, `CONTAINER_ID_FULL`
+    * `sys_var_log` - a record read from `/var/log/messages`
+    * `k8s_json_file` - a record read from a `/var/log/containers/*.log` JSON
+      formatted container log file
+  * `tag` - the Fluentd tag pattern to match for these records
+  * `remove_keys` - comma delimited list of keys to remove from the record
+* `pipeline_type` - which part of the pipeline is this? `collector` or
+  `normalizer` - the default is `collector`
+* `elasticsearch_index_name` - how to construct Elasticsearch index names or
+  prefixes for given tags
+  * `tag` - the Fluentd tag pattern to match for these records
+  * `name_type` - the well known type of index name or prefix to create -
+    `operations_full, project_full, operations_prefix, project_prefix` - The
+    `operations_*` types will create a name like `.operations`, and the
+    `project_*` types will create a name like
+    `project.record['kubernetes']['namespace_name'].record['kubernetes']['namespace_id']`.
+    When using the `full` types, a delimiter `.` followed by the date in
+    `YYYY.MM.DD` format will be added to the string to make a full index name.
+    When using the `prefix` types, it is assumed that the
+    `fluent-plugin-elasticsearch` is used with the `logstash_prefix_key` to
+    create the full index name.
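The `full` name construction described above can be illustrated with a short Ruby sketch. This is not the plugin's actual code, just the README's rule (prefix, then `.`, then the record date as `YYYY.MM.DD`) written out:

```ruby
require 'time'

# Hypothetical illustration, not the plugin implementation: build a full
# Elasticsearch index name by appending "." plus the record's timestamp
# formatted as YYYY.MM.DD to the prefix produced by the name_type rules.
def full_index_name(prefix, timestamp)
  t = Time.parse(timestamp)
  "#{prefix}.#{t.utc.strftime('%Y.%m.%d')}"
end

puts full_index_name('.operations', '2017-07-27T17:27:46.216527+00:00')
# .operations.2017.07.27
```

A `prefix` type would stop before the date and leave the date suffix to `fluent-plugin-elasticsearch` via `logstash_prefix_key`.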
+* `elasticsearch_index_name_field` - name of the field in the record which stores
+  the index name - you should remove this field in the elasticsearch output
+  plugin using the `remove_keys` config parameter - default is `viaq_index_name`
+* `elasticsearch_index_prefix_field` - name of the field in the record which stores
+  the index prefix - you should remove this field in the elasticsearch output
+  plugin using the `remove_keys` config parameter - default is `viaq_index_prefix`
+
+**NOTE** The `formatter` blocks are matched in the given order in the file.
+  This means, don't use `tag "**"` as the first formatter or none of your
+  others will be matched or evaluated.
+
+**NOTE** The `elasticsearch_index_name` processing is done *last*, *after* the
+  formatting, removal of empty fields, `@timestamp` creation, etc., so use
+  e.g. `record['systemd']['t']['GID']` instead of `record['_GID']`.
+
+**NOTE** The `elasticsearch_index_name` blocks are matched in the given order
+  in the file. This means, don't use `tag "**"` as the first
+  `elasticsearch_index_name` or none of your others will be matched or
+  evaluated.
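Putting the ordering NOTEs together, a safe layout lists specific tags before any catch-all. A sketch (the tags here are assumptions for illustration):

    # more specific tag patterns first
    <formatter>
      tag "kubernetes.journal.container**"
      type k8s_journal
    </formatter>
    <formatter>
      tag "journal.system**"
      type sys_journal
    </formatter>
    # a catch-all tag "**" formatter, if used at all, must come last

The same specific-before-catch-all ordering applies to `elasticsearch_index_name` blocks.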
+
## Example

If the input record looks like this:

    {
@@ -101,9 +169,98 @@
        "k": false
      },
      "@timestamp": "2017-02-13 15:30:10.259106596-07:00"
    }
+
+## Formatter example
+
+Given a record like the following with a tag of `journal.system`:
+
+    __REALTIME_TIMESTAMP=1502228121310282
+    __MONOTONIC_TIMESTAMP=722903835100
+    _BOOT_ID=d85e8a9d524c4a419bcfb6598db78524
+    _TRANSPORT=syslog
+    PRIORITY=6
+    SYSLOG_FACILITY=3
+    SYSLOG_IDENTIFIER=dnsmasq-dhcp
+    SYSLOG_PID=2289
+    _PID=2289
+    _UID=99
+    _GID=40
+    _COMM=dnsmasq
+    _EXE=/usr/sbin/dnsmasq
+    _CMDLINE=/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
+    _CAP_EFFECTIVE=3400
+    _SYSTEMD_CGROUP=/system.slice/libvirtd.service
+    MESSAGE=my message
+
+Using a configuration like this:
+
+    <formatter>
+      tag "journal.system**"
+      type sys_journal
+      remove_keys log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID
+    </formatter>
+
+The resulting record will look like this:
+
+    {
+      "systemd": {
+        "t": {
+          "BOOT_ID":"d85e8a9d524c4a419bcfb6598db78524",
+          "GID":40,
+          ...
+        },
+        "u": {
+          "SYSLOG_FACILITY":3,
+          "SYSLOG_IDENTIFIER":"dnsmasq-dhcp",
+          ...
+        }
+      },
+      "message":"my message",
+      ...
+    }
+
+## Elasticsearch index name example
+
+Given a configuration like this:
+
+    <elasticsearch_index_name>
+      tag "journal.system** system.var.log** **_default_** **_openshift_** **_openshift-infra_** mux.ops"
+      name_type operations_full
+    </elasticsearch_index_name>
+    <elasticsearch_index_name>
+      tag "**"
+      name_type project_full
+    </elasticsearch_index_name>
+    elasticsearch_index_name_field viaq_index_name
+
+A record with tag `journal.system` like this:
+
+    {
+      "@timestamp":"2017-07-27T17:27:46.216527+00:00"
+    }
+
+will end up looking like this:
+
+    {
+      "@timestamp":"2017-07-27T17:27:46.216527+00:00",
+      "viaq_index_name":".operations.2017.07.27"
+    }
+
+A record with tag `kubernetes.journal.container` like this:
+
+    {
+      "@timestamp":"2017-07-27T17:27:46.216527+00:00",
+      "kubernetes":{"namespace_name":"myproject","namespace_id":"000000"}
+    }
+
+will end up looking like this:
+
+    {
+      "@timestamp":"2017-07-27T17:27:46.216527+00:00",
+      "kubernetes":{"namespace_name":"myproject","namespace_id":"000000"},
+      "viaq_index_name":"project.myproject.000000.2017.07.27"
+    }
## Installation

    gem install fluent-plugin-viaq_data_model
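## Consuming the index name

Once the index name is stored in the record, the `fluent-plugin-elasticsearch` output can use it and then drop it, as the parameter descriptions above suggest. A sketch, assuming that plugin's `target_index_key` and `remove_keys` parameters (host/port values are placeholders):

    <match **>
      @type elasticsearch
      host localhost
      port 9200
      target_index_key viaq_index_name
      remove_keys viaq_index_name
    </match>

For the `*_prefix` name types, `logstash_prefix_key` would be pointed at the `viaq_index_prefix` field instead, letting the elasticsearch plugin append the date itself.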