= Amazon S3 output plugin for {Fluentd}[http://github.com/fluent/fluentd] == Overview *s3* output plugin buffers event logs in local file and upload it to S3 periodically. This plugin splits files exactly by using the time of event logs (not the time when the logs are received). For example, a log '2011-01-02 message B' is reached, and then another log '2011-01-03 message B' is reached in this order, the former one is stored in "20110102.gz" file, and latter one in "20110103.gz" file. == Installation Simply use RubyGems: gem install fluent-plugin-s3 == Configuration type s3 aws_key_id YOUR_AWS_KEY_ID aws_sec_key YOUR_AWS_SECRET/KEY s3_bucket YOUR_S3_BUCKET_NAME s3_endpoint s3-ap-northeast-1.amazonaws.com s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension} path logs/ buffer_path /var/log/fluent/s3 time_slice_format %Y%m%d-%H time_slice_wait 10m utc [aws_key_id] AWS access key id. This parameter is required when your agent is not running on EC2 instance with an IAM Role. [aws_sec_key] AWS secret key. This parameter is required when your agent is not running on EC2 instance with an IAM Role. [s3_bucket (required)] S3 bucket name. [s3_region] s3 region name. For example, US West (Oregon) Region is "us-west-2". The full list of regions are available here. > http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region. We recommend using `s3_region` instead of `s3_endpoint`. [s3_endpoint] s3 endpoint name. For example, US West (Oregon) Region is "s3-us-west-2.amazonaws.com". The full list of endpoints are available here. > http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region [s3_object_key_format] The format of S3 object keys. You can use several built-in variables: - %{path} - %{time_slice} - %{index} - %{file_extension} to decide keys dynamically. %{path} is exactly the value of *path* configured in the configuration file. E.g., "logs/" in the example configuration above. %{time_slice} is the time-slice in text that are formatted with *time_slice_format*. %{index} is the sequential number starts from 0, increments when multiple files are uploaded to S3 in the same time slice. %{file_extention} is always "gz" for now. The default format is "%{path}%{time_slice}_%{index}.%{file_extension}". For instance, using the example configuration above, actual object keys on S3 will be something like: "logs/20130111-22_0.gz" "logs/20130111-23_0.gz" "logs/20130111-23_1.gz" "logs/20130112-00_0.gz" With the configuration: s3_object_key_format %{path}/events/ts=%{time_slice}/events_%{index}.%{file_extension} path log time_slice_format %Y%m%d-%H You get: "log/events/ts=20130111-22/events_0.gz" "log/events/ts=20130111-23/events_0.gz" "log/events/ts=20130111-23/events_1.gz" "log/events/ts=20130112-00/events_0.gz" The {fluent-mixin-config-placeholders}[https://github.com/tagomoris/fluent-mixin-config-placeholders] mixin is also incorporated, so additional variables such as %{hostname}, %{uuid}, etc. can be used in the s3_object_key_format. This could prove useful in preventing filename conflicts when writing from multiple servers. s3_object_key_format %{path}/events/ts=%{time_slice}/events_%{index}-%{hostname}.%{file_extension} [store_as] archive format on S3. You can use serveral format: - gzip (default) - json - text - lzo (Need lzop command) [format] Change one line format in the S3 object. Supported formats are "out_file", "json", "ltsv" and "single_value". - out_file (default). time\ttag\t{..json1..} time\ttag\t{..json2..} ... - json {..json1..} {..json2..} ... At this format, "time" and "tag" are omitted. But you can set these information to the record by setting "include_tag_key" / "tag_key" and "include_time_key" / "time_key" option. If you set following configuration in S3 output: format_json true include_time_key true time_key log_time # default is time then the record has log_time field. {"log_time":"time string",...} - ltsv key1:value1\tkey2:value2 key1:value1\tkey2:value2 ... "ltsv" format also accepts "include_xxx" related options. See "json" section. - single_value Use specified value instead of entire recode. If you get '{"message":"my log"}', then contents are my log1 my log2 ... You can change key name by "message_key" option. [auto_create_bucket] Create S3 bucket if it does not exists. Default is true. [check_apikey_on_start] Check AWS key on start. Default is true. [proxy_uri] uri of proxy environment. [path] path prefix of the files on S3. Default is "" (no prefix). [buffer_path (required)] path prefix of the files to buffer logs. [time_slice_format] Format of the time used as the file name. Default is '%Y%m%d'. Use '%Y%m%d%H' to split files hourly. [time_slice_wait] The time to wait old logs. Default is 10 minutes. Specify larger value if old logs may reache. [utc] Use UTC instead of local time. == IAM Policy The following is an example for a minimal IAM policy needed to write to an s3 bucket (matches my-s3bucket/logs, my-s3bucket-test, etc.). { "Statement": [ { "Effect":"Allow", "Action":"s3:*", "Resource":"arn:aws:s3:::my-s3bucket*" } ] } Note that the bucket must already exist and *auto_create_bucket* has no effect in this case. Refer to the {AWS documentation}[http://docs.aws.amazon.com/IAM/latest/UserGuide/ExampleIAMPolicies.html] for example policies. Using {IAM roles}[http://docs.aws.amazon.com/IAM/latest/UserGuide/WorkingWithRoles.html] with a properly configured IAM policy are preferred over embedding access keys on EC2 instances. == Website, license, et. al. Web site:: http://fluentd.org/ Documents:: http://docs.fluentd.org/ Source repository:: http://github.com/fluent Discussion:: http://groups.google.com/group/fluentd Author:: Sadayuki Furuhashi Copyright:: (c) 2011 FURUHASHI Sadayuki License:: Apache License, Version 2.0