README.md in fluent-plugin-bigquery-0.0.3 vs README.md in fluent-plugin-bigquery-0.0.4

- old
+ new

@@ -1,8 +1,8 @@
 # fluent-plugin-bigquery
 
-Fluentd output plugin to load/insert data into Google BigQuery.
+[Fluentd](http://fluentd.org) output plugin to load/insert data into Google BigQuery.
 
 * insert data over streaming inserts
   * for continuous real-time insertions, under many limitations
   * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
 * (NOT IMPLEMENTED) load data
@@ -102,10 +102,11 @@
   * see `patches` below
 
 ### Authentication
 
 There are two methods supported to fetch access token for the service account.
+
 1. Public-Private key pair
 2. Predefined access token (Compute Engine only)
 
 The examples above use the first one. You first need to create a service account (client ID),
 download its private key and deploy the key with fluentd.
@@ -132,9 +133,76 @@
   field_string rhost,vhost,path,method,protocol,agent,referer
   field_float requestime
   field_boolean bot_access,loginsession
 </match>
 ```
+
+### Table schema
+
+There are two methods to describe the schema of the target table.
+
+1. List fields in fluent.conf
+2. Load a schema file in JSON.
+
+The examples above use the first method. In this method,
+you can also specify nested fields by prefixing their belonging record fields.
+
+```apache
+<match dummy>
+  type bigquery
+
+  ...
+
+  time_format %s
+  time_field time
+
+  field_integer time,response.status,response.bytes
+  field_string request.vhost,request.path,request.method,request.protocol,request.agent,request.referer,remote.host,remote.ip,remote.user
+  field_float request.time
+  field_boolean request.bot_access,request.loginsession
+</match>
+```
+
+This schema accepts structured JSON data like:
+
+```json
+{
+  "request":{
+    "time":1391748126.7000976,
+    "vhost":"www.example.com",
+    "path":"/",
+    "method":"GET",
+    "protocol":"HTTP/1.1",
+    "agent":"HotJava",
+    "bot_access":false
+  },
+  "remote":{ "ip": "192.0.2.1" },
+  "response":{
+    "status":200,
+    "bytes":1024
+  }
+}
+```
+
+The second method is to specify a path to a BigQuery schema file instead of listing fields. In this case, your fluent.conf looks like:
+
+```apache
+<match dummy>
+  type bigquery
+
+  ...
+
+  time_format %s
+  time_field time
+
+  schema_path /path/to/httpd.schema
+  field_integer time
+</match>
+```
+where /path/to/httpd.schema is a path to the JSON-encoded schema file which you used for creating the table on BigQuery.
+
+NOTE: Since JSON does not define how to encode data of TIMESTAMP type,
+you are still recommended to specify JSON types for TIMESTAMP fields as "time" field does in the example.
 
 ### patches
 
 This plugin depends on `fluent-plugin-buffer-lightening`, and it includes monkey patch module for BufferedOutput plugin, to realize high rate and low latency flushing. With this patch, sub 1 second flushing available.
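
The new "Table schema" section says nested fields are declared by prefixing each field with the record it belongs to, but the diff does not show how those dotted names relate to a BigQuery schema file. The following is a minimal sketch of one plausible mapping, not the plugin's actual code: `build_schema` and `to_fields` are hypothetical helpers, and the `RECORD`/`fields` layout is assumed from BigQuery's JSON schema format.

```ruby
require 'json'

# Hypothetical helper, not part of fluent-plugin-bigquery:
# turn comma-separated dotted field names, grouped by type,
# into a nested tree keyed by field name.
def build_schema(fields_by_type)
  root = {}
  fields_by_type.each do |type, names|
    names.split(',').each do |dotted|
      *parents, leaf = dotted.strip.split('.')
      # Walk (and create) one hash per enclosing record.
      node = parents.reduce(root) { |hash, key| hash[key] ||= {} }
      node[leaf] = type.to_s.upcase
    end
  end
  to_fields(root)
end

# Convert the tree into BigQuery-style field definitions
# (assumed layout: nested hashes become RECORD fields).
def to_fields(tree)
  tree.map do |name, value|
    if value.is_a?(Hash)
      { 'name' => name, 'type' => 'RECORD', 'fields' => to_fields(value) }
    else
      { 'name' => name, 'type' => value }
    end
  end
end

schema = build_schema(
  integer: 'time,response.status,response.bytes',
  float:   'request.time',
  boolean: 'request.bot_access'
)
puts JSON.pretty_generate(schema)
```

Under these assumptions, `response.status` and `response.bytes` end up as two INTEGER fields inside a single `response` record, which matches the shape of the sample JSON document in the diff.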