README.md in fluent-plugin-bigquery-0.0.3 vs README.md in fluent-plugin-bigquery-0.0.4
- old
+ new
@@ -1,8 +1,8 @@
# fluent-plugin-bigquery
-Fluentd output plugin to load/insert data into Google BigQuery.
+[Fluentd](http://fluentd.org) output plugin to load/insert data into Google BigQuery.
* insert data over streaming inserts
* for continuous real-time insertions, under many limitations
* https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
* (NOT IMPLEMENTED) load data
@@ -102,10 +102,11 @@
* see `patches` below
### Authentication
Two methods are supported for fetching an access token for the service account.
+
1. Public-Private key pair
2. Predefined access token (Compute Engine only)
The examples above use the first one. You first need to create a service account (client ID),
download its private key and deploy the key with fluentd.
@@ -132,9 +133,76 @@
field_string rhost,vhost,path,method,protocol,agent,referer
field_float requestime
field_boolean bot_access,loginsession
</match>
```
+
+### Table schema
+
+There are two methods to describe the schema of the target table.
+
+1. List fields in fluent.conf
+2. Load a schema file in JSON
+
+The examples above use the first method. In this method,
+you can also specify nested fields by prefixing each field name with the name of the record that contains it, separated by a dot (for example, `request.vhost`).
+
+```apache
+<match dummy>
+ type bigquery
+
+ ...
+
+ time_format %s
+ time_field time
+
+ field_integer time,response.status,response.bytes
+ field_string request.vhost,request.path,request.method,request.protocol,request.agent,request.referer,remote.host,remote.ip,remote.user
+ field_float request.time
+ field_boolean request.bot_access,request.loginsession
+</match>
+```
+
+This schema accepts structured JSON data like:
+
+```json
+{
+ "request":{
+ "time":1391748126.7000976,
+ "vhost":"www.example.com",
+ "path":"/",
+ "method":"GET",
+ "protocol":"HTTP/1.1",
+ "agent":"HotJava",
+ "bot_access":false
+ },
+ "remote":{ "ip": "192.0.2.1" },
+ "response":{
+ "status":200,
+ "bytes":1024
+ }
+}
+```
+
+The second method is to specify a path to a BigQuery schema file instead of listing fields. In this case, your fluent.conf looks like:
+
+```apache
+<match dummy>
+ type bigquery
+
+ ...
+
+ time_format %s
+ time_field time
+
+ schema_path /path/to/httpd.schema
+ field_integer time
+</match>
+```
+where /path/to/httpd.schema is a path to the JSON-encoded schema file which you used for creating the table on BigQuery.
+
+NOTE: Since JSON does not define how to encode data of TIMESTAMP type,
+you should still specify the JSON type of each TIMESTAMP field in fluent.conf, as `field_integer time` does for the "time" field in the example.
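
For illustration, a schema file such as `/path/to/httpd.schema` for the nested example above might look like the following. This is a sketch and not a file shipped with the plugin: it assumes the standard BigQuery JSON schema layout (`name`, `type`, an optional `mode`, and `fields` for `RECORD` columns) and assumes the `time` column was created as TIMESTAMP.

```json
[
  { "name": "time", "type": "TIMESTAMP" },
  {
    "name": "request",
    "type": "RECORD",
    "fields": [
      { "name": "time", "type": "FLOAT" },
      { "name": "vhost", "type": "STRING" },
      { "name": "path", "type": "STRING" },
      { "name": "method", "type": "STRING" },
      { "name": "protocol", "type": "STRING" },
      { "name": "agent", "type": "STRING" },
      { "name": "referer", "type": "STRING" },
      { "name": "bot_access", "type": "BOOLEAN" },
      { "name": "loginsession", "type": "BOOLEAN" }
    ]
  },
  {
    "name": "remote",
    "type": "RECORD",
    "fields": [
      { "name": "host", "type": "STRING" },
      { "name": "ip", "type": "STRING" },
      { "name": "user", "type": "STRING" }
    ]
  },
  {
    "name": "response",
    "type": "RECORD",
    "fields": [
      { "name": "status", "type": "INTEGER" },
      { "name": "bytes", "type": "INTEGER" }
    ]
  }
]
```

Each `RECORD` entry corresponds to one dotted prefix in the field-list method, so `request.vhost` there maps to the `vhost` field inside `request` here.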
### patches
This plugin depends on `fluent-plugin-buffer-lightening`, and it includes a monkey patch module for the BufferedOutput plugin to achieve high-rate, low-latency flushing. With this patch, sub-second flushing is available.