README.md in fluent-plugin-bigquery-0.2.16 vs README.md in fluent-plugin-bigquery-0.3.0
- old
+ new
@@ -1,9 +1,11 @@
# fluent-plugin-bigquery
[Fluentd](http://fluentd.org) output plugin to load/insert data into Google BigQuery.
+- **Plugin type**: TimeSlicedOutput
+
* insert data over streaming inserts
* for continuous real-time insertions
* https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
* load data
  * for data loading as batch jobs, for a large amount of data
@@ -12,10 +14,63 @@
The current version of this plugin supports the Google API with Service Account Authentication, but does not support
the OAuth flow for installed applications.
## Configuration
+### Options
+
+| name | type | required? | default | description |
+| :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
+| method | string | no | insert | `insert` (Streaming Insert) or `load` (load job) |
+| buffer_type | string | no | lightening (insert) or file (load) | |
+| buffer_chunk_limit | integer | no | 1MB (insert) or 1GB (load) | |
+| buffer_queue_limit | integer | no | 1024 (insert) or 32 (load) | |
+| buffer_chunk_records_limit | integer | no | 500 | |
+| flush_interval                          | float         | no           | 0.25 (insert) or default of time-sliced output (load) | |
+| try_flush_interval                      | float         | no           | 0.05 (insert) or default of time-sliced output (load) | |
+| auth_method | enum | yes | private_key | `private_key` or `json_key` or `compute_engine` or `application_default` |
+| email | string | yes (private_key) | nil | GCP Service Account Email |
+| private_key_path | string | yes (private_key) | nil | GCP Private Key file path |
+| private_key_passphrase | string | yes (private_key) | nil | GCP Private Key Passphrase |
+| json_key | string | yes (json_key) | nil | GCP JSON Key file path or JSON Key string |
+| project | string | yes | nil | |
+| table                                   | string        | yes (unless `tables` is set) | nil | |
+| tables                                  | string        | yes (unless `table` is set) | nil | can set multiple table names separated by `,` |
+| template_suffix                         | string        | no           | nil                        | can use the `%{time_slice}` placeholder, replaced according to `time_slice_format` |
+| auto_create_table                       | bool          | no           | false                      | If true, creates the table automatically |
+| skip_invalid_rows                       | bool          | no           | false                      | Only for the `insert` method. |
+| max_bad_records                         | integer       | no           | 0                          | Only for the `load` method. If the number of bad records exceeds this value, an invalid error is returned in the job result. |
+| ignore_unknown_values | bool | no | false | Accept rows that contain values that do not match the schema. The unknown values are ignored. |
+| schema_path                             | string        | yes (unless `fetch_schema` is set) | nil | Schema definition file path, formatted as JSON. |
+| fetch_schema                            | bool          | yes (unless `schema_path` is set) | false | If true, fetch the table schema definition from the BigQuery table automatically. |
+| fetch_schema_table                      | string        | no           | nil                        | If set, fetch the table schema definition from this table. If `fetch_schema` is false, this parameter is ignored. |
+| schema_cache_expire                     | integer       | no           | 600                        | Value in seconds. When the expiration interval has passed, the table schema definition is re-fetched. |
+| field_string | string | no | nil | see examples. |
+| field_integer | string | no | nil | see examples. |
+| field_float | string | no | nil | see examples. |
+| field_boolean | string | no | nil | see examples. |
+| field_timestamp | string | no | nil | see examples. |
+| time_field                              | string        | no           | nil                        | If this parameter is set, the plugin writes a formatted time string to this field. |
+| time_format                             | string        | no           | nil                        | ex. `%s`, `%Y/%m/%d %H:%M:%S` |
+| replace_record_key | bool | no | false | see examples. |
+| replace_record_key_regexp{1-10} | string | no | nil | see examples. |
+| convert_hash_to_json                    | bool          | no           | false                      | If true, converts Hash values in the record to JSON strings. |
+| insert_id_field                         | string        | no           | nil                        | Use the value of this key as the `insert_id` parameter of the Streaming Insert API. |
+| request_timeout_sec                     | integer       | no           | nil                        | BigQuery API response timeout |
+| request_open_timeout_sec                | integer       | no           | 60                         | BigQuery API connection and request timeout. If you send big data to BigQuery, set a larger value. |
+
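+A minimal sketch combining several of the options above (every identifier, path, and field name below is a placeholder; `insert_id_field` assumes each record carries a unique `uuid` key):
+
+```apache
+<match dummy>
+  @type bigquery
+
+  method insert                # Streaming Insert (default); use "load" for load jobs
+
+  auth_method json_key
+  json_key /home/username/.keys/00000000000000000000000000000000-jsonkey.json
+
+  project yourproject_id
+  dataset yourdataset_id
+  table   tablename
+
+  insert_id_field uuid         # deduplicate rows by the "uuid" field
+
+  time_format %s
+  time_field  time
+
+  field_integer time,status,bytes
+  field_string  vhost,path,method
+</match>
+```
+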
+### Standard Options
+
+| name | type | required? | default | description |
+| :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
+| localtime | bool | no | nil | Use localtime |
+| utc | bool | no | nil | Use utc |
+
+See also http://docs.fluentd.org/articles/output-plugin-overview#time-sliced-output-parameters for the other time-sliced output parameters.
+
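+For example, a sketch that emits time slices in UTC (the `time_slice_format` parameter comes from the time-sliced output parameters linked above):
+
+```apache
+<match dummy>
+  @type bigquery
+
+  utc                          # use UTC instead of localtime for time formatting
+  time_slice_format %Y%m%d
+
+  ...
+</match>
+```
+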
+## Examples
+
### Streaming inserts
Configure insert specifications with the target table schema and your credentials. This is the minimum configuration:
```apache
@@ -137,11 +192,11 @@
__CAUTION: the `flush_interval` default is still `0.25` even if `method` is `load` in the current version.__
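A sketch of working around this by setting `flush_interval` explicitly when using the load method (the value here is only an example):

```apache
<match dummy>
  @type bigquery

  method load
  flush_interval 1800   # set explicitly; the default stays 0.25 even for load
  ...
</match>
```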
### Authentication
-There are two methods supported to fetch access token for the service account.
+There are four supported methods to fetch an access token for the service account.
1. Public-Private key pair of GCP(Google Cloud Platform)'s service account
2. JSON key of GCP(Google Cloud Platform)'s service account (see the sketch after this list)
3. Predefined access token (Compute Engine only)
4. Google application default credentials (http://goo.gl/IUuyuX)
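For example, a minimal sketch of the `json_key` method (the key file path is a placeholder):

```apache
<match dummy>
  @type bigquery

  auth_method json_key
  json_key /home/username/.keys/00000000000000000000000000000000-jsonkey.json
  ...
</match>
```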
@@ -299,10 +354,28 @@
For example, if the value of the `subdomain` attribute is `"bq.fluent"`, the table id will be like "accesslog_2016_03_bqfluent".
- any type of attribute is allowed because the stringified value is used as the replacement.
- acceptable characters are letters, digits, and `_`; all other characters are removed.
+### Date partitioned table support
+This plugin can insert into (and load into) a date partitioned table.
+
+Use the `%{time_slice}` placeholder in the table name:
+
+```apache
+<match dummy>
+ @type bigquery
+
+ ...
+ time_slice_format %Y%m%d
+ table accesslog$%{time_slice}
+ ...
+</match>
+```
+
+Note that dynamic table creation does not yet support date partitioned tables.
+
### Dynamic table creating
When `auto_create_table` is set to `true`, the plugin tries to create the table using the BigQuery API when an insertion fails with code=404 "Not Found: Table ...".
The next retry of the insertion is expected to succeed.
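A sketch of this (the schema file path is a placeholder; some schema definition such as `schema_path` is assumed so the plugin knows which columns to create):

```apache
<match dummy>
  @type bigquery

  auto_create_table true
  schema_path /path/to/table.schema   # used when the missing table is created
  ...
</match>
```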
@@ -396,10 +469,11 @@
time_format %s
time_field time
fetch_schema true
+  # fetch_schema_table other_table # if you want to fetch the schema from another table
field_integer time
</match>
```
If you specify multiple tables in the configuration file, the plugin fetches all of their schemas from BigQuery and merges them, as in the sketch below.
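A minimal sketch (table names are placeholders):

```apache
<match dummy>
  @type bigquery

  tables accesslog1,accesslog2,accesslog3
  fetch_schema true
  ...
</match>
```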
@@ -423,12 +497,10 @@
</match>
```
## TODO
-* support Load API
- * with automatically configured flush/buffer options
* support optional data fields
* support NULLABLE/REQUIRED/REPEATED field options in field list style of configuration
* OAuth installed application credentials support
* Google API discovery expiration
* Error classes
@@ -436,5 +508,6 @@
## Authors
* @tagomoris: First author, original version
* KAIZEN platform Inc.: Maintainer, since 2014.08.19
+* @joker1007