README.md in fluent-plugin-bigquery-0.3.4 vs README.md in fluent-plugin-bigquery-0.4.0
- old
+ new
@@ -19,51 +19,52 @@
## Configuration
### Options
-| name | type | required? | default | description |
-| :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
-| method | string | no | insert | `insert` (Streaming Insert) or `load` (load job) |
-| buffer_type | string | no | lightening (insert) or file (load) | |
-| buffer_chunk_limit | integer | no | 1MB (insert) or 1GB (load) | |
-| buffer_queue_limit | integer | no | 1024 (insert) or 32 (load) | |
-| buffer_chunk_records_limit | integer | no | 500 | |
-| flush_interval | float | no | 0.25 (*insert) or default of time sliced output (load) | |
-| try_flush_interval | float | no | 0.05 (*insert) or default of time sliced output (load) | |
-| auth_method | enum | yes | private_key | `private_key` or `json_key` or `compute_engine` or `application_default` |
-| email | string | yes (private_key) | nil | GCP Service Account Email |
-| private_key_path | string | yes (private_key) | nil | GCP Private Key file path |
-| private_key_passphrase | string | yes (private_key) | nil | GCP Private Key Passphrase |
-| json_key | string | yes (json_key) | nil | GCP JSON Key file path or JSON Key string |
-| project | string | yes | nil | |
-| table | string | yes (either `tables`) | nil | |
-| tables | string | yes (either `table`) | nil | can set multi table names splitted by `,` |
-| template_suffix | string | no | nil | can use `%{time_slice}` placeholder replaced by `time_slice_format` |
-| auto_create_table | bool | no | false | If true, creates table automatically |
-| skip_invalid_rows | bool | no | false | Only `insert` method. |
-| max_bad_records | integer | no | 0 | Only `load` method. If the number of bad records exceeds this value, an invalid error is returned in the job result. |
-| ignore_unknown_values | bool | no | false | Accept rows that contain values that do not match the schema. The unknown values are ignored. |
-| schema_path | string | yes (either `fetch_schema`) | nil | Schema Definition file path. It is formatted by JSON. |
-| fetch_schema | bool | yes (either `schema_path`) | false | If true, fetch table schema definition from Bigquery table automatically. |
-| fetch_schema_table | string | no | nil | If set, fetch table schema definition from this table, If fetch_schema is false, this param is ignored |
-| schema_cache_expire | integer | no | 600 | Value is second. If current time is after expiration interval, re-fetch table schema definition. |
-| field_string | string | no | nil | see examples. |
-| field_integer | string | no | nil | see examples. |
-| field_float | string | no | nil | see examples. |
-| field_boolean | string | no | nil | see examples. |
-| field_timestamp | string | no | nil | see examples. |
-| time_field | string | no | nil | If this param is set, plugin set formatted time string to this field. |
-| time_format | string | no | nil | ex. `%s`, `%Y/%m%d %H:%M:%S` |
-| replace_record_key | bool | no | false | see examples. |
-| replace_record_key_regexp{1-10} | string | no | nil | see examples. |
-| convert_hash_to_json | bool | no | false | If true, converts Hash value of record to JSON String. |
-| insert_id_field | string | no | nil | Use key as `insert_id` of Streaming Insert API parameter. |
-| request_timeout_sec | integer | no | nil | Bigquery API response timeout |
-| request_open_timeout_sec | integer | no | 60 | Bigquery API connection, and request timeout. If you send big data to Bigquery, set large value. |
-| time_partitioning_type | enum | no (either day) | nil | Type of bigquery time partitioning feature(experimental feature on BigQuery). |
-| time_partitioning_expiration | time | no | nil | Expiration milliseconds for bigquery time partitioning. (experimental feature on BigQuery) |
+| name | type | required? | default | description |
+| :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
+| method | string | no | insert | `insert` (Streaming Insert) or `load` (load job) |
+| buffer_type | string | no | lightening (insert) or file (load) | |
+| buffer_chunk_limit | integer | no | 1MB (insert) or 1GB (load) | |
+| buffer_queue_limit | integer | no | 1024 (insert) or 32 (load) | |
+| buffer_chunk_records_limit | integer | no | 500 | |
+| flush_interval | float | no | 0.25 (*insert) or default of time sliced output (load) | |
+| try_flush_interval | float | no | 0.05 (*insert) or default of time sliced output (load) | |
+| auth_method | enum | yes | private_key | `private_key` or `json_key` or `compute_engine` or `application_default` |
+| email | string | yes (private_key) | nil | GCP Service Account Email |
+| private_key_path | string | yes (private_key) | nil | GCP Private Key file path |
+| private_key_passphrase | string | yes (private_key) | nil | GCP Private Key Passphrase |
+| json_key | string | yes (json_key) | nil | GCP JSON Key file path or JSON Key string |
+| project | string | yes | nil | |
+| table | string | yes (either `tables`) | nil | |
+| tables | string | yes (either `table`) | nil | can set multiple table names, separated by `,` |
+| template_suffix | string | no | nil | can use `%{time_slice}` placeholder replaced by `time_slice_format` |
+| auto_create_table | bool | no | false | If true, creates table automatically |
+| skip_invalid_rows | bool | no | false | Only `insert` method. |
+| max_bad_records | integer | no | 0 | Only `load` method. If the number of bad records exceeds this value, an invalid error is returned in the job result. |
+| ignore_unknown_values | bool | no | false | Accept rows that contain values that do not match the schema. The unknown values are ignored. |
+| schema | array | yes (either `fetch_schema` or `schema_path`) | nil | Schema definition, formatted as JSON. |
+| schema_path | string | yes (either `fetch_schema`) | nil | Schema definition file path, formatted as JSON. |
+| fetch_schema | bool | yes (either `schema_path`) | false | If true, fetch the table schema definition from the BigQuery table automatically. |
+| fetch_schema_table | string | no | nil | If set, fetch the table schema definition from this table. If `fetch_schema` is false, this param is ignored. |
+| schema_cache_expire | integer | no | 600 | Value in seconds. If the current time is past the expiration interval, the table schema definition is re-fetched. |
+| field_string (deprecated) | string | no | nil | see examples. |
+| field_integer (deprecated) | string | no | nil | see examples. |
+| field_float (deprecated) | string | no | nil | see examples. |
+| field_boolean (deprecated) | string | no | nil | see examples. |
+| field_timestamp (deprecated) | string | no | nil | see examples. |
+| time_field | string | no | nil | If this param is set, the plugin sets a formatted time string to this field. |
+| time_format | string | no | nil | ex. `%s`, `%Y/%m/%d %H:%M:%S` |
+| replace_record_key | bool | no | false | see examples. |
+| replace_record_key_regexp{1-10} | string | no | nil | see examples. |
+| convert_hash_to_json (deprecated) | bool | no | false | If true, converts Hash value of record to JSON String. |
+| insert_id_field | string | no | nil | Use the value of this key as the `insert_id` parameter of the Streaming Insert API. |
+| request_timeout_sec | integer | no | nil | BigQuery API response timeout |
+| request_open_timeout_sec | integer | no | 60 | BigQuery API connection and request open timeout. If you send big data to BigQuery, set a larger value. |
+| time_partitioning_type | enum | no (`day` only) | nil | Type of BigQuery time partitioning feature (experimental feature on BigQuery). |
+| time_partitioning_expiration | time | no | nil | Expiration in milliseconds for BigQuery time partitioning (experimental feature on BigQuery). |
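For reference, a minimal streaming-insert configuration using the inline `schema` option might look like the sketch below (the project, dataset, table, and key path are placeholders):

```apache
<match dummy>
  @type bigquery

  # "insert" uses the Streaming Insert API; "load" uses load jobs
  method insert

  auth_method json_key
  # placeholder path to a service account JSON key
  json_key /path/to/your-service-account.json

  project yourproject_id
  dataset yourdataset_id
  table tablename

  schema [{"name": "time", "type": "INTEGER"}, {"name": "message", "type": "STRING"}]
</match>
```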
### Standard Options
| name | type | required? | default | description |
| :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
@@ -94,14 +95,29 @@
table tablename
time_format %s
time_field time
- field_integer time,status,bytes
- field_string rhost,vhost,path,method,protocol,agent,referer
- field_float requesttime
- field_boolean bot_access,loginsession
+ schema [
+ {"name": "time", "type": "INTEGER"},
+ {"name": "status", "type": "INTEGER"},
+ {"name": "bytes", "type": "INTEGER"},
+ {"name": "vhost", "type": "STRING"},
+ {"name": "path", "type": "STRING"},
+ {"name": "method", "type": "STRING"},
+ {"name": "protocol", "type": "STRING"},
+ {"name": "agent", "type": "STRING"},
+ {"name": "referer", "type": "STRING"},
+ {"name": "remote", "type": "RECORD", "fields": [
+ {"name": "host", "type": "STRING"},
+ {"name": "ip", "type": "STRING"},
+ {"name": "user", "type": "STRING"}
+ ]},
+ {"name": "requesttime", "type": "FLOAT"},
+ {"name": "bot_access", "type": "BOOLEAN"},
+ {"name": "loginsession", "type": "BOOLEAN"}
+ ]
</match>
```
For high-rate inserts over streaming inserts, you should specify flush intervals and buffer chunk options:
@@ -128,14 +144,29 @@
tables accesslog1,accesslog2,accesslog3
time_format %s
time_field time
- field_integer time,status,bytes
- field_string rhost,vhost,path,method,protocol,agent,referer
- field_float requesttime
- field_boolean bot_access,loginsession
+ schema [
+ {"name": "time", "type": "INTEGER"},
+ {"name": "status", "type": "INTEGER"},
+ {"name": "bytes", "type": "INTEGER"},
+ {"name": "vhost", "type": "STRING"},
+ {"name": "path", "type": "STRING"},
+ {"name": "method", "type": "STRING"},
+ {"name": "protocol", "type": "STRING"},
+ {"name": "agent", "type": "STRING"},
+ {"name": "referer", "type": "STRING"},
+ {"name": "remote", "type": "RECORD", "fields": [
+ {"name": "host", "type": "STRING"},
+ {"name": "ip", "type": "STRING"},
+ {"name": "user", "type": "STRING"}
+ ]},
+ {"name": "requesttime", "type": "FLOAT"},
+ {"name": "bot_access", "type": "BOOLEAN"},
+ {"name": "loginsession", "type": "BOOLEAN"}
+ ]
</match>
```
Important options for high-rate events are:
@@ -264,15 +295,11 @@
dataset yourdataset_id
table tablename
time_format %s
time_field time
-
- field_integer time,status,bytes
- field_string rhost,vhost,path,method,protocol,agent,referer
- field_float requesttime
- field_boolean bot_access,loginsession
+ ...
</match>
```
#### Application default credentials
@@ -417,14 +444,29 @@
...
time_format %s
time_field time
- field_integer time,response.status,response.bytes
- field_string request.vhost,request.path,request.method,request.protocol,request.agent,request.referer,remote.host,remote.ip,remote.user
- field_float request.time
- field_boolean request.bot_access,request.loginsession
+ schema [
+ {"name": "time", "type": "INTEGER"},
+ {"name": "status", "type": "INTEGER"},
+ {"name": "bytes", "type": "INTEGER"},
+ {"name": "vhost", "type": "STRING"},
+ {"name": "path", "type": "STRING"},
+ {"name": "method", "type": "STRING"},
+ {"name": "protocol", "type": "STRING"},
+ {"name": "agent", "type": "STRING"},
+ {"name": "referer", "type": "STRING"},
+ {"name": "remote", "type": "RECORD", "fields": [
+ {"name": "host", "type": "STRING"},
+ {"name": "ip", "type": "STRING"},
+ {"name": "user", "type": "STRING"}
+ ]},
+ {"name": "requesttime", "type": "FLOAT"},
+ {"name": "bot_access", "type": "BOOLEAN"},
+ {"name": "loginsession", "type": "BOOLEAN"}
+ ]
</match>
```
This schema accepts structured JSON data like:
@@ -457,14 +499,13 @@
time_format %s
time_field time
schema_path /path/to/httpd.schema
- field_integer time
</match>
```
-where /path/to/httpd.schema is a path to the JSON-encoded schema file which you used for creating the table on BigQuery.
+where /path/to/httpd.schema is the path to the JSON-encoded schema file that you used to create the table on BigQuery. Using an external schema file lets you write a full schema that supports NULLABLE/REQUIRED/REPEATED modes, which adds full flexibility.
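For example, a schema file that uses NULLABLE/REQUIRED/REPEATED modes might look like the following sketch (field names are illustrative):

```json
[
  {"name": "time", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "vhost", "type": "STRING", "mode": "NULLABLE"},
  {"name": "tags", "type": "STRING", "mode": "REPEATED"},
  {"name": "remote", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "host", "type": "STRING"},
    {"name": "ip", "type": "STRING"}
  ]}
]
```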
The third method is to set `fetch_schema` to `true` to fetch the schema via the BigQuery API. In this case, your fluent.conf looks like:
```apache
<match dummy>
@@ -475,11 +516,10 @@
time_format %s
time_field time
fetch_schema true
# fetch_schema_table other_table # if you want to fetch schema from other table
- field_integer time
</match>
```
If you specify multiple tables in the configuration file, the plugin fetches the schema data for all of them from BigQuery and merges it.
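For example, a sketch like the following (table names are placeholders) would fetch the schema of each listed table and merge them:

```apache
<match dummy>
  @type bigquery
  ...
  tables accesslog1,accesslog2,accesslog3
  fetch_schema true
</match>
```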
@@ -496,20 +536,17 @@
@type bigquery
...
insert_id_field uuid
- field_string uuid
+ schema [{"name": "uuid", "type": "STRING"}]
</match>
```
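With this configuration, the value of the `uuid` key in each record (the value below is illustrative) is sent as the `insertId` of the Streaming Insert API, which BigQuery uses for best-effort deduplication:

```json
{"uuid": "9ABFF756-0267-4247-847F-0895B65F0938", "status": 200, "path": "/login"}
```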
## TODO
-* support optional data fields
-* support NULLABLE/REQUIRED/REPEATED field options in field list style of configuration
* OAuth installed application credentials support
* Google API discovery expiration
-* Error classes
* check row size limits
## Authors
* @tagomoris: First author, original version