README.md in embulk-output-bigquery-0.1.2 vs README.md in embulk-output-bigquery-0.1.3
- old
+ new
@@ -1,9 +1,9 @@
# embulk-output-bigquery
-[Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/)
+[Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/) using [direct insert](https://cloud.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest)
## Overview
load data into Google BigQuery as batch jobs, suitable for large amounts of data
https://developers.google.com/bigquery/loading-data-into-bigquery
@@ -14,20 +14,21 @@
* **Dynamic table creation**: yes
### NOT IMPLEMENTED
* inserting data via streaming inserts
* for continuous real-time insertions
- * Pleast use other product, like [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)
+ * Please use another product, such as [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)
* https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
The current version of this plugin supports the Google API with Service Account Authentication, but does not support
the OAuth flow for installed applications.
## Configuration
-- **service_account_email**: your Google service account email (string, required)
-- **p12_keyfile_path**: fullpath of private key in P12(PKCS12) format (string, required)
+- **auth_method**: (private_key or compute_engine) (string, optional, default is private_key)
+- **service_account_email**: your Google service account email (string, required when auth_method is private_key)
+- **p12_keyfile_path**: fullpath of private key in P12(PKCS12) format (string, required when auth_method is private_key)
- **path_prefix**: path prefix of the output files (string, required)
- **sequence_format**: (string, optional, default is %03d.%02d)
- **file_ext**: extension of the output files, e.g. csv.gz (string, required)
- **source_format**: file type (NEWLINE_DELIMITED_JSON or CSV) (string, required, default is CSV)
- **project**: project_id (string, required)
@@ -40,17 +41,18 @@
- **job_status_max_polling_time**: max job status polling time. (int, optional, default is 3600 sec)
- **job_status_polling_interval**: job status polling interval. (int, optional, default is 10 sec)
- **is_skip_job_result_check**: skip checking the result of the load job (boolean, optional, default is 0)
- **field_delimiter**: (string, optional, default is ",")
- **max_bad_records**: (int, optional, default is 0)
-- **encoding**: (UTF-8 or ISO-8859-1) (string, optional, default is "UTF-8")
+- **encoding**: (UTF-8 or ISO-8859-1) (string, optional, default is UTF-8)
-## Example
+### Example
```yaml
out:
type: bigquery
+ auth_method: private_key # default
service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
p12_keyfile_path: /path/to/p12_keyfile.p12
path_prefix: /path/to/output
file_ext: csv.gz
source_format: CSV
@@ -62,22 +64,61 @@
header_line: false
encoders:
- {type: gzip}
```
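
The example above emits gzipped CSV. For `source_format: NEWLINE_DELIMITED_JSON`, each output file must contain one JSON object per line. A minimal sketch, assuming a JSON-lines formatter plugin (for example, the third-party embulk-formatter-jsonl gem) is installed:

```yaml
out:
  type: bigquery
  auth_method: private_key
  service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
  p12_keyfile_path: /path/to/p12_keyfile.p12
  path_prefix: /path/to/output
  file_ext: jsonl.gz
  source_format: NEWLINE_DELIMITED_JSON
  formatter: {type: jsonl}  # hypothetical: provided by a JSON-lines formatter plugin
  encoders:
  - {type: gzip}
```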
-## Dynamic table creating
+### Authentication
+Two methods are supported for fetching an access token for the service account.
+
+1. Public-Private key pair
+2. Predefined access token (Compute Engine only)
+
+The examples above use the first one. You first need to create a service account (client ID),
+download its private key and deploy the key with embulk.
+
+On the other hand, you don't need to explicitly create a service account for embulk when you
+run embulk on Google Compute Engine. For this second authentication method, you need to
+add the API scope "https://www.googleapis.com/auth/bigquery" to the scope list of your
+Compute Engine instance; you can then configure embulk like this:
+
+```yaml
+out:
+ type: bigquery
+ auth_method: compute_engine
+```
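+
+The other required options, such as `project` and `path_prefix`, still apply here; only the authentication-related settings change.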
+
+### Table id formatting
+
+The `table` option accepts a [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
+format string to construct table ids.
+Table ids are formatted at runtime
+using the local time of the embulk server.
+
+For example, with the configuration below,
+data is inserted into tables `table_2015_04`, `table_2015_05` and so on.
+
+```yaml
+out:
+ type: bigquery
+ table: table_%Y_%m
+```
+
+### Dynamic table creation
+
When `auto_create_table` is set to true, the plugin tries to create the table using the BigQuery API.
+If the table already exists, data is inserted into it.
+
To describe the schema of the target table, specify the path to a JSON schema file with the `schema_path` option.
-`table` option accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
-format of ruby to construct table name.
-```
-auto_create_table: true
-table: table_%Y_%m
-schema_path: /path/to/schema.json
+```yaml
+out:
+ type: bigquery
+ auto_create_table: true
+ table: table_%Y_%m
+ schema_path: /path/to/schema.json
```
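
A minimal sketch of `/path/to/schema.json`, assuming the plugin accepts the `bq` command-line style schema format (a JSON array of field definitions; the field names and types below are placeholders):

```json
[
  {"name": "id", "type": "INTEGER"},
  {"name": "name", "type": "STRING"},
  {"name": "created_at", "type": "TIMESTAMP"}
]
```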
## Build
```