README.md in embulk-output-bigquery-0.3.7 vs README.md in embulk-output-bigquery-0.4.0
- old
+ new
@@ -42,11 +42,11 @@
| service_account_email | string | required when auth_method is private_key | | Your Google service account email
| p12_keyfile | string | required when auth_method is private_key | | Full path of private key in P12 (PKCS12) format |
| json_keyfile | string | required when auth_method is json_key | | Full path of JSON key |
| project | string | required if json_keyfile is not given | | project_id |
| dataset | string | required | | dataset |
-| table | string | required | | table name |
+| table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929`|
| auto_create_dataset | boolean | optional | false | automatically create dataset |
| auto_create_table | boolean | optional | false | See [Dynamic Table Creating](#dynamic-table-creating) |
| schema_file | string | optional | | /path/to/schema.json |
| template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
| prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
@@ -61,10 +61,11 @@
| default_timestamp_format | string | optional | %Y-%m-%d %H:%M:%S.%6N | |
| payload_column | string | optional | nil | See [Formatter Performance Issue](#formatter-performance-issue) |
| payload_column_index | integer | optional | nil | See [Formatter Performance Issue](#formatter-performance-issue) |
| gcs_bucket | string | optional | nil | See [GCS Bucket](#gcs-bucket) |
| auto_create_gcs_bucket | boolean | optional | false | See [GCS Bucket](#gcs-bucket) |
+| progress_log_interval | float | optional | nil (Disabled) | Progress log interval. The progress log is disabled when nil (default). NOTE: This option may be removed in the future because a filter plugin can achieve the same goal |
Client or request options
| name | type | required? | default | description |
|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
@@ -85,22 +86,25 @@
| delete_from_local_when_job_end | boolean | optional | true | If set to true, delete generated local files when the job ends |
| compression | string | optional | "NONE" | Compression of local files (`GZIP` or `NONE`) |
`source_format` is also used to determine the formatter (csv or jsonl).
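For instance, a partial config sketch switching the intermediate files to jsonl (other required options omitted for brevity):

```yaml
out:
  type: bigquery
  source_format: NEWLINE_DELIMITED_JSON  # intermediate local files are formatted as jsonl
  compression: GZIP                      # optionally gzip the local files before loading
```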
-#### Same options of bq command-line tools or BigQuery job's propery
+#### Same options of bq command-line tools or BigQuery job's property
The following options are the same as those of the [bq command-line tools](https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile) or the BigQuery [job's property](https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource).
-| name | type | required? | default | description |
-|:--------------------------|:------------|:-----------|:-------------|:-----------------------|
-| source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
-| max_bad_records | int | optional | 0 | |
-| field_delimiter | char | optional | "," | |
-| encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
-| ignore_unknown_values | boolean | optional | 0 | |
-| allow_quoted_newlines | boolean | optional | 0 | Set true, if data contains newline characters. It may cause slow procsssing |
+| name | type | required? | default | description |
+|:----------------------------------|:---------|:----------|:--------|:-----------------------|
+| source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
+| max_bad_records | int | optional | 0 | |
+| field_delimiter | char | optional | "," | |
+| encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
+| ignore_unknown_values | boolean | optional | false | |
+| allow_quoted_newlines | boolean | optional | false | Set true if data contains newline characters. It may cause slow processing |
+| time_partitioning | hash | optional | nil | See [Time Partitioning](#time-partitioning) |
+| time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
+| time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition |
### Example
```yaml
out:
@@ -121,36 +125,36 @@
5 modes are provided.
##### append
1. Load to temporary table.
-2. Copy temporary table to destination table. (WRITE_APPEND)
+2. Copy temporary table to destination table (or partition). (WRITE_APPEND)
##### append_direct
-Insert data into existing table directly.
+Insert data into an existing table (or partition) directly.
This is not transactional, i.e., if it fails, the target table could be left with some rows inserted.
##### replace
1. Load to temporary table.
-2. Copy temporary table to destination table. (WRITE_TRUNCATE)
+2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
```is_skip_job_result_check``` must be false in replace mode.
##### replace_backup
1. Load to temporary table.
-2. Copy destination table to backup table. (dataset_old, table_old)
-3. Copy temporary table to destination table. (WRITE_TRUNCATE)
+2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
+3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
```is_skip_job_result_check``` must be false in replace_backup mode.
##### delete_in_advance
-1. Delete destination table, if it exists.
-2. Load to destination table.
+1. Delete destination table (or partition), if it exists.
+2. Load to destination table (or partition).
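As a minimal sketch, the mode is selected with the `mode` option; a `replace_backup` config (authentication and other options omitted) could look like:

```yaml
out:
  type: bigquery
  mode: replace_backup
  dataset: dataset_name       # destination dataset
  table: table_name           # destination table, or table_name$YYYYMMDD for a partition
  dataset_old: dataset_name   # dataset holding the backup copy
  table_old: table_name_old   # backup table name
```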
### Authentication
Three methods are supported to fetch an access token for the service account.
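For example, a minimal sketch using the `json_key` method (the keyfile path is given via `json_keyfile`, so `project` can be omitted):

```yaml
out:
  type: bigquery
  auth_method: json_key
  json_keyfile: /path/to/keyfile.json   # full path of the service account JSON key
  dataset: dataset_name
  table: table_name
```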
@@ -363,9 +367,35 @@
gcs_bucket: bucket_name
auto_create_gcs_bucket: false
```
ToDo: Use https://cloud.google.com/storage/docs/streaming if google-api-ruby-client supports streaming transfers into GCS.
+
+### Time Partitioning
+
+From 0.4.0, embulk-output-bigquery supports loading into a partitioned table.
+See also [Creating and Updating Date-Partitioned Tables](https://cloud.google.com/bigquery/docs/creating-partitioned-tables).
+
+To load into a partition, specify the `table` parameter with a partition decorator as:
+
+```yaml
+out:
+ type: bigquery
+ table: table_name$20160929
+ auto_create_table: true
+```
+
+You may configure the `time_partitioning` parameter together with the `auto_create_table: true` option to create the table as:
+
+```yaml
+out:
+ type: bigquery
+ table: table_name$20160929
+ auto_create_table: true
+ time_partitioning:
+ type: DAY
+ expiration_ms: 259200000
+```
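+
+In the example above, `expiration_ms: 259200000` keeps each partition's storage for 3 days (3 × 24 × 60 × 60 × 1000 ms).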
## Development
### Run example: