README.md in embulk-output-bigquery-0.3.0 vs README.md in embulk-output-bigquery-0.3.1
- old
+ new
@@ -37,11 +37,11 @@
| table | string | required | | table name |
| auto_create_dataset | boolean | optional | false | automatically create dataset |
| auto_create_table | boolean | optional | false | [See below](#dynamic-table-creating) |
| schema_file | string | optional | | /path/to/schema.json |
| template_table | string | optional | | template table name [See below](#dynamic-table-creating) |
-| prevent_duplicate_insert | boolean | optional | false | [See below](#data-consistency) |
+| prevent_duplicate_insert | boolean | optional | false | [See below](#prevent-duplication) |
| job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
| job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
| is_skip_job_result_check | boolean | optional | false | Skip waiting for the load job to finish. Available for append or delete_in_advance mode |
| with_rehearsal | boolean | optional | false | Load `rehearsal_counts` records as a rehearsal. The rehearsal loads into a temporary REHEARSAL table, which is deleted at the end. Use this option to detect data errors as early as possible |
| rehearsal_counts | integer | optional | 1000 | Specify number of records to load in a rehearsal |
@@ -57,10 +57,11 @@
|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
| timeout_sec | integer | optional | 300 | Seconds to wait for one block to be read |
| open_timeout_sec | integer | optional | 300 | Seconds to wait for the connection to open |
| retries | integer | optional | 5 | Number of retries |
| application_name | string | optional | "Embulk BigQuery plugin" | User-Agent |
+| sdk_log_level | string | optional | nil (WARN) | Log level of the Google API client library |
Options for intermediate local files
| name | type | required? | default | description |
|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
@@ -315,18 +316,18 @@
out:
type: bigquery
payload_column_index: 0 # or, payload_column: payload
```
-### Data Consistency
+### Prevent Duplication
-When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options to prevent duplicate data insertion.
+The `prevent_duplicate_insert` option prevents the same data from being inserted more than once in the `append` and `append_direct` modes.
+When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generates the job ID from the md5 hash of the file and other options.
+
`job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-[job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency). So same data can't insert with same settings.
-
-In other words, you can retry as many times as you like, in case something bad error(like network error) happens before job insertion.
+[A job ID must be unique (including failed jobs)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency), so the same data cannot be inserted repeatedly with the same settings.
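The job ID formula above can be illustrated with a minimal Python sketch. It is not the plugin's own (Ruby) implementation: the function names `md5_hex` and `make_job_id` are hypothetical, and the way each field is serialized (plain string concatenation, the schema as key-sorted JSON) is an assumption made for illustration. What it does show is the property the option relies on: identical file contents plus identical settings always yield the same ID, so BigQuery rejects the repeated load job.

```python
import hashlib
import json


def md5_hex(data: bytes) -> str:
    """Return the hex-encoded MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def make_job_id(file_bytes: bytes, dataset: str, table: str, schema: list,
                source_format: str, file_delimiter: str, max_bad_records: int,
                encoding: str, ignore_unknown_values: bool,
                allow_quoted_newlines: bool) -> str:
    """Hypothetical sketch of md5(md5(file) + dataset + table + ...)."""
    parts = [
        md5_hex(file_bytes),  # inner md5(file)
        dataset,
        table,
        # ASSUMPTION: schema serialized as compact, key-sorted JSON
        json.dumps(schema, sort_keys=True, separators=(",", ":")),
        source_format,
        file_delimiter,
        str(max_bad_records),
        encoding,
        str(ignore_unknown_values),
        str(allow_quoted_newlines),
    ]
    # ASSUMPTION: fields are joined by plain string concatenation
    return md5_hex("".join(parts).encode("utf-8"))


if __name__ == "__main__":
    schema = [{"name": "id", "type": "INTEGER"}]
    settings = ("mydataset", "mytable", schema, "CSV", ",", 0, "UTF-8", False, False)
    first = make_job_id(b"1\n2\n", *settings)
    retry = make_job_id(b"1\n2\n", *settings)
    # A retry of the same file with the same settings reuses the job ID
    print(first == retry)
```

Because the ID depends only on the file contents and settings, retrying after an error such as a network failure is safe: a retry either creates the job for the first time or collides with the existing job ID and is rejected.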
```yaml
out:
type: bigquery
prevent_duplicate_insert: true