README.md in embulk-output-bigquery-0.3.0 vs README.md in embulk-output-bigquery-0.3.1

- old
+ new

@@ -37,11 +37,11 @@
 | table | string | required | | table name |
 | auto_create_dataset | boolean | optional | false | automatically create dataset |
 | auto_create_table | boolean | optional | false | [See below](#dynamic-table-creating) |
 | schema_file | string | optional | | /path/to/schema.json |
 | template_table | string | optional | | template table name [See below](#dynamic-table-creating) |
-| prevent_duplicate_insert | boolean | optional | false | [See below](#data-consistency) |
+| prevent_duplicate_insert | boolean | optional | false | [See below](#prevent-duplication) |
 | job_status_max_polling_time | int | optional | 3600 sec | Max job status polling time |
 | job_status_polling_interval | int | optional | 10 sec | Job status polling interval |
 | is_skip_job_result_check | boolean | optional | false | Skip waiting Load job finishes. Available for append, or delete_in_advance mode |
 | with_rehearsal | boolean | optional | false | Load `rehearsal_counts` records as a rehearsal. Rehearsal loads into REHEARSAL temporary table, and delete finally. You may use this option to investigate data errors as early stage as possible |
 | rehearsal_counts | integer | optional | 1000 | Specify number of records to load in a rehearsal |
@@ -57,10 +57,11 @@
 |:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
 | timeout_sec | integer | optional | 300 | Seconds to wait for one block to be read |
 | open_timeout_sec | integer | optional | 300 | Seconds to wait for the connection to open |
 | retries | integer | optional | 5 | Number of retries |
 | application_name | string | optional | "Embulk BigQuery plugin" | User-Agent |
+| sdk_log_level | string | optional | nil (WARN) | Log level of google api client library |

 Options for intermediate local files

 | name | type | required? | default | description |
 |:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
@@ -315,18 +316,18 @@
 out:
   type: bigquery
   payload_column_index: 0 # or, payload_column: payload
 ```

-### Data Consistency
+### Prevent Duplication

-When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options to prevent duplicate data insertion.
+`prevent_duplicate_insert` option is used to prevent inserting same data for modes `append` or `append_direct`.

+When `prevent_duplicate_insert` is set to true, embulk-output-bigquery generate job ID from md5 hash of file and other options.
+
 `job ID = md5(md5(file) + dataset + table + schema + source_format + file_delimiter + max_bad_records + encoding + ignore_unknown_values + allow_quoted_newlines)`
-[job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency). So same data can't insert with same settings.
-
-In other words, you can retry as many times as you like, in case something bad error(like network error) happens before job insertion.
+[job ID must be unique(including failures)](https://cloud.google.com/bigquery/loading-data-into-bigquery#consistency) so that same data can't be inserted with same settings repeatedly.

 ```yaml
 out:
   type: bigquery
   prevent_duplicate_insert: true
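
The `job ID = md5(md5(file) + dataset + table + ...)` formula in this diff can be illustrated with a short Python sketch. This is not the plugin's code. `file_md5`, `job_id` and the `settings` values are hypothetical. The sketch assumes that the inner `md5(file)` term is the hex digest of the file bytes, that every term is stringified with `str()`, and that the terms are concatenated in the documented order with no separator. The plugin may serialize `schema` and the other options differently, so these IDs will not match the ones it generates.

```python
import hashlib


def file_md5(data: bytes) -> str:
    """Hex MD5 digest of the raw file bytes: the inner md5(file) term."""
    return hashlib.md5(data).hexdigest()


def job_id(file_bytes: bytes, *, dataset: str, table: str, schema: str,
           source_format: str, file_delimiter: str, max_bad_records: int,
           encoding: str, ignore_unknown_values: bool,
           allow_quoted_newlines: bool) -> str:
    """Hypothetical sketch of md5(md5(file) + dataset + table + ...).

    ASSUMPTION: every term is converted with str() and the terms are joined
    in the documented order with no separator; the plugin may serialize
    them differently.
    """
    terms = [
        file_md5(file_bytes), dataset, table, schema, source_format,
        file_delimiter, str(max_bad_records), encoding,
        str(ignore_unknown_values), str(allow_quoted_newlines),
    ]
    return hashlib.md5("".join(terms).encode("utf-8")).hexdigest()


# Hypothetical load settings, for illustration only.
settings = dict(
    dataset="my_dataset", table="my_table",
    schema='[{"name": "id", "type": "INTEGER"}]',
    source_format="NEWLINE_DELIMITED_JSON", file_delimiter=",",
    max_bad_records=0, encoding="UTF-8",
    ignore_unknown_values=False, allow_quoted_newlines=False,
)

data = b'{"id": 1}\n'
first = job_id(data, **settings)
retry = job_id(data, **settings)  # same file + same settings
other = job_id(data, **{**settings, "table": "other_table"})  # one option changed

print(first == retry, first != other)  # True True
```

Because the ID depends only on the file contents and these settings, re-running the same load reproduces the same job ID. BigQuery refuses that ID because job IDs must be unique, including those of failed jobs. Changing any listed option, such as `table`, yields a new ID and therefore a new load job.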