README.md in embulk-output-bigquery-0.3.7 vs README.md in embulk-output-bigquery-0.4.0

- old
+ new

@@ -42,11 +42,11 @@
| service_account_email | string | required when auth_method is private_key | | Your Google service account email |
| p12_keyfile | string | required when auth_method is private_key | | Fullpath of private key in P12(PKCS12) format |
| json_keyfile | string | required when auth_method is json_key | | Fullpath of json key |
| project | string | required if json_keyfile is not given | | project_id |
| dataset | string | required | | dataset |
-| table | string | required | | table name |
+| table | string | required | | table name, or table name with a partition decorator such as `table_name$20160929` |
| auto_create_dataset | boolean | optional | false | automatically create dataset |
| auto_create_table | boolean | optional | false | See [Dynamic Table Creating](#dynamic-table-creating) |
| schema_file | string | optional | | /path/to/schema.json |
| template_table | string | optional | | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |
| prevent_duplicate_insert | boolean | optional | false | See [Prevent Duplication](#prevent-duplication) |
@@ -61,10 +61,11 @@
| default_timestamp_format | string | optional | %Y-%m-%d %H:%M:%S.%6N | |
| payload_column | string | optional | nil | See [Formatter Performance Issue](#formatter-performance-issue) |
| payload_column_index | integer | optional | nil | See [Formatter Performance Issue](#formatter-performance-issue) |
| gcs_bucket | string | optional | nil | See [GCS Bucket](#gcs-bucket) |
| auto_create_gcs_bucket | boolean | optional | false | See [GCS Bucket](#gcs-bucket) |
+| progress_log_interval | float | optional | nil (disabled) | Progress log interval. The progress log is disabled when nil (default). NOTE: This option may be removed in the future because a filter plugin can achieve the same goal |

Client or request options

| name | type | required? | default | description |
|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|
@@ -85,22 +86,25 @@
| delete_from_local_when_job_end | boolean | optional | true | If set to true, delete generated local files when the job ends |
| compression | string | optional | "NONE" | Compression of local files (`GZIP` or `NONE`) |

`source_format` is also used to determine formatter (csv or jsonl).

-#### Same options of bq command-line tools or BigQuery job's propery
+#### Same options of bq command-line tools or BigQuery job's property

The following options are the same as [bq command-line tools](https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile) or BigQuery [job's property](https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource).

-| name | type | required? | default | description |
-|:--------------------------|:------------|:-----------|:-------------|:-----------------------|
-| source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
-| max_bad_records | int | optional | 0 | |
-| field_delimiter | char | optional | "," | |
-| encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
-| ignore_unknown_values | boolean | optional | 0 | |
-| allow_quoted_newlines | boolean | optional | 0 | Set true, if data contains newline characters. It may cause slow procsssing |
+| name | type | required? | default | description |
+|:----------------------------------|:---------|:----------|:--------|:-----------------------|
+| source_format | string | required | "CSV" | File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |
+| max_bad_records | int | optional | 0 | |
+| field_delimiter | char | optional | "," | |
+| encoding | string | optional | "UTF-8" | `UTF-8` or `ISO-8859-1` |
+| ignore_unknown_values | boolean | optional | false | |
+| allow_quoted_newlines | boolean | optional | false | Set to true if data contains newline characters; it may cause slow processing |
+| time_partitioning | hash | optional | nil | See [Time Partitioning](#time-partitioning) |
+| time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
+| time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition |

### Example

```yaml
out:
@@ -121,36 +125,36 @@
5 modes are provided.

##### append

1. Load to temporary table.
-2. Copy temporary table to destination table. (WRITE_APPEND)
+2. Copy temporary table to destination table (or partition). (WRITE_APPEND)

##### append_direct

-Insert data into existing table directly.
+Insert data into existing table (or partition) directly.
This is not transactional, i.e., if it fails, the target table could have some rows inserted.

##### replace

1. Load to temporary table.
-2. Copy temporary table to destination table. (WRITE_TRUNCATE)
+2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)

```is_skip_job_result_check``` must be false in replace mode.

##### replace_backup

1. Load to temporary table.
-2. Copy destination table to backup table. (dataset_old, table_old)
-3. Copy temporary table to destination table. (WRITE_TRUNCATE)
+2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
+3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)

```is_skip_job_result_check``` must be false in replace_backup mode.

##### delete_in_advance

-1. Delete destination table, if it exists.
-2. Load to destination table.
+1. Delete destination table (or partition), if it exists.
+2. Load to destination table (or partition).

### Authentication

Three methods are supported to fetch an access token for the service account.
@@ -363,9 +367,35 @@
  gcs_bucket: bucket_name
  auto_create_gcs_bucket: false
```

ToDo: Use https://cloud.google.com/storage/docs/streaming if google-api-ruby-client supports streaming transfers into GCS.
+
+### Time Partitioning
+
+From 0.4.0, embulk-output-bigquery supports loading into a partitioned table.
+See also [Creating and Updating Date-Partitioned Tables](https://cloud.google.com/bigquery/docs/creating-partitioned-tables).
+
+To load into a partition, specify the `table` parameter with a partition decorator as:
+
+```yaml
+out:
+  type: bigquery
+  table: table_name$20160929
+  auto_create_table: true
+```
+
+You may also configure the `time_partitioning` parameter to create the table via the `auto_create_table: true` option as:
+
+```yaml
+out:
+  type: bigquery
+  table: table_name$20160929
+  auto_create_table: true
+  time_partitioning:
+    type: DAY
+    expiration_ms: 259200000
+```

## Development

### Run example:
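
Putting the 0.4.0 additions together, a configuration that loads into a daily partition could look like the following sketch. This is illustrative only, not taken from the README: the project, dataset, table, and keyfile values are placeholders, and the option names come from the tables above.

```yaml
out:
  type: bigquery
  mode: replace                          # copies the temporary table into the partition with WRITE_TRUNCATE
  auth_method: json_key
  json_keyfile: /path/to/keyfile.json    # placeholder path
  project: your-project-id               # placeholder project id
  dataset: your_dataset                  # placeholder dataset
  table: events$20160929                 # partition decorator selects the daily partition (placeholder table name)
  auto_create_table: true
  source_format: NEWLINE_DELIMITED_JSON
  time_partitioning:
    type: DAY                            # DAY is the only supported type
    expiration_ms: 259200000             # keep partition storage for 3 days (milliseconds)
```

As noted above, `is_skip_job_result_check` must remain false when using `replace` mode.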