README.md in embulk-output-redshift-0.4.0 vs README.md in embulk-output-redshift-0.4.1

- old
+ new

@@ -1,97 +1,104 @@ -# Redshift output plugins for Embulk - -Redshift output plugins for Embulk loads records to Redshift. - -## Overview - -* **Plugin type**: output -* **Load all or nothing**: depnds on the mode. see bellow. -* **Resume supported**: depnds on the mode. see bellow. - -## Configuration - -- **host**: database host name (string, required) -- **port**: database port number (integer, default: 5439) -- **user**: database login user name (string, required) -- **password**: database login password (string, default: "") -- **database**: destination database name (string, required) -- **schema**: destination schema name (string, default: "public") -- **table**: destination table name (string, required) -- **options**: extra connection properties (hash, default: {}) -- **mode**: "replace" or "insert" (string, required) -- **batch_size**: size of a single batch insert (integer, default: 16777216) -- **default_timezone**: If input column type (embulk type) is timestamp, this plugin needs to format the timestamp into a SQL string. This default_timezone option is used to control the timezone. You can overwrite timezone for each columns using column_options option. (string, default: `UTC`) -- **column_options**: advanced: a key-value pairs where key is a column name and value is options for the column. - - **type**: type of a column when this plugin creates new tables (e.g. `VARCHAR(255)`, `INTEGER NOT NULL UNIQUE`). This used when this plugin creates intermediate tables (insert, truncate_insert and merge modes), when it creates the target table (insert_direct and replace modes), and when it creates nonexistent target table automatically. (string, default: depends on input column type. `BIGINT` if input column type is long, `BOOLEAN` if boolean, `DOUBLE PRECISION` if double, `CLOB` if string, `TIMESTAMP` if timestamp) - - **value_type**: This plugin converts input column type (embulk type) into a database type to build a INSERT statement. This value_type option controls the type of the value in a INSERT statement. (string, default: depends on input column type. Available values options are: `byte`, `short`, `int`, `long`, `double`, `float`, `boolean`, `string`, `nstring`, `date`, `time`, `timestamp`, `decimal`, `null`, `pass`) - - **timestamp_format**: If input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. This timestamp_format option is used to control the format of the timestamp. (string, default: `%Y-%m-%d %H:%M:%S.%6N`) - - **timezone**: If input column type (embulk type) is timestamp, this plugin needs to format the timestamp value into a SQL string. In this cases, this timezone option is used to control the timezone. (string, value of default_timezone option is used by default) - -### Modes - -* **insert**: - * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ...` query. - * Transactional: Yes. This mode successfully writes all rows, or fails with writing zero rows. - * Resumable: Yes. -* **insert_direct**: - * Behavior: This mode inserts rows to the target table directly. - * Transactional: No. If fails, the target table could have some rows inserted. - * Resumable: No. -* **truncate_insert**: - * Behavior: Same with `insert` mode excepting that it truncates the target table right before the last `INSERT ...` query. - * Transactional: Yes. - * Resumable: Yes. -* **merge**: - * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs `with updated AS (UPDATE .... RETURNING ...) INSERT INTO ....` query. - * Transactional: Yes. - * Resumable: Yes. -* **replace**: - * Behavior: Same with `insert` mode excepting that it truncates the target table right before the last `INSERT ...` query. - * Transactional: Yes. - * Resumable: No. - -### Example - -```yaml -out: - type: redshift - host: myinstance.us-west-2.redshift.amazonaws.com - user: pg - password: "" - database: my_database - table: my_table - access_key_id: ABCXYZ123ABCXYZ123 - secret_access_key: AbCxYz123aBcXyZ123 - s3_bucket: my-redshift-transfer-bucket - iam_user_name: my-s3-read-only - mode: insert -``` - -Advanced configuration: - -```yaml -out: - type: redshift - host: myinstance.us-west-2.redshift.amazonaws.com - user: pg - password: "" - database: my_database - table: my_table - access_key_id: ABCXYZ123ABCXYZ123 - secret_access_key: AbCxYz123aBcXyZ123 - s3_bucket: my-redshift-transfer-bucket - iam_user_name: my-s3-read-only - options: {loglevel: 2} - mode: insert_direct - column_options: - my_col_1: {type: 'VARCHAR(255)'} - my_col_3: {type: 'INT NOT NULL'} - my_col_4: {value_type: string, timestamp_format: `%Y-%m-%d %H:%M:%S %z`, timezone: '-0700'} - my_col_5: {type: 'DECIMAL(18,9)', value_type: pass} -``` - -### Build - -``` -$ ./gradlew gem -``` +# Redshift output plugins for Embulk + +Redshift output plugins for Embulk loads records to Redshift. + +## Overview + +* **Plugin type**: output +* **Load all or nothing**: depnds on the mode. see bellow. +* **Resume supported**: depnds on the mode. see bellow. + +## Configuration + +- **host**: database host name (string, required) +- **port**: database port number (integer, default: 5439) +- **user**: database login user name (string, required) +- **password**: database login password (string, default: "") +- **database**: destination database name (string, required) +- **schema**: destination schema name (string, default: "public") +- **table**: destination table name (string, required) +- **access_key_id**: access key id for AWS +- **secret_access_key: secret access key for AWS +- **iam_user_name: IAM user name for uploading temporary files to S3. The user should have permissions of `s3:GetObject`, `s3:PutObject`, `s3:ListBucket` and `sts:GetFederationToken`. +- **s3_bucket: S3 bucket name for temporary files +- **s3_key_prefix: S3 key prefix for temporary files (string, default:"") +- **options**: extra connection properties (hash, default: {}) +- **mode**: "replace" or "insert" (string, required) +- **batch_size**: size of a single batch insert (integer, default: 16777216) +- **default_timezone**: If input column type (embulk type) is timestamp, this plugin needs to format the timestamp into a SQL string. This default_timezone option is used to control the timezone. You can overwrite timezone for each columns using column_options option. (string, default: `UTC`) +- **column_options**: advanced: a key-value pairs where key is a column name and value is options for the column. + - **type**: type of a column when this plugin creates new tables (e.g. `VARCHAR(255)`, `INTEGER NOT NULL UNIQUE`). This used when this plugin creates intermediate tables (insert, truncate_insert and merge modes), when it creates the target table (insert_direct and replace modes), and when it creates nonexistent target table automatically. (string, default: depends on input column type. `BIGINT` if input column type is long, `BOOLEAN` if boolean, `DOUBLE PRECISION` if double, `CLOB` if string, `TIMESTAMP` if timestamp) + - **value_type**: This plugin converts input column type (embulk type) into a database type to build a INSERT statement. This value_type option controls the type of the value in a INSERT statement. (string, default: depends on input column type. Available values options are: `byte`, `short`, `int`, `long`, `double`, `float`, `boolean`, `string`, `nstring`, `date`, `time`, `timestamp`, `decimal`, `null`, `pass`) + - **timestamp_format**: If input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. This timestamp_format option is used to control the format of the timestamp. (string, default: `%Y-%m-%d %H:%M:%S.%6N`) + - **timezone**: If input column type (embulk type) is timestamp, this plugin needs to format the timestamp value into a SQL string. In this cases, this timezone option is used to control the timezone. (string, value of default_timezone option is used by default) + +### Modes + +* **insert**: + * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ...` query. + * Transactional: Yes. This mode successfully writes all rows, or fails with writing zero rows. + * Resumable: Yes. +* **insert_direct**: + * Behavior: This mode inserts rows to the target table directly. + * Transactional: No. If fails, the target table could have some rows inserted. + * Resumable: No. +* **truncate_insert**: + * Behavior: Same with `insert` mode excepting that it truncates the target table right before the last `INSERT ...` query. + * Transactional: Yes. + * Resumable: Yes. +* **merge**: + * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs `with updated AS (UPDATE .... RETURNING ...) INSERT INTO ....` query. + * Transactional: Yes. + * Resumable: Yes. +* **replace**: + * Behavior: Same with `insert` mode excepting that it truncates the target table right before the last `INSERT ...` query. + * Transactional: Yes. + * Resumable: No. + +### Example + +```yaml +out: + type: redshift + host: myinstance.us-west-2.redshift.amazonaws.com + user: pg + password: "" + database: my_database + table: my_table + access_key_id: ABCXYZ123ABCXYZ123 + secret_access_key: AbCxYz123aBcXyZ123 + iam_user_name: my-s3-read-only + s3_bucket: my-redshift-transfer-bucket + s3_key_prefix: temp/redshift + mode: insert +``` + +Advanced configuration: + +```yaml +out: + type: redshift + host: myinstance.us-west-2.redshift.amazonaws.com + user: pg + password: "" + database: my_database + table: my_table + access_key_id: ABCXYZ123ABCXYZ123 + secret_access_key: AbCxYz123aBcXyZ123 + iam_user_name: my-s3-read-only + s3_bucket: my-redshift-transfer-bucket + s3_key_prefix: temp/redshift + options: {loglevel: 2} + mode: insert_direct + column_options: + my_col_1: {type: 'VARCHAR(255)'} + my_col_3: {type: 'INT NOT NULL'} + my_col_4: {value_type: string, timestamp_format: `%Y-%m-%d %H:%M:%S %z`, timezone: '-0700'} + my_col_5: {type: 'DECIMAL(18,9)', value_type: pass} +``` + +### Build + +``` +$ ./gradlew gem +```