README.md in embulk-output-redshift-0.2.4 vs README.md in embulk-output-redshift-0.3.0

- old
+ new

@@ -1,47 +1,97 @@ -# Redshift output plugins for Embulk - -Redshift output plugins for Embulk loads records to Redshift. - -## Overview - -* **Plugin type**: output -* **Load all or nothing**: depnds on the mode: - * **insert**: no - * **replace**: yes -* **Resume supported**: no - -## Configuration - -- **host**: database host name (string, required) -- **port**: database port number (integer, default: 5439) -- **user**: database login user name (string, required) -- **password**: database login password (string, default: "") -- **database**: destination database name (string, required) -- **schema**: destination schema name (string, default: "public") -- **table**: destination table name (string, required) -- **mode**: "replace" or "insert" (string, required) -- **batch_size**: size of a single batch insert (integer, default: 16777216) -- **options**: extra connection properties (hash, default: {}) - -### Example - -```yaml -out: - type: redshift - host: myinstance.us-west-2.redshift.amazonaws.com - user: pg - password: "" - database: my_database - table: my_table - access_key_id: ABCXYZ123ABCXYZ123 - secret_access_key: AbCxYz123aBcXyZ123 - s3_bucket: my-redshift-transfer-bucket - iam_user_name: my-s3-read-only - mode: insert -``` - -### Build - -``` -$ ./gradlew gem -``` +# Redshift output plugins for Embulk + +Redshift output plugins for Embulk loads records to Redshift. + +## Overview + +* **Plugin type**: output +* **Load all or nothing**: depnds on the mode. see bellow. +* **Resume supported**: depnds on the mode. see bellow. + +## Configuration + +- **host**: database host name (string, required) +- **port**: database port number (integer, default: 5439) +- **user**: database login user name (string, required) +- **password**: database login password (string, default: "") +- **database**: destination database name (string, required) +- **schema**: destination schema name (string, default: "public") +- **table**: destination table name (string, required) +- **options**: extra connection properties (hash, default: {}) +- **mode**: "replace" or "insert" (string, required) +- **batch_size**: size of a single batch insert (integer, default: 16777216) +- **default_timezone**: If input column type (embulk type) is timestamp and destination column type is `string` or `nstring`, this plugin needs to format the timestamp into a string. This default_timezone option is used to control the timezone. You can overwrite timezone for each columns using column_options option. (string, default: `UTC`) +- **column_options**: advanced: a key-value pairs where key is a column name and value is options for the column. + - **type**: type of a column when this plugin creates new tables (e.g. `VARCHAR(255)`, `INTEGER NOT NULL UNIQUE`). This used when this plugin creates intermediate tables (insert, truncate_insert and merge modes), when it creates the target table (insert_direct and replace modes), and when it creates nonexistent target table automatically. (string, default: depends on input column type. `BIGINT` if input column type is long, `BOOLEAN` if boolean, `DOUBLE PRECISION` if double, `CLOB` if string, `TIMESTAMP` if timestamp) + - **value_type**: This plugin converts input column type (embulk type) into a database type to build a INSERT statement. This value_type option controls the type of the value in a INSERT statement. (string, default: depends on input column type. Available values options are: `byte`, `short`, `int`, `long`, `double`, `float`, `boolean`, `string`, `nstring`, `date`, `time`, `timestamp`, `decimal`, `null`, `pass`) + - **timestamp_format**: If input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. This timestamp_format option is used to control the format of the timestamp. (string, default: `%Y-%m-%d %H:%M:%S.%6N`) + - **timezone**: If input column type (embulk type) is timestamp and value_type is `string` or `nstring`, this plugin needs to format the timestamp value into a string. And if the input column type is timestamp and value_type is `date`, this plugin needs to consider timezone. In those cases, this timezone option is used to control the timezone. (string, value of default_timezone option is used by default) + +### Modes + +* **insert**: + * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ...` query. + * Transactional: Yes. This mode successfully writes all rows, or fails with writing zero rows. + * Resumable: Yes. +* **insert_direct**: + * Behavior: This mode inserts rows to the target table directly. + * Transactional: No. If fails, the target table could have some rows inserted. + * Resumable: No. +* **truncate_insert**: + * Behavior: Same with `insert` mode excepting that it truncates the target table right before the last `INSERT ...` query. + * Transactional: Yes. + * Resumable: Yes. +* **merge**: + * Behavior: This mode writes rows to some intermediate tables first. If all those tasks run correctly, runs `INSERT INTO <target_table> SELECT * FROM <intermediate_table_1> UNION ALL SELECT * FROM <intermediate_table_2> UNION ALL ... ON DUPLICATE KEY UPDATE ...` query. + * Transactional: Yes. + * Resumable: Yes. +* **replace**: + * Behavior: Same with `insert` mode excepting that it truncates the target table right before the last `INSERT ...` query. + * Transactional: Yes. + * Resumable: No. + +### Example + +```yaml +out: + type: redshift + host: myinstance.us-west-2.redshift.amazonaws.com + user: pg + password: "" + database: my_database + table: my_table + access_key_id: ABCXYZ123ABCXYZ123 + secret_access_key: AbCxYz123aBcXyZ123 + s3_bucket: my-redshift-transfer-bucket + iam_user_name: my-s3-read-only + mode: insert +``` + +Advanced configuration: + +```yaml +out: + type: redshift + host: myinstance.us-west-2.redshift.amazonaws.com + user: pg + password: "" + database: my_database + table: my_table + access_key_id: ABCXYZ123ABCXYZ123 + secret_access_key: AbCxYz123aBcXyZ123 + s3_bucket: my-redshift-transfer-bucket + iam_user_name: my-s3-read-only + options: {loglevel: 2} + mode: insert_direct + column_options: + my_col_1: {type: 'VARCHAR(255)'} + my_col_3: {type: 'INT NOT NULL'} + my_col_4: {value_type: string, timestamp_format: `%Y-%m-%d %H:%M:%S %z`, timezone: '-0700'} + my_col_5: {type: 'DECIMAL(18,9)', value_type: pass} +``` + +### Build + +``` +$ ./gradlew gem +```