# UTF8Parquet output plugin for Embulk

This is a clone of https://github.com/choplin/embulk-output-parquet/ with added support for storing string fields as UTF-8 instead of binary.

## Overview

* **Plugin type**: output
* **Load all or nothing**: no
* **Resume supported**: no
* **Cleanup supported**: no

## Install

```
embulk gem install embulk-output-utf8parquet
```

## Configuration

- **path_prefix**: A prefix of the output path. This is a Hadoop Path URI, so you can also include the `scheme` and `authority` in this parameter. (string, required)
- **file_ext**: An extension of the output path. (string, default: .parquet)
- **sequence_format**: (string, default: .%03d)
- **block_size**: The block size of the parquet file. (int, default: 134217728 (128M))
- **page_size**: The page size of the parquet file. (int, default: 1048576 (1M))
- **compression_codec**: A compression codec. Available: UNCOMPRESSED, SNAPPY, GZIP (string, default: UNCOMPRESSED)
- **default_timezone**: Time zone of timestamp columns. This can be overridden for each column using column_options
- **default_timestamp_format**: Format of timestamp columns. This can be overridden for each column using column_options
- **column_options**: Specify the timezone and timestamp format for each column. The format of this option is the same as the official csv formatter. See the [document](http://www.embulk.org/docs/built-in.html#csv-formatter-plugin).
- **extra_configurations**: Add extra entries to the Configuration that is passed to ParquetWriter
- **overwrite**: Overwrite output files if they already exist. (default: fail if files exist)
- **addUTF8**: If true, string columns are stored with OriginalType.UTF8 (boolean, default: false)

## Example

```yaml
out:
  type: parquet
  path_prefix: file:///data/output
```

### How to write parquet files into S3

```yaml
out:
  type: parquet
  path_prefix: s3a://bucket/keys
  extra_configurations:
    fs.s3a.access.key: 'your_access_key'
    fs.s3a.secret.key: 'your_secret_access_key'
```

## Build

```
$ ./gradlew gem
```
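
Returning to the configuration options listed above, the following is a minimal sketch of an `out:` section that combines several of them. The column name `created_at`, the chosen time zones, and the timestamp format are hypothetical values for illustration only. It also assumes that `overwrite` takes a boolean and that `column_options` follows the csv formatter layout (a map of column name to `format` and `timezone`).

```yaml
out:
  type: parquet                      # plugin type as written in the examples above
  path_prefix: file:///data/output
  file_ext: .parquet                 # default value, shown for clarity
  sequence_format: .%03d             # default value, shown for clarity
  compression_codec: SNAPPY          # one of UNCOMPRESSED, SNAPPY, GZIP
  addUTF8: true                      # store string columns with OriginalType.UTF8
  overwrite: true                    # assumption: boolean flag
  default_timezone: UTC              # applies to all timestamp columns unless overridden
  column_options:
    # hypothetical column; overrides the defaults for this column only
    created_at: {format: '%Y-%m-%d %H:%M:%S', timezone: 'Asia/Tokyo'}
```

Leaving `addUTF8` at its default of false keeps string columns as plain binary, so the flag has to be set explicitly to get UTF-8 annotated strings.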