Sha256: 917956d18af3dc728be3430796a503162504289f1b4a7dd95a51c8ea152a120d
Contents?: true
Size: 1.57 KB
Versions: 1
Stored size: 1.57 KB
Contents
# Parquet output plugin for Embulk

## Overview

* **Plugin type**: output
* **Load all or nothing**: no
* **Resume supported**: no
* **Cleanup supported**: no

## Configuration

- **path_prefix**: Prefix of the output path. This is a Hadoop Path URI, so it may include a `scheme` and an `authority`. (string, required)
- **file_ext**: Extension of the output path. (string, default: .parquet)
- **sequence_format**: (string, default: .%03d)
- **block_size**: Block size of the Parquet file. (int, default: 134217728 (128M))
- **page_size**: Page size of the Parquet file. (int, default: 1048576 (1M))
- **compression_codec**: Compression codec. Available: UNCOMPRESSED, SNAPPY, GZIP. (string, default: UNCOMPRESSED)
- **default_timezone**: Time zone of timestamp columns. This can be overridden for each column using column_options.
- **default_timestamp_format**: Format of timestamp columns. This can be overridden for each column using column_options.
- **column_options**: Specifies the time zone and timestamp format for each column. The format of this option is the same as that of the official CSV formatter. See the [documentation](http://www.embulk.org/docs/built-in.html#csv-formatter-plugin).
- **extra_configurations**: Extra entries to add to the Configuration that is passed to ParquetWriter.
- **overwrite**: Overwrite output files if they already exist. (default: fail if files exist)

## Example

```yaml
out:
  type: parquet
  path_prefix: s3a://bucket/keys
  extra_configurations:
    fs.s3a.access.key: 'your_access_key'
    fs.s3a.secret.key: 'your_secret_access_key'
```

## Build

```
$ ./gradlew gem
```
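As a further illustration of the options listed under Configuration, the sketch below combines several of them in one `out` section. It is not taken from the plugin's documentation: the local `file://` path, the `created_at` column name, the time zone, and the timestamp formats are all hypothetical, and it assumes that `overwrite` accepts a boolean and that `column_options` takes the per-column `format` and `timezone` keys used by the CSV formatter.

```yaml
out:
  type: parquet
  # Hypothetical local destination; any Hadoop Path URI should work here.
  path_prefix: file:///tmp/embulk/output
  file_ext: .parquet
  sequence_format: '.%03d'
  compression_codec: SNAPPY
  # Defaults applied to every timestamp column (values are assumptions).
  default_timezone: 'UTC'
  default_timestamp_format: '%Y-%m-%d %H:%M:%S'
  column_options:
    # Hypothetical column name; overrides the defaults for this column only.
    created_at: {format: '%Y-%m-%d', timezone: 'Asia/Tokyo'}
  # Assumed to be a boolean; when omitted, the plugin fails if files exist.
  overwrite: true
```

Options left out, such as `block_size` and `page_size`, fall back to the defaults given in the list above.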
Version data entries
1 entries across 1 versions & 1 rubygems
Version | Path |
---|---|
embulk-output-parquet-0.4.0 | README.md |