# UTF8Parquet output plugin for Embulk

**This is a clone of https://github.com/choplin/embulk-output-parquet/**

It adds support for storing string columns as UTF-8 instead of plain binary fields.


## Overview

* **Plugin type**: output
* **Load all or nothing**: no
* **Resume supported**: no
* **Cleanup supported**: no

## Install
```
embulk gem install embulk-output-utf8parquet
```

## Configuration

- **path_prefix**: Prefix of the output path. This is a Hadoop Path URI, so it can also include a `scheme` and `authority`. (string, required)
- **file_ext**: Extension of the output files. (string, default: .parquet)
- **sequence_format**: printf-style format of the sequence number in the output file name. (string, default: .%03d)
- **block_size**: Block size of the Parquet file. (int, default: 134217728 (128 MB))
- **page_size**: Page size of the Parquet file. (int, default: 1048576 (1 MB))
- **compression_codec**: Compression codec. Available: UNCOMPRESSED, SNAPPY, GZIP. (string, default: UNCOMPRESSED)
- **default_timezone**: Time zone of timestamp columns. This can be overridden for each column using column_options.
- **default_timestamp_format**: Format of timestamp columns. This can be overridden for each column using column_options.
- **column_options**: Specify the time zone and timestamp format for each column. The format of this option is the same as in the official csv formatter. See the [documentation](http://www.embulk.org/docs/built-in.html#csv-formatter-plugin).
- **extra_configurations**: Extra entries to add to the `Configuration` that is passed to ParquetWriter.
- **overwrite**: Overwrite output files if they already exist. (default: fail if files exist)
- **addUTF8**: If true, string columns are stored with OriginalType.UTF8. (boolean, default: false)

## Example

```yaml
out:
  type: parquet
  path_prefix: file:///data/output
```
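
A fuller sketch that combines several of the options from the Configuration section, including `addUTF8`. The column name `created_at` and all values are hypothetical and only illustrate the shape of the settings. The `column_options` layout assumes the same per-column map used by the csv formatter.

```yaml
out:
  type: parquet                      # plugin type as written in the example above
  path_prefix: file:///data/output
  file_ext: .parquet
  compression_codec: SNAPPY          # one of UNCOMPRESSED, SNAPPY, GZIP
  overwrite: true                    # replace existing output files instead of failing
  addUTF8: true                      # store string columns with OriginalType.UTF8
  default_timezone: UTC
  default_timestamp_format: '%Y-%m-%d %H:%M:%S'
  column_options:
    created_at:                      # hypothetical timestamp column
      format: '%Y-%m-%d'
      timezone: UTC
```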

### How to write Parquet files to S3

```yaml
out:
  type: parquet
  path_prefix: s3a://bucket/keys
  extra_configurations:
    fs.s3a.access.key: 'your_access_key'
    fs.s3a.secret.key: 'your_secret_access_key'
```

## Build

```
$ ./gradlew gem
```
