# HDFS file input plugin for Embulk

Reads files on HDFS.

## Overview

* **Plugin type**: file input
* **Resume supported**: not yet
* **Cleanup supported**: no

## Configuration

- **config_files** list of paths to Hadoop's configuration files (array of strings, default: `[]`)
- **config** overwrites configuration parameters (hash, default: `{}`)
- **input_path** file path on HDFS. You can use glob patterns and date format directives such as `%Y%m%d/%s`.
- **rewind_seconds** when `input_path` contains date format directives, they are expanded using the current time minus this number of seconds.
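
To illustrate how `input_path` and `rewind_seconds` interact, here is a minimal sketch in Python. It is not the plugin's actual code: the function name `resolve_input_path` and the explicit `now` argument are hypothetical, and it assumes the date directives behave like standard `strftime` directives.

```python
from datetime import datetime, timedelta


def resolve_input_path(input_path: str, rewind_seconds: int, now: datetime) -> str:
    """Expand date directives in input_path using (now - rewind_seconds).

    Hypothetical helper for illustration only; the plugin's own
    implementation may differ.
    """
    target = now - timedelta(seconds=rewind_seconds)
    # Characters that are not strftime directives (such as the glob "*")
    # pass through unchanged.
    return target.strftime(input_path)


# With a fixed "now" of 2015-03-10 12:00:00 and a one-day rewind,
# the path points at the previous day's directory.
now = datetime(2015, 3, 10, 12, 0, 0)
print(resolve_input_path("/user/embulk/test/%Y-%m-%d/*", 86400, now))
# prints /user/embulk/test/2015-03-09/*
```

Under these assumptions, a daily job with `rewind_seconds: 86400` would read the directory for the previous day, and the trailing `*` is left for glob expansion.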

## Example

```yaml
in:
  type: hdfs
  config_files:
    - /opt/analytics/etc/hadoop/conf/core-site.xml
    - /opt/analytics/etc/hadoop/conf/hdfs-site.xml
  config:
    fs.defaultFS: 'hdfs://hdp-nn1:8020'
    dfs.replication: 1
    fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
    fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
  input_path: /user/embulk/test/%Y-%m-%d/*
  rewind_seconds: 86400
  decoders:
    - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: "\t"
    quote: ''
    escape: ''
    trim_if_not_quoted: true
    skip_header_lines: 0
    allow_extra_columns: true
    allow_optional_columns: true
    columns:
    - {name: c0, type: string}
    - {name: c1, type: string}
    - {name: c2, type: string}
    - {name: c3, type: long}
```

## Build

```
$ ./gradlew gem
```

## Development

```
$ ./gradlew classpath
$ bundle exec embulk run -I lib example.yml
```
