# Mask filter plugin for Embulk
[![Coverage Status](https://coveralls.io/repos/github/beniyama/embulk-filter-mask/badge.svg)](https://coveralls.io/github/beniyama/embulk-filter-mask)

Masks column values with asterisks using a variety of patterns. The plugin is still in an early development phase and lacks some basic features needed for production use.
## Overview
* **Plugin type**: filter
## Configuration
*Caution*: Mask types such as `all` and `email` are now specified with `type`, not with `pattern` as in version 0.1.1 and earlier.
- **columns**: target columns whose values are replaced with asterisks (array, required)
  - **name**: name of the column (string, required)
  - **type**: mask type, one of `all`, `email`, `regex`, or `substring` (string, default: `all`)
  - **paths**: list of JSON paths and their mask types; applies only when the column type is JSON
    - `[{key: $.json_path1}, {key: $.json_path2}]` masks both the `$.json_path1` and `$.json_path2` nodes
    - Elements under the nodes are converted to strings and then masked (e.g., `[0,1,2]` -> `*******`)
  - **length**: if specified, the masked part is replaced with a fixed number of asterisks (integer, optional; supported only for the `all`, `email`, and `substring` types)
  - **pattern**: regex pattern such as `"[0-9]+"` (string, required for the `regex` type)
  - **start**: start index for the `substring` type, zero-based and inclusive (integer, default: 0)
  - **end**: end index for the `substring` type, exclusive (integer, default: length of the target value)
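
To make the interaction of `type`, `length`, `pattern`, `start`, and `end` concrete, here is a minimal Python sketch of the masking rules as described above. It is an illustration only, not the plugin's actual Java implementation. The function names are hypothetical. The handling of values without an `@`, of multi-character regex matches, and of out-of-range indexes are assumptions.

```python
import re


def mask_all(value, length=None):
    """Replace the whole value with asterisks (fixed count if `length` is set)."""
    return "*" * (len(value) if length is None else length)


def mask_email(value, length=None):
    """Mask the part before the first "@" and keep the domain."""
    local, at, domain = value.partition("@")
    if not at:
        # Assumption: a value without "@" is masked like type `all`.
        return mask_all(value, length)
    return mask_all(local, length) + at + domain


def mask_regex(value, pattern):
    """Replace every regex match with asterisks of the same length (assumed)."""
    return re.sub(pattern, lambda m: "*" * len(m.group()), value)


def mask_substring(value, start=0, end=None, length=None):
    """Mask value[start:end]; `start` is inclusive and `end` is exclusive."""
    # Assumption: indexes past the end of the value are clamped.
    end = len(value) if end is None else min(end, len(value))
    start = min(start, end)
    stars = "*" * ((end - start) if length is None else length)
    return value[:start] + stars + value[end:]


print(mask_all("Bell"))                                              # ****
print(mask_email("bell.benjamin_dummy@example.com", length=5))       # *****@example.com
print(mask_regex("Benjamin", "[a-z]"))                               # B*******
print(mask_substring("lucas.duncan_dummy@example.com", start=5, length=5))  # lucas*****
```

Note that `length` fixes only the number of asterisks: the kept parts (the e-mail domain, or the text outside `start`/`end`) are unchanged.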
## Example
If you have the following data in a CSV file or another format,

|first_name | last_name | gender | age | contact |
|---|---|---|---|---|
| Benjamin | Bell | male | 30 | bell.benjamin_dummy@example.com |
| Lucas | Duncan | male | 20 | lucas.duncan_dummy@example.com |
| Elizabeth | May | female | 25 | elizabeth.may_dummy@example.com |
| Christian | Reid | male | 15 | christian.reid_dummy@example.com |
| Amy | Avery | female | 40 | amy.avercy_dummy@example.com |

the following filter configuration
```yaml
filters:
  - type: mask
    columns:
      - { name: last_name }
      - { name: age }
      - { name: contact, type: email, length: 5 }
```
would produce

|first_name | last_name | gender | age | contact |
|---|---|---|---|---|
| Benjamin | **** | male | ** | *****@example.com |
| Lucas | ****** | male | ** | *****@example.com |
| Elizabeth | *** | female | ** | *****@example.com |
| Christian | **** | male | ** | *****@example.com |
| Amy | ***** | female | ** | *****@example.com |

With the `regex` and `substring` types, the following configuration
```yaml
filters:
  - type: mask
    columns:
      - { name: first_name, type: regex, pattern: "[a-z]" }
      - { name: contact, type: substring, start: 5, length: 5 }
```
would produce

|first_name | last_name | gender | age | contact |
|---|---|---|---|---|
| B******* | Bell | male | 30 | bell.***** |
| L**** | Duncan | male | 20 | lucas***** |
| E******** | May | female | 25 | eliza***** |
| C******** | Reid | male | 15 | chris***** |
| A** | Avery | female | 40 | amy.a***** |

JSON columns are also partially supported.

If you have a `user` column with the following JSON structure,
```json
{
  "full_name": {
    "first_name": "Benjamin",
    "last_name": "Bell"
  },
  "gender": "male",
  "age": 30,
  "email": "test_mail@example.com"
}
```
the following filter configuration
```yaml
filters:
  - type: mask
    columns:
      - { name: user, paths: [{key: $.full_name.first_name}, {key: $.email, type: email}] }
```
would produce
```json
{
  "full_name": {
    "first_name": "********",
    "last_name": "Bell"
  },
  "gender": "male",
  "age": 30,
  "email": "*********@example.com"
}
```
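
The JSON path behavior can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the plugin's code. The helper names `to_text`, `mask_text`, and `mask_paths` are hypothetical, and only simple `$.a.b` paths and the `all` and `email` types are handled. Non-string nodes are assumed to be serialized as compact JSON before masking, which matches the `[0,1,2]` -> `*******` example.

```python
import copy
import json


def to_text(node):
    """Strings are used as-is; other nodes are serialized compactly (assumed)."""
    if isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"), ensure_ascii=False)


def mask_text(text, mask_type="all"):
    """Only the `all` and `email` types are covered in this sketch."""
    if mask_type == "email" and "@" in text:
        local, at, domain = text.partition("@")
        return "*" * len(local) + at + domain
    return "*" * len(text)


def mask_paths(doc, paths):
    """Return a copy of `doc` with every `$.a.b`-style path masked."""
    result = copy.deepcopy(doc)
    for entry in paths:
        keys = entry["key"][2:].split(".")  # drop the leading "$."
        parent = result
        for key in keys[:-1]:
            # Walk down to the parent object; give up if the path is missing.
            parent = parent.get(key) if isinstance(parent, dict) else None
        leaf = keys[-1]
        if isinstance(parent, dict) and leaf in parent:
            parent[leaf] = mask_text(to_text(parent[leaf]), entry.get("type", "all"))
    return result


user = {
    "full_name": {"first_name": "Benjamin", "last_name": "Bell"},
    "gender": "male",
    "age": 30,
    "email": "test_mail@example.com",
}
paths = [{"key": "$.full_name.first_name"}, {"key": "$.email", "type": "email"}]
print(json.dumps(mask_paths(user, paths), indent=2))
```

Paths that do not exist in the document are skipped silently in this sketch. Whether the plugin does the same is not stated here.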
## Build
```
$ ./gradlew gem # -t to watch change of files and rebuild continuously
```