README.md in masking-0.0.2 vs README.md in masking-0.0.3
- old
+ new
@@ -1,23 +1,17 @@
# MasKING🤴
[](https://travis-ci.org/kibitan/masking)
[](https://coveralls.io/github/kibitan/masking?branch=master)
[](https://codeclimate.com/github/kibitan/masking/maintainability)
+[](https://badge.fury.io/rb/masking)
A command line tool for anonymizing database records: it parses a SQL dump file and builds a new SQL dump file with sensitive/credential data masked.
## Installation
```bash
-git clone git@github.com:kibitan/masking.git
-bin/setup
-```
-
-or install it yourself as:
-
-```bash
gem install masking
```
## Requirement
@@ -27,93 +21,128 @@
* MySQL 5.7...(TBC)
## Usage
-1. setup configuration of target columns to `masking.yml`
+1. Set up the configuration of target tables/columns to anonymize in `masking.yml`
- ```yaml
- # table_name:
- # column_name: masked_value
+ ```yaml
+ # table_name:
+ #   column_name: masked_value
- users:
-   string: anonymized string
-   email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number
-   integer: 12345
-   float: 123.45
-   boolean: true
-   null: null
-   date: 2018-08-24
-   time: 2018-08-24 15:54:06
-   binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html
-     R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
-     OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
-     +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
-     AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
- ```
+ users:
+   string: anonymized string
+   email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number
+   integer: 12345
+   float: 123.45
+   boolean: true
+   null: null
+   date: 2018-08-24
+   time: 2018-08-24 15:54:06
+   binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html
+     R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
+     OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
+     +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
+     AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
+ ```
-A value will be implicitly converted to compatible type. If you prefer to explicitly convert, you could use a tag as defined in [YAML Version 1.1](http://yaml.org/spec/current.html#id2503753)
+ Each value is implicitly converted to a compatible type. To convert a value explicitly, use a tag as defined in [YAML Version 1.1](http://yaml.org/spec/current.html#id2503753):
-```yaml
-not-date: !!str 2002-04-28
-```
+ ```yaml
+ not-date: !!str 2002-04-28
+ ```
-String should be matched with [MySQL String Type]( https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html). Integer/Float should be matched with [MySQL Numeric Type](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-overview.html). Date/Time should be matched with [MySQL Date and Time Type](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-type-overview.html).
+ String values should match a [MySQL String Type](https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html). Integer/Float values should match a [MySQL Numeric Type](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-overview.html). Date/Time values should match a [MySQL Date and Time Type](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-type-overview.html).
-*NOTE: MasKING doesn't check actual schema's type from dump. If you put uncomaptible value it will cause error during restoring to database.*
+ *NOTE: MasKING doesn't check the actual column types in the dump's schema. If you specify an incompatible value, an error will occur when restoring the dump to the database.*
-1. dump with mask
+1. Dump the database with anonymization
- MasKING works with `mysqldump --complete-insert`
+ MasKING works with `mysqldump --complete-insert`
- ```bash
- mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > masked_dump.sql
- ```
+ ```bash
+ mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > anonymized_dump.sql
+ ```
-1. restore
+1. Restore from the anonymized dump file
- ```bash
- mysql -u USERNAME MASKED_DATABASE_NAME < masked_dump.sql
- ```
+ ```bash
+ mysql -u USERNAME ANONYMIZED_DATABASE_NAME < anonymized_dump.sql
+ ```
+ Tip: If you don't need to keep the anonymized dump file, you can pipe the output directly into `mysql`. This can be faster because it avoids the I/O of writing and then reading an intermediate file.
+
+ ```bash
+ mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking | mysql -u USERNAME ANONYMIZED_DATABASE_NAME
+ ```
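
To illustrate how the implicit type conversion in `masking.yml` plays out, here is a minimal Ruby sketch that loads a subset of the sample config with the standard `yaml` library (Psych) and expands `%{n}`. It assumes `%{n}` behaves like Ruby's named `format` reference; the `not_date` key and the `mask_value` helper are hypothetical additions for illustration, not MasKING's implementation.

```ruby
require 'yaml'
require 'date'

# Subset of the sample masking.yml, plus a hypothetical `not_date` key using an explicit tag.
CONFIG = <<~YAML
  users:
    string: anonymized string
    email: anonymized+%{n}@example.com
    integer: 12345
    float: 123.45
    boolean: true
    date: 2018-08-24
    not_date: !!str 2002-04-28
YAML

# Psych's safe_load rejects Date objects unless they are explicitly permitted.
config = YAML.safe_load(CONFIG, permitted_classes: [Date])

# Hypothetical helper (assumption): substitute a sequence number into values containing %{n}.
def mask_value(value, n)
  value.is_a?(String) && value.include?('%{n}') ? format(value, n: n) : value
end

users = config.fetch('users')
# Print each column's loaded value and the Ruby class YAML chose for it.
users.each { |column, value| puts "#{column}: #{value.inspect} (#{value.class})" }
puts mask_value(users['email'], 1)
```

Untagged `2018-08-24` loads as a `Date`, while the `!!str`-tagged value stays a `String`.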
+
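
`mysqldump --complete-insert` writes the column names into every `INSERT` statement, which is presumably what lets a line-oriented tool locate target columns without a database connection. The Ruby sketch below extracts the table name and column list from such a line. The `INSERT_HEADER` regex and `parse_insert_header` helper are simplified, hypothetical illustrations (they ignore the `VALUES` payload and edge cases such as `)` inside column names), not MasKING's parser.

```ruby
# Hypothetical pattern for the header of a mysqldump --complete-insert line.
INSERT_HEADER = /\AINSERT INTO `(?<table>[^`]+)` \((?<columns>[^)]*)\) VALUES /

# Returns [table_name, [column_names]], or nil when the line is not an INSERT.
def parse_insert_header(line)
  match = INSERT_HEADER.match(line)
  return nil unless match

  # Split "`id`, `email`" into bare column names.
  columns = match[:columns].split(',').map { |col| col.strip.delete('`') }
  [match[:table], columns]
end

line = "INSERT INTO `users` (`id`, `email`, `name`) VALUES (1,'a@example.com','Alice');"
p parse_insert_header(line)
p parse_insert_header('CREATE TABLE `users` (')
```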
### options
```bash
$ masking -h
Usage: masking [options]
-c, --config=FILE_PATH specify config file. default: masking.yml
```
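
The documented `-c`/`--config` option could be handled with Ruby's standard `optparse` library. This is a minimal sketch that mirrors only the option and `masking.yml` default shown in the help output above; the `parse_options` method is hypothetical and not taken from MasKING's source.

```ruby
require 'optparse'

# Hypothetical option parsing mirroring the documented `masking -h` output.
def parse_options(argv)
  options = { config: 'masking.yml' } # documented default
  OptionParser.new do |opts|
    opts.banner = 'Usage: masking [options]'
    opts.on('-c', '--config=FILE_PATH', 'specify config file. default: masking.yml') do |path|
      options[:config] = path
    end
  end.parse!(argv)
  options
end

p parse_options([])
p parse_options(['-c', 'config/masking.yml'])
p parse_options(['--config=other.yml'])
```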
-## Run test & rubocop & notes
+## Use cases for an anonymized (production) database
+* Simulating database migrations to find problems before release
+
+Some schema-changing statements lock the table, which can cause trouble during a migration. Without a production-scale number of records, however, the migration finishes almost instantly, so the problem is easy to overlook.
+
+* Performance optimization of database queries
+
+Some database queries can be slow, but the slowness often isn't reproducible until you have a similar number of records and similar cardinality.
+
+* Finding bugs before releasing to production
+
+Some bugs are caused by unexpected data in production (for instance, very long text or invalid/malformed data) and might only be noticed after release.
+
+* Better development/demo of a feature
+
+Data similar to the real data gives a realistic view of how a feature looks. It makes it easier to find things to change or fix before releasing the feature or checking it in production.
+
+* Analyzing metrics on production data while respecting GDPR
+
+The anonymized database can also be used for BI and troubleshooting.
+
+* And… your idea here!
+
+## Development
+
```bash
+git clone git@github.com:kibitan/masking.git
+bin/setup
+```
+
+You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`.
+
+### Run test & rubocop & notes
+
+```bash
bundle exec rake
```
-### Protip
+#### Protip
It's useful to run `rake` in a pre-commit [Git hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks).
```bash
touch .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit && cat << EOF > .git/hooks/pre-commit
#!/usr/bin/env bash
bundle exec rake
EOF
```
-### [Markdown lint](https://github.com/markdownlint/markdownlint)
+#### [Markdown lint](https://github.com/markdownlint/markdownlint)
```bash
bundle exec mdl *.md
```
-## Development
-
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
-
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
-
### Profiling
use `bin/masking_profile`
```bash
@@ -125,22 +154,28 @@
$ open profile/flat.txt
```
see also: [ruby-prof/ruby-prof: ruby-prof: a code profiler for MRI rubies](https://github.com/ruby-prof/ruby-prof)
+### Benchmark
+
+use `bin/benchmark.rb`
+
+```bash
+$ bin/benchmark.rb
+ user system total real
+ 1.152776 0.207064 1.359840 ( 1.375090)
+```
+
## Design Concept
### KISS ~ keep it simple, stupid ~
No database connection, no file handling, only stdin/stdout. ~ Do One Thing and Do It Well ~
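
Working only with stdin/stdout means the tool can be shaped as a streaming filter. The sketch below shows that shape with plain IO objects: `INSERT` lines are passed to a block and every other line is copied through unchanged. The `filter_dump` method and the email-replacing block are hypothetical stand-ins for the real masking logic, meant only to illustrate the design idea.

```ruby
require 'stringio'

# Streams lines from input to output, applying the block only to INSERT lines.
# The block is a hypothetical stand-in for the real masking logic.
def filter_dump(input, output)
  input.each_line do |line|
    output.write(line.start_with?('INSERT INTO ') ? yield(line) : line)
  end
end

# Usage with in-memory IO; a real CLI would pass $stdin and $stdout instead.
dump = StringIO.new("CREATE TABLE `users` (\n" \
                    "INSERT INTO `users` (`email`) VALUES ('alice@example.com');\n")
out = StringIO.new
filter_dump(dump, out) { |line| line.gsub(/'[^']*@[^']*'/, "'anonymized@example.com'") }
print out.string
```

Because lines are processed one at a time, memory use stays flat regardless of dump size.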
### No External Dependency
Depends only on the language's standard libraries, with no external libraries (except for the development/test environment).
-
-### High Code Quality
-
-100% of code coverage [](https://coveralls.io/github/kibitan/masking?branch=master) and low complexity [](https://codeclimate.com/github/kibitan/masking/maintainability)
## Future Todo
* Pluggable/customizable masking methods, e.g. integration with [Faker](https://github.com/stympy/faker)
* Compatible with other RDBMS e.g. PostgreSQL, Oracle, SQL Server