README.md in masking-0.0.2 vs README.md in masking-0.0.3

- old
+ new

@@ -1,23 +1,17 @@ # MasKING🤴 [![Build Status](https://travis-ci.org/kibitan/masking.svg?branch=master)](https://travis-ci.org/kibitan/masking) [![Coverage Status](https://coveralls.io/repos/github/kibitan/masking/badge.svg?branch=master)](https://coveralls.io/github/kibitan/masking?branch=master) [![Maintainability](https://api.codeclimate.com/v1/badges/290b3005ecc193a3d138/maintainability)](https://codeclimate.com/github/kibitan/masking/maintainability) +[![Gem Version](https://badge.fury.io/rb/masking.svg)](https://badge.fury.io/rb/masking) The command line tool for anonymizing database records by parsing a SQL dump file and build new SQL dump file with masking sensitive/credential data. ## Installation ```bash -git clone git@github.com:kibitan/masking.git -bin/setup -``` - -or install it yourself as: - -```bash gem install masking ``` ## Requirement @@ -27,93 +21,128 @@ * MySQL 5.7...(TBC) ## Usage -1. setup configuration of target columns to `masking.yml` +1. Setup configuration for anonymizing target tables/columns to `masking.yml` - ```yaml - # table_name: - # column_name: masked_value + ```yaml + # table_name: + # column_name: masked_value - users: - string: anonymized string - email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number - integer: 12345 - float: 123.45 - boolean: true - null: null - date: 2018-08-24 - time: 2018-08-24 15:54:06 - binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html - R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5 - OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+ - +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC - AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs= - ``` + users: + string: anonymized string + email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number + integer: 12345 + float: 123.45 + boolean: true + null: null + date: 2018-08-24 + time: 2018-08-24 15:54:06 + binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html + R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5 + OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+ + +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC + AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs= + ``` -A value will be implicitly converted to compatible type. If you prefer to explicitly convert, you could use a tag as defined in [YAML Version 1.1](http://yaml.org/spec/current.html#id2503753) + A value will be implicitly converted to compatible type. If you prefer to explicitly convert, you could use a tag as defined in [YAML Version 1.1](http://yaml.org/spec/current.html#id2503753) -```yaml -not-date: !!str 2002-04-28 -``` + ```yaml + not-date: !!str 2002-04-28 + ``` -String should be matched with [MySQL String Type]( https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html). Integer/Float should be matched with [MySQL Numeric Type](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-overview.html). Date/Time should be matched with [MySQL Date and Time Type](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-type-overview.html). + String should be matched with [MySQL String Type]( https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html). Integer/Float should be matched with [MySQL Numeric Type](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-overview.html). Date/Time should be matched with [MySQL Date and Time Type](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-type-overview.html). -*NOTE: MasKING doesn't check actual schema's type from dump. If you put uncomaptible value it will cause error during restoring to database.* + *NOTE: MasKING doesn't check actual schema's type from dump. If you put uncomaptible value it will cause error during restoring to database.* -1. dump with mask +1. Dump database with anonymizing - MasKING works with `mysqldump --complete-insert` + MasKING works with `mysqldump --complete-insert` - ```bash - mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > masked_dump.sql - ``` + ```bash + mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > anonymized_dump.sql + ``` -1. restore +1. Restore from anonymized dump file - ```bash - mysql -u USERNAME MASKED_DATABASE_NAME < masked_dump.sql - ``` + ```bash + mysql -u USERNAME ANONYMIZED_DATABASE_NAME < anonymized_dump.sql + ``` + Tip: If you don't need to have anonymized dump file, you can directly insert from stream. It can be faster because it has less IO interaction. + + ```bash + mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking | mysql -u USERNAME ANONYMIZED_DATABASE_NAME + ``` + ### options ```bash $ masking -h Usage: masking [options] -c, --config=FILE_PATH specify config file. default: masking.yml ``` -## Run test & rubocop & notes +## Use case of annonymized (production) database +* Simulate for database migration and find a problem before release + +Some schema changing statement will lock table and it will cause trouble during the migration. But, without having a large number of record such as production, a migration will finish at the moment and easy to overlook. + +* Performance optimization of database queries + +Some database query can be slow, but some query isn't reproducible until you have similar amount of records/cardinality. + +* Finding bug before release on production + +Some bugs are related to unexpected data in production (for instance so long text, invalid/not-well formatted data) and it might be noticed after releasing in production. + +* Better development/demo of a feature + +Using similar data with real one will be good to make a good view of how feature looks like. It makes easy to find out the things to be changed/fixed before release/check the feature in production. + +* Analyze metrics on our production data with respecting GDPR + +We can use this database for BI and some trouble shooting. + +* And… your idea here! + +## Development + ```bash +git clone git@github.com:kibitan/masking.git +bin/setup +``` + +You can also run `bin/console` for an interactive prompt that will allow you to experiment. + +To install this gem onto your local machine, run `bundle exec rake install`. + +### Run test & rubocop & notes + +```bash bundle exec rake ``` -### Protip +#### Protip It's useful that set `rake` on [Git hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks). ```bash touch .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit && cat << EOF > .git/hooks/pre-commit #!/usr/bin/env bash bundle exec rake EOF ``` -### [Markdown lint](https://github.com/markdownlint/markdownlint) +#### [Markdown lint](https://github.com/markdownlint/markdownlint) ```bash bundle exec mdl *.md ``` -## Development - -After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. - -To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org). - ### Profiling use `bin/masking_profile` ```bash @@ -125,22 +154,28 @@ $ open profile/flat.txt ``` see also: [ruby-prof/ruby-prof: ruby-prof: a code profiler for MRI rubies](https://github.com/ruby-prof/ruby-prof) +### Benchmark + +use `bin/benchmark.rb` + +```bash +$ bin/benchmark.rb + user system total real + 1.152776 0.207064 1.359840 ( 1.375090) +``` + ## Design Concept ### KISS ~ keep it simple, stupid ~ No connection to database, No handling file, Only dealing with stdin/stdout. ~ Do One Thing and Do It Well ~ ### No External Dependency Depend on only pure language standard libraries, no external libraries. (except development/test environment) - -### High Code Quality - -100% of code coverage [![Coverage Status](https://coveralls.io/repos/github/kibitan/masking/badge.svg?branch=master)](https://coveralls.io/github/kibitan/masking?branch=master) and low complexity [![Maintainability](https://api.codeclimate.com/v1/badges/290b3005ecc193a3d138/maintainability)](https://codeclimate.com/github/kibitan/masking/maintainability) ## Future Todo * Pluguable/customizable for a mask way e.g. integrate with [Faker](https://github.com/stympy/faker) * Compatible with other RDBMS e.g. PostgreSQL, Oracle, SQL Server