= MyObfuscate
You want to develop against real production data, but you don't want to violate your users' privacy. Enter MyObfuscate: standalone Ruby code for the selective rewriting of SQL dumps in order to protect user privacy. It supports MySQL, Postgres, and SQL Server.
= Install
(sudo) gem install my_obfuscate
= Example Usage
Make an obfuscator.rb script:
#!/usr/bin/env ruby
require "rubygems"
require "my_obfuscate"
obfuscator = MyObfuscate.new({
:people => {
:email => { :type => :email, :skip_regexes => [/^[\w\.\_]+@my_company\.com$/i] },
:ethnicity => :keep,
:crypted_password => { :type => :fixed, :string => "SOME_FIXED_PASSWORD_FOR_EASE_OF_DEBUGGING" },
:salt => { :type => :fixed, :string => "SOME_THING" },
:remember_token => :null,
:remember_token_expires_at => :null,
:age => { :type => :null, :unless => lambda { |person| person[:email] == "hello@example.com" } },
:photo_file_name => :null,
:photo_content_type => :null,
:photo_file_size => :null,
:photo_updated_at => :null,
:postal_code => { :type => :fixed, :string => "94109", :unless => lambda {|person| person[:postal_code] == "12345"} },
:name => :name,
:full_address => :address,
:bio => { :type => :lorem, :number => 4 },
:relationship_status => { :type => :fixed, :one_of => ["Single", "Divorced", "Married", "Engaged", "In a Relationship"] },
:has_children => { :type => :integer, :between => 0..1 },
},
:invites => :truncate,
:invite_requests => :truncate,
:tags => :keep,
:relationships => {
:account_id => :keep,
:code => { :type => :string, :length => 8, :chars => MyObfuscate::USERNAME_CHARS }
}
})
obfuscator.fail_on_unspecified_columns = true # if you want it to require every column in the table to be in the above definition
obfuscator.globally_kept_columns = %w[id created_at updated_at] # if you set fail_on_unspecified_columns, you may want this as well
obfuscator.obfuscate(STDIN, STDOUT)
And to get an obfuscated dump:
mysqldump -c --add-drop-table --hex-blob -u user -ppassword database | ruby obfuscator.rb > obfuscated_dump.sql
Note that the -c option on mysqldump is required to use my_obfuscator. Additionally, the default behavior of mysqldump
is to output special characters. This may cause trouble, so you can request hex-encoded blob content with --hex-blob.
If you get MySQL errors due to very long lines, try some combination of --max_allowed_packet=128M, --single-transaction, --skip-extended-insert, and --quick.
== Database Server
By default the database type is assumed to be MySQL, but you can use the
builtin SQL Server support by specifying:
obfuscator.database_type = :sql_server
obfuscator.database_type = :postgres
== Types
Available types include: email, string, lorem, name, first_name, last_name, address, street_address, city, state,
zip_code, phone, company, ipv4, ipv6, url, integer, fixed, null, and keep.
== Changes
* Support for Postgres. Thanks @samuelreh!
* Support for SQL Server
* :unless and :if now support :nil as a shorthand for a Proc that checks for nil
* :name, :lorem, and :address are all now supported types. You can pass :number to :lorem to specify how many sentences to generate. The default is one.
* { :type => :whatever } is now optional when no additional options are needed. Just use :whatever.
* Warnings are thrown when an unknown column type or table is encountered. Use :keep in both cases.
* { :type => :fixed, :string => Proc { |row| ... } } is now available.
== Note on Patches/Pull Requests
* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit, do not mess with rakefile, version, or history. (If you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
* Send me a pull request. Bonus points for topic branches.
== Thanks
Thanks to Honk for the original gem, Iteration Labs for prior maintenance work, and Pivotal Labs for patches and updates!
== LICENSE
This work is provided under the MIT License. See the included LICENSE file.
The included English word frequency list used for generating random text is provided under the Creative Commons – Attribution / ShareAlike 3.0 license by http://invokeit.wordpress.com/frequency-word-lists/