# pgsync

Sync Postgres data between databases. Designed for:

- **speed** - up to 4x faster than traditional tools on a 4-core machine
- **security** - built-in methods to prevent sensitive data from ever leaving the server
- **convenience** - sync partial tables, groups of tables, and related records

:tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)

[![Build Status](https://travis-ci.org/ankane/pgsync.svg?branch=master)](https://travis-ci.org/ankane/pgsync)

## Installation

pgsync is a command line tool. To install, run:

```sh
gem install pgsync
```

This will give you the `pgsync` command.

In your project directory, run:

```sh
pgsync --setup
```

This creates `.pgsync.yml` for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). `pgsync` commands can be run from this directory or any subdirectory.

## How to Use

Sync all tables

```sh
pgsync
```

**Note:** pgsync assumes your schema is already set up on your local machine. See the [schema section](#schema) if that’s not the case.

Sync specific tables

```sh
pgsync table1,table2
```

Sync specific rows (existing rows are overwritten)

```sh
pgsync products "where store_id = 1"
```

You can also preserve existing rows

```sh
pgsync products "where store_id = 1" --preserve
```

Or truncate them

```sh
pgsync products "where store_id = 1" --truncate
```

### Exclude Tables

```sh
pgsync --exclude users
```

To always exclude, add to `.pgsync.yml`.

```yml
exclude:
  - table1
  - table2
```

For Rails, you probably want to exclude schema migrations and ActiveRecord metadata.

```yml
exclude:
  - schema_migrations
  - ar_internal_metadata
```

### Groups

Define groups in `.pgsync.yml`:

```yml
groups:
  group1:
    - table1
    - table2
```

And run:

```sh
pgsync group1
```

You can also use groups to sync a specific record and associated records in other tables.

To get product `123` with its reviews, last 10 coupons, and store, use:

```yml
groups:
  product:
    products: "where id = {1}"
    reviews: "where product_id = {1}"
    coupons: "where product_id = {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id = {1})"
```

And run:

```sh
pgsync product:123
```

### Schema

Sync schema before the data

```sh
pgsync --schema-first
```

**Note:** This wipes out existing data

Specify tables

```sh
pgsync table1,table2 --schema-first
```

Or just the schema

```sh
pgsync --schema-only
```

## Sensitive Information

Prevent sensitive information - like passwords and email addresses - from leaving the remote server.

Define rules in `.pgsync.yml`:

```yml
data_rules:
  email: unique_email
  last_name: random_letter
  birthday: random_date
  users.auth_token:
    value: secret
  visits_count:
    statement: "(RANDOM() * 10)::int"
  encrypted_*: null
```

`last_name` matches all columns named `last_name` and `users.last_name` matches only the users table. Wildcards are supported, and the first matching rule is applied.

Options for replacement are:

- null
- value
- statement
- unique_email
- unique_phone
- unique_secret
- random_letter
- random_int
- random_date
- random_time
- random_ip
- random_string
- random_number
- untouched

## Multiple Databases

To use with multiple databases, run:

```sh
pgsync --setup db2
```

This creates `.pgsync-db2.yml` for you to edit. Specify a database in commands with:

```sh
pgsync --db db2
```

## Safety

To keep you from accidentally overwriting production, the destination is limited to `localhost` or `127.0.0.1` by default.

To use another host, add `to_safe: true` to your `.pgsync.yml`.

## Large Tables

For extremely large tables, sync in batches.

```sh
pgsync large_table --in-batches
```

The script will resume where it left off when run again, making it great for backfills.

## Foreign Keys

By default, tables are copied in parallel. If you use foreign keys, this can cause violations. You can specify tables to be copied serially with:

```sh
pgsync group1 --debug
```

## Reference

Help

```sh
pgsync --help
```

Version

```sh
pgsync --version
```

## Setup Scripts

Use groups when possible to take advantage of parallelism.

For Ruby scripts, you may need to do:

```rb
Bundler.with_clean_env do
  system "pgsync ..."
end
```

## Upgrading

Run:

```sh
gem install pgsync
```

To use master, run:

```sh
gem install specific_install
gem specific_install https://github.com/ankane/pgsync.git
```

## Thanks

Inspired by [heroku-pg-transfer](https://github.com/ddollar/heroku-pg-transfer).

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/pgsync/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/pgsync/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features