# pgslice Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. Archive older data on a rolling basis to keep your database size under control. :tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource) [![Build Status](https://travis-ci.org/ankane/pgslice.svg?branch=master)](https://travis-ci.org/ankane/pgslice) ## Install pgslice is a command line tool. To install, run: ```sh gem install pgslice ``` This will give you the `pgslice` command. ## Steps 1. Ensure the table you want to partition has been created. We’ll refer to this as ``. 2. Specify your database credentials ```sh export PGSLICE_URL=postgres://localhost/myapp_development ``` 3. Create an intermediate table ```sh pgslice prep
``` Period can be `day` or `month`. This creates a table named `
_intermediate` with the appropriate trigger for partitioning. 4. Add partitions ```sh pgslice add_partitions
--intermediate --past 3 --future 3 ``` This creates child tables that inherit from the intermediate table. Use the `--past` and `--future` options to control the number of partitions. 5. *Optional, for tables with data* - Fill the partitions in batches with data from the original table ```sh pgslice fill
``` Use the `--batch-size` and `--sleep` options to control the speed. To sync data across different databases, check out [pgsync](https://github.com/ankane/pgsync). 6. Analyze tables ```sh pgslice analyze
``` 7. Swap the intermediate table with the original table ```sh pgslice swap
``` The original table is renamed `
_retired` and the intermediate table is renamed `
`. 8. Fill the rest (rows inserted between the first fill and the swap) ```sh pgslice fill
--swapped ``` 9. Back up the retired table with a tool like [pg_dump](https://www.postgresql.org/docs/current/static/app-pgdump.html) and drop it ```sql pg_dump -c -Fc -t
_retired $PGSLICE_URL >
_retired.dump psql -c "DROP
_retired" $PGSLICE_URL ``` ## Sample Output pgslice prints the SQL commands that were executed on the server. To print without executing, use the `--dry-run` option. ```sh pgslice prep visits created_at month ``` ```sql -- Postgres 10 BEGIN; CREATE TABLE visits_intermediate (LIKE visits INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING STORAGE INCLUDING COMMENTS) PARTITION BY RANGE (created_at); COMMENT ON TABLE visits_intermediate is 'column:created_at,period:day'; COMMIT; -- Postgres 9.6 and below BEGIN; CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL); CREATE FUNCTION visits_insert_trigger() RETURNS trigger AS $$ BEGIN RAISE EXCEPTION 'Create partitions first.'; END; $$ LANGUAGE plpgsql; CREATE TRIGGER visits_insert_trigger BEFORE INSERT ON visits_intermediate FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger(); COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month'; COMMIT; ``` ```sh pgslice add_partitions visits --intermediate --past 1 --future 1 ``` ```sql -- Postgres 10 BEGIN; CREATE TABLE visits_201608 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-08-01') TO ('2016-09-01'); ALTER TABLE visits_201608 ADD PRIMARY KEY (id); CREATE INDEX ON visits_201608 USING btree (user_id); CREATE TABLE visits_201609 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-09-01') TO ('2016-10-01'); ALTER TABLE visits_201609 ADD PRIMARY KEY (id); CREATE INDEX ON visits_201609 USING btree (user_id); CREATE TABLE visits_201610 PARTITION OF visits_intermediate FOR VALUES FROM ('2016-10-01') TO ('2016-11-01'); ALTER TABLE visits_201610 ADD PRIMARY KEY (id); CREATE INDEX ON visits_201610 USING btree (user_id); COMMIT; -- Postgres 9.6 and below BEGIN; CREATE TABLE visits_201608 (CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date)) INHERITS (visits_intermediate); ALTER TABLE visits_201608 ADD PRIMARY KEY (id); CREATE INDEX ON visits_201608 USING btree (user_id); CREATE TABLE visits_201609 (CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date)) INHERITS (visits_intermediate); ALTER TABLE visits_201609 ADD PRIMARY KEY (id); CREATE INDEX ON visits_201609 USING btree (user_id); CREATE TABLE visits_201610 (CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date)) INHERITS (visits_intermediate); ALTER TABLE visits_201610 ADD PRIMARY KEY (id); CREATE INDEX ON visits_201610 USING btree (user_id); CREATE OR REPLACE FUNCTION visits_insert_trigger() RETURNS trigger AS $$ BEGIN IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN INSERT INTO visits_201609 VALUES (NEW.*); ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN INSERT INTO visits_201610 VALUES (NEW.*); ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN INSERT INTO visits_201608 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Date out of range. Ensure partitions are created.'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; COMMIT; ``` ```sh pgslice fill visits ``` ```sql /* 1 of 3 */ INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at") SELECT "id", "user_id", "ip", "created_at" FROM visits WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date /* 2 of 3 */ INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at") SELECT "id", "user_id", "ip", "created_at" FROM visits WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date /* 3 of 3 */ INSERT INTO visits_intermediate ("id", "user_id", "ip", "created_at") SELECT "id", "user_id", "ip", "created_at" FROM visits WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date ``` ```sh pgslice analyze visits ``` ```sql ANALYZE VERBOSE visits_201608; ANALYZE VERBOSE visits_201609; ANALYZE VERBOSE visits_201610; ANALYZE VERBOSE visits_intermediate; ``` ```sh pgslice swap visits ``` ```sql BEGIN; SET LOCAL lock_timeout = '5s'; ALTER TABLE visits RENAME TO visits_retired; ALTER TABLE visits_intermediate RENAME TO visits; ALTER SEQUENCE visits_id_seq OWNED BY visits.id; COMMIT; ``` ## Adding Partitions To add partitions, use: ```sh pgslice add_partitions
--future 3 ``` Add this as a cron job to create a new partition each day or month. ```sh # day 0 0 * * * pgslice add_partitions
--future 3 --url ... # month 0 0 1 * * pgslice add_partitions
--future 3 --url ... ``` Add a monitor to ensure partitions are being created. ```sql SELECT 1 FROM pg_catalog.pg_class c INNER JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind = 'r' AND n.nspname = 'public' AND c.relname = '
_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD') -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM') ``` ## Archiving Partitions Back up and drop older partitions each day or month. ```sh pg_dump -c -Fc -t
_201609 $PGSLICE_URL >
_201609.dump psql -c "DROP
_201609" $PGSLICE_URL ``` If you use [Amazon S3](https://aws.amazon.com/s3/) for backups, [s3cmd](https://github.com/s3tools/s3cmd) is a nice tool. ```sh s3cmd put
_201609.dump s3:///
_201609.dump ``` ## Additional Commands To undo prep (which will delete partitions), use: ```sh pgslice unprep
``` To undo swap, use: ```sh pgslice unswap
``` ## App Considerations This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of. ### Writes Before Postgres 10, if you use `INSERT` statements with a `RETURNING` clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record(s) back. If you need this, you can either: 1. Insert directly into the partition 2. Get value before the insert with `SELECT nextval('sequence_name')` (for multiple rows, append `FROM generate_series(1, n)`) ### Reads When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on `created_at`, try to include it in queries: ```sql SELECT * FROM visits WHERE user_id = 123 AND -- for performance created_at >= '2016-09-01' AND created_at < '2016-09-02' ``` For this to be effective, ensure `constraint_exclusion` is set to `partition` (default value) or `on`. ```sql SHOW constraint_exclusion; ``` ## One Off Tasks You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in. ```sh pgslice prep
--no-partition pgslice fill
--where "id > 1000" # use any conditions pgslice swap
``` ## Schema Updates Once a table is partitioned, here’s how to change the schema: - To add, remove, or modify a column, make the update on the master table only - To add or remove an index, make the update on the master table and all partitions (for Postgres 11, make the update on the master table only) ## Declarative Partitioning Postgres 10 introduces [declarative partitioning](https://www.postgresql.org/docs/10/static/ddl-partitioning.html#ddl-partitioning-declarative). A major benefit is `INSERT` statements with a `RETURNING` clause work as expected. If you prefer to use trigger-based partitioning instead (not recommended), pass the `--trigger-based` option to the `prep` command. ## Data Protection When connecting to a remote database, make sure your connection is secure. If you do not use a VPN, you must use `sslmode=verify-full` with a root certificate to [protect against MITM attacks](https://www.postgresql.org/docs/current/static/libpq-ssl.html). If you don’t do this, your database credentials can be compromised. This cannot be understated! Surprisingly and unfortunately, there’s [not a secure way](https://thusoy.com/2016/mitming-postgres) to connect to Heroku Postgres with any client. For Amazon RDS, download the [root certificate](https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem) and append to your database url: ``` ?sslmode=verify-full&sslrootcert=rds-combined-ca-bundle.pem ``` ## Upgrading Run: ```sh gem install pgslice ``` To use master, run: ```sh gem install specific_install gem specific_install https://github.com/ankane/pgslice.git ``` ## Docker ```sh docker build -t pgslice . alias pgslice="docker run --rm -e PGSLICE_URL pgslice" ``` This will give you the `pgslice` command. ## Reference - [PostgreSQL Manual](https://www.postgresql.org/docs/current/static/ddl-partitioning.html) - [PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Table_partitioning) ## Related Projects Also check out: - [PgHero](https://github.com/ankane/pghero) - A performance dashboard for Postgres - [pgsync](https://github.com/ankane/pgsync) - Sync Postgres data to your local machine ## Contributing Everyone is encouraged to help improve this project. Here are a few ways you can help: - [Report bugs](https://github.com/ankane/pgslice/issues) - Fix bugs and [submit pull requests](https://github.com/ankane/pgslice/pulls) - Write, clarify, or fix documentation - Suggest or add new features To run tests, do: ```sh git clone https://github.com/ankane/pgslice.git cd pgslice bundle install createdb pgslice_test bundle exec rake ``` To test against different versions of Postgres with Docker, use: ```sh docker run -p=8000:5432 postgres:10 TZ=Etc/UTC PGSLICE_URL=postgres://postgres@localhost:8000/postgres bundle exec rake ``` On Mac, you must use [Docker for Mac](https://www.docker.com/docker-mac) for the port mapping to localhost to work.