# pgslice
Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes.
:tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)
## Install
pgslice is a command line tool. To install, run:
```sh
gem install pgslice
```
This will give you the `pgslice` command.
## Steps
1. Ensure the table you want to partition has been created. We’ll refer to this as `
`.
2. Specify your database credentials
```sh
export PGSLICE_URL=postgres://localhost/myapp_development
```
3. Create an intermediate table
```sh
pgslice prep
```
Period can be `day` or `month`.
This creates a table named `_intermediate` with the appropriate trigger for partitioning.
4. Add partitions
```sh
pgslice add_partitions --intermediate --past 3 --future 3
```
This creates child tables that inherit from the intermediate table.
Use the `--past` and `--future` options to control the number of partitions.
5. *Optional, for tables with data* - Fill the partitions in batches with data from the original table
```sh
pgslice fill
```
Use the `--batch-size` and `--sleep` options to control the speed.
To sync data across different databases, check out [pgsync](https://github.com/ankane/pgsync).
6. Swap the intermediate table with the original table
```sh
pgslice swap
```
The original table is renamed `_retired` and the intermediate table is renamed ``.
7. Fill the rest (rows inserted between the first fill and the swap)
```sh
pgslice fill --swapped
```
8. Archive and drop the original table
## Sample Output
pgslice prints the SQL commands that were executed on the server. To print without executing, use the `--dry-run` option.
```sh
pgslice prep visits created_at month
```
```sql
BEGIN;
CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL);
CREATE FUNCTION visits_insert_trigger()
RETURNS trigger AS $$
BEGIN
RAISE EXCEPTION 'Date out of range. Create partitions first.';
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER visits_insert_trigger
BEFORE INSERT ON visits_intermediate
FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger();
COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month';
COMMIT;
```
```sh
pgslice add_partitions visits --intermediate --past 1 --future 1
```
```sql
BEGIN;
CREATE TABLE visits_201608
(CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date))
INHERITS (visits_intermediate);
ALTER TABLE visits_201608 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201608 USING btree (user_id);
CREATE TABLE visits_201609
(CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date))
INHERITS (visits_intermediate);
ALTER TABLE visits_201609 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201609 USING btree (user_id);
CREATE TABLE visits_201610
(CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date))
INHERITS (visits_intermediate);
ALTER TABLE visits_201610 ADD PRIMARY KEY (id);
CREATE INDEX ON visits_201610 USING btree (user_id);
CREATE OR REPLACE FUNCTION visits_insert_trigger()
RETURNS trigger AS $$
BEGIN
IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN
INSERT INTO visits_201609 VALUES (NEW.*);
ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN
INSERT INTO visits_201610 VALUES (NEW.*);
ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN
INSERT INTO visits_201608 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Ensure partitions are created.';
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
COMMIT;
```
```sh
pgslice fill visits
```
```sql
/* 1 of 3 */
INSERT INTO visits_intermediate (id, user_id, ip, created_at)
SELECT id, user_id, ip, created_at FROM visits
WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
/* 2 of 3 */
INSERT INTO visits_intermediate (id, user_id, ip, created_at)
SELECT id, user_id, ip, created_at FROM visits
WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
/* 3 of 3 */
INSERT INTO visits_intermediate (id, user_id, ip, created_at)
SELECT id, user_id, ip, created_at FROM visits
WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
```
```sh
pgslice swap visits
```
```sql
BEGIN;
ALTER TABLE visits RENAME TO visits_retired;
ALTER TABLE visits_intermediate RENAME TO visits;
ALTER SEQUENCE visits_id_seq OWNED BY visits.id;
COMMIT;
```
## Adding Partitions
To add partitions, use:
```sh
pgslice add_partitions --future 3
```
Add this as a cron job to create a new partition each day or month.
```sh
# day
0 0 * * * pgslice add_partitions --future 3 --url ...
# month
0 0 1 * * pgslice add_partitions --future 3 --url ...
```
Add a monitor to ensure partitions are being created.
```sql
SELECT 1 FROM
pg_catalog.pg_class c
INNER JOIN
pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
c.relkind = 'r' AND
n.nspname = 'public' AND
c.relname = '_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
-- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')
```
## Additional Commands
To undo prep (which will delete partitions), use:
```sh
pgslice unprep
```
To undo swap, use:
```sh
pgslice unswap
```
## App Changes
This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of.
### Reads
When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on `created_at`, try to include it in queries:
```sql
SELECT * FROM
visits
WHERE
user_id = 123 AND
-- for performance
created_at >= '2016-09-01' AND created_at < '2016-09-02'
```
For this to be effective, ensure `constraint_exclusion` is set to `partition` (default value) or `on`.
```sql
SHOW constraint_exclusion;
```
### Writes
If you use `INSERT` statements with a `RETURNING` clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record back. If you need this, you can either:
1. Insert directly into the partition
2. Get the value after the insert with `SELECT CURRVAL('sequence_name')`
## One Off Tasks
You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.
```sh
pgslice prep --no-partition
pgslice fill --start 1000 # starting primary key
pgslice swap
```
## Upgrading
Run:
```sh
gem install pgslice
```
To use master, run:
```sh
gem install specific_install
gem specific_install ankane/pgslice
```
## Reference
- [PostgreSQL Manual](https://www.postgresql.org/docs/current/static/ddl-partitioning.html)
- [PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Table_partitioning)
## TODO
- Command to sync index changes with partitions
- Disable indexing for faster `fill`
- ETA for `fill`
## Related Projects
Also check out:
- [PgHero](https://github.com/ankane/pghero) - A performance dashboard for Postgres
- [pgsync](https://github.com/ankane/pgsync) - Sync Postgres data to your local machine
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- [Report bugs](https://github.com/ankane/pgslice/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/pgslice/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features