README.md in pgslice-0.1.7 vs README.md in pgslice-0.2.0
- old
+ new
@@ -1,222 +1,272 @@
# pgslice
-Postgres partitioning as easy as pie
+Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes.
+:tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)
+
## Install
-Run:
+pgslice is a command line tool. To install, run:
```sh
gem install pgslice
```
+This will give you the `pgslice` command.
+
## Steps
-1. Specify your database credentials
+1. Ensure the table you want to partition has been created. We’ll refer to this as `<table>`.
+2. Specify your database credentials
+
```sh
export PGSLICE_URL=postgres://localhost/myapp_development
```
-2. Create an intermediate table
+3. Create an intermediate table
```sh
pgslice prep <table> <column> <period>
```
Period can be `day` or `month`.
This creates a table named `<table>_intermediate` with the appropriate trigger for partitioning.
-3. Add partitions
+4. Add partitions
```sh
pgslice add_partitions <table> --intermediate --past 3 --future 3
```
This creates child tables that inherit from the intermediate table.
Use the `--past` and `--future` options to control the number of partitions.
-4. *Optional, for tables with data* - Fill the partitions in batches with data from the original table
+5. *Optional, for tables with data* - Fill the partitions in batches with data from the original table
```sh
pgslice fill <table>
```
Use the `--batch-size` and `--sleep` options to control the speed.
To sync data across different databases, check out [pgsync](https://github.com/ankane/pgsync).
-5. Swap the intermediate table with the original table
+6. Swap the intermediate table with the original table
```sh
pgslice swap <table>
```
The original table is renamed `<table>_retired` and the intermediate table is renamed `<table>`.
-6. Fill the rest
+7. Fill the rest (rows inserted between the first fill and the swap)
```sh
pgslice fill <table> --swapped
```
-7. Archive and drop the original table
+8. Archive and drop the original table
-## Adding Partitions
+## Sample Output
-To add partitions, use:
+pgslice prints the SQL commands that were executed on the server. To print without executing, use the `--dry-run` option.
```sh
-pgslice add_partitions <table> --future 3
+pgslice prep visits created_at month
```
-Add this as a cron job to create a new partition each day or month.
+```sql
+BEGIN;
+CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL);
+
+CREATE FUNCTION visits_insert_trigger()
+ RETURNS trigger AS $$
+ BEGIN
+ RAISE EXCEPTION 'Date out of range. Create partitions first.';
+ END;
+ $$ LANGUAGE plpgsql;
+
+CREATE TRIGGER visits_insert_trigger
+ BEFORE INSERT ON visits_intermediate
+ FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger();
+
+COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month';
+
+COMMIT;
```
-# day
-0 0 * * * pgslice add_partitions <table> --future 3 --url ...
-# month
-0 0 1 * * pgslice add_partitions <table> --future 3 --url ...
+```sh
+pgslice add_partitions visits --intermediate --past 1 --future 1
```
-Add a monitor to ensure partitions are being created.
-
```sql
-SELECT 1 FROM
- pg_catalog.pg_class c
-INNER JOIN
- pg_catalog.pg_namespace n ON n.oid = c.relnamespace
-WHERE
- c.relkind = 'r' AND
- n.nspname = 'public' AND
- c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
- -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')
-```
+BEGIN;
-## Additional Commands
+CREATE TABLE visits_201608
+ (CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date))
+ INHERITS (visits_intermediate);
-To undo prep (which will delete partitions), use:
+ALTER TABLE visits_201608 ADD PRIMARY KEY (id);
-```sh
-pgslice unprep <table>
-```
+CREATE INDEX ON visits_201608 USING btree (user_id);
-To undo swap, use:
+CREATE TABLE visits_201609
+ (CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date))
+ INHERITS (visits_intermediate);
-```sh
-pgslice unswap <table>
-```
+ALTER TABLE visits_201609 ADD PRIMARY KEY (id);
-## Sample Output
+CREATE INDEX ON visits_201609 USING btree (user_id);
-`pgslice` prints the SQL commands that were executed on the server. To print without executing, use the `--dry-run` option.
+CREATE TABLE visits_201610
+ (CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date))
+ INHERITS (visits_intermediate);
-```console
-$ pgslice prep locations created_at day
-BEGIN;
+ALTER TABLE visits_201610 ADD PRIMARY KEY (id);
-CREATE TABLE locations_intermediate (LIKE locations INCLUDING ALL);
+CREATE INDEX ON visits_201610 USING btree (user_id);
-CREATE FUNCTION locations_insert_trigger()
+CREATE OR REPLACE FUNCTION visits_insert_trigger()
RETURNS trigger AS $$
BEGIN
- EXECUTE 'INSERT INTO locations_' || to_char(NEW.created_at, 'YYYYMMDD') || ' VALUES ($1.*)' USING NEW;
+ IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN
+ INSERT INTO visits_201609 VALUES (NEW.*);
+ ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN
+ INSERT INTO visits_201610 VALUES (NEW.*);
+ ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN
+ INSERT INTO visits_201608 VALUES (NEW.*);
+ ELSE
+ RAISE EXCEPTION 'Date out of range. Ensure partitions are created.';
+ END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-CREATE TRIGGER locations_insert_trigger
- BEFORE INSERT ON locations_intermediate
- FOR EACH ROW EXECUTE PROCEDURE locations_insert_trigger();
-
COMMIT;
```
-```console
-$ pgslice add_partitions locations --intermediate --past 1 --future 1
-BEGIN;
+```sh
+pgslice fill visits
+```
-CREATE TABLE locations_20160423
- (CHECK (created_at >= '2016-04-23'::date AND created_at < '2016-04-24'::date))
- INHERITS (locations_intermediate);
+```sql
+/* 1 of 3 */
+INSERT INTO visits_intermediate (id, user_id, ip, created_at)
+ SELECT id, user_id, ip, created_at FROM visits
+ WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
-ALTER TABLE locations_20160423 ADD PRIMARY KEY (id);
+/* 2 of 3 */
+INSERT INTO visits_intermediate (id, user_id, ip, created_at)
+ SELECT id, user_id, ip, created_at FROM visits
+ WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
-CREATE INDEX ON locations_20160423 USING btree (updated_at, shopper_id);
+/* 3 of 3 */
+INSERT INTO visits_intermediate (id, user_id, ip, created_at)
+ SELECT id, user_id, ip, created_at FROM visits
+ WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date
+```
-CREATE TABLE locations_20160424
- (CHECK (created_at >= '2016-04-24'::date AND created_at < '2016-04-25'::date))
- INHERITS (locations_intermediate);
+```sh
+pgslice swap visits
+```
-ALTER TABLE locations_20160424 ADD PRIMARY KEY (id);
+```sql
+BEGIN;
-CREATE INDEX ON locations_20160424 USING btree (updated_at, shopper_id);
+ALTER TABLE visits RENAME TO visits_retired;
-CREATE TABLE locations_20160425
- (CHECK (created_at >= '2016-04-25'::date AND created_at < '2016-04-26'::date))
- INHERITS (locations_intermediate);
+ALTER TABLE visits_intermediate RENAME TO visits;
-ALTER TABLE locations_20160425 ADD PRIMARY KEY (id);
+ALTER SEQUENCE visits_id_seq OWNED BY visits.id;
-CREATE INDEX ON locations_20160425 USING btree (updated_at, shopper_id);
-
COMMIT;
```
-```console
-$ pgslice fill locations
-/* 1 of 3 */
-INSERT INTO locations_intermediate (id, latitude, longitude, created_at)
- SELECT id, latitude, longitude, created_at FROM locations
- WHERE id > 0 AND id <= 10000 AND created_at >= '2016-04-23'::date AND created_at < '2016-04-26'::date
+## Adding Partitions
-/* 2 of 3 */
-INSERT INTO locations_intermediate (id, latitude, longitude, created_at)
- SELECT id, latitude, longitude, created_at FROM locations
- WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-04-23'::date AND created_at < '2016-04-26'::date
+To add partitions, use:
-/* 3 of 3 */
-INSERT INTO locations_intermediate (id, latitude, longitude, created_at)
- SELECT id, latitude, longitude, created_at FROM locations
- WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-04-23'::date AND created_at < '2016-04-26'::date
+```sh
+pgslice add_partitions <table> --future 3
```
-```console
-$ pgslice swap locations
-BEGIN;
+Add this as a cron job to create a new partition each day or month.
-ALTER TABLE locations RENAME TO locations_retired;
+```sh
+# day
+0 0 * * * pgslice add_partitions <table> --future 3 --url ...
-ALTER TABLE locations_intermediate RENAME TO locations;
+# month
+0 0 1 * * pgslice add_partitions <table> --future 3 --url ...
+```
-ALTER SEQUENCE locations_id_seq OWNED BY locations.id;
+Add a monitor to ensure partitions are being created.
-COMMIT;
+```sql
+SELECT 1 FROM
+ pg_catalog.pg_class c
+INNER JOIN
+ pg_catalog.pg_namespace n ON n.oid = c.relnamespace
+WHERE
+ c.relkind = 'r' AND
+ n.nspname = 'public' AND
+ c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
+ -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')
```
-```console
-$ pgslice add_partitions locations --future 2
-BEGIN;
+## Additional Commands
-CREATE TABLE locations_20160426
- (CHECK (created_at >= '2016-04-26'::date AND created_at < '2016-04-27'::date))
- INHERITS (locations);
+To undo prep (which will delete partitions), use:
-ALTER TABLE locations_20160426 ADD PRIMARY KEY (id);
+```sh
+pgslice unprep <table>
+```
-CREATE INDEX ON locations_20160426 USING btree (updated_at, shopper_id);
+To undo swap, use:
-COMMIT;
+```sh
+pgslice unswap <table>
```
+## App Changes
+
+This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of.
+
+### Reads
+
+When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on `created_at`, try to include it in queries:
+
+```sql
+SELECT * FROM
+ visits
+WHERE
+ user_id = 123 AND
+ -- for performance
+ created_at >= '2016-09-01' AND created_at < '2016-09-02'
+```
+
+For this to be effective, ensure `constraint_exclusion` is set to `partition` (default value) or `on`.
+
+```sql
+SHOW constraint_exclusion;
+```
+
+### Writes
+
+If you use `INSERT` statements with a `RETURNING` clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record back. If you need this, you can either:
+
+1. Insert directly into the partition
+2. Get the value after the insert with `SELECT CURRVAL('sequence_name')`
+
## One Off Tasks
-You can also reduce the size of a table without partitioning.
+You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.
```sh
pgslice prep <table> --no-partition
pgslice fill <table> --start 1000 # starting primary key
pgslice swap <table>
@@ -235,14 +285,26 @@
```sh
gem install specific_install
gem specific_install ankane/pgslice
```
+## Reference
+
+- [PostgreSQL Manual](https://www.postgresql.org/docs/current/static/ddl-partitioning.html)
+- [PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Table_partitioning)
+
## TODO
- Command to sync index changes with partitions
- Disable indexing for faster `fill`
- ETA for `fill`
+
+## Related Projects
+
+Also check out:
+
+- [PgHero](https://github.com/ankane/pghero) - A performance dashboard for Postgres
+- [pgsync](https://github.com/ankane/pgsync) - Sync Postgres data to your local machine
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help: