README.md in pgslice-0.1.7 vs README.md in pgslice-0.2.0

- old
+ new

@@ -1,222 +1,272 @@ # pgslice -Postgres partitioning as easy as pie +Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. +:tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource) + ## Install -Run: +pgslice is a command line tool. To install, run: ```sh gem install pgslice ``` +This will give you the `pgslice` command. + ## Steps -1. Specify your database credentials +1. Ensure the table you want to partition has been created. We’ll refer to this as `<table>`. +2. Specify your database credentials + ```sh export PGSLICE_URL=postgres://localhost/myapp_development ``` -2. Create an intermediate table +3. Create an intermediate table ```sh pgslice prep <table> <column> <period> ``` Period can be `day` or `month`. This creates a table named `<table>_intermediate` with the appropriate trigger for partitioning. -3. Add partitions +4. Add partitions ```sh pgslice add_partitions <table> --intermediate --past 3 --future 3 ``` This creates child tables that inherit from the intermediate table. Use the `--past` and `--future` options to control the number of partitions. -4. *Optional, for tables with data* - Fill the partitions in batches with data from the original table +5. *Optional, for tables with data* - Fill the partitions in batches with data from the original table ```sh pgslice fill <table> ``` Use the `--batch-size` and `--sleep` options to control the speed. To sync data across different databases, check out [pgsync](https://github.com/ankane/pgsync). -5. Swap the intermediate table with the original table +6. Swap the intermediate table with the original table ```sh pgslice swap <table> ``` The original table is renamed `<table>_retired` and the intermediate table is renamed `<table>`. -6. Fill the rest +7. Fill the rest (rows inserted between the first fill and the swap) ```sh pgslice fill <table> --swapped ``` -7. Archive and drop the original table +8. Archive and drop the original table -## Adding Partitions +## Sample Output -To add partitions, use: +pgslice prints the SQL commands that were executed on the server. To print without executing, use the `--dry-run` option. ```sh -pgslice add_partitions <table> --future 3 +pgslice prep visits created_at month ``` -Add this as a cron job to create a new partition each day or month. +```sql +BEGIN; +CREATE TABLE visits_intermediate (LIKE visits INCLUDING ALL); + +CREATE FUNCTION visits_insert_trigger() + RETURNS trigger AS $$ + BEGIN + RAISE EXCEPTION 'Date out of range. Create partitions first.'; + END; + $$ LANGUAGE plpgsql; + +CREATE TRIGGER visits_insert_trigger + BEFORE INSERT ON visits_intermediate + FOR EACH ROW EXECUTE PROCEDURE visits_insert_trigger(); + +COMMENT ON TRIGGER visits_insert_trigger ON visits_intermediate is 'column:created_at,period:month'; + +COMMIT; ``` -# day -0 0 * * * pgslice add_partitions <table> --future 3 --url ... -# month -0 0 1 * * pgslice add_partitions <table> --future 3 --url ... +```sh +pgslice add_partitions visits --intermediate --past 1 --future 1 ``` -Add a monitor to ensure partitions are being created. - ```sql -SELECT 1 FROM - pg_catalog.pg_class c -INNER JOIN - pg_catalog.pg_namespace n ON n.oid = c.relnamespace -WHERE - c.relkind = 'r' AND - n.nspname = 'public' AND - c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD') - -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM') -``` +BEGIN; -## Additional Commands +CREATE TABLE visits_201608 + (CHECK (created_at >= '2016-08-01'::date AND created_at < '2016-09-01'::date)) + INHERITS (visits_intermediate); -To undo prep (which will delete partitions), use: +ALTER TABLE visits_201608 ADD PRIMARY KEY (id); -```sh -pgslice unprep <table> -``` +CREATE INDEX ON visits_201608 USING btree (user_id); -To undo swap, use: +CREATE TABLE visits_201609 + (CHECK (created_at >= '2016-09-01'::date AND created_at < '2016-10-01'::date)) + INHERITS (visits_intermediate); -```sh -pgslice unswap <table> -``` +ALTER TABLE visits_201609 ADD PRIMARY KEY (id); -## Sample Output +CREATE INDEX ON visits_201609 USING btree (user_id); -`pgslice` prints the SQL commands that were executed on the server. To print without executing, use the `--dry-run` option. +CREATE TABLE visits_201610 + (CHECK (created_at >= '2016-10-01'::date AND created_at < '2016-11-01'::date)) + INHERITS (visits_intermediate); -```console -$ pgslice prep locations created_at day -BEGIN; +ALTER TABLE visits_201610 ADD PRIMARY KEY (id); -CREATE TABLE locations_intermediate (LIKE locations INCLUDING ALL); +CREATE INDEX ON visits_201610 USING btree (user_id); -CREATE FUNCTION locations_insert_trigger() +CREATE OR REPLACE FUNCTION visits_insert_trigger() RETURNS trigger AS $$ BEGIN - EXECUTE 'INSERT INTO locations_' || to_char(NEW.created_at, 'YYYYMMDD') || ' VALUES ($1.*)' USING NEW; + IF (NEW.created_at >= '2016-09-01'::date AND NEW.created_at < '2016-10-01'::date) THEN + INSERT INTO visits_201609 VALUES (NEW.*); + ELSIF (NEW.created_at >= '2016-10-01'::date AND NEW.created_at < '2016-11-01'::date) THEN + INSERT INTO visits_201610 VALUES (NEW.*); + ELSIF (NEW.created_at >= '2016-08-01'::date AND NEW.created_at < '2016-09-01'::date) THEN + INSERT INTO visits_201608 VALUES (NEW.*); + ELSE + RAISE EXCEPTION 'Date out of range. Ensure partitions are created.'; + END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; -CREATE TRIGGER locations_insert_trigger - BEFORE INSERT ON locations_intermediate - FOR EACH ROW EXECUTE PROCEDURE locations_insert_trigger(); - COMMIT; ``` -```console -$ pgslice add_partitions locations --intermediate --past 1 --future 1 -BEGIN; +```sh +pgslice fill visits +``` -CREATE TABLE locations_20160423 - (CHECK (created_at >= '2016-04-23'::date AND created_at < '2016-04-24'::date)) - INHERITS (locations_intermediate); +```sql +/* 1 of 3 */ +INSERT INTO visits_intermediate (id, user_id, ip, created_at) + SELECT id, user_id, ip, created_at FROM visits + WHERE id > 0 AND id <= 10000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date -ALTER TABLE locations_20160423 ADD PRIMARY KEY (id); +/* 2 of 3 */ +INSERT INTO visits_intermediate (id, user_id, ip, created_at) + SELECT id, user_id, ip, created_at FROM visits + WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date -CREATE INDEX ON locations_20160423 USING btree (updated_at, shopper_id); +/* 3 of 3 */ +INSERT INTO visits_intermediate (id, user_id, ip, created_at) + SELECT id, user_id, ip, created_at FROM visits + WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-08-01'::date AND created_at < '2016-11-01'::date +``` -CREATE TABLE locations_20160424 - (CHECK (created_at >= '2016-04-24'::date AND created_at < '2016-04-25'::date)) - INHERITS (locations_intermediate); +```sh +pgslice swap visits +``` -ALTER TABLE locations_20160424 ADD PRIMARY KEY (id); +```sql +BEGIN; -CREATE INDEX ON locations_20160424 USING btree (updated_at, shopper_id); +ALTER TABLE visits RENAME TO visits_retired; -CREATE TABLE locations_20160425 - (CHECK (created_at >= '2016-04-25'::date AND created_at < '2016-04-26'::date)) - INHERITS (locations_intermediate); +ALTER TABLE visits_intermediate RENAME TO visits; -ALTER TABLE locations_20160425 ADD PRIMARY KEY (id); +ALTER SEQUENCE visits_id_seq OWNED BY visits.id; -CREATE INDEX ON locations_20160425 USING btree (updated_at, shopper_id); - COMMIT; ``` -```console -$ pgslice fill locations -/* 1 of 3 */ -INSERT INTO locations_intermediate (id, latitude, longitude, created_at) - SELECT id, latitude, longitude, created_at FROM locations - WHERE id > 0 AND id <= 10000 AND created_at >= '2016-04-23'::date AND created_at < '2016-04-26'::date +## Adding Partitions -/* 2 of 3 */ -INSERT INTO locations_intermediate (id, latitude, longitude, created_at) - SELECT id, latitude, longitude, created_at FROM locations - WHERE id > 10000 AND id <= 20000 AND created_at >= '2016-04-23'::date AND created_at < '2016-04-26'::date +To add partitions, use: -/* 3 of 3 */ -INSERT INTO locations_intermediate (id, latitude, longitude, created_at) - SELECT id, latitude, longitude, created_at FROM locations - WHERE id > 20000 AND id <= 30000 AND created_at >= '2016-04-23'::date AND created_at < '2016-04-26'::date +```sh +pgslice add_partitions <table> --future 3 ``` -```console -$ pgslice swap locations -BEGIN; +Add this as a cron job to create a new partition each day or month. -ALTER TABLE locations RENAME TO locations_retired; +```sh +# day +0 0 * * * pgslice add_partitions <table> --future 3 --url ... -ALTER TABLE locations_intermediate RENAME TO locations; +# month +0 0 1 * * pgslice add_partitions <table> --future 3 --url ... +``` -ALTER SEQUENCE locations_id_seq OWNED BY locations.id; +Add a monitor to ensure partitions are being created. -COMMIT; +```sql +SELECT 1 FROM + pg_catalog.pg_class c +INNER JOIN + pg_catalog.pg_namespace n ON n.oid = c.relnamespace +WHERE + c.relkind = 'r' AND + n.nspname = 'public' AND + c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD') + -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM') ``` -```console -$ pgslice add_partitions locations --future 2 -BEGIN; +## Additional Commands -CREATE TABLE locations_20160426 - (CHECK (created_at >= '2016-04-26'::date AND created_at < '2016-04-27'::date)) - INHERITS (locations); +To undo prep (which will delete partitions), use: -ALTER TABLE locations_20160426 ADD PRIMARY KEY (id); +```sh +pgslice unprep <table> +``` -CREATE INDEX ON locations_20160426 USING btree (updated_at, shopper_id); +To undo swap, use: -COMMIT; +```sh +pgslice unswap <table> ``` +## App Changes + +This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of. + +### Reads + +When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on `created_at`, try to include it in queries: + +```sql +SELECT * FROM + visits +WHERE + user_id = 123 AND + -- for performance + created_at >= '2016-09-01' AND created_at < '2016-09-02' +``` + +For this to be effective, ensure `constraint_exclusion` is set to `partition` (default value) or `on`. + +```sql +SHOW constraint_exclusion; +``` + +### Writes + +If you use `INSERT` statements with a `RETURNING` clause (as frameworks like Rails do), you’ll no longer receive the id of the newly inserted record back. If you need this, you can either: + +1. Insert directly into the partition +2. Get the value after the insert with `SELECT CURRVAL('sequence_name')` + ## One Off Tasks -You can also reduce the size of a table without partitioning. +You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in. ```sh pgslice prep <table> --no-partition pgslice fill <table> --start 1000 # starting primary key pgslice swap <table> @@ -235,14 +285,26 @@ ```sh gem install specific_install gem specific_install ankane/pgslice ``` +## Reference + +- [PostgreSQL Manual](https://www.postgresql.org/docs/current/static/ddl-partitioning.html) +- [PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Table_partitioning) + ## TODO - Command to sync index changes with partitions - Disable indexing for faster `fill` - ETA for `fill` + +## Related Projects + +Also check out: + +- [PgHero](https://github.com/ankane/pghero) - A performance dashboard for Postgres +- [pgsync](https://github.com/ankane/pgsync) - Sync Postgres data to your local machine ## Contributing Everyone is encouraged to help improve this project. Here are a few ways you can help: