Partitioning Existing Tables in PostgreSQL

Dev & Ops

So, your database is growing rapidly, as, ideally, is your business. The problem is, its performance is getting worse. You’re starting to wonder whether partitioning your tables might help. Of course, this won't solve all your troubles, but if you’ve got a lot of historical data you don’t really need, partitioning can clear that surplus efficiently. In certain cases, it can also speed up some of your queries. Most texts you’ll find online will tell you that partitioning is done by executing x and y, end of story. Here, I’ll try to show you how existing production PostgreSQL tables can be partitioned, while also presenting you with a few options and their trade-offs. Hopefully, this’ll give you enough information to make the best choice for your own situation quickly.

 

What are partitions and how are they implemented?

Conceptually, PostgreSQL partitions are very simple. Let’s start with an example of a table that stores information about each video ad watched on a mobile application:

-- You need one table that will be master table. It will have no data
-- but just provide a common structure and access point for partitions.

CREATE TABLE video_ad_completions (
    "id" bigserial NOT NULL,
    "time" timestamp with time zone NOT NULL DEFAULT NOW(),
    "user_id" text NOT NULL,
    "provider_id" int4 NOT NULL,
    "completion_id" text NOT NULL
);

-- Then you create a number of tables with identical structure using
-- INHERITS clause. These will be your partitions.

CREATE TABLE video_ad_completions_q1() INHERITS (video_ad_completions);
CREATE TABLE video_ad_completions_q2() INHERITS (video_ad_completions);

Now that we’ve implemented this code, all SELECT, UPDATE, DELETE, and ALTER TABLE statements run on the master table will be propagated to child tables. In other words, if you have two rows in the “q1” table and two in “q2”, then selecting all the rows from the master table will display all four rows of the child tables. The documentation also claims there’s no point defining any constraints or indexes on the master table, but just to do this on the child tables. We’ll come back to that later. In this example, an implicit sequence is created for the “id” column in the master table, and that same sequence will be used for the “id” column of all partitions. As a result, all the IDs will be unique across all the partitions.  You’ve probably noticed that I haven’t yet mentioned INSERT. That’s because INSERT statements are not automatically propagated to child tables. PostgreSQL has announced a feature for automatic INSERT propagation in version 10, but, for now, we need to implement it manually.

Inserting data into the appropriate partition

When you decide how you want to partition the tables, you need to implement logic to insert the data into the appropriate child table. For example, you may want to have one partition for each provider_id, and so rows with a particular provider_id should go into the relevant partition. Alternatively, you may choose to allocate partitions to specified periods of time, so rows with different timestamps should go into different partitions. You have a few options for how to execute this:

  1. You can implement the logic on the application level. This means that, in your application code, you’ll first determine the name of the partition, based on whichever partitioning condition you’re using, and then execute the INSERT statement on that partition. This is considered anti-pattern as it couples your application code with DB-specific implementation, which makes it easy to make a mistake. Ideally, your application code shouldn't know or care whether you’re using partitions in your database or not.

  2. You can implement the logic as a stored function in your database, which determines the partition and executes the INSERTs. You’ll need to make sure that all your application code calls this function, rather than calling the INSERT statements directly. If you already use this DB method on your project, this is probably the best choice for you.

  3. You can implement the logic as a BEFORE INSERT trigger on the master table. This might be the most flexible approach, and it minimizes the possibility of mistakes. All your code can continue to work with the master table as with a normal table, while the trigger will ensure that your data is inserted not into the master table but into the appropriate partition. As for the performance impact of using a trigger instead of calling a stored function directly, as in the previous point, our testing has shown that the difference is negligible.

  4. You can also use a rule instead of a trigger. The main difference between these two options is that a rule shows worse performance on single row inserts, but potentially better performance when it comes to batch inserts of multiple rows.

You can find more information in the official PostgreSQL documentation.

So how can partitions help?

Deleting old data

First, if you want to clear obsolete data, you can now just drop old partitions. This is much more effective than deleting rows. Doing the latter means you have to delete according to a specified condition, which is an extremely painstaking operation on huge tables. Plus, deleted rows don't automatically free disk space. For that, you have to run the VACUUM FULL command after DELETE, which locks your entire table and also takes considerable time. Using partitions, on the other hand, lets you disconnect any partition from the rest with a very brief command:

ALTER TABLE video_ad_completions_q1 NO INHERIT video_ad_completions;

You can then decide to keep the partition isolated from queries on the master table, or you can choose to drop it entirely, which is a much faster operation than DELETE, and also frees disk space.

Speeding up queries

Partitioning can also speed up your queries, but it’s very important that you choose your partition criteria carefully. In our example, we can say that if the insert time is in the first annual quarter, it should go into the “q1” partition; if it is in the second, then into “q2”, etc. We’d code this into our INSERT trigger on the master table, but we’d also define a CHECK constraint on each partition, something like this:

CHECK(time >= DATE '2016-01-01' and time < DATE '2016-04-01')

Now, if you have a query like SELECT count(*) from video_ad_completions where time > '2016-07-01'; PostgreSQL can optimize the query and only check partitions containing this time frame.

If, however, you have queries spanning all your partitions, and you have a lot of partitions, those queries might get slower. This is because, in the current PostgreSQL versions, they are executed sequentially across partitions. If this is your situation, you’ll need to test your queries with a few different numbers of partitions to find the best option.

A practical example of switching to partitions

Let’s take a look at how we can painlessly transition an existing table to use partitions. We’ll extend the example from above, and say that this is the starting scheme:

CREATE TABLE video_ad_completions (
    "id" bigserial NOT NULL,
    "time" timestamp with time zone NOT NULL DEFAULT NOW(),
    "user_id" text NOT NULL,
    "provider_id" int4 NOT NULL,
    "completion_id" text NOT NULL,
    PRIMARY KEY("id"),
    CONSTRAINT "vads_completions_unique" UNIQUE("provider_id","completion_id")
);

CREATE INDEX "vads_user_id_provider_id_amount_time_idx" ON "video_ad_completions" (
    "user_id",
    "provider_id",
    "time"
);

CREATE INDEX "vads_time_idx" ON "video_ad_completions" (
    "time"
);

Here, we have a few indexes and one unique constraint. We need the constraint so we can prevent video providers from sending us the same video completion event multiple times. This table has become really big and, besides eating our disk space, some of our queries are slowing down. To solve this issue, we’ll partition according to time, as we said in the previous section. Note that this means our unique constraint will now have to be checked across multiple partitions, which might slow down each insert. We could have chosen to partition by provider_id, for example, so each partition would contain events from only one provider. This would speed up the constraint check but wouldn't help us solve the problem of easily removing obsolete events. That constraint is still performing very fast, so we don't actually need to optimize it. We only need to make sure we don't significantly slow its progress with time-based partitions.

Testing performance of partitions

To test the performance of our partitions, we first need to copy our data to a test server and measure how long the insert takes on the huge original table. Then, we split that table to four partitions, each containing three months of data, and write a trigger onto the master table:

-- Create trigger stored function that does the check and
-- inserts into appropriate partition
CREATE OR REPLACE FUNCTION trg_insert_video_ad_completion()
RETURNS TRIGGER AS $$
DECLARE
    old_time video_ad_completions.time%TYPE := NULL;
BEGIN
    -- First check if the event we are trying to insert exists in some partition
    SELECT time INTO old_time
    FROM video_ad_completions
    WHERE provider_id = NEW.provider_id and completion_id = NEW.completion_id;
    IF found THEN
        RAISE unique_violation USING MESSAGE =
        'Duplicate video ad completion from: ' ||
        to_char(old_time, 'YYYY-MM-DD HH24:MI:SS');
    END IF;
    
    -- Here we use time to insert into appropriate partition
    EXECUTE 'insert into video_ad_completions_' ||
        to_char(NEW.TIME, 'YY_"q"Q') ||
        ' values ( $1.* )' USING NEW;
    
    -- Prevent insertion into master table
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER tai_video_ad_completion_insert BEFORE INSERT
    ON video_ad_completions FOR EACH ROW
    EXECUTE PROCEDURE trg_insert_video_ad_completion();

We test the inserts again to see whether they take the same amount of time as on the original table. As it turns out, the CHECK executes about four times faster on one partition than on the whole table, but, since it needs to operate four times, the total duration actually ends up being the same. We then try to split the table into twelve partitions, each containing one month of data. This time, the inserts become about 1.7 times longer; they only speed up a little per partition, but, since they have to be executed twelve times, the overall execution time is longer. So now we know we want four partitions, each containing three months of data.

Creating new partitions and removing old ones

The last decision we need to make is whether we want to have a system that will create a new partition every three months and delete any that are more than a year old, or whether we want a trigger to handle missing partitions and create new ones if necessary. This will bring some complexity to your trigger that you might not want, so you might simply choose to schedule a job that’ll prepare a new partition every three months. In the next section, you can see how all this is handled with a trigger.

DDL queries for transition

At the time of writing, we are in Q1 of 2017. We want to create a new master table, make the existing table a “q1” partition and define an appropriate trigger. The following DDL commands will accomplish that:

BEGIN;

-- Helper function for creating new partitions
CREATE OR REPLACE FUNCTION create_partition(IN base_name text,
    IN for_time timestamp with time zone)
RETURNS VOID AS
$BODY$
DECLARE
    table_name text;
BEGIN
    table_name := base_name || TO_CHAR(for_time, '_YY_"q"Q');
    
    EXECUTE 'CREATE TABLE IF NOT EXISTS' || table_name ||
        '(LIKE ' || base_name || ' INCLUDING ALL,
            CHECK(time >= DATE ''' || date_trunc('quarter', for_time) ||
                ''' and time < DATE ''' ||
                date_trunc('quarter', for_time + '3 months'::interval) || ''')
        ) INHERITS (' || base_name || ')';
END
$BODY$
    LANGUAGE PLpgSQL
    STRICT
    VOLATILE
    EXTERNAL SECURITY INVOKER;

-- Rename existing table.
ALTER TABLE video_ad_completions RENAME TO video_ad_completions_16_q4;

-- Create a new table with the same name as the original table to be a master table.
-- Why we use INCLUDING ALL here will be explained bellow.
CREATE TABLE video_ad_completions (LIKE video_ad_completions_16_q4 INCLUDING ALL);

-- Make existing table a child of the the new master table
ALTER TABLE video_ad_completions_16_q4 INHERIT video_ad_completions;

-- Don't forget to change the owner of the the sequence so you don't lose it
-- when you drop the partition that used to be original table.
ALTER SEQUENCE video_ad_completions_id_seq OWNED BY video_ad_completions.id;

-- Create trigger stored function that does the check and
-- inserts into appropriate partition
CREATE OR REPLACE FUNCTION trg_insert_video_ad_completion()
RETURNS TRIGGER AS $$
DECLARE
    old_time video_ad_completions.time%TYPE := NULL;
BEGIN
    SELECT time INTO old_time
    FROM video_ad_completions
    WHERE provider_id = NEW.provider_id and completion_id = NEW.completion_id;
    IF found THEN
        RAISE unique_violation
        USING MESSAGE = 'Duplicate video ad completion from: ' ||
            to_char(old_time, 'YYYY-MM-DD HH24:MI:SS');
    END IF;
    
    -- Here we use time to insert into appropriate partition
    EXECUTE 'insert into video_ad_completions_' || to_char(NEW.time, 'YY_"q"Q') ||
        ' values ( $1.* )' USING NEW;
    
    -- Prevent insertion into master table
    RETURN NULL;
EXCEPTION
WHEN undefined_table THEN
    -- Use exclusive advisory lock to prevent two transactions
    -- trying to create new partition at the same time
    SELECT pg_advisory_xact_lock('video_ad_completions'::regclass::oid::integer);

    -- Create a new partition if another transaction didn't already do it
    SELECT create_partition('video_ad_completions', NEW.time);

    -- Try the insert again
    EXECUTE 'insert into video_ad_completions_' || to_char(NEW.time, 'YY_"q"Q') ||
        ' values ( $1.* )' USING NEW;
    
    -- Prevent insertion into master table
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER tai_video_ad_completion_insert BEFORE INSERT
    ON video_ad_completions FOR EACH ROW
    EXECUTE PROCEDURE trg_insert_video_ad_completion();

COMMIT;

We’ve wrapped everything with BEGIN/COMMIT. Therefore, because PostgreSQL supports transactional DDL and these changes have been executed atomically, any other queries will see all these changes at once.

☆☆☆ That’s all, folks. We now have working partitions with zero downtime transition. If this doesn’t absolutely blow your mind, you’ve never worked with data. ☆☆☆

As you saw when we created the master table, we used the LIKE clause with an INCLUDING ALL option to duplicate the original, including not only all the comments and defaults but also all the indexes and constraints. We mentioned earlier that the documentation says there’s no point creating these on the master table, but we decided to do so anyway because we’re not aware of this having any negative impact. It also makes creating new partitions much easier. Rather than first creating a new table, and then all the indexes and constraints for each partition, we can just build a new table using LIKE master_table INCLUDING ALL, and get the indexes and constraints immediately. This will print some notices about how the identically named columns between the master and the child are merged, but you can just ignore those.

You may also notice that we chose not to add the CHECK(time < DATE '2017-01-01') constraint on the original table. This would take some time to execute as PostgreSQL needs to verify whether all the existing rows pass this check, and we don’t actually have any queries that would benefit from it. If you decide to add it, you’d better do so in a separate transaction after this initial job, and choose a time when your system is under less stress:

ALTER TABLE video_ad_completions_17_q1 ADD CHECK (time < DATE '2017-04-01');

Partition maintenance functions

Finally, you may want to create functions to help maintain these partitions. We’ve already seen a function for creating new partitions above. You may have noticed that it can be given a timestamp, for which we want to create a partition. However, the drop partition function below doesn't provide this flexibility; it just deletes any partition that’s more than a year old. This is because nothing bad can happen if you make a mistake in calling `create partition`, but it may really hurt if you do the same with `drop`. Even if you decide not to use a trigger to create new partitions directly, you can use these two functions from a pre-scheduled job:

CREATE OR REPLACE FUNCTION drop_year_old_partition(IN base_name text)
RETURNS VOID AS
$BODY$
DECLARE
    table_name text;
BEGIN
    -- We assume this will be called within the last month of each quarter
    table_name := base_name || TO_CHAR(NOW() - '1 year 2 months'::interval, '_YY_"q"Q');
    EXECUTE 'ALTER TABLE IF EXISTS ' || table_name || ' NO INHERIT ' || base_name;
    EXECUTE 'DROP TABLE IF EXISTS ' || table_name || ' CASCADE';
END
$BODY$
    LANGUAGE PLpgSQL
    STRICT
    VOLATILE
    EXTERNAL SECURITY INVOKER;

We hope this’ll help make your transitions as quick and painless as possible! Get in touch if you have any comments or questions. Happy coding!

Igor Stojkovic

Igor Stojkovic

Software Developer

June 20, 2017