Tuning queries on partitioned tables in PostgreSQL

By John Doe January 15, 2025

Summary: in this article, you will learn how to tune queries on partitioned tables in PostgreSQL.

Table of Contents

Example

Partition pruning on your query will work with some conditions. You need an index on partition key and planner must choose nested loop for joining tables.

CREATE TABLE partitioned_table (
  a int,
  b int,
  c int,
  p int
) PARTITION BY RANGE (p);

CREATE TABLE p1 PARTITION OF partitioned_table FOR VALUES FROM (0) TO (10);
CREATE TABLE p2 PARTITION OF partitioned_table FOR VALUES FROM (10) TO (20);

INSERT INTO partitioned_table
  SELECT g.x, g.x, g.x, g.x FROM generate_series (0,19) AS g(x);

CREATE INDEX ON partitioned_table(p);

CREATE TABLE other_table (
  p int,
  r int
);

INSERT INTO other_table
  SELECT 1, g.x FROM generate_series(1,100) AS g(x);

VACUUM ANALYZE partitioned_table, other_table;

EXPLAIN (settings, analyze, costs off, timing off, summary off)
SELECT a, b, c
FROM partitioned_table
WHERE p IN (SELECT p FROM other_table WHERE r between 1 and 100);

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop (actual rows=1 loops=1)
   ->  HashAggregate (actual rows=1 loops=1)
         Group Key: other_table.p
         Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on other_table (actual rows=100 loops=1)
               Filter: ((r >= 1) AND (r <= 100))
   ->  Append (actual rows=1 loops=1)
         ->  Index Scan using p1_p_idx on p1 partitioned_table_1 (actual rows=1 loops=1)
               Index Cond: (p = other_table.p)
         ->  Index Scan using p2_p_idx on p2 partitioned_table_2 (never executed)
               Index Cond: (p = other_table.p)
(11 rows)

Note that index scan on partition p2 is never executed.

Optimizing access to partitioned tables

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” In reality, the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success. There are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed.

In order to keep queries not to slow down after partitioning, developers need to explicitly include the values of the attributes used for range partitioning. Quite often, this requirement is not obvious, and moreover, it requires a significant rewrite of the SQL statements in use.

From an application developer’s perspective, they do just that:

SELECT a, b, c
FROM partitioned_table
WHERE p IN (SELECT p FROM other_table WHERE r between 1 and 100)

Developers might take it for granted that: But I am selecting from a specific partition! Indeed, there is no way for PostgreSQL to know before the execution starts which partition it is going to be!

When developers ask me what they should do, I advise them to calculate the value first and use it in a query.

So, what exactly you should do to get PostgreSQL to know exactly which partition to access? A certainly not bad way to do this, is to first calculate the value of the partition key and then use it in the query.

Example

Optimizing access to partitioned tables

Search

Categories

Tags