Partition pruning with conditions in subquery not works

By John Doe January 13, 2025

Summary: In this article, we will learn a scenario that partition pruning with conditions in subquery not works in PostgreSQL.

Table of Contents

Introduction

Recently, a customer discovered that, there was a poorly performing query that had a perfectly fine execution time on Oracle, but once migrated to Postgres the query was slow. Upon investigation, it was found that the planning portion of the query was not pruning partitions causing a much longer execution time. That said even though the planning portion was not accomplishing partition pruning, the execution portion was (This is indicated by the portion of the explain plan notating “never executed”) .

The query had “WHERE” and “JOIN” clauses which explicitly specified partition keys and were joined on the partition key, so why wasn’t pruning happening during planning time?

Test Case Setup

We need to create a test case for the demonstration below, we used the employee database here, it is a sample database for PostgreSQL to be used mainly in various test applications.

The tarball contains 3 dump files:

  • employees_data.sql – schema with data (~140Mb uncompressed)
  • employees_schema.sql – schema only
  • employees_parts.sql – some partitioned versions of the tables

We’re using PostgreSQL 15 here. Dumps doesn’t contain information about schema or user name, so create any database with any owner (user) you want:

CREATE USER employees_user SUPERUSER;
CREATE DATABASE employees_database WITH OWNER employees_user;

and insert dump into it:

tar -xf employees-database.tar.bz2
psql -U employees_user employees_database < employees_data.sql
psql -U employees_user employees_database < employees_parts.sql

The database contains about 300,000 employee records with 2.8 million salary entries.

Demonstration

The related tables in the query are all partitioned by “hire_date”, as you can see in the “WHERE” clause below, the planner should project partition pruning because the “hire_date” from the driving table is equal to that of the joined table, yet the planner did not prune partitions:

EXPLAIN (ANALYZE, BUFFERS)
WITH top_emp AS (
   SELECT
       emp.id,
       emp.hire_date,
	   max(sal.amount) AS sal_amount
   FROM
       employees.employee_part emp
       INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
   WHERE
       emp.hire_date = '1985-01-01'::date
   GROUP BY emp.id, emp.hire_date
)
SELECT
    dept.employee_id,
    dept.department_id,
    dept.hire_date,
	emp1.sal_amount
FROM
    top_emp emp1,
    employees.department_employee_part dept
WHERE
    emp1.hire_date = dept.hire_date and
    emp1.id = dept.employee_id;

                                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=5030.60..10845.53 rows=1 width=25) (actual time=40.026..41.241 rows=9 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=3787
   ->  Hash Join  (cost=4030.61..9845.43 rows=1 width=25) (actual time=27.342..36.103 rows=3 loops=3)
         Hash Cond: ((dept.hire_date = emp1.hire_date) AND (dept.employee_id = emp1.id))
         Buffers: shared hit=3787
         ->  Parallel Append  (cost=0.00..5089.45 rows=138168 width=17) (actual time=0.009..21.210 rows=110534 loops=3)
               Buffers: shared hit=2448
               ->  Parallel Seq Scan on department_employee_part_p1986 dept_2  (cost=0.00..530.32 rows=23532 width=17) (actual time=0.008..5.717 rows=40005 loops=1)
                     Buffers: shared hit=295
               ->  Parallel Seq Scan on department_employee_part_p1985 dept_1  (cost=0.00..517.88 rows=22988 width=17) (actual time=0.007..5.684 rows=39080 loops=1)
                     Buffers: shared hit=288
               ->  Parallel Seq Scan on department_employee_part_p1987 dept_3  (cost=0.00..489.24 rows=21724 width=17) (actual time=0.018..5.021 rows=36930 loops=1)
                     Buffers: shared hit=272
               ->  Parallel Seq Scan on department_employee_part_p1988 dept_4  (cost=0.00..460.15 rows=20415 width=17) (actual time=0.018..5.600 rows=34705 loops=1)
                     Buffers: shared hit=256
               ->  Parallel Seq Scan on department_employee_part_p1989 dept_5  (cost=0.00..415.40 rows=18440 width=17) (actual time=0.011..1.492 rows=10449 loops=3)
                     Buffers: shared hit=231
               ->  Parallel Seq Scan on department_employee_part_p1990 dept_6  (cost=0.00..375.64 rows=16664 width=17) (actual time=0.006..1.985 rows=14164 loops=2)
                     Buffers: shared hit=209
               ->  Parallel Seq Scan on department_employee_part_p1991 dept_7  (cost=0.00..330.67 rows=14667 width=17) (actual time=0.003..3.367 rows=24934 loops=1)
                     Buffers: shared hit=184
               ->  Parallel Seq Scan on department_employee_part_p1992 dept_8  (cost=0.00..298.58 rows=13258 width=17) (actual time=0.003..2.349 rows=22539 loops=1)
                     Buffers: shared hit=166
               ->  Parallel Seq Scan on department_employee_part_p1993 dept_9  (cost=0.00..260.69 rows=11569 width=17) (actual time=0.002..1.967 rows=19667 loops=1)
                     Buffers: shared hit=145
               ->  Parallel Seq Scan on department_employee_part_p1994 dept_10  (cost=0.00..218.84 rows=9684 width=17) (actual time=0.004..1.688 rows=16463 loops=1)
                     Buffers: shared hit=122
               ->  Parallel Seq Scan on department_employee_part_p1995 dept_11  (cost=0.00..177.90 rows=7890 width=17) (actual time=0.002..1.523 rows=13413 loops=1)
                     Buffers: shared hit=99
               ->  Parallel Seq Scan on department_employee_part_p1996 dept_12  (cost=0.00..140.16 rows=6216 width=17) (actual time=0.002..1.180 rows=10568 loops=1)
                     Buffers: shared hit=78
               ->  Parallel Seq Scan on department_employee_part_p1997 dept_13  (cost=0.00..98.38 rows=4338 width=17) (actual time=0.003..0.815 rows=7375 loops=1)
                     Buffers: shared hit=55
               ->  Parallel Seq Scan on department_employee_part_p1998 dept_14  (cost=0.00..60.84 rows=2684 width=17) (actual time=0.002..0.481 rows=4562 loops=1)
                     Buffers: shared hit=34
               ->  Parallel Seq Scan on department_employee_part_p1999 dept_15  (cost=0.00..22.83 rows=983 width=17) (actual time=0.002..0.174 rows=1671 loops=1)
                     Buffers: shared hit=13
               ->  Parallel Seq Scan on department_employee_part_p2000 dept_16  (cost=0.00..1.09 rows=9 width=17) (actual time=0.008..0.011 rows=15 loops=1)
                     Buffers: shared hit=1
         ->  Hash  (cost=4029.12..4029.12 rows=99 width=20) (actual time=7.015..7.018 rows=9 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               Buffers: shared hit=1291
               ->  Subquery Scan on emp1  (cost=0.72..4029.12 rows=99 width=20) (actual time=2.652..7.010 rows=9 loops=3)
                     Buffers: shared hit=1291
                     ->  GroupAggregate  (cost=0.72..4028.13 rows=99 width=20) (actual time=2.651..7.007 rows=9 loops=3)
                           Group Key: emp.id, emp.hire_date
                           Buffers: shared hit=1291
                           ->  Nested Loop  (cost=0.72..4018.56 rows=1144 width=20) (actual time=2.636..6.986 rows=162 loops=3)
                                 Buffers: shared hit=1291
                                 ->  Index Scan using employee_part_p1985_pkey on employee_part_p1985 emp  (cost=0.29..1310.32 rows=99 width=12) (actual time=2.611..6.888 rows=9 loops=3)
                                       Filter: (hire_date = '1985-01-01'::date)
                                       Rows Removed by Filter: 35307
                                       Buffers: shared hit=1178
                                 ->  Index Scan using idx_16991_primary on salary sal  (cost=0.43..27.24 rows=12 width=16) (actual time=0.005..0.008 rows=18 loops=27)
                                       Index Cond: (employee_id = emp.id)
                                       Buffers: shared hit=113
 Planning:
   Buffers: shared hit=9
 Planning Time: 0.976 ms
 Execution Time: 41.333 ms
(62 rows)

So if the partition keys are equal and the tables are partitioned the exact same way, then why is the planner not projecting partition pruning? The answer can be found in the PostgreSQL Mailing Lists by clicking here.

The planner can only push quals down into a subquery, it cannot pull quals from a subquery into the outer query.

The execution time partition pruning can only happens on nested loop join, if there is no index on outer table and planner has to go with hash-join, then the original query can’t utilize partitioning pruning at execution time.

Let’s add a emp.birth_date condition to the subquery and see what happens:

EXPLAIN (ANALYZE, BUFFERS)
WITH top_emp AS (
   SELECT 
       emp.id,
       emp.hire_date,
	   max(sal.amount) AS sal_amount
   FROM
       employees.employee_part emp
       INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
   WHERE
       emp.hire_date = '1985-01-01'::date
       and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
   GROUP BY emp.id, emp.hire_date
)
SELECT
    dept.employee_id,
    dept.department_id,
    dept.hire_date,
	emp1.sal_amount
FROM
    top_emp emp1,
    employees.department_employee_part dept
WHERE
    emp1.hire_date = dept.hire_date and
    emp1.id = dept.employee_id;

                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2600.72..8414.19 rows=1 width=25) (actual time=3.710..3.728 rows=5 loops=1)
   Buffers: shared hit=328
   ->  HashAggregate  (cost=2600.43..2601.04 rows=61 width=20) (actual time=3.679..3.682 rows=5 loops=1)
         Group Key: emp.id, emp.hire_date
         Batches: 1  Memory Usage: 24kB
         Buffers: shared hit=313
         ->  Nested Loop  (cost=0.43..2595.14 rows=705 width=20) (actual time=1.329..3.655 rows=90 loops=1)
               Buffers: shared hit=313
               ->  Seq Scan on employee_part_p1985 emp  (cost=0.00..911.03 rows=61 width=12) (actual time=1.309..3.602 rows=5 loops=1)
                     Filter: ((birth_date >= '1957-01-01'::date) AND (birth_date <= '1970-01-01'::date) AND (hire_date = '1985-01-01'::date))
                     Rows Removed by Filter: 35311
                     Buffers: shared hit=293
               ->  Index Scan using idx_16991_primary on salary sal  (cost=0.43..27.49 rows=12 width=16) (actual time=0.005..0.007 rows=18 loops=5)
                     Index Cond: (employee_id = emp.id)
                     Buffers: shared hit=20
   ->  Append  (cost=0.29..95.13 rows=16 width=17) (actual time=0.006..0.007 rows=1 loops=5)
         Buffers: shared hit=15
         ->  Index Scan using department_employee_part_p1985_pkey on department_employee_part_p1985 dept_1  (cost=0.29..7.46 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=5)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
               Buffers: shared hit=15
         ->  Index Scan using department_employee_part_p1986_pkey on department_employee_part_p1986 dept_2  (cost=0.29..7.52 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1987_pkey on department_employee_part_p1987 dept_3  (cost=0.29..7.39 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1988_pkey on department_employee_part_p1988 dept_4  (cost=0.29..7.39 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1989_pkey on department_employee_part_p1989 dept_5  (cost=0.29..7.26 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1990_pkey on department_employee_part_p1990 dept_6  (cost=0.29..7.19 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1991_pkey on department_employee_part_p1991 dept_7  (cost=0.29..7.06 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1992_pkey on department_employee_part_p1992 dept_8  (cost=0.29..6.93 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1993_pkey on department_employee_part_p1993 dept_9  (cost=0.29..6.73 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1994_pkey on department_employee_part_p1994 dept_10  (cost=0.29..6.54 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1995_pkey on department_employee_part_p1995 dept_11  (cost=0.29..6.21 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1996_pkey on department_employee_part_p1996 dept_12  (cost=0.29..5.81 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1997_pkey on department_employee_part_p1997 dept_13  (cost=0.28..5.15 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1998_pkey on department_employee_part_p1998 dept_14  (cost=0.28..4.11 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1999_pkey on department_employee_part_p1999 dept_15  (cost=0.28..1.94 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p2000_pkey on department_employee_part_p2000 dept_16  (cost=0.14..0.35 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
 Planning:
   Buffers: shared hit=9
 Planning Time: 1.133 ms
 Execution Time: 3.977 ms
(70 rows)

The original hash join becomes a nested loop join. The notation “never executed” in the explain plan indicates that the execution time partition pruning has occurred.

Let’s remove the GROUP BY clause from the original subquery and see what happens:

EXPLAIN (ANALYZE, BUFFERS)
WITH top_emp AS (
   SELECT 
       DISTINCT
       emp.id,
       emp.hire_date
   FROM
       employees.employee_part emp
       INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
   WHERE
       emp.hire_date = '1985-01-01'::date
       and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
       and sal.amount > 50000
)
SELECT
    dept.employee_id,
    dept.department_id,
    dept.hire_date
FROM
    top_emp emp1,
    employees.department_employee_part dept
WHERE
    emp1.hire_date = dept.hire_date and
    emp1.id = dept.employee_id;

                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2619.67..8435.21 rows=1 width=17) (actual time=4.955..4.985 rows=5 loops=1)
   Buffers: shared hit=328
   ->  Unique  (cost=2619.38..2622.06 rows=61 width=12) (actual time=4.909..4.922 rows=5 loops=1)
         Buffers: shared hit=313
         ->  Sort  (cost=2619.38..2620.72 rows=535 width=12) (actual time=4.907..4.913 rows=87 loops=1)
               Sort Key: emp.id
               Sort Method: quicksort  Memory: 29kB
               Buffers: shared hit=313
               ->  Nested Loop  (cost=0.43..2595.14 rows=535 width=12) (actual time=1.829..4.849 rows=87 loops=1)
                     Buffers: shared hit=313
                     ->  Seq Scan on employee_part_p1985 emp  (cost=0.00..911.03 rows=61 width=12) (actual time=1.805..4.773 rows=5 loops=1)
                           Filter: ((birth_date >= '1957-01-01'::date) AND (birth_date <= '1970-01-01'::date) AND (hire_date = '1985-01-01'::date))
                           Rows Removed by Filter: 35311
                           Buffers: shared hit=293
                     ->  Index Scan using idx_16991_primary on salary sal  (cost=0.43..27.52 rows=9 width=8) (actual time=0.007..0.011 rows=17 loops=5)
                           Index Cond: (employee_id = emp.id)
                           Filter: (amount > 50000)
                           Rows Removed by Filter: 1
                           Buffers: shared hit=20
   ->  Append  (cost=0.29..95.13 rows=16 width=17) (actual time=0.009..0.009 rows=1 loops=5)
         Buffers: shared hit=15
         ->  Index Scan using department_employee_part_p1985_pkey on department_employee_part_p1985 dept_1  (cost=0.29..7.46 rows=1 width=17) (actual time=0.007..0.007 rows=1 loops=5)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
               Buffers: shared hit=15
         ->  Index Scan using department_employee_part_p1986_pkey on department_employee_part_p1986 dept_2  (cost=0.29..7.52 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1987_pkey on department_employee_part_p1987 dept_3  (cost=0.29..7.39 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1988_pkey on department_employee_part_p1988 dept_4  (cost=0.29..7.39 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1989_pkey on department_employee_part_p1989 dept_5  (cost=0.29..7.26 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1990_pkey on department_employee_part_p1990 dept_6  (cost=0.29..7.19 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1991_pkey on department_employee_part_p1991 dept_7  (cost=0.29..7.06 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1992_pkey on department_employee_part_p1992 dept_8  (cost=0.29..6.93 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1993_pkey on department_employee_part_p1993 dept_9  (cost=0.29..6.73 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1994_pkey on department_employee_part_p1994 dept_10  (cost=0.29..6.54 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1995_pkey on department_employee_part_p1995 dept_11  (cost=0.29..6.21 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1996_pkey on department_employee_part_p1996 dept_12  (cost=0.29..5.81 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1997_pkey on department_employee_part_p1997 dept_13  (cost=0.28..5.15 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1998_pkey on department_employee_part_p1998 dept_14  (cost=0.28..4.11 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p1999_pkey on department_employee_part_p1999 dept_15  (cost=0.28..1.94 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
         ->  Index Scan using department_employee_part_p2000_pkey on department_employee_part_p2000 dept_16  (cost=0.14..0.35 rows=1 width=17) (never executed)
               Index Cond: (employee_id = emp.id)
               Filter: (emp.hire_date = hire_date)
 Planning:
   Buffers: shared hit=11
 Planning Time: 1.038 ms
 Execution Time: 5.128 ms
(74 rows)

Again, the original hash join becomes a nested loop join. The notation “never executed” in the explain plan indicates that the execution time partition pruning has occurred.

Conclusion

So as you can see it is very important to get the planner to project what you want. While ultimately the execution phase was not fully executed, it still had to look at the stats of the partition to decide not to execute it resulting in some wasted time. Also, keep in mind that this test case is a very simple one. In a real production environment, there will be a lot more partitions and a lot more data, so the planning time being wasted was much higher resulting in a much higher impact.