由 John Doe 一月 13, 2025
摘要:在本文中,我们将学习到一个场景,在 PostgreSQL 中的某些情况下,子查询中的条件无法用于分区剪枝。
目录
介绍
最近,有一位客户发现,有一个性能不佳的查询,它在 Oracle 上执行非常快,但是一旦迁移到 PostgreSQL,查询就很慢。经过调查发现,该查询在规划阶段没有修剪分区,导致执行的时间要长得多。也就是说,该查询在规划阶段没有进行分区修剪,在执行阶段也没有进行(在 EXPLAIN 输出的计划中标注的 “never executed”,可以表明这一点)。
这个查询带有 “WHERE” 和 “JOIN” 子句,这些子句显式地指定了分区键,并且是在分区键上进行连接的,那么为什么在规划期间没有进行分区修剪呢?
测试用例设置
我们需要为下面的演示创建测试用例,此处我们使用了 employee 数据库,它是一个 PostgreSQL 的示例数据库,主要用于各种测试应用程序。
压缩包中包含 3 个转储文件:
- employees_data.sql – 包含了数据和架构(解压后约 140MB)
- employees_schema.sql – 仅包含数据架构
- employees_parts.sql – 对某些表进行分区后的版本
我们这里使用的是 PostgreSQL 15。转储中没有包含有关数据库或用户名的信息,因此,请使用所需的任意所有者(用户),创建一个任意的数据库:
CREATE USER employees_user SUPERUSER;
CREATE DATABASE employees_database WITH OWNER employees_user;
并将转储插入其中:
tar -xf employees-database.tar.bz2
psql -U employees_user employees_database < employees_data.sql
psql -U employees_user employees_database < employees_parts.sql
该数据库包含了大约 300000 条员工记录,和 280 万条薪资记录。
演示
查询中的相关表都已按 “hire_date” 进行分区,如下面的 “WHERE” 子句所示,规划器应该规划分区剪枝,因为驱动表中的 “hire_date” 等于连接表中的 “hire_date”,但规划器并没有修剪分区:
EXPLAIN (ANALYZE, BUFFERS)
WITH top_emp AS (
SELECT
emp.id,
emp.hire_date,
max(sal.amount) AS sal_amount
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
WHERE
emp.hire_date = '1985-01-01'::date
GROUP BY emp.id, emp.hire_date
)
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date,
emp1.sal_amount
FROM
top_emp emp1,
employees.department_employee_part dept
WHERE
emp1.hire_date = dept.hire_date and
emp1.id = dept.employee_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=5030.60..10845.53 rows=1 width=25) (actual time=40.026..41.241 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=3787
-> Hash Join (cost=4030.61..9845.43 rows=1 width=25) (actual time=27.342..36.103 rows=3 loops=3)
Hash Cond: ((dept.hire_date = emp1.hire_date) AND (dept.employee_id = emp1.id))
Buffers: shared hit=3787
-> Parallel Append (cost=0.00..5089.45 rows=138168 width=17) (actual time=0.009..21.210 rows=110534 loops=3)
Buffers: shared hit=2448
-> Parallel Seq Scan on department_employee_part_p1986 dept_2 (cost=0.00..530.32 rows=23532 width=17) (actual time=0.008..5.717 rows=40005 loops=1)
Buffers: shared hit=295
-> Parallel Seq Scan on department_employee_part_p1985 dept_1 (cost=0.00..517.88 rows=22988 width=17) (actual time=0.007..5.684 rows=39080 loops=1)
Buffers: shared hit=288
-> Parallel Seq Scan on department_employee_part_p1987 dept_3 (cost=0.00..489.24 rows=21724 width=17) (actual time=0.018..5.021 rows=36930 loops=1)
Buffers: shared hit=272
-> Parallel Seq Scan on department_employee_part_p1988 dept_4 (cost=0.00..460.15 rows=20415 width=17) (actual time=0.018..5.600 rows=34705 loops=1)
Buffers: shared hit=256
-> Parallel Seq Scan on department_employee_part_p1989 dept_5 (cost=0.00..415.40 rows=18440 width=17) (actual time=0.011..1.492 rows=10449 loops=3)
Buffers: shared hit=231
-> Parallel Seq Scan on department_employee_part_p1990 dept_6 (cost=0.00..375.64 rows=16664 width=17) (actual time=0.006..1.985 rows=14164 loops=2)
Buffers: shared hit=209
-> Parallel Seq Scan on department_employee_part_p1991 dept_7 (cost=0.00..330.67 rows=14667 width=17) (actual time=0.003..3.367 rows=24934 loops=1)
Buffers: shared hit=184
-> Parallel Seq Scan on department_employee_part_p1992 dept_8 (cost=0.00..298.58 rows=13258 width=17) (actual time=0.003..2.349 rows=22539 loops=1)
Buffers: shared hit=166
-> Parallel Seq Scan on department_employee_part_p1993 dept_9 (cost=0.00..260.69 rows=11569 width=17) (actual time=0.002..1.967 rows=19667 loops=1)
Buffers: shared hit=145
-> Parallel Seq Scan on department_employee_part_p1994 dept_10 (cost=0.00..218.84 rows=9684 width=17) (actual time=0.004..1.688 rows=16463 loops=1)
Buffers: shared hit=122
-> Parallel Seq Scan on department_employee_part_p1995 dept_11 (cost=0.00..177.90 rows=7890 width=17) (actual time=0.002..1.523 rows=13413 loops=1)
Buffers: shared hit=99
-> Parallel Seq Scan on department_employee_part_p1996 dept_12 (cost=0.00..140.16 rows=6216 width=17) (actual time=0.002..1.180 rows=10568 loops=1)
Buffers: shared hit=78
-> Parallel Seq Scan on department_employee_part_p1997 dept_13 (cost=0.00..98.38 rows=4338 width=17) (actual time=0.003..0.815 rows=7375 loops=1)
Buffers: shared hit=55
-> Parallel Seq Scan on department_employee_part_p1998 dept_14 (cost=0.00..60.84 rows=2684 width=17) (actual time=0.002..0.481 rows=4562 loops=1)
Buffers: shared hit=34
-> Parallel Seq Scan on department_employee_part_p1999 dept_15 (cost=0.00..22.83 rows=983 width=17) (actual time=0.002..0.174 rows=1671 loops=1)
Buffers: shared hit=13
-> Parallel Seq Scan on department_employee_part_p2000 dept_16 (cost=0.00..1.09 rows=9 width=17) (actual time=0.008..0.011 rows=15 loops=1)
Buffers: shared hit=1
-> Hash (cost=4029.12..4029.12 rows=99 width=20) (actual time=7.015..7.018 rows=9 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1291
-> Subquery Scan on emp1 (cost=0.72..4029.12 rows=99 width=20) (actual time=2.652..7.010 rows=9 loops=3)
Buffers: shared hit=1291
-> GroupAggregate (cost=0.72..4028.13 rows=99 width=20) (actual time=2.651..7.007 rows=9 loops=3)
Group Key: emp.id, emp.hire_date
Buffers: shared hit=1291
-> Nested Loop (cost=0.72..4018.56 rows=1144 width=20) (actual time=2.636..6.986 rows=162 loops=3)
Buffers: shared hit=1291
-> Index Scan using employee_part_p1985_pkey on employee_part_p1985 emp (cost=0.29..1310.32 rows=99 width=12) (actual time=2.611..6.888 rows=9 loops=3)
Filter: (hire_date = '1985-01-01'::date)
Rows Removed by Filter: 35307
Buffers: shared hit=1178
-> Index Scan using idx_16991_primary on salary sal (cost=0.43..27.24 rows=12 width=16) (actual time=0.005..0.008 rows=18 loops=27)
Index Cond: (employee_id = emp.id)
Buffers: shared hit=113
Planning:
Buffers: shared hit=9
Planning Time: 0.976 ms
Execution Time: 41.333 ms
(62 rows)
因此,如果分区键是相等的,并且表的分区方式完全相同,那么为什么规划器没有进行分区修剪呢?可以单击此处,在 PostgreSQL 邮件列表中找到答案。
规划器只能将条件向下推送到子查询中,它不能将条件从子查询拉入到外部查询中。
执行阶段的分区修剪只能发生在嵌套循环连接上,如果外部表上没有索引,并且规划器必须使用哈希连接,那么原始查询在执行阶段也不能进行分区修剪。
让我们在子查询中增加一个emp.birth_date
的条件,来看看会发生什么情况:
EXPLAIN (ANALYZE, BUFFERS)
WITH top_emp AS (
SELECT
emp.id,
emp.hire_date,
max(sal.amount) AS sal_amount
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
WHERE
emp.hire_date = '1985-01-01'::date
and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
GROUP BY emp.id, emp.hire_date
)
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date,
emp1.sal_amount
FROM
top_emp emp1,
employees.department_employee_part dept
WHERE
emp1.hire_date = dept.hire_date and
emp1.id = dept.employee_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2600.72..8414.19 rows=1 width=25) (actual time=3.710..3.728 rows=5 loops=1)
Buffers: shared hit=328
-> HashAggregate (cost=2600.43..2601.04 rows=61 width=20) (actual time=3.679..3.682 rows=5 loops=1)
Group Key: emp.id, emp.hire_date
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=313
-> Nested Loop (cost=0.43..2595.14 rows=705 width=20) (actual time=1.329..3.655 rows=90 loops=1)
Buffers: shared hit=313
-> Seq Scan on employee_part_p1985 emp (cost=0.00..911.03 rows=61 width=12) (actual time=1.309..3.602 rows=5 loops=1)
Filter: ((birth_date >= '1957-01-01'::date) AND (birth_date <= '1970-01-01'::date) AND (hire_date = '1985-01-01'::date))
Rows Removed by Filter: 35311
Buffers: shared hit=293
-> Index Scan using idx_16991_primary on salary sal (cost=0.43..27.49 rows=12 width=16) (actual time=0.005..0.007 rows=18 loops=5)
Index Cond: (employee_id = emp.id)
Buffers: shared hit=20
-> Append (cost=0.29..95.13 rows=16 width=17) (actual time=0.006..0.007 rows=1 loops=5)
Buffers: shared hit=15
-> Index Scan using department_employee_part_p1985_pkey on department_employee_part_p1985 dept_1 (cost=0.29..7.46 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=5)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
Buffers: shared hit=15
-> Index Scan using department_employee_part_p1986_pkey on department_employee_part_p1986 dept_2 (cost=0.29..7.52 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1987_pkey on department_employee_part_p1987 dept_3 (cost=0.29..7.39 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1988_pkey on department_employee_part_p1988 dept_4 (cost=0.29..7.39 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1989_pkey on department_employee_part_p1989 dept_5 (cost=0.29..7.26 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1990_pkey on department_employee_part_p1990 dept_6 (cost=0.29..7.19 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1991_pkey on department_employee_part_p1991 dept_7 (cost=0.29..7.06 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1992_pkey on department_employee_part_p1992 dept_8 (cost=0.29..6.93 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1993_pkey on department_employee_part_p1993 dept_9 (cost=0.29..6.73 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1994_pkey on department_employee_part_p1994 dept_10 (cost=0.29..6.54 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1995_pkey on department_employee_part_p1995 dept_11 (cost=0.29..6.21 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1996_pkey on department_employee_part_p1996 dept_12 (cost=0.29..5.81 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1997_pkey on department_employee_part_p1997 dept_13 (cost=0.28..5.15 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1998_pkey on department_employee_part_p1998 dept_14 (cost=0.28..4.11 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1999_pkey on department_employee_part_p1999 dept_15 (cost=0.28..1.94 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p2000_pkey on department_employee_part_p2000 dept_16 (cost=0.14..0.35 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
Planning:
Buffers: shared hit=9
Planning Time: 1.133 ms
Execution Time: 3.977 ms
(70 rows)
原来的哈希连接变成了嵌套循环连接。在 EXPLAIN 输出的计划中标注的 “never executed” 表明,执行阶段的分区修剪已经发生了。
让我们将原始子查询中的 GROUP BY 子句去掉,看看又会发生什么情况:
EXPLAIN (ANALYZE, BUFFERS)
WITH top_emp AS (
SELECT
DISTINCT
emp.id,
emp.hire_date
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
WHERE
emp.hire_date = '1985-01-01'::date
and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
and sal.amount > 50000
)
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date
FROM
top_emp emp1,
employees.department_employee_part dept
WHERE
emp1.hire_date = dept.hire_date and
emp1.id = dept.employee_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2619.67..8435.21 rows=1 width=17) (actual time=4.955..4.985 rows=5 loops=1)
Buffers: shared hit=328
-> Unique (cost=2619.38..2622.06 rows=61 width=12) (actual time=4.909..4.922 rows=5 loops=1)
Buffers: shared hit=313
-> Sort (cost=2619.38..2620.72 rows=535 width=12) (actual time=4.907..4.913 rows=87 loops=1)
Sort Key: emp.id
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=313
-> Nested Loop (cost=0.43..2595.14 rows=535 width=12) (actual time=1.829..4.849 rows=87 loops=1)
Buffers: shared hit=313
-> Seq Scan on employee_part_p1985 emp (cost=0.00..911.03 rows=61 width=12) (actual time=1.805..4.773 rows=5 loops=1)
Filter: ((birth_date >= '1957-01-01'::date) AND (birth_date <= '1970-01-01'::date) AND (hire_date = '1985-01-01'::date))
Rows Removed by Filter: 35311
Buffers: shared hit=293
-> Index Scan using idx_16991_primary on salary sal (cost=0.43..27.52 rows=9 width=8) (actual time=0.007..0.011 rows=17 loops=5)
Index Cond: (employee_id = emp.id)
Filter: (amount > 50000)
Rows Removed by Filter: 1
Buffers: shared hit=20
-> Append (cost=0.29..95.13 rows=16 width=17) (actual time=0.009..0.009 rows=1 loops=5)
Buffers: shared hit=15
-> Index Scan using department_employee_part_p1985_pkey on department_employee_part_p1985 dept_1 (cost=0.29..7.46 rows=1 width=17) (actual time=0.007..0.007 rows=1 loops=5)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
Buffers: shared hit=15
-> Index Scan using department_employee_part_p1986_pkey on department_employee_part_p1986 dept_2 (cost=0.29..7.52 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1987_pkey on department_employee_part_p1987 dept_3 (cost=0.29..7.39 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1988_pkey on department_employee_part_p1988 dept_4 (cost=0.29..7.39 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1989_pkey on department_employee_part_p1989 dept_5 (cost=0.29..7.26 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1990_pkey on department_employee_part_p1990 dept_6 (cost=0.29..7.19 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1991_pkey on department_employee_part_p1991 dept_7 (cost=0.29..7.06 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1992_pkey on department_employee_part_p1992 dept_8 (cost=0.29..6.93 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1993_pkey on department_employee_part_p1993 dept_9 (cost=0.29..6.73 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1994_pkey on department_employee_part_p1994 dept_10 (cost=0.29..6.54 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1995_pkey on department_employee_part_p1995 dept_11 (cost=0.29..6.21 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1996_pkey on department_employee_part_p1996 dept_12 (cost=0.29..5.81 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1997_pkey on department_employee_part_p1997 dept_13 (cost=0.28..5.15 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1998_pkey on department_employee_part_p1998 dept_14 (cost=0.28..4.11 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p1999_pkey on department_employee_part_p1999 dept_15 (cost=0.28..1.94 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
-> Index Scan using department_employee_part_p2000_pkey on department_employee_part_p2000 dept_16 (cost=0.14..0.35 rows=1 width=17) (never executed)
Index Cond: (employee_id = emp.id)
Filter: (emp.hire_date = hire_date)
Planning:
Buffers: shared hit=11
Planning Time: 1.038 ms
Execution Time: 5.128 ms
(74 rows)
同样,原来的哈希连接变成了嵌套循环连接。在 EXPLAIN 输出的计划中标注的 “never executed” 表明,执行阶段的分区修剪已经发生了。
结论
所以,正如你所看到的,让规划器预测你想要的方式是非常重要的。虽然最终执行阶段没有完全执行,但它仍然必须查看分区的统计信息来决定要不要执行它,从而会导致一些时间浪费。另外,请记住,该测试用例非常简单。在实际的生产环境中,会有更多的分区和更多的数据,因此规划上浪费的时间要高得多,也会产生更大的影响。