由 John Doe 一月 23, 2025
摘要:在本文中,您将学习 PostgreSQL 中通用计划和定制计划的概念。
目录
初始设置
我们首先创建一个 demo 表,并使用一些样例数据填充该表:
CREATE TABLE demo ( a int, b text );
INSERT INTO demo SELECT i, 'aaa' FROM generate_series (1,100) i;
INSERT INTO demo SELECT i, 'bbb' FROM generate_series (101,200) i;
INSERT INTO demo SELECT i, 'ccc' FROM generate_series (201,300) i;
ANALYZE demo;
现在我们已经有了一些数据,我们可以预备一个语句,让我们可以用各种值来执行:
PREPARE my_stmt AS SELECT * FROM demo WHERE b = $1;
另外,您可以通过查询 pg_prepared_statements 系统视图,来检查会话中当前可用的预备语句:
SELECT * FROM pg_prepared_statements;
name | statement | prepare_time | parameter_types | from_sql
---------+-----------------------------------------------------+-------------------------------+-----------------+----------
my_stmt | prepare my_stmt as select * from demo where b = $1; | 2025-01-22 13:15:39.232521+01 | {text} | t
演示
当我们用 EXPLAIN (analyze) 分析该语句时,我们会看到什么?
EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.230 rows=100 loops=1)
Filter: (b = 'aaa'::text)
Rows Removed by Filter: 200
Planning time: 0.344 ms
Execution time: 0.285 ms
(5 rows)
在执行计划的 “Filter” 行中,我们可以看到我们传递给预备语句的实际值(‘aaa’)。当你看到那个时,这是一个所谓的定制计划。当 PostgreSQL 执行一个定制计划时,这意味着将要对提供的参数集重新规划语句。当你再执行几次时:
EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.158 rows=100 loops=1)
Filter: (b = 'aaa'::text)
Rows Removed by Filter: 200
Planning time: 0.243 ms
Execution time: 0.225 ms
(5 rows)
请按照您的需要,重复执行该命令,但至少 5 次。
EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.038..0.130 rows=100 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 200
Planning time: 0.191 ms
Execution time: 0.183 ms
(5 rows)
最后,您会看到 “Filter” 行从显示实际参数变为了占位符。现在我们有了一个通用计划。无论您将什么值传递到预备语句中,在预备语句的生命周期内,这个通用计划都不会再变化:
EXPLAIN (analyze) EXECUTE my_stmt ( 'bbb' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.096..0.219 rows=100 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 200
Planning time: 0.275 ms
Execution time: 0.352 ms
(5 rows)
EXPLAIN (analyze) EXECUTE my_stmt ( 'ccc' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.090..0.132 rows=100 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 200
Planning time: 0.084 ms
Execution time: 0.204 ms
(5 rows)
EXPLAIN (analyze) EXECUTE my_stmt ( null );
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.033..0.033 rows=0 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 300
Planning time: 0.018 ms
Execution time: 0.051 ms
(5 rows)
怎么做到的?
当你查看 PostgreSQL 源代码(src/backend/utils/cache/plancache.c
)时,你会看到为什么它会在 5 次执行后发生变化:
/*
* choose_custom_plan: choose whether to use custom or generic plan
*
* This defines the policy followed by GetCachedPlan.
*/
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
double avg_custom_cost;
/* One-shot plans will always be considered custom */
if (plansource->is_oneshot)
return true;
/* Otherwise, never any point in a custom plan if there's no parameters */
if (boundParams == NULL)
return false;
/* ... nor for transaction control statements */
if (IsTransactionStmtPlan(plansource))
return false;
/* See if caller wants to force the decision */
if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
return false;
if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
return true;
/* Generate custom plans until we have done at least 5 (arbitrary) */
if (plansource->num_custom_plans < 5)
return true;
avg_custom_cost = plansource->total_custom_cost / plansource->num_custom_plans;
/*
* Prefer generic plan if it's less expensive than the average custom
* plan. (Because we include a charge for cost of planning in the
* custom-plan costs, this means the generic plan only has to be less
* expensive than the execution cost plus replan cost of the custom
* plans.)
*
* Note that if generic_cost is -1 (indicating we've not yet determined
* the generic plan cost), we'll always prefer generic at this point.
*/
if (plansource->generic_cost < avg_custom_cost)
return false;
return true;
}
即使我们更改数据并再次分析表,一旦 PostgreSQL 选择了一个通用的计划,我们仍然会采用该计划:
INSERT INTO demo SELECT i, 'ddd' FROM generate_series (201,210) i;
INSERT INTO demo SELECT i, 'ee' FROM generate_series (211,211) i;
ANALYZE demo;
SELECT b, count(*) FROM demo GROUP BY b ORDER BY b;
b | count
-----+-------
aaa | 100
bbb | 100
ccc | 100
ddd | 10
ee | 1
(5 rows)
EXPLAIN (analyze) EXECUTE my_stmt ( 'ddd' );
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.88 rows=78 width=8) (actual time=0.147..0.151 rows=10 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 300
Planning time: 0.293 ms
Execution time: 0.190 ms
(5 rows)
当我们有了更多的数据,这些数据分布不均匀,并且在列 “b” 上有一个索引,此时情况会发生变化:
TRUNCATE demo;
INSERT INTO demo SELECT i, 'aaa' FROM generate_series (1,1000000) i;
INSERT INTO demo SELECT i, 'bbb' FROM generate_series (1000001,2000000) i;
INSERT INTO demo SELECT i, 'ccc' FROM generate_series (2000001,3000000) i;
INSERT INTO demo SELECT i, 'eee' FROM generate_series (3000001,3000010) i;
CREATE INDEX i1 ON demo (b);
SELECT b, count(*) FROM demo GROUP BY b ORDER BY b;
b | count
-----+---------
aaa | 1000000
bbb | 1000000
ccc | 1000000
eee | 10
(4 rows)
PREPARE my_stmt AS SELECT * FROM demo WHERE b = $1;
无论我们执行多少次下面的语句(传入 ’eee’),我们都不会得到一个通用的计划:
EXPLAIN (analyze) EXECUTE my_stmt ('eee');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Scan using i1 on demo (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
Index Cond: (b = 'eee'::text)
Planning time: 0.249 ms
Execution time: 0.106 ms
(4 rows)
请按照您的需要,重复执行该命令,但至少 10 次。
EXPLAIN (analyze) EXECUTE my_stmt ('eee');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Scan using i1 on demo (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
Index Cond: (b = 'eee'::text)
Planning time: 0.249 ms
Execution time: 0.106 ms
这是因为当我们有这样的数据分布时,定制计划(包括了重新规划的开销)总是比通用计划(不包括重新规划的开销)的开销要小。文档中对此有非常清楚的说明:
来自 PostgreSQL 文档:
在有很多重复值的列中,使用很少出现的值调用
EXECUTE
,可以产生比通用计划更加廉价的定制计划(即使加上规划开销),这样的情况下不会使用通用计划。