PostgreSQL 中的通用计划和定制计划

John Doe 一月 23, 2025

摘要:在本文中,您将学习 PostgreSQL 中通用计划和定制计划的概念。

目录

初始设置

我们首先创建一个 demo 表,并使用一些样例数据填充该表:

CREATE TABLE demo ( a int, b text );

INSERT INTO demo SELECT i, 'aaa' FROM generate_series (1,100) i;
INSERT INTO demo SELECT i, 'bbb' FROM generate_series (101,200) i;
INSERT INTO demo SELECT i, 'ccc' FROM generate_series (201,300) i;

ANALYZE demo;

现在我们已经有了一些数据,我们可以预备一个语句,让我们可以用各种值来执行:

PREPARE my_stmt AS SELECT * FROM demo WHERE b = $1;

另外,您可以通过查询 pg_prepared_statements 系统视图,来检查会话中当前可用的预备语句:

SELECT * FROM pg_prepared_statements;
  name   |                      statement                      |         prepare_time          | parameter_types | from_sql
---------+-----------------------------------------------------+-------------------------------+-----------------+----------
 my_stmt | prepare my_stmt as select * from demo where b = $1; | 2025-01-22 13:15:39.232521+01 | {text}          | t

演示

当我们用 EXPLAIN (analyze) 分析该语句时,我们会看到什么?

EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.230 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.344 ms
 Execution time: 0.285 ms
(5 rows)

在执行计划的 “Filter” 行中,我们可以看到我们传递给预备语句的实际值(‘aaa’)。当你看到那个时,这是一个所谓的定制计划。当 PostgreSQL 执行一个定制计划时,这意味着将要对提供的参数集重新规划语句。当你再执行几次时:

EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.158 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.243 ms
 Execution time: 0.225 ms
(5 rows)

请按照您的需要,重复执行该命令,但至少 5 次。

EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.038..0.130 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.191 ms
 Execution time: 0.183 ms
(5 rows)

最后,您会看到 “Filter” 行从显示实际参数变为了占位符。现在我们有了一个通用计划。无论您将什么值传递到预备语句中,在预备语句的生命周期内,这个通用计划都不会再变化:

EXPLAIN (analyze) EXECUTE my_stmt ( 'bbb' );
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.096..0.219 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.275 ms
 Execution time: 0.352 ms
(5 rows)

EXPLAIN (analyze) EXECUTE my_stmt ( 'ccc' );
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.090..0.132 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.084 ms
 Execution time: 0.204 ms
(5 rows)

EXPLAIN (analyze) EXECUTE my_stmt ( null );
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.033..0.033 rows=0 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning time: 0.018 ms
 Execution time: 0.051 ms
(5 rows)

怎么做到的?

当你查看 PostgreSQL 源代码(src/backend/utils/cache/plancache.c)时,你会看到为什么它会在 5 次执行后发生变化:

/*
 * choose_custom_plan: choose whether to use custom or generic plan
 *
 * This defines the policy followed by GetCachedPlan.
 */
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
	double		avg_custom_cost;

	/* One-shot plans will always be considered custom */
	if (plansource->is_oneshot)
		return true;

	/* Otherwise, never any point in a custom plan if there's no parameters */
	if (boundParams == NULL)
		return false;
	/* ... nor for transaction control statements */
	if (IsTransactionStmtPlan(plansource))
		return false;

	/* See if caller wants to force the decision */
	if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
		return false;
	if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
		return true;

	/* Generate custom plans until we have done at least 5 (arbitrary) */
	if (plansource->num_custom_plans < 5)
		return true;

	avg_custom_cost = plansource->total_custom_cost / plansource->num_custom_plans;

	/*
	 * Prefer generic plan if it's less expensive than the average custom
	 * plan.  (Because we include a charge for cost of planning in the
	 * custom-plan costs, this means the generic plan only has to be less
	 * expensive than the execution cost plus replan cost of the custom
	 * plans.)
	 *
	 * Note that if generic_cost is -1 (indicating we've not yet determined
	 * the generic plan cost), we'll always prefer generic at this point.
	 */
	if (plansource->generic_cost < avg_custom_cost)
		return false;

	return true;
}

即使我们更改数据并再次分析表,一旦 PostgreSQL 选择了一个通用的计划,我们仍然会采用该计划:

INSERT INTO demo SELECT i, 'ddd' FROM generate_series (201,210) i;

INSERT INTO demo SELECT i, 'ee' FROM generate_series (211,211) i;

ANALYZE demo;

SELECT b, count(*) FROM demo GROUP BY b ORDER BY b;
  b  | count 
-----+-------
 aaa |   100
 bbb |   100
 ccc |   100
 ddd |    10
 ee  |     1
(5 rows)

EXPLAIN (analyze) EXECUTE my_stmt ( 'ddd' );
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.88 rows=78 width=8) (actual time=0.147..0.151 rows=10 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning time: 0.293 ms
 Execution time: 0.190 ms
(5 rows)

当我们有了更多的数据,这些数据分布不均匀,并且在列 “b” 上有一个索引,此时情况会发生变化:

TRUNCATE demo;

INSERT INTO demo SELECT i, 'aaa' FROM generate_series (1,1000000) i;
INSERT INTO demo SELECT i, 'bbb' FROM generate_series (1000001,2000000) i;
INSERT INTO demo SELECT i, 'ccc' FROM generate_series (2000001,3000000) i;
INSERT INTO demo SELECT i, 'eee' FROM generate_series (3000001,3000010) i;

CREATE INDEX i1 ON demo (b);

SELECT b, count(*) FROM demo GROUP BY b ORDER BY b;
  b  |  count 
-----+---------
 aaa | 1000000
 bbb | 1000000
 ccc | 1000000
 eee |      10
(4 rows)

PREPARE my_stmt AS SELECT * FROM demo WHERE b = $1;

无论我们执行多少次下面的语句(传入 ’eee’),我们都不会得到一个通用的计划:

EXPLAIN (analyze) EXECUTE my_stmt ('eee');
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on demo  (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning time: 0.249 ms
 Execution time: 0.106 ms
(4 rows)

请按照您的需要,重复执行该命令,但至少 10 次。

EXPLAIN (analyze) EXECUTE my_stmt ('eee');
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on demo  (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning time: 0.249 ms
 Execution time: 0.106 ms

这是因为当我们有这样的数据分布时,定制计划(包括了重新规划的开销)总是比通用计划(不包括重新规划的开销)的开销要小。文档中对此有非常清楚的说明:

来自 PostgreSQL 文档:

在有很多重复值的列中,使用很少出现的值调用EXECUTE,可以产生比通用计划更加廉价的定制计划(即使加上规划开销),这样的情况下不会使用通用计划。