By John Doe January 23, 2025
Summary: in this article, you will learn the concept of generic and custom plans in PostgreSQL.
Table of Contents
Initial setup
Let’s start with creating a demo table, and populate that table with some sample data:
CREATE TABLE demo ( a int, b text );
INSERT INTO demo SELECT i, 'aaa' FROM generate_series (1,100) i;
INSERT INTO demo SELECT i, 'bbb' FROM generate_series (101,200) i;
INSERT INTO demo SELECT i, 'ccc' FROM generate_series (201,300) i;
ANALYZE demo;
Now that we have some data, we can prepare a statement we would like to execute with various values:
PREPARE my_stmt AS SELECT * FROM demo WHERE b = $1;
Besides, you can check for the currently available prepared statements in your session by querying the pg_prepared_statements catalog view:
SELECT * FROM pg_prepared_statements;
name | statement | prepare_time | parameter_types | from_sql
---------+-----------------------------------------------------+-------------------------------+-----------------+----------
my_stmt | prepare my_stmt as select * from demo where b = $1; | 2025-01-22 13:15:39.232521+01 | {text} | t
Demonstration
When we EXPLAIN (analyze) that statement what do we see?
EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.230 rows=100 loops=1)
Filter: (b = 'aaa'::text)
Rows Removed by Filter: 200
Planning time: 0.344 ms
Execution time: 0.285 ms
(5 rows)
In the “Filter” line of the execution plan we can see the real value (‘aaa’) we passed to our prepared statement. When you see that, it is a so called custom plan. When PostgreSQL goes for a custom plan that means the statement will be re-planned for the provided set of parameters. When you execute that a few times more:
EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.158 rows=100 loops=1)
Filter: (b = 'aaa'::text)
Rows Removed by Filter: 200
Planning time: 0.243 ms
Execution time: 0.225 ms
(5 rows)
Repeat the command how often you want but at least 5 times.
EXPLAIN (analyze) EXECUTE my_stmt ( 'aaa' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.038..0.130 rows=100 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 200
Planning time: 0.191 ms
Execution time: 0.183 ms
(5 rows)
Finally you will see that the “Filter” line changes from displaying the actual parameter to a place holder. Now we have a generic plan. This generic plan will not change anymore for the lifetime of the prepared statement no matter which value you pass into the prepared statement:
EXPLAIN (analyze) EXECUTE my_stmt ( 'bbb' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.096..0.219 rows=100 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 200
Planning time: 0.275 ms
Execution time: 0.352 ms
(5 rows)
EXPLAIN (analyze) EXECUTE my_stmt ( 'ccc' );
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.090..0.132 rows=100 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 200
Planning time: 0.084 ms
Execution time: 0.204 ms
(5 rows)
EXPLAIN (analyze) EXECUTE my_stmt ( null );
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.75 rows=100 width=8) (actual time=0.033..0.033 rows=0 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 300
Planning time: 0.018 ms
Execution time: 0.051 ms
(5 rows)
How it works…
When you take a look at the source code (src/backend/utils/cache/plancache.c
) you will see why it changes after 5 executions:
/*
* choose_custom_plan: choose whether to use custom or generic plan
*
* This defines the policy followed by GetCachedPlan.
*/
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
double avg_custom_cost;
/* One-shot plans will always be considered custom */
if (plansource->is_oneshot)
return true;
/* Otherwise, never any point in a custom plan if there's no parameters */
if (boundParams == NULL)
return false;
/* ... nor for transaction control statements */
if (IsTransactionStmtPlan(plansource))
return false;
/* See if caller wants to force the decision */
if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
return false;
if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
return true;
/* Generate custom plans until we have done at least 5 (arbitrary) */
if (plansource->num_custom_plans < 5)
return true;
avg_custom_cost = plansource->total_custom_cost / plansource->num_custom_plans;
/*
* Prefer generic plan if it's less expensive than the average custom
* plan. (Because we include a charge for cost of planning in the
* custom-plan costs, this means the generic plan only has to be less
* expensive than the execution cost plus replan cost of the custom
* plans.)
*
* Note that if generic_cost is -1 (indicating we've not yet determined
* the generic plan cost), we'll always prefer generic at this point.
*/
if (plansource->generic_cost < avg_custom_cost)
return false;
return true;
}
Even if we change the data and analyze the table again we will still get a generic plan once PostgreSQL went for it:
INSERT INTO demo SELECT i, 'ddd' FROM generate_series (201,210) i;
INSERT INTO demo SELECT i, 'ee' FROM generate_series (211,211) i;
ANALYZE demo;
SELECT b, count(*) FROM demo GROUP BY b ORDER BY b;
b | count
-----+-------
aaa | 100
bbb | 100
ccc | 100
ddd | 10
ee | 1
(5 rows)
EXPLAIN (analyze) EXECUTE my_stmt ( 'ddd' );
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on demo (cost=0.00..5.88 rows=78 width=8) (actual time=0.147..0.151 rows=10 loops=1)
Filter: (b = $1)
Rows Removed by Filter: 300
Planning time: 0.293 ms
Execution time: 0.190 ms
(5 rows)
The situation changes when we have much more data, data is not uniformly distributed and we have an index on the column “b”:
TRUNCATE demo;
INSERT INTO demo SELECT i, 'aaa' FROM generate_series (1,1000000) i;
INSERT INTO demo SELECT i, 'bbb' FROM generate_series (1000001,2000000) i;
INSERT INTO demo SELECT i, 'ccc' FROM generate_series (2000001,3000000) i;
INSERT INTO demo SELECT i, 'eee' FROM generate_series (3000001,3000010) i;
CREATE INDEX i1 ON demo (b);
SELECT b, count(*) FROM demo GROUP BY b ORDER BY b;
b | count
-----+---------
aaa | 1000000
bbb | 1000000
ccc | 1000000
eee | 10
(4 rows)
PREPARE my_stmt AS SELECT * FROM demo WHERE b = $1;
No matter how often we execute the following statement (which asks for ‘eee’), we never get a generic plan:
EXPLAIN (analyze) EXECUTE my_stmt ('eee');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Scan using i1 on demo (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
Index Cond: (b = 'eee'::text)
Planning time: 0.249 ms
Execution time: 0.106 ms
(4 rows)
Repeat the command how often you want but at least 10 times.
EXPLAIN (analyze) EXECUTE my_stmt ('eee');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Scan using i1 on demo (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
Index Cond: (b = 'eee'::text)
Planning time: 0.249 ms
Execution time: 0.106 ms
This is because the custom plan (which includes the costs for re-planning) is always cheaper than the generic plan (which does not include the costs for re-planning) when we have a data distribution like that. The documentation is very clear about that:
From PostgreSQL documentation:
Using EXECUTE values which are rare in columns with many duplicates can generate custom plans that are so much cheaper than the generic plan, even after adding planning overhead, that the generic plan might never be used.