February 19, 2024
Summary: in this tutorial, you will learn how to tune parallel safety of a function.
Table of Contents
Introduction
The planner cannot determine automatically if a function is parallel safe, but under certain conditions parallel mode can boost performance significantly if you process a large dataset. The number of workers that the planner will use is limited by the parameters max_parallel_workers_per_gather, which in turn are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers, the maximum number of concurrent queries to execute with parallelism is determined by the following formula, as long as max_worker_processes <= server cores:
#Q_concurrent_par = max_worker_processes /max_parallel_workers_per_gather (integer division)
When is it safe to use PARALLEL in a function?
As long as your code does not perform the following, you should be ready to use it:
- Writes to the database.
- Access sequences.
- Change the transaction state.
- Makes persistent changes to settings.
- Access temporary tables.
- Use cursors.
- Defines prepared statements.
Example
Let’s start with a little test setup:
CREATE TABLE test1 (id integer, str text);
INSERT INTO test1 (id, str)
SELECT i, 'xxx' FROM generate_series(1, 1000000) AS s(i);
A simple table containing 1'000'000 rows. In addition let’s create one simple function:
CREATE OR REPLACE FUNCTION pair_div_4 (i int)
RETURNS boolean AS $$
BEGIN
IF i % 2 = 0 AND i % 4 = 0 THEN
RETURN TRUE;
END IF;
RETURN FALSE;
END;
$$ LANGUAGE plpgsql;
What is the optimizer doing when you call this function in WHERE clause?
EXPLAIN ANALYZE SELECT * from test1 WHERE pair_div_4 (id);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on test1 (cost=0.00..262476.00 rows=333333 width=8) (actual time=0.207..787.020 rows=250000 loops=1)
Filter: pair_div_4(id)
Rows Removed by Filter: 750000
Planning time: 0.100 ms
Execution time: 792.776 ms
(5 rows)
To change the parallel safety of the function:
ALTER FUNCTION pair_div_4(i int) PARALLEL SAFE;
The planner will detect that pair_div_4()
is parallel safe and runs the query in parallel mode. Looking again at the execution plan again:
EXPLAIN ANALYZE SELECT * from test1 WHERE pair_div_4 (id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..189750.48 rows=333333 width=8) (actual time=0.502..511.806 rows=250000 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Parallel Seq Scan on test1 (cost=0.00..155417.18 rows=196078 width=8) (actual time=0.283..451.079 rows=125000 loops=2)
Filter: pair_div_4(id)
Rows Removed by Filter: 375000
Planning time: 2.412 ms
Execution time: 517.604 ms
(8 rows)
As shown in the output of the explain, Workers Launched tag has value 1, this means that this query used 1 worker to execute, and the previous EXPLAIN was executed without parallelisms.