PostgreSQL Tutorial: Tune parallel safety of a function

February 19, 2024

Summary: in this tutorial, you will learn how to tune parallel safety of a function.

Introduction

The planner cannot determine automatically if a function is parallel safe, but under certain conditions parallel mode can boost performance significantly if you process a large dataset. The number of workers that the planner will use is limited by the parameters max_parallel_workers_per_gather, which in turn are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers, the maximum number of concurrent queries to execute with parallelism is determined by the following formula, as long as max_worker_processes <= server cores:

#Q_concurrent_par = max_worker_processes /max_parallel_workers_per_gather (integer division)

When is it safe to use PARALLEL in a function?

As long as your code does not perform the following, you should be ready to use it:

  • Writes to the database.
  • Access sequences.
  • Change the transaction state.
  • Makes persistent changes to settings.
  • Access temporary tables.
  • Use cursors.
  • Defines prepared statements.

Example

Let’s start with a little test setup:

CREATE TABLE test1 (id integer, str text);

INSERT INTO test1 (id, str)
  SELECT i, 'xxx' FROM generate_series(1, 1000000) AS s(i);

A simple table containing 1'000'000 rows. In addition let’s create one simple function:

CREATE OR REPLACE FUNCTION pair_div_4 (i int)
  RETURNS boolean AS $$
BEGIN
  IF i % 2 = 0 AND i % 4 = 0 THEN
    RETURN TRUE;
  END IF;
  RETURN FALSE;
END;
$$ LANGUAGE plpgsql;

What is the optimizer doing when you call this function in WHERE clause?

EXPLAIN ANALYZE SELECT * from test1 WHERE pair_div_4 (id);
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..262476.00 rows=333333 width=8) (actual time=0.207..787.020 rows=250000 loops=1)
   Filter: pair_div_4(id)
   Rows Removed by Filter: 750000
 Planning time: 0.100 ms
 Execution time: 792.776 ms
(5 rows)

To change the parallel safety of the function:

ALTER FUNCTION pair_div_4(i int) PARALLEL SAFE;

The planner will detect that pair_div_4() is parallel safe and runs the query in parallel mode. Looking again at the execution plan again:

EXPLAIN ANALYZE SELECT * from test1 WHERE pair_div_4 (id);
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..189750.48 rows=333333 width=8) (actual time=0.502..511.806 rows=250000 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   ->  Parallel Seq Scan on test1  (cost=0.00..155417.18 rows=196078 width=8) (actual time=0.283..451.079 rows=125000 loops=2)
         Filter: pair_div_4(id)
         Rows Removed by Filter: 375000
 Planning time: 2.412 ms
 Execution time: 517.604 ms
(8 rows)

As shown in the output of the explain, Workers Launched tag has value 1, this means that this query used 1 worker to execute, and the previous EXPLAIN was executed without parallelisms.

See more

PostgreSQL Optimization

comments powered by Disqus