PostgreSQL Tutorial: Avoid redundant function calls in queries

February 8, 2024

Summary: in this tutorial, you will learn how to avoid redundant function calls in queries.

It is possible to write functions in PostgreSQL in almost any widespread language such as Perl, Python or C. In general this offers a great deal of flexibility and acceptable performance. However, in some cases users might say: “We have the feeling that procedures are slow”. The thing is: PostgreSQL might not be to blame for this observation – in many cases it is simply “pilot error”. This article tries to elaborate on how function calls can become a key source of bad performance.

How PostgreSQL treats procedures

As mentioned already, it is possible to write functions in basically any language. PostgreSQL simply passes the code of a function to the external language and takes back the result. In a way a function is a kind of black box – PostgreSQL rarely knows what is going on inside a procedure.

Here is an example of a function call:

CREATE OR REPLACE FUNCTION mymax(int, int)
RETURNS int AS
$$
  BEGIN
    RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
  END;
$$ LANGUAGE plpgsql;

The results are not very spectacular:

test=# SELECT mymax(20, 30);
 mymax
-------
    30
(1 row)

The important thing here is: The PL/pgSQL function is a complete black box. The planner has no idea what the “external” language does in this case. This has some important implications.

Consider the following example:

test=# CREATE TABLE demo AS
	SELECT 	*
	FROM generate_series(1, 1000000) AS id;
SELECT 1000000
test=# CREATE INDEX idx_id ON demo(id);
CREATE INDEX

The table is already large enough for the planner to consider indexes:

test=# explain SELECT * FROM demo WHERE id = 20;
                      QUERY PLAN
---------------------------------------------------------------
 Index Only Scan using idx_id on demo
	(cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 20)
(2 rows)

Function calls are usually black boxes

The problem is: The situation changes completely if we start to use the function I have just shown:

test=# explain SELECT *
	FROM 	demo
	WHERE 	id = mymax(20, 20);
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on demo  (cost=0.00..266925.00 rows=1 width=4)
   Filter: (id = mymax(20, 20))
(2 rows)

PostgreSQL has no idea that the function will return 20. It is clear to humans, but nobody told the machine that this is guaranteed to happen. To PostgreSQL, the result of a function is considered to be “volatile” – anything can happen. Therefore, it cannot simply ask the index for the correct row. The first function call might not return the same as the second call – even if the parameters are identical. The optimizer has to play it safe and will go for a sequential scan, which is definitely going to produce the correct result.

In PostgreSQL a function can be:

  • VOLATILE

  • STABLE

  • IMMUTABLE

If a function is marked as VOLATILE, it can return anything if you call it multiple times using the very same input parameters. In the case of STABLE, the function is guaranteed to return the same result for the same parameters within a single statement.
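
Since the CREATE FUNCTION statement shown earlier did not specify any volatility, mymax was created with the default, which is VOLATILE. If you want to verify what the planner sees, the classification is stored in the provolatile column of the pg_proc system catalog. A small sketch (it assumes mymax is the only function with that name):

SELECT proname, provolatile   -- 'v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE
FROM   pg_proc
WHERE  proname = 'mymax';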

The most prominent STABLE function is now(), which will always return the same result within the same transaction:

test=# SELECT now();
              now
-------------------------------
 2018-01-09 11:48:46.385457+01
(1 row)

test=# BEGIN;
BEGIN
test=# SELECT now();
              now
-------------------------------
 2018-01-09 11:48:51.073123+01
(1 row)

test=# SELECT now();
              now
-------------------------------
 2018-01-09 11:48:51.073123+01
(1 row)

test=# COMMIT;
COMMIT
test=# SELECT now();
              now
-------------------------------
 2018-01-09 11:48:59.640697+01
(1 row)
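
For contrast, clock_timestamp() is a built-in VOLATILE function: unlike now(), it is evaluated again on every single call, so even two calls inside the same SELECT can return slightly different values. A minimal sketch (no output shown, since the timestamps depend on your machine):

SELECT clock_timestamp() AS first_call,   -- re-evaluated on every call
       clock_timestamp() AS second_call;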

Some functions are even IMMUTABLE: in this case the result for the same input parameters is always the same, regardless of the transaction. Cosine is an example of such a function:

test=# SELECT cos(10), cos(20);
        cos         |        cos
--------------------+-------------------
 -0.839071529076452 | 0.408082061813392
(1 row)

test=# SELECT cos(10), cos(20);
        cos         |        cos
--------------------+-------------------
 -0.839071529076452 | 0.408082061813392
(1 row)

Even in the second transaction the cosine of a number will be the same.

Reducing the number of function calls

To fix our problem, we have to change the volatility of the function to IMMUTABLE:

ALTER FUNCTION mymax(int, int) IMMUTABLE;

The planner will detect that the function is guaranteed to return the same value for the same input values and go for an index scan:

test=# explain SELECT *
	FROM 	demo
	WHERE 	id = mymax(20, 20);
                      QUERY PLAN
------------------------------------------------------
 Index Only Scan using idx_id on demo
	(cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 20)
(2 rows)

Of course the index scan is orders of magnitude faster and will return in basically no time.
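
If you are writing the function from scratch, the volatility can of course also be declared directly in CREATE FUNCTION instead of changing it afterwards. A sketch of what the definition from the beginning of this article would look like with the keyword added (same body, nothing else changed):

CREATE OR REPLACE FUNCTION mymax(int, int)
RETURNS int AS
$$
  BEGIN
    RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
  END;
$$ LANGUAGE plpgsql IMMUTABLE;

Keep in mind that IMMUTABLE is a promise you make to the planner: only use it if the function really cannot return different results for the same input.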

Detecting bottlenecks

Fortunately, PostgreSQL has a system view that can shed some light on functions that might be a problem. The first thing you have to do is to set track_functions = 'all', which tells PostgreSQL to collect function statistics.
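
The setting can be put into postgresql.conf or, with sufficient privileges, switched on for the current session; a minimal sketch:

SET track_functions = 'all';

With the setting active, call the function at least once so there is something to measure: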

test=# SELECT * FROM demo WHERE id = mymax(20, 20);
 id
----
 20
(1 row)

Once this setting has been turned on, pg_stat_user_functions will contain valuable information:

test=# SELECT * FROM pg_stat_user_functions;
 funcid | schemaname | funcname | calls | total_time | self_time
--------+------------+----------+-------+------------+-----------
  16429 | public     | mymax    |     1 |      0.025 |     0.025
(1 row)

If you happen to see that a function is called insanely often, it can make sense to inspect it and check whether it is VOLATILE for no good reason. Changing the function definition can significantly improve speed.
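
On a busy system the view can contain many entries, so it helps to sort by the number of calls or by the time spent inside the function itself (self_time excludes time spent in functions it calls). A simple sketch using the standard columns of pg_stat_user_functions:

SELECT funcname, calls, total_time, self_time
FROM   pg_stat_user_functions
ORDER  BY self_time DESC
LIMIT  10;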

Creating indexes on functions

If you want to create an index on a function (an expression index), you have to make sure that the function itself is in fact IMMUTABLE. Otherwise, PostgreSQL won't create the index for you. The reason is simple: PostgreSQL has to make sure that the content of the index stays stable and does not have to change over time as long as the underlying data is unchanged.
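
To illustrate the point, here is a sketch of an expression index using the function from this article (the index name and the expression are made up for demonstration). As long as mymax is still VOLATILE, PostgreSQL rejects the statement and complains that functions in index expressions must be marked IMMUTABLE; once the function has been marked IMMUTABLE, the index builds just fine:

CREATE INDEX idx_mymax ON demo (mymax(id, 100000));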

See more

PostgreSQL Optimization