March 24, 2024
Summary: in this tutorial, you will learn how to utilize inlining of SQL functions.
Table of Contents
Introduction
SQL functions (i.e. LANGUAGE SQL
) will, under certain conditions, have their function bodies inlined into the calling query rather than being invoked directly.
This can have substantial performance advantages since the function body becomes exposed to the planner of the calling query, which can apply optimizations such as constant-folding, qual pushdown, and so on.
However, the exact conditions which apply to inlining are somewhat complex and not well documented outside the source code. This article is an attempt to partially rectify that.
There are actually two completely separate forms of inlining that can occur, at most one of which is possible for any given function call: one for scalar function calls and the other for table function calls.
Scalar functions
A scalar function call is any instance where func(args)
appears in the context of a value expression or predicate, i.e. anywhere an ordinary value or condition can appear. For example:
select func(t.foo) from sometable t;
select t.* from sometable t where func(t.foo, 123);
In the second case above, if the definition of func()
applies an indexable operator to the first argument, then inlining can be critical for performance since it permits the planner to use an index.
This method is used extensively by PostGIS, for functions like ST_Contains
and ST_DWithin
.
Table functions
A table function call is any instance where func(args)
appears where a table is expected. (This is, for most functions, a PostgreSQL extension to the SQL standard.) For example:
select * from func(123);
select * from sometable t, func(t.foo, 123); -- version 9.3+, implicitly LATERAL
One of the most important performance benefits of inlining of table function calls is the ability to push down additional conditions into the function:
select * from func(123) as f where f.foo = 456;
In this example, after inlining func(123)
, the planner may be able to move the f.foo
condition down into the function body, possibly allowing it to be used as an indexed condition on a table referenced by the function.
Inlining conditions for scalar functions
A scalar function call will be inlined if all of the following conditions are met:
-
the function is
LANGUAGE SQL
-
the function is not
SECURITY DEFINER
-
the function is not
RETURNS SETOF
(orRETURNS TABLE
) -
the function is not
RETURNS RECORD
-
the function has no
SET
clauses in its definition -
the function is not already being inlined; a recursive function will have only its outermost call expanded inline
-
there are no plugin modules hooking the function entry/exit calls
-
the function body consists of a single, simple,
SELECT expression
-
the body contains no aggregate or window function calls, no subqueries, no CTEs, no
FROM
clause or references to any table or table-like object, none ofGROUP BY
,HAVING
,ORDER BY
,DISTINCT
,LIMIT
,OFFSET
,UNION
,INTERSECT
,EXCEPT
-
the body query must return exactly one column
-
the type of the body expression must match the declared return type of the function
-
the expression must not return multiple rows (e.g. from calling set-returning functions such as
unnest()
orgenerate_series()
) -
if the function is declared
IMMUTABLE
, then the expression must not invoke any non-immutable function or operator -
if the function is declared
STABLE
, then the expression must not invoke any volatile function or operator -
if the function is declared
STRICT
, then the planner must be able to prove that the body expression necessarily returnsNULL
if any parameter is null. At present, this condition is only satisfied if: every parameter is referenced at least once, and all functions, operators and other constructs used in the body are themselvesSTRICT
. -
if an actual argument to the function call is a volatile expression, then it must not be referenced in the body more than once
-
if an actual argument is an “expensive” expression, defined as costing more than 10 operator costs or containing any subquery, then it must not be referenced in the body more than once
Inlining conditions for table functions
A table function call will be inlined if all of the following conditions are met:
-
the function call does not specify
ORDINALITY
or multiple functions inROWS FROM
-
none of the actual arguments contain volatile expressions or subselects
-
there are no plugin modules hooking function call entry/exit
-
the function is
LANGUAGE SQL
-
the function is not
SECURITY DEFINER
-
the function is declared
STABLE
orIMMUTABLE
-
the function is not declared
STRICT
-
the function is declared
RETURNS SETOF
orRETURNS TABLE
-
the function has no
SET
clauses in its definition -
the function body must consist of a single
SELECT
statement, even after applying any applicable rule rewrites -
the return types of the columns of the function body query must match the columns of the declared result type. However, if there are no set operations (
UNION
,INTERSECT
,EXCEPT
) in the top-level function body query, then adjustment for dropped columns is allowed and implicit binary coercions are allowed provided that the columns affected are not used for sorting -
if the function is declared
RETURNS SETOF RECORD
and does not haveOUT
parameters, then the function body query result types must match the column definition list provided by the caller
Note that these conditions allow the function body to be a complex query, even one including CTEs (but not recursive CTEs). This makes inlinable SQL functions more powerful than views, since they can take parameters in ways that views can’t, while retaining many of the advantages of views.
Example
Let’s start with a little test setup:
CREATE TABLE t (id integer, str text);
INSERT INTO t (id, str)
SELECT i, 'xxx'
FROM generate_series(1, 10000) AS s(i);
Here is an example of a function that can be inlined by the optimizer:
CREATE OR REPLACE FUNCTION ld(int)
RETURNS numeric AS $$
SELECT log(2, $1);
$$ LANGUAGE 'sql' IMMUTABLE;
This is a normal SQL function marked as IMMUTABLE. This is perfect optimization fodder for the optimizer. To make it simple, all my function does is calculate a logarithm:
SELECT ld(1024);
ld
---------------------
10.0000000000000000
(1 row)
As you can see, the function works as expected.
To demonstrate how things work, we create a index on the function:
CREATE INDEX idx_ld ON t (ld(id));
As expected, the index will be used just like any other index. However, let’s take a closer look at the indexing condition:
EXPLAIN SELECT * FROM t WHERE ld(id) = 10;
QUERY PLAN
------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=4.67..30.55 rows=50 width=8)
Recheck Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
-> Bitmap Index Scan on idx_ld (cost=0.00..4.66 rows=50 width=0)
Index Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
(4 rows)
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE ld(id) = 10;
QUERY PLAN
------------------------------------------------------------------
Index Scan using idx_ld on t (cost=0.29..8.30 rows=1 width=8)
Index Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
(2 rows)
The important observation here is that the indexing condition actually looks for the log function instead of the ld function. The optimizer has completely gotten rid of the function call. It is also worth mentioning that fresh optimizer statistics can be of real importance to generate an efficient plan.
Logically, this opens the door for the following query:
EXPLAIN SELECT * FROM t WHERE log(2, id) = 10;
QUERY PLAN
------------------------------------------------------------------
Index Scan using idx_ld on t (cost=0.29..8.30 rows=1 width=8)
Index Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
(2 rows)
The optimizer managed to inline the function and provided us with an index scan that is far superior to an expensive sequential operation.