PostgreSQL Tutorial: Utilizing inlining of SQL functions

March 24, 2024

Summary: in this tutorial, you will learn how to utilize inlining of SQL functions.

Table of Contents

Introduction

SQL functions (i.e. LANGUAGE SQL) will, under certain conditions, have their function bodies inlined into the calling query rather than being invoked directly. This can have substantial performance advantages since the function body becomes exposed to the planner of the calling query, which can apply optimizations such as constant-folding, qual pushdown, and so on.

However, the exact conditions which apply to inlining are somewhat complex and not well documented outside the source code. This article is an attempt to partially rectify that.

There are actually two completely separate forms of inlining that can occur, at most one of which is possible for any given function call: one for scalar function calls and the other for table function calls.

Scalar functions

A scalar function call is any instance where func(args) appears in the context of a value expression or predicate, i.e. anywhere an ordinary value or condition can appear. For example:

select func(t.foo) from sometable t;
select t.* from sometable t where func(t.foo, 123);

In the second case above, if the definition of func() applies an indexable operator to the first argument, then inlining can be critical for performance since it permits the planner to use an index. This method is used extensively by PostGIS, for functions like ST_Contains and ST_DWithin.

Table functions

A table function call is any instance where func(args) appears where a table is expected. (This is, for most functions, a PostgreSQL extension to the SQL standard.) For example:

select * from func(123);
select * from sometable t, func(t.foo, 123);   -- version 9.3+, implicitly LATERAL

One of the most important performance benefits of inlining of table function calls is the ability to push down additional conditions into the function:

select * from func(123) as f where f.foo = 456;

In this example, after inlining func(123), the planner may be able to move the f.foo condition down into the function body, possibly allowing it to be used as an indexed condition on a table referenced by the function.

Inlining conditions for scalar functions

A scalar function call will be inlined if all of the following conditions are met:

the function is LANGUAGE SQL
the function is not SECURITY DEFINER
the function is not RETURNS SETOF (or RETURNS TABLE)
the function is not RETURNS RECORD
the function has no SET clauses in its definition
the function is not already being inlined; a recursive function will have only its outermost call expanded inline
there are no plugin modules hooking the function entry/exit calls
the function body consists of a single, simple, SELECT expression
the body contains no aggregate or window function calls, no subqueries, no CTEs, no FROM clause or references to any table or table-like object, none of GROUP BY, HAVING, ORDER BY, DISTINCT, LIMIT, OFFSET, UNION, INTERSECT, EXCEPT
the body query must return exactly one column
the type of the body expression must match the declared return type of the function
the expression must not return multiple rows (e.g. from calling set-returning functions such as unnest() or generate_series())
if the function is declared IMMUTABLE, then the expression must not invoke any non-immutable function or operator
if the function is declared STABLE, then the expression must not invoke any volatile function or operator
if the function is declared STRICT, then the planner must be able to prove that the body expression necessarily returns NULL if any parameter is null. At present, this condition is only satisfied if: every parameter is referenced at least once, and all functions, operators and other constructs used in the body are themselves STRICT.
if an actual argument to the function call is a volatile expression, then it must not be referenced in the body more than once
if an actual argument is an “expensive” expression, defined as costing more than 10 operator costs or containing any subquery, then it must not be referenced in the body more than once

Inlining conditions for table functions

A table function call will be inlined if all of the following conditions are met:

the function call does not specify ORDINALITY or multiple functions in ROWS FROM
none of the actual arguments contain volatile expressions or subselects
there are no plugin modules hooking function call entry/exit
the function is LANGUAGE SQL
the function is not SECURITY DEFINER
the function is declared STABLE or IMMUTABLE
the function is not declared STRICT
the function is declared RETURNS SETOF or RETURNS TABLE
the function has no SET clauses in its definition
the function body must consist of a single SELECT statement, even after applying any applicable rule rewrites
the return types of the columns of the function body query must match the columns of the declared result type. However, if there are no set operations (UNION, INTERSECT, EXCEPT) in the top-level function body query, then adjustment for dropped columns is allowed and implicit binary coercions are allowed provided that the columns affected are not used for sorting
if the function is declared RETURNS SETOF RECORD and does not have OUT parameters, then the function body query result types must match the column definition list provided by the caller

Note that these conditions allow the function body to be a complex query, even one including CTEs (but not recursive CTEs). This makes inlinable SQL functions more powerful than views, since they can take parameters in ways that views can’t, while retaining many of the advantages of views.

Example

Let’s start with a little test setup:

CREATE TABLE t (id integer, str text);

INSERT INTO t (id, str)
  SELECT i, 'xxx'
    FROM generate_series(1, 10000) AS s(i);

Here is an example of a function that can be inlined by the optimizer:

CREATE OR REPLACE FUNCTION ld(int)
  RETURNS numeric AS $$
  SELECT log(2, $1);
$$ LANGUAGE 'sql' IMMUTABLE;

This is a normal SQL function marked as IMMUTABLE. This is perfect optimization fodder for the optimizer. To make it simple, all my function does is calculate a logarithm:

SELECT ld(1024);
         ld
---------------------
 10.0000000000000000
(1 row)

As you can see, the function works as expected.

To demonstrate how things work, we create a index on the function:

CREATE INDEX idx_ld ON t (ld(id));

As expected, the index will be used just like any other index. However, let’s take a closer look at the indexing condition:

EXPLAIN SELECT * FROM t WHERE ld(id) = 10;
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=4.67..30.55 rows=50 width=8)
   Recheck Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
   ->  Bitmap Index Scan on idx_ld  (cost=0.00..4.66 rows=50 width=0)
         Index Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
(4 rows)

ANALYZE t;

EXPLAIN SELECT * FROM t WHERE ld(id) = 10;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx_ld on t  (cost=0.29..8.30 rows=1 width=8)
   Index Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
(2 rows)

The important observation here is that the indexing condition actually looks for the log function instead of the ld function. The optimizer has completely gotten rid of the function call. It is also worth mentioning that fresh optimizer statistics can be of real importance to generate an efficient plan.

Logically, this opens the door for the following query:

EXPLAIN SELECT * FROM t WHERE log(2, id) = 10;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx_ld on t  (cost=0.29..8.30 rows=1 width=8)
   Index Cond: (log('2'::numeric, (id)::numeric) = '10'::numeric)
(2 rows)