March 26, 2024
Summary: in this tutorial, you will learn how to cache query plans in PostgreSQL.
Table of Contents
Introduction
Generally, there are two styles of writing SQL in an application. Consider the following example:
String sql = "SELECT first_name, last_name FROM employees"
+ " WHERE employee_id = " + employeeId;
ResultSet rs = conn.executeQuery(sql);
This style of SQL statement will cause hard parsing. Hard parsing is constructing an execution plan based on the SQL statement. That’s a major effort; inspecting all parts of the SQL; considering all indexes; considering all join orders and so on. Hard parsing is very resource intensive.
We can also using bind parameters like this:
PreparedStatement st = conn.prepareStatement(
"SELECT first_name, last_name FROM employees"
+ " WHERE employee_id = ?");
st.setInt(1, employeeId);
ResultSet rs = st.executeQuery();
Bind parameters (also called dynamic parameters or bind variables) are an alternative way to pass data to the database. Instead of putting the values directly into the SQL statement, you just use a placeholder like ?
and provide the actual values using a separate API call. Bind variables make the actual values “invisible” for the optimizer resulting in a generic plan that can be cached and re-used later on.
Using bind parameters allows PostgreSQL to execute a statement with soft parsing. Soft parsing is searching, finding and using a cached execution plan. There are some minor checks done, e.g., access rights, but the execution plan can be re-used as is. That is a rather fast operation.
Parameterized queries
PostgreSQL does not have a shared query plan cache, but it has an optional query plan cache for prepared statements. That means that the developer has the choice to use a prepared statement with or without cached query plan. But note that the cache is dropped when the prepared statement is closed.
The following samples show how to use this functionality in various languages.
-
C
The native C API provides the function PQexecParams , which allows to use bind parameters during prepare (as opposed to
PQprepare
). -
Java
The PostgreSQL JDBC driver controls server-side prepare via the non-standard method
setPrepareThreshold
onPGStatement
andPGConnection
.Note that the default setting is five, which means that the first four executions will actually use the bind parameters during prepare, the later ones not. That counter starts fresh for each
PreparedStatement
instance. -
Python
Python’s Cursor.execute() accepts bind parameters as optional argument. The values will be used during planning if provided. See the documentation for more details.
Summary
Explicitly using bind parameters solves three problems at once:
(1) SQL injection.
(2) execution plan cache-hit rate.
(3) having manual control to utilize histograms.
Tip:
Column histograms are most useful if the values are not uniformly distributed.
For columns with uniform distribution, it is often sufficient to divide the number of distinct values by the number of rows in the table. This method also works when using bind parameters.
Dynamic SQL will automatically lead to well optimized execution plans for conditional filters.
There are some downsides of execution plan re-use. Execution plan caches use the SQL string as key. Same SQL string, same execution plan. That means that the optimizer must plan for re-use. It cannot assume anything about the bind values used for later executions. The smart logic approach makes that quite obvious. The following cannot optimized “away” when planning for re-use:
(last_name = :name OR :name IS NULL)
The query uses named bind variables for better readability. All possible filter expressions are statically coded in the statement. Whenever a filter isn’t needed, you just use NULL
instead of a search term: it disables the condition via the OR
logic.
Besides that, the optimizer doesn’t consider column histograms when planning for re-use.