PostgreSQL Tutorial: Query analysis using pg_stat_statements

October 19, 2023

Summary: The pg_stat_statements extension exposes a lot of query statistics that will be helpful in finding slow queries that need to be optimized. In this tutorial, we shall see some example queries to view some information from pg_stat_statements.

Table of Contents

Getting ready

In order to run the views or queries discussed in this tutorial, we should have pg_stat_statements successfully configured and created as an extension. Please note that every query for which the statistics are collected by pg_stat_statements can be uniquely identified through queryid. A hash is generated based on the query text and is visible as queryid when we query the pg_stat_statements view.

How to do it…

In the following commands, we have used a function to trim the query up to just 40 characters for better visibility. We may use an expanded display using \x and avoid truncating the query when viewing the statistics:

1. Find the queries that are writing to temp the most:

select queryid, left(query,40), calls,
    temp_blks_read, temp_blks_written
  from pg_stat_statements
  order by temp_blks_written desc;

2. Find the queries that are reading from temp the most:

select queryid, left(query,40), calls,
    temp_blks_read, temp_blks_written
  from pg_stat_statements
  order by temp_blks_read desc;

3. Find the queries that are reading the most from disk due to accessing temporary tables:

select queryid, left(query,40), calls, local_blks_read,
    local_blks_read/calls as avg_read_per_call
  from pg_stat_statements
  order by local_blks_read desc;

4. Find the queries with the highest execution times:

select queryid, left(query,40), mean_time, max_time
  from pg_stat_statements
  where calls > 10
  order by mean_time desc, max_time desc;

5. Find the queries that are executed the most:

select queryid, left(query,40), calls, mean_time
  from pg_stat_statements
  order by calls desc;

How it works…

The following is the list of columns available through the pg_stat_statements view. For this view, we could query a lot of information that can be useful for query analysis:

rockdb=# \d pg_stat_statements
                    View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          |
 dbid                | oid              |           |          |
 queryid             | bigint           |           |          |
 query               | text             |           |          |
 calls               | bigint           |           |          |
 total_time          | double precision |           |          |
 min_time            | double precision |           |          |
 max_time            | double precision |           |          |
 mean_time           | double precision |           |          |
 stddev_time         | double precision |           |          |
 rows                | bigint           |           |          |
 shared_blks_hit     | bigint           |           |          |
 shared_blks_read    | bigint           |           |          |
 shared_blks_dirtied | bigint           |           |          |
 shared_blks_written | bigint           |           |          |
 local_blks_hit      | bigint           |           |          |
 local_blks_read     | bigint           |           |          |
 local_blks_dirtied  | bigint           |           |          |
 local_blks_written  | bigint           |           |          |
 temp_blks_read      | bigint           |           |          |
 temp_blks_written   | bigint           |           |          |
 blk_read_time       | double precision |           |          |
 blk_write_time      | double precision |           |          |

Some of the queries discussed in this tutorial serve a specific purpose. However, more such queries can be constructed in several combinations to filter queries going through a certain phase similar to getting a list of the queries that are generating more temp, reading from temp, and reading from disk but not from memory, highest execution times, and the queries called (executed) the most, as seen in steps 1 to 5. At any point in time, all the stats collected can be reset using the following function call:

rockdb=# select pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------

(1 row)

See more

PostgreSQL Optimization