PostgreSQL Tutorial: Monitoring large tables

January 28, 2024

Summary: in this tutorial, you will learn how to monitor large tables in PostgreSQL.

Table of Contents

Which are large tables?

PostgreSQL provides a dedicated function, pg_relation_size, to compute the actual disk space used by a specific table or index. We just need to provide the table name. In addition to the main data files, there are other files (called forks) that can be measured by specifying an optional second argument. These include the visibility map (VM), the free space map (FSM), and the initialization fork for unlogged objects.

The following basic query will tell us the 10 largest tables:

WITH tables AS (
  SELECT current_database() AS dbname, n.nspname AS schema, c.relname,
         pg_relation_size(c.oid) AS size
  FROM pg_class c, pg_namespace n
  WHERE c.relnamespace = n.oid AND c.relkind = 'r'
  ORDER BY size DESC
  LIMIT 10
)
SELECT dbname, schema, relname,
       pg_size_pretty(size) AS relsize
FROM tables
WHERE size > pg_size_bytes('1 GB');

The tables are shown in descending order of size, with at the most 10 rows displayed. In this case, we look at all the tables in current database.

Cumulative statistics on the table

A simple select * from pg_stat_user_tables will show statistics about accesses to the tables, with one line for each table in the current database, and the key columns:

  • Sequential scan count (seq_scan): The seq_scan column in the pg_stat_all_tables view represents the total count of sequential scans performed on a particular table. A high number of sequential scans may indicate that appropriate indexes are missing or underutilized.
  • Index scan count (idx_scan): Number of index scans initiated on this table.
  • Inserted rows (n_tup_ins): Number of rows inserted.
  • Updated rows (n_tup_upd): Number of rows updated (includes HOT updated rows).
  • Deleted rows (n_tup_ins): Number of rows deleted.
  • Hot updates count (n_tup_hot_upd): We can calculate the HOT updates ratio for the most updated tables using this column data. The columns n_tup_upd and n_tup_hot_upd in this view represent the total number of regular and HOT updates for each individual table. The “good” tables are those with a high HOT rate; we should pay attention to tables with a high write count but a low or zero HOT rate. Changes to fillfactor settings are the general norm for these and allow free space to be reserved for when additional rows are added and tables are expanded. The presence of reserved space ensures that pages’ rows will be updated, and it greatly increases the likelihood that a HOT update will take place.
  • Fragmented rows (n_dead_tup): It shows an approximate number of dead rows that have had their data removed and are slated to be reused by our datasource when we use a write command like DELETE or UPDATE.

To find old tables inside PostgreSQL, we can run the following query:

SELECT
    schemaname, relname,
    seq_scan, idx_Scan,
    last_seq_scan, last_idx_scan,
    pg_size_pretty(pg_total_relation_size(relid)) AS table_size
FROM
    pg_stat_user_tables
ORDER BY last_seq_scan ASC, last_idx_scan ASC
LIMIT 10;

This query displays a list of all tables sorted by last sequential scan timestamp (column 5) and last index scan timestamp (column 6). The order by clause can also be replaced with seq_scan and idx_scan columns to get tables list which are rarely used.

Statistics on large tables

You can use the following query, to show statistics about accesses to the large tables:

WITH large_tables AS (
  SELECT c.oid AS relid,
         pg_relation_size(c.oid) AS size
  FROM pg_class c
  WHERE c.relkind = 'r'
  ORDER BY size DESC
  LIMIT 10
)
SELECT current_database() AS dbname,
       stat.schemaname, stat.relname,
       pg_size_pretty(large_tables.size) AS relsize,
       stat.seq_scan, stat.seq_tup_read, stat.idx_scan, stat.idx_tup_fetch,
       stat.n_tup_ins, stat.n_tup_upd, stat.n_tup_del, stat.n_tup_hot_upd,
       stat.n_live_tup, stat.n_dead_tup
FROM large_tables
  JOIN pg_stat_user_tables AS stat
    ON large_tables.relid = stat.relid
WHERE large_tables.size > pg_size_bytes('1 GB');

If you have a system that can monitor the values returned by SQL queries, then this can fit into it.

See more

PostgreSQL Monitoring