High cost and uncertainty of VACUUM in PostgreSQL

By John Doe, October 9, 2025

Summary: Have you ever run into a situation in PostgreSQL where several autovacuum processes keep running continuously, to the point where most of the system’s resources are spent on VACUUM? In this article, we explore the high cost and uncertainty of VACUUM in PostgreSQL, along with related solutions.

Use case: vacuuming old row versions

Below, we will create a table in a PostgreSQL database, insert one million rows into it, and then run VACUUM ANALYZE on it.

-- Session 1
CREATE TABLE t_large (id integer, name text);
INSERT INTO t_large (id, name)
  SELECT i, repeat('Pg', 64)
    FROM generate_series(1, 1000000) AS s(i);

CREATE INDEX large_idx ON t_large (id);
VACUUM ANALYZE t_large;
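
Before moving on, we can optionally check how much space this data set occupies. The sizing functions below (pg_relation_size, pg_size_pretty) are standard, so this works on any recent PostgreSQL:

-- Session 1: check the size of the table and its index
SELECT pg_size_pretty(pg_relation_size('t_large'))   AS table_size,
       pg_size_pretty(pg_relation_size('large_idx')) AS index_size;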

With the base table and initial data in place, let’s open a second session, start a transaction in it, run a single query, and then leave the transaction idle:

-- Session 2
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT name FROM t_large WHERE id = 202510;
-- leave the transaction open (idle in transaction)
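
To confirm that this open transaction is pinning a snapshot, we can look at pg_stat_activity from the other session; backend_xmin is the horizon that VACUUM’s “removable cutoff” has to respect:

-- Session 1: find idle transactions and the xmin they pin
SELECT pid, state, backend_xid, backend_xmin
  FROM pg_stat_activity
 WHERE state = 'idle in transaction';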

Next, let’s update a portion of the data in the table, and also delete another portion:

-- Session 1
UPDATE t_large SET name = 'dummy' WHERE mod(id, 100) = 1;
DELETE FROM t_large WHERE mod(id, 100) = 50;
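
The statistics views now let us see the damage. Since mod(id, 100) = 1 and mod(id, 100) = 50 each match 10,000 rows, we should see roughly 20,000 dead tuples (the counters are maintained asynchronously, so the numbers may lag slightly):

-- Session 1: check the dead-tuple counters for the table
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 't_large';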

The update and delete above have left a sizable number of old row versions in the table. Now let’s run a manual VACUUM on the table to observe how the old row versions are cleaned up:

-- Session 1
VACUUM (verbose) t_large;
INFO:  vacuuming "postgres.public.t_large"
INFO:  finished vacuuming "postgres.public.t_large": index scans: 0
pages: 0 removed, 20409 remain, 20409 scanned (100.00% of total)
tuples: 0 removed, 1010000 remain, 20000 are dead but not yet removable
removable cutoff: 753, which was 2 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 40833 hits, 0 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 188 bytes
system usage: CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
...

The VACUUM operation scanned all 20,409 pages of the table but removed nothing: the output reports 20,000 tuples that are “dead but not yet removable”. Because the transaction in Session 2 is still open, the system must retain the old row versions to keep them visible to that transaction’s snapshot. The VACUUM in Session 1 therefore pays the full cost of scanning the table without cleaning up any garbage data.

The same problem affects the regular VACUUM runs scheduled by the background autovacuum process: they repeatedly consume storage I/O yet fail to remove a single old row version. Meanwhile, ongoing updates and deletes keep producing new old row versions, which accumulate until they eventually trigger an “avalanche effect” across the system.
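
When VACUUM keeps reporting tuples that are “dead but not yet removable”, the first step is to find out what is pinning the oldest snapshot. Long-running transactions are only one possible culprit; stale replication slots and prepared transactions can hold back the horizon as well. A minimal check for the first case:

-- List sessions holding back the xmin horizon, oldest first
SELECT pid, state, xact_start,
       age(backend_xmin) AS xmin_age,
       left(query, 40)   AS query
  FROM pg_stat_activity
 WHERE backend_xmin IS NOT NULL
 ORDER BY age(backend_xmin) DESC;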

Analysis of the VACUUM scenario

PostgreSQL uses the Multi-Version Concurrency Control (MVCC) mechanism. When data is updated or deleted, the original records are not removed immediately but are merely marked as invisible. This design supports transaction rollback and snapshot reads, and it means that every update or delete produces an old row version. For example, in an orders table whose status is frequently updated, each status change generates another old version of the row.
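
We can observe this versioning directly through the system columns ctid, xmin, and xmax. In the small illustration below (reusing t_large, though any table works), the row’s ctid changes after the update because a brand-new physical row version was written, while the old version stays behind until VACUUM removes it:

-- Each update writes a new row version and leaves the old one behind
SELECT ctid, xmin, xmax FROM t_large WHERE id = 42;
UPDATE t_large SET name = 'paid' WHERE id = 42;
SELECT ctid, xmin, xmax FROM t_large WHERE id = 42;  -- ctid has changed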

In scenarios with large data volumes and frequent updates, cleaning up old row versions is expensive: the cleanup process (VACUUM) has to scan a large amount of data, determine which row versions are no longer needed by any transaction, and reclaim the space. If, on top of that, autovacuum is misconfigured and fails to remove dead tuples promptly, the backlog grows, cleanup becomes even more expensive, and the table becomes severely bloated.
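
A rough way to spot tables where cleanup is falling behind is to compare the dead-tuple counters against the live ones. This is only an estimate derived from the statistics views, not an exact bloat measurement:

-- Tables with the largest dead-tuple backlog (estimated)
SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup * 100.0 / greatest(n_live_tup + n_dead_tup, 1), 1)
         AS dead_pct
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;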

To reduce the cost of old row version cleanup, the following measures can be considered:

  1. Execute VACUUM regularly: Schedule the timing and frequency of VACUUM sensibly (e.g., run it during off-peak hours to minimize the impact on business operations). Also choose the appropriate variant for the table: plain VACUUM to remove dead tuples, or VACUUM FULL to return disk space to the operating system (note that VACUUM FULL takes an exclusive lock and blocks all access to the table).
  2. Tune autovacuum parameters: Adjust parameters such as autovacuum_vacuum_cost_delay and autovacuum_naptime so that the autovacuum process removes old row versions more promptly and efficiently; for example, lowering autovacuum_vacuum_cost_delay speeds up cleanup. These parameters can also be set per table, as shown in the sketch after this list.
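
For particularly hot tables, the autovacuum parameters can be overridden per table through storage parameters. The values below are illustrative starting points, not one-size-fits-all recommendations:

-- Make autovacuum trigger earlier and run unthrottled on a hot table
ALTER TABLE t_large SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows are dead
  autovacuum_vacuum_cost_delay   = 0      -- disable cost-based throttling here
);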

Conclusion

In this article, we examined the avalanche effect that can build up when old row versions in PostgreSQL cannot be cleaned up, as well as possible measures to mitigate the issue.

In addition, Redrock Postgres has introduced undo logging. The system can reconstruct old row versions on demand by applying undo log records to the new row versions. Furthermore, once a transaction that performs deletions or non-in-place updates commits, the space is reclaimed immediately. As a result, there is no need to periodically clean up old row versions after updating or deleting table data.