Bloats caused by heavy updates in PostgreSQL

By John Doe December 6, 2024

Summary: In this article, we’ll take a look at the bloat issue that comes with heavy updates in PostgreSQL, and corresponding solutions.

Table of Contents

Table updates scenario: Use case

In PostgreSQL, let’s create a table, insert some data, and update the table data, then check whether the storage space occupied by the table changes after the update.

CREATE TABLE t_large (id integer, name text);
CREATE INDEX large_idx ON t_large (id);

INSERT INTO t_large (id, name)
    SELECT i, 'xxx' FROM generate_series(1, 1000000) AS s(i);

SELECT pg_size_pretty(pg_table_size('t_large'));
 pg_size_pretty
----------------
 35 MB

UPDATE t_large SET id = id + 1;
UPDATE t_large SET name = 'yyy';

SELECT pg_size_pretty(pg_table_size('t_large'));
 pg_size_pretty
----------------
 104 MB

Table updates scenario: How it works

Here’s what happens to the tuples within a single page, when we perform table updates in PostgreSQL:

Update Bloat

  1. There are multiple versions for a single tuple.
  2. Each transaction has a transaction ID, which is a bit like the clock time of a database system.
  3. Each tuple has two system columns, xmin and xmax, which hold the transaction IDs that inserted and deleted them, respectively.

Table updates scenario: Thought

Impact of the issue

  1. Table updates takes up double storage spaces as the original space.
  2. Increased workload for autovacuum and storage space recycling.
  3. Queries based on index scans and full table scans will be slower.

Solution

  1. Change the scheduling strategy of autovacuum, such as trigger conditions, frequency of vacuum, and vacuum parallelism.
  2. Change the table structure to move frequently updated columns into a separate table.
  3. Merge Updates, which merges multiple updates into a single update.

The solution by Redrock Postgres

Redrock Postgres implemented in-place updates by introducing undo logs and marking index records deleted, and reuse space immediately after a transaction commit that performs a delete or non-in-place update. Therefore, after table data is updated, no more storage space is occupied.

In Redrock Postgres, let’s create a table, insert some data, and update the table data, then check whether the storage space occupied by the table changes after the update.

CREATE TABLE t_large (id integer, name text);
CREATE INDEX large_idx ON t_large (id);

INSERT INTO t_large (id, name)
  SELECT i, 'xxx' FROM generate_series(1, 1000000) AS s(i);

SELECT pg_size_pretty(pg_table_size('t_large'));
 pg_size_pretty
----------------
 19 MB

UPDATE t_large SET id = id + 1;
UPDATE t_large SET name = 'yyy';

SELECT pg_size_pretty(pg_table_size('t_large'));
 pg_size_pretty
----------------
 19 MB

In the above example, the storage space occupied by the t_large table does not change after the table is updated.