By John Doe December 6, 2024
Summary: In this article, we’ll take a look at the bloat issue that comes with heavy updates in PostgreSQL, and corresponding solutions.
Table of Contents
Table updates scenario: Use case
In PostgreSQL, let’s create a table, insert some data, and update the table data, then check whether the storage space occupied by the table changes after the update.
CREATE TABLE t_large (id integer, name text);
CREATE INDEX large_idx ON t_large (id);
INSERT INTO t_large (id, name)
SELECT i, 'xxx' FROM generate_series(1, 1000000) AS s(i);
SELECT pg_size_pretty(pg_table_size('t_large'));
pg_size_pretty
----------------
35 MB
UPDATE t_large SET id = id + 1;
UPDATE t_large SET name = 'yyy';
SELECT pg_size_pretty(pg_table_size('t_large'));
pg_size_pretty
----------------
104 MB
Table updates scenario: How it works
Here’s what happens to the tuples within a single page, when we perform table updates in PostgreSQL:
- There are multiple versions for a single tuple.
- Each transaction has a transaction ID, which is a bit like the clock time of a database system.
- Each tuple has two system columns, xmin and xmax, which hold the transaction IDs that inserted and deleted them, respectively.
Table updates scenario: Thought
Impact of the issue
- Table updates takes up double storage spaces as the original space.
- Increased workload for autovacuum and storage space recycling.
- Queries based on index scans and full table scans will be slower.
Solution
- Change the scheduling strategy of autovacuum, such as trigger conditions, frequency of vacuum, and vacuum parallelism.
- Change the table structure to move frequently updated columns into a separate table.
- Merge Updates, which merges multiple updates into a single update.
The solution by Redrock Postgres
Redrock Postgres implemented in-place updates by introducing undo logs and marking index records deleted, and reuse space immediately after a transaction commit that performs a delete or non-in-place update. Therefore, after table data is updated, no more storage space is occupied.
In Redrock Postgres, let’s create a table, insert some data, and update the table data, then check whether the storage space occupied by the table changes after the update.
CREATE TABLE t_large (id integer, name text);
CREATE INDEX large_idx ON t_large (id);
INSERT INTO t_large (id, name)
SELECT i, 'xxx' FROM generate_series(1, 1000000) AS s(i);
SELECT pg_size_pretty(pg_table_size('t_large'));
pg_size_pretty
----------------
19 MB
UPDATE t_large SET id = id + 1;
UPDATE t_large SET name = 'yyy';
SELECT pg_size_pretty(pg_table_size('t_large'));
pg_size_pretty
----------------
19 MB
In the above example, the storage space occupied by the t_large
table does not change after the table is updated.