February 19, 2025
Summary: in this tutorial, you will learn how to troubleshooting the error “found xmin … from before relfrozenxid …” in PostgreSQL.
Table of Contents
What are xmin and relfrozenxid?
xmin and xmax
Each PostgreSQL table entry (“tuple”) has the system columns xmin and xmax. They contain the transaction ID of the transactions that created and invalidated (updated or deleted) the version of the row that the tuple represents. Each SQL statement has a snapshot that determines which transaction IDs it can see. If a statement can see the tuple’s xmin
, and the xmax
is invalid (has a value of 0), invisible or belongs to a transaction that has not committed, that version of the row is visible to the statement.
Transaction ID wraparound and freezing
One of the problems with PostgreSQL’s multi-versioning architecture is that transaction IDs are generated from a 4-byte unsigned integer counter. Once the counter reaches its maximum value, it will “wrap around” to 3 (the values from 0 to 2 have special meanings). As a consequence, transaction IDs change their meaning as time goes by: a transaction ID that belonged to a committed transaction may now belong to a future transaction or one that is rolled back. That would mean that visible rows could suddenly become invisible, thus causing data corruption. If you want to see that at play, you should read the tutorial on transaction ID wraparound.
To prevent this kind of data corruption from happening, VACUUM
(typically triggered by autovacuum) freezes old, visible table rows: it sets a flag on the row that indicates that the reader should ignore xmin
and xmax
. That flag marks the row as unconditionally visible. Once PostgreSQL has frozen all old, visible rows in a database, the transaction ID can safely wrap around.
The meaning of relfrozenxid
Freezing old rows is vitally important for the health of a PostgreSQL database. Consequently, the system keeps track of the progress of freezing. The system catalog pg_class has a column “relfrozenxid
”. All table entries that have an xmin
or xmax
that is equal or older than relfrozenxid
are guaranteed to be frozen. PostgreSQL uses that column to trigger “anti-wraparound” autovacuum runs: if the relfrozenxid
of a table is more than autovacuum_freeze_max_age transactions in the past, the autovacuum launcher will start an anti-wraparound vacuum worker process. Such a worker insists on visiting all pages of the table that may contain entries that are not yet frozen. After the anti-wraparound autovacuum run is done, PostgreSQL can advance the table’s relfrozenxid
.
What is the problem with an xmin that is older than relfrozenxid?
From the above it should be clear that no unfrozen tuple should ever contain an xmin
that is older than the table’s relfrozenxid
. If we find such an unfrozen tuple, it is a case of data corruption. Indeed, the SQLSTATE of the error message we are researching is XX001
. All error messages that start with XX
indicate data corruption.
Note that the error happens during the execution of VACUUM
. The error terminates the operation, so that VACUUM
cannot finish processing the table. In particular, it won’t advance relfrozenxid
. If nobody detects and fixes the problem, the system will eventually come close to data loss. In that event, PostgreSQL stops processing any new transactions. If that happens, your system will face a down time until somebody can fix the problem using single-user mode. You don’t want that to happen, so you should monitor the PostgreSQL log file for data corruption errors!
What can cause tuples with an xmin older than relfrozenxid?
The big question is why people keep getting this error message. Of course it is possible that all these people have hardware problems, which is the most frequent cause of data corruption. But it’s also possible that there is a yet undiscovered bug somewhere in PostgreSQL. It seems that if VACUUM
is running concurrently with something else, the result can be an unfrozen tuple with an xmin
older than relfrozenxid
.
How to trigger the error by modifying relfrozenxid
In order to figure out how to deal with the error if it happens to us, we’d like to artificially cause the error to happen. As mentioned above, I cannot think of a way to trigger the problem with normal data modifications. But it is fairly easy to cause the problem to happen if we are willing to manually modify a catalog table. That is an unsupported operation that may break your system, so let’s create a new database that we can drop to get rid of the data corruption:
CREATE DATABASE scratch;
\connect scratch
You are now connected to database "scratch" as user "postgres".
CREATE TABLE boom (id integer);
INSERT INTO boom VALUES (1);
UPDATE pg_class
SET relfrozenxid = pg_current_xact_id()::xid
WHERE relname = 'boom';
SELECT * FROM boom;
id
----
1
(1 row)
VACUUM boom;
ERROR: found xmin 31676653 from before relfrozenxid 31676654
CONTEXT: while scanning block 0 offset 1 of relation "public.boom"
Note that a SELECT
won’t trigger the error message. Only VACUUM
checks the data thoroughly enough to report the condition as an error.
How can I fix the error?
There are several ways to address the problem:
Dump and restore the table
Perhaps the simplest and least dangerous method to get rid of the problem is to export the table using pg_dump
. Remember — querying the table won’t trigger the error. Then you can drop the table and restore the dump:
pg_dump -U postgres -F c -t boom -f dumpfile scratch
psql -U postgres -d scratch -c 'DROP TABLE boom'
pg_restore -U postgres -d scratch dumpfile
While this method is simple, it has disadvantages:
- if the table is large, exporting and importing it can take a long time
- if there are foreign keys referencing the table, you have to drop and re-create those foreign keys as well
The big advantage of this method, apart from its simplicity, is that export and import is the only way to be certain that you have got rid of all data corruption. So you should use this method whenever possible.
Update relfrozenxid in the pg_class system catalog
Another option is to manually update the relfrozenxid
for the table’s pg_class
entry.
UPDATE pg_class SET relfrozenxid = '31676653'
WHERE relname = 'boom';
This technique is fast, but it has a disadvantage too: modifying a system catalog is unsupported and dangerous. If you pick a bad value for relfrozenxid
, you could end up with worse problems like
ERROR: could not access status of transaction 43350785
DETAIL: Could not open file "pg_xact/0029": No such file or directory.
Using pg_surgery on the broken table entry
Perhaps the most elegant way to deal with the error is to use the pg_surgery extension. With that extension, we can explicitly freeze the tuple once we know its physical address (ctid
):
CREATE EXTENSION pg_surgery;
-- search only in block 0
SELECT ctid FROM boom
WHERE ctid > '(0,0)'
AND ctid < '(0,32000)'
AND xmin = '31676653';
ctid
-------
(0,1)
(1 row)
SELECT heap_force_freeze('boom', '{(0,1)}');
heap_force_freeze
-------------------
(1 row)
Using pg_surgery is not without risks either: it allows you to freeze or kill arbitrary table entries, which can cause data inconsistencies. The name of the extension should give you a clue: don’t use a scalpel unless you know what you are doing!
Update the corrupted row
If you perform an UPDATE
on the broken row, PostgreSQL will create a new, correct row version.
UPDATE boom SET id = id WHERE id = 1;
After that, the table can be vacuumed without an error, which will remove the broken data. Be careful: bulk updates that modify a big part of the table will lead to a lot of bloat. So treat only the broken rows.
Conclusion
People often report the error “found xmin … from before relfrozenxid …”, there may be a live data corruption bug in PostgreSQL. We have seen what the error means, and we have shown you three ways to deal with the problem. None of these ways is without disadvantages, so choose your approach carefully.