May 13, 2024
Summary: In this tutorial, you will learn how to troubleshooting the transaction wraparound in PostgreSQL.
Table of Contents
Dead rows and bloat
The old version of any row that is affected by an operation like UPDATE or DELETE in PostgreSQL is marked internally so that queries don’t return that row. Therefore, these rows are referred to as “dead rows”, and the space occupied by dead rows is called “bloat.” A database program that often executes a lot of UPDATE/DELETE operations can quickly become very large and require sporadic maintenance work.
Types of vacuuming
Typically, vacuuming, a type of periodic database cleaning, is carried out in PostgreSQL. It comes in two different forms; one of them is plain VACUUM (without FULL), which merely frees up space for other uses. Plain vacuum does not incur exclusive locks, allowing it to run concurrently with regular reading and writing of the table. This type of vacuum operation will keep the additional space accessible for reuse inside the same table, rather than returning it to the operating system. The operating system receives the excess space back when using the VACUUM FULL form, which also rewrites the whole contents of the table onto a new disk file with no extra space. This form must be processed with an ACCESS EXCLUSIVE lock on each table, which makes it slower.
How it works | Vacuum | Vacuum full |
---|---|---|
Method | Free up dead rows for reuse | Rewrite the table with no dead rows |
Access Exclusive Lock | No | Yes |
Free space is made available for | Inside the same table | The Operating System |
Transaction wraparound
Now, let’s learn about the transaction and transaction wraparound. Every row that is updated in the database in PostgreSQL receives a transaction ID (Txid) from the transaction control mechanism. These IDs regulate which rows are shown to other active transactions.
The issue with transaction wraparound is Multi-Version Concurrency Control (MVCC). The ability to distinguish between two transactions based on their Txids is essential for MVCC. Txids in Postgres are just 32-bit integers. Accordingly, there are only roughly four billion (2^32) possible Txids.
Four billion may seem excessive and difficult to achieve, but I must admit that for database systems with really high write-intensive workloads, four billion transactions are achievable in a matter of weeks.
Therefore, PostgreSQL stops allowing WRITE operations and switches the database to READONLY mode if there are 2,000,000,000 unvacuumed transactions.
Any one or more of the following conditions could contribute to transaction ID wraparound.
- Autovacuum is set to turned off.
- Long-running transactions.
- Heavy DML operations forcing the cancellation of autovacuum worker processes.
- Many sessions or connections holding locks for very long durations.
I was able to successfully simulate a transaction wraparound issue on my test machine. I’m getting the following entries in my PostgreSQL log files, which means that PostgreSQL has stopped accepting DML statements and has switched to READONLY mode.
2022-09-16 08:43:38.265 [74098] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184)
2022-09-16 08:43:38.265 HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-09-16 08:43:48.640 [5764] ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres" (10182)
2022-09-16 08:43:48.640 [5764] HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
Therefore, I have established a connection to the issue database “postgres” and run some SELECT & CREATE commands to make sure.
postgres=# select datname from pg_database;
datname
----------
postgres
templatel
template0
(3 rows)
postgres=# create table test (id numeric);
ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres" (10182)
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
postgres=# select usename from pg_user;
usename
----------
postgres
(1 row)
The database “Postgres” is currently confirmed to be in READONLY mode because of transaction wraparound.
Fix transaction wraparound
Let me now walk you through the methods to remedy this scenario. We must first take the database offline, connect in single-user mode, and then execute VACUUM FULL on each database. However, we must first complete the steps listed below.
-
First of all, we need to find the name of table which is responsible for wraparound by using the query below.
SELECT c.relnamespace::regnamespace as schema_name, c.relname as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age, 2^31 - 1000000 - greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as remaining FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') ORDER BY 4;
-
Dropping the table is the only option to fix the issue if you discover any tables in the output of the above query that are part of the ‘pg_temp_’ schema, as PostgreSQL does not allow you to VACUUM temporary tables generated in other sessions.
-
Another scenario is that PostgreSQL is unable to freeze any transaction IDs that were established after the oldest presently active transaction began. This is due to the way the MVCC operates. Transactions may occasionally get so old that VACUUM is unable to clean them up for the entire 2 billion transaction ID wraparound limit, which results in the system no longer accepting new DML.
2022-09-16 05:49:07.514 [64931] WARNING: oldest xmin is far in the past (11833) 2022-09-16 05:49:07.514 [64931] HINT: Close open transactions soon to avoid wraparound problems. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
You have probably noticed that all the screenshots’ HINTs tell you to COMMIT or ROLLBACK previously created prepared transactions. Therefore, using the query below, we must check for orphan or outdated prepared transactions.
SELECT age(transaction), * FROM pg_prepared_xacts;
Using the gid from the above query, rollback any prepared transactions you find. The rollback query is below.
ROLLBACK PREPARED gid;
-
Once the above processes have been completed, we must bring the databases offline, connect in single-user mode, and conduct the VACUUM FULL option on each database.
$ pg_ctl stop waiting for server to shut down...... done server stopped
-
Now, you need to start the database in single-user mode as below.
$ postgres --single postgres 2022-09-16 12:06:45.628 [84585] WARNING: database with OID 14728 must be vacuumed within 3000000 transactions (10185) 2022-09-16 12:06:45.628 [84585] HINT: to avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. PostgreSQL stand-alone backend 14.0 backend>
-
After successfully entering into single-user mode, we need to execute VACUUM FULL for each database.
backend> VACUUM FULL; 2022-09-16 12:09:02.711 [84585] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:02.711 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:09:03.860 [84585] WARNING: database "postgres" must be vacuumed within 2999999 transactions (10184) 2022-09-16 12:09:03.860 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-09-16 12:09:08.052 [84585] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:08.052 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:09:08.058 [84585] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:08.058 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:09:08.065 [84585] WARNING: database "template1" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:08.065 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
Now, you can see that PostgreSQL asking to do the same VACUUM FULL for the template1 database as well. Let’s do the same for template1.
$ postgres --single template1 2022-09-16 12:14:14.897 [84775] WARNING: database with OID 1 must be vacuumed within 2999860 transactions (10185) 2022-09-16 12:14:14.897 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. PostgreSQL stand-alone backend 14.0 backend> VACUUM FULL; 2022-09-16 12:14:24.965 [84775] WARNING: database "template1" must be vacuumed within 2999860 transactions (10184) 2022-09-16 12:14:24.965 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:14:25.026 [84775] WARNING: database "template1" must be vacuumed within 2999859 transactions (10184) 2022-09-16 12:14:25.026 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
Now it’s asking to execute same operation on template0.
2022-09-16 12:14:25.696 [84775] WARNING: database "template1" must be vacuumed within 2999795 transactions (10184) 2022-09-16 12:14:25.696 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:14:25.703 [84775] WARNING: database "template0" must be vacuumed within 2999794 transactions (10184) 2022-09-16 12:14:25.703 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
Let’s do the same for template0.
$ postgres --single template0 2022-09-16 12:16:18.247 [84863] WARNING: database with OID 14727 must be vacuumed within 2999794 transactions (10185) 2022-09-16 12:16:18.247 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. PostgreSQL stand-alone backend 14.0 backend> VACUUM FULL; 2022-09-16 12:16:36.346 [84863] WARNING: database "template0" must be vacuumed within 2999794 transactions (10184) 2022-09-16 12:16:36.346 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-09-16 12:16:37.036 [84863] WARNING: database "template0" must be vacuumed within 2999730 transactions (10184) 2022-09-16 12:16:37.036 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:16:37.042 [84863] WARNING: database "template0" must be vacuumed within 2999729 transactions (10184) 2022-09-16 12:16:37.042 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
If you noticed, PostgreSQL is no longer requesting to perform VACUUM FULL on any additional databases because I only have three databases (postgres, template1, and template0), and I have successfully finished VACUUM FULL on each of them.
After that we can start the PostgreSQL as below.
$ pg_ctl start waiting for server to start.... 2022-09-16 12:19:11.594 +08 [84987] LOG: redirecting log output to logging collector process 2022-09-16 12:19:11.594 +08 [84987] HINT: Allure log output will appear in directory 'log'. done server started
Now, let’s confirm if the problem is fixed or not by executing DML statements.
select usename from pg_user; usename ---------- postgres (1 row) create table test (id numeric); insert into test values (1); select * from test; id ---- 1 (1 row)
As you can see that the issue is fixed now and PostgreSQL able to perform DML operations effectively like before the issue.
Avoid transaction wraparound
In PostgreSQL, there are various important methods to use in order to prevent transaction wraparound. First and foremost, you must keep track of the age of your database’s oldest transaction ID (XID). This is possible using the pg_class catalog table. You can determine if the XID is approaching the wraparound limit by keeping track of its age.
Regular maintenance procedures, such as vacuuming and database analyzing, should be carried out in order to prevent transaction wraparound. Vacuuming helps to recover space that had been occupied by obsolete or deleted tuples, minimizing bloat and preventing transaction IDs from reaching the wraparound threshold. Analyzing updates the optimizer statistics, ensuring that query plans are correct.
Additionally, consider modifying the autovacuum parameters to some optimal settings in your PostgreSQL configuration. The autovacuum processes are controlled by those settings and helping in controlling transaction IDs.
Another preventive method is to configure a monitoring system to deliver notifications when the XID age reaches a certain limit. This allows you to respond quickly and prevents you from approaching the wraparound limit.
Conclusion
Although it is ideal to prevent transaction wraparound from occurring by setting up regular maintenance and analysis of your system, unexpected events or new requirements may arise as your business grow and evolve. With the example above, I illustrated how you can clean up your PostgreSQL system in case transaction wraparound occurs. I hope now you will not be afraid of transaction wraparound, knowing that you are able to fix it effectively.