五月 13, 2024
摘要:在本教程中,您将学习如何处理 PostgreSQL 中事务 ID 回卷的故障。
目录
死行和膨胀
受 PostgreSQL 中的 UPDATE 或 DELETE 等操作影响的任何行的旧版本,都会在内部进行标记,以让查询不会返回该行。因此,这些行被称为“死行”,死行占用的空间称为“膨胀”。经常执行大量 UPDATE/DELETE 操作的数据库程序,很快就会变得非常大,并需要进行定期的维护工作。
清理的类型
通常,VACUUM 是一种定期性的数据库清理动作,会在 PostgreSQL 中执行。它有两种不同的形式;其中一种是普通的 VACUUM(没有 FULL),它只是为其他新行腾出空间。普通 VACUUM 不会获取排他锁,使其能够与表的常规读写同时运行。这种类型的 VACUUM 操作,将让额外的空间在同一表内可重复使用,而不是将其返回给操作系统。使用 VACUUM FULL 形式时,操作系统会接收多余的空间,而且还会将表的全部内容重写到一个没有额外空间的新磁盘文件上。该类型的清理必须在每个表上使用 ACCESS EXCLUSIVE 锁进行处理,这使得它会慢一些。
运作方式 | VACUUM | VACUUM FULL |
---|---|---|
方法 | 释放死行以供重用 | 重写没有死行的表 |
Access Exclusive Lock | 否 | 是 |
空闲空间可用于 | 在同一个表内 | 操作系统 |
事务回卷
现在,让我们来了解一下事务和事务 ID 回卷。在 PostgreSQL 的数据库中,更新的每一行都会从事务控制系统获得一个事务 ID。这些 ID 规定了会向其他活动事务显示哪些行。
事务 ID 回卷的问题在多版本并发控制(MVCC)。根据 ID 区分两个事务的能力对于 MVCC 至关重要。PostgreSQL 中的事务 ID 就是 32 位的整数。因此,只有大约四十亿(2^32)个可能的事务 ID。
40 亿可能看起来相当多且很难用光,但是不得不承认,对于写入密集型工作负载非常高的数据库系统,40 亿个事务可以在几周内用完。
因此,如果存在 2,000,000,000 个未清理的事务,PostgreSQL 将阻止进行写入操作,并将数据库切换到只读模式。
以下任何一个或多个条件,都可能导致事务 ID 回卷的问题。
- Autovacuum 设置为关闭。
- 长时间运行的事务。
- 繁重的 DML 操作,迫使取消自动清理工作进程。
- 许多会话或连接长时间保持锁定。
让我们在测试机器上来模拟下事务 ID 回卷的问题。在 PostgreSQL 日志文件中会出现以下行,这意味着 PostgreSQL 已停止接受 DML 语句,并已切换到了只读模式。
2022-09-16 08:43:38.265 [74098] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184)
2022-09-16 08:43:38.265 HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-09-16 08:43:48.640 [5764] ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres" (10182)
2022-09-16 08:43:48.640 [5764] HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
然后,让我们建立与问题数据库 “postgres” 的连接,并运行一些 SELECT 和 CREATE 命令来检查下。
postgres=# select datname from pg_database;
datname
----------
postgres
templatel
template0
(3 rows)
postgres=# create table test (id numeric);
ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres" (10182)
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
postgres=# select usename from pg_user;
usename
----------
postgres
(1 row)
由于事务 ID 回卷,数据库 “postgres” 目前已确认处于只读模式。
修复事务回卷
现在,让我们来看看解决此问题的方法。我们必须首先让数据库停机,以单用户模式连接,然后在每个数据库上执行 VACUUM FULL。但是,我们必须首先完成下面列出的步骤。
-
首先,我们需要使用下面的查询,找到发生事务回卷的表的名称。
SELECT c.relnamespace::regnamespace as schema_name, c.relname as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age, 2^31 - 1000000 - greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as remaining FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') ORDER BY 4;
-
如果您在上述查询的输出中发现属于 “pg_temp_” 模式的任何表,则删除表是解决问题的唯一选择,因为 PostgreSQL 不允许您对其他会话中生成的临时表进行 VACUUM。
-
另一种情况是,在当前最旧的活动事务开始后建立的任何事务 ID,PostgreSQL 是无法冻结的。这是由 MVCC 的运作方式决定的。事务有时可能会变得太旧,以至于 VACUUM 无法清理它们,来解除全部 20 亿个事务 ID 回卷的限制,这会导致系统不再接受新的 DML。
2022-09-16 05:49:07.514 [64931] WARNING: oldest xmin is far in the past (11833) 2022-09-16 05:49:07.514 [64931] HINT: Close open transactions soon to avoid wraparound problems. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
您可能已经注意到,日志输出中的提示都告诉您提交或回滚以前创建的预备事务。因此,我们必须使用下面的查询,来检查孤立或过时的预备事务。
SELECT age(transaction), * FROM pg_prepared_xacts;
使用上述查询输出的 gid,回滚您找到的任何预备事务。回滚查询如下。
ROLLBACK PREPARED gid;
-
完成上述过程后,我们必须使数据库停机,以单用户模式连接,并对每个数据库执行 VACUUM FULL 操作。
$ pg_ctl stop waiting for server to shut down...... done server stopped
-
现在,您需要以单用户模式启动数据库,如下所示。
$ postgres --single postgres 2022-09-16 12:06:45.628 [84585] WARNING: database with OID 14728 must be vacuumed within 3000000 transactions (10185) 2022-09-16 12:06:45.628 [84585] HINT: to avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. PostgreSQL stand-alone backend 14.0 backend>
-
成功进入单用户模式后,我们需要对每个数据库执行 VACUUM FULL。
backend> VACUUM FULL; 2022-09-16 12:09:02.711 [84585] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:02.711 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:09:03.860 [84585] WARNING: database "postgres" must be vacuumed within 2999999 transactions (10184) 2022-09-16 12:09:03.860 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-09-16 12:09:08.052 [84585] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:08.052 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:09:08.058 [84585] WARNING: database "postgres" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:08.058 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:09:08.065 [84585] WARNING: database "template1" must be vacuumed within 3000000 transactions (10184) 2022-09-16 12:09:08.065 [84585] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
现在,您可以看到 PostgreSQL 也要求对 template1 数据库执行相同的 VACUUM FULL。让我们对 template1 重复同样的操作。
$ postgres --single template1 2022-09-16 12:14:14.897 [84775] WARNING: database with OID 1 must be vacuumed within 2999860 transactions (10185) 2022-09-16 12:14:14.897 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. PostgreSQL stand-alone backend 14.0 backend> VACUUM FULL; 2022-09-16 12:14:24.965 [84775] WARNING: database "template1" must be vacuumed within 2999860 transactions (10184) 2022-09-16 12:14:24.965 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:14:25.026 [84775] WARNING: database "template1" must be vacuumed within 2999859 transactions (10184) 2022-09-16 12:14:25.026 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
现在它要求在 template0 上执行相同的操作。
2022-09-16 12:14:25.696 [84775] WARNING: database "template1" must be vacuumed within 2999795 transactions (10184) 2022-09-16 12:14:25.696 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:14:25.703 [84775] WARNING: database "template0" must be vacuumed within 2999794 transactions (10184) 2022-09-16 12:14:25.703 [84775] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
-
让我们对 template0 重复同样的操作。
$ postgres --single template0 2022-09-16 12:16:18.247 [84863] WARNING: database with OID 14727 must be vacuumed within 2999794 transactions (10185) 2022-09-16 12:16:18.247 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. PostgreSQL stand-alone backend 14.0 backend> VACUUM FULL; 2022-09-16 12:16:36.346 [84863] WARNING: database "template0" must be vacuumed within 2999794 transactions (10184) 2022-09-16 12:16:36.346 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-09-16 12:16:37.036 [84863] WARNING: database "template0" must be vacuumed within 2999730 transactions (10184) 2022-09-16 12:16:37.036 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 2022-09-16 12:16:37.042 [84863] WARNING: database "template0" must be vacuumed within 2999729 transactions (10184) 2022-09-16 12:16:37.042 [84863] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
您会注意到,PostgreSQL 不再请求对任何其他数据库执行 VACUUM FULL,因为该实例只有三个数据库(postgres、template1 和 template0),并且我们已经成功地在每个数据库上完成了 VACUUM FULL。
之后,我们可以按如下方式启动 PostgreSQL。
$ pg_ctl start waiting for server to start.... 2022-09-16 12:19:11.594 +08 [84987] LOG: redirecting log output to logging collector process 2022-09-16 12:19:11.594 +08 [84987] HINT: Allure log output will appear in directory 'log'. done server started
现在,让我们执行一些 DML 语句,来确认问题是否已经解决。
select usename from pg_user; usename ---------- postgres (1 row) create table test (id numeric); insert into test values (1); select * from test; id ---- 1 (1 row)
如您所见,该问题现已修复,PostgreSQL 能够像问题发生前一样有效地执行 DML 操作了。
避免事务回卷
在 PostgreSQL 中,可以使用各种重要方法来防止事务 ID 回卷。首先,您必须跟踪数据库最早的事务 ID 的年龄。这可以使用 pg_class 系统表来实现。您可以通过跟踪其年龄,来确定事务 ID 是否接近回卷限制。
应定期执行维护操作,例如清理和数据库分析,以防止事务 ID 回卷。VACUUM 清理有助于恢复被过时或已删除的元组占用的空间,从而最大限度地减少膨胀,并防止事务 ID 达到回卷阈值。分析会更新优化器统计信息,以确保查询计划正确无误。
此外,在 PostgreSQL 配置文件中,请考虑对 autovacuum 参数做一些调优。autovacuum 进程由这些设置控制,它们有助于控制事务 ID。
另一种预防方法是配置好监控系统,以便在事务 ID 的年龄达到特定限制时发送通知。这使您可以快速响应,并防止系统接近回卷限制。
结论
尽管通过设置系统的定期维护和分析来防止发生事务 ID 回卷是理想的选择,但随着业务的增长和发展,可能会出现意外事件或新的需求。上面的例子说明了,如何在发生事务 ID 回卷的情况下清理 PostgreSQL 系统。希望您现在不用再害怕事务 ID 回卷了,因为您知道您能够有效地修复它。