PostgreSQL 18 预览: pg_upgrade 的 swap 模式

由 John Doe 四月 1, 2025

你在升级 PostgreSQL 大版本的时候，会担心升级时间过长吗？现在，PostgreSQL 有了一个新的 swap 升级模式。

非洲乞力马扎罗山角下的大象

特性提交日志

pg_upgrade：添加 “–swap” 选项以实现更快的文件传输。

这个新选项指示pg_upgrade将数据目录从旧集群移动到新集群，然后用为新集群生成的系统表文件替换旧的系统表文件。这种模式的性能可能优于--link、--clone、--copy和--copy-file-range选项，尤其是在包含许多数据库对象（关系）的集群上。

然而，这种模式会在旧集群中产生许多垃圾文件。如果使用--sync-method=syncfs选项，这些垃圾文件可能会延长文件刷写的同步时间。为了解决这个问题，我们建议在此模式下使用--sync-method=fsync选项，并且pg_upgrade会在内部使用initdb --sync-only --no-sync-data-files命令来进行文件刷写的同步。pg_upgrade会在传输系统表文件时对其进行刷写的同步。我们假定从旧集群传输过来的数据库文件在升级之前已经进行了同步。

这种模式也会使回滚到旧集群变得更加复杂。因此，我们建议在文件传输过程中或传输完成后出现失败的情况下，从备份中恢复数据。我们确实考虑过让pg_upgrade学会如何为这种失败情况生成回滚脚本，但最终决定不这么做，原因是文件传输过程中出现失败的情况很少见，生成脚本的过程比较复杂，而且存在脚本被误用的可能性。

这种新模式仅限于位于同一文件系统中的集群之间的升级。经过一些努力，我们或许能够支持不同文件系统之间的升级，但是如果我们必须跨文件系统边界复制文件，这种模式可能不会带来太多好处。

它还仅限于从 10 版本或更高版本进行升级。使用交换模式从旧版本进行升级存在一些已知的障碍。例如，可见性映射文件的格式在 9.6 版本中发生了变化，序列的元组格式在 10 版本中发生了变化。实际上，交换模式在使用pg_dump时会省略--sequence-data选项，而是直接重用旧集群的序列数据文件。虽然要让交换模式能处理这些变化肯定是可行的（而且无论如何，我们将来可能还得处理类似的问题），但为支持从早已不受支持的版本进行升级而付出这样的努力，似乎并不值得。

讨论：https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan

示例

当你想要从一个主要版本的 PostgreSQL 升级到另一个版本时，可能会选择使用 pg_upgrade（或者逻辑复制）。目前 pg_upgrade 已有几种操作模式：

–copy：将数据文件从旧集群复制到新集群。
–clone：进行克隆而非复制（前提是文件系统支持）。
–copy-file-range：如果文件系统支持，使用 copy_file_range 系统调用进行高效复制。
–link：使用硬链接而非复制文件。

哪种模式最适合你，取决于具体需求。我们通常会选择 “--link” 模式，因为它速度相当快，但只有在旧集群和新集群位于同一文件系统时才能使用。这种模式的缺点是，一旦新集群启动，就无法再使用旧集群。

在 PostgreSQL 18 中，可能会新增一个名为 “--swap” 的选项。这种模式不是链接或复制文件，而是将文件从旧集群移动到新集群，然后用新集群的系统表文件替换旧的系统表文件。之所以增加这种模式，是因为当集群中包含许多数据库对象（关系）时，它的性能可能比 “--link” 模式（以及其他模式）还要好。

我们来验证一下，通过创建两个包含许多数据库对象（关系）的 PostgreSQL 17 集群：

pgenv $ initdb --version
initdb (PostgreSQL) 17.2 
pgenv $ initdb -D /var/tmp/dummy/17.2_1 --data-checksums
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
...
 
Success. You can now start the database server using:
 
  pg_ctl -D /var/tmp/17.2 -l logfile start
 
pgenv $ echo "port=8888" >> /var/tmp/dummy/17.2_1/postgresql.auto.conf 
pgenv $ pg_ctl --pgdata=/var/tmp/dummy/17.2_1 start -l /dev/null
waiting for server to start.... done
server started
pgenv $ psql -p 8888 -l
                                                     List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | Locale | ICU Rules |   Access privileges   
-----------+----------+----------+-----------------+-------------+-------------+--------+-----------+-----------------------
 postgres  | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           | 
 template0 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           | =c/postgres          +
           |          |          |                 |             |             |        |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           | =c/postgres          +
           |          |          |                 |             |             |        |           | postgres=CTc/postgres
(3 rows)

下面是一个小脚本，用于创建一些表、索引并插入一些数据：

#!/bin/bash
 
for i in {1..10000}; do
    psql -p 8888 -c "create table t${i} ( a int, b text )"
    psql -p 8888 -c "insert into t${i} select i, i::text from generate_series(1,1000) i;"
    psql -p 8888 -c "create index i${i} on t${i}(a);"
done

如果对该集群运行这个脚本，我们将得到 10000 个表（每个表包含 1000 行数据）和 10000 个索引。这对于进行快速测试来说，已经有足够多的数据库对象（关系）了。

我们通过复制第一个集群来创建第二个集群：

pgenv $ mkdir /var/tmp/dummy/17.2_2/
pgenv $ pg_basebackup --port=8888 --pgdata=/var/tmp/dummy/17.2_2/ --checkpoint=fast
pgenv $ sed -i 's/8888/8889/g' /var/tmp/dummy/17.2_2/postgresql.auto.conf

现在，我们创建两个要升级到的 PostgreSQL 18 集群。其中一个使用 “–link” 模式升级，另一个使用新的 “–swap” 模式升级，我们还要停止旧的集群：

pgdev $ initdb --version
initdb (PostgreSQL) 18devel
pgdev $ initdb --pgdata=/var/tmp/dummy/18link
pgdev $ initdb --pgdata=/var/tmp/dummy/18swap
pgdev $ echo "port=9000" >> /var/tmp/dummy/18link/postgresql.auto.conf 
pgdev $ echo "port=9001" >> /var/tmp/dummy/18swap/postgresql.auto.conf 
pgdev $ pg_ctl --pgdata=/var/tmp/dummy/17.2_1/ stop

快速做一下升级前的检查，看看是否一切正常：

pgdev $ pg_upgrade --version
pg_upgrade (PostgreSQL) 18devel
pgdev $ export PGDATAOLD=/var/tmp/dummy/17.2_1/
pgdev $ export PGDATANEW=/var/tmp/dummy/18link/
pgdev $ export PGBINOLD=/opt/postgres/product/17/db_2/bin
pgdev $ export PGBINNEW=/opt/postgres/product/DEV/db_0/bin/
pgdev $ pg_upgrade --check
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
Checking database connection settings                         ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for contrib/isn with bigint-passing mismatch         ok
Checking for valid logical replication slots                  ok
Checking for subscription state                               ok
Checking data type usage                                      ok
Checking for presence of required libraries                   ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for new cluster tablespace directories               ok
 
*Clusters are compatible*

是时候使用 “–link” 模式进行测试了：

pgdev $ time pg_upgrade --link
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
Checking database connection settings                         ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for contrib/isn with bigint-passing mismatch         ok
Checking for valid logical replication slots                  ok
Checking for subscription state                               ok
Checking data type usage                                      ok
Creating dump of global objects                               ok
Creating dump of database schemas                             
                                                              ok
Checking for presence of required libraries                   ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for new cluster tablespace directories               ok
 
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
 
Performing Upgrade
------------------
Setting locale and encoding for new cluster                   ok
Analyzing all rows in the new cluster                         ok
Freezing all rows in the new cluster                          ok
Deleting files from new pg_xact                               ok
Copying old pg_xact to new server                             ok
Setting oldest XID for new cluster                            ok
Setting next transaction ID and epoch for new cluster         ok
Deleting files from new pg_multixact/offsets                  ok
Copying old pg_multixact/offsets to new server                ok
Deleting files from new pg_multixact/members                  ok
Copying old pg_multixact/members to new server                ok
Setting next multixact ID and offset for new cluster          ok
Resetting WAL archives                                        ok
Setting frozenxid and minmxid counters in new cluster         ok
Restoring global objects in the new cluster                   ok
Restoring database schemas in the new cluster                 
                                                              ok
Adding ".old" suffix to old global/pg_control                 ok
 
If you want to start the old cluster, you will need to remove
the ".old" suffix from /var/tmp/dummy/17.2_1/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
Linking user relation files                                   
                                                              ok
Setting next OID for new cluster                              ok
Sync data directory to disk                                   ok
Creating script to delete old cluster                         ok
Checking for extension updates                                ok
 
Upgrade Complete
----------------
Some optimizer statistics may not have been transferred by pg_upgrade.
Once you start the new server, consider running:
    /opt/postgres/product/DEV/db_0/bin/vacuumdb --all --analyze-in-stages --missing-stats-only
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
 
real    0m13.776s
user    0m0.654s
sys     0m1.536s

我们用新的 “–swap” 模式进行同样的测试：

pgdev $ export PGDATAOLD=/var/tmp/dummy/17.2_2/
pgdev $ export PGDATANEW=/var/tmp/dummy/18swap/
pgdev $ export PGBINOLD=/opt/postgres/product/17/db_2/bin
pgdev $ export PGBINNEW=/opt/postgres/product/DEV/db_0/
pgdev $ pg_upgrade --check
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
Checking database connection settings                         ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for contrib/isn with bigint-passing mismatch         ok
Checking for valid logical replication slots                  ok
Checking for subscription state                               ok
Checking data type usage                                      ok
Checking for presence of required libraries                   ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for new cluster tablespace directories               ok
 
*Clusters are compatible*
 
pgdev $ time pg_upgrade --swap
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
Checking database connection settings                         ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for contrib/isn with bigint-passing mismatch         ok
Checking for valid logical replication slots                  ok
Checking for subscription state                               ok
Checking data type usage                                      ok
Creating dump of global objects                               ok
Creating dump of database schemas                             
                                                              ok
Checking for presence of required libraries                   ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for new cluster tablespace directories               ok
 
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
 
Performing Upgrade
------------------
Setting locale and encoding for new cluster                   ok
Analyzing all rows in the new cluster                         ok
Freezing all rows in the new cluster                          ok
Deleting files from new pg_xact                               ok
Copying old pg_xact to new server                             ok
Setting oldest XID for new cluster                            ok
Setting next transaction ID and epoch for new cluster         ok
Deleting files from new pg_multixact/offsets                  ok
Copying old pg_multixact/offsets to new server                ok
Deleting files from new pg_multixact/members                  ok
Copying old pg_multixact/members to new server                ok
Setting next multixact ID and offset for new cluster          ok
Resetting WAL archives                                        ok
Setting frozenxid and minmxid counters in new cluster         ok
Restoring global objects in the new cluster                   ok
Restoring database schemas in the new cluster                 
                                                              ok
Adding ".old" suffix to old global/pg_control                 ok
 
Because "swap" mode was used, the old cluster can no longer be
safely started.
Swapping data directories                                     
                                                              ok
Setting next OID for new cluster                              ok
Sync data directory to disk                                   ok
Creating script to delete old cluster                         ok
Checking for extension updates                                ok
 
Upgrade Complete
----------------
Some optimizer statistics may not have been transferred by pg_upgrade.
Once you start the new server, consider running:
    /opt/postgres/product/DEV/db_0/bin/vacuumdb --all --analyze-in-stages --missing-stats-only
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
 
real    0m11.426s
user    0m0.600s
sys     0m0.659s

使用 “–swap” 模式大约快了 2 秒，虽然提升不算多，但至少更快了。当然，这只是一个非常简单的测试用例，还需要进一步测试。另外请注意输出中的警告信息：

Because "swap" mode was used, the old cluster can no longer be
safely started.
Swapping data directories

这是使用这种模式的必然结果。感谢社区的所有相关人员。

参考

提交日志：https://git.postgresql.org/pg/commitdiff/626d7236b65da50423df7de035e86f273cd36b49

特性提交日志

示例

参考

搜索

分类

标签