十一月 29, 2024
摘要:在本教程中,您将学习如何在 PostgreSQL 中执行增量备份。
目录
为什么选择增量备份?
PostgreSQL 提供了一种方法,可以在不中断现有客户端连接的情况下,从正在运行的 PostgreSQL 服务器进行全量、一致的备份。对于大型数据库,这可能需要数小时或数天才能完成,并且在此期间可能会占用大量磁盘 I/O 带宽。为了解决这个问题,通常会偶尔地进行全量备份,并且会不断地进行 WAL 归档。因此,WAL 归档会被用作细粒度的增量备份。
但是,拥有一个真正的增量备份方案,可以大大降低备份的磁盘空间要求,并降低备份和还原操作的总体复杂性。使用一个全量备份和一个增量备份,而不是一个全量备份和几千个 WAL 文件,更容易且不易出错。
要在 PostgreSQL 17 中执行增量的备份和恢复,请执行以下步骤:
WAL 摘要
PostgreSQL 17 附带了一个称为 WAL summarizer process 的新后台工作进程,该进程会在pg_wal/summaries
目录下创建 WAL 文件的摘要信息。默认情况下,此进程不会启用,但需要启用后增量备份才能正常工作。要启用它,请设置新的配置参数summarize_wal
:
ALTER SYSTEM SET summarize_wal = 'on';
SELECT pg_reload_conf();
更改此设置不需要重启 PostgreSQL 即可生效,只需重新加载即可。
全量备份
通常的备份方式是使用pg_basebackup
,进行一次全量备份。让我们全量备份到一个目录/backup/full1
中:
$ pg_basebackup -D /backup/full1 -c fast -p 7000 -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/10000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_1182"
pg_basebackup: write-ahead log end point: 0/10000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
过一会儿,等做了一些更改,让我们再做一次全量备份(到目录/backup/full2
),以便与增量备份进行比较:
$ pg_basebackup -D /backup/full2 -c fast -p 7000
增量备份
此时,让我们来做一次增量备份。为此,请使用新的-i, --incremental=OLDMANIFEST
选项执行 pg_basebackup。我们希望从/backup/full1
中的全量备份来获得增量,因此我们使用清单/backup/full1/backup_manifest
,并在/backup/incr2
中创建该备份:
$ pg_basebackup -D /backup/incr2 --incremental /backup/full1/backup_manifest -c fast -p 7000 -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/17000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_1223"
pg_basebackup: write-ahead log end point: 0/17000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
让我们比较一下所有的三个备份大小:
$ du -sh /backup/*
189M /backup/full1
189M /backup/full2
51M /backup/incr2
第二次全量备份/backup/full2
的大小,与初次全量备份/backup/full1
相同。增量备份/backup/incr2
的大小仅为同一时间点的全量备份大小的四分之一多一点。
/backup/full2
和/backup/incr2
的内容看起来很相似,但通过查看备份标签,您可以看到备份是否是增量备份:
$ cat /backup/full2/backup_label
START WAL LOCATION: 0/15000028 (file 000000010000000000000015)
CHECKPOINT LOCATION: 0/15000080
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2024-11-22 10:13:37 IST
LABEL: pg_basebackup base backup
START TIMELINE: 1
$ cat /backup/incr2/backup_label
START WAL LOCATION: 0/17000028 (file 000000010000000000000017)
CHECKPOINT LOCATION: 0/17000080
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2024-11-22 10:13:56 IST
LABEL: pg_basebackup base backup
START TIMELINE: 1
INCREMENTAL FROM LSN: 0/10000028
INCREMENTAL FROM TLI: 1
第二次增量备份
在发生更多更改后,使用第一个增量备份中的backup_manifest
,创建第二个增量备份:
$ pg_basebackup -D /backup/incr3 --incremental /backup/incr2/backup_manifest -c fast -p 7000
恢复增量备份
要恢复一个增量备份,请使用新的工具 pg_combinebackup。下面的命令获取第一次全量备份/backup/full1
和稍后的增量备份,将它们合并,并将输出写入到目录/backup/restore3
:
$ pg_combinebackup -o /backup/restore3 /backup/full1 /backup/incr2 /backup/incr3
我们可以用/backup/restore3
中的数据目录启动 postmaster,并看到它在备份恢复后成功启动:
2024-11-22 10:26:28.269 UTC [1443] LOG: starting PostgreSQL 17 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-11-22 10:26:28.272 UTC [1443] LOG: listening on IPv6 address "::1", port 7001
2024-11-22 10:26:28.273 UTC [1443] LOG: listening on IPv4 address "127.0.0.1", port 7001
2024-11-22 10:26:28.277 UTC [1443] LOG: listening on Unix socket "/tmp/.s.PGSQL.7001"
2024-11-22 10:26:28.291 UTC [1446] LOG: database system was interrupted; last known up at 2024-11-22 10:13:56 IST
2024-11-22 10:26:28.500 UTC [1446] LOG: starting backup recovery with redo LSN 0/17000028, checkpoint LSN 0/17000080, on timeline ID 1
2024-11-22 10:26:28.511 UTC [1446] LOG: redo starts at 0/17000028
2024-11-22 10:26:28.515 UTC [1446] LOG: completed backup recovery with redo LSN 0/17000028 and end LSN 0/17000120
2024-11-22 10:26:28.515 UTC [1446] LOG: consistent recovery state reached at 0/17000120
2024-11-22 10:26:28.515 UTC [1446] LOG: redo done at 0/17000120 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-11-22 10:26:28.613 UTC [1444] LOG: checkpoint starting: end-of-recovery immediate wait
2024-11-22 10:26:28.641 UTC [1444] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.009 s, sync=0.003 s, total=0.032 s; sync files=2, longest=0.002 s, average=0.002 s; distance=16384 kB, estimate=16384 kB; lsn=0/18000028, redo lsn=0/18000028
2024-11-22 10:26:28.652 UTC [1443] LOG: database system is ready to accept connections
考虑
- 备份目录:确保每个备份(无论是全量备份还是增量备份)都存储在单独的目录中,以防止覆盖文件。
- 备份监控:定期监控备份,以确保它们成功完成,并正确维护好
backup_manifest
文件。 - 还原过程:熟悉还原过程,特别是当您需要使用 PostgreSQL 17 中的
pg_combinebackup
工具组合多个增量备份时。此工具有助于将增量备份合并到一个可用状态。
通过执行这些步骤,您可以维护一系列的增量备份,以有效地捕获对 PostgreSQL 数据库的更改。
增量备份与时间点恢复
在当前的 PostgreSQL 版本中,可以先进行全量备份,然后归档自备份以来生成的每个 WAL 文件。这允许您恢复备份,然后重放每个 WAL 文件到特定的时间点,从而重新创建当时的数据目录。这通常称为时间点恢复(PITR)。
从逻辑上讲,按时间点恢复到一个增量备份点的结果,应该与恢复在该点产生的增量备份的结果相同。但是,区别在于增量备份的大小可能比组合的 WAL 文件大小要小一点或小得多。例如,如果一行被 100 个事务更新了,则 WAL 文件将必须记录所有这 100 次更改,但增量备份仅记录数据目录内容的初始状态和最终状态之间的差异。
如果您以前将 WAL 归档用于纯粹的增量备份目的,而不是时间点恢复,那么现在可以简单地使用新的增量备份功能。如果您需要按时间点恢复,WAL 归档和重放仍然是需要采用的方案。