由 John Doe 四月 11, 2025
你是否介意 PostgreSQL 的双重缓冲?现在,PostgreSQL 支持异步 I/O 了。
目录
异步 I/O 简介
传统上,PostgreSQL 依赖操作系统来隐藏磁盘写入的延迟,而这一过程是同步进行的,可能会导致双重缓冲(PostgreSQL 共享缓冲区和操作系统文件缓存)。这在写入预写式日志(WAL)时最为关键,因为 PostgreSQL 必须确保更改已刷新到磁盘,并需要等待确认。
在进行测试之前,让我们从参数的角度看看有哪些新变化。新参数之一是io_method
:
show io_method;
io_method
-----------
worker
(1 row)
其默认值为 “worker”,执行异步操作最多的工作进程数目由io_workers
控制:
show io_workers;
io_workers
------------
3
(1 row)
在操作系统中也可以看到这一点:
$ ps aux | grep "io worker" | grep -v grep
postgres 29732 0.0 0.1 224792 7052 ? Ss Apr08 0:00 postgres: pgdev: io worker 1
postgres 29733 0.0 0.2 224792 9884 ? Ss Apr08 0:00 postgres: pgdev: io worker 0
postgres 29734 0.0 0.1 224792 7384 ? Ss Apr08 0:00 postgres: pgdev: io worker 2
io_method
的其他可能设置有:
io_uring
:使用 io_uring 进行异步 I/Osync
:PostgreSQL 18 之前的行为,执行同步 I/O
只有当io_method
设置为默认配置 “worker” 时,io_workers
才会生效。
准备测试环境
和往常一样:以下只是一些基本测试。请在你自己的环境中,针对特定的业务负载进行测试,以获得有意义的数据。特别是在公有云环境中进行测试时,要注意这些数字可能无法完全反映真实情况。
我们将在运行 Debian 12 的系统上进行测试。文件系统为 ext4(默认设置):
$ grep proc /proc/cpuinfo
processor : 0
processor : 1
$ free -g
total used free shared buff/cache available
Mem: 7 0 4 0 3 7
Swap: 0 0 0
$ mount | grep 18
/dev/nvme1n1 on /db/pgsql/18 type ext4 (rw,relatime)
使用默认设置初始化 PostgreSQL:
$ /usr/pgsql/18/bin/initdb --pgdata=/db/pgsql/18/data/
更改以下的设置:
$ echo "shared_buffers='2GB'" >> /db/pgsql/18/data/postgresql.auto.conf
$ echo "checkpoint_timeout='20min'" >> /db/pgsql/18/data/postgresql.auto.conf
$ echo "random_page_cost=1.1" >> /db/pgsql/18/data/postgresql.auto.conf
$ echo "max_wal_size='8GB'" >> /db/pgsql/18/data/postgresql.auto.conf
$ /usr/pgsql/18/bin/pg_ctl --pgdata=/db/pgsql/18/data/ -l /dev/null start
$ /usr/pgsql/18/bin/psql -c "select version()"
version
--------------------------------------------------------------------
PostgreSQL 18devel on x86_64-linux, compiled by gcc-12.2.0, 64-bit
(1 row)
$ export PATH=/usr/pgsql/18/bin/:$PATH
pgbench 数据加载测试
第一个测试是数据加载。设置io_method
为 “worker”,测试加载数据需要多长时间,连续进行 3 次测试。这样会生成大约 1536MB 的数据集:
$ pgbench -i -s 100
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 31.85 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 24.82 s, vacuum 0.35 s, primary keys 6.68 s).
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 31.97 s (drop tables 0.24 s, create tables 0.00 s, client-side generate 25.44 s, vacuum 0.34 s, primary keys 5.93 s).
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 30.72 s (drop tables 0.26 s, create tables 0.00 s, client-side generate 23.93 s, vacuum 0.55 s, primary keys 5.98 s).
使用 “sync” 进行相同的测试:
$ psql -c "alter system set io_method='sync'"
$ pg_ctl --pgdata=/db/pgsql/18/data/ restart -l /dev/null
$ psql -c "show io_method"
io_method
-----------
sync
(1 row)
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 20.89 s (drop tables 0.29 s, create tables 0.01 s, client-side generate 14.70 s, vacuum 0.45 s, primary keys 5.44 s).
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 21.57 s (drop tables 0.20 s, create tables 0.00 s, client-side generate 16.13 s, vacuum 0.46 s, primary keys 4.77 s).
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 21.44 s (drop tables 0.20 s, create tables 0.00 s, client-side generate 16.04 s, vacuum 0.52 s, primary keys 4.67 s).
最后使用 “io_uring” 进行测试:
$ psql -c "alter system set io_method='io_uring'"
ALTER SYSTEM
$ pg_ctl --pgdata=/db/pgsql/18/data/ restart -l /dev/null
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
$ psql -c "show io_method"
io_method
-----------
io_uring
(1 row)
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 20.63 s (drop tables 0.35 s, create tables 0.01 s, client-side generate 14.92 s, vacuum 0.47 s, primary keys 4.88 s).
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 20.81 s (drop tables 0.29 s, create tables 0.00 s, client-side generate 14.43 s, vacuum 0.46 s, primary keys 5.63 s).
$ pgbench -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 21.11 s (drop tables 0.24 s, create tables 0.00 s, client-side generate 15.63 s, vacuum 0.53 s, primary keys 4.70 s).
“sync” 和 “io_uring” 之间没有太大区别,但对于这种类型的业务负载,“worker” 明显较慢。
pgbench 基准测试
接下来,让我们看看标准的 pgbench 基准测试结果如何。我们从 “io_uring” 开始,因为这是当前的设置:
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 567989
number of failed transactions: 0 (0.000%)
latency average = 2.113 ms
initial connection time = 8.996 ms
tps = 946.659673 (without initial connection time)
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 557640
number of failed transactions: 0 (0.000%)
latency average = 2.152 ms
initial connection time = 6.994 ms
tps = 929.408406 (without initial connection time)
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 563613
number of failed transactions: 0 (0.000%)
latency average = 2.129 ms
initial connection time = 16.351 ms
tps = 939.378627 (without initial connection time)
使用 “worker” 进行相同的测试:
$ psql -c "alter system set io_method='worker'"
$ pg_ctl --pgdata=/db/pgsql/18/data/ restart -l /dev/null
waiting for server to shut down............. done
server stopped
waiting for server to start.... done
server started
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 549176
number of failed transactions: 0 (0.000%)
latency average = 2.185 ms
initial connection time = 7.189 ms
tps = 915.301403 (without initial connection time)
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 564898
number of failed transactions: 0 (0.000%)
latency average = 2.124 ms
initial connection time = 11.332 ms
tps = 941.511304 (without initial connection time)
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 563041
number of failed transactions: 0 (0.000%)
latency average = 2.131 ms
initial connection time = 9.120 ms
tps = 938.412979 (without initial connection time)
最后使用 “sync” 进行相同的测试:
$ psql -c "alter system set io_method='sync'"
$ pg_ctl --pgdata=/db/pgsql/18/data/ restart -l /dev/null
waiting for server to shut down............ done
server stopped
waiting for server to start.... done
server started
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 560420
number of failed transactions: 0 (0.000%)
latency average = 2.141 ms
initial connection time = 12.000 ms
tps = 934.050237 (without initial connection time)
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 560077
number of failed transactions: 0 (0.000%)
latency average = 2.143 ms
initial connection time = 7.204 ms
tps = 933.469665 (without initial connection time)
$ pgbench --time=600 --client=2 --jobs=2
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 566150
number of failed transactions: 0 (0.000%)
latency average = 2.120 ms
initial connection time = 7.579 ms
tps = 943.591451 (without initial connection time)
如你所见,无论使用哪种io_method
,差异都不大。让我们给系统施加更多压力(这里仅给出测试结果的摘要):
$ pgbench --time=600 --client=10 --jobs=10
## sync
tps = 2552.785398 (without initial connection time)
tps = 2505.476064 (without initial connection time)
tps = 2542.419230 (without initial connection time)
## io_uring
tps = 2511.138931 (without initial connection time)
tps = 2529.705311 (without initial connection time)
tps = 2573.195751 (without initial connection time)
## worker
tps = 2531.657962 (without initial connection time)
tps = 2523.854335 (without initial connection time)
tps = 2515.490351 (without initial connection time)
从某些方面来看,差异并不大。最后再进行一项测试,给系统施加更大的压力:
$ pgbench --time=600 --client=20 --jobs=20
## worker
tps = 2930.268033 (without initial connection time)
tps = 2799.499964 (without initial connection time)
tps = 3033.491153 (without initial connection time)
## io_uring
tps = 2942.542882 (without initial connection time)
tps = 3061.487286 (without initial connection time)
tps = 2995.175169 (without initial connection time)
## sync
tps = 2997.654084 (without initial connection time)
tps = 2924.269626 (without initial connection time)
tps = 2753.853272 (without initial connection time)
至少就这些测试而言,io_method
的三种设置之间并没有太大差异(“sync” 似乎要稍微慢一点),但仍然表现不错。对于这样一项重大的特性来说,能达到与之前相同的性能表现就已经很不错了。PostgreSQL 的性能一直在提升,而且相信在这个领域也会带来很多改进。
非常不错的体验。感谢社区的所有相关人员。
参考
提交日志:https://git.postgresql.org/pg/commitdiff/da7226993fd4b73d8b40abb7167d124eada97f2e