一月 10, 2025
摘要:在本教程中,我们将对比 PostgreSQL 中的参数调优与查询优化。
目录
优化的事实
调优参数可以提高性能 10%、20%,在某些情况下可以提高 50%。
优化查询可以将性能提高几倍或数十倍。
调优应用程序可以将性能提高多达数百倍!
查询示例
让我们从 postgres_air 项目中获取一个 SQL 查询。
SELECT f.flight_no,
f.actual_departure,
count(passenger_id) passengers
FROM flight f
JOIN booking_leg bl ON bl.flight_id = f.flight_id
JOIN passenger p ON p.booking_id=bl.booking_id
WHERE f.departure_airport = 'JFK'
AND f.arrival_airport = 'ORD'
AND f.actual_departure BETWEEN '2023-08-08' and '2023-08-12'
GROUP BY f.flight_id, f.actual_departure;
使用缺省内存配置的执行计划
几个重要的配置参数如下:
shared_buffers = 128MB
work_mem = 4MB
max_parallel_workers_per_gather = 0
下面是上述查询的执行计划:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=406761.88..406769.58 rows=4 width=24) (actual time=2137.359..2137.373 rows=4 loops=1)
Group Key: f.flight_id
Buffers: shared hit=4157 read=171641
-> Sort (cost=406761.88..406764.44 rows=1021 width=20) (actual time=2137.352..2137.357 rows=163 loops=1)
Sort Key: f.flight_id
Sort Method: quicksort Memory: 36kB
Buffers: shared hit=4157 read=171641
-> Hash Join (cost=10712.91..406710.86 rows=1021 width=20) (actual time=226.312..2137.305 rows=163 loops=1)
Hash Cond: (p.booking_id = bl.booking_id)
Buffers: shared hit=4157 read=171641
-> Seq Scan on passenger p (cost=0.00..334810.99 rows=16313799 width=8) (actual time=0.269..1263.191 rows=16313693 loops=1)
Buffers: shared hit=32 read=171641
-> Hash (cost=10711.60..10711.60 rows=105 width=20) (actual time=18.509..18.510 rows=69 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=4125
-> Nested Loop (cost=124.94..10711.60 rows=105 width=20) (actual time=4.636..18.457 rows=69 loops=1)
Buffers: shared hit=4125
-> Bitmap Heap Scan on flight f (cost=119.84..9349.49 rows=4 width=16) (actual time=4.602..18.257 rows=4 loops=1)
Recheck Cond: (departure_airport = 'JFK'::bpchar)
Filter: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone) AND (arrival_airport = 'ORD'::bpchar))
Rows Removed by Filter: 10526
Heap Blocks: exact=4085
Buffers: shared hit=4096
-> Bitmap Index Scan on flight_departure_airport (cost=0.00..119.84 rows=10589 width=0) (actual time=0.868..0.868 rows=10530 loops=1)
Index Cond: (departure_airport = 'JFK'::bpchar)
Buffers: shared hit=11
-> Bitmap Heap Scan on booking_leg bl (cost=5.10..339.68 rows=85 width=8) (actual time=0.023..0.034 rows=17 loops=4)
Recheck Cond: (flight_id = f.flight_id)
Heap Blocks: exact=17
Buffers: shared hit=29
-> Bitmap Index Scan on booking_leg_flight_id (cost=0.00..5.08 rows=85 width=0) (actual time=0.014..0.014 rows=17 loops=4)
Index Cond: (flight_id = f.flight_id)
Buffers: shared hit=12
Execution Time: 2.4 s
调整下并行执行的参数:
shared_buffers = 128MB
work_mem = 4MB
max_parallel_workers_per_gather = 2
下面是新的执行计划:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=276065.38..276069.61 rows=4 width=24) (actual time=1994.273..1995.855 rows=4 loops=1)
Group Key: f.flight_id
Buffers: shared hit=4354 read=171481
-> Gather Merge (cost=276065.38..276069.53 rows=8 width=24) (actual time=1994.254..1995.844 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4354 read=171481
-> Partial GroupAggregate (cost=275065.36..275068.59 rows=4 width=24) (actual time=1985.822..1985.845 rows=3 loops=3)
Group Key: f.flight_id
Buffers: shared hit=4354 read=171481
-> Sort (cost=275065.36..275066.42 rows=425 width=20) (actual time=1985.814..1985.831 rows=54 loops=3)
Sort Key: f.flight_id
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=4354 read=171481
Worker 0: Sort Method: quicksort Memory: 27kB
Worker 1: Sort Method: quicksort Memory: 29kB
-> Parallel Hash Join (cost=9907.55..275046.81 rows=425 width=20) (actual time=785.555..1985.787 rows=54 loops=3)
Hash Cond: (p.booking_id = bl.booking_id)
Buffers: shared hit=4340 read=171481
-> Parallel Seq Scan on passenger p (cost=0.00..239647.16 rows=6797416 width=8) (actual time=1.150..1644.452 rows=5437898 loops=3)
Buffers: shared hit=192 read=171481
-> Parallel Hash (cost=9907.00..9907.00 rows=44 width=20) (actual time=11.744..11.757 rows=23 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 72kB
Buffers: shared hit=4126
-> Nested Loop (cost=124.94..9907.00 rows=44 width=20) (actual time=5.681..11.705 rows=23 loops=3)
Buffers: shared hit=4126
-> Parallel Bitmap Heap Scan on flight f (cost=119.84..9225.95 rows=2 width=16) (actual time=5.646..11.615 rows=1 loops=3)
Recheck Cond: (departure_airport = 'JFK'::bpchar)
Filter: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone) AND (arrival_airport = 'ORD'::b...
Rows Removed by Filter: 3509
Heap Blocks: exact=2684
Buffers: shared hit=4096
-> Bitmap Index Scan on flight_departure_airport (cost=0.00..119.84 rows=10589 width=0) (actual time=2.535..2.538 rows=10530 loops=1)
Index Cond: (departure_airport = 'JFK'::bpchar)
Buffers: shared hit=11
-> Bitmap Heap Scan on booking_leg bl (cost=5.10..339.68 rows=85 width=8) (actual time=0.031..0.052 rows=17 loops=4)
Recheck Cond: (flight_id = f.flight_id)
Heap Blocks: exact=13
Buffers: shared hit=30
-> Bitmap Index Scan on booking_leg_flight_id (cost=0.00..5.08 rows=85 width=0) (actual time=0.024..0.024 rows=17 loops=4)
Index Cond: (flight_id = f.flight_id)
Buffers: shared hit=13
Execution Time: 2.1 s
调整内存参数
让我们保持shared_buffers = 128MB
和max_parallel_workers_per_gather = 2
,并调大work_mem
的值。
使用work_mem = 500MB
时,查询执行时间为 1.8 秒。
使用work_mem = 1GB
时,查询执行时间仍为 1.8 秒。
好的,让我们增加 shared_buffers 的值,这需要重新启动 PostgreSQL。
当 shared_buffers = 1GB,work_mem = 4MB/100MB/500MB 时,查询执行时间为 1 秒。
当 shared_buffers = 2GB,work_mem = 500MB 时,查询执行时间为 1.1 秒。
优化查询
让我们来仔细看看到目前为止的执行计划。
-> Nested Loop (cost=124.94..9907.00 rows=44 width=20) (actual time=5.681..11.705 rows=23 loops=3)
Buffers: shared hit=4126
-> Parallel Bitmap Heap Scan on flight f (cost=119.84..9225.95 rows=2 width=16) (actual time=5.646..11.615 rows=1 loops=3)
Recheck Cond: (departure_airport = 'JFK'::bpchar)
Filter: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone) AND (arrival_airport = 'ORD'::b...
Rows Removed by Filter: 3509
Heap Blocks: exact=2684
Buffers: shared hit=4096
-> Bitmap Index Scan on flight_departure_airport (cost=0.00..119.84 rows=10589 width=0) (actual time=2.535..2.538 rows=10530 loops=1)
Index Cond: (departure_airport = 'JFK'::bpchar)
Buffers: shared hit=11
在查找 8 月 8 日至 12 日之间的出发日期时,采用了堆扫描(Heap scan)。
让我们来创建一个索引:
CREATE INDEX flight_actual_departure ON postgres_air.flight(actual_departure);
创建索引后的执行计划如下:
-> Bitmap Heap Scan on flight f (cost=368.65..1232.58 rows=4 width=16) (actual time=2.200..2.483 rows=4 loops=3)
Recheck Cond: ((departure_airport = 'JFK'::bpchar) AND (actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
Filter: (arrival_airport = 'ORD'::bpchar)
Rows Removed by Filter: 229
Heap Blocks: exact=156
Buffers: shared hit=607
-> BitmapAnd (cost=368.65..368.65 rows=249 width=0) (actual time=2.092..2.093 rows=0 loops=3)
Buffers: shared hit=139
-> Bitmap Index Scan on flight_departure_airport (cost=0.00..120.19 rows=10635 width=0) (actual time=0.973..0.973 rows=10530 loops=3)
Index Cond: (departure_airport = 'JFK'::bpchar)
Buffers: shared hit=35
-> Bitmap Index Scan on flight_actual_departure (cost=0.00..248.22 rows=15979 width=0) (actual time=0.905..0.905 rows=15873 loops=3)
Index Cond: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
Execution Time: 0.7 s
在基于 booking_id 连接 booking_leg 表时,对 passenger 表进行了顺序扫描。
-> Hash Join (cost=2243.76..267385.35 rows=473 width=20) (actual time=490.297..2186.655 rows=54 loops=3)
Hash Cond: (p.booking_id = bl.booking_id)
Buffers: shared hit=498 read=171481
-> Parallel Seq Scan on passenger p (cost=0.00..239646.72 rows=6797372 width=8) (actual time=0.521..1941.157 rows=5437898 loops=3)
Buffers: shared hit=192 read=171481
-> Hash (cost=2242.12..2242.12 rows=131 width=20) (actual time=4.722..4.725 rows=69 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=284
让我们创建另一个索引:
CREATE INDEX passenger_booking_id ON postgres_air.passenger(booking_id);
创建索引后的执行计划如下:
HashAggregate (cost=2682.39..2682.43 rows=4 width=24) (actual time=2.554..2.560 rows=4 loops=1)
Group Key: f.flight_id
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=539
-> Nested Loop (cost=374.19..2676.85 rows=1108 width=20) (actual time=1.675..2.464 rows=163 loops=1)
Buffers: shared hit=539
-> Nested Loop (cost=373.75..2594.69 rows=105 width=20) (actual time=1.653..1.989 rows=69 loops=1)
Buffers: shared hit=230
-> Bitmap Heap Scan on flight f (cost=368.65..1232.58 rows=4 width=16) (actual time=1.642..1.878 rows=4 loops=1)
Recheck Cond: ((departure_airport = 'JFK'::bpchar) AND (actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
Filter: (arrival_airport = 'ORD'::bpchar)
Rows Removed by Filter: 229
Heap Blocks: exact=156
Buffers: shared hit=201
-> BitmapAnd (cost=368.65..368.65 rows=249 width=0) (actual time=1.572..1.574 rows=0 loops=1)
Buffers: shared hit=45
-> Bitmap Index Scan on flight_departure_airport (cost=0.00..120.19 rows=10635 width=0) (actual time=0.653..0.653 rows=10530 loops=1)
Index Cond: (departure_airport = 'JFK'::bpchar)
Buffers: shared hit=11
-> Bitmap Index Scan on flight_actual_departure (cost=0.00..248.22 rows=15979 width=0) (actual time=0.741..0.742 rows=15873 loops=1)
Index Cond: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
Buffers: shared hit=34
-> Bitmap Heap Scan on booking_leg bl (cost=5.10..339.68 rows=85 width=8) (actual time=0.008..0.018 rows=17 loops=4)
Recheck Cond: (flight_id = f.flight_id)
Heap Blocks: exact=17
Buffers: shared hit=29
-> Bitmap Index Scan on booking_leg_flight_id (cost=0.00..5.08 rows=85 width=0) (actual time=0.005..0.005 rows=17 loops=4)
Index Cond: (flight_id = f.flight_id)
Buffers: shared hit=12
-> Index Scan using passenger_booking_id on passenger p (cost=0.43..0.67 rows=11 width=8) (actual time=0.004..0.006 rows=2 loops=69)
Index Cond: (booking_id = bl.booking_id)
Buffers: shared hit=309
Execution Time: 60 ms
如果我们将参数恢复为默认值,会发生什么情况?
结果是,执行计划会保持不变,查询执行的速度也保持不变!
理解参数的作用
配置参数的作用在于,将硬件的能力传达给 PostgreSQL。
例如:
- 服务器内存 16 GB 时,shared_buffers = 128MB
- 服务器内存 16 GB 时,shared_buffers = 4GB,work_mem = 200MB,max_connections = 1000
- random_page_cost = 4