PostgreSQL 教程: 参数调优对比查询优化

一月 10, 2025

摘要:在本教程中,我们将对比 PostgreSQL 中的参数调优与查询优化。

目录

优化的事实

调优参数可以提高性能 10%、20%,在某些情况下可以提高 50%。

优化查询可以将性能提高几倍或数十倍。

调优应用程序可以将性能提高多达数百倍!

查询示例

让我们从 postgres_air 项目中获取一个 SQL 查询。

SELECT f.flight_no,
       f.actual_departure,
       count(passenger_id) passengers
  FROM flight f
       JOIN booking_leg bl ON bl.flight_id = f.flight_id
       JOIN passenger p ON p.booking_id=bl.booking_id
  WHERE f.departure_airport = 'JFK'
    AND f.arrival_airport = 'ORD'
    AND f.actual_departure BETWEEN '2023-08-08' and '2023-08-12'
  GROUP BY f.flight_id, f.actual_departure;

使用缺省内存配置的执行计划

几个重要的配置参数如下:

shared_buffers = 128MB
work_mem = 4MB
max_parallel_workers_per_gather = 0

下面是上述查询的执行计划:

                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate (cost=406761.88..406769.58 rows=4 width=24) (actual time=2137.359..2137.373 rows=4 loops=1)
   Group Key: f.flight_id
   Buffers: shared hit=4157 read=171641
   -> Sort (cost=406761.88..406764.44 rows=1021 width=20) (actual time=2137.352..2137.357 rows=163 loops=1)
        Sort Key: f.flight_id
        Sort Method: quicksort Memory: 36kB
        Buffers: shared hit=4157 read=171641
        -> Hash Join (cost=10712.91..406710.86 rows=1021 width=20) (actual time=226.312..2137.305 rows=163 loops=1)
             Hash Cond: (p.booking_id = bl.booking_id)
             Buffers: shared hit=4157 read=171641
             -> Seq Scan on passenger p (cost=0.00..334810.99 rows=16313799 width=8) (actual time=0.269..1263.191 rows=16313693 loops=1)
                  Buffers: shared hit=32 read=171641
             -> Hash (cost=10711.60..10711.60 rows=105 width=20) (actual time=18.509..18.510 rows=69 loops=1)
                  Buckets: 1024 Batches: 1 Memory Usage: 12kB
                  Buffers: shared hit=4125
                  -> Nested Loop (cost=124.94..10711.60 rows=105 width=20) (actual time=4.636..18.457 rows=69 loops=1)
                       Buffers: shared hit=4125
                       -> Bitmap Heap Scan on flight f (cost=119.84..9349.49 rows=4 width=16) (actual time=4.602..18.257 rows=4 loops=1)
                            Recheck Cond: (departure_airport = 'JFK'::bpchar)
                            Filter: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone) AND (arrival_airport = 'ORD'::bpchar))
                            Rows Removed by Filter: 10526
                            Heap Blocks: exact=4085
                            Buffers: shared hit=4096
                            -> Bitmap Index Scan on flight_departure_airport (cost=0.00..119.84 rows=10589 width=0) (actual time=0.868..0.868 rows=10530 loops=1)
                                 Index Cond: (departure_airport = 'JFK'::bpchar)
                                 Buffers: shared hit=11
                       -> Bitmap Heap Scan on booking_leg bl (cost=5.10..339.68 rows=85 width=8) (actual time=0.023..0.034 rows=17 loops=4)
                            Recheck Cond: (flight_id = f.flight_id)
                            Heap Blocks: exact=17
                            Buffers: shared hit=29
                            -> Bitmap Index Scan on booking_leg_flight_id (cost=0.00..5.08 rows=85 width=0) (actual time=0.014..0.014 rows=17 loops=4)
                                 Index Cond: (flight_id = f.flight_id)
                                 Buffers: shared hit=12
 Execution Time: 2.4 s

调整下并行执行的参数:

shared_buffers = 128MB
work_mem = 4MB
max_parallel_workers_per_gather = 2

下面是新的执行计划:

                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate (cost=276065.38..276069.61 rows=4 width=24) (actual time=1994.273..1995.855 rows=4 loops=1)
   Group Key: f.flight_id
   Buffers: shared hit=4354 read=171481
   -> Gather Merge (cost=276065.38..276069.53 rows=8 width=24) (actual time=1994.254..1995.844 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=4354 read=171481
        -> Partial GroupAggregate (cost=275065.36..275068.59 rows=4 width=24) (actual time=1985.822..1985.845 rows=3 loops=3)
             Group Key: f.flight_id
             Buffers: shared hit=4354 read=171481
             -> Sort (cost=275065.36..275066.42 rows=425 width=20) (actual time=1985.814..1985.831 rows=54 loops=3)
                  Sort Key: f.flight_id
                  Sort Method: quicksort Memory: 29kB
                  Buffers: shared hit=4354 read=171481
                  Worker 0: Sort Method: quicksort Memory: 27kB
                  Worker 1: Sort Method: quicksort Memory: 29kB
                  -> Parallel Hash Join (cost=9907.55..275046.81 rows=425 width=20) (actual time=785.555..1985.787 rows=54 loops=3)
                       Hash Cond: (p.booking_id = bl.booking_id)
                       Buffers: shared hit=4340 read=171481
                       -> Parallel Seq Scan on passenger p (cost=0.00..239647.16 rows=6797416 width=8) (actual time=1.150..1644.452 rows=5437898 loops=3)
                            Buffers: shared hit=192 read=171481
                       -> Parallel Hash (cost=9907.00..9907.00 rows=44 width=20) (actual time=11.744..11.757 rows=23 loops=3)
                            Buckets: 1024 Batches: 1 Memory Usage: 72kB
                            Buffers: shared hit=4126
                            -> Nested Loop (cost=124.94..9907.00 rows=44 width=20) (actual time=5.681..11.705 rows=23 loops=3)
                                 Buffers: shared hit=4126
                                 -> Parallel Bitmap Heap Scan on flight f (cost=119.84..9225.95 rows=2 width=16) (actual time=5.646..11.615 rows=1 loops=3)
                                      Recheck Cond: (departure_airport = 'JFK'::bpchar)
                                      Filter: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone) AND (arrival_airport = 'ORD'::b...
                                      Rows Removed by Filter: 3509
                                      Heap Blocks: exact=2684
                                      Buffers: shared hit=4096
                                      -> Bitmap Index Scan on flight_departure_airport (cost=0.00..119.84 rows=10589 width=0) (actual time=2.535..2.538 rows=10530 loops=1)
                                           Index Cond: (departure_airport = 'JFK'::bpchar)
                                           Buffers: shared hit=11
                                 -> Bitmap Heap Scan on booking_leg bl (cost=5.10..339.68 rows=85 width=8) (actual time=0.031..0.052 rows=17 loops=4)
                                      Recheck Cond: (flight_id = f.flight_id)
                                      Heap Blocks: exact=13
                                      Buffers: shared hit=30
                                      -> Bitmap Index Scan on booking_leg_flight_id (cost=0.00..5.08 rows=85 width=0) (actual time=0.024..0.024 rows=17 loops=4)
                                           Index Cond: (flight_id = f.flight_id)
                                           Buffers: shared hit=13
 Execution Time: 2.1 s

调整内存参数

让我们保持shared_buffers = 128MBmax_parallel_workers_per_gather = 2,并调大work_mem的值。

使用work_mem = 500MB时,查询执行时间为 1.8 秒。

使用work_mem = 1GB时,查询执行时间仍为 1.8 秒。

好的,让我们增加 shared_buffers 的值,这需要重新启动 PostgreSQL。

当 shared_buffers = 1GB,work_mem = 4MB/100MB/500MB 时,查询执行时间为 1 秒。

当 shared_buffers = 2GB,work_mem = 500MB 时,查询执行时间为 1.1 秒。

优化查询

让我们来仔细看看到目前为止的执行计划。

 -> Nested Loop (cost=124.94..9907.00 rows=44 width=20) (actual time=5.681..11.705 rows=23 loops=3)
      Buffers: shared hit=4126
      -> Parallel Bitmap Heap Scan on flight f (cost=119.84..9225.95 rows=2 width=16) (actual time=5.646..11.615 rows=1 loops=3)
           Recheck Cond: (departure_airport = 'JFK'::bpchar)
           Filter: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone) AND (arrival_airport = 'ORD'::b...
           Rows Removed by Filter: 3509
           Heap Blocks: exact=2684
           Buffers: shared hit=4096
           -> Bitmap Index Scan on flight_departure_airport (cost=0.00..119.84 rows=10589 width=0) (actual time=2.535..2.538 rows=10530 loops=1)
                Index Cond: (departure_airport = 'JFK'::bpchar)
                Buffers: shared hit=11

在查找 8 月 8 日至 12 日之间的出发日期时,采用了堆扫描(Heap scan)。

让我们来创建一个索引:

CREATE INDEX flight_actual_departure ON postgres_air.flight(actual_departure);

创建索引后的执行计划如下:

 -> Bitmap Heap Scan on flight f (cost=368.65..1232.58 rows=4 width=16) (actual time=2.200..2.483 rows=4 loops=3)
      Recheck Cond: ((departure_airport = 'JFK'::bpchar) AND (actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
      Filter: (arrival_airport = 'ORD'::bpchar)
      Rows Removed by Filter: 229
      Heap Blocks: exact=156
      Buffers: shared hit=607
      -> BitmapAnd (cost=368.65..368.65 rows=249 width=0) (actual time=2.092..2.093 rows=0 loops=3)
           Buffers: shared hit=139
           -> Bitmap Index Scan on flight_departure_airport (cost=0.00..120.19 rows=10635 width=0) (actual time=0.973..0.973 rows=10530 loops=3)
                Index Cond: (departure_airport = 'JFK'::bpchar)
                Buffers: shared hit=35
           -> Bitmap Index Scan on flight_actual_departure (cost=0.00..248.22 rows=15979 width=0) (actual time=0.905..0.905 rows=15873 loops=3)
                Index Cond: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
 Execution Time: 0.7 s

在基于 booking_id 连接 booking_leg 表时,对 passenger 表进行了顺序扫描。

 -> Hash Join (cost=2243.76..267385.35 rows=473 width=20) (actual time=490.297..2186.655 rows=54 loops=3)
      Hash Cond: (p.booking_id = bl.booking_id)
      Buffers: shared hit=498 read=171481
      -> Parallel Seq Scan on passenger p (cost=0.00..239646.72 rows=6797372 width=8) (actual time=0.521..1941.157 rows=5437898 loops=3)
           Buffers: shared hit=192 read=171481
      -> Hash (cost=2242.12..2242.12 rows=131 width=20) (actual time=4.722..4.725 rows=69 loops=3)
           Buckets: 1024 Batches: 1 Memory Usage: 12kB
           Buffers: shared hit=284

让我们创建另一个索引:

CREATE INDEX passenger_booking_id ON postgres_air.passenger(booking_id);

创建索引后的执行计划如下:

 HashAggregate (cost=2682.39..2682.43 rows=4 width=24) (actual time=2.554..2.560 rows=4 loops=1)
   Group Key: f.flight_id
   Batches: 1 Memory Usage: 24kB
   Buffers: shared hit=539
   -> Nested Loop (cost=374.19..2676.85 rows=1108 width=20) (actual time=1.675..2.464 rows=163 loops=1)
        Buffers: shared hit=539
        -> Nested Loop (cost=373.75..2594.69 rows=105 width=20) (actual time=1.653..1.989 rows=69 loops=1)
             Buffers: shared hit=230
             -> Bitmap Heap Scan on flight f (cost=368.65..1232.58 rows=4 width=16) (actual time=1.642..1.878 rows=4 loops=1)
                  Recheck Cond: ((departure_airport = 'JFK'::bpchar) AND (actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
                  Filter: (arrival_airport = 'ORD'::bpchar)
                  Rows Removed by Filter: 229
                  Heap Blocks: exact=156
                  Buffers: shared hit=201
                  -> BitmapAnd (cost=368.65..368.65 rows=249 width=0) (actual time=1.572..1.574 rows=0 loops=1)
                       Buffers: shared hit=45
                       -> Bitmap Index Scan on flight_departure_airport (cost=0.00..120.19 rows=10635 width=0) (actual time=0.653..0.653 rows=10530 loops=1)
                            Index Cond: (departure_airport = 'JFK'::bpchar)
                            Buffers: shared hit=11
                       -> Bitmap Index Scan on flight_actual_departure (cost=0.00..248.22 rows=15979 width=0) (actual time=0.741..0.742 rows=15873 loops=1)
                            Index Cond: ((actual_departure >= '2023-08-08 00:00:00-05'::timestamp with time zone) AND (actual_departure <= '2023-08-12 00:00:00-05'::timestamp with time zone))
                            Buffers: shared hit=34
             -> Bitmap Heap Scan on booking_leg bl (cost=5.10..339.68 rows=85 width=8) (actual time=0.008..0.018 rows=17 loops=4)
                  Recheck Cond: (flight_id = f.flight_id)
                  Heap Blocks: exact=17
                  Buffers: shared hit=29
                  -> Bitmap Index Scan on booking_leg_flight_id (cost=0.00..5.08 rows=85 width=0) (actual time=0.005..0.005 rows=17 loops=4)
                       Index Cond: (flight_id = f.flight_id)
                       Buffers: shared hit=12
        -> Index Scan using passenger_booking_id on passenger p (cost=0.43..0.67 rows=11 width=8) (actual time=0.004..0.006 rows=2 loops=69)
             Index Cond: (booking_id = bl.booking_id)
             Buffers: shared hit=309
 Execution Time: 60 ms

如果我们将参数恢复为默认值,会发生什么情况?

结果是,执行计划会保持不变,查询执行的速度也保持不变!

理解参数的作用

配置参数的作用在于,将硬件的能力传达给 PostgreSQL。

例如:

  • 服务器内存 16 GB 时,shared_buffers = 128MB
  • 服务器内存 16 GB 时,shared_buffers = 4GB,work_mem = 200MB,max_connections = 1000
  • random_page_cost = 4

了解更多

PostgreSQL 优化