更新 PostgreSQL 表引起的索引扫描性能波动

由 John Doe 七月 29, 2025

摘要：在本文中，我们将了解更新 PostgreSQL 表引起的索引扫描性能波动问题，以及相关的解决方案。

索引扫描场景: 用例

下面我们在 PostgreSQL 的数据库中，创建一个表，插入一些数据，在表的id字段上创建索引。

CREATE TABLE t_large (id integer, name text);
INSERT INTO t_large (id, name)
  SELECT i, repeat('Pg', 64)
    FROM generate_series(1, 1000000) AS s(i);

CREATE INDEX large_idx ON t_large (id);
VACUUM ANALYZE t_large;

让我们先确认下这个查询确实使用了索引扫描，并记录下该查询的执行计划和统计信息。

EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (actual time=52.110..54.959 rows=1 loops=1)
   Buffers: shared hit=2738
   ->  Gather (actual time=51.997..54.951 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=2738
         ->  Partial Aggregate (actual time=49.467..49.468 rows=1 loops=3)
               Buffers: shared hit=2738
               ->  Parallel Index Only Scan using large_idx on t_large (actual time=0.036..33.244 rows=333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=2738
 Planning Time: 0.062 ms
 Execution Time: 54.983 ms
(13 rows)

让我们对表中的一部分数据进行一些更新，当然，也可以删除一部分数据。

UPDATE t_large SET name = 'dummy' WHERE mod(id, 100) = 1;
DELETE FROM t_large WHERE mod(id, 100) = 50;

让我们再来看看原来的查询使用的执行计划。

EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (actual time=105.906..118.424 rows=1 loops=1)
   Buffers: shared hit=25293 dirtied=2733
   ->  Gather (actual time=105.848..118.418 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=25293 dirtied=2733
         ->  Partial Aggregate (actual time=102.351..102.352 rows=1 loops=3)
               Buffers: shared hit=25293 dirtied=2733
               ->  Parallel Index Only Scan using large_idx on t_large (actual time=0.070..85.082 rows=330000 loops=3)
                     Heap Fetches: 980000
                     Buffers: shared hit=25293 dirtied=2733
 Planning Time: 0.061 ms
 Execution Time: 118.456 ms
(13 rows)

从上面的测试结果来看，在我们对表t_large更新和删除一部分数据后，原来的查询使用的执行计划没有发生变化，查询执行的时间却延长了一倍多！并且，这还是发生在数据都缓存到了共享缓冲区中的情况下！

索引扫描场景: 内部原理

PostgreSQL 的索引不存储可见性信息，元组的可见性需要借助可见性映射表来确定。每个堆表都有一个可见性映射表，用于跟踪哪些页面仅包含已知对所有活动事务可见的元组。每当一个索引方法返回指向可见性映射表中标记的页面的行 TID 时，您就可以确定其中的所有数据都对事务可见。

下面是我们在 PostgreSQL 中进行表更新时，索引扫描的过程发生的变化：

索引记录上没有事务状态信息，需要依赖可见性映射表上的堆表页面状态。
更新堆表页面内的单个元组，会改变页面的可见性状态。
当堆表页面不是全部可见的状态时，Index Only Scan 需要查看堆表页面上的元组事务状态，以确定索引记录是否可见。

Redrock Postgres 的解决方案

Redrock Postgres 引入了撤消日志，修改元组时会标记删除索引记录，给索引记录标记事务状态信息。在不更新每个索引的情况下，只执行相关索引列的更新，修改表数据并不会影响索引扫描。

下面我们在 Redrock Postgres 的数据库中，创建和上面相同的表，并插入同样的数据，对表数据进行更新操作，查看查询的执行计划和执行时间。

EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (actual time=53.932..56.645 rows=1 loops=1)
   ->  Gather (actual time=53.824..56.637 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate (actual time=50.878..50.879 rows=1 loops=3)
               ->  Parallel Index Only Scan using large_idx on t_large (actual time=0.022..32.856 rows=333333 loops=3)
                     Heap Fetches: 0
 Planning Time: 0.064 ms
 Execution Time: 56.721 ms
(9 rows)

UPDATE t_large SET name = 'dummy' WHERE mod(id, 100) = 1;
DELETE FROM t_large WHERE mod(id, 100) = 50;

EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (actual time=49.150..50.749 rows=1 loops=1)
   ->  Gather (actual time=49.008..50.742 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate (actual time=46.259..46.260 rows=1 loops=3)
               ->  Parallel Index Only Scan using large_idx on t_large (actual time=0.023..29.097 rows=330000 loops=3)
                     Heap Fetches: 0
 Planning Time: 0.080 ms
 Execution Time: 50.773 ms
(9 rows)