By John Doe July 29, 2025
Summary: In this article, we’ll take a look at index scan performance fluctuations that comes with heavy updates in PostgreSQL, and corresponding solutions.
Table of Contents
Index scan scenario: Use case
In PostgreSQL, let’s create a table, insert some data, and create an index on the id
field of the table.
CREATE TABLE t_large (id integer, name text);
INSERT INTO t_large (id, name)
SELECT i, repeat('Pg', 64)
FROM generate_series(1, 1000000) AS s(i);
CREATE INDEX large_idx ON t_large (id);
VACUUM ANALYZE t_large;
Let’s first confirm that the query does use index scanning and record the execution plan and statistics for the query.
EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual time=52.110..54.959 rows=1 loops=1)
Buffers: shared hit=2738
-> Gather (actual time=51.997..54.951 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2738
-> Partial Aggregate (actual time=49.467..49.468 rows=1 loops=3)
Buffers: shared hit=2738
-> Parallel Index Only Scan using large_idx on t_large (actual time=0.036..33.244 rows=333333 loops=3)
Heap Fetches: 0
Buffers: shared hit=2738
Planning Time: 0.062 ms
Execution Time: 54.983 ms
(13 rows)
Let’s make some updates to some of the data in the table and delete some of the data.
UPDATE t_large SET name = 'dummy' WHERE mod(id, 100) = 1;
DELETE FROM t_large WHERE mod(id, 100) = 50;
Let’s look again at the execution plan used by the original query.
EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual time=105.906..118.424 rows=1 loops=1)
Buffers: shared hit=25293 dirtied=2733
-> Gather (actual time=105.848..118.418 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=25293 dirtied=2733
-> Partial Aggregate (actual time=102.351..102.352 rows=1 loops=3)
Buffers: shared hit=25293 dirtied=2733
-> Parallel Index Only Scan using large_idx on t_large (actual time=0.070..85.082 rows=330000 loops=3)
Heap Fetches: 980000
Buffers: shared hit=25293 dirtied=2733
Planning Time: 0.061 ms
Execution Time: 118.456 ms
(13 rows)
In the above example, after some data is updated and deleted from the t_large
table, the execution plan used by the original query does not change, but the query execution time more than doubled! What’s more, this happened even when all the data are cached in the shared buffer!
Index scan scenario: How it works
PostgreSQL indexes don’t store visibility information; the visibility of tuples needs to be determined with the help of the visibility map. Each heap relation has a visibility map to keep track of which pages contain only the tuples that are known to be visible to all active transactions. Whenever an index method returns a row ID that points to a page which is flagged in the visibility map, you can tell for certain that all the data there is visible to your transaction.
The following are the changes that occur in the index scanning process when we perform table updates in PostgreSQL:
- There is no transaction status information on index records, which need to rely on the heap table page status in the visibility map.
- Updating a single tuple within a heap table page will change the visibility status of the page.
- When a heap table page is not in a fully visible state, an
Index Only Scan
needs to check the transaction status of tuples on the heap table page to determine whether the index records are visible.
The solution by Redrock Postgres
Redrock Postgres introduced undo logging. When a tuple is modified, index records are marked for deletion and transaction status information is recorded for index records. If each index is not updated, only related index columns are updated. Modifying table data does not affect index scanning.
In Redrock Postgres, let’s create a table, insert some data, and update the table data, and then check the execution plan and execution time of the query.
EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual time=53.932..56.645 rows=1 loops=1)
-> Gather (actual time=53.824..56.637 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual time=50.878..50.879 rows=1 loops=3)
-> Parallel Index Only Scan using large_idx on t_large (actual time=0.022..32.856 rows=333333 loops=3)
Heap Fetches: 0
Planning Time: 0.064 ms
Execution Time: 56.721 ms
(9 rows)
UPDATE t_large SET name = 'dummy' WHERE mod(id, 100) = 1;
DELETE FROM t_large WHERE mod(id, 100) = 50;
EXPLAIN (analyze, buffers, costs off) SELECT count(*) FROM t_large;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual time=49.150..50.749 rows=1 loops=1)
-> Gather (actual time=49.008..50.742 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (actual time=46.259..46.260 rows=1 loops=3)
-> Parallel Index Only Scan using large_idx on t_large (actual time=0.023..29.097 rows=330000 loops=3)
Heap Fetches: 0
Planning Time: 0.080 ms
Execution Time: 50.773 ms
(9 rows)
In the above example, after some data is updated and deleted from the t_large
table, the execution plan used by the original query does not change, and the query execution time is not affected.