February 12, 2025
Summary: In this tutorial, you will learn how to tune index scans in PostgreSQL.
Table of Contents
Introduction
An common issue some folks often missing when reviewing query plans, is that all of their scans involve indexes, so they think that the query is likely already as fast as it can be.
Many know that a Seq Scan with a high number of Rows Removed by Filter is a sign an index could help. But the reason an index speeds things up there applies just as much to other scans with large filters, where a better suited index could speed things up! While often we will see index filters show as “Rows Removed by Filter”, sadly there are cases that aren’t currently reported by EXPLAIN.
In this tutorial we’ll look through several examples, some things to look out for, and how to then go about making things faster.
Note that, there are other cases that index use can still be optimised further. One such example is that, when there are a high number of Rows Removed by Index Recheck.
Initial setup
Let’s create a simple example table to demonstrate:
CREATE TABLE example (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
a smallint NOT NULL,
b smallint NOT NULL);
INSERT INTO example (a, b)
SELECT random (1, 1_000), random (1, 1_000)
FROM generate_series(1, 1_000_000);
VACUUM ANALYZE example;
SELECT * FROM example LIMIT 5;
id | a | b
----+-----+-----
1 | 803 | 627
2 | 978 | 702
3 | 15 | 506
4 | 966 | 335
5 | 247 | 172
So we have a million rows in total, with an integer primary key, and two extra columns a and b, which contain random integers between 1 and 1000.
Look out for Rows Removed by Filter
Let’s take a look at several indexing options, for optimising a simple query that filters on both a and b:
CREATE INDEX example_a_idx ON example (a);
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE a = 42 AND b = 42;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.example (cost=11.86..2542.34 rows=1 width=8) (actual time=4.024..4.489 rows=1 loops=1)
Output: id
Recheck Cond: (example.a = 42)
Filter: (example.b = 42)
Rows Removed by Filter: 1016
Heap Blocks: exact=925
Buffers: shared hit=928
-> Bitmap Index Scan on example_a_idx (cost=0.00..11.86 rows=991 width=0) (actual time=0.407..0.407 rows=1017 loops=1)
Index Cond: (example.a = 42)
Buffers: shared hit=3
Planning Time: 0.143 ms
Execution Time: 4.538 ms
With an index on column a, Postgres chose to do a Bitmap Index Scan to find the rows where a=42, and then a Bitmap Heap Scan of the pages that contain those rows, filtering to only those where b also equals 42.
While this only took about 5ms in total. We can see that 1016 rows were removed by a filter, only to end up returning 1 row — not very efficient!
Another clue (thanks to the Buffers line) is that Postgres had to read 928 blocks in total (about 7MB), all but 3 of which were from the Bitmap Heap Scan stage.
Choose simple index scan
Would a simple index scan be any better than this Bitmap scan?
Let’s try discouraging bitmap scans, and running the query again:
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE a = 42 AND b = 42;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Scan using example_a_idx on public.example (cost=0.42..3655.99 rows=1 width=8) (actual time=1.438..1.682 rows=1 loops=1)
Output: id
Index Cond: (example.a = 42)
Filter: (example.b = 42)
Rows Removed by Filter: 1016
Buffers: shared hit=928
Settings: enable_bitmapscan = 'off'
Planning Time: 0.144 ms
Execution Time: 1.706 ms
This is faster, about 2ms, but we can still see those 1016 rows being removed by a filter. The Buffers line also shows we’re still reading 928 blocks. So these are both a pretty good sign that we can do better…
Multiple indexes
What if we also had an index on b?
RESET enable_bitmapscan;
CREATE INDEX example_b_idx ON example (b);
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE a = 42 AND b = 42;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.example (cost=24.00..28.01 rows=1 width=8) (actual time=0.674..0.676 rows=1 loops=1)
Output: id
Recheck Cond: ((example.a = 42) AND (example.b = 42))
Heap Blocks: exact=1
Buffers: shared hit=8
-> BitmapAnd (cost=24.00..24.00 rows=1 width=0) (actual time=0.652..0.653 rows=0 loops=1)
Buffers: shared hit=7
-> Bitmap Index Scan on example_a_idx (cost=0.00..11.86 rows=991 width=0) (actual time=0.259..0.259 rows=1017 loops=1)
Index Cond: (example.a = 42)
Buffers: shared hit=3
-> Bitmap Index Scan on example_b_idx (cost=0.00..11.89 rows=995 width=0) (actual time=0.208..0.208 rows=991 loops=1)
Index Cond: (example.b = 42)
Buffers: shared hit=4
Planning Time: 0.143 ms
Execution Time: 0.706 ms
We’re now down to under 1ms, but crucially far fewer buffers (8 blocks, which is 64KB)! This time, Postgres scanned each index separately, before AND-ing the matching rows from each result, to find only the rows that matched both conditions.
Multicolumn index
While there is no longer an explicit “Rows Removed by Filter” field, those Bitmap Index Scans are still reading a few index entries that aren’t being returned. Let’s see if we can do even better, by trying a compound index on (a, b) that will sort entries we need right next to each other in a single index structure:
CREATE INDEX example_a_b_idx ON example (a, b);
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE a = 42 AND b = 42;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using example_a_b_idx on public.example (cost=0.42..8.45 rows=1 width=8) (actual time=0.032..0.033 rows=1 loops=1)
Output: id
Index Cond: ((example.a = 42) AND (example.b = 42))
Buffers: shared hit=4
Planning Time: 0.160 ms
Execution Time: 0.058 ms
Choose index only scan
We’re down to 4 buffer hits in total, and quite a lot faster! One last attempt, let’s try a compound index on (a, b, id), to try to get an index only scan:
CREATE INDEX example_a_b_id_idx ON example (a, b, id);
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE a = 42 AND b = 42;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using example_a_b_id_idx on public.example (cost=0.42..4.44 rows=1 width=8) (actual time=0.029..0.031 rows=1 loops=1)
Output: id
Index Cond: ((example.a = 42) AND (example.b = 42))
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.115 ms
Execution Time: 0.046 ms
Despite avoiding a trip to the heap, we’re still reading 4 blocks in total (perhaps since the index is slightly larger), and as a result the query is not much faster.
From the first query plan (that was already pretty fast and using an index) we managed to still achieve several orders of magnitude improvement in both the execution time and the number of blocks read, mostly by trying to get to exactly the data we needed without needing to read in and filter out rows.
So, to recap, regardless of the scan types, if Postgres is having to read in more data than it needs, only to have to filter it out, there might be a more efficient way. Look out for Rows Removed by Filter.
Look out for loops
Importantly, the Rows Removed by Filter field is a per-loop average, rounded to the nearest integer. In the examples above, we always had “loops=1” so this was not relevant, but in other queries, especially those involving Nested Loops, it can be a huge factor.
Imagine you see “actual […] rows=1 loops=10000”, and “Rows Removed by Filter: 5”. That would mean that about 10k rows are being returned, but an additional 50k are being read and filtered in order to do so. 5 to 1 is already not a great efficiency ratio, but when multiplied by 10k, the time saving potential becomes very interesting!
As such, if you’re only seeing a low number of Rows Removed by Filter, but the number of Loops is high, then there’s possibly still a big improvement opportunity (again via better indexing).
Tuning index column order
As PostgreSQL does not support the Index Skip Scan, the column order in btree indexes can be extremely important for performance.
Column order when filtering on a single column
A common example is an index on (a, b) cannot be efficiently used for queries that only filter on b. To demonstrate, lets drop the other index we created only on b, and discourage Seq Scans:
DROP INDEX example_b_idx;
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE b = 42;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.example (cost=17132.67..19669.35 rows=995 width=8) (actual time=38.616..39.629 rows=991 loops=1)
Output: id
Recheck Cond: (example.b = 42)
Heap Blocks: exact=909
Buffers: shared hit=3308
-> Bitmap Index Scan on example_a_b_idx (cost=0.00..17132.42 rows=995 width=0) (actual time=38.443..38.443 rows=991 loops=1)
Index Cond: (example.b = 42)
Buffers: shared hit=2399
Settings: enable_seqscan = 'off'
Planning Time: 0.304 ms
Execution Time: 39.736 ms
That did use our example_a_b_idx index, but took nearly 40ms, and read 3308 blocks (about 26MB) in total.
Let’s try adding an index with the order (b, a) to see how much more efficiently that can be used for the same query:
RESET enable_seqscan;
CREATE INDEX example_b_a_idx ON example (b, a);
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE b = 42;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.example (cost=20.14..2556.81 rows=995 width=8) (actual time=0.649..3.212 rows=991 loops=1)
Output: id
Recheck Cond: (example.b = 42)
Heap Blocks: exact=909
Buffers: shared hit=915
-> Bitmap Index Scan on example_b_a_idx (cost=0.00..19.89 rows=995 width=0) (actual time=0.393..0.394 rows=991 loops=1)
Index Cond: (example.b = 42)
Buffers: shared hit=6
Planning Time: 0.232 ms
Execution Time: 3.406 ms
That took less that 4ms, and read 915 blocks (about 7MB) in total — much better!
Sadly Postgres isn’t reporting any “Rows Removed by Filter” in the less efficient plan above, despite that being effectively what’s happening during the Bitmap Index Scan. We also cannot currently tell from the Index Cond whether it’s being used efficiently or not, but the execution time, costs, and buffers all give great clues.
Column order when filtering on multiple columns
Column order can also be very important for queries that filter on all of the conditions in an index.
A neat example is if one of the filters is a range condition. To demonstrate, lets try a query that should be well served by our new index on (b, a):
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE b = 42 AND a < 100;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.example (cost=5.44..362.73 rows=99 width=8) (actual time=0.128..0.502 rows=116 loops=1)
Output: id
Recheck Cond: ((example.b = 42) AND (example.a < 100))
Heap Blocks: exact=115
Buffers: shared hit=118
-> Bitmap Index Scan on example_b_a_idx (cost=0.00..5.42 rows=99 width=0) (actual time=0.073..0.074 rows=116 loops=1)
Index Cond: ((example.b = 42) AND (example.a < 100))
Buffers: shared hit=3
Settings: enable_seqscan = 'off'
Planning Time: 0.304 ms
Execution Time: 0.577 ms
Even though this returned 116 rows, it’s running in under 1ms and reading 118 blocks (about 1MB).
Lets drop the index on (b, a) and discourage Seq Scans to see how efficiently the index on (a, b) can serve the same query:
DROP INDEX example_b_a_idx;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE b = 42 AND a < 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.example (cost=1955.74..2313.03 rows=99 width=8) (actual time=4.721..5.123 rows=116 loops=1)
Output: id
Recheck Cond: ((example.a < 100) AND (example.b = 42))
Heap Blocks: exact=115
Buffers: shared hit=355
-> Bitmap Index Scan on example_a_b_idx (cost=0.00..1955.71 rows=99 width=0) (actual time=4.623..4.624 rows=116 loops=1)
Index Cond: ((example.a < 100) AND (example.b = 42))
Buffers: shared hit=240
Planning Time: 0.305 ms
Execution Time: 5.200 ms
This took a little over 5ms, and read 355 blocks (about 3MB) to serve the same query.
Depending on the data distribution, and other factors like LIMIT clauses, this difference can get far more extreme, so it is important to pay close attention to the order of columns in your indexes!
Summary
While we haven’t discussed every way index scans can be inefficient for a given query, we have covered the most common cases.
Now, you at least have a few more ideas on things to look out for in query plans to help diagnose these types of inefficiency:
- High number of Rows Removed by Filter, even on index scan types
- High number of Loops, multiplied by the Rows Removed by Filter
- High numbers of Buffers, especially compared to the rows being returned
- The order of the columns in the index, and whether that fits the query well
The most important thing is that, if you just see the words “Index Scan”, it does not mean that scan is as fast as it can be!