January 6, 2025
Summary: in this tutorial, you will learn what the “Rows Removed by Filter” line in EXPLAIN output means.
Example
We’re using a users table with 20,000+ rows that has a name_code column populated with random strings.
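A minimal sketch of a compatible table definition (the column types and lengths are our assumptions; the original schema isn’t shown):

```sql
-- Hypothetical definition matching the sample rows shown below
CREATE TABLE users (
    id         serial PRIMARY KEY,   -- integer sequence starting from 1
    first_name text,
    last_name  text,
    name_code  varchar(10)           -- mostly-unique random code, no unique constraint
);
```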
-- Sample rows, note name_code "BA492"
SELECT
id,
first_name,
last_name,
name_code
FROM users
ORDER BY id DESC
LIMIT 3;
id | first_name | last_name | name_code
-------+------------+-----------+-----------
20210 | Brooks | Aufderhar | BA492
20209 | Nereida | Goodwin | NG8292
20208 | Dillon | Rodriguez | DR9151
-- total rows, and all rows with name_code "BA492"
SELECT
COUNT(*) AS total_count,
COUNT(*) FILTER (WHERE "users"."name_code" = 'BA492') AS with_code
FROM users;
total_count | with_code
-------------+-----------
20210 | 1
-- Why is Rows Removed by Filter "20208" and not "20209"?
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) SELECT * FROM "users" WHERE
"users"."name_code" = 'BA492' LIMIT 1;
QUERY PLAN
------------------------------------------------------------------
 Limit (actual time=13.334..13.338 rows=1 loops=1)
   Buffers: shared hit=684
   ->  Seq Scan on users (actual time=13.328..13.329 rows=1 loops=1)
         Filter: ((name_code)::text = 'BA492'::text)
         Rows Removed by Filter: 20208
         Buffers: shared hit=684
 Planning Time: 0.322 ms
 Execution Time: 13.645 ms
Digging Into “Rows Removed by Filter”
Let’s discuss the table, the query details, and the other relevant circumstances.
- We queried a users table with 20210 total rows. The table uses an integer sequence starting from 1.
- New rows are not being inserted; we’re working with a static set of rows. VACUUM has not run for this table.
- We used the same WHERE clause in each query, filtering on the name_code string column, which holds a mostly-unique (but not guaranteed or enforced with a constraint) “code” value for each row.
- We set a LIMIT of 1 on the queries, but no ORDER BY and no OFFSET. In practice, a LIMIT would often be accompanied by an ORDER BY, but that wasn’t the case here.
- A Sequential Scan was used to access the row data because there was no index. In a production system, an index on the name_code column would be a good idea, but in this tutorial we’re aiming to understand the behavior without that index.
- All data was accessed from shared buffers, confirmed by adding BUFFERS to EXPLAIN (ANALYZE).
What was happening?
We might have expected “Rows Removed by Filter” to be one less than the total row count, but there are several reasons why that wasn’t the case.
With the LIMIT 1 in the query, as soon as PostgreSQL found any match, there was an “early return.” There was no need to access additional rows or pages.
When supplying name_code values from earlier-inserted rows, in earlier pages, we’d see smaller values for “Rows Removed by Filter.” For name_code values “late” in the insertion order, many more rows were removed before finding a match.
We could confirm this in the EXPLAIN (ANALYZE, BUFFERS) output by observing greater numbers of buffers accessed for those later matches.
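To illustrate, a sketch of the comparison (BA492 came from the most recently inserted row; the early-row code is a placeholder, and exact buffer counts will vary):

```sql
-- A "late" match: most of the table is scanned and filtered first
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT * FROM users WHERE name_code = 'BA492' LIMIT 1;

-- An "early" match: expect a small "Rows Removed by Filter"
-- and far fewer shared buffer hits
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT * FROM users WHERE name_code = '<code-from-an-early-row>' LIMIT 1;
```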
Performance details
As stated, when prepending EXPLAIN (ANALYZE, BUFFERS) to the query, we saw fewer buffers accessed for these “early matches.” With fewer buffers accessed, there was less latency and lower execution times. However, performance wasn’t the goal of this exercise. If it were, it would make more sense to add an index covering the name_code column.
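If performance were the goal, a minimal sketch of that index (the name idx_users_name_code is our own choice):

```sql
-- A B-tree index lets PostgreSQL locate matching name_code values directly,
-- instead of scanning and filtering thousands of rows
CREATE INDEX idx_users_name_code ON users (name_code);

-- Re-running the query should now show an Index Scan,
-- with no "Rows Removed by Filter" line
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT * FROM users WHERE name_code = 'BA492' LIMIT 1;
```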
Additional information about “Rows removed by filter”
“Rows Removed by Filter” is actually a per-loop average, rounded to the nearest integer.
What does this mean? When a plan node runs more than one loop (e.g. loops=2 or greater), “Rows Removed by Filter” shows the average number of rows removed per loop.
For example, if one loop removed 10 rows, and another removed 30 rows, we’d expect to see a value of 20 as the average of the two.
When there’s one loop (loops=1), the figure is the actual number of rows removed.
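One way to see loops greater than one is to force the scan to run repeatedly. A sketch using LATERAL (assuming the same users table; both loops happen to remove the same number of rows here, so the average equals the per-loop figure):

```sql
-- The subquery runs once per row of t, so the Seq Scan on users
-- reports loops=2, and "Rows Removed by Filter" is the per-loop average
EXPLAIN (ANALYZE, COSTS OFF)
SELECT s.*
FROM (VALUES (1), (2)) AS t(n),
LATERAL (SELECT * FROM users WHERE name_code = 'BA492' LIMIT 1) AS s;
```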
Experiments with OFFSET
When using OFFSET without an ORDER BY, the offset value correlated directly with “Rows Removed by Filter.”
To start, when we grabbed the name_code from the first row and queried with LIMIT 1, we didn’t see “Rows Removed by Filter” at all. In the default planner output format, when zero rows are filtered, the line is not displayed.
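That first experiment could be sketched like this (no ORDER BY, so “first row” relies on this static table’s insertion-order scan; the placeholder stands in for the returned value):

```sql
-- Grab the name_code of the first row returned by a bare scan
SELECT name_code FROM users LIMIT 1;

-- Querying for that value, the match is the first row evaluated:
-- zero rows are removed, so no "Rows Removed by Filter" line appears
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT * FROM users WHERE name_code = '<value-from-above>' LIMIT 1;
```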
When we chose the second row (based on the default ordering), we saw a single shared hit and one row removed.
Next, we tried going to the “middle” of the 20 thousand rows, using an OFFSET of 10000 and supplying the name_code value of the first row at that offset, again without specifying an ordering for the query.
With that name_code, we saw “Rows Removed by Filter: 10000”, which exactly matched the offset.
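The “middle” experiment could be sketched the same way (the placeholder stands in for whatever value the first query returns):

```sql
-- name_code of the row at offset 10000, in default scan order
SELECT name_code FROM users OFFSET 10000 LIMIT 1;

-- Expect "Rows Removed by Filter: 10000", matching the offset
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT * FROM users WHERE name_code = '<value-from-above>' LIMIT 1;
```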
Other Tidbits
- The PostgreSQL EXPLAIN documentation describes how “Rows Removed by Filter” appears only when ANALYZE is added to EXPLAIN.
- “Rows Removed by Filter” applies to filter conditions like a WHERE clause, but also to conditions on a JOIN node, where it appears as “Rows Removed by Join Filter.”
- The line appears only when at least one scanned row, or “potential join pair” for join nodes, was discarded by the filter condition.
- The ORDER BY clause is commonly used with LIMIT, which can produce more predictable results.
Takeaways
- Without an explicit ordering (ORDER BY), and when using a LIMIT, the “Rows Removed by Filter” figures may be surprising.
- When LIMIT 1 is used, PostgreSQL returns as soon as it finds the first match. Without an ORDER BY, the default ordering is likely the insertion order of the rows.
- When analyzing “Rows Removed by Filter” figures, check whether the plan node ran more than one loop. If so, the figure is an average over all loops, rounded to the nearest integer.
- For performance work, a high proportion of rows filtered out indicates an optimization opportunity. Adding an index can avoid filtering so many rows, reducing storage access and speeding up your query.