August 22, 2025
Summary: In this tutorial, you will learn various types of join “nodes” / operations that you might see in explain plans.
Table of Contents
Hash Join / Hash
This operation has two sub operations. One of them is always “Hash", and the other is something else.
Hash Join is used, as name suggests to join two recordsets. For example like here:
explain analyze select * from pg_class c
join pg_namespace n on c.relnamespace = n.oid;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.14..16.07 rows=292 width=316) (actual time=0.036..0.343 rows=295 loops=1)
Hash Cond: (c.relnamespace = n.oid)
-> Seq Scan on pg_class c (cost=0.00..10.92 rows=292 width=203) (actual time=0.007..0.044 rows=295 loops=1)
-> Hash (cost=1.06..1.06 rows=6 width=117) (actual time=0.012..0.012 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on pg_namespace n (cost=0.00..1.06 rows=6 width=117) (actual time=0.004..0.005 rows=6 loops=1)
Total runtime: 0.462 ms
(7 rows)
It works like this – first Hash Join calls “Hash", which in turns calls something else (Seq Scan on pg_namespace in our case). Then, Hash makes a memory (or disk, depending on size) hash/associative-array/dictionary with rows from the source, hashed using whatever is used to join the data (in our case, it’s OID column in pg_namespace).
Of course – you can have many rows for given join key (well, not in this case, as I’m joining using primary key, but generally, it’s perfectly possible to have multiple rows for single hash key.
So, using Perl notation, output of Hash is something like:
{
'123' => [ { data for row with OID = 123 }, ],
'256' => [ { data for row with OID = 256 }, ],
...
}
Then, Hash Join runs the second suboperation (Seq Scan on pg_class in our case), and for each row from it, it does:
- check if join key (pg_class.relnamespace in our case) is in hash returned by Hash operation
- if it is not – given row from suboperation is ignored (will not be returned)
- if it exists – Hash Join fetches rows from hash, and based on row from one side, and all rows from hash, it generates output rows
It is important to note that both sides are run only once ( in our case, these both are seq scans), but first (the one called by Hash) has to return all rows, which have to be stored in hash, and the other is processed one row at a time, and some rows will get skipped if they don’t exist in hash from the other side (hope the sentence is clear, there are many “hash"es there).
Of course, since both subscans can be any type of operation, these can do filter or index scan or whatever you can imagine.
Final note for Hash Join/Hash is that the Hash operation, just like Sort and HashAggregate – will use up to work_mem of memory.
Nested Loop
Since we’re at joins – we have to discuss Nested Loop. Example:
explain analyze select a.* from pg_class c
join pg_attribute a on c.oid = a.attrelid
where c.relname in ( 'pg_class', 'pg_namespace' );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.28..52.32 rows=16 width=203) (actual time=0.057..0.134 rows=46 loops=1)
-> Seq Scan on pg_class c (cost=0.00..11.65 rows=2 width=4) (actual time=0.043..0.080 rows=2 loops=1)
Filter: (relname = ANY ('{pg_class,pg_namespace}'::name[]))
Rows Removed by Filter: 291
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.28..20.25 rows=8 width=203) (actual time=0.007..0.015 rows=23 loops=2)
Index Cond: (attrelid = c.oid)
Total runtime: 0.182 ms
This is very interesting plan as it can run given operations multiple times.
Just as Hash Join, Nested Loop has two “children". Generally, first it runs the first node that is there, in our example, it runs “Seq Scan", and then, for every row it returns (2 rows in our example), it runs 2nd operation (Index Scan on pg_attribute in our case).
You might notices that Index Scan has “loops=2" in it’s actual run metainfo. This means that this operation has been run twice, and the other values (rows, time) are averages across all runs.
Let’s check the following plan. Note that the actual times for the categories index scan are 0.002 to 0.003 ms. But total time on this node is 78.852 ms, because this index scan has been ran over 26k times.
Nested Loop (cost=0.00..10715.90 rows=26284 width=4449) (actual time=0.054..291.131 rows=26284 loops=1)
-> Index Scan using books_index_title on books (cost=0.00..3306.28 rows=26284 width=3357) (actual time=0.033..50.773 rows=26284 loops=1)
-> Index Scan using categories_pkey on categories (cost=0.00..0.27 rows=1 width=1092) (actual time=0.002..0.003 rows=1 loops=26284)
Index Cond: (categories.id = books.category_id)
Total runtime: 312.212 ms
So, the processing looks like this:
- Nested Loop runs one side of join, once. Let’s name it “A".
- For every row in “A", it runs second operation (let’s name it “B")
- if “B" didn’t return any rows – data from “A" is ignored
- if “B" did return rows, for every row it returned, new row is returned by Nested Loop, based on current row from A, and current row from B
Merge Join
Another method of joining data is called Merge Join. This is used, if joined datasets are (or can be cheaply) sorted using join key.
Here, we force it by using subselects that sort data before joining:
explain analyze select * from
(select oid, * from pg_class order by oid) as c
join
(select * from pg_attribute a order by attrelid) as a
on c.oid = a.attrelid;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=23.16..268.01 rows=2273 width=410) (actual time=0.658..3.779 rows=2274 loops=1)
Merge Cond: (pg_class.oid = a.attrelid)
-> Sort (cost=22.88..23.61 rows=292 width=207) (actual time=0.624..0.655 rows=293 loops=1)
Sort Key: pg_class.oid
Sort Method: quicksort Memory: 102kB
-> Seq Scan on pg_class (cost=0.00..10.92 rows=292 width=207) (actual time=0.011..0.211 rows=293 loops=1)
-> Materialize (cost=0.28..212.34 rows=2273 width=203) (actual time=0.028..1.264 rows=2274 loops=1)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.28..183.92 rows=2273 width=203) (actual time=0.015..0.752 rows=2274 loops=1)
Total runtime: 4.009 ms
(9 rows)
Merge Join, as other joins, runs two sub operations (Sort and Materialize in this case). Because both of these return data sorted and the sort order is the same as join operation, PostgreSQL can scan both returnsets from suboperations at the same time, and simply check whether ids match.
The procedure looks like this:
- if join column on right side is the same as join column on left side:
- return new joined row, based on current rows on the right and left sides
- get next row from right side (or, if there are no more rows, on left side)
- go to step 1
- if join column on right side is “smaller" than join column on left side:
- get next row from right side (if there are no more rows, finish processing)
- go to step 1
- if join column on right side is “larger" than join column on left side:
- get next row from left side (if there are no more rows, finish processing)
- go to step 1
This is very cool way of joining datasets, but it works only for sorted sources.
Hash Join / Nested Loop / Merge Join modifiers
In all examples above I showed that Join operation returns row only when it gets rows from both sides of join.
But this is not always the case. We can have LEFT/RIGHT/FULL outer joins. And there are so called anti-joins.
In case of left/right joins, the operation names get changed to:
- Hash Left Join
- Hash Right Join
- Merge Left Join
- Merge Right Join
- Nested Loop Left Join
There is no Nested Loop Right Join, because Nested Loop always starts with left side as basis to looping. So join that uses RIGHT JOIN, that would use Nested Loop, will get internally transformed to LEFT JOIN so that Nested Loop can work.
In all those cases the logic is simple – we have two sides of join – left and right. And when side is mentioned in join, then join will return new row even if the other side doesn’t have matching rows.
This all happens with queries like:
select * from a left join b on ...
or right join.
All other information for Hash Join/Merge Join or Nested Loop are the same, it’s just a slight change in logic on when to generate output row.
There is also a version called Full Join, with operation names:
- Hash Full Join
- Merge Full Join
In which case join generates new output row regardless of whether data on either side is missing (as long as the data is there for one side). This happens in case of:
select * from a full join b ...
Of course all processing is the same as previously.
There are also so called Anti Joins. Their operation names look like:
- Hash Anti Join
- Merge Anti Join
- Nested Loop Anti Join
In these cases Join emits row only if the right side doesn’t find any row. This is useful when you’re doing things like WHERE not exists ()
or left join … where right_table.column is null
.
Like in here:
explain analyze select * from pg_class c
where not exists (select * from pg_attribute a
where a.attrelid = c.oid and a.attnum = 10);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=62.27..78.66 rows=250 width=203) (actual time=0.145..0.448 rows=251 loops=1)
Hash Cond: (c.oid = a.attrelid)
-> Seq Scan on pg_class c (cost=0.00..10.92 rows=292 width=207) (actual time=0.009..0.195 rows=293 loops=1)
-> Hash (cost=61.75..61.75 rows=42 width=4) (actual time=0.123..0.123 rows=42 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Index Only Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.28..61.75 rows=42 width=4) (actual time=0.021..0.109 rows=42 loops=1)
Index Cond: (attnum = 10)
Heap Fetches: 0
Total runtime: 0.521 ms
(9 rows)
In here, PostgreSQL ran the right side (Index Scan on pg_attribute), hashed it, and then ran left side (Seq Scan on pg_class), returning only rows where there was no item in Hash for given pg_class.oid.