November 21, 2024
Summary: in this tutorial, you will understand cost estimation for in-memory sorting.
Table of Contents
Sorting
The merge join algorithm takes two data sets that are sorted by the merge key and returns an ordered output. The input data sets may be pre-sorted by an index scan or be sorted explicitly.
If either of the sets (or both) isn’t ordered by the join key, it must be sorted before merging. This necessary sorting is done under the Sort node in the plan.
Let’s start with a little test setup:
CREATE TABLE test1 (id integer, val varchar);
CREATE TABLE test2 (id integer, val varchar);
INSERT INTO test1 (id, val)
SELECT i, 'PG00' || i % 1000
FROM generate_series(1, 1000) AS s(i);
INSERT INTO test2 (id, val)
SELECT i, 'PG00' || i % 1000
FROM generate_series(1, 1000000) AS s(i);
Below is an example of a merge join. Note the Sort node in the plan.
EXPLAIN (costs off)
SELECT * FROM test1 a
JOIN test2 b ON a.val = b.val
ORDER BY b.val;
QUERY PLAN
-----------------------------------------------
Merge Join
Merge Cond: ((a.val)::text = (b.val)::text)
-> Sort
Sort Key: a.val
-> Seq Scan on test1 a
-> Materialize
-> Sort
Sort Key: b.val
-> Seq Scan on test2 b
(9 rows)
The sorting algorithm itself is universal. It can be invoked by an ORDER BY
clause by itself or within a window function:
EXPLAIN (costs off)
SELECT val,
row_number() OVER (PARTITION BY id ORDER BY val)
FROM test1 a;
QUERY PLAN
---------------------------------
WindowAgg
-> Sort
Sort Key: id, val
-> Seq Scan on test1 a
(4 rows)
Here, the WindowAgg node calculates the window function over a data set pre-sorted by the Sort node.
The planner has an arsenal of sorting modes to choose from. The following example showcases two of those (Sort Method). As always, you can dig in for more details with EXPLAIN ANALYZE
:
EXPLAIN (analyze, costs off, timing off, summary off)
SELECT * FROM test1 a
JOIN test2 b ON a.val = b.val
ORDER BY b.val;
QUERY PLAN
---------------------------------------------------------------------
Merge Join (actual rows=1000000 loops=1)
Merge Cond: ((a.val)::text = (b.val)::text)
-> Sort (actual rows=1000 loops=1)
Sort Key: a.val
Sort Method: quicksort Memory: 71kB
-> Seq Scan on test1 a (actual rows=1000 loops=1)
-> Materialize (actual rows=1000000 loops=1)
-> Sort (actual rows=1000000 loops=1)
Sort Key: b.val
Sort Method: external merge Disk: 19456kB
-> Seq Scan on test2 b (actual rows=1000000 loops=1)
(11 rows)
Quicksort
If the data to be sorted fits into the allowed memory (work_mem) completely, then the good old Quicksort algorithm is used. You can find its description on the first pages of any computer science textbook, so I will not repeat it here.
For the implementation, the sorting is done by a specific component that selects the most appropriate algorithm for the task.
Cost estimation. Let’s look at how the algorithm sorts a small table:
EXPLAIN SELECT * FROM test1 ORDER BY val;
QUERY PLAN
----------------------------------------------------------------
Sort (cost=63.83..66.33 rows=1000 width=11)
Sort Key: val
-> Seq Scan on test1 (cost=0.00..14.00 rows=1000 width=11)
(3 rows)
The complexity of sorting n values is O(n log2n). One comparison operation costs twice the value of cpu_operator_cost. You need to scan and sort the whole set to get a result, so the startup sorting cost will include both the comparison operations cost and the total cost of the child node.
The total sorting cost will include, in addition to that, the processing cost for each row of the Sort node output, estimated as cpu_operator_cost per row (instead of the default cpu_tuple_cost because it is cheap).
So, the sorting cost for the example above would be calculated like this:
WITH costs(startup) AS (
SELECT 14.00 + round((
current_setting('cpu_operator_cost')::real * 2
* 1000 * log(2, 1000)
)::numeric, 2)
)
SELECT startup,
startup + round((
current_setting('cpu_operator_cost')::real * 1000
)::numeric, 2) AS total
FROM costs;
startup | total
---------+-------
63.83 | 66.33
(1 row)
Top-N heapsort
A top-N heapsort is something you do when you want just a part of a set sorted (up to a certain LIMIT
) instead of the whole set. More precisely, this algorithm is used when the LIMIT
will filter out at least half of the input rows, or when the input does not fit into the memory, but the output does.
EXPLAIN (analyze, timing off, summary off)
SELECT * FROM test2
ORDER BY val
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Limit (cost=51692.28..51692.53 rows=100 width=11) (actual rows=100 loops=1)
-> Sort (cost=51692.28..54192.28 rows=1000000 width=11) (actual rows=100 loops=1)
Sort Key: val
Sort Method: top-N heapsort Memory: 33kB
-> Seq Scan on test2 (cost=0.00..13473.00 rows=1000000 width=11) (actual rows=1000000 loops=1)
(5 rows)
In order to find the k highest (lowest) values out of n, the algorithm first creates a data structure called a heap and adds k first rows into it. Then it continues adding rows into the heap one by one, but after each one, it expels one lowest (highest) value from the heap. When it runs out of new rows to add, the heap is left with the k highest or lowest values.
The term heap here refers to the heap data structure and should not be confused with database tables, which often go by the same name.
Cost estimation. The algorithm complexity is O(n log2k), but each operation costs more compared to Quicksort, so the cost calculation formula uses n log22k as an estimate.
WITH costs(startup) AS (
SELECT 13473.00 + round((
current_setting('cpu_operator_cost')::real * 2
* 1000000 * log(2, 2 * 100)
)::numeric, 2)
)
SELECT startup,
startup + round((
current_setting('cpu_operator_cost')::real * 100
)::numeric, 2) AS total
FROM costs;
startup | total
----------+----------
51692.28 | 51692.53
(1 row)