Trace PostgreSQL Query Processing Internals

By John Doe May 14, 2024

Summary: In this article, we will trace the internals of PostgreSQL query processing.

Table of Contents

Aggregate expression with an ORDER BY clause

Consider a simple table that holds 3 rows of an int column ranging from 1 to 3:

CREATE TABLE foo AS
  SELECT * FROM generate_series(1, 3) i;

SELECT * FROM foo;
 i
---
 1
 2
 3
(3 rows)

Now let’s assume we want to convert those rows into a json array. This can be accomplished using the json_agg aggregate function:

SELECT json_agg(i) FROM foo;
 json_agg
-----------
 [1, 2, 3]
(1 row)

But what if we want to order the array elements in descending order? No problem, we can simply use the order-by-clause support available to all aggregate expressions:

SELECT json_agg(i ORDER BY i DESC) FROM foo;
 json_agg
-----------
 [3, 2, 1]
(1 row)

If you’re like me, you’d probably imagine the EXPLAIN output for this query to contain three nodes: a Seq Scan, a Sort, and an Aggregate. However, when we run EXPLAIN, it turns out there is no Sort node.

EXPLAIN SELECT json_agg(i ORDER BY i DESC) FROM foo;
                         QUERY PLAN
-------------------------------------------------------------
 Aggregate  (cost=41.88..41.89 rows=1 width=32)
   ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)
(2 rows)

In fact, the plan even remains unchanged even if we change the sort order or remove the clause entirely:

EXPLAIN SELECT json_agg(i ORDER BY i ASC) FROM foo;
                         QUERY PLAN
-------------------------------------------------------------
 Aggregate  (cost=41.88..41.89 rows=1 width=32)
   ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)
(2 rows)

EXPLAIN SELECT json_agg(i) FROM foo;
                         QUERY PLAN
-------------------------------------------------------------
 Aggregate  (cost=41.88..41.89 rows=1 width=32)
   ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)
(2 rows)

But how is this possible? If there is no Sort node, how does the aggregate perform its sorting?

Trace query processing

Actually, the aggregate plan node performs their own sorting. These sorts do not explicitly show up in EXPLAIN, but we can trace them using the debug_print_plan developer option.

SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET debug_print_plan = on;
SET debug_pretty_print = on;

Why do we enable debug log when we will trace PostgreSQL query processing? This is because the output at some of the stages of query processing is represented as a complex list of structures, and it is not very straightforward to print this structure unless we have written a third party print script that can help us recursively print the content of the complex structure. PostgreSQL already has this function built-in and presented in a form of debug log.

Specifically you need to look for the :aggorder field below:

SELECT json_agg(i ORDER BY i DESC) FROM foo;

LOG:  plan:
DETAIL:     {PLANNEDSTMT
   :commandType 1
   :queryId 0
   :hasReturning false
   :hasModifyingCTE false
   :canSetTag true
   :transientPlan false
   :dependsOnRole false
   :parallelModeNeeded false
   :planTree
      {AGG
      :startup_cost 41.88
      :total_cost 41.89
      :plan_rows 1
      :plan_width 32
      :parallel_aware false
      :plan_node_id 0
      :targetlist (
         {TARGETENTRY
         :expr
            {AGGREF
            :aggfnoid 3175
            :aggtype 114
            :aggcollid 0
            :inputcollid 0
            :aggtranstype 2281
            :aggargtypes (o 23)
            :aggdirectargs <>
            :args (
               {TARGETENTRY
               :expr
                  {VAR
                  :varno 65001
                  :varattno 1
                  :vartype 23
                  :vartypmod -1
                  :varcollid 0
                  :varlevelsup 0
                  :varnoold 1
                  :varoattno 1
                  :location 16
                  }
               :resno 1
               :resname <>
               :ressortgroupref 1
               :resorigtbl 0
               :resorigcol 0
               :resjunk false
               }
            )
            :aggorder (
               {SORTGROUPCLAUSE
               :tleSortGroupRef 1
               :eqop 96
               :sortop 521
               :nulls_first true
               :hashable true
               }
            )
            :aggdistinct <>
            :aggfilter <>
            :aggstar false
            :aggvariadic false
            :aggkind n
            :agglevelsup 0
            :aggsplit 0
            :location 7
            }
         :resno 1
         :resname json_agg
         :ressortgroupref 0
         :resorigtbl 0
         :resorigcol 0
         :resjunk false
         }
      )
      :qual <>
      :lefttree
         {SEQSCAN
         :startup_cost 0.00
         :total_cost 35.50
         :plan_rows 2550
         :plan_width 4
         :parallel_aware false
         :plan_node_id 1
         :targetlist (
            {TARGETENTRY
            :expr
               {VAR
               :varno 1
               :varattno 1
               :vartype 23
               :vartypmod -1
               :varcollid 0
               :varlevelsup 0
               :varnoold 1
               :varoattno 1
               :location -1
               }
            :resno 1
            :resname <>
            :ressortgroupref 0
            :resorigtbl 0
            :resorigcol 0
            :resjunk false
            }
         )
         :qual <>
         :lefttree <>
         :righttree <>
         :initPlan <>
         :extParam (b)
         :allParam (b)
         :scanrelid 1
         }
      :righttree <>
      :initPlan <>
      :extParam (b)
      :allParam (b)
      :aggstrategy 0
      :aggsplit 0
      :numCols 0
      :grpColIdx
      :grpOperators
      :numGroups 1
      :aggParams (b)
      :groupingSets <>
      :chain <>
      }
   :rtable (
      {RTE
      :alias <>
      :eref
         {ALIAS
         :aliasname foo
         :colnames ("i")
         }
      :rtekind 0
      :relid 404407
      :relkind r
      :tablesample <>
      :lateral false
      :inh false
      :inFromCl true
      :requiredPerms 2
      :checkAsUser 0
      :selectedCols (b 9)
      :insertedCols (b)
      :updatedCols (b)
      :securityQuals <>
      }
   )
   :resultRelations <>
   :utilityStmt <>
   :subplans <>
   :rewindPlanIDs (b)
   :rowMarks <>
   :relationOids (o 404407)
   :invalItems <>
   :nParamExec 0
}

 json_agg
-----------
 [3, 2, 1]
(1 row)

In conlusion: While EXPLAIN is a powerful weapon for those seeking to understand query performance, you should be aware of the fact that there are things that you can’t EXPLAIN.

comments powered by Disqus