Tuning a JSON query in PostgreSQL

By John Doe September 19, 2025

Summary: in this article, you will learn how to tune a JSON query in PostgreSQL.

Table of Contents

Over the years, ORMs have industrialized query generation, allowing developers to focus on optimizing the few queries that truly need it. However, they can sometimes produce overly complex SQL with excessive joins and business logic, making tuning a nightmare. It is sometimes hard to know when you should split a query in multiple smaller parts. The complexity is amplified with the use of JSON.

Example of an Inefficient ORM-Generated Query

Below is an anonymized version of a complex SQL query generated by an ORM which I had to tune since the execution time (+90min) wasn’t in line with the business need. Note that the original query was over 1500 lines :

WITH result_list AS (
    SELECT o.id,
           o.data AS order_data,
           ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
           ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
           ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
           ROW_NUMBER() OVER () AS rowid
    FROM orders o
    LEFT JOIN customer_refs cr ON cr.data->>'id' = o.data->'customer'->>'id'
    LEFT JOIN customer_policies cp ON cp.policy_data->>'id' = o.data->'customer'->>'id'
        AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND (
            (
                cp.policy_data->>'status' IS NOT NULL
                AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
            )
            AND (
                (
                    cp.policy_data->>'customer_type' IS NOT NULL
                    AND cp.policy_data->>'customer_type' = 'issuer'
                )
                OR (
                    cp.policy_data->>'customer_type' IS NOT NULL
                    AND cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
                )
                OR (
                    get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') IS NOT NULL
                    AND get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
                )
            )
        )
    WHERE EXISTS (
        SELECT 1
        FROM account_policies ap
        WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
            o.data->'account'->'id'->>'system_ref',
            o.data->'account'->'id'->>'type_ref',
            o.data->'account'->'id'->>'region',
            o.data->'account'->'id'->>'id'], '-')
        AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND ap.policy_data->>'contract' != 'prospect'
        AND (
            ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
            OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    )
    AND (
        (
            o.data->>'order_type' = 'MONEY_ORDER'
            AND (
                o.data->'close_date'->>'value' > '2024-09-10'
                OR o.data->'close_date'->>'value' IS NULL
                OR o.data->'close_date'->>'value' = ''
            )
        )
        OR (
            o.data->>'order_type' != 'MONEY_ORDER'
            AND COALESCE(
                NULLIF(o.data->'valuation'->'quantity'->>'value', '')::DECIMAL,
                0
            ) != 0
        )
    )
    AND (
        LOWER(o.data->>'display_name') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->'item_name'->>'abbreviation') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->'item_name'->>'full') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->>'identifier') LIKE '%current%'
        OR LOWER(o.data->'product'->'issuer'->>'display_name') LIKE '%current%'
        OR get_text_for_search_from_path(o.data->'product'->'underlying', '$[*].item.item_name.abbreviation', 'LOWER') LIKE '%current%'
    )
    GROUP BY o.id, o.data
    ORDER BY o.id
),
ordered_list AS (
    SELECT *
    FROM result_list
    ORDER BY rowid
    LIMIT 23
)
SELECT *
FROM ordered_list
ORDER BY rowid;

Key Characteristics of the Query:

Multiple Joins: The query includes several LEFT JOIN operations with complex ON conditions involving JSON fields.
Nested JSON Access: Frequent use of JSON operators (->, ->>) to access deeply nested fields.
Custom Functions: Usage of functions like get_text_from_path and get_text_for_search_from_path, which may not be optimized for performance.
Complex WHERE Conditions: The WHERE clause contains multiple nested conditions and subqueries, often involving JSON fields.

Why the Query is Inefficient

Accessing deeply nested JSON fields using operators in large datasets can be slow, especially if not properly indexed this will cause full table scans and increase drastically execution time.
The query performs numerous LOWER() and LIKE operations on JSON fields, which can be computationally expensive with the addition of preventing index usage effectively.
The indexes on JSON fields may not be utilized due to the way conditions are written.
The use of LEFT JOIN with conditions that include functions and JSON field comparisons leads to inefficient execution plans.
The EXISTS subquery further complicates the execution, potentially causing full table scans.
ORMs often generate generic SQL that doesn’t consider database-specific optimizations.
The abstraction can lead to redundant or unnecessary conditions and joins.

Bad Execution Plan

Limit  (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5504844.738..5504844.745 rows=23 loops=1)
  Buffers: shared hit=3622772239 read=7862262 dirtied=3639
  ->  Sort  (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5502255.883..5502255.889 rows=23 loops=1)
        Sort Key: (row_number() OVER (?))
        Sort Method: quicksort  Memory: 114kB
        Buffers: shared hit=3622772239 read=7862262 dirtied=3639
        ->  WindowAgg  (cost=292646835.93..292646835.97 rows=1 width=1813) (actual time=5502254.506..5502255.810 rows=43 loops=1)
              Buffers: shared hit=3622772236 read=7862262 dirtied=3639
              ->  GroupAggregate  (cost=292646835.93..292646835.96 rows=1 width=1805) (actual time=5502253.088..5502253.249 rows=43 loops=1)
                    Group Key: o.id
                    Buffers: shared hit=3622772236 read=7862262 dirtied=3639
                    ->  Sort  (cost=292646835.93..292646835.93 rows=1 width=1930) (actual time=5502244.952..5502244.964 rows=43 loops=1)
                          Sort Key: o.id, cr.id
                          Sort Method: quicksort  Memory: 71kB
                          Buffers: shared hit=3622772236 read=7862262 dirtied=3639
                          ->  Nested Loop  (cost=0.98..292646835.92 rows=1 width=1930) (actual time=2015030.222..5502243.020 rows=43 loops=1)
                                Join Filter: (array_to_string(ARRAY[...]) = ap.policy_id)
                                Rows Removed by Join Filter: 246198062
                                Buffers: shared hit=3622772236 read=7862262 dirtied=3639
---
 
"Planning:"
"  Buffers: shared hit=649 read=2"
"Planning Time: 133.626 ms"
"JIT:"
"  Functions: 32"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 226.277 ms, Inlining 631.748 ms, Optimization 801.960 ms, Emission 1156.811 ms, Total 2816.797 ms"
"Execution Time: 5505139.477 ms"

Observations from the Execution Plan:

Execution Time: ~90 minutes Buffer Usage:

Shared hits: ~3.6 billion
Reads: ~7.8 million
Dirtied: ~3,639

Key Issues:

Functions in Joins and Conditions: Prevented effective index usage
Nested Loops: Inefficient row-by-row processing.
Sort Operations: High-cost sorting due to lack of index usage.
Full Table Scans: Extensive rows removed by join filters.

Optimizing the Query

Let’s modify some things, to help the optimizer use the indexes :

Indexes and created some generated columns to avoid applying functions in the where clause on some fields.
I changed the LEFT JOIN to a INNER JOIN when possible to reduce the data set early in the query execution.
I rewrote some conditions to avoid the use of functions on the JSON fields.

ALTER TABLE orders ADD COLUMN customer_id TEXT GENERATED ALWAYS AS (data->'customer'->>'id') STORED;
ALTER TABLE orders ADD COLUMN order_type TEXT GENERATED ALWAYS AS (data->>'order_type') STORED;
ALTER TABLE orders ADD COLUMN display_name_lower TEXT GENERATED ALWAYS AS (LOWER(data->>'display_name')) STORED;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_order_type ON orders (order_type);
CREATE INDEX idx_orders_display_name_lower ON orders (display_name_lower);
--- ... etc

WITH result_list AS (
    SELECT o.id,
           o.data AS order_data,
           ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
           ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
           ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
           ROW_NUMBER() OVER () AS rowid
    FROM orders o
    INNER JOIN customer_refs cr ON cr.data->>'id' = o.customer_id
    INNER JOIN customer_policies cp ON cp.policy_data->>'id' = o.customer_id
        AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND (
            cp.policy_data->>'customer_type' = 'issuer'
            OR cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
            OR get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    WHERE EXISTS (
        SELECT 1
        FROM account_policies ap
        WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
            o.data->'account'->'id'->>'system_ref',
            o.data->'account'->'id'->>'type_ref',
            o.data->'account'->'id'->>'region',
            o.data->'account'->'id'->>'id'], '-')
        AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND ap.policy_data->>'contract' != 'prospect'
        AND (
            ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
            OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    )
    AND o.order_type = 'MONEY_ORDER'
    AND (o.data->'close_date'->>'value' > '2024-09-10' OR o.data->'close_date'->>'value' IS NULL OR o.data->'close_date'->>'value' = '')
    AND o.display_name_lower LIKE '%current%'
    GROUP BY o.id, o.data
    ORDER BY o.id
)
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23;

Updated Execution Plan Analysis

Limit  (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=175757.446..175757.449 rows=0 loops=1)
  Buffers: shared hit=20716139 read=836484 dirtied=1676
  ->  Sort  (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=174722.049..174722.052 rows=0 loops=1)
        Sort Key: (row_number() OVER (?))
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=20716139 read=836484 dirtied=1676
        ->  WindowAgg  (cost=9013972.40..9013972.45 rows=1 width=1789) (actual time=174722.010..174722.013 rows=0 loops=1)
              Buffers: shared hit=20716136 read=836484 dirtied=1676
              ->  GroupAggregate  (cost=9013972.40..9013972.44 rows=1 width=1781) (actual time=174722.008..174722.011 rows=0 loops=1)
                    Group Key: positions.id
                    Buffers: shared hit=20716136 read=836484 dirtied=1676
                    ->  Sort  (cost=9013972.40..9013972.41 rows=1 width=1906) (actual time=174722.007..174722.009 rows=0 loops=1)
                          Sort Key: positions.id, client_refs.id
                          Sort Method: quicksort  Memory: 25kB
                          Buffers: shared hit=20716136 read=836484 dirtied=1676
                          ->  Nested Loop  (cost=1.41..9013972.39 rows=1 width=1906) (actual time=174722.001..174722.004 rows=0 loops=1)
                                Buffers: shared hit=20716136 read=836484 dirtied=1676
                                ->  Nested Loop Left Join  (cost=0.99..9013961.75 rows=1 width=1933) (actual time=184.358..170945.266 rows=8554 loops=1)
                                      Buffers: shared hit=20683249 read=835147 dirtied=1672
                                      ->  Nested Loop Left Join  (cost=0.56..9013950.91 rows=1 width=1931) (actual time=138.461..146595.622 rows=8554 loops=1)
                                            Join Filter: (client_refs.id = pv.client_id)
                                            Rows Removed by Join Filter: 328,884,138
                                            Buffers: shared hit=20655467 read=828553 dirtied=1153
                                            ->  Nested Loop  (cost=0.56..9008615.75 rows=1 width=1736) (actual time=33.568..75336.287 rows=8554 loops=1)
                                                  Buffers: shared hit=27,870 read=824,707 dirtied=851
                                                  ->  Seq Scan on positions_view pv  (cost=0.00..9008612.97 rows=1 width=78) (actual time=16.384..12504.255 rows=8554 loops=1)
                                                        Filter: ...
                                                        Rows Removed by Filter: 32,144,940
                                                        Buffers: shared hit=7 read=809,465
                                                  ->  Index Scan using positions_pkey on positions positions  (cost=0.56..2.78 rows=1 width=1685) (actual time=7.327..7.327 rows=1 loops=8554)
                                                        Index Cond: (id = pv.id)
                                                        Buffers: shared hit=27,550 read=15,242 dirtied=602
                                            ->  Seq Scan on client_refs  (cost=0.00..4513.96 rows=65,696 width=195) (actual time=0.004..5.734 rows=38,449 loops=8554)
                                                  Buffers: shared hit=20,627,597 read=3,846 dirtied=302
                                      ->  Index Scan using clients_policy_pkey on clients_policy  (cost=0.43..10.84 rows=1 width=26) (actual time=2.841..2.841 rows=0 loops=8554)
                                            Index Cond: (policy_id = pv.client_id)
                                            Filter: ...
                                            Rows Removed by Filter: 1
                                            Buffers: shared hit=27,782 read=6,594 dirtied=519
                                ->  Index Scan using idx_btree_portfolios_policy_id on portfolios_policy  (cost=0.42..10.64 rows=1 width=28) (actual time=0.439..0.439 rows=0 loops=8554)
                                      Index Cond: (policy_id = pv.portfolio_id)
                                      Filter: ...
                                      Rows Removed by Filter: 1
                                      Buffers: shared hit=32,887 read=1,337 dirtied=4
                                       
---

"Planning:"
"  Buffers: shared hit=954 read=78 dirtied=9"
"Planning Time: 130.627 ms"
"JIT:"
"  Functions: 33"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 5.483 ms, Inlining 81.013 ms, Optimization 530.635 ms, Emission 423.850 ms, Total 1040.982 ms"
"Execution Time: 175799.036 ms"

Execution Time: Under 3 minutes

Buffer Usage:

Shared hits: ~20.7 million
Reads: ~836,484
Dirtied: ~1,676

Improvements:

Efficient Joins: Usage of INNER JOIN, limited the dataset early.
Indexed Columns: Enabled rapid data retrieval without full scans.
Simplified Conditions: Eliminating some functions in WHERE clauses, allowed index usage.
Reduced Buffer Operations: Allowing lower I/O overhead significantly.

Comparison to the Previous Execution Plan: Despite processing a large number of rows, the optimized query executes much faster.

Example of an Inefficient ORM-Generated Query

Why the Query is Inefficient

Bad Execution Plan

Optimizing the Query

Updated Execution Plan Analysis

Search

Categories

Tags