By John Doe September 19, 2025
Summary: in this article, you will learn how to tune a JSON query in PostgreSQL.
Table of Contents
Over the years, ORMs have industrialized query generation, allowing developers to focus on optimizing the few queries that truly need it. However, they can sometimes produce overly complex SQL with excessive joins and business logic, making tuning a nightmare. It is sometimes hard to know when you should split a query in multiple smaller parts. The complexity is amplified with the use of JSON.
Example of an Inefficient ORM-Generated Query
Below is an anonymized version of a complex SQL query generated by an ORM which I had to tune since the execution time (+90min) wasn’t in line with the business need. Note that the original query was over 1500 lines :
WITH result_list AS (
SELECT o.id,
o.data AS order_data,
ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
ROW_NUMBER() OVER () AS rowid
FROM orders o
LEFT JOIN customer_refs cr ON cr.data->>'id' = o.data->'customer'->>'id'
LEFT JOIN customer_policies cp ON cp.policy_data->>'id' = o.data->'customer'->>'id'
AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND (
(
cp.policy_data->>'status' IS NOT NULL
AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
)
AND (
(
cp.policy_data->>'customer_type' IS NOT NULL
AND cp.policy_data->>'customer_type' = 'issuer'
)
OR (
cp.policy_data->>'customer_type' IS NOT NULL
AND cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
)
OR (
get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') IS NOT NULL
AND get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
)
)
WHERE EXISTS (
SELECT 1
FROM account_policies ap
WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
o.data->'account'->'id'->>'system_ref',
o.data->'account'->'id'->>'type_ref',
o.data->'account'->'id'->>'region',
o.data->'account'->'id'->>'id'], '-')
AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
AND ap.policy_data->>'contract' != 'prospect'
AND (
ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
)
AND (
(
o.data->>'order_type' = 'MONEY_ORDER'
AND (
o.data->'close_date'->>'value' > '2024-09-10'
OR o.data->'close_date'->>'value' IS NULL
OR o.data->'close_date'->>'value' = ''
)
)
OR (
o.data->>'order_type' != 'MONEY_ORDER'
AND COALESCE(
NULLIF(o.data->'valuation'->'quantity'->>'value', '')::DECIMAL,
0
) != 0
)
)
AND (
LOWER(o.data->>'display_name') LIKE '%current%'
OR LOWER(o.data->'product'->'item'->'item_name'->>'abbreviation') LIKE '%current%'
OR LOWER(o.data->'product'->'item'->'item_name'->>'full') LIKE '%current%'
OR LOWER(o.data->'product'->'item'->>'identifier') LIKE '%current%'
OR LOWER(o.data->'product'->'issuer'->>'display_name') LIKE '%current%'
OR get_text_for_search_from_path(o.data->'product'->'underlying', '$[*].item.item_name.abbreviation', 'LOWER') LIKE '%current%'
)
GROUP BY o.id, o.data
ORDER BY o.id
),
ordered_list AS (
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23
)
SELECT *
FROM ordered_list
ORDER BY rowid;
Key Characteristics of the Query:
- Multiple Joins: The query includes several
LEFT JOINoperations with complexONconditions involving JSON fields. - Nested JSON Access: Frequent use of JSON operators (
->,->>) to access deeply nested fields. - Custom Functions: Usage of functions like
get_text_from_pathandget_text_for_search_from_path, which may not be optimized for performance. - Complex WHERE Conditions: The
WHEREclause contains multiple nested conditions and subqueries, often involving JSON fields.
Why the Query is Inefficient
- Accessing deeply nested JSON fields using operators in large datasets can be slow, especially if not properly indexed this will cause full table scans and increase drastically execution time.
- The query performs numerous
LOWER()andLIKEoperations on JSON fields, which can be computationally expensive with the addition of preventing index usage effectively. - The indexes on JSON fields may not be utilized due to the way conditions are written.
- The use of
LEFT JOINwith conditions that include functions and JSON field comparisons leads to inefficient execution plans. - The
EXISTSsubquery further complicates the execution, potentially causing full table scans. - ORMs often generate generic SQL that doesn’t consider database-specific optimizations.
- The abstraction can lead to redundant or unnecessary conditions and joins.
Bad Execution Plan
Limit (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5504844.738..5504844.745 rows=23 loops=1)
Buffers: shared hit=3622772239 read=7862262 dirtied=3639
-> Sort (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5502255.883..5502255.889 rows=23 loops=1)
Sort Key: (row_number() OVER (?))
Sort Method: quicksort Memory: 114kB
Buffers: shared hit=3622772239 read=7862262 dirtied=3639
-> WindowAgg (cost=292646835.93..292646835.97 rows=1 width=1813) (actual time=5502254.506..5502255.810 rows=43 loops=1)
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
-> GroupAggregate (cost=292646835.93..292646835.96 rows=1 width=1805) (actual time=5502253.088..5502253.249 rows=43 loops=1)
Group Key: o.id
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
-> Sort (cost=292646835.93..292646835.93 rows=1 width=1930) (actual time=5502244.952..5502244.964 rows=43 loops=1)
Sort Key: o.id, cr.id
Sort Method: quicksort Memory: 71kB
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
-> Nested Loop (cost=0.98..292646835.92 rows=1 width=1930) (actual time=2015030.222..5502243.020 rows=43 loops=1)
Join Filter: (array_to_string(ARRAY[...]) = ap.policy_id)
Rows Removed by Join Filter: 246198062
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
---
"Planning:"
" Buffers: shared hit=649 read=2"
"Planning Time: 133.626 ms"
"JIT:"
" Functions: 32"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 226.277 ms, Inlining 631.748 ms, Optimization 801.960 ms, Emission 1156.811 ms, Total 2816.797 ms"
"Execution Time: 5505139.477 ms"
Observations from the Execution Plan:
Execution Time: ~90 minutes Buffer Usage:
- Shared hits: ~3.6 billion
- Reads: ~7.8 million
- Dirtied: ~3,639
Key Issues:
- Functions in Joins and Conditions: Prevented effective index usage
- Nested Loops: Inefficient row-by-row processing.
- Sort Operations: High-cost sorting due to lack of index usage.
- Full Table Scans: Extensive rows removed by join filters.
Optimizing the Query
Let’s modify some things, to help the optimizer use the indexes :
- Indexes and created some generated columns to avoid applying functions in the where clause on some fields.
- I changed the LEFT JOIN to a INNER JOIN when possible to reduce the data set early in the query execution.
- I rewrote some conditions to avoid the use of functions on the JSON fields.
ALTER TABLE orders ADD COLUMN customer_id TEXT GENERATED ALWAYS AS (data->'customer'->>'id') STORED;
ALTER TABLE orders ADD COLUMN order_type TEXT GENERATED ALWAYS AS (data->>'order_type') STORED;
ALTER TABLE orders ADD COLUMN display_name_lower TEXT GENERATED ALWAYS AS (LOWER(data->>'display_name')) STORED;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_order_type ON orders (order_type);
CREATE INDEX idx_orders_display_name_lower ON orders (display_name_lower);
--- ... etc
WITH result_list AS (
SELECT o.id,
o.data AS order_data,
ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
ROW_NUMBER() OVER () AS rowid
FROM orders o
INNER JOIN customer_refs cr ON cr.data->>'id' = o.customer_id
INNER JOIN customer_policies cp ON cp.policy_data->>'id' = o.customer_id
AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
AND (
cp.policy_data->>'customer_type' = 'issuer'
OR cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
OR get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
WHERE EXISTS (
SELECT 1
FROM account_policies ap
WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
o.data->'account'->'id'->>'system_ref',
o.data->'account'->'id'->>'type_ref',
o.data->'account'->'id'->>'region',
o.data->'account'->'id'->>'id'], '-')
AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
AND ap.policy_data->>'contract' != 'prospect'
AND (
ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
)
AND o.order_type = 'MONEY_ORDER'
AND (o.data->'close_date'->>'value' > '2024-09-10' OR o.data->'close_date'->>'value' IS NULL OR o.data->'close_date'->>'value' = '')
AND o.display_name_lower LIKE '%current%'
GROUP BY o.id, o.data
ORDER BY o.id
)
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23;
Updated Execution Plan Analysis
Limit (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=175757.446..175757.449 rows=0 loops=1)
Buffers: shared hit=20716139 read=836484 dirtied=1676
-> Sort (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=174722.049..174722.052 rows=0 loops=1)
Sort Key: (row_number() OVER (?))
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=20716139 read=836484 dirtied=1676
-> WindowAgg (cost=9013972.40..9013972.45 rows=1 width=1789) (actual time=174722.010..174722.013 rows=0 loops=1)
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> GroupAggregate (cost=9013972.40..9013972.44 rows=1 width=1781) (actual time=174722.008..174722.011 rows=0 loops=1)
Group Key: positions.id
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> Sort (cost=9013972.40..9013972.41 rows=1 width=1906) (actual time=174722.007..174722.009 rows=0 loops=1)
Sort Key: positions.id, client_refs.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> Nested Loop (cost=1.41..9013972.39 rows=1 width=1906) (actual time=174722.001..174722.004 rows=0 loops=1)
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> Nested Loop Left Join (cost=0.99..9013961.75 rows=1 width=1933) (actual time=184.358..170945.266 rows=8554 loops=1)
Buffers: shared hit=20683249 read=835147 dirtied=1672
-> Nested Loop Left Join (cost=0.56..9013950.91 rows=1 width=1931) (actual time=138.461..146595.622 rows=8554 loops=1)
Join Filter: (client_refs.id = pv.client_id)
Rows Removed by Join Filter: 328,884,138
Buffers: shared hit=20655467 read=828553 dirtied=1153
-> Nested Loop (cost=0.56..9008615.75 rows=1 width=1736) (actual time=33.568..75336.287 rows=8554 loops=1)
Buffers: shared hit=27,870 read=824,707 dirtied=851
-> Seq Scan on positions_view pv (cost=0.00..9008612.97 rows=1 width=78) (actual time=16.384..12504.255 rows=8554 loops=1)
Filter: ...
Rows Removed by Filter: 32,144,940
Buffers: shared hit=7 read=809,465
-> Index Scan using positions_pkey on positions positions (cost=0.56..2.78 rows=1 width=1685) (actual time=7.327..7.327 rows=1 loops=8554)
Index Cond: (id = pv.id)
Buffers: shared hit=27,550 read=15,242 dirtied=602
-> Seq Scan on client_refs (cost=0.00..4513.96 rows=65,696 width=195) (actual time=0.004..5.734 rows=38,449 loops=8554)
Buffers: shared hit=20,627,597 read=3,846 dirtied=302
-> Index Scan using clients_policy_pkey on clients_policy (cost=0.43..10.84 rows=1 width=26) (actual time=2.841..2.841 rows=0 loops=8554)
Index Cond: (policy_id = pv.client_id)
Filter: ...
Rows Removed by Filter: 1
Buffers: shared hit=27,782 read=6,594 dirtied=519
-> Index Scan using idx_btree_portfolios_policy_id on portfolios_policy (cost=0.42..10.64 rows=1 width=28) (actual time=0.439..0.439 rows=0 loops=8554)
Index Cond: (policy_id = pv.portfolio_id)
Filter: ...
Rows Removed by Filter: 1
Buffers: shared hit=32,887 read=1,337 dirtied=4
---
"Planning:"
" Buffers: shared hit=954 read=78 dirtied=9"
"Planning Time: 130.627 ms"
"JIT:"
" Functions: 33"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 5.483 ms, Inlining 81.013 ms, Optimization 530.635 ms, Emission 423.850 ms, Total 1040.982 ms"
"Execution Time: 175799.036 ms"
Execution Time: Under 3 minutes
Buffer Usage:
- Shared hits: ~20.7 million
- Reads: ~836,484
- Dirtied: ~1,676
Improvements:
- Efficient Joins: Usage of
INNER JOIN, limited the dataset early. - Indexed Columns: Enabled rapid data retrieval without full scans.
- Simplified Conditions: Eliminating some functions in
WHEREclauses, allowed index usage. - Reduced Buffer Operations: Allowing lower I/O overhead significantly.
Comparison to the Previous Execution Plan: Despite processing a large number of rows, the optimized query executes much faster.