由 John Doe 九月 19, 2025
摘要:在本文中,您将学习如何优化 PostgreSQL 中的一个 JSON 查询。
目录
多年来,对象关系映射(ORM)工具已将查询生成流程标准化,让开发人员得以专注于优化少数真正需要调整的查询。但这些工具有时会生成过于复杂的 SQL,包含过多连接操作和业务逻辑,使得调优工作如同噩梦。开发者往往难以判断何时该将一个查询拆分为多个小型查询。而 JSON 数据类型的使用进一步加剧了这种复杂性。
ORM 生成的低效查询示例
以下是一个由 ORM 生成的复杂 SQL 查询的精简版本。由于其执行时间超过 90 分钟,不符合业务需求,我们需要对其进行调优。
WITH result_list AS (
SELECT o.id,
o.data AS order_data,
ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
ROW_NUMBER() OVER () AS rowid
FROM orders o
LEFT JOIN customer_refs cr ON cr.data->>'id' = o.data->'customer'->>'id'
LEFT JOIN customer_policies cp ON cp.policy_data->>'id' = o.data->'customer'->>'id'
AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND (
(
cp.policy_data->>'status' IS NOT NULL
AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
)
AND (
(
cp.policy_data->>'customer_type' IS NOT NULL
AND cp.policy_data->>'customer_type' = 'issuer'
)
OR (
cp.policy_data->>'customer_type' IS NOT NULL
AND cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
)
OR (
get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') IS NOT NULL
AND get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
)
)
WHERE EXISTS (
SELECT 1
FROM account_policies ap
WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
o.data->'account'->'id'->>'system_ref',
o.data->'account'->'id'->>'type_ref',
o.data->'account'->'id'->>'region',
o.data->'account'->'id'->>'id'], '-')
AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
AND ap.policy_data->>'contract' != 'prospect'
AND (
ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
)
AND (
(
o.data->>'order_type' = 'MONEY_ORDER'
AND (
o.data->'close_date'->>'value' > '2024-09-10'
OR o.data->'close_date'->>'value' IS NULL
OR o.data->'close_date'->>'value' = ''
)
)
OR (
o.data->>'order_type' != 'MONEY_ORDER'
AND COALESCE(
NULLIF(o.data->'valuation'->'quantity'->>'value', '')::DECIMAL,
0
) != 0
)
)
AND (
LOWER(o.data->>'display_name') LIKE '%current%'
OR LOWER(o.data->'product'->'item'->'item_name'->>'abbreviation') LIKE '%current%'
OR LOWER(o.data->'product'->'item'->'item_name'->>'full') LIKE '%current%'
OR LOWER(o.data->'product'->'item'->>'identifier') LIKE '%current%'
OR LOWER(o.data->'product'->'issuer'->>'display_name') LIKE '%current%'
OR get_text_for_search_from_path(o.data->'product'->'underlying', '$[*].item.item_name.abbreviation', 'LOWER') LIKE '%current%'
)
GROUP BY o.id, o.data
ORDER BY o.id
),
ordered_list AS (
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23
)
SELECT *
FROM ordered_list
ORDER BY rowid;
查询的核心特征:
- 多连接操作:查询包含多个
LEFT JOIN
,且ON
条件涉及复杂的 JSON 字段操作。 - 嵌套 JSON 访问:频繁使用 JSON 运算符(
->
、->>
)访问深层嵌套的字段。 - 自定义函数:使用
get_text_from_path
和get_text_for_search_from_path
等函数,这类函数可能未优化性能。 - 复杂 WHERE 条件:
WHERE
子句包含多个嵌套条件和子查询,且常涉及 JSON 字段。
查询低效的原因
- 在大型数据集中,使用运算符访问深层嵌套的 JSON 字段速度较慢;若未正确建立索引,会导致全表扫描,大幅增加执行时间。
- 查询对 JSON 字段执行大量的
LOWER()
和LIKE
操作,计算成本高,且会导致索引无法生效。 - 由于条件的写法问题,JSON 字段上的索引可能无法被利用。
LEFT JOIN
的条件中包含函数和 JSON 字段比较,导致执行计划效率低下。EXISTS
子查询进一步增加了执行复杂性,可能引发全表扫描。- ORM 通常会生成通用性 SQL,未考虑特定数据库的优化方式。
- ORM 的抽象特性可能导致冗余或不必要的条件与连接操作。
糟糕的执行计划
Limit (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5504844.738..5504844.745 rows=23 loops=1)
Buffers: shared hit=3622772239 read=7862262 dirtied=3639
-> Sort (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5502255.883..5502255.889 rows=23 loops=1)
Sort Key: (row_number() OVER (?))
Sort Method: quicksort Memory: 114kB
Buffers: shared hit=3622772239 read=7862262 dirtied=3639
-> WindowAgg (cost=292646835.93..292646835.97 rows=1 width=1813) (actual time=5502254.506..5502255.810 rows=43 loops=1)
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
-> GroupAggregate (cost=292646835.93..292646835.96 rows=1 width=1805) (actual time=5502253.088..5502253.249 rows=43 loops=1)
Group Key: o.id
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
-> Sort (cost=292646835.93..292646835.93 rows=1 width=1930) (actual time=5502244.952..5502244.964 rows=43 loops=1)
Sort Key: o.id, cr.id
Sort Method: quicksort Memory: 71kB
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
-> Nested Loop (cost=0.98..292646835.92 rows=1 width=1930) (actual time=2015030.222..5502243.020 rows=43 loops=1)
Join Filter: (array_to_string(ARRAY[...]) = ap.policy_id)
Rows Removed by Join Filter: 246198062
Buffers: shared hit=3622772236 read=7862262 dirtied=3639
---
"Planning:"
" Buffers: shared hit=649 read=2"
"Planning Time: 133.626 ms"
"JIT:"
" Functions: 32"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 226.277 ms, Inlining 631.748 ms, Optimization 801.960 ms, Emission 1156.811 ms, Total 2816.797 ms"
"Execution Time: 5505139.477 ms"
执行计划观察结果:
执行时间:约 90 分钟
缓冲区使用:
- 共享命中(shared hit):约 36 亿次
- 读取(read):约 780 万次
- 修改(dirtied):约 3639 次
核心问题:
- 连接与条件中的函数:导致索引无法有效使用。
- 嵌套循环(Nested Loop):逐行处理效率低下。
- 排序操作:因未使用索引,排序成本极高。
- 全表扫描:连接筛选器过滤掉大量行,说明存在大量全表扫描。
优化查询
让我们来对查询做如下修改,以帮助优化器使用索引:
- 建立索引并创建生成列(generated column),避免在
WHERE
子句中对字段应用函数。 - 在可能的情况下,将
LEFT JOIN
改为INNER JOIN
,以便在查询执行早期缩小数据集范围。 - 重写部分条件,避免对 JSON 字段使用函数。
ALTER TABLE orders ADD COLUMN customer_id TEXT GENERATED ALWAYS AS (data->'customer'->>'id') STORED;
ALTER TABLE orders ADD COLUMN order_type TEXT GENERATED ALWAYS AS (data->>'order_type') STORED;
ALTER TABLE orders ADD COLUMN display_name_lower TEXT GENERATED ALWAYS AS (LOWER(data->>'display_name')) STORED;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_order_type ON orders (order_type);
CREATE INDEX idx_orders_display_name_lower ON orders (display_name_lower);
--- ... etc
WITH result_list AS (
SELECT o.id,
o.data AS order_data,
ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
ROW_NUMBER() OVER () AS rowid
FROM orders o
INNER JOIN customer_refs cr ON cr.data->>'id' = o.customer_id
INNER JOIN customer_policies cp ON cp.policy_data->>'id' = o.customer_id
AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
AND (
cp.policy_data->>'customer_type' = 'issuer'
OR cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
OR get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
WHERE EXISTS (
SELECT 1
FROM account_policies ap
WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
o.data->'account'->'id'->>'system_ref',
o.data->'account'->'id'->>'type_ref',
o.data->'account'->'id'->>'region',
o.data->'account'->'id'->>'id'], '-')
AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
AND ap.policy_data->>'contract' != 'prospect'
AND (
ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
)
)
AND o.order_type = 'MONEY_ORDER'
AND (o.data->'close_date'->>'value' > '2024-09-10' OR o.data->'close_date'->>'value' IS NULL OR o.data->'close_date'->>'value' = '')
AND o.display_name_lower LIKE '%current%'
GROUP BY o.id, o.data
ORDER BY o.id
)
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23;
更新后的执行计划分析
Limit (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=175757.446..175757.449 rows=0 loops=1)
Buffers: shared hit=20716139 read=836484 dirtied=1676
-> Sort (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=174722.049..174722.052 rows=0 loops=1)
Sort Key: (row_number() OVER (?))
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=20716139 read=836484 dirtied=1676
-> WindowAgg (cost=9013972.40..9013972.45 rows=1 width=1789) (actual time=174722.010..174722.013 rows=0 loops=1)
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> GroupAggregate (cost=9013972.40..9013972.44 rows=1 width=1781) (actual time=174722.008..174722.011 rows=0 loops=1)
Group Key: positions.id
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> Sort (cost=9013972.40..9013972.41 rows=1 width=1906) (actual time=174722.007..174722.009 rows=0 loops=1)
Sort Key: positions.id, client_refs.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> Nested Loop (cost=1.41..9013972.39 rows=1 width=1906) (actual time=174722.001..174722.004 rows=0 loops=1)
Buffers: shared hit=20716136 read=836484 dirtied=1676
-> Nested Loop Left Join (cost=0.99..9013961.75 rows=1 width=1933) (actual time=184.358..170945.266 rows=8554 loops=1)
Buffers: shared hit=20683249 read=835147 dirtied=1672
-> Nested Loop Left Join (cost=0.56..9013950.91 rows=1 width=1931) (actual time=138.461..146595.622 rows=8554 loops=1)
Join Filter: (client_refs.id = pv.client_id)
Rows Removed by Join Filter: 328,884,138
Buffers: shared hit=20655467 read=828553 dirtied=1153
-> Nested Loop (cost=0.56..9008615.75 rows=1 width=1736) (actual time=33.568..75336.287 rows=8554 loops=1)
Buffers: shared hit=27,870 read=824,707 dirtied=851
-> Seq Scan on positions_view pv (cost=0.00..9008612.97 rows=1 width=78) (actual time=16.384..12504.255 rows=8554 loops=1)
Filter: ...
Rows Removed by Filter: 32,144,940
Buffers: shared hit=7 read=809,465
-> Index Scan using positions_pkey on positions positions (cost=0.56..2.78 rows=1 width=1685) (actual time=7.327..7.327 rows=1 loops=8554)
Index Cond: (id = pv.id)
Buffers: shared hit=27,550 read=15,242 dirtied=602
-> Seq Scan on client_refs (cost=0.00..4513.96 rows=65,696 width=195) (actual time=0.004..5.734 rows=38,449 loops=8554)
Buffers: shared hit=20,627,597 read=3,846 dirtied=302
-> Index Scan using clients_policy_pkey on clients_policy (cost=0.43..10.84 rows=1 width=26) (actual time=2.841..2.841 rows=0 loops=8554)
Index Cond: (policy_id = pv.client_id)
Filter: ...
Rows Removed by Filter: 1
Buffers: shared hit=27,782 read=6,594 dirtied=519
-> Index Scan using idx_btree_portfolios_policy_id on portfolios_policy (cost=0.42..10.64 rows=1 width=28) (actual time=0.439..0.439 rows=0 loops=8554)
Index Cond: (policy_id = pv.portfolio_id)
Filter: ...
Rows Removed by Filter: 1
Buffers: shared hit=32,887 read=1,337 dirtied=4
---
"Planning:"
" Buffers: shared hit=954 read=78 dirtied=9"
"Planning Time: 130.627 ms"
"JIT:"
" Functions: 33"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 5.483 ms, Inlining 81.013 ms, Optimization 530.635 ms, Emission 423.850 ms, Total 1040.982 ms"
"Execution Time: 175799.036 ms"
执行时间:不到 3 分钟
缓冲区使用:
- 共享命中:约 2070 万次
- 读取:约 836,484 次
- 修改:约 1,676 次
优化点:
- 高效连接:使用
INNER JOIN
在早期缩小数据集范围。 - 索引列:通过索引实现快速数据检索,避免全表扫描。
- 简化条件:移除
WHERE
子句中的部分函数操作,让索引得以生效。 - 减少缓冲区操作:大幅降低 I/O 开销。
与之前执行计划的对比:尽管仍需处理大量行,但优化后的查询执行速度显著提升。