优化 PostgreSQL 中的一个 JSON 查询

由 John Doe 九月 19, 2025

摘要：在本文中，您将学习如何优化 PostgreSQL 中的一个 JSON 查询。

多年来，对象关系映射（ORM）工具已将查询生成流程标准化，让开发人员得以专注于优化少数真正需要调整的查询。但这些工具有时会生成过于复杂的 SQL，包含过多连接操作和业务逻辑，使得调优工作如同噩梦。开发者往往难以判断何时该将一个查询拆分为多个小型查询。而 JSON 数据类型的使用进一步加剧了这种复杂性。

ORM 生成的低效查询示例

以下是一个由 ORM 生成的复杂 SQL 查询的精简版本。由于其执行时间超过 90 分钟，不符合业务需求，我们需要对其进行调优。

WITH result_list AS (
    SELECT o.id,
           o.data AS order_data,
           ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
           ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
           ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
           ROW_NUMBER() OVER () AS rowid
    FROM orders o
    LEFT JOIN customer_refs cr ON cr.data->>'id' = o.data->'customer'->>'id'
    LEFT JOIN customer_policies cp ON cp.policy_data->>'id' = o.data->'customer'->>'id'
        AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND (
            (
                cp.policy_data->>'status' IS NOT NULL
                AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
            )
            AND (
                (
                    cp.policy_data->>'customer_type' IS NOT NULL
                    AND cp.policy_data->>'customer_type' = 'issuer'
                )
                OR (
                    cp.policy_data->>'customer_type' IS NOT NULL
                    AND cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
                )
                OR (
                    get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') IS NOT NULL
                    AND get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
                )
            )
        )
    WHERE EXISTS (
        SELECT 1
        FROM account_policies ap
        WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
            o.data->'account'->'id'->>'system_ref',
            o.data->'account'->'id'->>'type_ref',
            o.data->'account'->'id'->>'region',
            o.data->'account'->'id'->>'id'], '-')
        AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND ap.policy_data->>'contract' != 'prospect'
        AND (
            ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
            OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    )
    AND (
        (
            o.data->>'order_type' = 'MONEY_ORDER'
            AND (
                o.data->'close_date'->>'value' > '2024-09-10'
                OR o.data->'close_date'->>'value' IS NULL
                OR o.data->'close_date'->>'value' = ''
            )
        )
        OR (
            o.data->>'order_type' != 'MONEY_ORDER'
            AND COALESCE(
                NULLIF(o.data->'valuation'->'quantity'->>'value', '')::DECIMAL,
                0
            ) != 0
        )
    )
    AND (
        LOWER(o.data->>'display_name') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->'item_name'->>'abbreviation') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->'item_name'->>'full') LIKE '%current%'
        OR LOWER(o.data->'product'->'item'->>'identifier') LIKE '%current%'
        OR LOWER(o.data->'product'->'issuer'->>'display_name') LIKE '%current%'
        OR get_text_for_search_from_path(o.data->'product'->'underlying', '$[*].item.item_name.abbreviation', 'LOWER') LIKE '%current%'
    )
    GROUP BY o.id, o.data
    ORDER BY o.id
),
ordered_list AS (
    SELECT *
    FROM result_list
    ORDER BY rowid
    LIMIT 23
)
SELECT *
FROM ordered_list
ORDER BY rowid;

查询的核心特征：

多连接操作：查询包含多个LEFT JOIN，且ON条件涉及复杂的 JSON 字段操作。
嵌套 JSON 访问：频繁使用 JSON 运算符（->、->>）访问深层嵌套的字段。
自定义函数：使用get_text_from_path和get_text_for_search_from_path等函数，这类函数可能未优化性能。
复杂 WHERE 条件：WHERE子句包含多个嵌套条件和子查询，且常涉及 JSON 字段。

查询低效的原因

在大型数据集中，使用运算符访问深层嵌套的 JSON 字段速度较慢；若未正确建立索引，会导致全表扫描，大幅增加执行时间。
查询对 JSON 字段执行大量的LOWER()和LIKE操作，计算成本高，且会导致索引无法生效。
由于条件的写法问题，JSON 字段上的索引可能无法被利用。
LEFT JOIN的条件中包含函数和 JSON 字段比较，导致执行计划效率低下。
EXISTS子查询进一步增加了执行复杂性，可能引发全表扫描。
ORM 通常会生成通用性 SQL，未考虑特定数据库的优化方式。
ORM 的抽象特性可能导致冗余或不必要的条件与连接操作。

糟糕的执行计划

Limit  (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5504844.738..5504844.745 rows=23 loops=1)
  Buffers: shared hit=3622772239 read=7862262 dirtied=3639
  ->  Sort  (cost=292646835.98..292646835.99 rows=1 width=1813) (actual time=5502255.883..5502255.889 rows=23 loops=1)
        Sort Key: (row_number() OVER (?))
        Sort Method: quicksort  Memory: 114kB
        Buffers: shared hit=3622772239 read=7862262 dirtied=3639
        ->  WindowAgg  (cost=292646835.93..292646835.97 rows=1 width=1813) (actual time=5502254.506..5502255.810 rows=43 loops=1)
              Buffers: shared hit=3622772236 read=7862262 dirtied=3639
              ->  GroupAggregate  (cost=292646835.93..292646835.96 rows=1 width=1805) (actual time=5502253.088..5502253.249 rows=43 loops=1)
                    Group Key: o.id
                    Buffers: shared hit=3622772236 read=7862262 dirtied=3639
                    ->  Sort  (cost=292646835.93..292646835.93 rows=1 width=1930) (actual time=5502244.952..5502244.964 rows=43 loops=1)
                          Sort Key: o.id, cr.id
                          Sort Method: quicksort  Memory: 71kB
                          Buffers: shared hit=3622772236 read=7862262 dirtied=3639
                          ->  Nested Loop  (cost=0.98..292646835.92 rows=1 width=1930) (actual time=2015030.222..5502243.020 rows=43 loops=1)
                                Join Filter: (array_to_string(ARRAY[...]) = ap.policy_id)
                                Rows Removed by Join Filter: 246198062
                                Buffers: shared hit=3622772236 read=7862262 dirtied=3639
---
 
"Planning:"
"  Buffers: shared hit=649 read=2"
"Planning Time: 133.626 ms"
"JIT:"
"  Functions: 32"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 226.277 ms, Inlining 631.748 ms, Optimization 801.960 ms, Emission 1156.811 ms, Total 2816.797 ms"
"Execution Time: 5505139.477 ms"

执行计划观察结果：

执行时间：约 90 分钟

缓冲区使用：

共享命中（shared hit）：约 36 亿次
读取（read）：约 780 万次
修改（dirtied）：约 3639 次

核心问题：

连接与条件中的函数：导致索引无法有效使用。
嵌套循环（Nested Loop）：逐行处理效率低下。
排序操作：因未使用索引，排序成本极高。
全表扫描：连接筛选器过滤掉大量行，说明存在大量全表扫描。

优化查询

让我们来对查询做如下修改，以帮助优化器使用索引：

建立索引并创建生成列（generated column），避免在WHERE子句中对字段应用函数。
在可能的情况下，将LEFT JOIN改为INNER JOIN，以便在查询执行早期缩小数据集范围。
重写部分条件，避免对 JSON 字段使用函数。

ALTER TABLE orders ADD COLUMN customer_id TEXT GENERATED ALWAYS AS (data->'customer'->>'id') STORED;
ALTER TABLE orders ADD COLUMN order_type TEXT GENERATED ALWAYS AS (data->>'order_type') STORED;
ALTER TABLE orders ADD COLUMN display_name_lower TEXT GENERATED ALWAYS AS (LOWER(data->>'display_name')) STORED;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_order_type ON orders (order_type);
CREATE INDEX idx_orders_display_name_lower ON orders (display_name_lower);
--- ... etc

WITH result_list AS (
    SELECT o.id,
           o.data AS order_data,
           ARRAY_AGG(DISTINCT cr.id) AS customer_refs_id,
           ARRAY_AGG(DISTINCT cr.data) AS customer_refs_data,
           ARRAY_AGG(DISTINCT cp.policy_id) AS customer_policy_id,
           ROW_NUMBER() OVER () AS rowid
    FROM orders o
    INNER JOIN customer_refs cr ON cr.data->>'id' = o.customer_id
    INNER JOIN customer_policies cp ON cp.policy_data->>'id' = o.customer_id
        AND cp.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND cp.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND (
            cp.policy_data->>'customer_type' = 'issuer'
            OR cp.policy_data->>'customer_type' IN ('type1', 'type2', 'type3')
            OR get_text_from_path(cp.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    WHERE EXISTS (
        SELECT 1
        FROM account_policies ap
        WHERE ap.policy_id = ARRAY_TO_STRING(ARRAY[
            o.data->'account'->'id'->>'system_ref',
            o.data->'account'->'id'->>'type_ref',
            o.data->'account'->'id'->>'region',
            o.data->'account'->'id'->>'id'], '-')
        AND ap.policy_data->>'region' = ANY (ARRAY['GLOBAL', 'REGION_A', 'REGION_B', 'REGION_C', 'REGION_D'])
        AND ap.policy_data->>'status' NOT IN ('inactive', 'blocked', 'suspended')
        AND ap.policy_data->>'contract' != 'prospect'
        AND (
            ap.policy_data->>'account_type' IN ('typeX', 'typeY', 'typeZ')
            OR get_text_from_path(ap.policy_data->'shared_groups', '$[*]', 'NONE') && ARRAY['GROUP_1']
        )
    )
    AND o.order_type = 'MONEY_ORDER'
    AND (o.data->'close_date'->>'value' > '2024-09-10' OR o.data->'close_date'->>'value' IS NULL OR o.data->'close_date'->>'value' = '')
    AND o.display_name_lower LIKE '%current%'
    GROUP BY o.id, o.data
    ORDER BY o.id
)
SELECT *
FROM result_list
ORDER BY rowid
LIMIT 23;

更新后的执行计划分析

Limit  (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=175757.446..175757.449 rows=0 loops=1)
  Buffers: shared hit=20716139 read=836484 dirtied=1676
  ->  Sort  (cost=9013972.46..9013972.46 rows=1 width=1789) (actual time=174722.049..174722.052 rows=0 loops=1)
        Sort Key: (row_number() OVER (?))
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=20716139 read=836484 dirtied=1676
        ->  WindowAgg  (cost=9013972.40..9013972.45 rows=1 width=1789) (actual time=174722.010..174722.013 rows=0 loops=1)
              Buffers: shared hit=20716136 read=836484 dirtied=1676
              ->  GroupAggregate  (cost=9013972.40..9013972.44 rows=1 width=1781) (actual time=174722.008..174722.011 rows=0 loops=1)
                    Group Key: positions.id
                    Buffers: shared hit=20716136 read=836484 dirtied=1676
                    ->  Sort  (cost=9013972.40..9013972.41 rows=1 width=1906) (actual time=174722.007..174722.009 rows=0 loops=1)
                          Sort Key: positions.id, client_refs.id
                          Sort Method: quicksort  Memory: 25kB
                          Buffers: shared hit=20716136 read=836484 dirtied=1676
                          ->  Nested Loop  (cost=1.41..9013972.39 rows=1 width=1906) (actual time=174722.001..174722.004 rows=0 loops=1)
                                Buffers: shared hit=20716136 read=836484 dirtied=1676
                                ->  Nested Loop Left Join  (cost=0.99..9013961.75 rows=1 width=1933) (actual time=184.358..170945.266 rows=8554 loops=1)
                                      Buffers: shared hit=20683249 read=835147 dirtied=1672
                                      ->  Nested Loop Left Join  (cost=0.56..9013950.91 rows=1 width=1931) (actual time=138.461..146595.622 rows=8554 loops=1)
                                            Join Filter: (client_refs.id = pv.client_id)
                                            Rows Removed by Join Filter: 328,884,138
                                            Buffers: shared hit=20655467 read=828553 dirtied=1153
                                            ->  Nested Loop  (cost=0.56..9008615.75 rows=1 width=1736) (actual time=33.568..75336.287 rows=8554 loops=1)
                                                  Buffers: shared hit=27,870 read=824,707 dirtied=851
                                                  ->  Seq Scan on positions_view pv  (cost=0.00..9008612.97 rows=1 width=78) (actual time=16.384..12504.255 rows=8554 loops=1)
                                                        Filter: ...
                                                        Rows Removed by Filter: 32,144,940
                                                        Buffers: shared hit=7 read=809,465
                                                  ->  Index Scan using positions_pkey on positions positions  (cost=0.56..2.78 rows=1 width=1685) (actual time=7.327..7.327 rows=1 loops=8554)
                                                        Index Cond: (id = pv.id)
                                                        Buffers: shared hit=27,550 read=15,242 dirtied=602
                                            ->  Seq Scan on client_refs  (cost=0.00..4513.96 rows=65,696 width=195) (actual time=0.004..5.734 rows=38,449 loops=8554)
                                                  Buffers: shared hit=20,627,597 read=3,846 dirtied=302
                                      ->  Index Scan using clients_policy_pkey on clients_policy  (cost=0.43..10.84 rows=1 width=26) (actual time=2.841..2.841 rows=0 loops=8554)
                                            Index Cond: (policy_id = pv.client_id)
                                            Filter: ...
                                            Rows Removed by Filter: 1
                                            Buffers: shared hit=27,782 read=6,594 dirtied=519
                                ->  Index Scan using idx_btree_portfolios_policy_id on portfolios_policy  (cost=0.42..10.64 rows=1 width=28) (actual time=0.439..0.439 rows=0 loops=8554)
                                      Index Cond: (policy_id = pv.portfolio_id)
                                      Filter: ...
                                      Rows Removed by Filter: 1
                                      Buffers: shared hit=32,887 read=1,337 dirtied=4
                                       
---

"Planning:"
"  Buffers: shared hit=954 read=78 dirtied=9"
"Planning Time: 130.627 ms"
"JIT:"
"  Functions: 33"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 5.483 ms, Inlining 81.013 ms, Optimization 530.635 ms, Emission 423.850 ms, Total 1040.982 ms"
"Execution Time: 175799.036 ms"

执行时间：不到 3 分钟