PostgreSQL 教程: 分区表统计信息

九月 11, 2025

摘要：在本教程中，您将学习到 PostgreSQL 如何收集分区表统计信息，以及这些信息如何影响 PostgreSQL 的估算结果。

何为分区表统计信息？

优化器统计信息是查询优化器用于估算执行计划成本的数据。你可以在系统视图 pg_stats 中查看表列的优化器统计信息。

在 PostgreSQL 中，分区表是一种逻辑结构，与视图（view）有些类似。分区表本身不存储任何数据，数据实际存储在其各个分区中。向分区表插入行时，实际上是将行插入到对应的分区中。在 PostgreSQL 中，分区本质上是普通表，只是额外承担了“分区表的分区”这一角色。

这里提及的“分区表统计信息”，指的是分区表自身的优化器统计信息，而非其各个分区的优化器统计信息。

分区表统计信息示例

为演示分区表统计信息，我们将使用一个简单示例：

CREATE TABLE tab (
   pk integer NOT NULL
) PARTITION BY LIST ((pk % 2));

CREATE UNLOGGED TABLE tab_even
PARTITION OF tab FOR VALUES IN (0);

CREATE UNLOGGED TABLE tab_odd
PARTITION OF tab FOR VALUES IN (1);

这里使用无日志表（UNLOGGED TABLE）是为了提升性能，因为本示例无需崩溃恢复功能。向分区表插入数据时，两个分区将各接收一半数据：

INSERT INTO tab
SELECT * FROM generate_series(1, 1000000);

分区表统计信息与自动清理

在默认配置下，自动清理启动进程（autovacuum launcher）会休眠最多 1 分钟后才开始工作，随后会处理新创建的表，并执行VACUUM（清理）和ANALYZE（收集统计信息）操作。因此，我们等待两分钟后再查看统计信息：

SELECT * FROM pg_stats
WHERE tablename = 'tab_even'
  AND attname = 'pk';

-[ RECORD 1 ]----------+----------------------
schemaname             | redrock
tablename              | tab_even
attname                | pk
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {82,10896,21078,...}
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
range_length_histogram | 
range_empty_frac       | 
range_bounds_histogram |

SELECT *
FROM pg_stats
WHERE tablename = 'tab'
  AND attname = 'pk';

(0 rows)

PostgreSQL 仅为分区收集了统计信息，却没有收集分区表自身的统计信息！这一点可在官方文档中找到解释：

分区表不直接存储元组（tuple，即行数据），因此不会被自动清理处理（自动清理会像处理普通表一样处理分区表的各个分区）。遗憾的是，这意味着自动清理不会对分区表执行ANALYZE，可能导致引用分区表统计信息的查询生成非最优的执行计划。解决此问题的方法是：首次填充分区表后，手动对其执行ANALYZE；此后，当分区数据分布发生显著变化时，再次手动执行ANALYZE。

分区表统计信息的重要性

既然分区表的统计信息无法自动收集，那这些信息有多重要呢？毕竟分区表本身不存储数据，而我们已经有了分区的统计信息。对于许多 SQL 语句而言，仅分区的统计信息就足够了，例如：

EXPLAIN (ANALYZE) SELECT * FROM tab WHERE pk < 1000;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12872.14 rows=913 width=4) (actual time=0.558..27.241 rows=999.00 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4426
   ->  Parallel Append  (cost=0.00..11780.84 rows=380 width=4) (actual time=6.016..20.488 rows=333.00 loops=3)
         Buffers: shared hit=4426
         ->  Parallel Seq Scan on tab_even tab_1  (cost=0.00..5889.47 rows=249 width=4) (actual time=3.035..9.128 rows=166.33 loops=3)
               Filter: (pk < 1000)
               Rows Removed by Filter: 166500
               Buffers: shared hit=2213
         ->  Parallel Seq Scan on tab_odd tab_2  (cost=0.00..5889.47 rows=288 width=4) (actual time=6.741..16.962 rows=250.00 loops=2)
               Filter: (pk < 1000)
               Rows Removed by Filter: 249750
               Buffers: shared hit=2213
 Planning Time: 0.199 ms
 Execution Time: 27.359 ms

如你所见，此处的估算结果相当准确。

然而，PostgreSQL 在估算连接（JOIN）操作的结果行数时，会依赖分区表自身的统计信息。请看如下示例：

EXPLAIN (ANALYZE)
SELECT * FROM generate_series(1, 100) AS gs(id)
  JOIN tab ON gs.id = tab.pk;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2.25..28178.25 rows=500000 width=8) (actual time=0.164..145.455 rows=100.00 loops=1)
   Hash Cond: (tab.pk = gs.id)
   Buffers: shared hit=4426
   ->  Append  (cost=0.00..19426.00 rows=1000000 width=4) (actual time=0.018..84.210 rows=1000000.00 loops=1)
         Buffers: shared hit=4426
         ->  Seq Scan on tab_even tab_1  (cost=0.00..7213.00 rows=500000 width=4) (actual time=0.017..21.568 rows=500000.00 loops=1)
               Buffers: shared hit=2213
         ->  Seq Scan on tab_odd tab_2  (cost=0.00..7213.00 rows=500000 width=4) (actual time=0.019..16.421 rows=500000.00 loops=1)
               Buffers: shared hit=2213
   ->  Hash  (cost=1.00..1.00 rows=100 width=4) (actual time=0.119..0.120 rows=100.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Function Scan on generate_series gs  (cost=0.00..1.00 rows=100 width=4) (actual time=0.058..0.071 rows=100.00 loops=1)
 Planning:
   Buffers: shared hit=73
 Planning Time: 0.743 ms
 Execution Time: 145.540 ms

所有行数估算结果都是准确的，除了哈希连接（Hash Join）的估算：实际行数为 100，估算行数却为 500000，偏差高达 5000 倍！让我们对分区表执行ANALYZE，看看是否能改善估算结果：

ANALYZE tab;

EXPLAIN (ANALYZE)
SELECT * FROM generate_series(1, 100) AS gs(id)
  JOIN tab ON gs.id = tab.pk;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1002.25..14966.87 rows=100 width=8) (actual time=0.382..57.992 rows=100.00 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4426
   ->  Hash Join  (cost=2.25..13956.87 rows=42 width=8) (actual time=16.681..52.161 rows=33.33 loops=3)
         Hash Cond: (tab.pk = gs.id)
         Buffers: shared hit=4426
         ->  Parallel Append  (cost=0.00..12391.69 rows=416668 width=4) (actual time=0.008..29.572 rows=333333.33 loops=3)
               Buffers: shared hit=4426
               ->  Parallel Seq Scan on tab_even tab_1  (cost=0.00..5154.18 rows=294118 width=4) (actual time=0.004..6.317 rows=166666.67 loops=3)
                     Buffers: shared hit=2213
               ->  Parallel Seq Scan on tab_odd tab_2  (cost=0.00..5154.18 rows=294118 width=4) (actual time=0.007..10.296 rows=250000.00 loops=2)
                     Buffers: shared hit=2213
         ->  Hash  (cost=1.00..1.00 rows=100 width=4) (actual time=0.058..0.058 rows=100.00 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Function Scan on generate_series gs  (cost=0.00..1.00 rows=100 width=4) (actual time=0.027..0.036 rows=100.00 loops=3)
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.237 ms
 Execution Time: 58.041 ms

问题解决了！现在我们也能在pg_stats中看到分区表自身的统计信息：

SELECT *
FROM pg_stats
WHERE tablename = 'tab'
  AND attname = 'pk';

-[ RECORD 1 ]----------+----------------------
schemaname             | redrock
tablename              | tab
attname                | pk
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {29,10972,20818,...}
correlation            | 0.5027541
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
range_length_histogram | 
range_empty_frac       | 
range_bounds_histogram |

如何收集分区表统计信息？

显然，我们需要收集分区表统计信息，至少对于某些查询（如连接查询）而言是必需的。但自动清理无法完成这项工作，因此我们必须为分区表安排显式的ANALYZE任务。可通过以下方式实现：

使用操作系统调度工具（如 Linux 系统上的cron）；
使用 PostgreSQL 扩展（如定时任务扩展pg_timetable）；
集成到应用程序自定义的调度逻辑中。

那么，多久执行一次ANALYZE合适？这取决于具体业务场景，但大多数情况下，每天或每周执行一次就足够了。毕竟分区表通常是大型表，其大部分数据在一天或一周内发生显著变化的可能性较低。

为何分区表会被排除在自动清理之外？

该特性很可能目前还处于待实现的状态。

对于普通表，PostgreSQL 会通过pg_stat_all_tables.n_mod_since_analyze这一监控指标（自上次ANALYZE后的修改次数），判断表是否需要自动分析（autoanalyze）。但分区表自身不存储数据，因此该字段的值始终为 0。要实现分区表的自动分析，有两种可行方案：

每当某个分区的n_mod_since_analyze更新时，同步更新分区表的该字段值；
自动清理判断是否需要对分区表执行ANALYZE时，汇总所有分区的n_mod_since_analyze值。

第一种方案会增加所有数据修改语句的开销，因此第二种方案应该会更具可行性。

总结

本文介绍了自动清理不会收集分区表统计信息的特性，若要获得最优执行计划，我们必须通过显式的ANALYZE命令收集这些统计信息。

了解更多

PostgreSQL 优化