PostgreSQL 教程: EXPLAIN 扫描节点

八月 21, 2025

摘要：在本教程中，你将了解在执行计划中可能会看到的各种扫描 “节点” 或操作。

顺序扫描（Seq Scan）

顺序扫描看起来是这样的：

explain analyze select * from pg_class;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=202) (actual time=0.009..0.049 rows=295 loops=1)
 Total runtime: 0.249 ms
(2 rows)

这是最简单的操作：PostgreSQL 打开表文件，一行一行地读取行，将它们返回给用户或计划树中的上层节点，例如 Limit 节点，就像这样：

explain analyze select * from pg_class limit 2;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=2 width=202) (actual time=0.014..0.014 rows=2 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=202) (actual time=0.009..0.009 rows=2 loops=1)
 Total runtime: 0.132 ms
(3 rows)

重要的是要理解，返回的行的顺序不是特定的。它不是 “按插入顺序”，也不是 “最后更新的在前” 之类的。并发的 SELECT、UPDATE、DELETE、VACUUM 操作可以随时改变行的顺序。

顺序扫描可以过滤行，也就是说拒绝一些行被返回。例如，当你添加 “WHERE” 子句时就会发生这种情况：

explain analyze select * from pg_class
  where relname ~ 'a';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..11.65 rows=227 width=202) (actual time=0.030..0.294 rows=229 loops=1)
   Filter: (relname ~ 'a'::text)
   Rows Removed by Filter: 66
 Total runtime: 0.379 ms
(4 rows)

正如你所看到的，现在我们有了筛选条件（Filter）的信息。而且还有 “被筛选器移除的行”（Rows removed by filter）这一行。

索引扫描（Index Scan）

这种扫描似乎非常直接，大多数人至少在一种情况下理解它的使用：

explain analyze select * from pg_class
  where oid = 1247;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pg_class_oid_index on pg_class  (cost=0.15..8.17 rows=1 width=202) (actual time=0.007..0.007 rows=1 loops=1)
   Index Cond: (oid = 1247::oid)
 Total runtime: 0.077 ms
(3 rows)

也就是说，我们有一个与条件匹配的索引，所以 PostgreSQL 会：

打开索引
在索引中找到（表数据中）可能有符合给定条件的行的位置
打开表
获取索引指向的行
如果这些行可以被返回，即它们对当前会话可见，就会被返回

你当然会问：一行怎么会不可见呢？这可能发生在已删除但仍在表中的行（尚未被清理）、已更新的行或者已插入但发生在当前事务之后的行上面。

当你希望使用索引的顺序对某些数据进行排序时，也会使用索引扫描。就像这样：

explain analyze select * from pg_class
  order by oid limit 10;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.15..1.67 rows=10 width=206) (actual time=0.017..0.029 rows=10 loops=1)
   ->  Index Scan using pg_class_oid_index on pg_class  (cost=0.15..44.53 rows=292 width=206) (actual time=0.014..0.026 rows=10 loops=1)
 Total runtime: 0.145 ms
(3 rows)

这里没有条件，但我们可以很容易地添加条件，像这样：

explain analyze select * from pg_class
  where oid > 1247 order by oid limit 10;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.15..4.03 rows=10 width=206) (actual time=0.021..0.035 rows=10 loops=1)
   ->  Index Scan using pg_class_oid_index on pg_class  (cost=0.15..37.84 rows=97 width=206) (actual time=0.017..0.031 rows=10 loops=1)
         Index Cond: (oid > 1247::oid)
 Total runtime: 0.132 ms
(4 rows)

在这些情况下，PostgreSQL 在索引中找到起始点（要么是第一个大于 1247 的行，要么就是索引中最小的值），然后返回下一行或值，直到满足限制条件为止。

索引扫描有一种版本，叫做 “反向索引扫描”（Index Scan Backward），它做的事情相同，但用于按降序扫描：

explain analyze select * from pg_class
  where oid < 1247 order by oid desc limit 10;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.15..4.03 rows=10 width=206) (actual time=0.012..0.026 rows=10 loops=1)
   ->  Index Scan Backward using pg_class_oid_index on pg_class  (cost=0.15..37.84 rows=97 width=206) (actual time=0.009..0.022 rows=10 loops=1)
         Index Cond: (oid < 1247::oid)
 Total runtime: 0.119 ms
(4 rows)

这是同一种操作：打开索引，对于索引指向的每一行，从表中获取行，只是它不是 “从小到大”，而是 “从大到小”。

仅索引扫描（Index Only Scan）

让我们创建一个简单的表：

create table test (id serial primary key, i int4);

insert into test (i)
  select random() * 1000000000
    from generate_series(1,100000);

vacuum analyze test;

这给了我一个像这样的表：

select * from test limit 10;
 id |     i
----+-----------
  1 | 546119592
  2 | 253476978
  3 | 235791031
  4 | 654694043
  5 | 187647296
  6 | 709050245
  7 | 210316749
  8 | 348927354
  9 | 120463097
 10 |   5611946
(10 rows)

在这里，我在 id 上有一个索引：

\d test
                         Table "public.test"
 Column |  Type   |                     Modifiers
--------+---------+---------------------------------------------------
 id     | integer | not null default nextval('test_id_seq'::regclass)
 i      | integer | 
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

所以，如果满足某些条件（稍后会详细说明），我可以得到这样的计划：

explain analyze select id from test
  order by id asc limit 10;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.55 rows=10 width=4) (actual time=0.039..0.042 rows=10 loops=1)
   ->  Index Only Scan using test_pkey on test  (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.036..0.038 rows=10 loops=1)
         Heap Fetches: 0
 Total runtime: 0.092 ms
(4 rows)

请注意 “Index Only Scan” 中的 “Only” 一词。这意味着 PostgreSQL 意识到我只查询索引中存在的数据（列）。而且它可能不需要检查表文件中的任何内容。所以它会直接从索引返回数据。

这些扫描可以比普通的索引扫描快得多，因为它们不必验证表数据中的任何内容。

问题是，为了使其工作，索引必须知道给定的行所在的页面最近没有任何更改。这意味着为了利用仅索引扫描，你必须对表进行良好的清理。如果正常启用了自动清理功能，这应该不是什么大问题。

位图索引扫描（Bitmap Index Scan）

位图索引扫描看起来像这样：

create index i1 on test (i);

explain analyze select * from test where i < 100000;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.37..39.99 rows=10 width=8) (actual time=0.025..0.110 rows=13 loops=1)
   Recheck Cond: (i < 100000)
   ->  Bitmap Index Scan on i1  (cost=0.00..4.37 rows=10 width=0) (actual time=0.013..0.013 rows=13 loops=1)
         Index Cond: (i < 100000)
 Total runtime: 0.154 ms
(5 rows)

位图扫描总是至少包含两个节点。第一个较低级别的是位图索引扫描（Bitmap Index Scan），然后是位图堆扫描（Bitmap Heap Scan）。

它是如何工作的呢？

假设你的表有 100000 个页（大约 780MB）。位图索引扫描会创建一个位图，其中表的每个页都有一个位。所以在这种情况下，我们会得到一个 100000 位的内存块，约 12.5kB。所有这些位都被设置为 0。然后位图索引扫描会将一些位设置为 1，这取决于表中的哪些页可能包含应该返回的行。

这部分根本不涉及表数据，只涉及索引。完成后，也就是说，所有可能包含应该返回的行的页都被 “标记” 后，这个位图会被传递给上层的位图堆扫描节点，它会以更顺序的方式读取它们。

这种操作的意义是什么呢？普通的索引扫描会导致随机 IO，也就是说，磁盘中的页是以随机方式加载的。至少在旋转磁盘上，这是很慢的。

顺序扫描对于获取单个页来说更快，但另一方面，你并不总是需要所有的页。

当你需要从表中获取很多行，但不是所有行，并且你要返回的行不在单个块中（如果我执行 “…where id < …” 可能会是这种情况）时，位图索引扫描结合了这两种情况。位图扫描还有另一个有趣的特性，那就是，它们可以将两个操作、两个索引连接在一起。就像这样：

explain analyze select * from test
  where i < 5000000 or i > 950000000;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=107.36..630.60 rows=5323 width=8) (actual time=1.023..4.353 rows=5386 loops=1)
   Recheck Cond: ((i < 5000000) OR (i > 950000000))
   ->  BitmapOr  (cost=107.36..107.36 rows=5349 width=0) (actual time=0.922..0.922 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..12.25 rows=527 width=0) (actual time=0.120..0.120 rows=491 loops=1)
               Index Cond: (i < 5000000)
         ->  Bitmap Index Scan on i1  (cost=0.00..92.46 rows=4822 width=0) (actual time=0.799..0.799 rows=4895 loops=1)
               Index Cond: (i > 950000000)
 Total runtime: 4.765 ms
(8 rows)

在这里，我们看到了两个位图索引扫描（还可以更多），然后它们使用 BitmapOr 连接在一起。

正如你所记得的，位图索引扫描的输出是一个位图，也就是一个包含一些 0 和一些 1 的内存块。有了多个这样的位图，你就可以很容易地对其进行逻辑运算：或、与或非。

在这里，我们看到两个这样的位图使用 “或” 运算符连接在一起，得到的位图被传递给位图堆扫描，后者从表中加载相应的行。

虽然在这里两个索引扫描使用相同的索引，但情况并非总是如此。例如，让我们来快速添加更多的一些列：

alter table test add column j int4 default random() * 1000000000;
alter table test add column h int4 default random() * 1000000000;
create index i2 on test (j);
create index i3 on test (h);

现在：

explain analyze select * from test
  where j < 50000000 and i < 50000000 and h > 950000000;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=280.76..323.61 rows=12 width=16) (actual time=2.295..2.352 rows=11 loops=1)
   Recheck Cond: ((h > 950000000) AND (j < 50000000) AND (i < 50000000))
   ->  BitmapAnd  (cost=280.76..280.76 rows=12 width=0) (actual time=2.278..2.278 rows=0 loops=1)
         ->  Bitmap Index Scan on i3  (cost=0.00..92.53 rows=4832 width=0) (actual time=0.546..0.546 rows=4938 loops=1)
               Index Cond: (h > 950000000)
         ->  Bitmap Index Scan on i2  (cost=0.00..93.76 rows=4996 width=0) (actual time=0.783..0.783 rows=5021 loops=1)
               Index Cond: (j < 50000000)
         ->  Bitmap Index Scan on i1  (cost=0.00..93.96 rows=5022 width=0) (actual time=0.798..0.798 rows=4998 loops=1)
               Index Cond: (i < 50000000)
 Total runtime: 2.428 ms
(10 rows)