# PostgreSQL 教程: 优化范围匹配的选择率估计

## 直方图是如何工作的？

``````CREATE TABLE test2 (id integer, val integer);

INSERT INTO test2 (id, val)
SELECT i, CAST(random() * 400 AS integer)
FROM generate_series(1, 10000) AS s(i);

ANALYZE test2;

SHOW default_statistics_target;
default_statistics_target
---------------------------
100
(1 row)
``````

``````SELECT '...' || right(histogram_bounds::text, 60) AS histogram_bounds
FROM pg_stats s
WHERE s.tablename = 'test2' AND s.attname = 'val';
histogram_bounds
-----------------------------------------------------------------
...346,350,353,357,360,364,370,374,378,382,386,389,393,397,400}
(1 row)
``````

``````EXPLAIN SELECT * FROM test2 WHERE val > 302;
QUERY PLAN
----------------------------------------------------------
Seq Scan on test2  (cost=0.00..150.00 rows=2462 width=8)
Filter: (val > 302)
(2 rows)
``````

``````SELECT sum(s.most_common_freqs[
array_position(s.most_common_vals::text::integer[], v)
])
FROM pg_stats s, unnest(s.most_common_vals::text::integer[]) v
WHERE s.tablename = 'test2' AND s.attname = 'val'
AND v > 302;
sum
--------
0.0237
(1 row)
``````

``````SELECT sum(freq)
FROM pg_stats s, unnest(s.most_common_freqs) freq
WHERE s.tablename = 'test2' AND s.attname = 'val';
sum
--------
0.1442
(1 row)
``````

`val`列中没有 NULL 值：

``````SELECT s.null_frac
FROM pg_stats s
WHERE s.tablename = 'test2' AND s.attname = 'val';
null_frac
-----------
0
(1 row)
``````

``````SELECT round( reltuples * (
0.0237                          -- from most common values
+ (1 - 0.1442 - 0) * (26 / 100.0) -- from histogram
))
FROM pg_class WHERE relname = 'test2';
round
-------
2462
(1 row)
``````

## 当直方图数据过时

``````SELECT '...' || right(histogram_bounds::text, 60) AS histogram_bounds
FROM pg_stats s
WHERE s.tablename = 'test2' AND s.attname = 'id';
histogram_bounds
-----------------------------------------------------------------
...900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}
(1 row)
``````

``````INSERT INTO test2 (id, val)
SELECT i, CAST(random() * 400 AS integer)
FROM generate_series(10001, 11000) AS s(i);

EXPLAIN (analyze) SELECT * FROM test2 WHERE id > 10000;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on test2  (cost=0.00..168.00 rows=1 width=8) (actual time=1.527..1.752 rows=1000 loops=1)
Filter: (id > 10000)
Rows Removed by Filter: 10000
Planning time: 0.115 ms
Execution time: 1.809 ms
(5 rows)
``````

PostgreSQL 优化