由 John Doe 三月 9, 2026
pg_ash 可以用于生成 Postgres 的活动会话历史报告,轻量级等待事件采样,无数据膨胀。

纯 SQL + PL/pgSQL 实现,可在任何 Postgres 14 及以上版本运行,包括各类云托管服务。无需 C 扩展、无需配置 shared_preload_libraries、无需云厂商审批、无需重启数据库。只需执行 \i 导入即可使用。
为什么需要它
Postgres 没有内置会话历史。一小时前出现的性能慢查询,事后没有任何数据可查。pg_ash 每秒采样一次 pg_stat_activity,并以紧凑格式存储结果,可直接用普通 SQL 查询。
与同类工具对比
pg_ash
- 安装:纯 SQL 脚本
- 托管实例:支持 RDS、Cloud SQL、Supabase 等
- 采样频率:1 秒(pg_cron)
- 数据可见性:数据库内部
- 存储:磁盘,约 30MiB / 天
- 历史查询:支持,持久化
- 实现:纯 SQL
- 维护开销:无
- 依赖:pg_cron 1.5+
pg_wait_sampling
- 安装:需 shared_preload_libraries
- 托管实例:仅 Cloud SQL(2026 初)
- 采样频率:10 毫秒(进程内)
- 数据可见性:数据库内部
- 存储:仅内存
- 历史查询:环形缓冲区,重启丢失
- 实现:C 扩展,非纯 SQL
- 维护开销:无
- 依赖:shared_preload_libraries,需重启
pgsentinel
- 安装:需 shared_preload_libraries(打包 / 编译)
- 托管实例:暂不支持
- 采样频率:10 毫秒(进程内)
- 数据可见性:数据库内部
- 存储:仅内存
- 历史查询:环形缓冲区,重启丢失
- 实现:C 扩展,非纯 SQL
- 维护开销:无
- 依赖:shared_preload_libraries,需重启
外部采样工具
- 安装:独立基础设施
- 托管实例:支持但需额外配置
- 采样频率:15-60 秒
- 数据可见性:仅外部
- 存储:外部存储
- 历史查询:取决于配置
- 实现:非纯 SQL
- 维护开销:高
- 依赖:代理 + 存储
使用示例
查看状态
select * from ash.status();
metric | value
----------------------------+-------------------------------
version | 1.2
current_slot | 0
sample_interval | 00:00:01
rotation_period | 1 day
include_bg_workers | false
samples_in_current_slot | 56
last_sample_ts | 2026-02-16 08:39:03+00
wait_event_map_count | 11
wait_event_map_utilization | 0.03%
query_map_count | 8
pg_cron_available | no
最近性能问题概览
-- morning coffee: what happened overnight?
select * from ash.activity_summary('5 minutes');
metric | value
----------------------+---------------------------------------------------------------------------------------------
time_range | 00:05:00
total_samples | 56
avg_active_backends | 6.6
peak_active_backends | 10
peak_time | 2026-02-16 08:38:16+00
databases_active | 1
top_wait_1 | Client:ClientRead (46.77%)
top_wait_2 | Timeout:PgSleep (11.83%)
top_wait_3 | Lock:transactionid (9.68%)
top_query_1 | -2835399305386018931 — COMMIT (29.73%)
top_query_2 | 3365820675399133794 — UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE b (23.24%)
top_query_3 | -4378765880691287891 — UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE t (11.35%)
-- top wait events (default: top 10 + Other)
select * from ash.top_waits('5 minutes');
wait_event | samples | pct | bar
--------------------+---------+-------+---------------------------------------
Client:ClientRead | 174 | 46.77 | ██████████████████████████████ 46.77%
Timeout:PgSleep | 44 | 11.83 | ████████ 11.83%
Lock:transactionid | 36 | 9.68 | ██████ 9.68%
CPU* | 35 | 9.41 | ██████ 9.41%
LWLock:WALWrite | 31 | 8.33 | █████ 8.33%
IdleTx | 26 | 6.99 | ████ 6.99%
IO:WalSync | 19 | 5.11 | ███ 5.11%
Lock:tuple | 5 | 1.34 | █ 1.34%
LWLock:LockManager | 2 | 0.54 | █ 0.54%
-- top queries with text from pg_stat_statements
select * from ash.top_queries_with_text('5 minutes', 5);
query_id | samples | pct | calls | total_exec_time_ms | mean_exec_time_ms | query_text
----------------------+---------+-------+--------+--------------------+-------------------+---------------------------------------------------------------------
-2835399305386018931 | 110 | 29.73 | 283202 | 1234.56 | 0.00 | commit
3365820675399133794 | 86 | 23.24 | 283195 | 518349.35 | 1.83 | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2
5457019535816659310 | 44 | 11.89 | 11 | 195225.25 | 17747.75 | select pg_sleep($1)
-4378765880691287891 | 42 | 11.35 | 283195 | 113278.00 | 0.40 | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
-- breakdown by wait event type
select * from ash.top_by_type('5 minutes');
wait_event_type | samples | pct | bar
-----------------+---------+-------+-------------------------------------------------
Client | 174 | 46.77 | ████████████████████████████████████████ 46.77%
Timeout | 44 | 11.83 | ██████████ 11.83%
Lock | 41 | 11.02 | █████████ 11.02%
CPU* | 35 | 9.41 | ████████ 9.41%
LWLock | 33 | 8.87 | ████████ 8.87%
IdleTx | 26 | 6.99 | ██████ 6.99%
IO | 19 | 5.11 | ████ 5.11%
分析特定查询
-- what is query 3365820675399133794 waiting on?
select * from ash.query_waits(3365820675399133794, '5 minutes');
wait_event | samples | pct | bar
--------------------+---------+-------+-------------------------------------------------
Client:ClientRead | 32 | 54.24 | ████████████████████████████████████████ 54.24%
Lock:transactionid | 12 | 20.34 | ███████████████ 20.34%
LWLock:WALWrite | 6 | 10.17 | ████████ 10.17%
CPU* | 4 | 6.78 | █████ 6.78%
IO:WalSync | 3 | 5.08 | ████ 5.08%
IdleTx | 2 | 3.39 | ██ 3.39%
-- same, but during a specific time window
select * from ash.query_waits_at(3365820675399133794, '2026-02-16 08:38', '2026-02-16 08:40');
跟踪某个等待事件
-- which queries are stuck on Lock:transactionid?
select * from ash.event_queries('Lock:transactionid', '1 hour');
-- or by wait type (matches all events of that type)
select * from ash.event_queries('IO', '1 hour');
查看原始采样
-- see the last 20 decoded samples with query text
select * from ash.samples('10 minutes', 20);
sample_time | database_name | active_backends | wait_event | query_id | query_text
------------------------+---------------+-----------------+--------------------+----------------------+--------------------------------------------------------------
2026-02-16 11:18:51+00 | postgres | 7 | CPU* | -2835399305386018931 | END
2026-02-16 11:18:51+00 | postgres | 7 | CPU* | 3365820675399133794 | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE ...
2026-02-16 11:18:49+00 | postgres | 5 | Client:ClientRead | 9144568883098003499 | SELECT abalance FROM pgbench_accounts WHERE aid = $1
2026-02-16 11:18:49+00 | postgres | 5 | IO:WalSync | -2835399305386018931 | END
2026-02-16 11:18:49+00 | postgres | 3 | Lock:transactionid | -2835399305386018931 | END
2026-02-16 11:18:49+00 | postgres | 5 | LWLock:WALWrite | -2835399305386018931 | END
-- raw samples during an incident
select * from ash.samples_at('2026-02-14 03:00', '2026-02-14 03:05', 50);
定位故障
使用带绝对时间戳的_at后缀函数,可聚焦分析指定的时间窗口:
-- what happened between 3:00 and 3:10 am?
select * from ash.top_waits_at('2026-02-14 03:00', '2026-02-14 03:10');
-- which queries were running during the incident?
select * from ash.top_queries_at('2026-02-14 03:00', '2026-02-14 03:10');
-- minute-by-minute timeline of the incident
select * from ash.wait_timeline_at(
'2026-02-14 03:00',
'2026-02-14 03:10',
'1 minute'
);
依赖要求
- Postgres 14 及以上版本(要求
pg_stat_activity视图中包含query_id字段) - pg_cron 1.5 及以上版本(用于支持分钟级以下的调度)
- pg_stat_statements(可选组件,启用后可在
top_queries_with_text()和event_queries()函数中显示查询文本、calls(调用次数)、total_exec_time_ms(总执行时间,毫秒)、mean_exec_time_ms(平均执行时间,毫秒);不安装该组件时,其他所有函数仍可正常使用)
关于 query_id 的说明:默认配置 compute_query_id = auto,仅在 pg_stat_statements 被加入 shared_preload_libraries 时,才会填充 query_id 字段。如果发现 pg_stat_activity 中的 query_id 字段值为 NULL,请执行以下设置:
alter system set compute_query_id = 'on';
-- requires reload: select pg_reload_conf();
参考
pg_ash:https://github.com/NikolayS/pg_ash