PostgreSQL Tutorial: Tune Stats Collector

May 4, 2024

Summary: In this tutorial, you will learn how to tune stats collector in PostgreSQL.

Table of Contents

Although PostgreSQL 15 has improved the cumulative statistics system with shared memory, it is still very important to know how to tune stats collector in previous versions.

Previously the cumulative statistics data was sent to a statistics collector process via UDP packets, and could only be read by sessions after transferring it via the file system. There is no longer a separate statistics collector process in PostgreSQL 15 and newer versions.

Background

If you set appropriate logging level, you can see messages such as the ones below in the PostgreSQL 14 log:

2023-12-19 23:22:03.338 [14641] LOG: parameter "log_min_messages" changed to "debug3"
2023-12-19 23:22:03.341 [14644] DEBUG: checkpointer updated shared memory configuration values
2023-12-19 23:22:03.343 [14648] DEBUG: received inquiry for database 0
2023-12-19 23:22:03.343 [14648] DEBUG: writing stats file "pg_stat_tmp/global.stat"
2023-12-19 23:22:03.344 [14648] DEBUG: writing stats file "pg_stat_tmp/db_0.stat"
2023-12-19 23:23:03.406 [14648] DEBUG: received inquiry for database 0
2023-12-19 23:23:03.406 [14648] DEBUG: writing stats file "pg_stat_tmp/global.stat"
2023-12-19 23:23:03.406 [14648] DEBUG: writing stats file "pg_stat_tmp/db_0.stat"
2023-12-19 23:23:03.418 [16112] DEBUG: InitPostgres
2023-12-19 23:23:03.419 [16112] DEBUG: autovacuum: processing database "postgres"
2023-12-19 23:23:03.419 [14648] DEBUG: received inquiry for database 14486
2023-12-19 23:23:03.419 [14648] DEBUG: writing stats file "pg_stat_tmp/global.stat"
2023-12-19 23:23:03.419 [14648] DEBUG: writing stats file "pg_stat_tmp/db_14486.stat"
2023-12-19 23:23:03.419 [14648] DEBUG: writing stats file "pg_stat_tmp/db_0.stat"

The above messages indicate that the stats collector process writes statistical data into temporary files, resulting in a significant amount of I/O because it needs to gather the information frequently. During high load, this may lead to the message “LOG: using stale statistics instead of current ones because stats collector is not responding”.

Option for PostgreSQL 14 and older

Simply, moving the directory in which the stats files are generated (stats_temp_directory parameter) to a location in RAMFS or TMPFS, which are RAM-based filesystems, can save the IO overhead while making the stats collector much more efficient and accurate, then the autovacuum workers can obtain more fresh statistics to work more efficiently.

On most systems, the default location of stats_temp_directory is within the data directory only.

PostgreSQL documentation also officially advises so to reduce the impact.

From the PostgreSQL documentation

For better performance, stats_temp_directory can be pointed at a RAM-based file system, decreasing physical I/O requirements. When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory so that statistics can be retained across server restarts.

How much space is required?

The current location where stats files are generated can be checked by inspecting the value of stats_temp_directory.

On Red Hat clones, the default location will be within the data directory.

postgres=# show stats_temp_directory ;
 stats_temp_directory 
----------------------
 pg_stat_tmp

and on Debian/Ubuntu, it will be in /var/run/postgresql, for example:

postgres=# show stats_temp_directory ;
          stats_temp_directory           
-----------------------------------------
 /var/run/postgresql/14-main.pg_stat_tmp

Once the location is identified, it is just a matter of checking the size of that directory using du -sh.

Generally, it won’t be higher than a few hundred MBs. The size depends on the number of databases and objects (tables and indexes) within those databases. More importantly, what are the statistics collected, which is controlled by parameters like track_activities, track_activity_query_size, track_commit_timestamp, track_counts, track_functions and track_io_timing as mentioned in PostgreSQL Documentation.

Ramfs or tmpfs?

There are two main RAM-based filesystems: ramfs and tmpfs.

A ramfs can be mounted using a /etc/fstab entry like

ramfs /var/lib/pgsql_stats_ram ramfs size=1G,uid=postgres,gid=postgres 0 0

However, there are a couple of disadvantages. Even if we specify the uid and gid, as shown above, the ramfs will be mounted as root, and we need a script or method to change ownership or grant permission to the “postgres” account under which PostgreSQL will be running.

Another problem with ramfs is that we won’t be able to see the size using the df command. But there is an advantage also: ramfs is dynamically resizable, and it can grow dynamically as needed. This eliminates the problem of estimating the size required. However, ramfs cannot use swap if required, so there is a slight risk of a system hang, especially on those systems with high memory constraints.

Considering the risk and demerits, ramfs is less popular, and tmpfs is the one which is generally used.

Here is a sample /etc/fstab entry for tmpfs:

tmpfs /var/lib/pgsql_stats_tmp tmpfs size=1G,uid=postgres,gid=postgres 0 0

Once this filesystem is mounted, it will appear as a regular filesystem, unlike ramfs:

$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs      1.0G    0  1.0G   0% /var/lib/pgsql_stats_tmp
...

And it is now a matter of instructing PostgreSQL to use this location as the stats_temp_directory. Please remember that this is a dynamic parameter and doesn’t need a PostgreSQL restart.

ALTER SYSTEM SET stats_temp_directory = '/var/lib/pgsql_stats_tmp';

We just need to signal (SIGHUP) the PostgreSQL to reload the new configuration.

select pg_reload_conf();

Another option is to make use of /dev/shm. However, it is recommended to maintain a subdirectory that is owned by the Postgres user account with strict permissions.

Additional note

Taking the stats_temp_directory outside of the data directory needs additional care if we plan to run multiple instances in the same host machine. Each PostgreSQL instance/cluster needs to have its own stats_temp_directory.