pg_profile: PostgreSQL historical workload reports

March 12, 2024

Summary: The pg_profile extension helps you to find out most resource-consuming activities in PostgreSQL databases.

Table of Contents

Concepts

Extension is based on statistic views of PostgreSQL and several contrib extensions. It is written in pure PL/pgSQL and doesn’t need any external libraries or software, but PostgreSQL database itself, and a cron-like tool performing periodic tasks. Initially developed and tested on PostgreSQL 9.6, extension may be incompatible with earlier releases.

pg_profile will use pg_stat_kcache data, providing information about statements CPU usage and filesystem load (if this extension is available).

Historic repository will be created in your database by this extension. This repository will hold statistic “samples” for your PostgreSQL clusters. Sample is taken by calling take_sample() function. PostgreSQL doesn’t have any job-like engine, so you’ll need to use cron.

Periodic samples can help you finding most resource intensive activities in the past. Suppose, you were reported performance degradation several hours ago. Resolving such issue, you can build a report between two samples bounding performance issue period to see the load profile of your database. You can use provided grafana dashboard to locate the exact time when performance issues was happening.

You can take an explicit sample before running any batch processing, and after it will be done.

Any time you are taking a sample, pg_stat_statements_reset() will be called, ensuring you wont loose statements due to reaching pg_stat_statements.max. Also, report will contain section, informing you if captured statements count in any sample reaches 90% of pg_stat_statements.max. Resetting of pg_stat_statements data will affect monitoring systems.

If pg_wait_sampling extension is available, pg_profile will collect summary wait statistics of the PostgreSQL instance and reset the pg_wait_sampling_profile view data. This behaviour can also affect monitoring systems.

pg_profile, installed in one cluster is able to collect statistics from other clusters, called servers. You just need to define some servers, providing names and connection strings and make sure connection can be established to all databases of all servers. Now you can track, for example, statistics on your standbys from master, or from any other server. Once extension is installed, a local server is automatically created - this is a server for cluster where pg_profile resides.

Extension architecture

Extension consists of four parts:

Historic repository is a storage for sampling data. Repository is a set of extension tables.
Sample management engine is a set of functions, used for taking samples and support repository by removing obsolete sample data from it.
Report engine is a set of functions used for report generation based on data from historic repository.
Administrative functions allows you to create and manage servers and baselines.

Prerequisites

Although pg_profile is usually installed in the target cluster, it also can collect performance data from other clusters. Hence, we have prerequisites for pg_profile database, and for servers.

pg_profile database prerequisites

pg_profile extension depends on extensions plpgsql and dblink.

Servers prerequisites

The only mandatory requirement for server cluster is the ability to connect from pg_profile database using provided server connection string. All other requirements are optional, but they can improve completeness of gathered statistics.

Consider setting following Statistics Collector parameters:

track_activities = on
track_counts = on
track_io_timing = on
track_wal_io_timing = on      # Since PostgreSQL 14
track_functions = all/pl

If you need statement statistics in reports, then database, mentioned in server connection string must have pg_stat_statements extension configured. Set pg_stat_statements parameters to meet your needs (see PostgreSQL documentation):

pg_stat_statements.max - low setting for this parameter may cause some statements statistics to be wiped out before sample is taken. Report will warn you if your pg_stat_statements.max is seems to be undersized.
pg_stat_statements.track = ’top’ - before PostgreSQL 14 all value will affect accuracy of %Total fields for statements-related sections of a report.

Installation

Step 1 Installation of extension files

Extract extension files (see Releases page) to PostgreSQL extensions location, which is

# tar xzf pg_profile-<version>.tar.gz --directory $(pg_config --sharedir)/extension

Just make sure you are using appropriate pg_config utility.

Step 2 Creating extensions

The most easy way is to install everything in public schema of a database:

postgres=# CREATE EXTENSION dblink;
postgres=# CREATE EXTENSION pg_stat_statements;
postgres=# CREATE EXTENSION pg_profile;

If you want to install pg_profile in other schema, just create it, and install extension in that schema:

postgres=# CREATE EXTENSION dblink;
postgres=# CREATE EXTENSION pg_stat_statements;
postgres=# CREATE SCHEMA profile;
postgres=# CREATE EXTENSION pg_profile SCHEMA profile;

All objects will be created in schema, defined by SCHEMA clause. Installation in dedicated schema is the recommended way - the extension will create its own tables, views, sequences and functions. It is a good idea to keep them separate. If you don’t want to specify schema qualifier when using module, consider changing search_path setting.

Step 3 Update to new version

New versions of pg_profile will contain migration script (when possible). So, in case of update you will only need to install extension files (see Step 1) and update the extension, like this:

postgres=# ALTER EXTENSION pg_profile UPDATE;

Privileges

Up to three roles can be involved while pg_profile is in action:

The pg_profile owner is the owner of pg_profile extension.
The collecting role is used by pg_profile while connecting to databases.
The reporting role is used to generate reports. If you use the superuser role postgres to perform all actions with pg_profile, you can skip the most of the following steps.

The pg_profile owner

This role can be used to perform all actions related to pg_profile extension. This role will have access to server connection strings, which may contain a passwords. You should use this role to run take_sample() function. Dblink extension is needed for this user. Complicated example assuming each extension in own schema:

\c postgres postgres
CREATE SCHEMA dblink;
CREATE EXTENSION dblink SCHEMA dblink;
CREATE USER profile_usr with password 'profile_pwd';
GRANT USAGE ON SCHEMA dblink TO profile_usr;
CREATE SCHEMA profile AUTHORIZATION profile_usr;
\c postgres profile_usr
CREATE EXTENSION pg_profile SCHEMA profile;

The collecting role

This role should be used by pg_profile for connecting to databases and performing statistics collection. Unprivileged users can’t open connections using the dblink extension without a password, so you have to provide it in the connection string for each server. This role should have access to all supported statistics extensions. Also it should be able to perform a reset of statistic extensions. The most complicated example:

\c postgres postgres
CREATE SCHEMA pgss;
CREATE SCHEMA pgsk;
CREATE SCHEMA pgws;
CREATE EXTENSION pg_stat_statements SCHEMA pgss;
CREATE EXTENSION pg_stat_kcache SCHEMA pgsk;
CREATE EXTENSION pg_wait_sampling SCHEMA pgws;
CREATE USER profile_collector with password 'collector_pwd';
GRANT pg_read_all_stats TO profile_collector;
GRANT USAGE ON SCHEMA pgss TO profile_collector;
GRANT USAGE ON SCHEMA pgsk TO profile_collector;
GRANT USAGE ON SCHEMA pgws TO profile_collector;
GRANT EXECUTE ON FUNCTION pgsk.pg_stat_kcache_reset TO profile_collector;
GRANT EXECUTE ON FUNCTION pgss.pg_stat_statements_reset TO profile_collector;
GRANT EXECUTE ON FUNCTION pgws.pg_wait_sampling_reset_profile TO profile_collector;

Now you should setup a connection string pointing to the database with statistics extensions installed:

\c postgres profile_usr
SELECT profile.set_server_connstr('local','dbname=postgres port=5432 host=localhost user=profile_collector password=collector_pwd');

The password authentication must be configured in the pg_hba.conf file for the profile_collector user.

Obviously, the collecting role should be configured properly on all servers observed by pg_profile.

You should be able to perform a take_sample() call now using the profile_usr role:

\c postgres profile_usr
SELECT * FROM take_sample();

It is time to configure the scheduler (in our example crontab of postgres user):

*/30 * * * *   psql -U profile_usr -d postgres -c 'SELECT profile.take_sample()' > /dev/null 2>&1

Note that you can use the postgres password file to store passwords.

The reporting role

Any postgres user can build a pg_profile report. The minimal privileges needed to generate a pg_profile reports are granted to the public role. However full report with query texts available only to the member of pg_read_all_stats role. Anyway the reporting role can’t access server connection strings, so it can’t get the passwords of servers.