pg_profile: Usage

March 9, 2024

Summary: in this section, you will learn how to use pg_profile in PostgreSQL.

This section contains the following subsections:

  1. Setting extension parameters
  2. Managing servers
  3. Rare relation sizes collection
  4. Samples
  5. Baselines
  6. Data export and import
  7. Reports

Setting extension parameters

You can define extension parameters in postgresql.conf. Default values:

  • pg_profile.topn = 20 - Number of top objects (statements, relations, etc.) to be reported in each sorted report table. This parameter also affects sample size: the more objects you want to appear in your report, the more objects must be kept in each sample.
  • pg_profile.max_sample_age = 7 - Sample retention time in days. Samples aged pg_profile.max_sample_age days or more are automatically deleted on the next take_sample() call.
  • pg_profile.track_sample_timings = off - When this parameter is on, pg_profile tracks detailed sample-taking timings.
  • pg_profile.max_query_length = 20000 - Query length limit for reports. All query texts in a report are truncated to this length. This setting does not affect query text collection: full query texts are collected during a sample and can still be obtained.
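For example, these parameters could be adjusted in postgresql.conf like so (a sketch; values other than the defaults are illustrative):

```ini
# pg_profile settings in postgresql.conf (illustrative values)
pg_profile.topn = 30                  # report top 30 objects per sorted table
pg_profile.max_sample_age = 14        # keep samples for two weeks
pg_profile.track_sample_timings = on  # collect sample-taking timings
pg_profile.max_query_length = 20000   # default query length limit
```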

Managing servers

Once installed, the extension creates one enabled server named local - it represents the cluster where the extension is installed.

Servers management functions:

  • create_server(server name, server_connstr text, server_enabled boolean = TRUE, max_sample_age integer = NULL, description text = NULL) Creates a new server description. Function arguments:

    • server - server name (must be unique)
    • server_connstr - server connection string
    • server_enabled - server enable flag. When set, the server is included in the common take_sample() call
    • max_sample_age - per-server sample retention setting; overrides the global pg_profile.max_sample_age setting for this server
    • description - server description text; will be included in reports
  • drop_server(server name) Drops a server and all its samples.

  • enable_server(server name) Includes the server in the common take_sample() call.

  • disable_server(server name) Excludes the server from the common take_sample() call.

  • rename_server(server name, new_name name) Renames a server.

  • set_server_max_sample_age(server name, max_sample_age integer) Sets a new retention period (in days) for a server. Set max_sample_age to NULL to reset the server setting to the global default.

  • set_server_db_exclude(server name, exclude_db name[]) Sets the excluded-databases list for a server. Useful when you are unable to connect to some databases in the cluster (for example, on RDS instances).

  • set_server_connstr(server name, new_connstr text) Sets a new connection string for a server.

  • set_server_description(server name, description text) Sets a new server description.

  • show_servers() Displays existing servers.

Server creation example:

SELECT profile.create_server('omega','host=name_or_ip dbname=postgres port=5432');
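A few more management calls might look like this (a sketch; the 'omega' server and the 'rdsadmin' database name are illustrative):

```sql
-- Exclude a database we cannot connect to (e.g. on an RDS instance)
SELECT profile.set_server_db_exclude('omega', ARRAY['rdsadmin']::name[]);

-- Temporarily exclude the server from the common take_sample() call
SELECT profile.disable_server('omega');

-- Re-enable it later and verify the configuration
SELECT profile.enable_server('omega');
SELECT * FROM profile.show_servers();
```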

Rare relation sizes collection

Postgres relation size functions may take a considerable amount of time to collect the sizes of all relations in a database, and they may block on relations locked by concurrent operations. However, daily relation size collection may be quite enough for you. pg_profile can skip relation size collection while taking samples, guided by a per-server size collection policy. A policy is defined as a daily window when relation size collection is permitted, plus a minimal gap between two samples with relation sizes collected. When a size collection policy is defined, the sample-taking function collects relation sizes only when the sample is taken within the window and the previous sample with sizes is older than the gap. The top growing tables/indexes report sections are available in a report only if it is bounded by samples with sizes collected. See the with_growth parameter of the get_report function description for further reference. Function set_server_size_sampling defines the size collection policy:

  • set_server_size_sampling(server name, window_start time with time zone = NULL, window_duration interval hour to second = NULL, sample_interval interval day to minute = NULL)
    • server - server name
    • window_start - size collection window start time
    • window_duration - size collection window duration
    • sample_interval - minimum time gap between two samples with relation sizes collected

Example:

SELECT set_server_size_sampling('local','23:00+03',interval '2 hour',interval '8 hour');

Function show_servers_size_sampling() shows the defined size collection policy for all servers:

postgres=# SELECT * FROM show_servers_size_sampling();
 server_name | window_start | window_end  | window_duration | sample_interval
-------------+--------------+-------------+-----------------+-----------------
 local       | 23:00:00+03  | 01:00:00+03 | 02:00:00        | 08:00:00

When you build a report between samples either of which lacks relation size data, the relation growth sections of the report will be based on pg_class.relpages data. However, the with_growth parameter of the report generation functions expands the report bounds to the nearest samples with relation size data collected, making the growth data more accurate.

Samples

Every sample contains statistics on the database workload since the previous sample.

Sample functions

  • take_sample() Function take_sample() collects a sample for all enabled servers. Server samples are taken serially, one by one. The function returns a table:

    server      name,
    result      text,
    elapsed     interval
    

    Where:

    • server is a server name
    • result is the result of the taken sample. It is ‘OK’ if the sample was taken successfully, or the error text in case of an exception
    • elapsed is the time elapsed taking the sample for the server. This return format makes it easy to monitor sample creation using an SQL query.
  • take_sample_subset([sets_cnt integer], [current_set integer]) Because samples are processed serially, take_sample() can take a considerable amount of time. Function take_sample_subset() takes samples for a subset of the enabled servers, which is convenient for parallel sample collection. sets_cnt is the number of server subsets; current_set is the subset to process, taking values between 0 and sets_cnt - 1 inclusive. Function returns a table:

    server      name,
    result      text,
    elapsed     interval
    

    Where:

    • server is a server name
    • result is the result of the taken sample. It is ‘OK’ if the sample was taken successfully, or the error text in case of an exception
    • elapsed is a time elapsed taking a sample for server
  • take_sample(server name [, skip_sizes boolean]) Collects a sample for the specified server. Use it, for example, when you want different sampling frequencies per server, or when you want to take an explicit sample on a server.

    • server - name of a server to take sample
    • skip_sizes - overrides the server relation size collection policy. The policy applies only when the skip_sizes argument is omitted or set to NULL. A false value of skip_sizes causes the sample to be taken with relation sizes, while true causes relation size collection to be skipped during the sample.
  • show_samples([server name,] [days integer]) Returns a table containing the existing samples of a server (the local server is assumed if server is omitted) for the last days days (all existing samples if omitted):

    sample            integer,
    sample_time       timestamp (0) with time zone,
    sizes_collected   boolean,
    dbstats_reset     timestamp (0) with time zone,
    clustats_reset    timestamp (0) with time zone,
    archstats_reset   timestamp (0) with time zone
    

    Where:

    • sample is a sample identifier
    • sample_time is a time when this sample was taken
    • sizes_collected is set if all relation sizes were collected in this sample
    • dbstats_reset, clustats_reset and archstats_reset are usually NULL, but will contain the pg_stat_database, pg_stat_bgwriter and pg_stat_archiver statistics reset timestamps if a reset happened since the previous sample. Sample-collecting functions also maintain the server repository - they delete obsolete samples and baselines according to the retention policy.
  • delete_samples([server_name name] [, start_id integer, end_id integer]) Function delete_samples() can be used to manually delete samples of specified server.

    • server_name - name of the server to delete samples from. The server named 'local' is used if omitted
    • start_id, end_id - bounding identifiers of the sample interval to delete (inclusive). The function deletes all samples of a server if the bounding identifiers are omitted
  • delete_samples(server_name name, time_range tstzrange) Use this function to delete all samples within the specified time range

    • server_name - name of a server to delete samples.
    • time_range - time range of samples to delete

Taking samples

You must take at least two samples to be able to build your first report (between the 1st and 2nd samples). Samples for all enabled servers are taken by calling the take_sample() function. There is no need for frequent sample creation - one or two samples per hour is usually sufficient. You can use a cron-like tool to schedule sample creation. Example with a 30-minute period:

*/30 * * * *   psql -c 'SELECT profile.take_sample()' > /dev/null 2>&1
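With many servers, serial sampling may take too long; a hypothetical crontab sketch splitting collection into two parallel subsets with take_sample_subset():

```
*/30 * * * *   psql -c 'SELECT profile.take_sample_subset(2,0)' > /dev/null 2>&1
*/30 * * * *   psql -c 'SELECT profile.take_sample_subset(2,1)' > /dev/null 2>&1
```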

However, such a call performs no error checking on the take_sample() results. Consider a smarter use of the take_sample() function that feeds its results to your monitoring system.

The function returns ‘OK’ for all servers with successfully taken samples, and shows the error text for failed servers:

select * from take_sample();
  server   |                                   result                                    |   elapsed
-----------+-----------------------------------------------------------------------------+-------------
 ok_node   | OK                                                                          | 00:00:00.48
 fail_node | could not establish connection                                             +| 00:00:00
           | SQL statement "SELECT dblink_connect('server_connection',server_connstr)"  +|
           | PL/pgSQL function take_sample(integer) line 69 at PERFORM                  +|
           | PL/pgSQL function take_sample_subset(integer,integer) line 27 at assignment+|
           | SQL function "take_sample" statement 1                                     +|
           | FATAL:  database "nodb" does not exist                                      |
(2 rows)
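A minimal error-checking sketch: select only the failed servers, so a monitoring job can alert on a non-empty result:

```sql
-- Returns a row per failed server; an empty result means all samples succeeded
SELECT server, result
FROM profile.take_sample()
WHERE result != 'OK';
```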

Sample data retention

We can’t store sample data forever, thus we have a retention policy. You can define retentions on three levels:

  • Setting the pg_profile.max_sample_age parameter in the postgresql.conf file. This is the common retention; it is effective if none of the others is defined.
  • Defining the server max_sample_age setting while creating a server, or using the set_server_max_sample_age() function for an existing server. This setting overrides the global pg_profile.max_sample_age setting for a specific server.
  • Creating a baseline (see below). A baseline overrides the retention period for the samples it includes, with the highest priority.

Listing samples

Use the show_samples() function to get a list of existing samples in the repository. This function also shows detected statistics reset times.

Sample taking timings

pg_profile collects detailed sample-taking timing statistics when the pg_profile.track_sample_timings parameter is on. The results can be obtained from the v_sample_timings view. v_sample_timings fields description:

  • server_name - sampled server name
  • sample_id - sample identifier
  • sample_time - when the sample was taken
  • event - sample taking stage
  • time_spent - amount of time spent in the event

Event descriptions:

  • total - Taking the sample (all stages).
  • connect - Making dblink connection to the server.
  • get server environment - Getting server GUC parameters, available extensions, etc.
  • collect database stats - Querying the pg_stat_database view for statistics on databases.
  • calculate database stats - Calculating differential statistics on databases since the previous sample.
  • collect tablespace stats - Querying the pg_tablespace view for statistics on tablespaces.
  • collect statement stats - Collecting statistics on statements using the pg_stat_statements and pg_stat_kcache extensions.
  • query pg_stat_bgwriter - Collecting cluster statistics using the pg_stat_bgwriter view.
  • query pg_stat_wal - Collecting cluster WAL statistics using the pg_stat_wal view.
  • query pg_stat_archiver - Collecting cluster statistics using the pg_stat_archiver view.
  • collect object stats - Collecting statistics on database objects. Includes the following events:
    • db:dbname collect tables stats - Collecting statistics on tables for the dbname database.
      • db:dbname collect limited table sizes - Collecting vacuumed or sequentially scanned table sizes when sizes collection is disabled.
    • db:dbname collect indexes stats - Collecting statistics on indexes for the dbname database.
      • db:dbname collect limited index sizes - Collecting index sizes for vacuumed tables when sizes collection is disabled.
    • db:dbname collect functions stats - Collecting statistics on functions for the dbname database.
  • maintain repository - Executing support routines.
  • calculate tablespace stats - Calculating differential statistics on tablespaces.
  • calculate object stats - Calculating differential statistics on database objects. Includes the following events:
    • calculate tables stats - Calculating differential statistics on tables of all databases.
    • calculate indexes stats - Calculating differential statistics on indexes of all databases.
    • calculate functions stats - Calculating differential statistics on functions of all databases.
  • calculate cluster stats - Calculating cluster differential statistics.
  • calculate archiver stats - Calculating archiver differential statistics.
  • delete obsolete samples - Deleting obsolete baselines and samples.
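For example, the slowest stages of a particular sample could be found with a query like this (a sketch assuming pg_profile.track_sample_timings is on; the sample selection logic is illustrative):

```sql
-- Top 5 most expensive stages of the latest sample on the local server
SELECT event, time_spent
FROM profile.v_sample_timings
WHERE server_name = 'local'
  AND sample_id = (SELECT max(sample_id)
                   FROM profile.v_sample_timings
                   WHERE server_name = 'local')
ORDER BY time_spent DESC
LIMIT 5;
```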

Baselines

A baseline is a named sample sequence with its own retention setting. A baseline can be used in report-building functions as a sample interval. An undefined baseline retention means infinite retention. You can use baselines to save information about the database workload during a certain time period. For example, you may want to save samples gathered during load testing, or during regular load on your system, for further reference. Baseline management functions:

  • create_baseline([server name,] baseline_name varchar(25), start_id integer, end_id integer [, days integer]) - create a baseline

    • server - server name. local server is assumed if omitted
    • name - baseline name. Each baseline must have unique name within a server.
    • start_id, end_id - first and last samples, included in baseline.
    • days - baseline retention time. Defined in integer days since now(). This parameter may be omitted (or set to null), meaning infinite retention.
  • create_baseline([server name,] baseline_name varchar(25), time_range tstzrange [, days integer]) - create a baseline

    • server - server name. local server is assumed if omitted
    • name - baseline name. Each baseline must have unique name within a server.
    • time_range - baseline time interval. Baseline will include all available samples, overlapping this interval.
    • days - baseline retention time. Defined in integer days since now(). This parameter may be omitted (or be set to null), meaning infinite retention.
  • drop_baseline([server name,] name varchar(25)) - drop a baseline

    • server - server name. local server is assumed if omitted
    • name - baseline name to drop. Dropping a baseline does not mean an immediate drop of all its samples; they are just excluded from the baseline and thus no longer covered by the baseline retention.
  • keep_baseline([server name,] name varchar(25) [, days integer]) - change retention of baselines

    • server - server name. local server is assumed if omitted
    • name - baseline name. This parameter may be omitted (or be set to null) to change retention of all existing baselines.
    • days - retention time of a baseline in days since now(). Also, may be omitted (or set to null) to set infinite retention.
  • show_baselines([server name]) - displays existing baselines. Call this function to get information about existing baselines (names, sample intervals, and retention times)

    • server - server name. local server is assumed if omitted
    Example:

    postgres=# SELECT * FROM profile.show_baselines('local');
    
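A baseline lifecycle sketch (the sample identifiers and the 'load_test' name are illustrative):

```sql
-- Keep samples 120..125 (say, a load-test window) for 30 days
SELECT profile.create_baseline('local', 'load_test', 120, 125, 30);

-- Later, make the baseline permanent by removing its retention
SELECT profile.keep_baseline('local', 'load_test', NULL);

-- When no longer needed, release the samples back to normal retention
SELECT profile.drop_baseline('local', 'load_test');
```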

Data export and import

Collected samples can be exported from one instance of the pg_profile extension and then loaded into another. This feature is useful when you want to move servers from one instance to another, or when you need to send collected data to your support team.

Data export

Data is exported as a regular table by the export_data() function. You can use any method to export this table from your database. For example, you can use the \copy command of psql to obtain a single .csv file:

postgres=# \copy (select * from export_data()) to 'export.csv'

By default, the export_data() function exports all samples of all configured servers. However, you can limit the export to a single server, and further limit the sample range too:

  • export_data([server name, [min_sample_id integer,] [max_sample_id integer]] [, obfuscate_queries boolean]) - export collected data
    • server is a server name. All servers are assumed if omitted
    • min_sample_id and max_sample_id - bounding sample identifiers for the export (inclusive). A NULL min_sample_id causes export of all samples up to max_sample_id, and a NULL max_sample_id causes export of all samples since min_sample_id.
    • obfuscate_queries - use this parameter when you want to hide query texts; they will be exported as MD5 hashes.
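For instance, a limited export with query obfuscation might look like this (a sketch; the server name and sample identifiers are illustrative, with argument order as in the signature above):

```
postgres=# \copy (SELECT * FROM export_data('omega', 100, 200, true)) TO 'omega_export.csv'
```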

Data import

Data can be imported from a local table only, so previously exported data needs to be loaded first - in our case, with the \copy command:

postgres=# CREATE TABLE import (section_id bigint, row_data json);
CREATE TABLE
postgres=# \copy import from 'export.csv'
COPY 6437

Now we can perform data import, providing this table to import_data() function:

postgres=# SELECT * FROM import_data('import');

Although server descriptions are also imported, local pg_profile servers with matching names will block the import operation. Consider temporarily renaming those servers, or use the server_name_prefix parameter to avoid conflicts. If you need to import new data for previously imported servers, they will be matched by system identifier, so feel free to rename an imported server as you wish. All servers are imported in a disabled state. The import_data() function needs only the table with the imported data:

  • import_data(data regclass, server_name_prefix text)
    • data - table containing exported data
    • server_name_prefix - prefix for the names of imported servers. The function returns the number of rows actually loaded into the extension tables. After a successful import operation you can drop the import table.

Reports

Reports are generated in HTML markup by reporting functions. There are two types of reports available in pg_profile:

  • Regular reports, containing statistical information about instance workload during report interval
  • Differential reports, containing data from two intervals with statistic values of the same objects placed side by side, making it easy to compare the workload

Regular report functions

  • get_report([server name,] start_id integer, end_id integer [, description text [, with_growth boolean]]) - generate report by sample identifiers
  • get_report([server name,] time_range tstzrange [, description text [, with_growth boolean]]) - generate report for the shortest sample interval, covering provided time_range.
  • get_report([server name], baseline varchar(25) [, description text [, with_growth boolean]]) - generate report, using baseline as samples interval
  • get_report_latest([server name]) - generate a report for the two latest samples

    Function arguments:
    • server - server name. local server is assumed if omitted
    • start_id - interval begin sample identifier
    • end_id - interval end sample identifier
    • time_range - time range (tstzrange type)
    • baseline - a baseline name
    • with_growth - a flag, requesting interval expansion to nearest bounds with relation growth data available. Default value is false
    • description - a text memo, it will be included in report as a report description

Differential report functions

You can generate differential report using sample identifiers, baselines and time ranges as interval bounds:

  • get_diffreport([server name,] start1_id integer, end1_id integer, start2_id integer, end2_id integer [, description text [, with_growth boolean]]) - generate differential report on two intervals by sample identifiers
  • get_diffreport([server name,] baseline1 varchar(25), baseline2 varchar(25) [, description text [, with_growth boolean]]) - generate differential report on two intervals, defined by baseline names
  • get_diffreport([server name,] time_range1 tstzrange, time_range2 tstzrange [, description text [, with_growth boolean]]) - generate differential report on two intervals, defined by time ranges
    • server is the server name. The local server is assumed if omitted
    • start1_id, end1_id - sample identifiers of the first interval
    • start2_id, end2_id - sample identifiers of the second interval
    • baseline1 - baseline name of the first interval
    • baseline2 - baseline name of the second interval
    • time_range1 - first interval time range
    • time_range2 - second interval time range
    • description is a text memo; it will be included in the report as the report description
    • with_growth is a flag requesting expansion of both intervals to the nearest bounds with relation growth data available. Default value is false

Also, you can use some combinations of the above:

  • get_diffreport([server name,] baseline varchar(25), time_range tstzrange [, description text [, with_growth boolean]])
  • get_diffreport([server name,] time_range tstzrange, baseline varchar(25) [, description text [, with_growth boolean]])
  • get_diffreport([server name,] start1_id integer, end1_id integer, baseline varchar(25) [, description text [, with_growth boolean]])
  • get_diffreport([server name,] baseline varchar(25), start2_id integer, end2_id integer [, description text [, with_growth boolean]])

Report generation example:

$ psql -Aqtc "SELECT profile.get_report(480,482)" -o report_480_482.html

For any other server, use its name:

$ psql -Aqtc "SELECT profile.get_report('omega',12,14)" -o report_omega_12_14.html

Report generation using time ranges:

psql -Aqtc "select profile.get_report(tstzrange('2020-05-13 11:51:35+03','2020-05-13 11:52:18+03'))" -o report_range.html

Also, time ranges are useful for generating periodic reports. Let’s build a report for the last 24 hours:

psql -Aqtc "select profile.get_report(tstzrange(now() - interval '1 day',now()))" -o last24h_report.html
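A hypothetical cron entry generating such a 24-hour report every night (the output path is illustrative; note that % must be escaped in crontab command fields):

```
5 0 * * *   psql -Aqtc "SELECT profile.get_report(tstzrange(now() - interval '1 day', now()))" -o /var/reports/daily_$(date +\%F).html
```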

Now you can view report file using any web browser.

See more

pg_profile: PostgreSQL historical workload reports
