By John Doe December 27, 2024
Summary: In this article, we will learn how to troubleshooting replication in PostgreSQL.
Table of Contents
Introduction
Postgres replication is a powerful tool for replicating data between databases; unlike physical replication, logical replication gives you more control and flexibility over what data is replicated and how it’s used.
This article explores queries that make it easier to manage logical replication for your PostgreSQL database.
Monitoring Postgres Logical Replication Status
Monitoring the status of your logical replication setup is critical to ensure that your replication is running smoothly. Querying the pg_stat_subscription view can help you monitor the status of all of the subscriptions in your database:
SELECT
subname AS subscription_name,
pid AS process_id,
usename AS user_name,
application_name,
client_addr AS client_address,
state,
sync_state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
clock_timestamp() - write_lsn_timestamp AS replication_delay
FROM
pg_stat_subscription
ORDER BY
subscription_name;
subscription_name | process_id | user_name | application_name | client_address | state | sync_state | sent_lsn | write_lsn | flush_lsn | replay_lsn | replication_delay
-------------------+------------+-----------+------------------+----------------+-------------+------------+--------------+--------------+--------------+--------------+-------------------
sub1 | 23456 | postgres | logical_rep_sub | 192.168.1.10 | streaming | synced | 0/3000128 | 0/3000128 | 0/3000128 | 0/3000128 | 00:00:00.12345
sub2 | 23478 | postgres | logical_rep_sub | 192.168.1.11 | catchup | async | 0/4000238 | 0/4000200 | 0/40001F8 | 0/40001E0 | 00:00:02.67890
-
subname: The name of the subscription.
-
state: The state of the subscription process (e.g., streaming, catchup, initializing).
-
sync_state: The synchronization state of the subscription.
-
sent_lsn, write_lsn, flush_lsn, replay_lsn: These columns represent various Log Sequence Numbers (LSNs) that indicate replication progress.
-
replication_delay: The delay between the LSN being written and its application on the subscriber is crucial for identifying lag in replication.
This query provides a comprehensive overview of the logical replication status, allowing you to identify issues such as replication lag or disconnected subscribers quickly.
Analyzing Postgres Replication Lag
Understanding replication lag is essential in maintaining the consistency and freshness of data across your replicated databases. The pg_replication_slots system view can help you calculate the replication lag between the publisher and subscriber:
SELECT
s.slot_name,
s.active,
s.restart_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), s.restart_lsn) AS replication_lag_bytes,
clock_timestamp() - pg_last_xact_replay_timestamp() AS replication_lag_time
FROM
pg_replication_slots s
WHERE
s.active = true
AND
s.plugin = 'pgoutput';
slot_name | active | restart_lsn | replication_lag_bytes | replication_lag_time
-----------+--------+-------------+-----------------------+-----------------------
slot1 | t | 0/3000128 | 65536 | 00:00:00.12345
slot2 | t | 0/4000238 | 131072 | 00:00:02.67890
-
slot_name: The name of the replication slot being used.
-
replication_lag_bytes: The difference in bytes between the current WAL position on the publisher and the last WAL position acknowledged by the subscriber.
-
replication_lag_time: The time difference between the last transaction replayed on the subscriber and the current time.
This query helps you assess the size and time-based lag in your logical replication, enabling you to take proactive measures if the lag exceeds acceptable thresholds.
Monitoring Replication Slot Usage
Replication slots are critical in logical replication, ensuring that WAL segments are retained until all subscribers process them. You can query the pg_replication_slots view to monitor the use of replication slots:
SELECT
slot_name,
plugin,
slot_type,
active,
confirmed_flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS slot_lag_bytes
FROM
pg_replication_slots
WHERE
slot_type = 'logical';
slot_name | plugin | slot_type | active | confirmed_flush_lsn | slot_lag_bytes
-----------+---------+-----------+--------+---------------------+----------------
slot1 | pgoutput| logical | t | 0/3000128 | 65536
slot2 | pgoutput| logical | t | 0/4000238 | 131072
- slot_name: The name of the replication slot.
- slot_lag_bytes: The lag in bytes between the current WAL position and the last position is confirmed as flushed by the slot.
Monitoring replication slot usage is crucial for preventing issues related to WAL segment retention, which could potentially lead to disk space exhaustion on the publisher.
Dropping Unused Replication Slots
Over time, you may accumulate unused replication slots, especially after removing subscribers or changing replication configurations. These unused slots can cause unnecessary retention of WAL files, leading to wasted disk space. The following query identifies and drops unused replication slots:
DO $$
DECLARE
slot_record RECORD;
BEGIN
FOR slot_record IN
SELECT slot_name FROM pg_replication_slots WHERE active = false
LOOP
EXECUTE format('SELECT pg_drop_replication_slot(%L)', slot_record.slot_name);
END LOOP;
END $$;
This query iterates over your inactive replication slots and uses the pg_drop_replication_slot management function to drop them. Regularly cleaning up unused replication slots will ensure that your database remains efficient and prevent potential issues with WAL file retention.
Creating Replication Slots
If you need to create a new logical replication slot, the following query is useful:
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'pgoutput');
slot_name | xlog_position
-----------+---------------
my_slot | 0/3000128
This query uses the pg_create_logical_replication_slot function to create a new logical replication slot with the specified name and output plugin (pgoutput in our example). The query is useful when setting up new logical replication configurations; use it to confirm that the subscriber can start receiving changes from the correct point in the WAL records.
Optimizing Logical Replication with pglogical
If you’re using the pglogical extension for more advanced logical replication capabilities, the following query can help you check the status of all pglogical subscriptions:
SELECT
subscription_name,
status,
received_lsn,
replay_lag,
last_received_change,
pending_changes
FROM
pglogical.show_subscription_status();
subscription_name | status | received_lsn | replay_lag | last_received_change | pending_changes
-------------------+-------------+--------------+--------------+----------------------+-----------------
sub_pglogical1 | replicating | 0/3000128 | 00:00:01.234 | 2024-08-22 10:30:00 | 5
sub_pglogical2 | idle | 0/4000238 | 00:00:00.000 | 2024-08-22 10:29:30 | 0
- subscription_name: The name of the pglogical subscription.
- replay_lag: The lag between the last received change and the current time.
- pending_changes: The number of changes pending to be applied on the subscriber.
This query provides a detailed overview of your pglogical subscriptions, helping you fine-tune replication settings and troubleshoot issues.