PostgreSQL Tutorial: Check Streaming Replication Delay

January 8, 2024

Summary: in this tutorial, you will learn how to check the streaming replication delay in PostgreSQL.

Table of Contents

Introduction

As we maintain a PostgreSQL instance, we may need to check and prevent latency in streaming replication. Avoid data loss that may occur when a standby instance needs to take over when the primary instance fails. In addition, when we use the standby instance to provide read-only access services, if the latency of stream replication is too large, the data read from the standby instance will be seriously inconsistent with the data on the primary instance.

Solution

The administrative function pg_last_xact_replay_timestamp() provided by PostgreSQL gives the time stamp of the final transaction replayed during recovery. This is the time when the commit or abort WAL record for that specific transaction was created on the primary.

Upon executing select now() - pg_last_xact_replay_timestamp() as replication_lag on a replica, the output would be a duration indicating the time difference between the current clock and the timestamp of the latest WAL record that was applied from the replication stream.

It’s important to note that in cases where the master isn’t receiving new changes, there won’t be any WAL records to stream. Consequently, the lag calculated using this method will increase without necessarily indicating any replication delays. However, if the master is undergoing frequent changes, it will continuously stream WALs, and the above query will serve as an excellent estimate of the time it takes for changes on the master to materialize on the slave. The precision of this approximation will, of course, depend on how closely synchronized the system clocks on both hosts are.

To estimate the slave lag, you can use the query below if your database is frequently updated.

select now() - pg_last_xact_replay_timestamp() AS replication_lag;

When dealing with databases that have minimal writes, it is recommended to use a more precise query for determining replication lag. It should be noted that if the master has not sent any writes to the slave, the query mentioned above may not provide an accurate measurement of slave lag since pg_last_xact_replay_timestamp() may remain constant.

SELECT
  CASE WHEN NOT pg_is_in_recovery() THEN 0
  WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
  ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
  END AS replication_lag;

Causes of Streaming Replication Delay

In PostgreSQL, a replication lag can occur by these factors:

  • Network issues
  • Not able to find the WAL segment from the primary. Usually, this is due to the checkpoint behavior where WAL segments are rotated or recycled
  • Busy nodes (primary and standby(s)). Can be caused by external processes or some bad queries caused to be a resource intensive
  • Bad hardware or hardware issues causing to take some lag

See more

PostgreSQL Monitoring