October 18, 2023
Summary: Streaming replication (physical replication), which is byte-by-byte replication, involves a continuous application of WAL records from Primary to Standby. As it uses a file-based log shipping method, it is one of the fastest replication methods when compared with logical replication or other trigger-based methods.
It is asynchronous by default. For every replica/standby, there exists a WAL sender process on the primary server and a WAL receiver process on the standby server in streaming replication. These processes are responsible for streaming and applying the WAL records from the WAL segments, including the segments that are not full.
Starting from PostgreSQL 13, there have been some changes in how we perform streaming replication. For example, we don’t see a
recovery.conf file with PostgreSQL 13, unlike version 11 and earlier. Instead, the parameters that were initially added to
recovery.conf can now be set in
In this tutorial, we shall see the steps involved in setting up streaming replication with PostgreSQL 13.
We can set up streaming replication within two instances running on different ports on the same server. But that doesn’t solve the purpose of high availability. So, it is recommended to build replication between servers that are geographically distributed. In order to get started with the steps being discussed, we need two servers (a master and a standby) with PostgreSQL 13 installed and with PostgreSQL running on the master.
How to do it…
We will set up the replication using the following steps:
1. Create a replication user on the master:
$ psql -c "CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret'"
2. Add necessary entries to the
pg_hba.conf file of the master:
$ echo "host replication replicator <slave_ip_address>/32 md5" >> $PGDATA/pg_hba.conf $ psql -c "select pg_reload_conf()"
3. On the master, validate the parameters required to set up replication. The best way is as follows:
$ psql -c "select name, setting from pg_settings where name IN ('listen_addresses','archive_mode','archive_command','wal_keep_segments','restore_command')"
4. Modify parameters that require modification on the master:
$ psql -c "ALTER SYSTEM SET listen_addresses TO '*'"; $ psql -c "ALTER SYSTEM SET archive_mode TO 'ON'"; $ psql -c "ALTER SYSTEM SET archive_command TO 'cp %p /archives/%f'"; $ psql -c "ALTER SYSTEM SET restore_command TO 'cp /archives/%f %p'"; $ psql -c "ALTER SYSTEM SET wal_keep_segments TO '100'";
5. Restart the master if required or reload it, depending on the parameters modified:
$ pg_ctl -D $PGDATA restart -mf
Reload the master using the following command:
$ pg_ctl -d $PGDATA reload
pg_basebackup from the standby to take a backup of the master:
$ pg_basebackup -h <master_ip> -U replicator -p 5432 -D $PGDATA -Fp -Xs -P -R
7. Validate the mandatory parameters needed for replication on the standby:
$ cat $PGDATA/postgresql.auto.conf
8. Add the
primary_conninfo setting and
standby.signal if they do not exist:
$ echo "primary_conninfo = 'user=replicator password=secret host=<master_IP> port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'" >> $PGDATA/postgresql.auto.conf $ echo "standby_mode = 'ON'" >> $PGDATA/postgresql.auto.conf $ touch $PGDATA/standby.signal
9. Start the standby server or reload the configuration, depending on the parameter changes made:
$ pg_ctl -D $PGDATA start
Or, using the following command:
$ sudo systemctl start postgresql-13
How it works…
The first step to setting up streaming replication is creating a replication user that has a
REPLICATION role. We may use the superuser
postgres for this purpose but it is not recommended. Once the user is created, we need to add an entry in the
pg_hba.conf file of the master to allow replication connections from the standby, as seen in Step 2. It is important to replace <
slave_ip_address>/32 with the standby server IP or hostname/subnet so that the replication connections are allowed by the master.
As seen in Step 3, there are some parameters that play a vital role when we set up streaming replication. Except for
listen_addresses, most of the default parameters hold good for replication. But, if all these parameters are set appropriately, then we shall have a perfect replication setup.
As an example, in Step 4, we see that
listen_addresses is set to ‘
*’ to allow a specific IP interface or all (using
*) to connect to the master. To enable archiving, we can set
archive_mode to ‘ON’ and
archive_command can be set to the shell command that can be used by the archiver process to copy the completed WAL segment to a safe location for recovery purposes.
Additionally, we can set
restore_command, which will help when the master becomes a standby in the future. This parameter must set with the shell command that is used to fetch an archived WAL segment from the archive location. If this parameter is set on the master, it also gets replicated to the standby upon backup, so it need not be set twice. And finally,
wal_keep_segments can be set with the number of WAL segments that are to be retained by the master server at any given time. This will help when the standby need not fetch WALs from the archive location if it gets lagged in replication by a few WALs.
Once all the parameters have been properly set, we can move on to Step 5 to restart the master server if the
archive_mode parameters have been modified, otherwise, this step can be skipped with just a reload of the configuration.
We can now proceed to Step 6, to back up the master’s data directory from the standby. In Step 6, we used
pg_basebackup with plain format (-Fp). If the master server is huge in size, we may use
pg_basebackup with compressed tar format (-Ft -z) on the master and copy the backup to the standby to restore it. This was discussed in the Restore of a backup taken using pg_basebackup tutorial.
If you noticed the command used in Step 6 carefully, it contains the command-line argument
-R. This creates the replication of specific files and entries in the data directory of the standby. You may use multiple approaches such as rsync or any other disk backup methods to copy the master’s data directory to the standby. Depending on the backup method we choose, we need to set a few parameters that are not automatically taken care of, as seen in the case of
We then proceed to validate the mandatory parameters needed to set up streaming replication on the standby in Step 7. The parameter
primary_conninfo is automatically added when we use
pg_basebackup. Otherwise, we need to add a similar entry as seen in Step 8, to tell the standby how to connect to the master. We must replace <
master_IP> with the master server’s IP address or hostname. There is an important file (
standby.signal) that must exist in a standby data directory to help Postgres determine its state as a standby. It is automatically created when we use the
-R option while performing
pg_basebackup. If not, we may simply use
touch to create this empty file as seen in Step 8. And then, to tell that the database is in standby mode, we need to set
standby_mode to ON.
And the last step, Step 9, is to start the standby server using