PostgreSQL Tutorial: Setting up Streaming Replication

October 18, 2023

Summary: Streaming replication (physical replication), which is byte-by-byte replication, involves a continuous application of WAL records from Primary to Standby. As it uses a file-based log shipping method, it is one of the fastest replication methods when compared with logical replication or other trigger-based methods.

It is asynchronous by default. For every replica/standby, there exists a WAL sender process on the primary server and a WAL receiver process on the standby server in streaming replication. These processes are responsible for streaming and applying the WAL records from the WAL segments, including the segments that are not full.

Starting from PostgreSQL 13, there have been some changes in how we perform streaming replication. For example, we don’t see a recovery.conf file with PostgreSQL 13, unlike version 11 and earlier. Instead, the parameters that were initially added to recovery.conf can now be set in postgresql.conf or postgresql.auto.conf directly.

In this tutorial, we shall see the steps involved in setting up streaming replication with PostgreSQL 13.

Getting ready

We can set up streaming replication within two instances running on different ports on the same server. But that doesn’t solve the purpose of high availability. So, it is recommended to build replication between servers that are geographically distributed. In order to get started with the steps being discussed, we need two servers (a master and a standby) with PostgreSQL 13 installed and with PostgreSQL running on the master.

How to do it…

We will set up the replication using the following steps:

1. Create a replication user on the master:

$ psql -c "CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret'"

2. Add necessary entries to the pg_hba.conf file of the master:

$ echo "host replication replicator <slave_ip_address>/32 md5" >>
$PGDATA/pg_hba.conf
$ psql -c "select pg_reload_conf()"

3. On the master, validate the parameters required to set up replication. The best way is as follows:

$ psql -c "select name, setting from pg_settings where name IN ('listen_addresses','archive_mode','archive_command','wal_keep_segments','restore_command')"

4. Modify parameters that require modification on the master:

$ psql -c "ALTER SYSTEM SET listen_addresses TO '*'";
$ psql -c "ALTER SYSTEM SET archive_mode TO 'ON'";
$ psql -c "ALTER SYSTEM SET archive_command TO 'cp %p /archives/%f'";
$ psql -c "ALTER SYSTEM SET restore_command TO 'cp /archives/%f %p'";
$ psql -c "ALTER SYSTEM SET wal_keep_segments TO '100'";

5. Restart the master if required or reload it, depending on the parameters modified:

$ pg_ctl -D $PGDATA restart -mf

Reload the master using the following command:

$ pg_ctl -d $PGDATA reload

6. Run pg_basebackup from the standby to take a backup of the master:

$ pg_basebackup -h <master_ip> -U replicator -p 5432 -D $PGDATA -Fp -Xs -P -R

7. Validate the mandatory parameters needed for replication on the standby:

$ cat $PGDATA/postgresql.auto.conf

8. Add the primary_conninfo setting and standby.signal if they do not exist:

$ echo "primary_conninfo = 'user=replicator password=secret host=<master_IP> port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'" >> $PGDATA/postgresql.auto.conf
$ echo "standby_mode = 'ON'" >> $PGDATA/postgresql.auto.conf
$ touch $PGDATA/standby.signal

9. Start the standby server or reload the configuration, depending on the parameter changes made:

$ pg_ctl -D $PGDATA start

Or, using the following command:

$ sudo systemctl start postgresql-13

How it works…

The first step to setting up streaming replication is creating a replication user that has a REPLICATION role. We may use the superuser postgres for this purpose but it is not recommended. Once the user is created, we need to add an entry in the pg_hba.conf file of the master to allow replication connections from the standby, as seen in Step 2. It is important to replace <slave_ip_address>/32 with the standby server IP or hostname/subnet so that the replication connections are allowed by the master.

As seen in Step 3, there are some parameters that play a vital role when we set up streaming replication. Except for listen_addresses, most of the default parameters hold good for replication. But, if all these parameters are set appropriately, then we shall have a perfect replication setup.

As an example, in Step 4, we see that listen_addresses is set to ‘*’ to allow a specific IP interface or all (using *) to connect to the master. To enable archiving, we can set archive_mode to ‘ON’ and archive_command can be set to the shell command that can be used by the archiver process to copy the completed WAL segment to a safe location for recovery purposes.

Additionally, we can set restore_command, which will help when the master becomes a standby in the future. This parameter must set with the shell command that is used to fetch an archived WAL segment from the archive location. If this parameter is set on the master, it also gets replicated to the standby upon backup, so it need not be set twice. And finally, wal_keep_segments can be set with the number of WAL segments that are to be retained by the master server at any given time. This will help when the standby need not fetch WALs from the archive location if it gets lagged in replication by a few WALs.

Once all the parameters have been properly set, we can move on to Step 5 to restart the master server if the listen_addresses or archive_mode parameters have been modified, otherwise, this step can be skipped with just a reload of the configuration.

We can now proceed to Step 6, to back up the master’s data directory from the standby. In Step 6, we used pg_basebackup with plain format (-Fp). If the master server is huge in size, we may use pg_basebackup with compressed tar format (-Ft -z) on the master and copy the backup to the standby to restore it. This was discussed in the Restore of a backup taken using pg_basebackup tutorial.

If you noticed the command used in Step 6 carefully, it contains the command-line argument -R. This creates the replication of specific files and entries in the data directory of the standby. You may use multiple approaches such as rsync or any other disk backup methods to copy the master’s data directory to the standby. Depending on the backup method we choose, we need to set a few parameters that are not automatically taken care of, as seen in the case of pg_basebackup with -R.

We then proceed to validate the mandatory parameters needed to set up streaming replication on the standby in Step 7. The parameter primary_conninfo is automatically added when we use -R with pg_basebackup. Otherwise, we need to add a similar entry as seen in Step 8, to tell the standby how to connect to the master. We must replace <master_IP> with the master server’s IP address or hostname. There is an important file (standby.signal) that must exist in a standby data directory to help Postgres determine its state as a standby. It is automatically created when we use the -R option while performing pg_basebackup. If not, we may simply use touch to create this empty file as seen in Step 8. And then, to tell that the database is in standby mode, we need to set standby_mode to ON.

And the last step, Step 9, is to start the standby server using pg_ctl.