PostgreSQL Tutorial: Setting WAL Archiving Command

April 19, 2024

Summary: In this tutorial, you will learn how to set up the WAL archiving command in PostgreSQL.

Background

Nowadays, as multi-core, multi-threaded processors have become common and memory sizes and bus speeds have increased, storage speed is prone to becoming the bottleneck. Enterprises have tried to solve this with sophisticated SAN drives, specialized storage appliances with caches, and so on, but the problem has persisted for many years, even now as enterprises increasingly shift to NVMe drives.

Recently, we have started observing a new bottleneck that is becoming a pain point for many database users. As the capability of single-host servers has increased, they have started processing huge numbers of transactions. There are systems that produce thousands of WAL files in a couple of minutes, and a few cases have been reported where WAL archiving to a cheaper, slower disk system was not able to keep up with WAL generation. To add more complexity, many organizations prefer to store WAL archives over a low-bandwidth network.

Warning: PostgreSQL archiving has an inherent problem: once it lags behind, it tends to lag further, because the archiver process has to search through a growing set of .ready files. That issue won't be discussed here.

WAL Archiving Command

PostgreSQL offers an archive_command configuration parameter in which you can specify the shell command used to archive WAL segments. In archive_command, %p is replaced by the path name of the file to archive, while %f is replaced by only the file name. The path name is relative to the current working directory, i.e., the cluster’s data directory.

Then, how can we set up the archive command to meet the needs of various scenarios?
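As a baseline, the classic example from the PostgreSQL documentation archives to a local or mounted directory. The test guard makes the command fail rather than overwrite an existing file, which is important because archive_command must return a nonzero exit status when archiving fails (the path /mnt/server/archivedir is a placeholder):

```ini
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```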

Archiving to a Remote Server

We can change the archive_command setting in postgresql.conf, setting it to an rsync shell command that copies WAL segments to a safe location on a remote server.

archive_command = 'rsync -a %p postgres@backup::archive/%f'

Next, we will want to set up the backup server with an rsyncd.conf file:

[db_sync]
    path = /db/pgdata
    comment = DB Backup Server
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true
[archive]
    path = /db/archive
    comment = Archived Transaction Logs
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true

Make sure /db/pgdata and /db/archive exist on both servers and the rsync daemon has been restarted.

Of course, you can also set archive_command with the scp shell command to archive WAL segments to a remote server.

archive_command = 'scp %p postgres@backup:/db/archive/%f'

Archiving to an Object Storage Service

First, download and install a command line tool offered by the public cloud vendor.

Set archive_command to the command-line tool so that it copies WAL segments to the object storage service. Let’s take Alibaba Cloud as an example:

archive_command = 'ossutil64 cp %p oss://examplebucket/desfolder/'

You can also use a WAL archiving tool like WAL-G. To set up continuous archiving of PostgreSQL to AWS S3 (or S3-like storage), you can put valid AWS credentials into the file .aws/credentials. It should look like the following:

[default]
aws_access_key_id = AKIDblahblah
aws_secret_access_key = acc3ssk3yyblahblah

Create an archival script to archive WAL segments to the remote S3 server. File wal-g-archive.sh:

#!/bin/bash

# Connect using Unix socket
export PGHOST=/var/run/postgresql

export WALG_S3_PREFIX=s3://postgres-archive-bucket/production-db
export AWS_REGION=us-east-2
export AWS_SHARED_CREDENTIALS_FILE=/var/lib/postgresql/.aws/credentials

# Optional for AWS S3 (just comment these lines out), but necessary for SeaweedFS or MinIO.
export AWS_ENDPOINT=http://192.168.1.182:8333 # SeaweedFS or MinIO server address.
export AWS_S3_FORCE_PATH_STYLE=true

wal-g wal-push "$1"

Set archive_command to the archival script to ship the WAL files to the S3 bucket using WAL-G.

archive_command = 'wal-g-archive.sh /db/pgdata/%p'
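For the recovery side, WAL-G provides wal-fetch, the counterpart of wal-push. Assuming the same environment variables are exported in a wrapper script analogous to wal-g-archive.sh, whose last line would be `wal-g wal-fetch "$1" "$2"` (the wrapper name wal-g-restore.sh here is illustrative), restore_command might look like:

```ini
restore_command = 'wal-g-restore.sh %f %p'
```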

Compressing and Archiving

To compress WAL files before archiving, you can create an archival script like the following. File archive-script.sh:

#!/bin/bash

# Exit nonzero on any failure so PostgreSQL knows to retry the archive
set -e

walname=$(basename "$1")
gzip < "$1" > "/tmp/archive/$walname.gz"
rsync -a "/tmp/archive/$walname.gz" "postgres@backup::archive/$walname.gz"
rm "/tmp/archive/$walname.gz"

Set archive_command with the archival script to archive the WAL files.

archive_command = 'archive-script.sh /db/pgdata/%p'
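Note that compressed archives need a matching restore_command that decompresses them during recovery. One possible counterpart, assuming the archive module from the rsyncd.conf above is reachable from the recovering host, could be:

```ini
restore_command = 'rsync -a postgres@backup::archive/%f.gz /tmp/%f.gz && gunzip < /tmp/%f.gz > %p'
```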

If you want to compress WAL files before shipping them to S3-like storage, you can write a Python script like the following, using the AWS SDK for Python (Boto3).

#!/usr/bin/python3

import gzip
import os
import sys

import boto3

walname = sys.argv[1]

# Let's use Amazon S3
s3 = boto3.client('s3')

# read and compress the WAL data
with open(walname, 'rb') as wal:
    data = gzip.compress(wal.read())

# ship the compressed WAL to the S3 bucket, keyed by the WAL file name
s3.put_object(Body=data, Bucket='my-bucket', Key=os.path.basename(walname) + '.gz')

In this way, we can bypass intermediate compressed files and put the compressed WAL data into the S3 bucket directly.
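The restore side can mirror this approach: fetch the compressed object and decompress it to the path PostgreSQL asks for. The following is a sketch under the same assumptions (bucket name my-bucket, objects keyed by WAL file name with a .gz suffix); boto3 is imported inside the function so the decompression helper can be exercised on its own.

```python
#!/usr/bin/python3

import gzip
import os
import sys


def decompress_wal(data):
    # Reverse of the gzip.compress() call used at archive time.
    return gzip.decompress(data)


def restore_wal(walname, destpath, bucket='my-bucket'):
    # Imported here so decompress_wal() can be used without boto3 installed.
    import boto3

    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=os.path.basename(walname) + '.gz')
    with open(destpath, 'wb') as dest:
        dest.write(decompress_wal(obj['Body'].read()))


if __name__ == '__main__' and len(sys.argv) == 3:
    # restore_command would pass %f (file name) and %p (destination path).
    restore_wal(sys.argv[1], sys.argv[2])
```

A matching restore_command could then invoke this script with %f and %p as its two arguments.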

Then, you can set archive_command to the Python script to archive the WAL files.
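Assuming the script above is saved as an executable file on the postgres user's PATH (the name compress-archive.py is illustrative), the setting could look like:

```ini
archive_command = 'compress-archive.py /db/pgdata/%p'
```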