April 19, 2024
Summary: In this tutorial, you will learn how to set up the WAL archiving command in PostgreSQL.
Background
Nowadays, as multi-core, multi-threaded processors have become common and memory sizes and bus speeds have increased, storage speed is prone to becoming the bottleneck. Enterprises have tried to solve this with sophisticated SAN drives, specialized storage appliances with caches, and so on, but the problem has persisted for many years, even now as enterprises increasingly shift to NVMe drives.
Recently, we started observing a new bottleneck that is becoming a pain point for many database users. As the capability of a single-host server has increased, it now processes a huge number of transactions. Some systems produce thousands of WAL files in a couple of minutes, and a few cases have been reported where WAL archiving to a cheaper, slower disk system was not able to catch up with WAL generation. To add more complexity, many organizations prefer to store WAL archives over a low-bandwidth network.
Warning: There is an inherent problem in PostgreSQL archiving: once it lags behind, it tends to lag even more, because the archiver process has to search through a growing number of .ready files to find the next segment to archive. That problem won't be discussed here.
WAL Archiving Command
PostgreSQL offers an archive_command configuration parameter in which you can specify the shell command used to archive WAL segments. In archive_command, %p is replaced by the path name of the file to archive, while %f is replaced by only the file name. The path name is relative to the current working directory, i.e., the cluster's data directory.
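For example, the PostgreSQL documentation suggests a simple local-copy command. The sketch below assumes /mnt/server/archivedir is an existing, writable directory mounted on the database server; archive_mode = on must also be set, which requires a server restart to take effect.
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'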
Then, how can we set up the archive command to meet the needs of various scenarios?
Archiving to a Remote Server
We can change the archive_command setting in postgresql.conf, setting it to an rsync shell command that copies archives to a safe location on a remote server.
archive_command = 'rsync -a %p postgres@backup::archive/%f'
Next, we will want to set up the backup server with an rsyncd.conf file:
[db_sync]
path = /db/pgdata
comment = DB Backup Server
uid = postgres
gid = postgres
read only = false
use chroot = true
[archive]
path = /db/archive
comment = Archived Transaction Logs
uid = postgres
gid = postgres
read only = false
use chroot = true
Make sure /db/pgdata and /db/archive exist on both servers and the rsync daemon has been restarted.
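As a rough sketch of that setup on the backup server (assuming a systemd-based system; the rsync daemon service may be named rsync or rsyncd depending on the distribution):
mkdir -p /db/pgdata /db/archive
chown postgres:postgres /db/pgdata /db/archive
systemctl restart rsync    # or rsyncd, depending on the distribution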
Of course, you can also set archive_command with the scp shell command to archive WAL segments to a remote server.
archive_command = 'scp %p postgres@backup:/db/archive/%f'
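Because archive_command runs non-interactively, scp needs passwordless authentication. A typical approach (assuming the postgres OS user on the database server and the backup host from the earlier examples) is key-based SSH:
ssh-keygen -t ed25519          # run as the postgres user on the database server
ssh-copy-id postgres@backup    # authorize the key on the backup server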
Archiving to an Object Storage Service
First, download and install the command-line tool offered by your public cloud vendor. Then set archive_command with that command-line tool to copy archives to the object storage service. Let's take Alibaba Cloud OSS as an example:
archive_command = 'ossutil64 cp %p oss://examplebucket/desfolder/'
You can also use a WAL archiving tool like WAL-G. To set up continuous archiving from PostgreSQL to AWS S3 (or S3-compatible storage), put valid AWS credentials into the file .aws/credentials. It should look like the following:
[default]
aws_access_key_id = AKIDblahblah
aws_secret_access_key = acc3ssk3yyblahblah
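Since this file contains secrets, it is good practice to make it readable only by the postgres OS user (the path below matches the AWS_SHARED_CREDENTIALS_FILE used in the script that follows):
chmod 600 /var/lib/postgresql/.aws/credentials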
Create an archival script that ships WAL segments to the remote S3 server. File wal-g-archive.sh:
#!/bin/bash
# Connect using Unix socket
export PGHOST=/var/run/postgresql
export WALG_S3_PREFIX=s3://postgres-archive-bucket/production-db
export AWS_REGION=us-east-2
export AWS_SHARED_CREDENTIALS_FILE=/var/lib/postgresql/.aws/credentials
# The next two lines are not needed for AWS S3 (comment them out), but they are
# necessary for S3-compatible stores such as SeaweedFS or MinIO.
export AWS_ENDPOINT=http://192.168.1.182:8333 # SeaweedFS or MinIO server address
export AWS_S3_FORCE_PATH_STYLE=true
wal-g wal-push "$1"
Set archive_command to the archival script so that the WAL files are shipped to the S3 bucket using wal-g.
archive_command = 'wal-g-archive.sh /db/pgdata/%p'
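Note that the script must be executable and on the PATH of the postgres user (or referenced by an absolute path). You can also test it by hand; the install location and WAL segment name below are just for illustration:
chmod +x /usr/local/bin/wal-g-archive.sh
/usr/local/bin/wal-g-archive.sh /db/pgdata/pg_wal/000000010000000000000001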
Compressing and Archiving
To compress WALs before archiving, you can create an archival script as following. File archive-script.sh
:
#!/bin/bash
# Exit on the first error so archive_command sees a non-zero status if anything fails
set -e
walname=$(basename "$1")
mkdir -p /tmp/archive
# Compress the WAL segment, ship it to the backup server, then clean up
gzip < "$1" > /tmp/archive/"$walname".gz
rsync -a /tmp/archive/"$walname".gz postgres@backup::archive/"$walname".gz
rm /tmp/archive/"$walname".gz
Set archive_command to the archival script to archive the WAL files.
archive_command = 'archive-script.sh /db/pgdata/%p'
If you want to compress WAL segments before shipping them to S3-compatible storage, you can write a Python script like the following, using the AWS SDK for Python (Boto3).
#!/usr/bin/python3
import gzip
import os
import sys
import boto3

walname = sys.argv[1]

# Read and compress the WAL data in memory
with open(walname, 'rb') as wal:
    data = gzip.compress(wal.read())

# Ship the compressed WAL to the S3 bucket, keyed by the WAL file name
s3 = boto3.client('s3')
s3.put_object(Body=data, Bucket='my-bucket', Key=os.path.basename(walname) + '.gz')
In this way, we can skip intermediate compressed files and put the compressed WAL data into the S3 bucket directly. Then you can set archive_command to the Python script to archive the WAL files.
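Assuming the script above is saved as, say, wal-to-s3.py (a hypothetical name; use whatever you call it) and made executable, the setting would look like:
archive_command = 'wal-to-s3.py /db/pgdata/%p'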