November 12, 2024
Summary: In this tutorial, we will deep dive into the PostgreSQL Automatic Failover (PAF) solution by ClusterLabs.
PostgreSQL Automatic Failover
PostgreSQL Automatic Failover (PAF) is a high availability management solution for PostgreSQL by ClusterLabs. PAF makes use of the popular, industry-standard Pacemaker and Corosync stack. With Pacemaker and Corosync together, you’ll be able to detect failures in the system and act accordingly.
Pacemaker can manage many kinds of resources, and it does so through resource agents. Each resource agent is responsible for one specific resource: it defines how that resource should behave and reports the result of every action back to Pacemaker.
Your resource agent implementation must comply with the Open Cluster Framework (OCF) specification. This specification defines how resource agents behave, how they implement actions such as start, stop, promote, and demote, and how they interact with Pacemaker.
PAF is an OCF resource agent for PostgreSQL written in Perl. Once your PostgreSQL cluster is built using PostgreSQL's built-in streaming replication, PAF can report the current status of the PostgreSQL instance on each node to Pacemaker: master, slave, stopped, catching up, etc.
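To make the OCF contract concrete, here is a minimal Python sketch of how an agent maps Pacemaker's actions to OCF exit codes. It is not PAF's code (PAF is written in Perl). The `ToyPostgresAgent` class and its in-memory `running`/`is_master` flags are hypothetical stand-ins for a real PostgreSQL instance. The exit code values are the standard OCF ones: 0 for success, 1 for a generic error, 3 for an unimplemented action, 7 for not running, and 8 for running as master. A real agent is an executable that receives the action name as its argument and returns the code as its exit status.

```python
from enum import IntEnum


class OCF(IntEnum):
    """Standard OCF resource agent exit codes used in this sketch."""
    SUCCESS = 0
    ERR_GENERIC = 1
    ERR_UNIMPLEMENTED = 3
    NOT_RUNNING = 7
    RUNNING_MASTER = 8


class ToyPostgresAgent:
    """Hypothetical in-memory stand-in for a PostgreSQL instance (not PAF)."""

    def __init__(self) -> None:
        self.running = False
        self.is_master = False

    def start(self) -> OCF:
        # A multi-state resource starts as a standby; promotion is a separate action.
        self.running = True
        self.is_master = False
        return OCF.SUCCESS

    def stop(self) -> OCF:
        self.running = False
        self.is_master = False
        return OCF.SUCCESS

    def promote(self) -> OCF:
        if not self.running:
            return OCF.ERR_GENERIC
        self.is_master = True
        return OCF.SUCCESS

    def demote(self) -> OCF:
        if not self.running:
            return OCF.ERR_GENERIC
        self.is_master = False
        return OCF.SUCCESS

    def monitor(self) -> OCF:
        # Pacemaker uses this result to tell a master from a standby or a stopped node.
        if not self.running:
            return OCF.NOT_RUNNING
        return OCF.RUNNING_MASTER if self.is_master else OCF.SUCCESS

    def dispatch(self, action: str) -> OCF:
        # Pacemaker invokes the agent with an action name; unknown actions are rejected.
        handlers = {
            "start": self.start,
            "stop": self.stop,
            "promote": self.promote,
            "demote": self.demote,
            "monitor": self.monitor,
        }
        handler = handlers.get(action)
        return handler() if handler else OCF.ERR_UNIMPLEMENTED


if __name__ == "__main__":
    agent = ToyPostgresAgent()
    for action in ["monitor", "start", "monitor", "promote", "monitor", "reload"]:
        print(action, agent.dispatch(action).name)
```

The `monitor` result is what lets Pacemaker distinguish a master (8) from a healthy standby (0) or a stopped instance (7).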
How it works
PAF keeps Pacemaker informed of the cluster status and monitors the health of PostgreSQL. When a failure occurs, PAF informs Pacemaker. If the current master cannot be recovered, PAF triggers an election among the current standby servers. With Pacemaker in place, PAF performs management actions such as start, stop, monitor, and failover on all the PostgreSQL nodes.
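One way to picture the election step is as picking the standby that has received the most WAL, because promoting it loses the least data. The sketch below is a simplified model, not PAF's actual scoring logic. The `Standby` record, the `healthy` flag, and the tie-break by node name are assumptions for illustration. An LSN is written as two hexadecimal 32-bit halves separated by a slash, so it must be converted to an integer before comparison. Comparing the strings directly would rank "0/A" above "0/10", which is wrong.

```python
from dataclasses import dataclass
from typing import Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Standby:
    name: str           # node name as known to Pacemaker
    received_lsn: str   # last WAL position received from the old master
    healthy: bool = True


def elect_new_master(standbys: list[Standby]) -> Optional[str]:
    """Return the healthy standby that has received the most WAL, or None."""
    candidates = [s for s in standbys if s.healthy]
    if not candidates:
        return None
    # Highest LSN wins; the node name only breaks exact ties deterministically.
    best = max(candidates, key=lambda s: (lsn_to_int(s.received_lsn), s.name))
    return best.name


if __name__ == "__main__":
    nodes = [
        Standby("srv2", "0/3000060"),
        Standby("srv3", "0/3000148"),
        Standby("srv4", "0/3000200", healthy=False),  # unreachable, not eligible
    ]
    print(elect_new_master(nodes))  # srv3
```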
Are there any setup requirements?
- PAF supports PostgreSQL version 9.3 and higher.
- PAF is not responsible for creating or setting up the PostgreSQL master and standbys: you must create and set up streaming replication before using PAF.
- PAF does not edit any PostgreSQL configuration. However, it requires users to meet a few prerequisites:
  - Slaves must be configured as hot standbys.
  - A recovery template file (default: <postgresql_data_location>/recovery.conf.pcmk) must be provided with the following parameters:
    - standby_mode = on
    - recovery_target_timeline = 'latest'
    - primary_conninfo must have the application_name parameter defined and set to the local node name as known to Pacemaker.
- PAF exposes multiple parameters related to the management of a PostgreSQL resource, which can be tuned to suit your requirements:
  - bindir: location of the PostgreSQL binaries (default: /usr/bin)
  - pgdata: location of the PGDATA of your instance (default: /var/lib/pgsql/data)
  - datadir: path to the directory set in data_directory in your postgresql.conf file
  - pghost: the socket directory or IP address to use to connect to the local instance (default: /tmp)
  - pgport: the port to connect to the local instance (default: 5432)
  - recovery_template: the local template that will be copied as the PGDATA/recovery.conf file. This template file must exist on all nodes (default: $PGDATA/recovery.conf.pcmk)
  - start_opts: additional arguments given to the postgres process on startup. See “postgres --help” for available options. This is useful when the postgresql.conf file is not in the data directory (PGDATA), e.g.: -c config_file=/etc/postgresql/9.3/main/postgresql.conf
  - system_user: the system owner of your instance’s process (default: postgres)
  - maxlag: maximum lag allowed on a standby before PAF sets a negative master score on it
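The recovery template requirements and the parameter defaults above can be sketched in Python as follows. This is only an illustration: `PafParams`, `render_recovery_template`, the replication user `replicator`, and the host `10.0.0.50` are hypothetical. In a PAF setup, the primary host is typically the IP address that fails over with the master. The only hard rule the sketch encodes is the one from the list: application_name must equal the node's name in Pacemaker.

```python
from dataclasses import dataclass


@dataclass
class PafParams:
    """Subset of the PAF resource parameters listed above, with their defaults."""
    bindir: str = "/usr/bin"
    pgdata: str = "/var/lib/pgsql/data"
    pghost: str = "/tmp"
    pgport: int = 5432
    system_user: str = "postgres"

    @property
    def recovery_template(self) -> str:
        # Default template location: $PGDATA/recovery.conf.pcmk
        return f"{self.pgdata}/recovery.conf.pcmk"


def render_recovery_template(node_name: str, primary_host: str,
                             port: int = 5432, repl_user: str = "replicator") -> str:
    """Build recovery.conf.pcmk contents for one node (hypothetical helper)."""
    if not node_name:
        raise ValueError("application_name must be set to the Pacemaker node name")
    conninfo = (f"host={primary_host} port={port} user={repl_user} "
                f"application_name={node_name}")
    lines = [
        "standby_mode = on",
        f"primary_conninfo = '{conninfo}'",
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    params = PafParams()
    print(params.recovery_template)  # /var/lib/pgsql/data/recovery.conf.pcmk
    print(render_recovery_template("srv2", "10.0.0.50"), end="")
```

The same rendered file would be placed on every node, changing only application_name. This matches the requirement that the template exist on all nodes.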
PAF Pros
- PAF leaves the configuration and setup of PostgreSQL entirely in the user's hands.
- PAF can handle node failures and trigger elections when the master goes down.
- Quorum behavior can be enforced in PAF.
- It provides a complete high availability management solution for the resource, including start, stop, and monitor, and it handles network isolation scenarios.
- It’s a distributed solution, which enables the management of any node from another node.
PAF Cons
- PAF doesn’t detect a standby that is misconfigured with an unknown or non-existent node in its recovery configuration. Such a node is shown as a slave even if the standby is running without connecting to the master or cascading standby node.
- Requires an extra port (default: 5405) to be opened for UDP communication between the Pacemaker and Corosync components.
- Does not support NAT-based configuration.
- No built-in pg_rewind support.
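Each standby's primary_conninfo sets application_name to its Pacemaker node name, so an operator can cover the first gap above with a check outside PAF. On the master, the query SELECT application_name FROM pg_stat_replication lists the standbys that are actually streaming. The sketch below is hypothetical and assumes that query result and the expected standby names have already been fetched. It only applies to standbys that stream directly from the master; a cascading standby shows up on its upstream standby instead.

```python
def find_disconnected_standbys(expected: set[str], replicating: list[str]) -> set[str]:
    """Return expected standby names that are missing from pg_stat_replication.

    expected:    Pacemaker node names of standbys that should stream from the master
    replicating: application_name values returned by pg_stat_replication on the master
    """
    return expected - set(replicating)


if __name__ == "__main__":
    missing = find_disconnected_standbys({"srv2", "srv3"}, ["srv2"])
    for node in sorted(missing):
        print(f"WARNING: {node} is not streaming from the master")
```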
High Availability test scenarios
We conducted a few tests to determine how well PAF manages PostgreSQL high availability. All of these tests were run while an application was running and inserting data into the PostgreSQL database. The application was written with the PostgreSQL JDBC driver and relied on its connection failover capability.
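Client-side connection failover boils down to trying a list of hosts and keeping the first one that accepts writes. With pgJDBC, this is typically done by listing several hosts in one URL together with the targetServerType property. An example is jdbc:postgresql://srv1:5432,srv2:5432,srv3:5432/appdb?targetServerType=primary, where the host and database names are hypothetical. The Python sketch below models only that selection logic and is not the driver's implementation. The probe function and the simulated cluster state are assumptions. A real client might decide whether a server is a primary by running SELECT pg_is_in_recovery().

```python
from typing import Callable, Optional, Sequence

# Hypothetical probe: True if the host is a writable primary, False if it is a
# standby; raises ConnectionError if the host cannot be reached.
Probe = Callable[[str], bool]


def pick_writable_host(hosts: Sequence[str], is_primary: Probe) -> Optional[str]:
    """Return the first host in list order that reports itself as primary."""
    for host in hosts:
        try:
            if is_primary(host):
                return host
        except ConnectionError:
            # Unreachable node (e.g. the failed master): try the next candidate.
            continue
    return None


# Simulated state after a failover: srv1 (old master) is down, srv3 was promoted.
CLUSTER_STATE = {"srv1": None, "srv2": False, "srv3": True}  # None = unreachable


def simulated_probe(host: str) -> bool:
    state = CLUSTER_STATE[host]
    if state is None:
        raise ConnectionError(f"{host} is unreachable")
    return state


if __name__ == "__main__":
    print(pick_writable_host(["srv1", "srv2", "srv3"], simulated_probe))  # srv3
```

Between the master failing and a standby being promoted, no host passes the probe. That window is what shows up as writer downtime in the tests below.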
Standby server tests
Test scenario | Observation |
---|---|
Kill the PostgreSQL process | Pacemaker brought the PostgreSQL process back to running state. There was no disruption in the writer application. |
Stop the PostgreSQL process | Pacemaker brought the PostgreSQL process back to running state. There was no disruption in the writer application. |
Reboot the server | The standby server was initially marked offline. Once the server came back up after the reboot, Pacemaker started PostgreSQL and the server was marked online. If fencing had been enabled, the node would not have been added back to the cluster automatically. There was no disruption in the writer application. |
Stop the Pacemaker process | This also stopped the PostgreSQL process, and the server was marked offline. There was no disruption in the writer application. |
Master/Primary server tests
Test scenario | Observation |
---|---|
Kill the PostgreSQL process | Pacemaker brought the PostgreSQL process back to running state. The primary recovered within the threshold time, so no election was triggered. The writer application was down for about 26 seconds. |
Stop the PostgreSQL process | Pacemaker brought the PostgreSQL process back to running state. The primary recovered within the threshold time, so no election was triggered. The writer application was down for about 26 seconds. |
Reboot the server | Pacemaker triggered an election after the master had been unavailable for the threshold time. The most eligible standby server was promoted as the new master. Once the old master came back up after the reboot, it was added back to the cluster as a standby. If fencing had been enabled, the node would not have been added back to the cluster automatically. The writer application was down for about 26 seconds. |
Stop the Pacemaker process | This also stopped the PostgreSQL process, and the server was marked offline. An election was triggered and a new master was elected. There was downtime in the writer application. |
Network isolation tests
Test scenario | Observation |
---|---|
Network isolate the standby server from other servers | Corosync traffic was blocked on the standby server. The server was marked offline and the PostgreSQL service was turned off due to the quorum policy. There was no disruption in the writer application. |
Network isolate the master server from other servers (split-brain scenario) | Corosync traffic was blocked on the master server. The PostgreSQL service was turned off and the master server was marked offline due to the quorum policy. A new master was elected in the majority partition. There was downtime in the writer application. |
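The quorum behavior in these tests follows from majority voting. The sketch below assumes one vote per node and Corosync's default simple-majority rule, with no two-node or tie-breaker options. A partition keeps quorum only if it holds more than half of the configured votes. When a partition loses quorum, Pacemaker's no-quorum-policy cluster property (default: stop) stops the resources there. That is why PostgreSQL was shut down on the isolated node. The function names are hypothetical.

```python
def has_quorum(votes_present: int, cluster_size: int) -> bool:
    """Simple majority: strictly more than half of the configured votes."""
    return votes_present > cluster_size // 2


def quorate_partitions(partitions: list[set[str]], cluster_size: int) -> list[set[str]]:
    """Return the partitions allowed to keep running resources (one vote per node)."""
    return [p for p in partitions if has_quorum(len(p), cluster_size)]


if __name__ == "__main__":
    # Split brain: master srv1 is isolated from standbys srv2 and srv3.
    split = [{"srv1"}, {"srv2", "srv3"}]
    print([sorted(p) for p in quorate_partitions(split, 3)])  # [['srv2', 'srv3']]

    # Degraded cluster: both standbys are powered off, the master is alone.
    print(quorate_partitions([{"srv1"}], 3))  # []
```

The second case uses the same arithmetic as the degraded-cluster test below. A lone master holds 1 of 3 votes, so it loses quorum and stops PostgreSQL.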
Miscellaneous tests
Test scenario | Observation |
---|---|
Degrade the cluster by turning off all the standby servers | When all the standby servers went down, the PostgreSQL service on the master was stopped due to the quorum policy. After this test, when all the standby servers were turned back on, a new master was elected. There was downtime in the writer application. |
Randomly turn off all the servers one after the other, starting with the master, and bring them all back at the same time | All the servers came up and joined the cluster. A new master was elected. There was downtime in the writer application. |
Inference
PostgreSQL Automatic Failover provides several advantages for PostgreSQL high availability. During a failover event, PAF relies on IP address failover to connect the standbys to the new master instead of restarting them. This is advantageous when the user does not want to restart the standby nodes. PAF also needs very little manual intervention and manages the overall health of all the resources. The only case that requires manual intervention is a timeline divergence, in which case the user can choose to resynchronize the old master with pg_rewind.
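Whether a former master has diverged can be reasoned about with LSNs. When a standby is promoted, PostgreSQL writes a timeline history file, for example 00000002.history. Each line of that file holds a parent timeline, the switch-point LSN, and a reason. If the old master wrote WAL past the switch point of its timeline, it cannot simply follow the new master and needs pg_rewind or a rebuild. The sketch below is a simplified, hypothetical helper (`needs_rewind` is not part of PAF or PostgreSQL). It ignores edge cases such as the shutdown checkpoint record, which pg_rewind itself handles.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000148' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def fork_point(history_text: str, old_timeline: int) -> int:
    """Return the LSN at which old_timeline ended, read from a timeline history file."""
    for line in history_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Fields: parent timeline, switch-point LSN, free-text reason.
        tli, switchpoint = line.split()[:2]
        if int(tli) == old_timeline:
            return lsn_to_int(switchpoint)
    raise ValueError(f"timeline {old_timeline} not found in history")


def needs_rewind(old_master_end_lsn: str, history_text: str, old_timeline: int) -> bool:
    """True if the old master wrote WAL beyond the point where the new timeline forked."""
    return lsn_to_int(old_master_end_lsn) > fork_point(history_text, old_timeline)


if __name__ == "__main__":
    history = "1\t0/3000148\tno recovery target specified\n"
    print(needs_rewind("0/3000200", history, 1))  # True: diverged, rewind needed
    print(needs_rewind("0/3000100", history, 1))  # False: can follow the new master
```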