PostgreSQL Tutorial: High Availability with Patroni

April 28, 2026

Summary: In this tutorial, you will learn how Patroni brings high availability to PostgreSQL.

Table of Contents

  • What Patroni does
  • Quorum
  • Orchestration
  • Fencing
  • Routing
  • Conclusion

Patroni does something Postgres still doesn’t do on its own: it builds a cluster of Postgres nodes. It achieves this through a three-layer architecture that combines a routing layer (HAProxy), the Patroni-managed Postgres nodes, and a Distributed Configuration Store (DCS). It looks something like this:

  flowchart TB
%% Nodes
    HAProxy["HAProxy"]
    DCS["DCS"]

    subgraph "data layer"
    %% Nodes
        Patroni1["Patroni"]
        Patroni2["Patroni"]
        Patroni3["Patroni"]

        Primary[("Primary")]
        Replica1[("Replica")]
        Replica2[("Replica")]
    %% Edge connections between nodes
        Patroni1 --> Primary
        Patroni2 --> Replica1
        Patroni3 --> Replica2
    end

%% Edge connections between nodes
    HAProxy <--> Primary
    HAProxy <--> Replica1
    HAProxy <--> Replica2

    Patroni1 <--> DCS
    Patroni2 <--> DCS
    Patroni3 <--> DCS

By assembling PostgreSQL into a self-managed, integrated system in this way, Patroni glues all the components together. It’s the missing communications nexus that records the composition of the cluster and the status of its members, and routes connections where they need to go. Next, let’s dive a bit deeper into how it does all of this.

Quorum

The first and most important aspect of Patroni’s operational role is that of maintaining quorum. Here’s a handy definition for a quorum:

The minimal number of officers and members of a committee or organization, usually a majority, who must be present for valid transactions of business.

The critical aspect here is the voting majority, otherwise known as consensus. The standard formula for a cluster of N nodes is: floor(N/2) + 1. A two-node cluster needs both nodes online to maintain a majority, while a three-node cluster requires only two. It’s this “extra” node that creates resilience in a network cluster. Should one node become isolated from the others, either through failure or a network partition, the quorum remains and the cluster stays operational. More nodes usually also confer better protection; three out of five beats two out of three, after all. Due to the communication overhead that grows with node count, most consensus layers suggest staying below a “handful” of nodes, which tends to mean fewer than ten.
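
Expressed in code, the majority formula works out like this; a minimal sketch in Python:

```python
def quorum_majority(n: int) -> int:
    """Minimum number of nodes that must agree: floor(N/2) + 1."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many nodes can fail before quorum is lost."""
    return n - quorum_majority(n)

for n in (2, 3, 5, 7):
    print(n, quorum_majority(n), tolerated_failures(n))
# 2 nodes -> majority 2, tolerates 0 failures
# 3 nodes -> majority 2, tolerates 1 failure
# 5 nodes -> majority 3, tolerates 2 failures
# 7 nodes -> majority 4, tolerates 3 failures
```

Note that going from two nodes to three adds no extra required votes, only extra resilience; that asymmetry is why odd cluster sizes are the norm.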

Ironically, Patroni handles quorum by delegating that responsibility to another piece of software entirely. Patroni reports compatibility with four different key/value or Distributed Configuration Store services, including etcd, Consul, ZooKeeper, and even Kubernetes. In reality, Patroni doesn’t really care where the DCS layer lives or what it’s composed of, just so long as it responds to read and write requests.

That’s why the “DCS” layer in the diagram is a flat plane supporting all of the Postgres nodes. The DCS could be anywhere, using any number of nodes, and Patroni doesn’t have to manage it.
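
To make that indifference concrete, here is a hypothetical sketch of the minimal contract a DCS backend must satisfy. The class names below are illustrative, not Patroni’s actual API; its real abstraction is much richer, but the principle is the same: the backend only has to answer reads and writes.

```python
from abc import ABC, abstractmethod
from typing import Optional

class DCS(ABC):
    """Hypothetical minimal contract for a DCS backend."""

    @abstractmethod
    def read(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def write(self, key: str, value: str, ttl: int) -> bool: ...

class InMemoryDCS(DCS):
    """Toy backend standing in for etcd, Consul, ZooKeeper, or Kubernetes."""

    def __init__(self) -> None:
        self._store: dict = {}

    def read(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def write(self, key: str, value: str, ttl: int) -> bool:
        # A real backend would honor the TTL and expire stale keys.
        self._store[key] = value
        return True
```

Swap `InMemoryDCS` for any backend that answers the same two questions and nothing above it needs to change.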

Orchestration

Patroni is a specialized high-availability tool designed specifically for Postgres. As a result, it knows how to manage anything associated with a cluster of Postgres instances, including but not limited to:

  • starting and stopping the Postgres service.
  • promoting replicas.
  • bootstrapping new replicas.
  • demoting primary nodes.
  • tracking Log Sequence Numbers (LSNs).
  • managing replication slots.

Patroni stores all metadata in the DCS layer, updating it regularly for every node. The “cluster” always knows the status of all nodes, including any replication lag. The magic of how Patroni works so well is how it knows which node is the Primary for the cluster: the leadership token. Here’s how it works:

  • Patroni checks to see if the current node owns the leadership token.
      • If yes, refresh the token and restart the loop.
      • If no, can this node take the leadership token?
          • If yes, take the token, promote this node, and restart the loop.
          • If no, act as a normal replica, reconfiguring for the current primary if needed.
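
The loop above can be sketched in a few lines of Python. Everything here is illustrative: `Node`, `ha_tick`, and the plain dict standing in for the DCS are hypothetical, and a real implementation would use atomic compare-and-set writes with a TTL on the leadership token.

```python
class Node:
    """Toy stand-in for a Patroni-managed Postgres instance."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.role = "replica"

    def promote(self) -> None:
        self.role = "primary"

    def follow(self, leader: str) -> None:
        self.role = "replica"  # reconfigure to stream from `leader`

def ha_tick(node: Node, dcs: dict) -> str:
    """One iteration of the decision loop described above."""
    leader = dcs.get("leader")
    if leader == node.name:
        dcs["leader"] = node.name   # we own the token: refresh it
    elif leader is None:
        dcs["leader"] = node.name   # token is free: take it...
        node.promote()              # ...and promote this node
    else:
        node.follow(leader)         # act as a normal replica
    return node.role
```

Run one tick for each of two nodes against an empty store and the first claims the token while the second settles in as a replica.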

There are other steps involved of course, but since the consensus layer is distributed, there can only ever be one leadership token. Once a node has the token, no other node can claim to be the primary node. Based on which node has the token, Patroni will reconfigure all other nodes to use it as the primary. If a replica encounters replay errors, or was a previous primary, Patroni will use pg_rewind, pg_basebackup from the current primary, or even recover from a stored backup to rebuild the node.

That’s something almost none of the other HA tools do. Not only will Patroni promote a replacement primary, but it will rebuild the failed one if it can. If you add a new node to the cluster, it creates the data directory on your behalf based on the cluster configuration. The DCS is the single source of truth Patroni operates from, and in a very real sense, the DCS itself is the cluster.

Things really start to get interesting when the DCS layer itself experiences failures.

Fencing

The idea behind fencing is that a misbehaving node should be decommissioned. The reasoning is deceptively simple: in the absence of consensus, you can’t trust any written data. There are many reasons a node could lose contact with the DCS, or the DCS could refuse to respond, and none of them matter at all. The safest course of action is to stop Postgres.

If the primary node can’t maintain its ownership of the leadership token, another node seizes it. The Patroni process on that node promotes it to leader, the cluster reconfigures itself around that new primary, and the beat goes on. Isolated replicas don’t have to worry about writes, but they also can’t participate in the leadership race.

An isolated primary must assume another node has been promoted in its absence, that it should reject writes to prevent split-brain, and that it should no longer accept new connections. Similarly, a replica cut off from the DCS can’t be monitored, is likely accumulating replication lag, and is otherwise suspect. In either case, Patroni stops the Postgres service on that node.

Believe it or not, most Postgres HA solutions omit this critical factor. Almost all of them will detect a primary failure and promote a standby, but almost none consider what happens if the failure is the network and not the node or Postgres itself. In these systems, an isolated node keeps accepting writes from colocated systems or established connections, keeps operating normally, and doesn’t know or care that a promotion happened elsewhere.

The Postgres service on isolated nodes absolutely must self-terminate, and Patroni ensures that outcome by its very design. If a node loses contact with the DCS, or the DCS refuses its requests for any reason, it shuts down. Easy.

Note: One failure scenario for the cluster is that the DCS itself loses quorum. In a five-node cluster, a network error could split two nodes from the other three. The two-node side has lost the majority and will refuse to operate in that state. The Patroni service on any affected node doesn’t know why this happened, and indeed, it doesn’t matter. The end result is always to stop Postgres.
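
The arithmetic behind that note is just the majority formula again; a small illustrative sketch:

```python
def has_quorum(visible: int, total: int) -> bool:
    """A DCS member keeps operating only while it can see a majority."""
    return visible >= total // 2 + 1

# A five-node DCS split by a network error into a 3-node and a 2-node side:
print(has_quorum(3, 5))  # True  -> this side keeps serving requests
print(has_quorum(2, 5))  # False -> Patroni nodes relying on this side stop Postgres
```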

Routing

The final thing Patroni does to establish a Postgres cluster is manage connection routing. It does this by tracking ownership of the leadership token and exposing an HTTP REST status interface. Any front-end routing system can interrogate a Patroni node for its current state: whether Postgres is online, whether the node should be considered writable, whether there’s too much replication lag, and so on.

The usual choice for this routing layer is HAProxy, as reflected in the architecture diagram, but it could easily be an F5 load balancer, an Amazon ELB, and so on. This layer determines which connections reach which node, or whether a node should allow connections at all. Users who wish to connect to the Postgres primary simply connect to the routing layer. Is it important to connect to a replica that has less than 5MB of replication lag? Routing layer. Patroni evaluates criteria encoded in the health check request and responds accordingly.
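
The kind of question the routing layer asks can be sketched as a simple decision function. The endpoint names and the 5MB threshold below are illustrative only; consult Patroni’s REST API documentation for the real endpoints and parameters.

```python
# Hypothetical sketch of the decision behind a routing-layer health check.
MAX_LAG_BYTES = 5 * 1024 * 1024  # illustrative 5MB lag ceiling

def health_check(path: str, node: dict) -> int:
    """Return an HTTP status: 200 if the node is routable for `path`,
    503 otherwise. `node` holds this node's state as Patroni tracks it."""
    if path == "/primary":
        return 200 if node["role"] == "primary" else 503
    if path == "/replica":
        healthy = (node["role"] == "replica"
                   and node["lag_bytes"] <= MAX_LAG_BYTES)
        return 200 if healthy else 503
    return 503  # unknown check: refuse routing
```

A routing component polls this kind of check and drops any node that answers 503, which is exactly how fencing and routing cooperate.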

Fencing is one half of the equation, and routing control is the other. If Patroni determines a node should not be routable for any reason, it simply returns a failure on the REST interface. A properly configured routing component will then immediately cut any established connections and refuse future routing for that node until it is healthy again.

More importantly, users and applications don’t need to know anything about the node they’re connecting to. In a very real sense, they’re not connecting to a node at all, but to the cluster itself. Now, finally, it’s possible to accurately describe Postgres as a cluster of nodes. Each individual node doesn’t actually operate any differently, but Patroni and the DCS establish an underlying fabric that binds everything together.

Conclusion

Every Postgres node is an island; as a cluster, Postgres needs Patroni to handle failover between data node replicas. Postgres simply isn’t a self-organizing cluster without some external orchestration layer, and for now, Patroni is the best of these.

See more

PostgreSQL Administration