May 20, 2026
Summary: In this tutorial, you will learn how to enable atomic writes at the storage layer and disable full_page_writes to boost OLTP processing performance of PostgreSQL.
In traditional PostgreSQL architectures, guaranteeing atomic 8KB page writes at the storage layer is the only technical prerequisite for safely turning off full_page_writes (FPW). The logic is straightforward: if the underlying storage guarantees that every 8KB data page write either completes in full or does not happen at all, the torn-page problem that FPW was designed to solve cannot occur.
Technical Essence of Atomic Writes & Its Substitution for FPW
Root Cause of Torn Pages
PostgreSQL adopts 8KB data pages by default, while modern storage stacks are layered as follows:
- Application layer: PostgreSQL 8KB pages
- Filesystem layer: typically 4KB pages under x86 architecture
- Block device layer: 512-byte sectors for traditional HDDs, 4KB physical sectors for modern SSDs
When PostgreSQL performs an 8KB page write, the operating system splits it into two 4KB filesystem write requests, which are further divided into multiple disk sector writes. If a power outage or system crash occurs partway through, only part of the page may be persisted to disk, leaving a torn page that mixes old and new content.
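The failure mode described above can be illustrated with a small simulation. This is a toy model, not PostgreSQL or kernel code: the `write_page` helper and its `crash_after_blocks` parameter are hypothetical names, and the "disk" is just an in-memory buffer.

```python
from typing import Optional

PAGE_SIZE = 8192   # PostgreSQL default page size
FS_BLOCK = 4096    # filesystem block size assumed in the text


def write_page(disk: bytearray, offset: int, new_page: bytes,
               crash_after_blocks: Optional[int] = None) -> None:
    """Write one page block by block; stop early to simulate a power cut."""
    assert len(new_page) == PAGE_SIZE
    for i, start in enumerate(range(0, PAGE_SIZE, FS_BLOCK)):
        if crash_after_blocks is not None and i >= crash_after_blocks:
            return  # power lost: the remaining blocks never reach the disk
        disk[offset + start:offset + start + FS_BLOCK] = new_page[start:start + FS_BLOCK]


old_page = b"A" * PAGE_SIZE
new_page = b"B" * PAGE_SIZE

disk = bytearray(old_page)
write_page(disk, 0, new_page, crash_after_blocks=1)  # crash after the first 4KB

torn = bytes(disk)
first_half_is_new = torn[:FS_BLOCK] == new_page[:FS_BLOCK]
second_half_is_old = torn[FS_BLOCK:] == old_page[FS_BLOCK:]
print(first_half_is_new, second_half_is_old)
```

The resulting page is neither the old version nor the new one, which is exactly the state that atomic writes are meant to rule out.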
How Atomic Writes Replace FPW
FPW works by storing full page copies in WAL to serve as a safe baseline for crash recovery. With atomic write guarantees from the storage layer:
Any 8KB page write will either write the entire 8KB data to disk successfully, or write nothing at all, keeping the original intact page on disk.
In the event of a crash, pages on disk remain either fully original or fully updated, with no intermediate torn-page states possible. Crash recovery can directly replay incremental modifications in WAL without relying on full page copies provided by FPW.
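A minimal sketch of why incremental replay is safe on intact pages but not on torn ones. It assumes a toy page layout (an 8-byte LSN at offset 0 followed by payload) and a simplified redo rule that skips a record when the page LSN is already at or past the record's LSN; `replay`, `page_lsn` and the sample record are illustrative names and values, not PostgreSQL internals.

```python
import struct

PAGE_SIZE = 8192
HALF = PAGE_SIZE // 2
LSN_FMT = "<Q"  # toy 8-byte page LSN stored at offset 0


def page_lsn(page: bytes) -> int:
    return struct.unpack_from(LSN_FMT, page, 0)[0]


def replay(page: bytes, record_lsn: int, offset: int, data: bytes) -> bytes:
    """Apply an incremental WAL record unless the page already carries it."""
    if page_lsn(page) >= record_lsn:
        return page  # page claims to be up to date: record is skipped
    buf = bytearray(page)
    buf[offset:offset + len(data)] = data
    struct.pack_into(LSN_FMT, buf, 0, record_lsn)
    return bytes(buf)


old = struct.pack(LSN_FMT, 100) + b"A" * (PAGE_SIZE - 8)
record = (200, 6000, b"BBBB")   # LSN 200 changes 4 bytes in the second half

new = replay(old, *record)       # the page as it should look after the update
torn = new[:HALF] + old[HALF:]   # header says LSN 200, second half is still old

recovered_from_old = replay(old, *record)    # atomic write did not happen
recovered_from_new = replay(new, *record)    # atomic write fully happened
recovered_from_torn = replay(torn, *record)  # torn write

print(recovered_from_old == new, recovered_from_new == new, recovered_from_torn == new)
```

In this model both atomic outcomes recover to the correct page, while the torn page keeps the new LSN in its header and silently loses the change in its second half. A full page image in WAL would overwrite the torn page before replay, which is the role FPW plays when atomicity is not guaranteed.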
Detailed Introduction to Mainstream 8KB Atomic Write Storage Solutions
Solution 1: Hardware RAID Controller with BBU (Battery Backup Unit)
This is the most mature and reliable solution for enterprise environments, and also the most widely adopted choice among traditional DBAs.
Working Mechanism
- The RAID controller is equipped with an independent Battery Backup Unit (BBU) or Flash Backup Unit (FBU).
- When the OS sends an 8KB write request, data is first written to the controller’s built-in high-speed cache.
- Once data enters the battery-protected cache, the controller returns a write-completed acknowledgment to the operating system.
- The controller flushes cached data to physical disks in batches atomically in the background.
- Even during power failures, BBU/FBU can power the cache for hours to ensure all data is ultimately persisted.
Key Configuration Requirements
- Enable Write-Back Mode Mandatorily
  - Disable Write-Through mode: In write-through mode, data is written directly to disks bypassing cache, which cannot guarantee atomicity.
  - Check RAID card cache policy:
    MegaCli -LDGetProp -Cache -LAll -aAll
  - Set write-back mode:
    MegaCli -LDSetProp WB -LAll -aAll
- Ensure Healthy Status of BBU/FBU
  - Check battery status regularly:
    MegaCli -AdpBbuCmd -GetBbuStatus -aAll
  - When battery capacity drops below the threshold or requires a learn cycle, the RAID controller automatically downgrades to write-through mode, invalidating atomic write guarantees.
  - Replace BBU batteries every 3 to 5 years as recommended.
- Disable Physical Disk Internal Cache
  - Disk write caches are usually unprotected by batteries and prone to data loss after power cuts.
  - Configuration command:
    MegaCli -LDSetProp -DisDskCache -LAll -aAll
Verification Method
# Simulate 8KB aligned writes with dd and test with repeated power cuts
dd if=/dev/zero of=/var/lib/pgsql/16/data/testfile bs=8k count=100000 oflag=direct,sync
# Verify file integrity after reboot
md5sum /var/lib/pgsql/16/data/testfile
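A file of zeros cannot reveal which pages were torn, because every half-page looks identical. One possible refinement, sketched here under assumptions, is to stamp every 8KB page with a sequence number and a CRC so that a verifier can flag any page whose halves do not belong together. The page layout and the helper names `build_page`, `write_pages` and `find_torn_pages` are hypothetical; the sketch uses buffered writes plus fsync rather than direct I/O, and the power cut itself still has to be performed on real hardware.

```python
import os
import struct
import tempfile
import zlib
from typing import List

PAGE_SIZE = 8192
HEADER = struct.Struct("<QI")  # sequence number, CRC32 of the payload


def build_page(seq: int) -> bytes:
    """Build one self-describing page: header plus payload derived from seq."""
    body_len = PAGE_SIZE - HEADER.size
    payload = (struct.pack("<Q", seq) * (body_len // 8)).ljust(body_len, b"\0")
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def write_pages(path: str, count: int) -> None:
    """Write `count` stamped pages at 8KB-aligned offsets, then fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for seq in range(count):
            os.pwrite(fd, build_page(seq), seq * PAGE_SIZE)
        os.fsync(fd)
    finally:
        os.close(fd)


def find_torn_pages(path: str) -> List[int]:
    """Return indexes of pages that are truncated or fail their CRC."""
    torn = []
    with open(path, "rb") as f:
        index = 0
        while True:
            page = f.read(PAGE_SIZE)
            if not page:
                break
            if len(page) < PAGE_SIZE:
                torn.append(index)
                break
            _seq, crc = HEADER.unpack_from(page)
            if zlib.crc32(page[HEADER.size:]) != crc:
                torn.append(index)
            index += 1
    return torn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testfile")
        write_pages(path, 1000)
        print("torn pages:", find_torn_pages(path))
```

In a real test the writer would run against the data volume while power is cut, and `find_torn_pages` would run after reboot. Pages that were being overwritten with a new sequence number at crash time are the interesting ones, so repeated overwrite passes are more revealing than a single pass.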
Pros & Cons
- Advantages: Excellent performance, broad compatibility across all OS and filesystems, enterprise-grade reliability.
- Disadvantages: High hardware cost, dependent on BBU battery lifespan, requires regular maintenance.
Solution 2: Atomic Write Commands for NVMe SSDs
Modern NVMe SSDs natively support atomic write commands, enabling atomicity guarantees without additional RAID controllers.
Working Mechanism
- NVMe 1.1 and above specifications define the Atomic Write Unit (AWU) parameter.
- Most consumer-grade NVMe SSDs feature a 16KB AWU, while enterprise-grade models commonly support larger AWU such as 64KB and 128KB.
- SSD firmware ensures write atomicity as long as write requests are aligned and do not exceed the AWU limit.
- For PostgreSQL 8KB pages, an SSD AWU of 8KB or higher fully meets requirements.
Key Configuration Requirements
- Confirm SSD Atomic Write Capability
# Query NVMe device information
nvme id-ns /dev/nvme0n1 -H | grep "Atomic Write Unit"
- The displayed Atomic Write Unit value shall be no less than 8KB (16 × 512-byte sectors).
- Enable Filesystem Direct I/O
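- PostgreSQL writes data files through the OS page cache (buffered I/O) by default and does not open them with the O_DIRECT flag, so the device-level guarantee holds only if the kernel and filesystem submit each 8KB page to the SSD as a single aligned write.
- Ensure direct I/O is not disabled in filesystem mount parameters.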
- Disable Volatile SSD Write Cache
- Most enterprise-grade NVMe SSDs disable volatile write cache by default.
- Inspection command:
nvme get-feature /dev/nvme0n1 -f 0x06
- If write cache is enabled, confirm the SSD is fitted with Power Loss Protection (PLP) capacitors.
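The Atomic Write Unit check above comes down to simple arithmetic. The sketch below assumes the value being checked is the raw field from the NVMe identify data, which the specification defines as a 0's-based count of logical blocks; if a tool prints an already converted block count, the `+ 1` should be dropped. The function names are illustrative.

```python
PG_PAGE_SIZE = 8192  # PostgreSQL default page size in bytes


def atomic_write_bytes(field_value: int, lba_size: int) -> int:
    """Convert a 0's-based atomic write unit field (logical blocks) to bytes."""
    return (field_value + 1) * lba_size


def covers_pg_page(field_value: int, lba_size: int,
                   page_size: int = PG_PAGE_SIZE) -> bool:
    """True when one PostgreSQL page fits inside the atomic write unit."""
    return atomic_write_bytes(field_value, lba_size) >= page_size


# A raw value of 15 with 512-byte blocks is 16 blocks, the 8KB minimum from the text.
print(atomic_write_bytes(15, 512), covers_pg_page(15, 512))
# A raw value of 0 with 4KB blocks is a single block, which is too small.
print(atomic_write_bytes(0, 4096), covers_pg_page(0, 4096))
```

For power-loss safety the relevant field is the power-fail variant of the atomic write unit, since the normal-operation value says nothing about what survives a power cut.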
Verification Method
Use the fio tool to validate atomic writes:
fio --name=atomic-write-test --filename=/var/lib/pgsql/testfile --rw=write --bs=8k --size=10G --ioengine=libaio --direct=1 --iodepth=32 --runtime=300 --time_based
Perform repeated power cuts during the test, then verify filesystem and data integrity after reboot.
Pros & Cons
- Advantages: No RAID controller required, lower latency, better performance and lower power consumption.
- Disadvantages: Reliant on SSD firmware implementation with inconsistent quality across vendors; consumer-grade SSDs may lack PLP support.
Solution 3: Copy-on-Write (CoW) Filesystems (ZFS, Btrfs)
The inherent write mechanism of CoW filesystems guarantees atomicity, making them the optimal software-based atomic write implementation.
Working Mechanism
- CoW filesystems never overwrite data in-place; new data is written to free disk space instead.
- Metadata pointers are updated to point to new data blocks only after new data is fully persisted.
- The entire process is atomic, so only complete old or new data remains after crashes with no intermediate states.
- Both ZFS and Btrfs ensure atomicity for PostgreSQL 8KB page writes regardless of underlying disk sector sizes.
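The write-then-flip-pointer sequence can be sketched with a toy store that holds one logical page. This is a conceptual model only, not how ZFS or Btrfs are implemented; `CowStore` and the `crash_at` values are hypothetical.

```python
from typing import Dict, Optional


class CowStore:
    """Toy copy-on-write store holding one logical page."""

    def __init__(self, initial: bytes) -> None:
        self.blocks: Dict[int, bytes] = {0: initial}  # block id -> contents
        self.root = 0        # "metadata pointer" to the live block
        self.next_id = 1

    def read(self) -> bytes:
        return self.blocks[self.root]

    def write(self, data: bytes, crash_at: Optional[str] = None) -> None:
        new_id = self.next_id
        self.next_id += 1
        if crash_at == "mid_write":
            # Only part of the new block reaches free space; pointer untouched.
            self.blocks[new_id] = data[:len(data) // 2]
            return
        self.blocks[new_id] = data   # step 1: new data goes to free space
        if crash_at == "before_commit":
            return                   # crash: pointer never flipped
        old_id = self.root
        self.root = new_id           # step 2: one pointer update commits the write
        del self.blocks[old_id]      # step 3: the old block becomes free space


store = CowStore(b"A" * 8192)
store.write(b"B" * 8192, crash_at="mid_write")
after_mid_write_crash = store.read()
store.write(b"B" * 8192, crash_at="before_commit")
after_precommit_crash = store.read()
store.write(b"C" * 8192)
after_clean_write = store.read()
print(after_mid_write_crash[:1], after_precommit_crash[:1], after_clean_write[:1])
```

Because readers only ever follow the pointer, a crash at either point leaves the old page fully visible, and only a completed pointer update exposes the new one.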
Key ZFS Configuration Requirements
- Create PostgreSQL-optimized ZFS Pool
zpool create -o ashift=12 tank /dev/nvme0n1
# ashift=12 sets 4KB block size matching physical sectors of modern SSDs
- Create PostgreSQL Dataset
zfs create -o recordsize=8k -o compression=lz4 -o atime=off -o logbias=throughput tank/pgdata
- recordsize=8k: Matches PostgreSQL page size to eliminate write amplification.
- compression=lz4: Enables LZ4 compression with negligible performance overhead and improved throughput.
- atime=off: Disables access time updates to reduce redundant writes.
- logbias=throughput: Optimizes performance for large write operations.
- Configure ZFS Intent Log (ZIL)
- Deploy dedicated high-speed SSDs as separate ZIL devices for write-heavy workloads.
- Command:
zpool add tank log /dev/nvme1n1
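The recordsize recommendation above can be checked with rough arithmetic. The sketch assumes that updating one PostgreSQL page forces the whole enclosing record to be rewritten, and it ignores compression, metadata and intent-log traffic; `rmw_amplification` is an illustrative name.

```python
def rmw_amplification(recordsize: int, page_size: int = 8192) -> float:
    """Approximate bytes rewritten per byte of page data for a one-page update."""
    return max(recordsize, page_size) / page_size


for recordsize in (8 * 1024, 16 * 1024, 128 * 1024):
    print(recordsize, rmw_amplification(recordsize))
```

With the ZFS default recordsize of 128K, a single 8KB page update rewrites about sixteen times the data under this model, while recordsize=8k brings the ratio to one.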
Key Btrfs Configuration Requirements
# Mount parameters
mount -o noatime,nodiratime,compress=zstd,space_cache=v2 /dev/nvme0n1 /var/lib/pgsql
- Avoid nodatacow: This parameter disables copy-on-write and invalidates atomic write guarantees.
- compress=zstd: Enable Zstandard compression. Btrfs supports zlib, lzo and zstd, but not LZ4.
- space_cache=v2: Use the free space tree, a more efficient free-space cache.
Pros & Cons
- Advantages: Pure software implementation with no special hardware demands; built-in checksum, snapshot and compression features.
- Disadvantages: Moderate performance overhead; ZFS is not part of the mainline Linux kernel and must be installed as an out-of-tree module, unlike ext4; Btrfs has stability concerns in certain scenarios.
Solution 4: Enterprise-Grade SAN Storage
Enterprise SAN storage arrays (e.g. EMC, NetApp, HPE) provide array-level atomic write guarantees.
Working Mechanism
- SAN controllers are equipped with large-capacity battery-backed cache.
- All write requests enter cache first before being flushed to disks in batches by controllers.
- Controllers ensure atomicity for all writes equal to or smaller than the array block size (commonly 8KB or 16KB).
Key Configuration Requirements
- Set the SAN array block size to 8KB or larger.
- Enable controller write-back cache mode.
- Maintain normal operation of controller battery backup units.
- Deploy multipath software such as DM-Multipath for path redundancy.
Solution Comparison & Selection Guidelines
| Solution | Atomicity Guarantee | Performance | Cost | Operational Complexity | Applicable Scenarios |
|---|---|---|---|---|---|
| Hardware RAID with BBU | Extremely High | High | High | Medium | Traditional enterprise data centers with strict reliability requirements |
| NVMe SSD with PLP | High | Extremely High | Medium | Low | Modern servers & high-performance OLTP workloads |
| ZFS Filesystem | Extremely High | Medium-High | Low | Medium | General scenarios requiring snapshots, compression and other advanced features |
| Btrfs Filesystem | Medium-High | Medium-High | Low | Low | Testing environments & non-core businesses |
| Enterprise SAN | Extremely High | Medium | Extremely High | High | Large enterprises with existing SAN infrastructure |
Selection Priority
- Core Business: Prioritize hardware RAID with BBU or ZFS filesystem.
- High-Performance Workloads: Choose enterprise-grade PLP-enabled NVMe SSDs.
- Cost-Sensitive Scenarios: Adopt ZFS to leverage software-native advanced features.
- Existing SAN Infrastructure: Utilize built-in atomic write capabilities of SAN storage directly.
Risk Assessment & Limitations
Even with 8KB atomic write guarantees at the storage layer, disabling FPW still comes with inherent risks and limitations.
Unresolvable Issues
Silent Data Corruption: Atomic writes only ensure write integrity and cannot fix silent corruption caused by disk media damage or firmware bugs.
- Enable PostgreSQL data checksums:
  initdb --data-checksums
- Run pg_checksums --check periodically for full database integrity inspection; the cluster must be shut down cleanly while it runs.
Extreme Failure Scenarios: Metadata write failures may still trigger filesystem-level corruption.
- Adopt journaling filesystems such as ext4 and XFS, or copy-on-write filesystems such as ZFS and Btrfs.
- Schedule regular filesystem consistency checks.
Concurrent Primary & Standby Outage: Local crash recovery is still required if both primary and standby databases go offline simultaneously.
- Atomic writes eliminate torn pages, yet other types of data corruption remain possible.
- Ensure full reliability and recoverability of backup solutions.
Common Pitfalls
BBU Battery Degradation: Insufficient BBU power triggers automatic fallback to write-through mode and breaks atomic write guarantees.
- Establish dedicated monitoring and alerting for BBU health status.
- Arrange periodic battery calibration and replacement.
SSD Firmware Defects: Flawed atomic write implementations in certain SSD firmware versions may lead to data corruption.
- Select SSD models fully validated by the PostgreSQL community.
- Keep SSD firmware updated to the latest stable release.
Incorrect Filesystem Configuration: Parameters like nodatacow on Btrfs disable CoW and break atomicity.
- Follow official recommended configuration standards strictly.
- Regularly verify filesystem mount parameters.
Complete Implementation Plan
Pre-Deployment Preparation
Storage Selection & Testing
- Select appropriate storage solutions based on business demands.
- Conduct pressure testing and fault injection tests to verify atomic write effectiveness.
- Evaluate performance metrics under diverse configuration combinations.
Database Preparation
- Enable data checksums: Use initdb --data-checksums for new instances; apply pg_checksums --enable to existing databases while the cluster is shut down.
- Upgrade PostgreSQL to the latest stable version.
- Deploy comprehensive monitoring systems.
Rollout Procedures
Validation in Test Environments
- Replicate identical storage and database configurations matching production environments.
- Execute business pressure tests and simulate various failure scenarios including power cuts, server restarts and disk failures.
- Observe stable operation for at least one month to confirm zero data corruption.
Phased Production Rollout
- Deploy on non-core business systems first with 1-2 weeks of observation.
- Implement the configuration on standby nodes of core databases for another 1-2 weeks of monitoring.
- Finally apply changes to primary core database nodes.
Parameter Configuration
# postgresql.conf
full_page_writes = off
wal_log_hints = on # Enable wal_log_hints for page repair even after FPW is disabled
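A pre-flight check for this pair of settings might look like the sketch below. It is a simplified reader for `name = value` lines, not a full postgresql.conf parser: it ignores include directives and unit suffixes, and `parse_conf` and `check_fpw_settings` are hypothetical helper names. The rule it enforces (FPW off only together with wal_log_hints on) mirrors the configuration shown above.

```python
import re
from typing import Dict, List

# name, optional "=", then a quoted or bare value; a trailing "# comment" is ignored
_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*('(?:[^']|'')*'|[^#\s]+)")

ON_VALUES = {"on", "true", "yes", "1"}
OFF_VALUES = {"off", "false", "no", "0"}


def parse_conf(text: str) -> Dict[str, str]:
    """Return settings from config text; later lines override earlier ones."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2).strip("'").lower()
    return settings


def check_fpw_settings(text: str) -> List[str]:
    """Flag a config that disables FPW without enabling wal_log_hints."""
    settings = parse_conf(text)
    problems = []
    fpw = settings.get("full_page_writes", "on")   # server default is on
    hints = settings.get("wal_log_hints", "off")   # server default is off
    if fpw in OFF_VALUES and hints not in ON_VALUES:
        problems.append("full_page_writes is off but wal_log_hints is not on")
    return problems


sample = """
# postgresql.conf
full_page_writes = off
wal_log_hints = on # Enable wal_log_hints
"""
print(parse_conf(sample))
print(check_fpw_settings(sample))
```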
Daily Operations & Monitoring
Storage Layer Monitoring
- Monitor RAID controller status and BBU battery health.
- Track SSD health metrics, wear level and operating temperature.
- Oversee filesystem usage, inode consumption and disk error logs.
Database Layer Monitoring
- Track checksum failure errors via the checksum_failures field in pg_stat_database.
- Monitor WAL generation rate, which should drop significantly after FPW is disabled.
- Observe transaction throughput and latency to confirm expected performance gains.
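The WAL generation rate can be derived from two LSN samples, for example values returned by pg_current_wal_lsn() some seconds apart. The sketch below only does the arithmetic on the textual LSN form (two hexadecimal numbers separated by a slash); collecting the samples is left out, and the function names and sample values are illustrative.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_rate(lsn_start: str, lsn_end: str, seconds: float) -> float:
    """Average WAL bytes generated per second between two samples."""
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / seconds


# Two hypothetical samples taken 16 seconds apart
rate = wal_rate("0/1000000", "0/2000000", 16)
print(f"{rate / 1024 / 1024:.2f} MiB/s")
```

Comparing this figure before and after the change, under a similar workload and checkpoint interval, gives a direct measure of how much WAL volume FPW was contributing.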
Periodic Maintenance Tasks
- Run full-database integrity checks via pg_checksums weekly against a cleanly shut down cluster.
- Conduct backup restoration drills monthly.
- Execute fault injection tests quarterly.
Conclusion
Disabling FPW by leveraging storage-layer 8KB atomic writes is a technically feasible solution with strict prerequisites for traditional PostgreSQL architectures. It delivers substantial performance improvements especially for write-heavy workloads, while imposing higher requirements on underlying storage infrastructure and operational capabilities.
Core Conclusions
- Hardware RAID with BBU and ZFS filesystem are the two most reliable solutions proven by long-term production practice.
- Data checksums must be enabled alongside complete backup and high availability architectures even with atomic write support.
- Disabling full_page_writes is a high-risk operation that requires sufficient verification and gradual phased deployment.