PostgreSQL Tutorial: Tuning Linux Page Cache

May 7, 2024

Summary: In this tutorial, you will learn how to tune the page cache in Linux.

Introduction

The page cache is an in-memory disk cache that holds the contents of files and executable programs, that is, pages with the actual contents of files or block devices. Its purpose is to reduce the number of disk reads.

File system caching in Linux is a mechanism that allows the kernel to store frequently accessed data in memory for faster access. The kernel uses the page cache to store recently-read data from files and file system metadata.

For instance, when a program reads data from a file, the kernel performs several tasks:

checks the page cache to see if the data is already in memory.
if the data is in memory, the kernel simply returns the data from the cache.
otherwise, it reads the data from the drive and stores a copy of it in the cache for future use.

In addition, the kernel uses the dentries cache to store information about file system objects. These file system objects include directories and inodes.

Hence, the page cache handles data from files while the dentries cache manages the file system objects.

The kernel manages both the page cache and the dentries cache with a Least Recently Used (LRU) policy. In other words, when the cache is full and there’s more data to add, the kernel evicts the least recently used data to make room for the new data.

Checking the Cache

The vmstat command provides detailed information about virtual memory. In particular, it shows the amount of memory in use for caching:

$ vmstat
procs  -----------memory----------   ---swap---    -----io----   --system--    ------cpu-----
r   b  swpd     free   buff  cache   si     so      bi     bo     in    cs    us sy id wa st
0   0     0  6130448 11032 589532    0       0      422    52     160   362    3  3 76 18  0

The cache column shows the amount of memory used for file system caching in kilobytes. In addition, to get more details using the vmstat command, we can use the -s flag:

$ vmstat -s
8016140 K total memory
1282340 K used memory
 207744 K active memory
 711356 K inactive memory
6133536 K free memory
  11032 K buffer memory
 589232 K swap cache
2097148 K total swap
      0 K used swap
2097148 K free swap
   3458 non-nice user cpu ticks
    389 nice user cpu ticks
   3371 system cpu ticks
  60823 idle cpu ticks
  20782 IO-wait cpu ticks
      0 IRQ cpu ticks
     34 softirq cpu ticks
      0 stolen cpu ticks
 494275 pages paged in
  56168 pages paged out
      0 pages swapped in
      0 pages swapped out
 170063 interrupts
 384058 CPU context switches
1673971944 boot time
   5151 forks

Alternatively, we can use the free command to check the amount of file system cache memory in the system. It shows the memory usage in kilobytes under the buff/cache column:

$ free
                  total        used           free      shared      buff/cache      available
Mem:            8016140     1284652        6130952      144680          600536       6353032
Swap:           2097148           0        2097148

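The -m flag switches the output values to mebibytes. Notably, the value of the buff/cache column is approximately the sum of the buffer memory and swap cache rows in the vmstat -s output (11032 + 589232 = 600264 K there, against 600536 here; the small difference is expected because the two snapshots were taken at different moments).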
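To see where the buff/cache figure comes from, a small sketch can add up the relevant fields of /proc/meminfo. It assumes a recent procps-ng free, which counts Buffers, Cached, and SReclaimable toward buff/cache; buff_cache_kb is a hypothetical helper name, not a standard tool.

```shell
# Hypothetical helper: sum the Buffers, Cached and SReclaimable fields (in kB)
# from a meminfo-style file. Defaults to /proc/meminfo when no file is given.
# The ^ anchor keeps SwapCached from being counted as Cached.
buff_cache_kb() {
    awk '/^(Buffers|Cached|SReclaimable):/ { sum += $2 } END { print sum + 0 }' \
        "${1:-/proc/meminfo}"
}
```

Calling buff_cache_kb with no argument on a Linux host should print a value close to the buff/cache column of free.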

Page Cache Settings

To optimize the page cache, we can modify several parameters:

vm.vfs_cache_pressure
vm.swappiness
vm.dirty_background_ratio
vm.dirty_background_bytes
vm.dirty_ratio
vm.dirty_bytes
vm.dirty_writeback_centisecs
vm.dirty_expire_centisecs

These parameters control how readily the kernel reclaims cache memory and how much modified data may accumulate in memory before the kernel writes it to storage. Importantly, dirty pages are memory pages that have been modified but not yet written back to secondary storage.

In general, we can use the sysctl command to configure the file system cache in Linux. The sysctl -w form changes a kernel parameter at runtime, and the change lasts only until the next reboot. To make a setting persistent, we add it to the /etc/sysctl.conf file, which holds system-wide kernel parameters that are applied at boot.

vm.vfs_cache_pressure

The vm.vfs_cache_pressure parameter controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects:

$ sudo sysctl -w vm.vfs_cache_pressure=50
vm.vfs_cache_pressure = 50

Here, we set the vfs_cache_pressure value to 50 via the -w switch of sysctl. Because this is below the default of 100, the kernel will prefer to keep inode and dentry caches rather than reclaim them. This can help improve performance on systems with a large number of files.

Notably, a higher value makes the kernel prefer to reclaim inodes and dentries over the page cache. On the other hand, a lower value makes it reclaim the page cache over inodes and dentries. Hence, we can adjust the value according to our workload.

vm.swappiness

Swappiness controls how aggressively the kernel swaps memory pages. Lowering the value of swappiness means the kernel will be less likely to swap out less frequently used memory pages. Thus, the kernel will be more likely to keep these pages cached in RAM for faster access.

Further, we can again use sysctl to set the vm.swappiness parameter:

$ sudo sysctl -w vm.swappiness=10
vm.swappiness = 10

Here, the command sets the value of vm.swappiness to 10. Again, lower values make the kernel prefer to keep more data in RAM. Conversely, higher values make the kernel swap more.

vm.dirty_background_ratio

The vm.dirty_background_ratio parameter is the percentage of system memory that can fill with dirty pages before the kernel starts writing them to the drive in the background. For instance, if we set vm.dirty_background_ratio to 10 on a system with 64GB of RAM, roughly 6.4GB of dirty pages can stay in RAM before background writeback begins.
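The arithmetic in this example can be sketched as a small shell function. ratio_to_bytes is a hypothetical helper, and the sketch treats 64GB as 64 GiB of total RAM. The kernel applies the percentage to available (free plus reclaimable) memory rather than to total RAM, so the result is best read as an upper estimate.

```shell
# Hypothetical helper: bytes covered by PERCENT of TOTAL_BYTES (integer math).
ratio_to_bytes() {
    total_bytes=$1
    percent=$2
    echo $(( total_bytes * percent / 100 ))
}

# 10% of 64 GiB, i.e. about 6.4 GiB expressed in bytes
ratio_to_bytes $(( 64 * 1024 * 1024 * 1024 )) 10
```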

Now, let’s configure the value of vm.dirty_background_ratio for our system:

$ sudo sysctl -w vm.dirty_background_ratio=10
vm.dirty_background_ratio = 10

Alternatively, we can set the vm.dirty_background_bytes variable in place of vm.dirty_background_ratio. The *_bytes version takes the amount of memory in bytes. For example, we can set the amount of memory for dirty background caching to 512MB:

$ sudo sysctl -w vm.dirty_background_bytes=536870912

However, only one of the two can be in effect at a time: setting the *_bytes variant resets the *_ratio variant to 0, and vice versa.
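Since the *_bytes parameters expect a raw byte count, the value is easy to mistype. A minimal sketch of the conversion follows, assuming MB here means mebibytes (1024 * 1024 bytes); mib_to_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: convert mebibytes to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 512   # prints 536870912

# The result can be passed straight to sysctl (requires root):
#   sudo sysctl -w vm.dirty_background_bytes="$(mib_to_bytes 512)"
```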

vm.dirty_ratio

Specifically, vm.dirty_ratio is the absolute maximum percentage of system memory that can be filled with dirty pages. At this level, processes that generate new writes are blocked until enough dirty pages have been written to storage.

Notably, vm.dirty_bytes becomes 0 when we set a percentage value for vm.dirty_ratio, and vice versa. To illustrate, let’s define the value for vm.dirty_ratio:

$ sudo sysctl -w vm.dirty_ratio=20
vm.dirty_ratio = 20

Similarly, vm.dirty_ratio becomes 0 if we configure a value in bytes for vm.dirty_bytes.
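Because vm.dirty_ratio is the hard limit, the background threshold is normally kept below it so that background writeback starts before writers get blocked. A minimal sanity check for a pair of candidate values might look like this; check_dirty_ratios is a hypothetical helper name, and the check is a convention rather than something sysctl enforces.

```shell
# Hypothetical helper: succeed only if the background ratio is below the hard limit.
check_dirty_ratios() {
    background=$1
    hard=$2
    if [ "$background" -lt "$hard" ]; then
        echo "ok: background=${background}% is below dirty_ratio=${hard}%"
    else
        echo "warning: background=${background}% should be below dirty_ratio=${hard}%" >&2
        return 1
    fi
}

# The values used in this tutorial pass the check.
check_dirty_ratios 10 20
```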

vm.dirty_expire_centisecs and vm.dirty_writeback_centisecs

Of course, data cached in the system memory is at risk of loss in case of a power outage. Hence, to limit that exposure, the following variables dictate how long dirty data may stay in memory and how often the kernel checks for data to write to secondary storage:

vm.dirty_expire_centisecs
vm.dirty_writeback_centisecs

The vm.dirty_expire_centisecs parameter sets how long dirty data can stay in the cache before it’s written to the drive. Let’s set the variable so that data can stay for 40 seconds in the cache:

$ sudo sysctl -w vm.dirty_expire_centisecs=4000
vm.dirty_expire_centisecs = 4000

In this case, dirty data can stay in the cache for up to 40 seconds before it becomes eligible to be written to the drive. Notably, 1 second equals 100 centisecs.
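The unit conversion is simple enough to script, which helps avoid being off by a factor of ten or a hundred. secs_to_centisecs is a hypothetical helper name.

```shell
# Hypothetical helper: convert whole seconds to centiseconds (1 s = 100 centisecs).
secs_to_centisecs() {
    echo $(( $1 * 100 ))
}

secs_to_centisecs 40   # prints 4000
```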

Further, vm.dirty_writeback_centisecs sets how often the kernel’s background flusher threads wake up to check whether there’s data to write to secondary storage. Thus, the lower the value, the higher the frequency, and vice versa.

Let’s configure vm.dirty_writeback_centisecs so that the kernel checks the cache every 5 seconds:

$ sudo sysctl -w vm.dirty_writeback_centisecs=500
vm.dirty_writeback_centisecs = 500

Again, the 500 centisecs value is equal to 5 seconds.
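Values set with sysctl -w last only until the next reboot. One way to keep them is a configuration file that sysctl loads at boot. The sketch below writes the ratio-based values used in this tutorial to a scratch file for review; write_pagecache_conf and the file name 99-pagecache.conf are hypothetical, and the numbers are this tutorial's examples rather than recommendations.

```shell
# Hypothetical helper: write this tutorial's example values to the given file.
write_pagecache_conf() {
    cat > "$1" <<'EOF'
vm.vfs_cache_pressure = 50
vm.swappiness = 10
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20
vm.dirty_expire_centisecs = 4000
vm.dirty_writeback_centisecs = 500
EOF
}

# Write to a scratch location first so the content can be reviewed.
conf_file="${TMPDIR:-/tmp}/99-pagecache.conf"
write_pagecache_conf "$conf_file"
cat "$conf_file"
```

After review, the file can be copied into /etc/sysctl.d/ and applied with sudo sysctl --system, or loaded directly with sudo sysctl -p followed by the file path.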