PostgreSQL Tutorial: How does page caches affect query performance?

March 22, 2024

Summary: in this tutorial, you will learn how to troubleshooting the page caches in PostgreSQL.

Table of Contents

Introduction

Quite frequently I run into question like this: I have restarted PostgreSQL server, but my query is still much faster second time I run it, why is that ?

The answer to this question is simple – because restarting database server only clears buffer caches but it has no change on other caches, which are:

Buffer Caches - The shared buffer pool where PostgreSQL loads pages with tables and indexes from disk, to work directly from memory, reducing the disk access.

Page Caches - Operation Systems typically cache file IO unless you explicitely bypass it by using O_DIRECT flag or mounting file system in direct IO mode.

Hardware Caches - State of CPU cache may affect query execution speed but only lightly, the hardware IO cache may however cause dramatic difference. Hardware RAID cache is the one but more important SAN caches which can be pretty big.

Getting ready

Let’s walk through an example to better understand how page caches can affect query performance.

Suppose we have a table named t1:

CREATE TABLE t1 (id integer, str text);

Here’s an example SQL query to generate the data:

INSERT INTO t1 (id, str)
  SELECT i, repeat('Pg', 64)
    FROM generate_series(1, 1000000) AS s(i);

We’ve filled this table with millions of rows of sample data.

Page caches example

Before we look at the impact of page caches on query performance, we need to stop PostgreSQL server, and drop system caches as root account first:

# echo 3 > /proc/sys/vm/drop_caches

Then, start the PostgreSQL server.

Now, let’s say we want to retrieve total amount of records:

SET max_parallel_workers_per_gather TO 0;

EXPLAIN (analyze, buffers) SELECT count(*) FROM t1;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32909.00..32909.01 rows=1 width=8) (actual time=439.977..439.978 rows=1 loops=1)
   Buffers: shared read=20409
   ->  Seq Scan on t1  (cost=0.00..30409.00 rows=1000000 width=0) (actual time=0.244..349.652 rows=1000000 loops=1)
         Buffers: shared read=20409
 Planning:
   Buffers: shared hit=13 read=6
 Planning Time: 3.522 ms
 Execution Time: 440.979 ms
(8 rows)

That’s fine. Let’s restart the PostgreSQL server.

Actually, we can look at the page caches statistics through pgfincore.

Now, let’s retrieve total amount of records again and see how it affects the performance:

SET max_parallel_workers_per_gather TO 0;

EXPLAIN (analyze, buffers) SELECT count(*) FROM t1;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32909.00..32909.01 rows=1 width=8) (actual time=199.904..199.906 rows=1 loops=1)
   Buffers: shared read=20409
   ->  Seq Scan on t1  (cost=0.00..30409.00 rows=1000000 width=0) (actual time=1.131..113.739 rows=1000000 loops=1)
         Buffers: shared read=20409
 Planning:
   Buffers: shared hit=13 read=6
 Planning Time: 0.413 ms
 Execution Time: 199.955 ms
(8 rows)

The query performance is significantly better now. We’ve reduced the execution time by more than two times!