January 17, 2024
Summary: in this tutorial, you will learn how to check cache hit ratio in PostgreSQL.
Table of Contents
Introduction
If your investigation leads you to believe that there are database performance issues, then you should consider analyzing the cache performance of your database. PostgreSQL databases use both an internal cache and the machine’s page cache for storing commonly requested data. That way, instead of pulling the data from disk, which can take milliseconds, the database can fetch the data in-memory in sub-millisecond times.
PostgreSQL provides cache hit rate statistics for all tables in the database in the pg_statio_user_tables
table. The two useful columns in that table are the heap_blks_read
, defined as the “number of disk blocks read from this table” and the heap_blks_hit
, defined as the “number of buffer hits in this table”.
Index hit ratio
This query will provide you with your index hit rate across all nodes. Index hit rate is useful in determining how often indices are used when querying:
SELECT 100 * (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) AS index_hit_rate
FROM pg_statio_user_indexes;
Cache hit ratio
Most applications typically access a small fraction of their total data at once. PostgreSQL keeps frequently accessed data in memory to avoid slow reads from disk. You can see statistics about it in the pg_statio_user_tables view.
An important measurement is what percentage of data comes from the memory cache vs the disk in your workload:
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
100 * sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_rate
FROM
pg_statio_user_tables;
If you find yourself with a ratio significantly lower than 99%, then you likely want to consider increasing the cache available to your database.
Improving internal cache performance
One of the main reasons why the cache hit rate for a table is low is that the database doesn’t have enough space allocated in its internal cache buffer. This internal cache buffer is used for loading the table rows from disk to memory, and if there isn’t enough space allocated, then the database server will constantly have to fetch the data from disk.
For PostgreSQL databases, the cache buffer size is configured with the shared_buffers
configuration. It tells the database how much of the machine’s memory it can allocate for storing data in memory. The default is incredibly low (128 MB) because some kernels do not support more without changing the kernel settings. The optimal value ultimately depends on the data access patterns for a given database, but PostgreSQL recommends that you should initially configure it to 25% of the machine’s total memory if you are using a dedicated database server. More memory can be beneficial for larger databases, but there’s a limit to the optimization since PostgreSQL databases also use the system’s cache. The more memory you allocate to the shared_buffers, the less memory will be available for the machine’s cache.