July 31, 2024
Summary: in this tutorial, you will learn how to check the use of temporary files in PostgreSQL.
Introduction
As you may know, certain operations in PostgreSQL (such as sorts or hash tables) can create temporary files. These files are created when an operation performed by a PostgreSQL backend needs more memory than the work_mem limit allows. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, result cache nodes, and hash-based processing of IN subqueries.
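Before experimenting, you can inspect the current limit. A quick check (the default is typically 4MB, but this depends on your configuration):

```sql
-- Show the current per-operation memory limit for sorts and hash tables
SHOW work_mem;
```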
Example
First, we can set work_mem to a small size, like 1MB (note the quotes, which are required for values with units):
SET work_mem = '1MB';
Let's run something that fits within this amount of RAM (1MB); it will not use temporary files:
EXPLAIN analyze SELECT COUNT(*) FROM (SELECT random() AS i FROM generate_series(1,1000) ORDER BY i) AS x;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=77.33..77.34 rows=1 width=0) (actual time=1.308..1.308 rows=1 loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=0) (actual time=1.099..1.244 rows=1000 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 71kB
-> Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=0) (actual time=0.184..0.380 rows=1000 loops=1)
Total runtime: 1.449 ms
(6 rows)
As you can see, the sort used only RAM. But when we run something more demanding, the picture changes:
EXPLAIN analyze SELECT COUNT(*) FROM (SELECT random() AS i FROM generate_series(1,20000) ORDER BY i) AS x;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=77.33..77.34 rows=1 width=0) (actual time=31.668..31.669 rows=1 loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=0) (actual time=27.529..30.368 rows=20000 loops=1)
Sort Key: (random())
Sort Method: external merge Disk: 352kB
-> Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=0) (actual time=4.822..10.117 rows=20000 loops=1)
Total runtime: 32.248 ms
(6 rows)
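Temporary file creation only appears in the server log when the log_temp_files parameter is enabled. Assuming you have sufficient privileges, you can log every temporary file regardless of size:

```sql
-- 0 logs all temporary files; a positive value logs only files larger
-- than that many kilobytes; -1 (the default) disables this logging
SET log_temp_files = 0;
```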
Let’s check the PostgreSQL logs:
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp22791.1", size 360448
STATEMENT: EXPLAIN analyze SELECT COUNT(*) FROM (SELECT random() AS i FROM generate_series(1,20000) ORDER BY i) AS x;
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp22791.0", size 280000
STATEMENT: EXPLAIN analyze SELECT COUNT(*) FROM (SELECT random() AS i FROM generate_series(1,20000) ORDER BY i) AS x;
LOG: duration: 32.506 ms statement: EXPLAIN analyze SELECT COUNT(*) FROM (SELECT random() AS i FROM generate_series(1,20000) ORDER BY i) AS x;
As shown above, EXPLAIN ANALYZE reported only part of the temporary file usage. My assumption is that PostgreSQL first wrote a 352 kB temporary file, then rewrote it into a 280 kB file while removing the 352 kB one – so EXPLAIN shows peak usage, not total usage.
Query the pg_stat_database view
You can see database-level statistics in the pg_stat_database view, which includes the temp_files and temp_bytes columns. These two columns are important because they tell you whether your database has to write temporary files to disk, which inevitably slows down operations. What can be the reasons for high temporary file usage? The major reasons are as follows:
- Poor settings: If your work_mem setting is too low, operations cannot complete in RAM, and PostgreSQL will spill to disk.
- Expensive operations: It happens quite frequently that people torture their system with needlessly expensive queries. If you see many temporary files on an OLTP system, consider checking for expensive queries.
- Indexing and other administrative tasks: Once in a while, indexes may be created or people may run DDLs. These operations can lead to temporary file I/O but are not necessarily considered a problem (in many cases).
In short, temporary files can occur even when your system is perfectly fine. However, it definitely makes sense to keep an eye on them and ensure that temporary files are not needed frequently. We can check the use of temporary files with the following query:
SELECT temp_files, temp_bytes FROM pg_stat_database
WHERE datname = current_database();
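To compare temporary file pressure across all databases in the cluster, a similar query (a monitoring sketch, ordered by total bytes spilled) can be used:

```sql
-- Temporary file statistics per database, worst offenders first
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;
```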
Once you've found very high temporary file usage, you can locate the queries that caused it and optimize them.
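One way to find those queries is the pg_stat_statements extension, assuming it is installed and listed in shared_preload_libraries; its temp_blks_read and temp_blks_written columns count temporary file blocks per statement:

```sql
-- Requires: CREATE EXTENSION pg_stat_statements;
-- Top 10 statements by temporary blocks written (8 kB blocks by default)
SELECT query, temp_blks_read, temp_blks_written
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10;
```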