February 3, 2024
Summary: in this tutorial, you will learn how to tune the work_mem setting in PostgreSQL.
What is work_mem?
In PostgreSQL, work_mem sets the maximum amount of memory a query operation such as a sort or hash can use before writing to temporary disk files. This is crucial for operations like ORDER BY, DISTINCT, and JOINs. By default, work_mem is set to 4MB. It can be set in the postgresql.conf file or at the session level:
SET work_mem TO '64MB';
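You can check the current value with SHOW. If it fits your setup, the setting can also be scoped to a single database or role; mydb and app_user below are placeholder names:
SHOW work_mem;
-- Scope the setting to a database or role instead of the whole server;
-- mydb and app_user are placeholder names.
ALTER DATABASE mydb SET work_mem = '16MB';
ALTER ROLE app_user SET work_mem = '32MB';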
Postgres creates temporary files inside the PGDATA/base/pgsql_tmp directory for operations that exceed work_mem. Note that work_mem is allocated for each operation individually, not for the entire query: if a single query has 8 sort operations, work_mem would be allocated eight times, once for each sort operation.
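On PostgreSQL 12 and later you can watch these spill files appear while a large query runs (the function is restricted to superusers and members of pg_monitor by default):
-- List files in base/pgsql_tmp for the default tablespace, largest first.
SELECT name, size, modification
FROM pg_ls_tmpdir()
ORDER BY size DESC;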
Parallel workers in PostgreSQL allow multiple processes to work on a single query, dividing the task to improve performance; the workers cooperate by sharing intermediate results through shared memory. If 2 parallel workers are processing a query with a sort operation, each worker will use its own work_mem allocation.
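As a sketch of the arithmetic, assuming a plan with a single parallel sort and a participating leader (the default):
SET max_parallel_workers_per_gather = 2;  -- leader plus up to 2 workers
SET work_mem = '64MB';
-- Worst case for that one sort: (1 leader + 2 workers) * 64MB = 192MB.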
When to tune work_mem?
Tuning work_mem can improve performance in a number of scenarios:
- When you are running queries that involve large amounts of sorting or hashing.
- When large numbers of temporary files are being created on disk (see the monitoring queries after this list).
- When you see an external merge Disk sort method in an EXPLAIN ANALYZE plan.
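To spot the second symptom, you can check the cumulative temp-file counters in pg_stat_database, or enable log_temp_files, which logs every temporary file above a size threshold:
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
ORDER BY temp_bytes DESC;
-- In postgresql.conf (or via ALTER SYSTEM), log temp files of 10MB and up:
-- log_temp_files = 10MB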
Benefits of increasing work_mem
There are a number of benefits to increasing the work_mem parameter appropriately:
- Faster sorting. Sorting large result sets is a common database task, and a larger work_mem lets more of the data be sorted in memory, reducing the need for slow disk operations.
- Faster hash joins. Hash joins combine data from multiple tables during query execution. A larger work_mem lets more of the hash table stay in memory, minimizing disk I/O.
- Shorter query times. With faster sorting and joining, queries complete more quickly, which translates to more responsive database performance and shorter response times for users.
- Better parallelism. A suitable work_mem lets parallel plans take full advantage of multi-core processors, further boosting query efficiency.
NOTE: Setting work_mem too high can lead to excessive memory usage and memory contention between sessions. Finding the right balance for your specific workload is key.
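As a rough worst-case estimate: with max_connections = 100 and work_mem = '64MB', 100 sessions each running a query with two memory-hungry operations could allocate 100 × 2 × 64MB = 12.5GB, on top of shared_buffers.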
Tuning work_mem example
Let’s walk through an example to better understand how tuning work_mem can lead to substantial performance improvements.
Suppose we have a table named sales_data:
CREATE TABLE sales_data (
id SERIAL PRIMARY KEY,
date DATE NOT NULL,
amount DECIMAL(10, 2) NOT NULL
);
To generate millions of rows of sample data for the sales_data table, you can use the generate_series function in PostgreSQL. Here’s an example SQL query that inserts over 30 million rows:
INSERT INTO sales_data (date, amount)
SELECT
generate_series('2020-01-01'::date, '2022-12-31'::date, '1 minute'::interval) AS date,
round(random()::numeric * 1000 + 50, 2) AS amount
FROM generate_series(1, 20);
In this query:
1. generate_series creates a series of timestamps from January 1, 2020 to December 31, 2022 with a one-minute interval, about 1.6 million unique timestamps in total.
2. The FROM generate_series(1, 20) clause repeats that series 20 times, so each timestamp gets 20 rows with random amounts between 50 and 1050, rounded to two decimal places.
3. The INSERT INTO sales_data (date, amount) clause inserts the generated rows into the sales_data table.
Adjust the date range, the interval, and the number of rows per timestamp to generate as many rows as you need.
We’ve filled this table with millions of rows of sample data.
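Before benchmarking, it’s worth refreshing the planner statistics and confirming the row count:
ANALYZE sales_data;
SELECT count(*) FROM sales_data;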
Now, let’s say we want to retrieve total sales amounts, sorted by date:
EXPLAIN ANALYZE SELECT date, SUM(amount) FROM sales_data GROUP BY date ORDER BY date;
QUERY PLAN
----------------------------------------------------
...
Sort Method: external merge Disk: 31528kB
Execution Time: 1426.504 ms
With the default work_mem setting, PostgreSQL resorts to a disk-based sort (Sort Method: external merge), which can be slow.
Now, let’s increase work_mem to 64MB and see how it affects performance:
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT date, SUM(amount) FROM sales_data GROUP BY date ORDER BY date;
QUERY PLAN
----------------------------------------------------
...
Sort Method: quicksort Memory: 52428kB
Execution Time: 438.372 ms
The query performance is significantly better now. The sort runs entirely in memory (quicksort), and the execution time drops from roughly 1427 ms to 438 ms, more than a threefold improvement.
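Keep in mind that SET changes work_mem only for the current session. To persist the new value server-wide (bearing in mind the per-operation multiplication discussed above), update postgresql.conf or use ALTER SYSTEM and reload the configuration:
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();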