PostgreSQL Tutorial: Tuning work_mem setting

February 3, 2024

Summary: in this tutorial, you will learn how to tune work_mem setting in PostgreSQL.

What is work_mem?

In PostgreSQL, work_mem sets the maximum amount of memory to be used by a query operation like sorting and hashing before writing to temporary disk files. This is crucial for operations like ORDER BY, DISTINCT, and JOINs. By default, work_mem is set to 4MB. It can be set inside postgresql.conf file or at session level.

set work_mem to '64MB';

Postgres creates temporary files inside the PGDATA/base/pgsql_tmp directory for operations that exceed work_mem. work_mem is allocated for each operation individually, not for the entire query. For example, if a single query has 8 sort operations, work_mem would be allocated eight times, once for each sort operation.

Parallel workers in PostgreSQL allow multiple processes to work on a single query, dividing the task to improve performance. These workers cooperate by sharing intermediate results through shared memory. If 2 parallel workers are processing a query with sort operation each parallel worker will use work_mem separately.

When to tune work_mem?

Tuning work_mem can improve performance in a number of scenarios:

  • When you are running a query that involves a large amount of sorting or hashing.
  • When there are large amounts of temporary files being created on disk.
  • When you see external merge disk operation inside EXPLAIN ANALYZE plan.

Benefits of increasing work_mem

There are a number of benefits to increasing the work_mem parameter. For example, increasing work_mem can:

  • Sorting large result sets is a common database task, and an appropriately set work_mem can significantly accelerate this process. It enables more data to be sorted in memory, reducing the need for slow disk operations.
  • Hash joins are used to combine data from various tables during query execution. A larger work_mem value can enhance the performance of hash joins by facilitating more in-memory data storage and minimizing disk I/O needs.
  • With faster sorting and joining, queries can be completed more swiftly. This translates to more responsive database performance and shorter response times for users.
  • Adjusting work_mem suitably facilitates parallelism and takes full advantage of multi-core processors, further boosting query efficiency.

NOTE: It’s essential to note that setting work_mem too high can lead to excessive memory usage and potential memory competition issues. Thus, finding the right balance for your specific workload is key.

Tuning work_mem example

Let’s walk through an example to better understand how tuning work_mem can lead to substantial performance improvements.

Suppose we have a table named sales_data:

CREATE TABLE sales_data (
  id SERIAL PRIMARY KEY,
  date DATE NOT NULL,
  amount DECIMAL(10, 2) NOT NULL
);

To generate millions of rows of sample data for the sales_data table, you can use the generate_series function in PostgreSQL. Below is an example of how you can insert 1 million rows of sample data into the sales_data table.

Here’s an example SQL query to generate the data:

INSERT INTO sales_data (date, amount)
SELECT
generate_series('2020–01–01'::date, '2022–12–31'::date, '1 minute'::interval) AS date,
round(random() * 1000 + 50, 2) AS amount
FROM generate_series(1, 20);

In this query:

1. We’re using generate_series to create a series of dates starting from January 1, 2020, to December 31, 2022, with a one-minute interval between each date. This will create about 1.5 million unique timestamps.

2. For each generated date, we’re creating 20 rows with random amounts between 50 and 1050, rounded to two decimal places. You can change these limits to fill as many many row you want to fill in.

3. The INSERT INTO sales_data (date, amount) part of the query is then used to insert the generated data into the sales_data table.

Please adjust the date range, interval, and the number of rows per timestamp according to your needs. If you need more than 1 million rows, you can adjust the parameters of the generate_series function or increase the number of rows generated per timestamp.

We’ve filled this table with millions of rows of sample data.

Now, let’s say we want to retrieve total sales amounts, sorted by date:

EXPLAIN ANALYZE SELECT date, SUM(amount) FROM sales_data GROUP BY date ORDER BY date;
                     QUERY PLAN
----------------------------------------------------
 ...
 Sort Method: external merge Disk: 31528kB
 Execution Time: 1426.504 ms

With the default work_mem setting, you might notice that PostgreSQL is resorting to disk-based sort operations, which can be slow.

Now, let’s increase work_mem to 64MB and see how it affects the performance:

SET work_mem = '64MB';

EXPLAIN ANALYZE SELECT date, SUM(amount) FROM sales_data GROUP BY date ORDER BY date;
                     QUERY PLAN
----------------------------------------------------
 ...
 Sort Method: quicksort Memory: 52428kB
 Execution Time: 438.372 ms

The query performance is significantly better now. We’ve reduced the execution time by more than three times!

comments powered by Disqus