PostgreSQL Tutorial: Using temporary tablespaces

May 6, 2024

Summary: In this tutorial, we will discuss how to use temporary tablespaces, when and why to use it.

Table of Contents

Introduction

As the name implies, temporary tablespaces are there for temporary objects. The question is then, what exactly is considered as a temporary object? Well, the obvious case is temporary tables. Without any temporary tablespaces, when you create a temporary table, the temporary files get created in the default tablespace of the current database. In a standard PostgreSQL setup it looks like this:

postgres=# l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 872 MB  | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7393 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            |
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7393 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            |

All the databases have a default tablespace called “pg_default” which is a kind of pseudo tablespace as it does not really exist. Asking the catalog about the location of that tablespace will show an empty location:

SELECT spcname AS "Name"
     , pg_catalog.pg_get_userbyid(spcowner) AS "Owner"
     , pg_catalog.pg_tablespace_location(oid) AS "Location"
  FROM pg_catalog.pg_tablespace
 WHERE pg_catalog.pg_tablespace.spcname = 'pg_default'
 ORDER BY 1;

    Name    |  Owner   | Location
------------+----------+----------
 pg_default | postgres |
(1 row)

If we create temporary objects, where will the files be created then?

CREATE TEMPORARY TABLE tmp1 ( a int, b text, c date );

SELECT pg_relation_filepath('tmp1');
 pg_relation_filepath
----------------------
 base/12732/t3_16436
(1 row)

This is the standard directory of my “postgres” database:

$ cd $PGDATA

$ oid2name
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  12732       postgres  pg_default
  12731      template0  pg_default
      1      template1  pg_default

$ ls -l base/12732/t3_16436
-rw-------. 1 postgres postgres 0 Mar 12 18:17 base/12732/t3_16436

So, by default, files required for temporary tables go to the same location as all the other files that make up the specific database. If we populate the temporary table the files will grow, of course:

INSERT INTO tmp1 (a,b,c) SELECT i, i::text, now() FROM generate_series(1,100) i;
$ ls -la $PGDATA/base/12732/t3_16436
-rw-------. 1 postgres postgres 8192 Mar 12 18:41 /data/pgsql/base/12732/t3_16436
INSERT INTO tmp1 (a,b,c) SELECT i, i::text, now() FROM generate_series(1,1000) i;
$ ls -la $PGDATA/base/12732/t3_16436
-rw-------. 1 postgres postgres 49152 Mar 12 18:42 /data/pgsql/base/12732/t3_16436

What are the consequences of using the default tablespace for temporary objects?

  1. The I/O for temporary tables will compete with I/O for all the other objects in this PostgreSQL cluster.
  2. The temporary tables can potentially fill your file system until it is full and no one will be be able to work from then on. This is the first reason for creating one or more dedicated temporary tablespaces: By doing this you can avoid that temporary tables going crazy impact your whole cluster as long as the temporary tablespace is on it’s own file system.
  3. Slows down access to temporary tables and sort operations. Although you can increase temp_buffers and work_mem parameter values, those values are session level, too high values can lead to excessive memory usage and potential memory competition issues. Therefore, we can choose a faster file system or storage device to create a separate temporary tablespace.

Temporary tablespace

Creating a temporary tablespace is not different from creating a normal tablespace as it is actually exactly the same:

$ mkdir /var/tmp/tbstmp
CREATE TABLESPACE tbstmp LOCATION '/var/tmp/tbstmp';
postgres=# db+
                                      List of tablespaces
    Name    |  Owner   |    Location     | Access privileges | Options |  Size   | Description 
------------+----------+-----------------+-------------------+---------+---------+-------------
 pg_default | postgres |                 |                   |         | 886 MB  | 
 pg_global  | postgres |                 |                   |         | 575 kB  | 
 tbstmp     | postgres | /var/tmp/tbstmp |                   |         | 0 bytes | 
(3 rows)

Once we have the new tablespace we can tell PostgreSQL to use it as the default for temporary objects:

ALTER SYSTEM SET temp_tablespaces = 'tbstmp';

SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

SHOW temp_tablespaces;
 temp_tablespaces
------------------
 tbstmp
(1 row)

Using temporary table

Creating another temporary table will result in the files being created in the new location:

CREATE TEMPORARY TABLE tmp2 ( a int, b text, c date );

SELECT pg_relation_filepath('tmp2');
              pg_relation_filepath
------------------------------------------------
 pg_tblspc/16442/PG_13_202003051/12732/t3_16443
(1 row)
$ ls -la $PGDATA/pg_tblspc/
total 4
drwx------.  2 postgres postgres   19 Mar 12 18:50 .
drwx------. 20 postgres postgres 4096 Mar 12 18:54 ..
lrwxrwxrwx.  1 postgres postgres   15 Mar 12 18:50 16442 -> /var/tmp/tbstmp

$ ls -la $PGDATA/pg_tblspc/16442/
total 0
drwx------. 3 postgres postgres  29 Mar 12 18:50 .
drwxrwxrwt. 7 root     root     163 Mar 12 18:49 ..
drwx------. 3 postgres postgres  19 Mar 12 18:53 PG_13_202003051

$ ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/
total 0
drwx------. 3 postgres postgres 19 Mar 12 18:53 .
drwx------. 3 postgres postgres 29 Mar 12 18:50 ..
drwx------. 2 postgres postgres 54 Mar 12 18:53 12732

$ ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/
total 8
drwx------. 2 postgres postgres   54 Mar 12 18:53 .
drwx------. 3 postgres postgres   19 Mar 12 18:53 ..
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16443
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16446
-rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448

If you are wondering why there are three files, here is the answer:

SELECT relname FROM pg_class WHERE oid IN (16443,16446,16448);

       relname
----------------------
 pg_toast_16443
 pg_toast_16443_index
 tmp2
(3 rows)

The toast objects get created as well as I have a “text” column in my temporary table. Creating a temporary table with data types that do not require toast objects will result in one file only:

CREATE TEMPORARY TABLE tmp3 ( a int, b date );

SELECT pg_relation_filepath('tmp3');
              pg_relation_filepath
------------------------------------------------
 pg_tblspc/16442/PG_13_202003051/12732/t3_16449
(1 row)
$ ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/
total 8
drwx------. 2 postgres postgres   70 Mar 12 19:07 .
drwx------. 3 postgres postgres   19 Mar 12 18:53 ..
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16443
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16446
-rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448
-rw-------. 1 postgres postgres    0 Mar 12 19:07 t3_16449

Logging temp files

So for now we know that all temporary tables will go to the new temporary table space. What else will go there from now on. There is a parameter log_temp_files which can be used to report temp file usage into the PostgreSQL log file and this comes quite handy if you want to know what goes there. The default setting is “-1” which disables such logging, “0” means log everything, all other values greater than “1” specify the minimum size of the temp files for being logged. Setting it to “0” will, as said, log all temp files being created in the background so let’s do that:

ALTER SYSTEM SET log_temp_files = 0;

SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

Creating another temporary table and then checking the log file will confirm that this is working and we’ll get the information we want:

CREATE TEMPORARY TABLE tmp4 AS SELECT * FROM generate_series(1,1000000);

The entry in the log file will look like this and it confirms that the temporary files have been written to the temporary tablespsace we created above:

2020-03-13 02:33:35.658 [10535] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10535.0", size 14000000
2020-03-13 02:33:35.658 [10535] STATEMENT:  CREATE TEMPORARY TABLE tmp4 AS SELECT * FROM generate_series(1,1000000);

But: If you create, with the default configuration of PostgreSQL, this temporary table:

CREATE TEMPORARY TABLE tmp4 AS SELECT * FROM generate_series(1,100000);

… you will not see any lines in the log file for this. Why? Because there is temp_buffers and temporary files will be only be reported in the log file if they exceed the value of this parameter. In the default configuration this is ‘8MB’ and that is not enough for the smaller temporary table to be logged. Decreasing the parameter will log the temporary files for the smaller table as well:

SET temp_buffers = '1024kB';

CREATE TEMPORARY TABLE tmp5 AS SELECT * FROM generate_series(1,100000);

Sorts

So now we know how to log the creation of temporary files to the PostgreSQL log file. What other operations will cause temporary file to be created? Sorts?

SET work_mem = '64kB';

SELECT * FROM generate_series(1,1000000) ORDER BY random();

Yes, definitely:

2020-03-13 02:47:14.297 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.2", size 26083328
2020-03-13 02:47:14.297 [10609] STATEMENT:  SELECT * FROM generate_series(1,1000000) ORDER BY random();
2020-03-13 02:47:14.298 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1", size 14000000
2020-03-13 02:47:14.298 [10609] STATEMENT:  SELECT * FROM generate_series(1,1000000) ORDER BY random();
2020-03-13 02:47:14.298 [10609] LOG:  duration: 2994.386 ms  statement: SELECT * FROM generate_series(1,1000000) ORDER BY random();

Creating index

CREATE TABLE tt1 AS SELECT * FROM generate_series(1,1000000);

CREATE INDEX ii1 ON tt1(generate_series);

Yes, that creates temporary files as well:

2020-03-13 02:54:00.933 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/0.0", size 12394496
2020-03-13 02:54:00.933 [10609] STATEMENT:  CREATE INDEX ii1 ON tt1(generate_series);
2020-03-13 02:54:00.934 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/1.0", size 7659520
2020-03-13 02:54:00.934 [10609] STATEMENT:  CREATE INDEX ii1 ON tt1(generate_series);
2020-03-13 02:54:00.948 [10609] LOG:  duration: 1149.625 ms  statement: CREATE INDEX ii1 ON tt1(generate_series);

Adding foreign keys

CREATE TABLE ttt1 AS SELECT * FROM generate_series(1,1000000) a;

CREATE UNIQUE INDEX iii1 ON ttt1(a);

INSERT INTO ttt2 SELECT a,a FROM generate_series(1,1000000) a;

ALTER TABLE ttt2 ADD CONSTRAINT fk_t FOREIGN KEY (b) REFERENCES ttt1(a);

Yes, that as well:

2020-03-13 03:01:07.127 [10609] LOG:  duration: 1127.768 ms  statement: ALTER TABLE ttt2 ADD CONSTRAINT fk_t FOREIGN KEY (b) REFERENCES ttt1(a);
2020-03-13 03:01:15.375 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6", size 67374

So quite a few operations that generate temporary files in background. Separating this on a separate mount point actually can make a lot of sense. From a performance perspective (if I/O spread on the storage layer as well) but also from a security perspective as huge operations that require temporary files will not affect “normal” operations on the instance.

Creating materialized view

There is another case generating temporary files which is not maybe not clear to everybody. Consider this:

CREATE MATERIALIZED VIEW mv1 AS SELECT ttt1.a, ttt2.b FROM ttt1, ttt2 WHERE ttt1.a = ttt2.b;

This will create many temporary files in the background as well:

...
2020-03-13 03:11:03.721 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1289", size 22704
2020-03-13 03:11:03.721 [10609] STATEMENT:  CREATE MATERIALIZED VIEW mv1 AS SELECT ttt1.a, ttt2.b FROM ttt1, ttt2 WHERE ttt1.a = ttt2.b;
2020-03-13 03:11:03.722 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.143", size 23136
2020-03-13 03:11:03.722 [10609] STATEMENT:  CREATE MATERIALIZED VIEW mv1 AS SELECT ttt1.a, ttt2.b FROM ttt1, ttt2 WHERE ttt1.a = ttt2.b;
2020-03-13 03:11:03.723 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1166", size 23136
2020-03-13 03:11:03.723 [10609] STATEMENT:  CREATE MATERIALIZED VIEW mv1 AS SELECT ttt1.a, ttt2.b FROM ttt1, ttt2 WHERE ttt1.a = ttt2.b;
2020-03-13 03:11:03.723 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.792", size 23640
...

Refresh materialized view

And even refreshes consume temporary files:

CREATE UNIQUE INDEX mv_i1 ON mv1(a);

REFRESH MATERIALIZED VIEW CONCURRENTLY mv1;

From the log file:

...
2020-03-13 03:14:05.866 [10609] STATEMENT:  REFRESH MATERIALIZED VIEW CONCURRENTLY mv1;
2020-03-13 03:14:05.866 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.8193", size 26768
2020-03-13 03:14:05.866 [10609] CONTEXT:  SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid"
2020-03-13 03:14:05.866 [10609] STATEMENT:  REFRESH MATERIALIZED VIEW CONCURRENTLY mv1;
2020-03-13 03:14:05.866 [10609] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6147", size 28487
2020-03-13 03:14:05.866 [10609] CONTEXT:  SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid"
...

There are more operations that require temporary files in the background but for the scope of this post we stop here. One last thing: The amount of temporary files generated can also be limited by temp_file_limit:

SET temp_file_limit = '1MB';

REFRESH MATERIALIZED VIEW CONCURRENTLY mv1;
ERROR:  temporary file size exceeds temp_file_limit (1024kB)
comments powered by Disqus