pg_repack: Reorganize tables with minimal locks

December 23, 2023

Summary: pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes.

This article contains the following sections:

  1. Introduction
  2. Requirements
  3. Usage
  4. Environment
  5. Examples
  6. Diagnostics
  7. Restrictions
  8. Details
  9. See Also

Introduction

pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

You can choose one of the following methods to reorganize:

  • Online CLUSTER (ordered by cluster index)
  • Ordered by specified columns
  • Online VACUUM FULL (packing rows only)
  • Rebuild or relocate only the indexes of a table

NOTICE:

  • Only superusers can use the utility.
  • Target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column.

Requirements

  • PostgreSQL versions

    PostgreSQL 9.5, 9.6, 10, 11, 12, 13, 14, 15, 16.PostgreSQL 9.4 and before it are not supported.

  • Disks

    Performing a full-table repack requires free disk space about twice as large as the target table(s) and its indexes. For example, if the total size of the tables and indexes to be reorganized is 1GB, an additional 2GB of disk space is required.

Usage

pg_repack [OPTION]... [DBNAME]

The following options can be specified in OPTIONS.

Reorg Options

  • -a, --all

    Attempt to repack all the databases of the cluster. Databases where the pg_repack extension is not installed will be skipped.

  • -t TABLE, --table=TABLE

    Reorganize the specified table(s) only. Multiple tables may be reorganized by writing multiple -t switches. By default, all eligible tables in the target databases are reorganized.

  • -I TABLE, --parent-table=TABLE

    Reorganize both the specified table(s) and its inheritors. Multiple table hierarchies may be reorganized by writing multiple -I switches.

  • -c, --schema

    Repack the tables in the specified schema(s) only. Multiple schemas may be repacked by writing multiple -c switches. May be used in conjunction with --tablespace to move tables to a different tablespace.

  • -o COLUMNS [,...], --order-by=COLUMNS [,...]

    Perform an online CLUSTER ordered by the specified columns.

  • -n, --no-order

    Perform an online VACUUM FULL. Since version 1.2 this is the default for non-clustered tables.

  • -N, --dry-run

    List what would be repacked and exit.

  • -j, --jobs

    Create the specified number of extra connections to PostgreSQL, and use these extra connections to parallelize the rebuild of indexes on each table. Parallel index builds are only supported for full-table repacks, not with --index or --only-indexes options. If your PostgreSQL server has extra cores and disk I/O available, this can be a useful way to speed up pg_repack.

  • -s TBLSPC, --tablespace=TBLSPC

    Move the repacked tables to the specified tablespace: essentially an online version of ALTER TABLE ... SET TABLESPACE. The tables’ indexes are left in the original tablespace unless --moveidx is specified too.

  • -S, --moveidx

    Also move the indexes of the repacked tables to the tablespace specified by the --tablespace option.

  • -i, --index

    Repack the specified index(es) only. Multiple indexes may be repacked by writing multiple -i switches. May be used in conjunction with --tablespace to move the index to a different tablespace.

  • -x, --only-indexes

    Repack only the indexes of the specified table(s), which must be specified with the --table or --parent-table options.

  • -T SECS, --wait-timeout=SECS

    pg_repack needs to take one exclusive lock at the beginning as well as one exclusive lock at the end of the repacking process. This setting controls how many seconds pg_repack will wait to acquire this lock. If the lock cannot be taken after this duration and --no-kill-backend option is not specified, pg_repack will forcibly cancel the conflicting queries. If you are using PostgreSQL version 8.4 or newer, pg_repack will fall back to using pg_terminate_backend() to disconnect any remaining backends after twice this timeout has passed. The default is 60 seconds.

  • -D, --no-kill-backend

    Skip to repack table if the lock cannot be taken for duration specified --wait-timeout, instead of cancelling conflicting queries. The default is false.

  • -Z, --no-analyze

    Disable ANALYZE after a full-table reorganization. If not specified, run ANALYZE after the reorganization.

  • -k, --no-superuser-check

    Skip the superuser checks in the client. This setting is useful for using pg_repack on platforms that support running it as non-superusers.

  • -C, --exclude-extension

    Skip tables that belong to the specified extension(s). Some extensions may heavily depend on such tables at planning time etc.

  • --switch-threshold

    Switch tables when that many tuples are left in log table. This setting can be used to avoid the inability to catchup with write-heavy tables.

Connection Options

Options to connect to servers. You cannot use --all and --dbname or --table or --parent-table together.

  • -a, --all

    Reorganize all databases.

  • -d DBNAME, --dbname=DBNAME

    Specifies the name of the database to be reorganized. If this is not specified and -a (or --all) is not used, the database name is read from the environment variable PGDATABASE. If that is not set, the user name specified for the connection is used.

  • -h HOSTNAME, --host=HOSTNAME

    Specifies the host name of the machine on which the server is running. If the value begins with a slash, it is used as the directory for the Unix domain socket.

  • -p PORT, --port=PORT

    Specifies the TCP port or local Unix domain socket file extension on which the server is listening for connections.

  • -U USERNAME, --username=USERNAME

    User name to connect as.

  • -w, --no-password

    Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.

  • -W, --password

    Force the program to prompt for a password before connecting to a database.

    This option is never essential, since the program will automatically prompt for a password if the server demands password authentication. However, pg_repack will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.

Generic Options

  • -e, --echo

    Echo commands sent to server.

  • -E LEVEL, --elevel=LEVEL

    Choose the output message level from DEBUG, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. The default is INFO.

  • --help

    Show usage of the program.

  • --version

    Show the version number of the program.

Environment

  • PGDATABASE, PGHOST, PGPORT, PGUSER

    Default connection parameters

    This utility, like most other PostgreSQL utilities, also uses the environment variables supported by libpq (see Environment Variables).

Examples

Perform an online CLUSTER of all the clustered tables in the database test, and perform an online VACUUM FULL of all the non-clustered tables:

$ pg_repack test

Perform an online VACUUM FULL on the tables foo and bar in the database test (an eventual cluster index is ignored):

$ pg_repack --no-order --table foo --table bar test

Move all indexes of table foo to tablespace tbs:

$ pg_repack -d test --table foo --only-indexes --tablespace tbs

Move the specified index to tablespace tbs:

$ pg_repack -d test --index idx --tablespace tbs

Diagnostics

Error messages are reported when pg_repack fails. The following list shows the cause of errors.

You need to cleanup by hand after fatal errors. To cleanup, just remove pg_repack from the database and install it again: for PostgreSQL version 9.1 or newer, execute DROP EXTENSION pg_repack CASCADE in the database where the error occurred, followed by CREATE EXTENSION pg_repack; for previous version load the script $SHAREDIR/contrib/uninstall_pg_repack.sql into the database where the error occured and then load $SHAREDIR/contrib/pg_repack.sql again.

  • INFO: database “db” skipped: pg_repack VER is not installed in the database

    pg_repack is not installed in the database when the --all option is specified.

    Create the pg_repack extension in the database.

  • ERROR: pg_repack VER is not installed in the database

    pg_repack is not installed in the database specified by --dbname.

    Create the pg_repack extension in the database.

  • ERROR: program ‘pg_repack V1’ does not match database library ‘pg_repack V2’

    There is a mismatch between the pg_repack binary and the database library (.so or .dll).

    The mismatch could be due to the wrong binary in the $PATH or the wrong database being addressed. Check the program directory and the database; if they are what expected you may need to repeat pg_repack installation.

  • ERROR: extension ‘pg_repack V1’ required, found ‘pg_repack V2’

    The SQL extension found in the database does not match the version required by the pg_repack program.

    You should drop the extension from the database and reload it.

  • ERROR: relation “table” must have a primary key or not-null unique keys

    The target table doesn’t have a PRIMARY KEY or any UNIQUE constraints defined.

    Define a PRIMARY KEY or a UNIQUE constraint on the table.

  • ERROR: query failed: ERROR: column “col” does not exist

    The target table doesn’t have columns specified by --order-by option.

    Specify existing columns.

  • WARNING: the table “tbl” already has a trigger called repack_trigger

    The trigger was probably installed during a previous attempt to run pg_repack on the table which was interrupted and for some reason failed to clean up the temporary objects.

    You can remove all the temporary objects by dropping and re-creating the extension.

  • ERROR: Another pg_repack command may be running on the table. Please try again later.

    There is a chance of deadlock when two concurrent pg_repack commands are run on the same table. So, try to run the command after some time.

  • WARNING: Cannot create index “schema”.“index_xxxxx”, already exists

    DETAIL: An invalid index may have been left behind by a previous pg_repack on the table which was interrupted. Please use DROP INDEX “schema”.“index_xxxxx” to remove this index and try again.

    A temporary index apparently created by pg_repack has been left behind, and we do not want to risk dropping this index ourselves. If the index was in fact created by an old pg_repack job which didn’t get cleaned up, you should just use DROP INDEX and try the repack command again.

Restrictions

pg_repack comes with the following restrictions.

Temp tables

pg_repack cannot reorganize temp tables.

GiST indexes

pg_repack cannot cluster tables by GiST indexes.

DDL commands

You will not be able to perform DDL commands of the target table(s) except VACUUM or ANALYZE while pg_repack is working. pg_repack will hold an ACCESS SHARE lock on the target table during a full-table repack, to enforce this restriction.

If you are using version 1.1.8 or earlier, you must not attempt to perform any DDL commands on the target table(s) while pg_repack is running. In many cases pg_repack would fail and rollback correctly, but there were some cases in these earlier versions which could result in data corruption.

Details

Full Table Repacks

To perform a full-table repack, pg_repack will:

  1. create a log table to record changes made to the original table
  2. add a trigger onto the original table, logging INSERTs, UPDATEs and DELETEs into our log table
  3. create a new table containing all the rows in the old table
  4. build indexes on this new table
  5. apply all changes which have accrued in the log table to the new table
  6. swap the tables, including indexes and toast tables, using the system catalogs
  7. drop the original table

pg_repack will only hold an ACCESS EXCLUSIVE lock for a short period during initial setup (steps 1 and 2 above) and during the final swap-and-drop phase (steps 6 and 7). For the rest of its time, pg_repack only needs to hold an ACCESS SHARE lock on the original table, meaning INSERTs, UPDATEs, and DELETEs may proceed as usual.

Index Only Repacks

To perform an index-only repack, pg_repack will:

  1. create new indexes on the table using CONCURRENTLY matching the definitions of the old indexes
  2. swap out the old for the new indexes in the catalogs
  3. drop the old indexes

Creating indexes concurrently comes with a few caveats, please see the documentation for details.

See Also

comments powered by Disqus