pgcopydb: Features Highlights

March 29, 2024

Summary: This section of documentation introduces you the major features in the pgcopydb program.

Table of Contents

The reason why pgcopydb has been developed is mostly to allow two aspects that are not possible to achieve directly with pg_dump and pg_restore, and that requires just enough fiddling around that not many scripts have been made available to automate around.

Bypass intermediate files for the TABLE DATA

First aspect is that for pg_dump and pg_restore to implement concurrency they need to write to an intermediate file first.

The docs for pg_dump say the following about the --jobs parameter:

From the PostgreSQL documentation

You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time.

The docs for pg_restore say the following about the --jobs parameter:

From the PostgreSQL documentation

Only the custom and directory archive formats are supported with this option. The input must be a regular file or directory (not, for example, a pipe or standard input).

So the first idea with pgcopydb is to provide the --jobs concurrency and bypass intermediate files (and directories) altogether, at least as far as the actual TABLE DATA set is concerned.

The trick to achieve that is that pgcopydb must be able to connect to the source database during the whole operation, when pg_restore may be used from an export on-disk, without having to still be able to connect to the source database. In the context of pgcopydb requiring access to the source database is fine. In the context of pg_restore, it would not be acceptable.

Large-Objects Support

The Postgres Large-Objects API is nobody’s favorite, though the trade-offs implemented there are found very useful by many application developers. In the context dump and restore, Postgres separates the large objects metadata from the large object contents.

Specifically, the metadata consists of a large-object OID and ACLs, and is considered to be part of the pre-data section of a Postgres dump.

This means that pgcopydb relies on pg_dump to import the large object metadata from the source to the target Postgres server, but then implements its own code to migrate the large objects contents, using several worker processes depending on the setting of the command-line option --large-objects-jobs.

Concurrency

A major feature of pgcopydb is how concurrency is implemented, including options to obtain same-table COPY concurrency. See the Notes about concurrency chapter of the documentation for more about it.

Change Data Capture

pgcopydb implements full Postgres replication solution based on the lower-level API for Postgres Logical Decoding. This allows pgcopydb to be compatible with old versions of Postgres, starting with version 9.4.

Always first do a test migration without the --follow option to have an idea of the downtime window needed in your very own case. This will inform your decision about using the Change Data Capture mode, which makes a migration a lot more complex to drive to success.

PostgreSQL Logical Decoding Client

The replication client of pgcopydb has been designed to be able to fetch changes from the source Postgres instance concurrently to the initial COPY of the data. Three worker processes are created to handle the logical decoding client:

The streaming process fetches data from the Postgres replication slot using the Postgres replication protocol.
The transform process transforms the data fetched from a intermediate JSON format into a derivative of the SQL language. In prefetch mode this is implemented as a batch operation, in replay mode this is done in a streaming fashion, one line at a time, reading from a unix pipe.
The apply process then applies the SQL script to the target Postgres database system and uses Postgres APIs for Replication Progress Tracking.

During the initial COPY phase of operations, pgcopydb follow runs in prefetch mode and does not apply changes yet. After the initial COPY is done, then pgcopy replication system enters a loop that switches between the following two modes of operation:

In prefetch mode, changes are stored to JSON files on-disk, the transform process operates on files when a SWITCH occurs, and the apply process catches-up with changes on-disk by applying one file at time.

When the next file to apply does not exists (yet), then the 3 transform worker processes stop and the main follow supervisor process then switches to replay mode.
In replay mode changes are streamed from the streaming worker process to the transform worker process using a Unix PIPE mechanism, and the obtained SQL statements are sent to the replay worker process using another Unix PIPE.

Changes are then replayed in a streaming fashion, end-to-end, with a transaction granularity.

The internal SQL-like script format

The Postgres Logical Decoding API does not provide a CDC format, instead it allows Postgres extension developers to implement logical decoding output plugins. The Postgres core distribution implements such an output plugin named test_decoding. Another commonly used output plugin is named wal2json.

pgcopydb is compatible with both test_decoding and also wal2json and as a user it’s possible to choose an output plugin with the --plugin command-line option.

The output plugin compatibility means that pgcopydb has to implement code to parse the output plugin syntax and make sense of it. Internally, the messages from the output plugin are stored by pgcopydb in a JSON Lines formatted file, where each line is a JSON record with decoded metadata about the changes and the output plugin message, as-is.

This JSON Lines format is transformed into SQL scripts. At first, pgcopydb would just use SQL for the intermediate format, but then as an optimisation support for prepared statements was added. This means that our SQL script uses commands such as the following examples:

PREPARE d33a643f AS INSERT INTO public.rental ("rental_id", "rental_date", "inventory_id", "customer_id", "return_date", "staff_id", "last_update") overriding system value VALUES ($1, $2, $3, $4, $5, $6, $7), ($8, $9, $10, $11, $12, $13, $14);
EXECUTE d33a643f["16050","2022-06-01 00:00:00+00","371","291",null,"1","2022-06-01 00:00:00+00","16051","2022-06-01 00:00:00+00","373","293",null,"2","2022-06-01 00:00:00+00"];

As you can see in the example, pgcopydb is now able to use a single INSERT statement with multiple VALUES, which is a huge performance boost. In order to simplify pgcopydb parsing of the SQL syntax, the choice was made to format the EXECUTE argument list as a JSON array, which does not comply with the actual SQL syntax, but is simple and fast to process.

Finally, it’s not possible for the transform process to anticipate the actual session management of the apply process, so SQL statements are always included with both the PREPARE and the EXECUTE steps. The pgcopydb apply code knows how to skip PREPARing again, of course.

Unfortunately that means that our SQL files are not actually using SQL syntax and can’t be processed as-is with any SQL client software. At the moment either using pgcopydb stream apply or writing your own processing code is required.

Internal Catalogs (SQLite)

To be able to implement pgcopydb operations, a list of SQL objects such as tables, indexes, constraints and sequences is needed internally. While pgcopydb used to handle such a list as an array in-memory, with also a hash-table for direct lookup (by oid and by restore list name), in some cases the source database contains so many objects that these arrays do not fit in memory.

As pgcopydb is written in C, the current best approach to handle an array of objects that needs to spill to disk and supports direct lookup is actually the SQLite library, file format, and embedded database engine.

That’s why the current version of pgcopydb uses SQLite to handle its catalogs.

Internally pgcopydb stores metadata information in three different catalogs, all found in the ${TMPDIR}/pgcopydb/schema/ directory by default, unless using the recommended --dir option.

The source catalog registers metadata about the source database, and also some metadata about the pgcopydb context, consistency, and progress.
The filters catalog is only used with the --filters option is used, and it registers metadata about the objects in the source database that are going to be skipped.

This is necessary because the filtering is implemented using the pg_restore --list and pg_restore --use-list options. The Postgres archive Table Of Contents format contains an object OID and its restore list name, and pgcopydb needs to be able to lookup for that OID or name in its filtering catalogs.
The target catalog registers metadata about the target database, such as the list of roles, the list of schemas, or the list of already existing constraints found on the target database.