duckdb_fdw: PostgreSQL Foreign Data Wrapper for DuckDB

March 31, 2024

Summary: duckdb_fdw is a foreign data wrapper (FDW) to connect PostgreSQL to DuckDB database file.

PostgreSQL + DuckDB

DuckDB is a simple, portable and fast embedded database engine. It is designed to support analytical query workloads, also known as online analytical processing (OLAP).

This article contains the following sections:

  1. Features
  2. Requirements
  3. Usage
  4. Examples
  5. Limitations

Features

  • Transactions
  • Support TRUNCATE by deparsing into DELETE statement without WHERE clause
  • Allow control over whether foreign servers keep connections open after transaction completion. This is controlled by keep_connections and defaults to on
  • Support list cached connections to foreign servers by using function duckdb_fdw_get_connections()
  • Support discard cached connections to foreign servers by using function duckdb_fdw_disconnect(), duckdb_fdw_disconnect_all().
  • Support Bulk INSERT by using batch_size option
  • Support INSERT/UPDATE with generated column

Requirements

duckdb_fdw was developed on macOS and tested on Linux, so it should run on any reasonably POSIX-compliant system.

The extension works with PostgreSQL 9.6 and newer versions, and confirmed with some versions of DuckDB.

Usage

CREATE SERVER options

duckdb_fdw accepts the following options via the CREATE SERVER command:

  • database as string, required

    DuckDB database path.

  • truncatable as boolean, optional, default false

    Allows foreign tables to be truncated using the TRUNCATE command.

  • keep_connections as boolean, optional, default false

    Allows to keep connections to DuckDB while there is no SQL operations between PostgreSQL and DuckDB.

  • batch_size as integer, optional, default 1

    Specifies the number of rows which should be inserted in a single INSERT operation. This setting can be overridden for individual tables.

  • temp_directory as string, optional, default NULL

    Specifies the directory to which to write temp files.

CREATE USER MAPPING options

There is no user or password conceptions in DuckDB, hence duckdb_fdw no need any CREATE USER MAPPING command.

In OS duckdb_fdw works as executed code with permissions of user of PostgreSQL server. Usually it is postgres OS user. For interacting with DuckDB database without access errors ensure this user have permissions on DuckDB file and, sometimes, directory of the file.

  • read permission on all directories by path to the DuckDB database file;
  • read permission on DuckDB database file;

CREATE FOREIGN TABLE options

duckdb_fdw accepts the following table-level options via the CREATE FOREIGN TABLE command:

  • table as string, optional, no default

    DuckDB table name. Use if not equal to name of foreign table in PostgreSQL. Also see about identifier case handling.

  • truncatable as boolean, optional, default from the same CREATE SERVER option

    See CREATE SERVER options section for details.

  • batch_size as integer, optional, default from the same CREATE SERVER option

    See CREATE SERVER options section for details.

duckdb_fdw accepts the following column-level options via the CREATE FOREIGN TABLE command:

  • column_name as string, optional, no default

    This option gives the column name to use for the column on the remote server. Also see about identifier case handling.

  • column_type as string, optional, no default

    Option to convert INT DuckDB column (epoch Unix Time) to be treated/visualized as TIMESTAMP in PostgreSQL.

  • key as boolean, optional, default false

    Indicates a column as a part of primary key or unique key of DuckDB table.

IMPORT FOREIGN SCHEMA options

duckdb_fdw supports IMPORT FOREIGN SCHEMA (PostgreSQL 9.5+) and accepts no custom options for this command.

TRUNCATE support

duckdb_fdw implements the foreign data wrapper TRUNCATE API, available from PostgreSQL 14.

As SQlite does not provide a TRUNCATE command, it is simulated with a simple unqualified DELETE operation.

TRUNCATE ... CASCADE support not described.

Functions

As well as the standard duckdb_fdw_handler() and duckdb_fdw_validator() functions, duckdb_fdw provides the following user-callable utility functions:

  • SETOF record duckdb_fdw_get_connections(server_name text, valid bool)

  • bool duckdb_fdw_disconnect(text)

    Closes connection from PostgreSQL to DuckDB in the current session.

  • bool duckdb_fdw_disconnect_all()

  • duckdb_fdw_version(); Returns standard “version integer” as major version * 10000 + minor version * 100 + bugfix.

 duckdb_fdw_version
--------------------
              20300

duckdb_execute

FUNCTION duckdb_execute(server name, stmt text) RETURNS void

This function can be used to execute arbitrary SQL statements on the remote DuckDB server. That will only work with statements that do not return results (typically DDL statements).

Be careful when using this function, since it might disturb the transaction management of duckdb_fdw. Remember that running a DDL statement in DuckDB will issue an implicit COMMIT. You are best advised to use this function outside of multi-statement transactions.

It is very useful to use command that duckdb_fdw does not support, for example,

  • add more table or view to DuckDB directly.
SELECT duckdb_execute('duckdb_server',
'create or replace view iris_parquet as select * from parquet_scan(''temp/iris.parquet'');');

create foreign TABLE duckdb.iris_parquet(
  "Sepal.Length" float,
  "Sepal.Width" float,
  "Petal.Length" float,
  "Petal.Width" float,
  "Species" text)
SERVER duckdb_server OPTIONS (table 'iris_parquet');

-- or an easy way

IMPORT FOREIGN SCHEMA public limit to (iris_parquet) FROM SERVER
duckdb_server INTO duckdb;
  • run Copy command on Foreign table
SELECT duckdb_execute('duckdb_server'
,'CREATE TABLE test (a INTEGER, b INTEGER, c VARCHAR(10));
');
SELECT duckdb_execute('duckdb_server'  
,'COPY test FROM ''/tmp/test.csv'';');

Identifier case handling

PostgreSQL folds identifiers to lower case by default. DuckDB behaviour not described. It’s important to be aware of potential issues with table and column names.

Generated columns

DuckDB provides support for generated columns. Behaviour of duckdb_fdw with this columns isn’t yet described.

Note that while duckdb_fdw will INSERT or UPDATE the generated column value in DuckDB, there is nothing to stop the value being modified within DuckDB, and hence no guarantee that in subsequent SELECT operations the column will still contain the expected generated value. This limitation also applies to postgres_fdw.

For more details on generated columns see:

Examples

Install the extension

Once for a database you need, as PostgreSQL superuser.

CREATE EXTENSION duckdb_fdw;

Create a foreign server with appropriate configuration:

Once for a foreign datasource you need, as PostgreSQL superuser. Please specify DuckDB database path using database option.

CREATE SERVER duckdb_server
FOREIGN DATA WRAPPER duckdb_fdw
OPTIONS (
  database '/path/to/database'
);

Grant usage on foreign server to normal user in PostgreSQL:

Once for a normal user (non-superuser) in PostgreSQL, as PostgreSQL superuser. It is a good idea to use a superuser only where really necessary, so let’s allow a normal user to use the foreign server (this is not required for the example to work, but it’s secirity recomedation).

GRANT USAGE ON FOREIGN SERVER duckdb_server TO pguser;

Where pguser is a sample user for works with foreign server (and foreign tables).

User mapping

There is no user or password conceptions in DuckDB, hence duckdb_fdw no need any CREATE USER MAPPING command. About access problems see in CREATE USER MAPPING options.

Create foreign table

All CREATE FOREIGN TABLE SQL commands can be executed as a normal PostgreSQL user if there were correct GRANT USAGE ON FOREIGN SERVER. No need PostgreSQL supersuer for secirity reasons but also works with PostgreSQL supersuer.

Please specify table option if DuckDB table name is different from foreign table name.

CREATE FOREIGN TABLE t1 (
  a integer,
  b text
)
SERVER duckdb_server
OPTIONS (
  table 't1_duckdb'
);

If you want to update tables, please add OPTIONS (key 'true') to a primary key or unique key like the following:

CREATE FOREIGN TABLE t1(
  a integer OPTIONS (key 'true'),
  b text
)
SERVER duckdb_server 
OPTIONS (
  table 't1_duckdb'
);

If you need to convert INT DuckDB column (epoch Unix Time) to be treated/visualized as TIMESTAMP in PostgreSQL, please add OPTIONS (column_type 'INT') when defining FOREIGN table at PostgreSQL like the following:

CREATE FOREIGN TABLE t1(
  a integer,
  b text,
  c timestamp without time zone OPTIONS (column_type 'INT')
)
SERVER duckdb_server
OPTIONS (
  table 't1_duckdb'
);

As above, but with aliased column names:

CREATE FOREIGN TABLE t1(
  a integer,
  b text OPTIONS (column_name 'test_id'),
  c timestamp without time zone OPTIONS (column_type 'INT', column_name 'unixtime')
)
SERVER duckdb_server
OPTIONS (
  table 't1_duckdb'
);

Import a DuckDB database as schema to PostgreSQL:

IMPORT FOREIGN SCHEMA someschema
FROM SERVER duckdb_server
INTO public;

Note: someschema has no particular meaning and can be set to an arbitrary value.

Access foreign table

For the table from previous examples

SELECT * FROM t1;

Limitations

  • INSERT into a partitioned table which has foreign partitions is not supported. Error Not support partition insert will display.
  • TRUNCATE in duckdb_fdw always delete data of both parent and child tables (no matter user inputs TRUNCATE table CASCADE or TRUNCATE table RESTRICT) if there are foreign-keys references with ON DELETE CASCADE clause.
  • RETURNING is not supported.
comments powered by Disqus