wal2json: 用于提取变更集的 JSON 输出插件

四月 28, 2024

摘要wal2json是 PostgreSQL 中用于变更集提取的一个 JSON 输出插件。

目录

介绍

wal2json 是一个用于逻辑解码的输出插件。这意味着插件可以访问 INSERT 和 UPDATE 生成的元组。此外,还可以根据配置的 REPLICA IDENTITY 访问 UPDATE/DELETE 的旧行版本。可以使用流复制协议(逻辑复制槽)或特殊的 SQL API 来使用更改。

格式版本 1 为每个事务生成一个 JSON 对象。所有新/旧元组都会出现在 JSON 对象中。此外,还有一些选项可以控制包含事务时间戳、模式限定、数据类型和事务 ID 等属性。

格式版本 2 为每个元组生成一个 JSON 对象。事务开始和结束可选择生成 JSON 对象。此外,还有多种选项可以控制包含属性。

配置

postgresql.conf

您需要在 postgresql.conf 中设置wal_level参数:

wal_level = logical

更改此参数后,需要重新启动。

参数

  • include-xids:将 xid 添加到每个变更集。默认值为 false
  • include-timestamp:为每个变更集添加时间戳。默认值为 false
  • include-schemas:为每个更改添加模式。默认值为 true
  • include-types:为每个更改添加类型。默认值为 true
  • include-typmod:将修饰符添加到具有修饰符的类型(例如 varchar(20) 而不是 varchar)。默认值为 true
  • include-type-oids:添加类型 OID。默认值为 false
  • include-domain-data-type:将域类型名替换为底层数据类型。默认值为 false
  • include-column-positions:添加列位置(pg_attribute.attnum)。默认值为 false
  • include-origin:添加一段数据以包含来源。默认值为 false
  • include-not-null:以 column optionals 形式添加 not null 信息。默认值为 false
  • include-default:添加默认值表达式。默认值为 false
  • include-pk:以 pk 形式添加主键信息。包含列名和数据类型。默认值为 false
  • numeric-data-types-as-string:对 numeric 数据类型使用字符串。JSON 规范不能将InfinityNaN识别为有效的数值。对于双精度浮点数,可能会存在潜在的互操作性问题。默认值为 false
  • pretty-print:向 JSON 结构添加空格和缩进。默认值为 false
  • write-in-chunks:在每个更改之后写入,而不是在每个更改集之后写入。仅当format-version1时才使用。默认值为 false
  • include-lsn:将 nextlsn 添加到每个变更集。默认值为 false
  • include-transaction:发出用来表示每个事务的开始和结束的记录。默认值为 true
  • include-unchanged-toast(已弃用):不要使用它。它已弃用。
  • filter-origins:排除指定来源的更改。默认值为空,这意味着不会过滤任何来源。它是一个用逗号分隔的值。
  • filter-tables:排除指定表中的行。默认值为空,这意味着不会筛选任何表。它是一个逗号分隔的值。这些表应该是模式限定的。*.foo表示所有模式中的 foo 表,bar.*表示模式 bar 中的所有表。特殊字符(空格、单引号、逗号、句点、星号)必须用反斜杠进行转义。模式和表区分大小写。表"public"."Foo bar"应指定为public.Foo\ bar
  • add-tables:仅包含指定表中的行。默认值为所有模式中的所有表。它与filter-tables中的规则相同。
  • filter-msg-prefixes:如果消息前缀在列表中,则排除消息。默认值为空,这意味着不会过滤任何消息。它是一个逗号分隔的值。
  • add-msg-prefixes:仅包括消息前缀在列表中的消息。默认为所有前缀。它是一个逗号分隔的值。wal2json会在此参数之前先应用filter-msg-prefixes
  • format-version:定义要使用的格式。默认值为 1
  • actions:定义发送哪些操作。默认为所有操作(插入、更新、删除和截断)。但是,如果您使用的format-version是 1,则不会启用 truncate(向后兼容性)。

示例

有两种方法可以从 wal2json 插件获取更改(JSON 对象):通过 SQL 调用函数,或者使用 pg_recvlogical

pg_recvlogical

除了上面的配置之外,还需要配置复制连接以使用 pg_recvlogical。版本 9.4、9.5 和 9.6 中的逻辑复制连接,要求数据库列中有replication关键字。从版本 10 开始,逻辑复制可以匹配带数据库名称或关键字(如all)的普通条目。

首先,在 pg_hba.conf(9.4、9.5 和 9.6)添加一条复制连接规则:

local    replication     myuser                     trust

如果您使用的是版本 10 或更高版本:

local    mydatabase      myuser                     trust

另外,在 postgresql.conf 中设置 max_wal_senders:

max_wal_senders = 1

如果更改了 max_wal_senders,则需要重新启动。

现在,您已准备好试用 wal2json 了。可以在一个终端中执行:

$ pg_recvlogical -d postgres --slot test_slot --create-slot -P wal2json
$ pg_recvlogical -d postgres --slot test_slot --start -o pretty-print=1 -o add-msg-prefixes=wal2json -f -

在另一个终端中:

$ cat /tmp/example1.sql
CREATE TABLE table1_with_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c));
CREATE TABLE table1_without_pk (a SERIAL, b NUMERIC(5,2), c TEXT);

BEGIN;
INSERT INTO table1_with_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table1_with_pk (b, c) VALUES('Tuning', now());
INSERT INTO table1_with_pk (b, c) VALUES('Replication', now());
SELECT pg_logical_emit_message(true, 'wal2json', 'this message will be delivered');
SELECT pg_logical_emit_message(true, 'pgoutput', 'this message will be filtered');
DELETE FROM table1_with_pk WHERE a < 3;
SELECT pg_logical_emit_message(false, 'wal2json', 'this non-transactional message will be delivered even if you rollback the transaction');

INSERT INTO table1_without_pk (b, c) VALUES(2.34, 'Tapir');
-- it is not added to stream because there isn't a pk or a replica identity
UPDATE table1_without_pk SET c = 'Anta' WHERE c = 'Tapir';
COMMIT;

DROP TABLE table1_with_pk;
DROP TABLE table1_without_pk;

上面的脚本产生了以下输出:

$ psql -At -f /tmp/example1.sql postgres
CREATE TABLE
CREATE TABLE
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
3/78BFC828
3/78BFC880
DELETE 2
3/78BFC990
INSERT 0 1
UPDATE 1
COMMIT
DROP TABLE
DROP TABLE

第一个终端的输出为:

{
	"change": [
	]
}
{
	"change": [
	]
}
{
    "change": [
        {
            "kind": "message",
            "transactional": false,
            "prefix": "wal2json",
            "content": "this non-transactional message will be delivered even if you rollback the transaction"
        }
    ]
}
WARNING:  table "table1_without_pk" without primary key or replica identity is nothing
{
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "table1_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [1, "Backup and Restore", "2018-03-27 11:58:28.988414"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table1_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [2, "Tuning", "2018-03-27 11:58:28.988414"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table1_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [3, "Replication", "2018-03-27 11:58:28.988414"]
		}
        ,{
            "kind": "message",
            "transactional": true,
            "prefix": "wal2json",
            "content": "this message will be delivered"
        }
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table1_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [1, "2018-03-27 11:58:28.988414"]
			}
		}
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table1_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [2, "2018-03-27 11:58:28.988414"]
			}
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table1_without_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "numeric(5,2)", "text"],
			"columnvalues": [1, 2.34, "Tapir"]
		}
	]
}
{
    "change": [
    ]
}
{
    "change": [
    ]
}

在第一个终端中删除复制槽:

Ctrl+C
$ pg_recvlogical -d postgres --slot test_slot --drop-slot

SQL 函数

$ cat /tmp/example2.sql
CREATE TABLE table2_with_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c));
CREATE TABLE table2_without_pk (a SERIAL, b NUMERIC(5,2), c TEXT);

SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');

BEGIN;
INSERT INTO table2_with_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table2_with_pk (b, c) VALUES('Tuning', now());
INSERT INTO table2_with_pk (b, c) VALUES('Replication', now());
SELECT pg_logical_emit_message(true, 'wal2json', 'this message will be delivered');
SELECT pg_logical_emit_message(true, 'pgoutput', 'this message will be filtered');
DELETE FROM table2_with_pk WHERE a < 3;
SELECT pg_logical_emit_message(false, 'wal2json', 'this non-transactional message will be delivered even if you rollback the transaction');

INSERT INTO table2_without_pk (b, c) VALUES(2.34, 'Tapir');
-- it is not added to stream because there isn't a pk or a replica identity
UPDATE table2_without_pk SET c = 'Anta' WHERE c = 'Tapir';
COMMIT;

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', '1', 'add-msg-prefixes', 'wal2json');
SELECT 'stop' FROM pg_drop_replication_slot('test_slot');

DROP TABLE table2_with_pk;
DROP TABLE table2_without_pk;

上面的脚本产生了以下输出:

$ psql -At -f /tmp/example2.sql postgres
CREATE TABLE
CREATE TABLE
init
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
3/78C2CA50
3/78C2CAA8
DELETE 2
3/78C2CBD8
INSERT 0 1
UPDATE 1
COMMIT
{
    "change": [
        {
            "kind": "message",
            "transactional": false,
            "prefix": "wal2json",
            "content": "this non-transactional message will be delivered even if you rollback the transaction"
        }
    ]
}
psql:/tmp/example2.sql:17: WARNING:  table "table2_without_pk" without primary key or replica identity is nothing
{
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "table2_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [1, "Backup and Restore", "2018-03-27 12:05:29.914496"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table2_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [2, "Tuning", "2018-03-27 12:05:29.914496"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table2_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [3, "Replication", "2018-03-27 12:05:29.914496"]
		}
        ,{
            "kind": "message",
            "transactional": true,
            "prefix": "wal2json",
            "content": "this message will be delivered"
        }
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table2_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [1, "2018-03-27 12:05:29.914496"]
			}
		}
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table2_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [2, "2018-03-27 12:05:29.914496"]
			}
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table2_without_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "numeric(5,2)", "text"],
			"columnvalues": [1, 2.34, "Tapir"]
		}
	]
}
stop
DROP TABLE
DROP TABLE

让我们采用format-version为 2 来重复同样的例子:

$ cat /tmp/example3.sql
CREATE TABLE table3_with_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c));
CREATE TABLE table3_without_pk (a SERIAL, b NUMERIC(5,2), c TEXT);

SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');

BEGIN;
INSERT INTO table3_with_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table3_with_pk (b, c) VALUES('Tuning', now());
INSERT INTO table3_with_pk (b, c) VALUES('Replication', now());
SELECT pg_logical_emit_message(true, 'wal2json', 'this message will be delivered');
SELECT pg_logical_emit_message(true, 'pgoutput', 'this message will be filtered');
DELETE FROM table3_with_pk WHERE a < 3;
SELECT pg_logical_emit_message(false, 'wal2json', 'this non-transactional message will be delivered even if you rollback the transaction');

INSERT INTO table3_without_pk (b, c) VALUES(2.34, 'Tapir');
-- it is not added to stream because there isn't a pk or a replica identity
UPDATE table3_without_pk SET c = 'Anta' WHERE c = 'Tapir';
COMMIT;

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'format-version', '2', 'add-msg-prefixes', 'wal2json');
SELECT 'stop' FROM pg_drop_replication_slot('test_slot');

DROP TABLE table3_with_pk;
DROP TABLE table3_without_pk;

上面的脚本产生了以下输出:

$ psql -At -f /tmp/example3.sql postgres
CREATE TABLE
CREATE TABLE
init
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
3/78CB8F30
3/78CB8F88
DELETE 2
3/78CB90B8
INSERT 0 1
UPDATE 1
COMMIT
psql:/tmp/example3.sql:20: WARNING:  no tuple identifier for UPDATE in table "public"."table3_without_pk"
{"action":"M","transactional":false,"prefix":"wal2json","content":"this non-transactional message will be delivered even if you rollback the transaction"}
{"action":"B"}
{"action":"I","schema":"public","table":"table3_with_pk","columns":[{"name":"a","type":"integer","value":1},{"name":"b","type":"character varying(30)","value":"Backup and Restore"},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"I","schema":"public","table":"table3_with_pk","columns":[{"name":"a","type":"integer","value":2},{"name":"b","type":"character varying(30)","value":"Tuning"},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"I","schema":"public","table":"table3_with_pk","columns":[{"name":"a","type":"integer","value":3},{"name":"b","type":"character varying(30)","value":"Replication"},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"M","transactional":true,"prefix":"wal2json","content":"this message will be delivered"}
{"action":"D","schema":"public","table":"table3_with_pk","identity":[{"name":"a","type":"integer","value":1},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"D","schema":"public","table":"table3_with_pk","identity":[{"name":"a","type":"integer","value":2},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"I","schema":"public","table":"table3_without_pk","columns":[{"name":"a","type":"integer","value":1},{"name":"b","type":"numeric(5,2)","value":2.34},{"name":"c","type":"text","value":"Tapir"}]}
{"action":"C"}
stop
DROP TABLE
DROP TABLE

了解更多

wal2json 项目