pg_mooncake: PostgreSQL 中的列式存储表

John Doe 四月 9, 2025

你有没有想要在 PostgreSQL 中使用列式存储表,来加速数据分析查询?

一头与 SQL 共舞的大象

初始设置

pg_mooncake 是 PostgreSQL 中的一种列式存储扩展。这个扩展的目标是为分析工作优化 PostgreSQL,它可以将表以 IcebergDelta Lake 格式存储在本地磁盘或像 S3 这样的云存储上。

从该扩展的项目仓库下载源代码,编译并安装完成后,将该扩展添加到数据库中:

postgres=# create extension pg_mooncake;
CREATE EXTENSION
postgres=# \dx
                                    List of installed extensions
        Name        | Version |   Schema   |                         Description                          
--------------------+---------+------------+----------------------------------------------------------------
 pg_mooncake        | 0.1.2   | public     | Columnstore Table in Postgres
 pg_stat_statements | 1.11    | public     | track planning and execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

准备测试数据

为了将这种表布局与标准的 PostgreSQL 布局进行比较,我们创建两个表,其中一个使用列式存储格式:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# create table t2 ( a int, b text ) using columnstore;
CREATE TABLE
postgres=# \d+ t1
                                    Table "public.t1"
  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
  a      | integer |           |          |         | plain    |             |              | 
  b      | text    |           |          |         | extended |             |              | 
Access method: heap

postgres=# \d+ t2
                                    Table "public.t2"
  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
  a      | integer |           |          |         | plain    |             |              | 
  b      | text    |           |          |         | extended |             |              | 
Access method: columnstore

向这两个表中各插入一百万行数据:

insert into t1 select i, md5(i::text) from generate_series(1,1000000) i;

insert into t2 select i, md5(i::text) from generate_series(1,1000000) i;

检查物理布局

查看磁盘情况,我们会发现:

select pg_relation_filepath('t1');
 pg_relation_filepath 
----------------------
 base/5/24715
(1 row)

select pg_relation_filepath('t2');
 pg_relation_filepath 
----------------------
 base/5/24718
(1 row)

select pg_size_pretty ( pg_relation_size ( 't1' ));
 pg_size_pretty 
----------------
 65 MB
(1 row)

select pg_size_pretty ( pg_relation_size ( 't2' ));
 pg_size_pretty 
----------------
 0 bytes
(1 row)

列式存储的表显示大小为零字节,实际上磁盘上甚至没有对应 24718 的文件:

$ ls -la $PGDATA/base/5/24715
-rw-------. 1 postgres postgres 68272128 Feb 24 14:19 /db/pgdata/17/base/5/24715
$ ls -la $PGDATA/base/5/24718
ls: cannot access '/db/pgdata/17/base/5/24718': No such file or directory

相反,该表以 Parquet 文件的形式存储在这里:

$ ls -la $PGDATA/mooncake_local_tables/mooncake_postgres_t2_24708/
total 107224
drwx------. 3 postgres postgres      180 Feb 24 14:31 .
drwx------. 3 postgres postgres       40 Feb 24 14:15 ..
-rw-------. 1 postgres postgres 36596064 Feb 24 14:19 560c6efe-1226-4a76-985f-1301169bcc44.parquet
-rw-------. 1 postgres postgres 36596064 Feb 24 14:18 ca0550d6-bd84-4bf9-b8cf-6ce85a65346b.parquet
drwx------. 2 postgres postgres     4096 Feb 24 14:19 _delta_log
-rw-------. 1 postgres postgres 36596064 Feb 24 14:19 fba0eff4-3c57-4dbb-bd9b-469f6622ab92.parquet

这里有一篇更详细的文章,介绍了相关设计决策,实际上,该扩展在内部集成了 DuckDB

测试性能表现

查看针对这两个表的查询计划,也能发现差异:

postgres=# explain (analyze) select * from t1 where a = 1;
                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..14542.43 rows=1 width=37) (actual time=1.591..41.014 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..13542.33 rows=1 width=37) (actual time=0.022..11.067 rows=0 loops=3)
         Filter: (a = 1)
         Rows Removed by Filter: 333333
 Planning Time: 0.861 ms
 Execution Time: 41.086 ms
(8 rows)

postgres=# explain (analyze) select * from t2 where a = 1;
                                           QUERY PLAN                                           
--------------------------------------------------------------------------------------------------------------
 Custom Scan (MooncakeDuckDBScan)  (cost=0.00..0.00 rows=0 width=0) (actual time=7.797..7.816 rows=1 loops=1)
   DuckDB Execution Plan: 
 
 ┌─────────────────────────────────────┐
 │┌───────────────────────────────────┐│
 ││    Query Profiling Information    ││
 │└───────────────────────────────────┘│
 └─────────────────────────────────────┘
 EXPLAIN ANALYZE SELECT a, b FROM pgmooncake.public.t2 WHERE (a = 1)
 ┌────────────────────────────────────────────────┐
 │┌──────────────────────────────────────────────┐│
 ││              Total Time: 0.0043s             ││
 │└──────────────────────────────────────────────┘│
 └────────────────────────────────────────────────┘
 ┌───────────────────────────┐
            QUERY           
 └─────────────┬─────────────┘
 ┌─────────────┴─────────────┐
       EXPLAIN_ANALYZE      
     ────────────────────   
            0 Rows          
           (0.00s)          
 └─────────────┬─────────────┘
 ┌─────────────┴─────────────┐
          TABLE_SCAN        
     ────────────────────   
          Function:         
       COLUMNSTORE_SCAN     
                            
         Projections:       
              a             
              b             
                            
           Filters:         
    a=1 AND a IS NOT NULL   
                            
            1 Rows          
           (0.00s)          
 └───────────────────────────┘
 Planning Time: 1.693 ms
 Execution Time: 8.584 ms
(43 rows)

和通常的列式数据存储一样,当数据在列的基础上能够很好地压缩时,这种存储方式效果最佳,例如:

truncate t1,t2;

insert into t1 select 1, md5(i::text) from generate_series(1,1000000) i;
Time: 852.812 ms

insert into t2 select 1, md5(i::text) from generate_series(1,1000000) i;
Time: 243.532 ms

向 t2 表插入数据的速度始终比向普通表插入数据的速度快(多重复几次插入操作就能有直观感受)。在读取压缩效果好的列时也是如此(同样,多重复几次查询操作就能体会到):

select count(a) from t1;
  count  
---------
 2000000
(1 row)
Time: 60.463 ms

select count(a) from t2;
  count  
---------
 2000000
(1 row)
Time: 10.272 ms

如果你有相关使用场景,pg_mooncake可能是一个不错的选择。