pg_ducklake

基于 DuckDB 与 Parquet 的 DuckLake 湖仓一体扩展

概览

扩展包名版本分类许可证语言
pg_ducklake1.0.0OLAPMITC++
ID扩展名BinLibLoadCreateTrustReloc模式
2490pg_ducklakeducklake
相关扩展pg_duckdb duckdb_fdw pg_mooncake pg_analytics pg_parquet columnar citus_columnar

版本

类型仓库版本PG 大版本包名依赖
EXTPIGSTY1.0.01817161514pg_ducklake-
RPMPIGSTY1.0.01817161514pg_ducklake_$v-
DEBPIGSTY1.0.01817161514postgresql-$v-pg-ducklake-
OS / PGPG18PG17PG16PG15PG14
el8.x86_64PIGSTY MISSPIGSTY MISSPIGSTY MISSPIGSTY MISSPIGSTY MISS
el8.aarch64PIGSTY MISSPIGSTY MISSPIGSTY MISSPIGSTY MISSPIGSTY MISS
el9.x86_64
el9.aarch64
el10.x86_64
el10.aarch64
d12.x86_64
d12.aarch64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
d13.x86_64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
d13.aarch64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
u22.x86_64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
u22.aarch64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
u24.x86_64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
u24.aarch64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
u26.x86_64
u26.aarch64
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0
PIGSTY 1.0.0

构建

您可以使用 pig build 命令构建 pg_ducklake 扩展的 RPM / DEB 包:

pig build pkg pg_ducklake         # 构建 RPM / DEB 包

安装

您可以直接安装 pg_ducklake 扩展包的预置二进制包,首先确保 PGDGPIGSTY 仓库已经添加并启用:

pig repo add pgsql -u          # 添加仓库并更新缓存

使用 pig 或者是 apt/yum/dnf 安装扩展:

pig install pg_ducklake;          # 当前活跃 PG 版本安装
pig ext install -y pg_ducklake -v 18  # PG 18
pig ext install -y pg_ducklake -v 17  # PG 17
pig ext install -y pg_ducklake -v 16  # PG 16
pig ext install -y pg_ducklake -v 15  # PG 15
pig ext install -y pg_ducklake -v 14  # PG 14
dnf install -y pg_ducklake_18       # PG 18
dnf install -y pg_ducklake_17       # PG 17
dnf install -y pg_ducklake_16       # PG 16
dnf install -y pg_ducklake_15       # PG 15
dnf install -y pg_ducklake_14       # PG 14
apt install -y postgresql-18-pg-ducklake   # PG 18
apt install -y postgresql-17-pg-ducklake   # PG 17
apt install -y postgresql-16-pg-ducklake   # PG 16
apt install -y postgresql-15-pg-ducklake   # PG 15
apt install -y postgresql-14-pg-ducklake   # PG 14

预加载配置

shared_preload_libraries = 'pg_ducklake';

创建扩展

CREATE EXTENSION pg_ducklake;

用法

来源:READMEv1.0.0 releaseproject docs

pg_ducklake 为 PostgreSQL 增加 DuckLake tables。DuckLake metadata 存储在 PostgreSQL 中,表数据以 Parquet 存储并通过 DuckDB 查询,让 PostgreSQL SQL clients 可以访问 snapshots、time travel、partitioning、sort keys 和外部对象存储等 lakehouse 功能。

创建 DuckLake 表

CREATE EXTENSION pg_ducklake;

CREATE TABLE events (
  id int,
  kind text,
  ts timestamptz
) USING ducklake;

INSERT INTO events VALUES
  (1, 'login', now()),
  (2, 'click', now());

SELECT * FROM events ORDER BY id;

当数据需要放在默认路径之外时,显式设置 table path:

CREATE TABLE lake_events (
  id int,
  payload jsonb
) WITH (
  ducklake.table_path = 's3://my-bucket/prefix/'
) USING ducklake;

Time Travel

每次 commit 都会创建 snapshot。在修改前记录 snapshot id,然后查询旧状态:

SELECT max(snapshot_id) AS before_delete
FROM ducklake.ducklake_snapshot \gset

DELETE FROM events WHERE id = 1;

SELECT * FROM ducklake.time_travel('events'::regclass, :before_delete);

转换与加载数据

可从已有 PostgreSQL heap tables 或外部 data readers 创建 DuckLake tables:

CREATE TABLE row_store AS
SELECT i AS id, 'hello pg_ducklake' AS msg
FROM generate_series(1, 10000) AS i;

CREATE TABLE col_store USING ducklake AS
SELECT * FROM row_store;

CREATE TABLE titanic USING ducklake AS
SELECT * FROM ducklake.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv');

Inlining、分区与维护

小批写入默认 inline 到 metadata 中,避免产生大量小 Parquet files。可调整行数限制或显式 flush:

CALL ducklake.set_option('data_inlining_row_limit', 100);
SELECT * FROM ducklake.flush_inlined_data('events'::regclass);

为表设置 partition 和 sort keys,以便 pruning 和 analytics:

CALL ducklake.set_partition('events'::regclass, 'bucket(4, id)', 'month(ts)');
CREATE INDEX ON events USING ducklake_sorted (id, ts);

当自动后台维护不够时,可以按需执行维护:

SELECT * FROM ducklake.merge_adjacent_files('events'::regclass);
CALL ducklake.set_option('expire_older_than', '7 days');
SELECT * FROM ducklake.expire_snapshots();
SELECT * FROM ducklake.cleanup_old_files();

外部 DuckDB 访问

DuckDB clients 可以 attach 同一份 DuckLake metadata:

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=postgres host=localhost' AS my_ducklake
  (METADATA_SCHEMA 'ducklake');

SELECT * FROM my_ducklake.public.events;

注意事项

  • 版本 1.0.0 支持 PostgreSQL 14-18。
  • README 列出的源码构建目标包括 Ubuntu 22.04-24.04 和 macOS。
  • Cloud credentials 通过 ducklake_secret foreign server 和 per-user mappings 存储;应像保护其他数据库 secrets 一样保护这些 catalog objects。
  • 对于 incremental heap-to-DuckLake conversion,上游指向单独的 pg_duckpipe 项目。

最后修改 2026-07-01: routine extension update (d1ad21a)