vchord

A high-performance vector extension rewritten in Rust

Module:

PGEXT

Extension Overview

PIGSTY third-party extension: vchord, a high-performance vector extension rewritten in Rust

Basic Information

Extension ID: 1810
Extension name: vchord
Package name: vchord
Category: RAG
License: AGPLv3
Website: https://github.com/tensorchord/VectorChord
Language: Rust
Tags: pgrx
Notes:

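Metadata

Default version: 0.3.0
PG major versions: 17, 16, 15, 14
Dynamic loading: must be loaded explicitly
DDL required: yes, run CREATE EXTENSION
Relocatable: no, cannot be installed into an arbitrary schema
Trusted: no, creating the extension requires superuser privileges
Required schemas: none
Required extensions: vector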

Packages

RPM repo: PIGSTY
RPM package name: vchord_$v
RPM version: 0.3.0
RPM dependencies: pgvector_$v
DEB repo: PIGSTY
DEB package name: postgresql-$v-vchord
DEB version: 0.3.0
DEB dependencies: postgresql-$v-pgvector

OS	Arch	PG17	PG16	PG15	PG14
`el8`	`x86_64`	`vchord_17` PIGSTY 0.3.0	`vchord_16` PIGSTY 0.3.0	`vchord_15` PIGSTY 0.3.0	`vchord_14` PIGSTY 0.3.0
`el8`	`aarch64`	`vchord_17` PIGSTY 0.3.0	`vchord_16` PIGSTY 0.3.0	`vchord_15` PIGSTY 0.3.0	`vchord_14` PIGSTY 0.3.0
`el9`	`x86_64`	`vchord_17` PIGSTY 0.3.0	`vchord_16` PIGSTY 0.3.0	`vchord_15` PIGSTY 0.3.0	`vchord_14` PIGSTY 0.3.0
`el9`	`aarch64`	`vchord_17` PIGSTY 0.3.0	`vchord_16` PIGSTY 0.3.0	`vchord_15` PIGSTY 0.3.0	`vchord_14` PIGSTY 0.3.0
`d12`	`x86_64`	`postgresql-17-vchord` PIGSTY 0.3.0	`postgresql-16-vchord` PIGSTY 0.3.0	`postgresql-15-vchord` PIGSTY 0.3.0	`postgresql-14-vchord` PIGSTY 0.3.0
`d12`	`aarch64`	`postgresql-17-vchord` PIGSTY 0.3.0	`postgresql-16-vchord` PIGSTY 0.3.0	`postgresql-15-vchord` PIGSTY 0.3.0	`postgresql-14-vchord` PIGSTY 0.3.0
`u22`	`x86_64`	`postgresql-17-vchord` PIGSTY 0.3.0	`postgresql-16-vchord` PIGSTY 0.3.0	`postgresql-15-vchord` PIGSTY 0.3.0	`postgresql-14-vchord` PIGSTY 0.3.0
`u22`	`aarch64`	`postgresql-17-vchord` PIGSTY 0.3.0	`postgresql-16-vchord` PIGSTY 0.3.0	`postgresql-15-vchord` PIGSTY 0.3.0	`postgresql-14-vchord` PIGSTY 0.3.0
`u24`	`x86_64`	`postgresql-17-vchord` PIGSTY 0.3.0	`postgresql-16-vchord` PIGSTY 0.3.0	`postgresql-15-vchord` PIGSTY 0.3.0	`postgresql-14-vchord` PIGSTY 0.3.0
`u24`	`aarch64`	`postgresql-17-vchord` PIGSTY 0.3.0	`postgresql-16-vchord` PIGSTY 0.3.0	`postgresql-15-vchord` PIGSTY 0.3.0	`postgresql-14-vchord` PIGSTY 0.3.0

Installation

Install the vchord extension with the pig CLI:

pig ext install vchord

Install the vchord extension with the Pigsty playbook:

./pgsql.yml -t pg_extension -e '{"pg_extensions": ["vchord"]}' # -l <cluster name>

Install the vchord RPM package manually from the YUM repo:

dnf install vchord_17;
dnf install vchord_16;
dnf install vchord_15;
dnf install vchord_14;

Install the vchord DEB package manually from the APT repo:

apt install postgresql-17-vchord;
apt install postgresql-16-vchord;
apt install postgresql-15-vchord;
apt install postgresql-14-vchord;

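The vchord extension must be loaded through shared_preload_libraries:

shared_preload_libraries = 'vchord' # edit the PG cluster configuration

Enable the vchord extension with the following SQL command on a PG cluster where the package is already installed: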

CREATE EXTENSION vchord CASCADE;

Usage

Add this extension to shared_preload_libraries in postgresql.conf, then create it:

CREATE EXTENSION vchord CASCADE;

Create an index on the embedding column:

CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);

Query

The query syntax is exactly the same as in pgvector. VectorChord supports any filter operation and WHERE/JOIN clauses, like pgvecto.rs with VBASE.

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Supported distance functions are:

<-> - L2 distance
<#> - (negative) inner product
<=> - cosine distance

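As a reading aid, the three operators can be mirrored in plain Python. This is only a sketch of the underlying math, not VectorChord's implementation; the function names and the toy rows are made up for illustration. The inner product is negated so that, like the other two, a smaller value means a closer match and an ascending ORDER BY returns the best rows first.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity ordered by `<->`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Inner product with the sign flipped, the quantity ordered by `<#>`."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus cosine similarity, the quantity ordered by `<=>`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical rows, ranked the way ORDER BY embedding <-> '[3,1,2]' would rank them.
query = [3.0, 1.0, 2.0]
rows = {"a": [3.0, 1.0, 2.0], "b": [0.0, 1.0, 2.0], "c": [6.0, 2.0, 4.0]}
nearest_by_l2 = sorted(rows, key=lambda k: l2_distance(rows[k], query))
```

Row "c" points in the same direction as the query, so its cosine distance is close to zero even though its L2 distance is the largest of the three.
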
Query Performance Tuning

You can fine-tune the search performance by adjusting the probes and epsilon parameters:

-- Set probes to control the number of lists scanned. 
-- Recommended range: 3%–10% of the total `lists` value.
SET vchordrq.probes = 100;

-- Set epsilon to control the reranking precision.
-- A larger value reranks more candidates, which raises the recall rate.
-- Don't change it unless memory is limited.
-- Recommended range: 1.0–1.9. Default value is 1.9.
SET vchordrq.epsilon = 1.9;

-- vchordrq relies on a projection matrix to optimize performance.
-- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
-- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
-- Default value: '64,128,256,384,512,768,1024,1536'
-- Note: This setting requires a database restart to take effect.
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128,256,384,512,768,1024,1536';
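
The 3%–10% guideline for probes is easy to turn into concrete numbers. The helper below is a minimal sketch of that arithmetic; `recommended_probes` is a hypothetical name, not part of VectorChord, and the right value within the range still has to be found by measuring recall and latency.

```python
def recommended_probes(lists, low_pct=3, high_pct=10):
    """Return the (low, high) probes range for an index built with `lists` lists."""
    low = max(1, -(-lists * low_pct // 100))   # ceiling of lists * low_pct / 100
    high = max(low, lists * high_pct // 100)   # floor of lists * high_pct / 100
    return low, high

# lists = [4096] is the value used in the index example on this page.
low, high = recommended_probes(4096)
print(f"SET vchordrq.probes = {low};  -- anywhere up to {high}")
```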

PostgreSQL's own settings can be tuned as well:

-- If using SSDs, set `effective_io_concurrency` to 200 for faster disk I/O.
SET effective_io_concurrency = 200;

-- Disable JIT (Just-In-Time Compilation) as it offers minimal benefit (1–2%) 
-- and adds overhead for single-query workloads.
SET jit = off;

-- Allocate at least 25% of total memory to `shared_buffers`.
-- For disk-heavy workloads, you can increase this to up to 90% of total memory.
-- With network storage, you may also want to disable swap to avoid I/O hangs.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';
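
To make the sizing rule concrete, the sketch below derives a shared_buffers statement from the machine's total memory. The helper name and the choice to express the value in MB are assumptions made for illustration; only the 25% to 90% bounds come from the guidance above.

```python
def shared_buffers_setting(total_memory_mb, fraction=0.25):
    """Build an ALTER SYSTEM statement that gives `fraction` of RAM to shared_buffers."""
    if not 0.25 <= fraction <= 0.9:
        raise ValueError("fraction should stay between 0.25 and 0.9")
    size_mb = int(total_memory_mb * fraction)
    return f"ALTER SYSTEM SET shared_buffers = '{size_mb}MB';"

# A 32 GB machine with the default 25% share.
print(shared_buffers_setting(32 * 1024))
```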

Index Prewarm

To prewarm the index, run the following SQL. Prewarming significantly improves performance when memory is limited.

-- vchordrq_prewarm(index_name::regclass) to prewarm the index into the shared buffer
SELECT vchordrq_prewarm('gist_train_embedding_idx'::regclass);

Index Build Time

Index building can be parallelized, and with external centroid precomputation the total time is primarily limited by disk speed. Optimize parallelism using the following settings:

-- Set this to the number of CPU cores available for parallel operations.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;

-- Adjust the total number of worker processes. 
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET max_worker_processes = 8;

Indexing Progress

You can check the indexing progress by querying the pg_stat_progress_create_index view.

SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;

External Index Precomputation

Unlike a pure SQL build, external index precomputation first clusters the vectors outside the database and then inserts the resulting centroids into a PostgreSQL table. Although this is more complicated, an external build is much faster on larger datasets (>5M vectors).

To get started, you need to do a clustering of vectors using faiss, scikit-learn or any other clustering library.
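
For readers who have not used a clustering library, the sketch below shows what the clustering step produces. It is a plain Lloyd's k-means in pure Python on toy data, standing in for faiss or scikit-learn; it is far too slow for real datasets and is not the method used by VectorChord's scripts. All names in it are illustrative.

```python
import random

def kmeans(vectors, k, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: group each vector under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(v, centroids[i])),
            )
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Two well-separated toy groups in 2 dimensions.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids = sorted(kmeans(data, k=2))
```

Each returned centroid becomes one row of the centroid table, with the number of centroids playing the role that `lists` plays in an internal build.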

The centroids should be preset in a table of any name with 3 columns:

id(integer): id of each centroid, should be unique
parent(integer, nullable): parent id of each centroid, should be NULL for normal clustering
vector(vector): representation of each centroid, pgvector vector type

An example could look like this:

-- Create table of centroids
CREATE TABLE public.centroids (id integer NOT NULL UNIQUE, parent integer, vector vector(768));
-- Insert centroids into it
INSERT INTO public.centroids (id, parent, vector) VALUES (1, NULL, '[0.1, 0.2, 0.3, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (2, NULL, '[0.4, 0.5, 0.6, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (3, NULL, '[0.7, 0.8, 0.9, ..., 0.768]');
-- ...

-- Create index using the centroid table
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
[build.external]
table = 'public.centroids'
$$);
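
Centroids computed outside the database still have to be turned into rows for the centroid table. The sketch below shows one way to do that, assuming a flat clustering (every parent is NULL) and pgvector's bracketed text format; the helper names are hypothetical, and the driver call in the final comment is only a suggestion.

```python
def to_vector_literal(values):
    """Render floats in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def centroid_rows(centroids):
    """Yield (id, parent, vector_literal) tuples for a flat clustering.

    ids start at 1 and are unique; parent is None (SQL NULL) for every row.
    """
    for centroid_id, centroid in enumerate(centroids, start=1):
        yield (centroid_id, None, to_vector_literal(centroid))

rows = list(centroid_rows([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))

# With a PostgreSQL driver (an assumption, for example psycopg), the rows might be loaded with:
#   cursor.executemany(
#       "INSERT INTO public.centroids (id, parent, vector) VALUES (%s, %s, %s::vector)",
#       rows,
#   )
```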

To simplify the workflow, we provide end-to-end scripts for external index precomputation; see scripts.

Limitations

Data Type Support: Currently, only the f32 data type is supported for vectors.
Architecture Compatibility: The fast-scan kernel is optimized for x86_64 architectures. While it runs on aarch64, performance may be lower.
KMeans Clustering: The built-in KMeans clustering is not yet fully optimized and may require substantial memory. We strongly recommend using external centroid precomputation for efficient index construction.

Last modified 2025-05-07: update extension catalog (270b243)