vchord
Extension Overview
PIGSTY third-party extension: vchord, a high-performance vector extension rewritten in Rust
Basic Information
- Extension ID: 1810
- Extension name: vchord
- Package name: vchord
- Category: RAG
- License: AGPLv3
- Website: https://github.com/tensorchord/VectorChord
- Language: Rust
- Other tags: pgrx
- Notes:
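Metadata
- Default version: 0.2.2
- PG major versions: 17, 16, 15, 14
- Dynamic loading: must be loaded explicitly
- Requires DDL: CREATE EXTENSION DDL must be executed
- Relocatable: cannot be installed into an arbitrary schema
- Trust level: untrusted; creating the extension requires superuser privileges
- Required schema: none
- Required extensions: vector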
Packages
- RPM repo: PIGSTY
- RPM package name: vchord_$v
- RPM version: 0.2.1
- RPM dependency: pgvector_$v
- DEB repo: PIGSTY
- DEB package name: postgresql-$v-vchord
- DEB version: 0.2.1
- DEB dependency: postgresql-$v-pgvector
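The `$v` placeholder in the package names stands for the PostgreSQL major version. As a minimal sketch (the helper `package_names` is hypothetical and not part of Pigsty or vchord), the names can be expanded like this:

```python
# Hypothetical helper: expands the `$v` placeholder for one PG major version.
# The naming patterns and the version list are taken from the lists above.
PG_MAJORS = [17, 16, 15, 14]


def package_names(pg_major):
    """Return the RPM and DEB package names for one PostgreSQL major version."""
    v = str(pg_major)
    return {
        "rpm": f"vchord_{v}",
        "rpm_dep": f"pgvector_{v}",
        "deb": f"postgresql-{v}-vchord",
        "deb_dep": f"postgresql-{v}-pgvector",
    }


for major in PG_MAJORS:
    names = package_names(major)
    print(major, names["rpm"], names["deb"])
```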
Latest Versions
OS | Arch | PG17 | PG16 | PG15 | PG14 | PG13 |
---|---|---|---|---|---|---|
el8 | x86_64 | vchord_17 PIGSTY 0.2.2 | vchord_16 PIGSTY 0.2.2 | vchord_15 PIGSTY 0.2.2 | vchord_14 PIGSTY 0.2.2 | |
el8 | aarch64 | vchord_17 PIGSTY 0.2.2 | vchord_16 PIGSTY 0.2.2 | vchord_15 PIGSTY 0.2.2 | vchord_14 PIGSTY 0.2.2 | |
el9 | x86_64 | vchord_17 PIGSTY 0.2.2 | vchord_16 PIGSTY 0.2.2 | vchord_15 PIGSTY 0.2.2 | vchord_14 PIGSTY 0.2.2 | |
el9 | aarch64 | vchord_17 PIGSTY 0.2.2 | vchord_16 PIGSTY 0.2.2 | vchord_15 PIGSTY 0.2.2 | vchord_14 PIGSTY 0.2.2 | |
d12 | x86_64 | postgresql-17-vchord PIGSTY 0.2.2 | postgresql-16-vchord PIGSTY 0.2.2 | postgresql-15-vchord PIGSTY 0.2.2 | postgresql-14-vchord PIGSTY 0.2.2 | |
d12 | aarch64 | postgresql-17-vchord PIGSTY 0.2.2 | postgresql-16-vchord PIGSTY 0.2.2 | postgresql-15-vchord PIGSTY 0.2.2 | postgresql-14-vchord PIGSTY 0.2.2 | |
u22 | x86_64 | postgresql-17-vchord PIGSTY 0.2.2 | postgresql-16-vchord PIGSTY 0.2.2 | postgresql-15-vchord PIGSTY 0.2.2 | postgresql-14-vchord PIGSTY 0.2.2 | |
u22 | aarch64 | postgresql-17-vchord PIGSTY 0.2.2 | postgresql-16-vchord PIGSTY 0.2.2 | postgresql-15-vchord PIGSTY 0.2.2 | postgresql-14-vchord PIGSTY 0.2.2 | |
u24 | x86_64 | postgresql-17-vchord PIGSTY 0.2.2 | postgresql-16-vchord PIGSTY 0.2.2 | postgresql-15-vchord PIGSTY 0.2.2 | postgresql-14-vchord PIGSTY 0.2.2 | |
u24 | aarch64 | postgresql-17-vchord PIGSTY 0.2.2 | postgresql-16-vchord PIGSTY 0.2.2 | postgresql-15-vchord PIGSTY 0.2.2 | postgresql-14-vchord PIGSTY 0.2.2 | |
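Installation
Install the vchord extension with the pig command-line tool:
pig ext install vchord
Install the vchord extension with the Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["vchord"]}' # -l <cluster name>
Or install the package directly with dnf (EL) or apt (Debian/Ubuntu), using the line that matches your PG major version: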
dnf install vchord_17;
dnf install vchord_16;
dnf install vchord_15;
dnf install vchord_14;
apt install postgresql-17-vchord;
apt install postgresql-16-vchord;
apt install postgresql-15-vchord;
apt install postgresql-14-vchord;
The vchord extension must be loaded via shared_preload_libraries:
shared_preload_libraries = 'vchord'  # modify the PG cluster configuration
On a PG cluster where the extension package is already installed, enable the vchord extension with the following SQL command:
CREATE EXTENSION vchord CASCADE;
使用方法
- https://github.com/tensorchord/VectorChord
- Launch Blog: VectorChord: Store 400k Vectors for $1 in PostgreSQL
Add this extension to shared_preload_libraries in postgresql.conf, restart PostgreSQL, then create the extension:
CREATE EXTENSION vchord CASCADE;
Create Index on embedding:
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);
Query
Queries use exactly the same syntax as pgvector. Like pgvecto.rs with VBASE, VectorChord supports arbitrary filter operations and WHERE/JOIN clauses.
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Supported distance functions are:
- <-> - L2 distance
- <#> - (negative) inner product
- <=> - cosine distance
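To make the three operators concrete, here is a minimal pure-Python sketch of the quantities they return, following pgvector's definitions. The function names are illustrative only; inside PostgreSQL the operators are evaluated by the extension, not by code like this.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the value returned by <->."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, the value returned by <#>.

    The sign is flipped so that ORDER BY ... ASC puts the
    largest inner product first.
    """
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, the value returned by <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [3.0, 1.0, 2.0]
item = [1.0, 2.0, 3.0]
print(l2_distance(query, item))
print(negative_inner_product(query, item))
print(cosine_distance(query, item))
```

Smaller values mean closer vectors for all three, which is why every query sorts in ascending order.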
Query Performance Tuning
You can fine-tune the search performance by adjusting the probes and epsilon parameters:
-- Set probes to control the number of lists scanned.
-- Recommended range: 3%–10% of the total `lists` value.
SET vchordrq.probes = 100;
-- Set epsilon to control the reranking precision.
-- Larger value means more rerank for higher recall rate.
-- Don't change it unless you only have limited memory.
-- Recommended range: 1.0–1.9. Default value is 1.9.
SET vchordrq.epsilon = 1.9;
-- vchordrq relies on a projection matrix to optimize performance.
-- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
-- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
-- Default value: '64,128,256,384,512,768,1024,1536'
-- Note: This setting requires a database restart to take effect.
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128,256,384,512,768,1024,1536';
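The 3%–10% guideline for probes can be turned into concrete numbers for a given `lists` value. The following is a minimal sketch; `recommended_probes` is a hypothetical helper, and rounding the lower bound up and the upper bound down is an assumption made here so that both values stay inside the recommended range.

```python
def recommended_probes(lists):
    """Return (low, high) probes bounds as 3% and 10% of `lists`.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    low = max(1, -(-lists * 3 // 100))   # ceil(lists * 0.03), at least 1
    high = max(low, lists * 10 // 100)   # floor(lists * 0.10), never below low
    return low, high


low, high = recommended_probes(4096)
print(f"-- recommended range for lists = 4096: {low} to {high}")
print(f"SET vchordrq.probes = {low};")
```

Starting at the lower bound and raising probes until recall is acceptable keeps the number of scanned lists, and therefore latency, as small as possible.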
The following PostgreSQL settings can also be tuned:
-- If using SSDs, set `effective_io_concurrency` to 200 for faster disk I/O.
SET effective_io_concurrency = 200;
-- Disable JIT (Just-In-Time Compilation) as it offers minimal benefit (1–2%)
-- and adds overhead for single-query workloads.
SET jit = off;
-- Allocate at least 25% of total memory to `shared_buffers`.
-- For disk-heavy workloads, you can increase this to up to 90% of total memory.
-- With network storage, you may also want to disable swap to avoid I/O hangs.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';
Indexing prewarm
To prewarm the index, you can use the following SQL. It will significantly improve performance when using limited memory.
-- vchordrq_prewarm(index_name::regclass) to prewarm the index into the shared buffer
SELECT vchordrq_prewarm('gist_train_embedding_idx'::regclass);
Index Build Time
Index building can be parallelized, and with external centroid precomputation the total time is limited primarily by disk speed. Optimize parallelism using the following settings:
-- Set this to the number of CPU cores available for parallel operations.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;
-- Adjust the total number of worker processes.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET max_worker_processes = 8;
Indexing Progress
You can check the indexing progress by querying the pg_stat_progress_create_index view.
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
External Index Precomputation
Instead of clustering inside PostgreSQL with pure SQL, external index precomputation clusters the vectors outside the database and inserts the resulting centroids into a PostgreSQL table. The workflow is more involved, but an external build is much faster on larger datasets (>5M).
To get started, cluster the vectors using faiss, scikit-learn, or any other clustering library.
The centroids must be stored in a table of any name with 3 columns:
- id (integer): id of each centroid, should be unique
- parent (integer, nullable): parent id of each centroid, should be NULL for normal clustering
- vector (vector): representation of each centroid, as a pgvector vector type
An example:
-- Create table of centroids
CREATE TABLE public.centroids (id integer NOT NULL UNIQUE, parent integer, vector vector(768));
-- Insert centroids into it
INSERT INTO public.centroids (id, parent, vector) VALUES (1, NULL, '[0.1, 0.2, 0.3, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (2, NULL, '[0.4, 0.5, 0.6, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (3, NULL, '[0.7, 0.8, 0.9, ..., 0.768]');
-- ...
-- Create index using the centroid table
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
[build.external]
table = 'public.centroids'
$$);
To simplify the workflow, we provide end-to-end scripts for external index pre-computation, see scripts.
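As a rough illustration of the hand-off between the clustering step and the centroid table, the sketch below turns a list of centroids into INSERT statements. It is not the project's script: `vector_literal` and `centroid_inserts` are hypothetical helpers, the toy centroids are made-up 3-dimensional values standing in for the output of faiss or scikit-learn, and real centroids must match the dimension declared on the table's vector column.

```python
def vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def centroid_inserts(centroids, table="public.centroids"):
    """Build one INSERT per centroid, with ids starting at 1 and parent NULL."""
    statements = []
    for centroid_id, values in enumerate(centroids, start=1):
        statements.append(
            f"INSERT INTO {table} (id, parent, vector) "
            f"VALUES ({centroid_id}, NULL, '{vector_literal(values)}');"
        )
    return statements


# Toy centroids (assumption): in practice these come from a clustering library.
toy_centroids = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
for statement in centroid_inserts(toy_centroids):
    print(statement)
```

Generating SQL text keeps the sketch dependency-free. When loading many centroids from an application, passing the values as query parameters through a database driver is usually the safer choice.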
Limitations
- Data Type Support: Currently, only the f32 data type is supported for vectors.
- Architecture Compatibility: The fast-scan kernel is optimized for x86_64 architectures. While it runs on aarch64, performance may be lower.
- KMeans Clustering: The built-in KMeans clustering is not yet fully optimized and may require substantial memory. We strongly recommend using external centroid precomputation for efficient index construction.