pg_pinyin

PostgreSQL 拼音转写与检索辅助扩展

概览

扩展包名版本分类许可证语言
pg_pinyin0.0.4FTSMITRust
ID扩展名BinLibLoadCreateTrustReloc模式
2190pg_pinyinpinyin
相关扩展zhparser pg_search pg_trgm pg_bigm pgroonga pgroonga_database pg_tokenizer fuzzystrmatch

optional tokenizer-input overload can integrate with pg_search; pgrx patched to 0.18.1.

版本

类型仓库版本PG 大版本包名依赖
EXTPIGSTY0.0.41817161514pg_pinyin-
RPMPIGSTY0.0.41817161514pg_pinyin_$v-
DEBPIGSTY0.0.41817161514postgresql-$v-pinyin-
OS / PGPG18PG17PG16PG15PG14
el8.x86_64
el8.aarch64
el9.x86_64
el9.aarch64
el10.x86_64
el10.aarch64
d12.x86_64
d12.aarch64
d13.x86_64
d13.aarch64
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
u22.x86_64
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
u22.aarch64
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
u24.x86_64
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
u24.aarch64
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
PIGSTY 0.0.4
u26.x86_64
u26.aarch64

构建

您可以使用 pig build 命令构建 pg_pinyin 扩展的 RPM / DEB 包:

pig build pkg pg_pinyin         # 构建 RPM / DEB 包

安装

您可以直接安装 pg_pinyin 扩展包的预置二进制包,首先确保 PGDGPIGSTY 仓库已经添加并启用:

pig repo add pgsql -u          # 添加仓库并更新缓存

使用 pig 或者是 apt/yum/dnf 安装扩展:

pig install pg_pinyin;          # 当前活跃 PG 版本安装
pig ext install -y pg_pinyin -v 18  # PG 18
pig ext install -y pg_pinyin -v 17  # PG 17
pig ext install -y pg_pinyin -v 16  # PG 16
pig ext install -y pg_pinyin -v 15  # PG 15
pig ext install -y pg_pinyin -v 14  # PG 14
dnf install -y pg_pinyin_18       # PG 18
dnf install -y pg_pinyin_17       # PG 17
dnf install -y pg_pinyin_16       # PG 16
dnf install -y pg_pinyin_15       # PG 15
dnf install -y pg_pinyin_14       # PG 14
apt install -y postgresql-18-pinyin   # PG 18
apt install -y postgresql-17-pinyin   # PG 17
apt install -y postgresql-16-pinyin   # PG 16
apt install -y postgresql-15-pinyin   # PG 15
apt install -y postgresql-14-pinyin   # PG 14

创建扩展

CREATE EXTENSION pg_pinyin;

用法

来源:pg_pinyin upstream READMEChinese READMElocal metadata

pg_pinyin 将中文文本转换为拼音,可以按字符转换,也可以按词转换。它适用于生成搜索列、trigram 搜索,以及需要拼音输入的 pg_search BM25 查询。

CREATE EXTENSION pg_pinyin;

函数

函数说明
pinyin_char_romanize(text)字符级拼音罗马化
pinyin_char_romanize(text, suffix text)使用自定义词典后缀进行字符级罗马化
pinyin_word_romanize(text)词级拼音罗马化
pinyin_word_romanize(text, suffix text)使用自定义词典后缀进行词级罗马化
pinyin_word_romanize(tokenizer_input anyelement)pg_search tokenizer 输入进行词级罗马化,例如 name::pdb.icu::text[]
pinyin_word_romanize(tokenizer_input anyelement, suffix text)使用自定义词典后缀处理 tokenizer 输入
pinyin_regex_phrase(text, slope integer DEFAULT NULL, max_expansions integer DEFAULT NULL, generated_pinyin boolean DEFAULT false)返回 pdb.querypg_search 查询辅助函数;仅在 CREATE EXTENSION pg_pinyin 前已启用 pg_search 时可用
pinyin_regex_phrase_patterns(text, generated_pinyin boolean DEFAULT false)内部辅助函数,以 text[] 返回正则短语 token

生成列 + Trigram 搜索

CREATE EXTENSION IF NOT EXISTS pg_pinyin;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE voice (
  id bigserial PRIMARY KEY,
  description text NOT NULL,
  pinyin text GENERATED ALWAYS AS (public.pinyin_char_romanize(description)) STORED
);

CREATE INDEX voice_pinyin_trgm_idx ON voice USING gin (pinyin gin_trgm_ops);

INSERT INTO voice (description) VALUES ('郑爽ABC');
SELECT id, description, pinyin FROM voice;

面向词的搜索使用 pinyin_word_romanize。当 pg_search 可用时,它可以消费 pdb.icu::text[] 这类 tokenizer 输入。

CREATE EXTENSION IF NOT EXISTS pg_search;
CREATE EXTENSION IF NOT EXISTS pg_pinyin;

CREATE TABLE voice (
  id bigserial PRIMARY KEY,
  description text NOT NULL,
  pinyin text GENERATED ALWAYS AS (public.pinyin_word_romanize(description)) STORED
);

CREATE INDEX voice_pinyin_bm25_idx
ON voice
USING bm25 (id, pinyin)
WITH (key_field='id');

SELECT *
FROM voice
WHERE pinyin @@@ public.pinyin_regex_phrase('zhengshuang');

SELECT public.pinyin_word_romanize('郑爽ABC'::pdb.icu::text[]);

pinyin_regex_phrase 的返回类型是 pdb.query,因此必须先在数据库中启用 pg_search,再创建 pg_pinyin。如果先创建 pg_pinyin,上游文档说明罗马化函数仍会安装,但 pinyin_regex_phrase 会以带有清晰异常信息的错误占位函数安装。

字典表

执行 CREATE EXTENSION pg_pinyin 时,扩展会在 pinyin schema 下初始化内置字典表;正常使用扩展时不需要额外加载数据。内置数据覆盖字符映射、词 token 和词映射。

可以在 pinyin schema 中提供带后缀的自定义词典表。调用时指定该后缀会把基础字典和后缀表合并,且后缀表条目优先。

CREATE TABLE IF NOT EXISTS pinyin.pinyin_mapping_suffix1 (
  character text PRIMARY KEY,
  pinyin text NOT NULL
);

CREATE TABLE IF NOT EXISTS pinyin.pinyin_words_suffix1 (
  word text PRIMARY KEY,
  pinyin text NOT NULL
);

INSERT INTO pinyin.pinyin_mapping_suffix1 (character, pinyin)
VALUES ('郑', '|zhengx|')
ON CONFLICT (character) DO UPDATE SET pinyin = EXCLUDED.pinyin;

INSERT INTO pinyin.pinyin_words_suffix1 (word, pinyin)
VALUES ('郑爽', '|zhengx| |shuangx|')
ON CONFLICT (word) DO UPDATE SET pinyin = EXCLUDED.pinyin;

SELECT public.pinyin_char_romanize('郑爽ABC', '_suffix1');
SELECT public.pinyin_word_romanize('郑爽ABC'::pdb.icu::text[], '_suffix1');

最后修改 2026-06-18: extension data update (63e2bd9)