bootstrap: OPTIONAL, make sure ansible is installed, and use offline package /tmp/pkg.tgz if applicable
configure: OPTIONAL, recommend & generate pigsty.yml config according to your env.
install.yml: REQUIRED, install Pigsty modules according to your config file.
It may take 5-20 minutes to complete the installation, depending on your network speed and hardware spec.
After that, you will get a pigsty singleton node ready, with Web service on port 80/443 and Postgres on port 5432.
BTW: If you feel Pigsty is too complicated, you can consider the Slim Installation, which only installs the necessary components for HA PostgreSQL clusters.
Example: Singleton Installation on RockyLinux 9.3:
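A minimal sketch of the whole flow, assuming you fetch the latest stable source with the one-liner shown later in this document and work from ~/pigsty:
curl -fsSL https://repo.pigsty.io/get | bash   # download & extract the pigsty source to ~/pigsty
cd ~/pigsty
./bootstrap      # optional: make sure ansible is installed (use /tmp/pkg.tgz if present)
./configure      # optional: generate pigsty.yml for this node
./install.yml    # required: install pigsty modules according to the config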
Prepare
Check Preparation for a complete guide of resource preparation.
Pigsty supports the Linux kernel on x86_64/aarch64 arch. It can run on any node: physical machines, bare metal, virtual machines, or VM-like containers, but a static IPv4 address is required.
The minimum spec is 1C1G. It is recommended to use bare metal or VMs with at least 2C4G. There is no upper limit, and node parameters will be auto-tuned.
We recommend using fresh RockyLinux 8.10 / 9.4 or Ubuntu 22.04 as underlying operating systems.
For a complete list of supported operating systems, please refer to Compatibility.
Public-key SSH access to localhost and NOPASSWD sudo privileges are required to perform the installation. Avoid using the root user.
If you wish to manage more nodes, those nodes need to be ssh / sudo accessible from your current admin node with your current admin user.
DO NOT use the root user
While it is possible to install Pigsty as the root user, it is much safer to use a dedicated admin user (dba, admin, ...) for security reasons.
This user has to be different from root and the dbsu (postgres). Pigsty will create an optional admin user dba according to the config by default.
Pigsty relies on Ansible to execute playbooks, so you have to install the ansible and jmespath packages first, before the install procedure.
This can be done with the following command, or through the bootstrap procedure, which is especially useful when you don't have Internet access.
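For example, a minimal sketch of the manual route (package names are taken from the Software section later in this document; adjust for your distro):
sudo dnf install -y ansible python3.12-jmespath   # EL8 / EL9
sudo apt install -y ansible python3-jmespath      # Debian / Ubuntu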
The main branch may be in an unstable development state.
Always check out a specific version when using git; check Release Notes for available versions.
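If you prefer the git route, a hedged example of pinning a release could look like this (v3.0.4 is only used as an example version; the repository is the public Pigsty GitHub repo):
git clone https://github.com/Vonng/pigsty.git ~/pigsty
cd ~/pigsty && git checkout v3.0.4   # pin a released version instead of the unstable main branch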
Configure
configure will create a pigsty.yml config file according to your env. This procedure is OPTIONAL if you know how to configure pigsty manually.
./configure                              # interactive wizard, ask for IP address
./configure [-i|--ip <ipaddr>]           # give primary IP & config mode
            [-c|--conf <conf>]           # specify config template (relative to conf/ dir)
            [-r|--region <default|china|europe>]  # choose upstream repo region
            [-n|--non-interactive]       # skip interactive wizard
            [-x|--proxy]                 # write proxy env to config
Configure Example Output
$ ./configure
configure pigsty v3.0.4 begin
[ OK ] region  = china
[ OK ] kernel  = Linux
[ OK ] machine = x86_64
[ OK ] package = rpm,yum
[ OK ] vendor  = centos (CentOS Linux)
[ OK ] version = 7 (7)
[ OK ] sudo    = vagrant ok
[ OK ] ssh     = vagrant@127.0.0.1 ok
[WARN] Multiple IP address candidates found:
(1) 192.168.121.110 inet 192.168.121.110/24 brd 192.168.121.255 scope global noprefixroute dynamic eth0
(2) 10.10.10.10 inet 10.10.10.10/24 brd 10.10.10.255 scope global noprefixroute eth1
[ OK ] primary_ip = 10.10.10.10 (from demo)
[ OK ] admin      = vagrant@10.10.10.10 ok
[WARN] mode       = el7, CentOS 7.9 EOL @ 2024-06-30, deprecated, consider using el8 or el9 instead
[ OK ] configure pigsty done
proceed with ./install.yml
-c|--conf: Generate config from templates according to mode
-i|--ip: Replace the IP address placeholder 10.10.10.10 with the primary IPv4 address of the current node.
-r|--region: Set upstream repo mirror according to region (default|china|europe)
-n|--non-interactive: skip the interactive wizard and use default/arg values
-x|--proxy: write current proxy env to the config proxy_env (http_proxy/HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, NO_PROXY)
When -n|--non-interactive is specified, you have to provide a primary IP address with -i|--ip <ipaddr> if the node has multiple IP addresses,
since there is no default value for the primary IP address in this case.
If your machine’s network interface have multiple IP addresses, you’ll need to explicitly specify a primary IP address for the current node using -i|--ip <ipaddr>, or provide it during interactive inquiry. The address should be a static IP address, and you should avoid using any public IP addresses.
You can check and modify the generated config file ~/pigsty/pigsty.yml before installation.
Change the default passwords!
PLEASE CHANGE THE DEFAULT PASSWORDs in the config file before installation, check secure password for details.
Install
Run the install.yml playbook to perform a full installation on the current node:
./install.yml # install everything in one-pass
Installation Output Example
[vagrant@meta pigsty]$ ./install.yml
PLAY [IDENTITY] ********************************************************************************************************************************
TASK [node_id : get node fact] *****************************************************************************************************************
changed: [10.10.10.10]...
...
PLAY RECAP **************************************************************************************************************************************************************************
10.10.10.10   : ok=288  changed=215  unreachable=0  failed=0  skipped=64  rescued=0  ignored=0
localhost     : ok=3    changed=0    unreachable=0  failed=0  skipped=4   rescued=0  ignored=0
It’s a standard ansible playbook, you can have fine-grained control with ansible options:
-l: limit execution targets
-t: limit execution tasks
-e: passing extra args
-i: use another config
…
DON'T EVER RUN THIS PLAYBOOK AGAIN!
It’s very DANGEROUS to re-run install.yml on existing deployment!
It may nuke your entire deployment!!! Only do this when you know what you are doing.
Otherwise, consider rm install.yml or chmod a-x install.yml to avoid accidental execution.
You can access these web UIs directly via IP + port, but the common best practice is to access them through Nginx and distinguish them via domain names. You'll need to configure DNS records, or use local static records (/etc/hosts) for that.
How to access Pigsty Web UI by domain name?
There are several options:
Resolve internet domain names through a DNS service provider, suitable for systems accessible from the public internet.
Configure internal network DNS server resolution records for internal domain name resolution.
Modify the local machine's /etc/hosts file to add static resolution records (for Windows, the equivalent file is C:\Windows\System32\drivers\etc\hosts).
We recommend the third method for common users. On the machine (which runs the browser), add the following record into /etc/hosts (sudo required) or C:\Windows\System32\drivers\etc\hosts in Windows:
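For example, assuming the default Pigsty portal domains (such as g.pigsty for Grafana, mentioned below) and a node whose external IP is 10.10.10.10, the record might look like this:
10.10.10.10   h.pigsty a.pigsty p.pigsty g.pigsty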
You have to use the external IP address of the node here.
How to configure server side domain names?
The server-side domain name is configured with Nginx. If you want to replace the default domain name, simply enter the domain you wish to use in the parameter infra_portal. When you access the Grafana monitoring homepage via http://g.pigsty, it is actually accessed through the Nginx proxy to Grafana’s WebUI:
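A hedged sketch of what such an infra_portal override could look like; the domains and endpoints below are illustrative rather than authoritative defaults:
infra_portal:                         # domain names and upstream endpoints served by the Nginx portal
  home        : { domain: h.pigsty }                                                # nginx home page
  grafana     : { domain: g.pigsty , endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus  : { domain: p.pigsty , endpoint: "${admin_ip}:9090" }
  alertmanager: { domain: a.pigsty , endpoint: "${admin_ip}:9093" }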
How to install pigsty without Internet access? How to make your own offline packages.
Pigsty’s Installation procedure requires Internet access, but production database servers are often isolated from the Internet.
To address this issue, Pigsty supports offline installation from offline packages, which can help you install Pigsty in an environment without Internet access, and increase the certainty, reliability, speed and consistency of the installation process.
Pigsty’s install procedure will download all the required rpm/deb packages and all its dependencies from the upstream yum/apt repo, and build a local repo before installing the software.
The repo is served by Nginx and is available to all nodes in the deployment environment, including itself. All the installation will go through this local repo without further internet access.
There are certain benefits to using a local repo:
It can avoid repetitive download requests and traffic consumption, significantly speeding up the installation and improving its reliability.
It will take a snapshot of current software versions, ensuring the consistency of the software versions installed across nodes in the environment.
The snapshot contains all the dependencies, so it avoids installation failures caused by upstream dependency changes: if one node installs successfully, all other nodes in the same environment will too.
The built local software repo can be packaged as a whole tarball and copied to an isolated environment with the same operating system for offline installation.
The default location for local repo is /www/pigsty (customizable by nginx_home & repo_name).
The repo will be created by createrepo_c or dpkg_dev according to the OS distro, and referenced by all nodes in the environment through repo_upstream entry with module=local.
You can perform install on one node with the exact same OS version, then copy the local repo directory to another node with the same OS version for offline installation.
A more common practice is to package the local software repo directory into an offline package and copy it to the isolated node for installation.
Make Offline Package
Pigsty offers a cache.yml playbook to make offline packages.
For example, the following command will take the local software repo on the infra node /www/pigsty and package it into an offline package, and retrieve it to the local dist/${version} directory.
./cache.yml -l infra
You can customize the output directory and name of the offline package with the cache_pkg_dir and cache_pkg_name parameters.
For example, the following command will fetch the built offline package to files/pkg.tgz.
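Since cache_pkg_dir and cache_pkg_name are ordinary parameters, one plausible way to override them is via ansible extra vars (a sketch, not the only way):
./cache.yml -l infra -e cache_pkg_dir=files -e cache_pkg_name=pkg.tgz   # fetch the offline package to files/pkg.tgz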
The simpler way is to copy the offline package to /tmp/pkg.tgz on the isolated node to be installed, and Pigsty will automatically unpack it during the bootstrap process and install from it.
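For example, assuming the package was retrieved into dist/<version>/, copying it into place could look like this (paths are placeholders):
scp dist/<version>/<pkg>.tgz <isolated_node>:/tmp/pkg.tgz   # bootstrap on the isolated node will pick it up automatically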
When building the local software repo, Pigsty will generate a marker file repo_complete to mark it as a finished Pigsty local software repo.
When Pigsty install.yml playbook finds that the local software repo already exists, it will enter offline install mode.
In offline install mode, pigsty will no longer download software from the internet, but install from the local software repo directly.
Criteria for Offline Install Mode
The criteria for the existence of a local repo is the presence of a marker file located by default at /www/pigsty/repo_complete.
This marker file is automatically generated after the download is complete during the standard installation procedure, indicating a usable local software repo is done.
Deleting the repo_complete marker file of the local repo will cause the install procedure to re-download any missing packages from upstream.
Compatibility Notes
The software packages (rpm/deb) can be roughly divided into 3 categories:
INFRA Packages such as Prometheus & Grafana stacks and monitoring agents, which are OS distro/version independent.
PGSQL Packages such as pgsql kernel & extensions, which are optionally bound to the Linux distro major version.
NODE Packages such as so libs, utils, deps, which are bound to the Linux distro major & minor version.
Therefore, the compatibility of offline packages depends on the OS major version and minor version because it contains all three types of packages.
Usually, offline packages can be used in an environment with the exact same OS major/minor version.
If the major version does not match, INFRA packages can usually be installed successfully, while PGSQL and NODE packages may have missing or conflicting dependencies.
If the minor version does not match, INFRA and PGSQL packages can usually be installed successfully, while NODE packages have a chance of success and a chance of failure.
For example, offline packages made on RockyLinux 8.9 have a good chance of success when installed offline on a RockyLinux 8.10 environment,
while offline packages made on Ubuntu 22.04.3 are most likely to fail on Ubuntu 22.04.4.
(Yes, the minor version in Ubuntu is the final .3 rather than .04, and the major version is 22.04 / jammy.)
If the OS minor version does not match exactly, you can use a hybrid strategy: after the bootstrap process,
remove the /www/pigsty/repo_complete marker file, so that Pigsty will re-download the missing NODE packages and related dependencies during the installation process.
This effectively solves dependency conflicts when using offline packages, without losing the benefits of offline installation.
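In practice the hybrid strategy boils down to the following sketch of the steps described above:
./bootstrap                        # extract /tmp/pkg.tgz and set up the local repo as usual
rm -f /www/pigsty/repo_complete    # drop the marker so missing NODE packages are re-downloaded from upstream
./configure && ./install.yml       # proceed with the normal installation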
Download Pre-made Package
Pigsty does not offer pre-made offline packages for download starting from Pigsty v3.0.0.
It will use online installation by default, since it can download the exact NODE packages from the official repo to avoid dependency conflicts.
Besides, there is too much maintenance overhead in keeping offline packages for so many OS distro / major / minor version combinations.
BUT we do offer pre-made offline packages for the following precise OS versions, which include Docker and pljava/jdbc_fw components, ready to use at a fair price.
RockyLinux 8.9
RockyLinux 9.3
Ubuntu 22.04.3
Debian 12.4
All the integration tests of Pigsty are based on the pre-made offline package snapshot before release,
so using these packages can effectively reduce the delivery risk caused by upstream dependency changes and save you trouble and waiting time.
It also shows your support for the open-source cause; at a fair price of $99, please contact @Vonng (rh@vonng.com) to get the download link.
We use offline packages to deliver our pro version, which is precisely matched to your specific OS distro major/minor version and has been tested after integration.
Offline Package
Pigsty offers an offline installation feature, allowing you to complete installation and deployment in an environment without Internet access.
If you have internet access, downloading the pre-made Offline Package in advance can help speed up the installation process and enhance the certainty and reliability of the installation.
Pigsty will no longer provide offline software packages for download starting from v3.0.0
You can make your own with the bin/cache script after the standard installation process.
Pigsty Pro offers pre-made offline packages for various OS distros.
Bootstrap
Pigsty needs Ansible to run its playbooks, so Ansible itself cannot be installed through a playbook.
The Bootstrap script is used to solve this problem: it will try its best to ensure that Ansible is installed on the node before the real installation.
./bootstrap    # make sure ansible is installed (if an offline package is available, set it up and use offline install)
If you are using an offline package, the Bootstrap script will automatically recognize and process the offline package located at /tmp/pkg.tgz, and install Ansible from it if applicable.
Otherwise, if you have Internet access, Bootstrap will automatically add the upstream yum/apt repo for the corresponding OS/region and install Ansible from it.
If neither the Internet nor an offline package is available, Bootstrap will leave it to the user to handle this issue, and the user needs to ensure that the repo configured on the node contains a usable Ansible.
There are some optional parameters for the Bootstrap script: you can use the -p|--path parameter to specify a different offline package location other than /tmp/pkg.tgz, or designate a region with the -r|--region parameter:
./bootstrap
    [-r|--region <region>]   # default,china,europe
    [-p|--path <path>]       # specify another offline pkg path
    [-k|--keep]              # keep existing upstream repo during bootstrap
Bootstrap will automatically back up and remove the node's current repo files (to /etc/yum.repos.d/backup or /etc/apt/backup) during the process to avoid software source conflicts. If this is not the behavior you expect, or you have already configured a local software repo, you can use the -k|--keep parameter to keep the existing software repos.
Example: Use offline package (EL8)
Bootstrap with offline package on a RockyLinux 8.9 node:
[vagrant@el8 pigsty]$ ls -alh /tmp/pkg.tgz
-rw-r--r--. 1 vagrant vagrant 1.4G Sep 1 10:20 /tmp/pkg.tgz
[vagrant@el8 pigsty]$ ./bootstrap
bootstrap pigsty v3.0.4 begin
[ OK ] region  = china
[ OK ] kernel  = Linux
[ OK ] machine = x86_64
[ OK ] package = rpm,dnf
[ OK ] vendor  = rocky (Rocky Linux)
[ OK ] version = 8 (8.9)
[ OK ] sudo    = vagrant ok
[ OK ] ssh     = vagrant@127.0.0.1 ok
[ OK ] cache   = /tmp/pkg.tgz exists
[ OK ] repo    = extract from /tmp/pkg.tgz
[WARN] old repos= moved to /etc/yum.repos.d/backup
[ OK ] repo file= use /etc/yum.repos.d/pigsty-local.repo
[WARN] rpm cache = updating, may take a while
pigsty local8 - x86_64                          49 MB/s | 1.3 MB     00:00
Metadata cache created.
[ OK ] repo cache= created
[ OK ] install el8 utils
........ yum install output
Installed:
createrepo_c-0.17.7-6.el8.x86_64 createrepo_c-libs-0.17.7-6.el8.x86_64 drpm-0.4.1-3.el8.x86_64 modulemd-tools-0.7-8.el8.noarch python3-createrepo_c-0.17.7-6.el8.x86_64
python3-libmodulemd-2.13.0-1.el8.x86_64 python3-pyyaml-3.12-12.el8.x86_64 sshpass-1.09-4.el8.x86_64 unzip-6.0-46.el8.x86_64
ansible-9.2.0-1.el8.noarch ansible-core-2.16.3-2.el8.x86_64 git-core-2.43.5-1.el8_10.x86_64 mpdecimal-2.5.1-3.el8.x86_64
python3-cffi-1.11.5-6.el8.x86_64 python3-cryptography-3.2.1-7.el8_9.x86_64 python3-jmespath-0.9.0-11.el8.noarch python3-pycparser-2.14-14.el8.noarch
python3.12-3.12.3-2.el8_10.x86_64 python3.12-cffi-1.16.0-2.el8.x86_64 python3.12-cryptography-41.0.7-1.el8.x86_64 python3.12-jmespath-1.0.1-1.el8.noarch
python3.12-libs-3.12.3-2.el8_10.x86_64 python3.12-pip-wheel-23.2.1-4.el8.noarch python3.12-ply-3.11-2.el8.noarch python3.12-pycparser-2.20-2.el8.noarch
python3.12-pyyaml-6.0.1-2.el8.x86_64
Complete!
[ OK ] ansible = ansible [core 2.16.3]
[ OK ] boostrap pigsty complete
proceed with ./configure
Example: Bootstrap from Internet without offline Package (Debian 12)
On a Debian 12 node with Internet access, Pigsty adds the upstream repo and installs ansible and its dependencies using apt:
vagrant@d12:~/pigsty$ ./bootstrap
bootstrap pigsty v3.0.4 begin
[ OK ] region  = china
[ OK ] kernel  = Linux
[ OK ] machine = x86_64
[ OK ] package = deb,apt
[ OK ] vendor  = debian (Debian GNU/Linux)
[ OK ] version = 12 (12)
[ OK ] sudo    = vagrant ok
[ OK ]ssh= vagrant@127.0.0.1 ok
[WARN] old repos = moved to /etc/apt/backup
[ OK ] repo file = add debian bookworm china upstream
[WARN] apt cache = updating, may take a while
....... apt install output
[ OK ] ansible = ansible [core 2.14.16]
[ OK ] boostrap pigsty complete
proceed with ./configure
Example: Bootstrap from the Default (Ubuntu 20.04)
On an Ubuntu 20.04 node without Internet access or an offline package, Pigsty will assume you have already resolved this issue in your own way:
Such as a local software repo / mirror / CDROM / intranet repo, etc…
You can explicitly keep the current repo config with the -k parameter, or Pigsty will keep it by default if it detects no internet access and no offline package.
vagrant@ubuntu20:~/pigsty$ ./bootstrap -k
bootstrap pigsty v3.0.4 begin
[ OK ] region  = china
[ OK ] kernel  = Linux
[ OK ] machine = x86_64
[ OK ] package = deb,apt
[ OK ] vendor  = ubuntu (Ubuntu)
[ OK ] version = 20 (20.04)
[ OK ] sudo    = vagrant ok
[WARN] ubuntu 20 focal does not have corresponding offline package, use online install
[WARN] cache   = missing and skip download
[WARN] repo    = skip (/tmp/pkg.tgz not exists)
[ OK ] repo file = add ubuntu focal china upstream
[WARN] apt cache = updating, may take a while
...(apt update/install output)
[ OK ] ansible = ansible 2.10.8
[ OK ] boostrap pigsty complete
proceed with ./configure
1.3 - Minimal Install
How to perform minimal install with HA PostgreSQL related components only
Pigsty has an entire infrastructure stack as an enclosure of HA PostgreSQL clusters, BUT it is viable to install only the PostgreSQL components without the rest of the stack. This is called a minimal installation.
Overview
The minimal installation focuses on a pure HA PostgreSQL cluster, and it only installs the essential components for this purpose.
There are no INFRA modules, no monitoring, and no local repo: just part of the NODE module, along with the ETCD & PGSQL modules.
Systemd services installed in this mode:
patroni: REQUIRED, bootstrap HA PostgreSQL cluster
etcd: REQUIRED, DCS for patroni
pgbouncer: OPTIONAL, connection pooler for postgres
vip-manager: OPTIONAL, if you want an L2 VIP bound to the primary
haproxy: OPTIONAL, if you wish to have auto-routed services
chronyd: OPTIONAL, if you wish to sync time with an NTP server
tuned: OPTIONAL, manage node tuning templates and kernel parameters
You can turn off the optional components; the only two essential components are patroni and etcd.
Configure
To perform a minimal installation, you need to disable some switches in the pigsty.yml config file:
all:
  children:
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } }, vars: { docker_enabled: true } }

    etcd:
      hosts:
        10.10.10.10: { etcd_seq: 1 }
        #10.10.10.11: { etcd_seq: 2 } # optional
        #10.10.10.12: { etcd_seq: 3 } # optional
      vars: { etcd_cluster: etcd }

    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary } # init one single-node pgsql cluster by default, with:
        #10.10.10.11: { pg_seq: 2, pg_role: replica } # optional replica : bin/pgsql-add pg-meta 10.10.10.11
        #10.10.10.12: { pg_seq: 3, pg_role: replica } # optional replica : bin/pgsql-add pg-meta 10.10.10.12
      vars:
        pg_cluster: pg-meta
        pg_users:                  # define business users here: https://pigsty.io/docs/pgsql/user/
          - { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] ,comment: pigsty default user }
        pg_databases:              # define business databases here: https://pigsty.io/docs/pgsql/db/
          - { name: meta ,comment: pigsty default database }
        pg_hba_rules:              # define HBA rules here: https://pigsty.io/docs/pgsql/hba/#define-hba
          - { user: dbuser_meta , db: all ,addr: world ,auth: pwd ,title: 'allow default user world access with password (not a good idea!)' }
        node_crontab:              # define backup policy with crontab (full|diff|incr)
          - '00 01 * * * postgres /pg/bin/pg-backup full'
        #pg_vip_address: 10.10.10.2/24   # optional l2 vip address and netmask
        pg_extensions:             # define pg extensions (340 available): https://pigsty.io/docs/pgext/
          - postgis timescaledb pgvector

  vars:
    version: v3.0.4               # pigsty version string
    admin_ip: 10.10.10.10         # admin node ip address
    region: default               # upstream mirror region: default|china|europe
    node_tune: tiny               # use tiny template for NODE in demo environment
    pg_conf: tiny.yml             # use tiny template for PGSQL in demo environment

    # minimal installation setup
    node_repo_modules: node,infra,pgsql
    nginx_enabled: false
    dns_enabled: false
    prometheus_enabled: false
    grafana_enabled: false
    pg_exporter_enabled: false
    pgbouncer_exporter_enabled: false
    pg_vip_enabled: false
A newer config template (Pigsty v3.1.0) for the same slim setup, with the repo package lists spelled out, looks like this:
all:
  children:
    # actually not used
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }

    # etcd cluster for ha postgres, still required in minimal installation
    etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

    # postgres cluster 'pg-meta' with 2 instances
    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary }
        10.10.10.11: { pg_seq: 2, pg_role: replica }
      vars:
        pg_cluster: pg-meta
        pg_databases: [ { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [ pigsty ] ,extensions: [ { name: vector } ] } ]
        pg_users:
          - { name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [ dbrole_admin ]    ,comment: pigsty admin user }
          - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [ dbrole_readonly ] ,comment: read-only viewer for meta database }
        node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ]   # make a full backup every 1am

  vars:                             # global parameters
    version: v3.1.0                 # pigsty version string
    admin_ip: 10.10.10.10           # admin node ip address
    region: default                 # upstream mirror region: default,china,europe
    node_tune: oltp                 # node tuning specs: oltp,olap,tiny,crit
    pg_conf: oltp.yml               # pgsql tuning specs: {oltp,olap,tiny,crit}.yml

    # slim installation setup
    nginx_enabled: false            # nginx not exists
    dns_enabled: false              # dnsmasq not exists
    prometheus_enabled: false       # prometheus not exists
    grafana_enabled: false          # grafana not exists
    pg_exporter_enabled: false      # disable pg_exporter
    pgbouncer_exporter_enabled: false
    pg_vip_enabled: false

    #----------------------------------#
    # Repo, Node, Packages
    #----------------------------------#
    node_repo_modules: node,infra,pgsql   # use node_repo_modules instead of repo_modules
    node_repo_remove: true                # remove existing node repo for node managed by pigsty?
    repo_packages: [                      # default packages to be downloaded (if `repo_packages` is not explicitly set)
      node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-common #,docker
    ]
    repo_extra_packages: [                # default postgres packages to be downloaded
      pgsql-main
      #,pgsql-core,pgsql-time,pgsql-gis,pgsql-rag,pgsql-fts,pgsql-olap,pgsql-feat,pgsql-lang,pgsql-type,pgsql-func,pgsql-admin,pgsql-stat,pgsql-sec,pgsql-fdw,pgsql-sim,pgsql-etl
      #,pg17-core,pg17-time,pg17-gis,pg17-rag,pg17-fts,pg17-olap,pg17-feat,pg17-lang,pg17-type,pg17-func,pg17-admin,pg17-stat,pg17-sec,pg17-fdw,pg17-sim,pg17-etl
      #,pg16-core,pg16-time,pg16-gis,pg16-rag,pg16-fts,pg16-olap,pg16-feat,pg16-lang,pg16-type,pg16-func,pg16-admin,pg16-stat,pg16-sec,pg16-fdw,pg16-sim,pg16-etl
    ]
Describe database and infrastructure as code using declarative Configuration
Pigsty treats Infra & Database as Code. You can describe the infrastructure & database clusters through a declarative interface. All your essential work is to describe your need in the inventory, then materialize it with a simple idempotent playbook.
Inventory
Each pigsty deployment has a corresponding config inventory. It could be stored in a local git-managed file in YAML format or dynamically generated from CMDB or any ansible compatible format. Pigsty uses a monolith YAML config file as the default config inventory, which is pigsty.yml, located in the pigsty home directory.
The inventory consists of two parts: global vars & multiple group definitions. You can define new clusters with inventory groups: all.children. And describe infra and set global default parameters for clusters with global vars: all.vars. Which may look like this:
all:                    # Top-level object: all
  vars: {...}           # Global Parameters
  children:             # Group Definitions
    infra:              # Group Definition: 'infra'
      hosts: {...}      # Group Membership: 'infra'
      vars:  {...}      # Group Parameters: 'infra'
    etcd:    {...}      # Group Definition: 'etcd'
    pg-meta: {...}      # Group Definition: 'pg-meta'
    pg-test: {...}      # Group Definition: 'pg-test'
    redis-test: {...}   # Group Definition: 'redis-test'
    # ...
Each group may represent a cluster, which could be a Node cluster, PostgreSQL cluster, Redis cluster, Etcd cluster, or Minio cluster, etc. They all use the same format: group vars & hosts. You can define cluster members with all.children.<cls>.hosts and describe the cluster with cluster parameters in all.children.<cls>.vars. Here is an example of a 3-node PostgreSQL HA cluster named pg-test:
pg-test:                                          # Group Name
  vars:                                           # Group Vars (Cluster Parameters)
    pg_cluster: pg-test
  hosts:                                          # Group Host (Cluster Membership)
    10.10.10.11: { pg_seq: 1, pg_role: primary }  # Host1
    10.10.10.12: { pg_seq: 2, pg_role: replica }  # Host2
    10.10.10.13: { pg_seq: 3, pg_role: offline }  # Host3
You can also define parameters for a specific host, also known as host vars. They override group vars and global vars, and are usually used for assigning identities to nodes & database instances.
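For example, besides the identity parameters shown above, any ordinary parameter can also be pinned at the host level; in this sketch the host-level node_tune overrides the group-level value for one node only (the values are illustrative):
pg-test:
  vars:  { pg_cluster: pg-test, node_tune: olap }                   # group-level default for the whole cluster
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica, node_tune: tiny }   # host-level override for this node only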
Parameter
Global vars, Group vars, and Host vars are dict objects consisting of a series of K-V pairs. Each pair is a named Parameter consisting of a string name as the key and a value of one of five types: boolean, string, number, array, or object. Check parameter reference for detailed syntax & semantics.
Every parameter has a proper default value except for mandatory IDENTITY PARAMETERS; they are used as identifiers and must be set explicitly, such as pg_cluster, pg_role, and pg_seq.
Parameters can be specified & overridden with the following precedence.
Playbook Args > Host Vars > Group Vars > Global Vars > Defaults
For example:
Force removing existing databases with Playbook CLI Args -e pg_clean=true
Override an instance role with Instance Level Parameter pg_role on Host Vars
Override a cluster name with Cluster Level Parameter pg_cluster on Group Vars.
Specify global NTP servers with Global Parameter node_ntp_servers on Global Vars
If no pg_version is set, the default value from the role implementation will be used (16 by default).
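A small sketch of the precedence rules using pg_version: the group value wins over the global one, and a CLI extra var would beat both:
all:
  vars: { pg_version: 16 }            # global default
  children:
    pg-test:
      vars: { pg_version: 17 }        # group vars override global vars for this cluster
# ./pgsql.yml -l pg-test -e pg_version=16   # playbook args have the highest precedence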
Template
There are numerous preset config templates for different scenarios under the conf/ directory.
During the configure process, you can specify a template with the -c parameter.
Otherwise, the single-node installation config template will be selected automatically based on your OS distribution.
Details about these built-in configuration files can be found @ Configuration
Switch Config Inventory
To use a different config inventory, you can copy & paste the content into the pigsty.yml file in the home dir as needed.
You can also explicitly specify the config inventory file to use when executing Ansible playbooks by using the -i command-line parameter, for example:
./node.yml -i files/pigsty/rpmbuild.yml # use another file as config inventory, rather than the default pigsty.yml
If you want to modify the default config inventory filename, you can change the inventory parameter in the ansible.cfg file in the home dir to point to your own inventory file path.
This allows you to run the ansible-playbook command without explicitly specifying the -i parameter.
Pigsty allows you to use a database (CMDB) as a dynamic configuration source instead of a static configuration file. Pigsty provides three convenient scripts:
bin/inventory_load: Loads the content of the pigsty.yml into the local PostgreSQL database (meta.pigsty)
bin/inventory_cmdb: Switches the configuration source to the local PostgreSQL database (meta.pigsty)
bin/inventory_conf: Switches the configuration source to the local static configuration file pigsty.yml
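A typical round trip between the two config sources might look like this (a sketch using the scripts listed above with their default arguments):
bin/inventory_load    # load the current pigsty.yml into the meta.pigsty CMDB
bin/inventory_cmdb    # point ansible at the dynamic CMDB inventory
bin/inventory_conf    # switch back to the static pigsty.yml file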
Reference
Pigsty has 280+ parameters, check Parameter for details.
How to prepare the nodes, network, OS distros, admin user, ports, and permissions for Pigsty.
Node
Pigsty supports the Linux kernel and x86_64/aarch64 arch, applicable to any node.
A “node” refers to a resource that is SSH accessible and offers a bare OS environment, such as a physical machine, a virtual machine, or an OS container equipped with systemd and sshd.
Deploying Pigsty requires at least 1 node. The minimum spec requirement is 1C1G, but it is recommended to use at least 2C4G, with no upper limit: parameters will automatically optimize and adapt.
For demos, personal sites, devbox, or standalone monitoring infra, 1-2 nodes are recommended, while at least 3 nodes are suggested for an HA PostgreSQL cluster. For critical scenarios, 4-5 nodes are advisable.
Leverage IaC Tools for chores
Managing a large-scale prod env could be tedious and error-prone. We recommend using Infrastructure as Code (IaC) tools to address these issues.
You can use the Terraform and Vagrant templates provided by Pigsty,
to create the required node environment with just one command through IaC, provisioning network, OS image, admin user, privileges, etc…
Network
Pigsty requires nodes to use static IPv4 addresses, which means you should explicitly assign your nodes a specific fixed IP address rather than using DHCP-assigned addresses.
The IP address used by a node should be the primary IP address for internal network communications and will serve as the node’s unique identifier.
If you wish to use the optional Node VIP and PG VIP features, ensure all nodes are located within an L2 network.
Your firewall policy should ensure the required ports are open between nodes. For a detailed list of ports required by different modules, refer to Node: Ports.
Which Ports Should Be Exposed?
For beginners or those who are just trying it out, you can just open ports 5432 (PostgreSQL database) and 3000 (Grafana visualization interface) to the world.
For a serious prod env, you should only expose the necessary ports to the exterior, such as 80/443 for web services, open to the office network (or the entire Internet).
Exposing database service ports directly to the Internet is not advisable. If you need to do this, consider consulting Security Best Practices and proceed cautiously.
The method for exposing ports depends on your network implementation, such as security group policies, local iptables records, firewall configurations, etc.
Operating System
Pigsty supports various Linux OS. We recommend using RockyLinux 8.9 or Ubuntu 22.04.3 as the default OS for installing Pigsty.
Pigsty supports RHEL (7,8,9), Debian (11,12), Ubuntu (20,22), and many other compatible OS distros. Check Compatibility For a complete list of compatible OS distros.
When deploying on multiple nodes, we strongly recommend using the same version of the OS distro and the Linux kernel on all nodes.
We strongly recommend using a clean, minimally installed OS environment with en_US set as the primary language.
How to enable the en_US locale?
To ensure the en_US locale is available when using another primary language:
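One way to generate it is sketched below; the exact command may vary across distros:
sudo localedef -i en_US -f UTF-8 en_US.UTF-8   # generate the en_US.UTF-8 locale definition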
Note: The PostgreSQL cluster deployed by Pigsty defaults to the C.UTF8 locale, but character set definitions use en_US to ensure the pg_trgm extension functions properly.
If you do not need this feature, you can configure the value of pg_lc_ctype to C.UTF8 to avoid this issue when en locale is missing.
Admin User
You'll need an "admin user" on all nodes where Pigsty is meant to be deployed: an OS user with nopass ssh login and nopass sudo privileges.
Passwordless sudo is required to execute commands during the installation process, such as installing packages and configuring system settings.
How to configure nopass sudo for admin user?
Assuming your admin username is vagrant, you can create a file in /etc/sudoers.d/vagrant and add the following content:
%vagrant ALL=(ALL) NOPASSWD: ALL
This will allow the vagrant user to execute all commands without a sudo password. If your username is not vagrant, replace vagrant in the above steps with your username.
Avoid using the root user
While it is possible to install Pigsty using the root user, we do not recommend it.
We recommend using a dedicated admin user, such as dba, different from the root user (root) and the database superuser (postgres).
There is a dedicated playbook subtask that can use an existing admin user (e.g., root) with ssh/sudo password input to create a dedicated admin user.
SSH Permission
In addition to nopass sudo privilege, Pigsty also requires the admin user to have nopass ssh login privilege (login via ssh key).
For a single-host installation, this means the admin user on the local node should be able to log in to the host itself via ssh without a password.
If your Pigsty deployment involves multiple nodes, this means the admin user on the admin node should be able to log in to all nodes managed by Pigsty (including the local node) via ssh without a password, and execute sudo commands without a password as well.
During the configure procedure, if your current admin user does not have any SSH key, it will attempt to address this issue by generating a new id_rsa key pair and adding it to the local ~/.ssh/authorized_keys file to ensure local SSH login capability for the local admin user.
By default, Pigsty creates an admin user dba (uid=88) on all managed nodes. If you are already using this user, we recommend that you change the node_admin_username to a new username with a different uid, or disable it using the node_admin_enabled parameter.
How to configure nopass SSH login for admin user?
Assuming your admin username is vagrant, executing the following command as the vagrant user will generate a public/private key pair for login. If a key pair already exists, there is no need to generate a new one.
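For example, using the same key parameters referenced elsewhere in this document:
ssh-keygen -t rsa -b 2048   # generate an rsa key pair under ~/.ssh/ (press enter to accept the defaults)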
The generated public key is by default located at: /home/vagrant/.ssh/id_rsa.pub, and the private key at: /home/vagrant/.ssh/id_rsa. If your OS username is not vagrant, replace vagrant in the above commands with your username.
You should append the public key file (id_rsa.pub) to the authorized_keys file of the user you need to log into: /home/vagrant/.ssh/authorized_keys. If you already have password access to the remote machine, you can use ssh-copy-id to copy the public key:
ssh-copy-id <ip>                        # enter the password to complete public key copying
sshpass -p <password> ssh-copy-id <ip>  # or: embed the password directly in the command to avoid interactive password entry (cautious!)
Pigsty recommends creating the admin user during node provisioning so that it is available out of the box.
SSH Accessibility
If your environment has some restrictions on SSH access, such as a bastion server or ad hoc firewall rules that prevent simple SSH access via ssh <ip>, consider using SSH aliases.
For example, if there’s a node with IP 10.10.10.10 that can not be accessed directly via ssh but can be accessed via an ssh alias meta defined in ~/.ssh/config,
then you can configure the ansible_host parameter for that node in the inventory to specify the SSH Alias on the host level:
nodes:
  hosts:
    # 10.10.10.10 can not be accessed directly via ssh, but can be accessed via the ssh alias 'meta'
    10.10.10.10: { ansible_host: meta }
If the ssh alias does not meet your requirement, there are a plethora of custom ssh connection parameters that can bring fine-grained control over SSH connection behavior.
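For example, standard Ansible connection variables can be set per host; the values below are purely illustrative:
nodes:
  hosts:
    10.10.10.10:
      ansible_host: meta                                # ssh alias or reachable hostname
      ansible_port: 24022                               # hypothetical non-default ssh port
      ansible_user: dba                                 # ssh login user
      ansible_ssh_private_key_file: ~/.ssh/pigsty_key   # hypothetical dedicated private key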
If the following cmd can be successfully executed on the admin node by the admin user, it means that the target node’s admin user is properly configured.
ssh <ip|alias> 'sudo ls'
Software
On the admin node, Pigsty requires ansible to initiate control.
If you are using the singleton meta installation, Ansible is required on this node. It is not required for common nodes.
The bootstrap procedure will make every effort to do this for you.
But you can always choose to install Ansible manually. The manual installation process varies across OS distros / major versions (it usually involves an additional weak dependency, jmespath):
sudo dnf install -y ansible python3.12-jmespath   # EL8 / EL9
sudo yum install -y ansible                       # EL7 does not need to install jmespath explicitly
sudo apt install -y ansible python3-jmespath      # Debian / Ubuntu
brew install ansible                              # macOS
To install Pigsty, you also need to prepare the Pigsty source package. You can directly download a specific version from the GitHub Release page or use the following command to obtain the latest stable version:
curl -fsSL https://repo.pigsty.io/get | bash
If your env does not have Internet access, consider using the offline packages, which are pre-packed for different OS distros, and can be downloaded from the GitHub Release page.
1.7 - Playbooks
Pigsty implements module controllers with idempotent Ansible playbooks. Here is some essential information you need to know about them.
Playbooks are used in Pigsty to install modules on nodes.
To run playbooks, just treat them as executables. e.g. run with ./install.yml.
Note that there’s a circular dependency between NODE and INFRA:
to register a NODE to INFRA, the INFRA should already exist, while the INFRA module relies on NODE to work.
The solution is that the INFRA playbook will also install the NODE module on infra nodes, in addition to the INFRA module.
Make sure that infra nodes are initialized first. If you really want to init all nodes, including infra nodes, in one pass, install.yml is the way to go.
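In a multi-stage rollout this ordering typically looks like the following sketch using the standard playbooks:
./infra.yml -l infra      # init infra nodes first (installs the NODE module on them as well)
./node.yml  -l pg-test    # then bring additional nodes under management, registering them to the existing infra
./pgsql.yml -l pg-test    # and deploy database clusters on those nodes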
Ansible
Playbooks require the ansible-playbook executable to run, which is included in the ansible rpm / deb package.
Pigsty will try its best to install ansible on the admin node during bootstrap.
You can also install it yourself with yum|apt|brew install ansible; it is included in the default OS repos.
Knowledge about ansible is good but not required. Only four parameters need your attention:
-l|--limit <pattern> : Limit execution target on specific group/host/pattern (Where)
-t|--tags <tags>: Only run tasks with specific tags (What)
-e|--extra-vars <vars>: Extra command line arguments (How)
-i|--inventory <path>: Using another inventory file (Conf)
Designate Inventory
To use a different config inventory, you can copy & paste the content into the pigsty.yml file in the home dir as needed.
The active inventory file can be specified with the -i|--inventory <path> parameter when running Ansible playbooks.
./node.yml  -i files/pigsty/rpmbuild.yml   # use another file as config inventory, rather than the default pigsty.yml
./pgsql.yml -i files/pigsty/rpm.yml        # install pgsql module on machines defined in files/pigsty/rpm.yml
./redis.yml -i files/pigsty/redis.yml      # install redis module on machines defined in files/pigsty/redis.yml
If you wish to permanently modify the default config inventory filename, you can change the inventory parameter in the ansible.cfg
Limit Host
The target of a playbook can be limited with -l|--limit <selector>.
Omitting this value can be dangerous, since most playbooks will execute on all hosts. DO USE WITH CAUTION.
Here are some examples of host limit:
./pgsql.yml                              # run on all hosts (very dangerous!)
./pgsql.yml -l pg-test                   # run on the pg-test cluster
./pgsql.yml -l 10.10.10.10               # run on the single host 10.10.10.10
./pgsql.yml -l pg-*                      # run on hosts/groups matching the glob pattern `pg-*`
./pgsql.yml -l '10.10.10.11,&pg-test'    # run on 10.10.10.11 of group pg-test
./pgsql-rm.yml -l 'pg-test,!10.10.10.11' # run on pg-test, except 10.10.10.11
Limit Tags
You can execute a subset of playbook with -t|--tags <tags>.
You can specify multiple tags in comma separated list, e.g. -t tag1,tag2.
If specified, tasks with given tags will be executed instead of entire playbook.
Here are some examples of task limit:
./pgsql.yml -t pg_clean      # cleanup existing postgres if necessary
./pgsql.yml -t pg_dbsu       # setup os user sudo for postgres dbsu
./pgsql.yml -t pg_install    # install postgres packages & extensions
./pgsql.yml -t pg_dir        # create postgres directories and setup fhs
./pgsql.yml -t pg_util       # copy utils scripts, setup alias and env
./pgsql.yml -t patroni       # bootstrap postgres with patroni
./pgsql.yml -t pg_user       # provision postgres business users
./pgsql.yml -t pg_db         # provision postgres business databases
./pgsql.yml -t pg_backup     # init pgbackrest repo & basebackup
./pgsql.yml -t pgbouncer     # deploy a pgbouncer sidecar with postgres
./pgsql.yml -t pg_vip        # bind vip to pgsql primary with vip-manager
./pgsql.yml -t pg_dns        # register dns name to infra dnsmasq
./pgsql.yml -t pg_service    # expose pgsql service with haproxy
./pgsql.yml -t pg_exporter   # expose pgsql metrics with pg_exporter
./pgsql.yml -t pg_register   # register postgres to pigsty infrastructure

# run multiple tasks: reload postgres & pgbouncer hba rules
./pgsql.yml -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload

# run multiple tasks: refresh haproxy config & reload it
./node.yml -t haproxy_config,haproxy_reload
Extra Vars
Extra command-line args can be passed via -e|--extra-vars KEY=VALUE.
They have the highest precedence over all other definitions.
Here are some examples of extra vars
./node.yml  -e ansible_user=admin -k -K                  # run playbook as another user (with admin sudo password)
./pgsql.yml -e pg_clean=true                             # force purging existing postgres when init a pgsql instance
./pgsql-rm.yml -e pg_uninstall=true                      # explicitly uninstall rpm after postgres instance is removed
./redis.yml -l 10.10.10.10 -e redis_port=6379 -t redis   # init a specific redis instance: 10.10.10.10:6379
./redis-rm.yml -l 10.10.10.13 -e redis_port=6379         # remove a specific redis instance: 10.10.10.13:6379
You can also pass complex parameters like arrays and objects via JSON:
# install duckdb packages on node with specified upstream repo module./node.yml -t node_repo,node_pkg -e '{"node_repo_modules":"infra","node_default_packages":["duckdb"]}'
Most playbooks are idempotent, but some deployment playbooks may erase existing databases and create new ones when the protection options are turned off.
Please read the documentation carefully, proofread the commands several times, and operate with caution. The author is not responsible for any loss of databases due to misuse.
1.8 - Provisioning
Introduce the 4-node sandbox environment, and how to provision VMs with vagrant & terraform.
Pigsty runs on nodes, which are bare metal or virtual machines. You can prepare them manually, or use terraform & vagrant for provisioning.
Sandbox
Pigsty has a sandbox, which is a 4-node deployment with fixed IP addresses and other identifiers.
Check full.yml for details.
The sandbox consists of 4 nodes with fixed IP addresses: 10.10.10.10, 10.10.10.11, 10.10.10.12, 10.10.10.13.
There’s a primary singleton PostgreSQL cluster: pg-meta on the meta node, which can be used alone if you don’t care about PostgreSQL high availability.
meta 10.10.10.10 pg-meta pg-meta-1
There are 3 additional nodes in the sandbox, forming a 3-instance PostgreSQL HA cluster pg-test.
node-1   10.10.10.11   pg-test   pg-test-1
node-2   10.10.10.12   pg-test   pg-test-2
node-3   10.10.10.13   pg-test   pg-test-3
Two optional L2 VIPs are bound to the primary instances of clusters pg-meta and pg-test:
10.10.10.2 pg-meta
10.10.10.3 pg-test
There’s also a 1-instance etcd cluster, and 1-instance minio cluster on the meta node, too.
You can run sandbox on local VMs or cloud VMs. Pigsty offers a local sandbox based on Vagrant (pulling up local VMs using Virtualbox or libvirt), and a cloud sandbox based on Terraform (creating VMs using the cloud vendor API).
Local sandbox can be run on your Mac/PC for free. Your Mac/PC should have at least 4C/8G to run the full 4-node sandbox.
Cloud sandbox can be easily created and shared. You will have to create a cloud account for that. VMs are created on-demand and can be destroyed with one command, which is also very cheap for a quick glance.
Vagrant will use VirtualBox as the default VM provider; however, libvirt, docker, Parallels Desktop, and VMware can also be used. We will use VirtualBox in this guide.
Installation
Make sure Vagrant and Virtualbox are installed and available on your OS.
If you are using macOS, You can use homebrew to install both of them with one command (reboot required). You can also use vagrant-libvirt on Linux.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"   # install homebrew
brew install vagrant virtualbox ansible   # run on macOS with one command, but only works on x86_64 Intel chips
Configuration
vagrant/Vagrantfile is a ruby script file describing VM nodes. Here are some default specs of Pigsty.
You can switch specs with the vagrant/switch script, it will render the final Vagrantfile according to the spec.
cd ~/pigsty
vagrant/switch <spec>
vagrant/switch meta    # singleton meta        | alias: `make v1`
vagrant/switch full    # 4-node sandbox        | alias: `make v4`
vagrant/switch el7     # 3-node el7 test       | alias: `make v7`
vagrant/switch el8     # 3-node el8 test       | alias: `make v8`
vagrant/switch el9     # 3-node el9 test       | alias: `make v9`
vagrant/switch prod    # prod simulation       | alias: `make vp`
vagrant/switch build   # building environment  | alias: `make vd`
vagrant/switch minio   # 3-node minio env
vagrant/switch check   # 30-node check env
Management
After describing the VM nodes with specs and generating the vagrant/Vagrantfile, you can create the VMs with the vagrant up command.
Pigsty templates will use your ~/.ssh/id_rsa[.pub] as the default ssh key for vagrant provisioning.
Make sure you have a valid ssh key pair before you start, you can generate one by: ssh-keygen -t rsa -b 2048
There are some makefile shortcuts that wrap the vagrant commands, you can use them to manage the VMs.
make          # = make start
make new      # destroy existing vm and create new ones
make ssh      # write VM ssh config to ~/.ssh/     (required)
make dns      # write VM DNS records to /etc/hosts (optional)
make start    # launch VMs and write ssh config    (up + ssh)
make up       # launch VMs with vagrant up
make halt     # shutdown VMs (down,dw)
make clean    # destroy VMs  (clean/del/destroy)
make status   # show VM status (st)
make pause    # pause VMs    (suspend,pause)
make resume   # resume VMs   (resume)
make nuke     # destroy all vm & volumes with virsh (if using libvirt)

make meta  install   # create and install pigsty on 1-node singleton meta
make full  install   # create and install pigsty on 4-node sandbox
make prod  install   # create and install pigsty on 42-node KVM libvirt environment
make check install   # create and install pigsty on 30-node testing & validating environment
...
Terraform
Terraform is an open-source tool to practice ‘Infra as Code’. Describe the cloud resource you want and create them with one command.
Pigsty has terraform templates for AWS, Aliyun, and Tencent Cloud, you can use them to create VMs on the cloud for Pigsty Demo.
Terraform can be easily installed with homebrew, too: brew install terraform. You will have to create a cloud account to obtain AccessKey and AccessSecret credentials to proceed.
The terraform/ dir has two example templates: one for AWS and one for Aliyun. You can adjust them to fit your needs, or modify them if you are using a different cloud vendor.
Take Aliyun as example:
cd terraform                     # goto the terraform dir
cp spec/aliyun.tf terraform.tf   # use the aliyun template
You have to perform terraform init before terraform apply:
terraform init    # install terraform provider: aliyun (required only for the first time)
terraform apply   # generate execution plans: create VMs, virtual segments/switches/security groups
After running apply and answering yes to the prompt, Terraform will create the VMs and configure the network for you.
The admin node ip address will be printed out at the end of the execution, you can ssh login and start pigsty installation.
1.9 - Security
Security considerations and best-practices in Pigsty
Pigsty already provides a secure-by-default authentication and access control model, which is sufficient for most scenarios.
But if you want to further strengthen the security of the system, the following suggestions are for your reference:
Confidentiality
Important Files
Secure your pigsty config inventory
pigsty.yml has highly sensitive information, including passwords, certificates, and keys.
You should limit access to admin/infra nodes so that they are accessible only by the admin/dba users.
Limit access to the git repo, if you are using git to manage your pigsty source.
Secure your CA private key and other certs
These files are very important and will be generated under files/pki in the pigsty source dir by default.
You should secure them and back them up to a safe place periodically.
Passwords
Always change these passwords, DO NOT USE THE DEFAULT VALUES:
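The exact parameter list depends on your Pigsty version; commonly cited examples include the following (check your pigsty.yml and the parameter reference rather than relying on this list):
grafana_admin_password:  <changeme>   # Grafana web UI admin password
pg_admin_password:       <changeme>   # PostgreSQL admin user password
pg_monitor_password:     <changeme>   # PostgreSQL monitoring user password
pg_replication_password: <changeme>   # PostgreSQL replication user password
patroni_password:        <changeme>   # Patroni REST API password
haproxy_admin_password:  <changeme>   # HAProxy admin page password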
Frequently asked questions about download, setup, configuration, and installation in Pigsty.
If you have any unlisted questions or suggestions, please create an Issue or ask the community for help.
How to Get the Pigsty Source Package?
Use the following command to install Pigsty with one click: curl -fsSL https://repo.pigsty.io/get | bash
This command will automatically download the latest stable version pigsty.tgz and extract it to the ~/pigsty directory. You can also manually download a specific version of the Pigsty source code from the following locations.
If you need to install it in an environment without internet access, you can download it in advance in a networked environment and transfer it to the production server via scp/sftp or CDROM/USB.
How to Speed Up RPM Downloads from Upstream Repositories?
Consider using a local repository mirror, which can be configured with the repo_upstream parameter. You can choose region to use different mirror sites.
For example, you can set region = china, which will use the URL with the key china in the baseurl instead of default.
If some repositories are blocked by a firewall or the GFW, consider using proxy_env to bypass it.
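A hedged sketch of such a proxy_env setting; the address is a placeholder for your own proxy:
proxy_env:
  http_proxy:  http://127.0.0.1:12345   # placeholder proxy address
  https_proxy: http://127.0.0.1:12345
  all_proxy:   http://127.0.0.1:12345
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty"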
How to resolve node package conflict?
Beware that Pigsty's pre-built offline packages are tailored for specific minor versions of OS distros.
Therefore, if the major.minor version of your OS distro does not precisely align, we advise against using the offline installation packages.
Instead, follow the default installation procedure and download the packages directly from the upstream repo through the Internet, which will acquire the versions that exactly match your OS version.
If online installation doesn’t work for you, you can first try modifying the upstream software sources used by Pigsty.
For example, in EL family operating systems, Pigsty’s default upstream sources use a major version placeholder $releasever, which resolves to specific major versions like 7, 8, 9.
However, many operating system distributions offer a Vault, allowing you to use a package mirror for a specific version.
Therefore, you could replace the front part of the repo_upstream parameter’s BaseURL with a specific Vault minor version repository, such as:
https://dl.rockylinux.org/pub/rocky/$releasever (Original BaseURL prefix, without vault)
https://vault.centos.org/7.6.1810/ (Using 7.6 instead of the default 7.9)
https://dl.rockylinux.org/vault/rocky/8.6/ (Using 8.6 instead of the default 8.9)
https://dl.rockylinux.org/vault/rocky/9.2/ (Using 9.2 instead of the default 9.3)
Make sure the vault URL path exists and is valid before replacing the old values. Beware that some repos, like epel, do not offer specific minor version subdirs.
Upstream repo that support this approach include: base, updates, extras, centos-sclo, centos-sclo-rh, baseos, appstream, extras, crb, powertools, pgdg-common, pgdg1*
After explicitly defining and overriding repo_upstream in the Pigsty configuration file (you may need to clear the /www/pigsty/repo_complete flag), try the installation again.
If the upstream software source and the mirror source software do not solve the problem, you might consider replacing them with the operating system’s built-in software sources and attempt a direct installation from upstream once more.
Finally, if the above methods do not resolve the issue, consider removing conflicting packages from node_packages, infra_packages, pg_packages, pg_extensions, or remove or upgrade the conflicting packages on the existing system.
What does bootstrap do?
Bootstrap checks the environment, asks whether to use/download offline packages, and makes sure the essential tool ansible is installed by various means.
When you download the Pigsty source code, you can enter the directory and execute the bootstrap script.
It will check if your node environment is ready, and if it does not find offline packages, it will ask if you want to download them from the internet if applicable.
You can choose y to use offline packages, which will make the installation procedure faster.
You can also choose n to skip and download directly from the internet during the installation process,
which will download the latest software versions and reduce the chance of RPM conflicts.
What does configure do?
Detect the environment, generate the configuration, enable the offline package (optional), and install the essential tool Ansible.
After downloading the Pigsty source package and unpacking it, you may have to execute ./configure to complete the environment configuration. This is optional if you already know how to configure Pigsty properly.
The configure procedure will detect your node environment and generate a pigsty config file: pigsty.yml for you.
What is the Pigsty config file?
pigsty.yml under the pigsty home dir is the default config file.
Pigsty uses a single config file pigsty.yml, to describe the entire environment, and you can define everything there. There are many config examples in files/pigsty for your reference.
You can pass -i <path> to playbooks to use other configuration files. For example, if you want to install redis according to another config, redis.yml:
./redis.yml -i files/pigsty/redis.yml
How to use the CMDB as config inventory
The default config file path is specified in ansible.cfg: inventory = pigsty.yml
You can switch to a dynamic CMDB inventory with bin/inventory_cmdb, and switch back to the local config file with bin/inventory_conf. You must also load the current config file inventory to CMDB with bin/inventory_load.
If CMDB is used, you must edit the inventory config from the database rather than the config file.
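A typical round trip looks like this (the scripts live under the pigsty home directory, as referenced above):
bin/inventory_load    # load the current pigsty.yml config inventory into the CMDB
bin/inventory_cmdb    # switch ansible.cfg to the dynamic CMDB inventory
bin/inventory_conf    # switch back to the static local config file pigsty.yml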
What is the IP address placeholder in the config file?
Pigsty uses 10.10.10.10 as a placeholder for the current node IP, which will be replaced with the primary IP of the current node during the configuration.
When configure detects multiple NICs with multiple IPs on the current node, the wizard will prompt you for the primary IP to be used, i.e., the IP that users access the node with from the internal network. Do not use a public IP here.
This IP will be used to replace 10.10.10.10 in the config file template.
Which parameters need your attention?
Usually, in a singleton installation, there is no need to make any adjustments to the config files.
Pigsty provides 265 config parameters to customize the entire infra/node/etcd/minio/pgsql. However, there are a few parameters that can be adjusted in advance if needed:
infra_portal: the domain names used to access web service components (some services can only be accessed by domain name through the Nginx proxy).
Pigsty assumes that a /data dir exists to hold all data; you can adjust these paths if the data disk mount point differs from this.
Don’t forget to change those passwords in the config file for your production deployment.
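As a rough sketch of where these knobs live in pigsty.yml (parameter names follow current Pigsty conventions but may vary between versions; treat values as placeholders and verify against the configuration reference):
all:
  vars:
    infra_portal:                     # domains for web components served through the Nginx proxy
      home    : { domain: h.pigsty }
      grafana : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
    node_data: /data                  # data dir; adjust if your data disk is mounted elsewhere
    grafana_admin_password: pigsty    # CHANGE default credentials before production use
    pg_admin_password: DBUser.DBA
    pg_monitor_password: DBUser.Monitor
    pg_replication_password: DBUser.Replicator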
Installation
What was executed during installation?
When running make install, the ansible-playbook install.yml will be invoked to install everything on all nodes, which will:
Install INFRA module on the current node.
Install NODE module on the current node.
Install ETCD module on the current node.
The MinIO module is optional, and will not be installed by default.
Install PGSQL module on the current node.
How to resolve RPM conflict?
There is a slight chance that RPM conflicts occur during node/infra/pgsql package installation.
The simplest way to resolve this is to install without offline packages, which will download directly from the upstream repo.
If there are only a few problematic RPM/DEB packages, you can use a trick to fix the yum/apt repo quickly:
rm -rf /www/pigsty/repo_complete   # delete the repo_complete flag file to mark this repo incomplete
rm -rf SomeBrokenPackages          # delete problematic RPM/DEB packages
./infra.yml -t repo_upstream       # write upstream repos. you can also use /etc/yum.repos.d/backup/*
./infra.yml -t repo_pkg            # download rpms according to your current OS
How to create local VMs with vagrant
The first time you use Vagrant to pull up VMs with a particular OS, it will download the corresponding box image.
The Pigsty sandbox uses the generic/rocky9 box by default, and Vagrant will download it the first time the VM is started.
Using a proxy may increase the download speed. Box only needs to be downloaded once, and will be reused when recreating the sandbox.
RPMs error on Aliyun CentOS 7.9
Aliyun CentOS 7.9 servers have the DNS caching service nscd installed by default. Just remove it.
Aliyun's CentOS 7.9 image ships with nscd by default, which pins the glibc version and can cause RPM dependency errors during installation.
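A sketch of the fix, either on the node itself or across all managed nodes with an Ansible ad-hoc command:
sudo yum remove -y nscd                   # remove nscd on the current node
ansible all -b -a 'yum remove -y nscd'    # or remove it on all managed nodes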
Run various business software & apps with docker-compose templates.
Run demos & data apps, analyze data, and visualize them with ECharts panels.
Battery-Included RDS
Run production-grade RDS for PostgreSQL on your own machine in 10 minutes!
While PostgreSQL shines as a database kernel, it excels as a Relational Database Service (RDS) with Pigsty’s touch.
Pigsty is compatible with PostgreSQL 12-16 and runs seamlessly on EL 7, 8, 9, Debian 11/12, Ubuntu 20/22/24 and similar OS distributions.
It integrates the kernel with a rich set of extensions and provides all the essentials for a production-ready RDS: an entire infrastructure runtime coupled with fully automated deployment playbooks,
with everything bundled for offline installation without Internet connectivity.
You can transition from a fresh node to a production-ready state effortlessly and deploy a top-tier PostgreSQL RDS service in a mere 10 minutes.
Pigsty will tune parameters to your hardware, handling everything from kernel, extensions, pooling, load balancing, high-availability, monitoring & logging, backups & PITR, security and more!
All you need to do is run the command and connect with the given URL.
Plentiful Extensions
Harness the might of the world's most advanced open-source RDBMS!
PostgreSQL has a unique extension ecosystem. Pigsty seamlessly integrates these powerful extensions, delivering turnkey distributed solutions for time-series, geospatial, and vector capabilities.
Pigsty boasts over 340 PostgreSQL extensions and maintains some not found in official PGDG repositories. Rigorous testing ensures flawless integration for core extensions:
Leverage PostGIS for geospatial data, TimescaleDB for time-series analysis, Citus for horizontal scale out,
PGVector for AI embeddings, Apache AGE for graph data, ParadeDB for Full-Text Search,
and Hydra, DuckdbFDW, pg_analytics for OLAP workloads!
You can also run self-hosted Supabase & PostgresML with Pigsty managed HA PostgreSQL.
If you want to add your own extension, feel free to suggest or compile it by yourself.
All functionality is abstracted as Modules that can be freely composed for different scenarios.
INFRA gives you a modern observability stack, while NODE can be used for host monitoring.
Installing the PGSQL module on multiple nodes will automatically form a HA cluster.
And you can also have dedicated ETCD clusters for distributed consensus & MinIO clusters for backup storage.
REDIS is also supported, since it works well with PostgreSQL.
You can reuse Pigsty infra and extend it with your Modules (e.g. GPSQL, KAFKA, MONGO, MYSQL…).
Moreover, Pigsty’s INFRA module can be used alone — ideal for monitoring hosts, databases, or cloud RDS.
Stunning Observability
Unparalleled monitoring system based on modern observability stack and open-source best-practice!
Pigsty will automatically monitor any newly deployed components such as Node, Docker, HAProxy, Postgres, Patroni, Pgbouncer, Redis, Minio, and itself. There are 30+ default dashboards and pre-configured alerting rules, which will upgrade your system’s observability to a whole new level. Of course, it can be used as your application monitoring infrastructure too.
There are 3K+ metrics describing every aspect of your environment, from the topmost overview dashboard down to the details of an individual table, index, function, or sequence. As a result, you can have complete insight into the past, present, and future.
Pigsty has pre-configured HA & PITR for PostgreSQL to ensure your database service is always reliable.
Hardware failures are covered by self-healing HA architecture powered by patroni, etcd, and haproxy, which will perform auto failover in case of leader failure (RTO < 30s), and there will be no data loss (RPO = 0) in sync mode. Moreover, with the self-healing traffic control proxy, the client may not even notice a switchover/replica failure.
Software Failures, human errors, and Data Center Failures are covered with Cold backups & PITR, which are implemented with pgBackRest. It allows you to travel time to any point in your database’s history as long as your storage is capable. You can store them in the local backup disk, built-in MinIO cluster, or S3 service.
Large organizations have used Pigsty for several years. One of the largest deployments has 25K CPU cores and 333 massive PostgreSQL instances. In the past three years, there have been dozens of hardware failures & incidents, but the overall availability remains several nines (99.999% +).
Great Maintainability
Infra as Code, Database as Code, Declarative API & Idempotent Playbooks, GitOPS works like a charm.
Pigsty provides a declarative interface: Describe everything in a config file, and Pigsty operates it to the desired state. It works like Kubernetes CRDs & Operators but for databases and infrastructures on any nodes: bare metal or virtual machines.
To create a cluster/database/user/extension, expose services, or add replicas, all you need to do is modify the cluster definition and run the idempotent playbook. Databases & nodes are tuned automatically according to their hardware specs, and monitoring & alerting is pre-configured. As a result, database administration becomes much more manageable.
Pigsty has a full-featured sandbox powered by Vagrant, a pre-configured 1-node or 4-node environment for testing & demonstration purposes. You can also provision the required IaaS resources from cloud vendors with Terraform templates.
Sound Security
There is nothing to worry about regarding database security, as long as your hardware & credentials are safe.
Pigsty uses SSL for API & network traffic, encryption for passwords & backups, HBA rules for hosts & clients, and access control for users & objects.
Pigsty has an easy-to-use, fine-grained, and fully customizable access control framework based on roles, privileges, and HBA rules. It has four default roles: read-only, read-write, admin (DDL), offline (ETL), and four default users: dbsu, replicator, monitor, and admin. Newly created database objects will have proper default privileges for those roles. And client access is restricted by a set of HBA rules that follows the least privilege principle.
Your entire network communication can be secured with SSL. Pigsty will automatically create a self-signed CA and issue certs for that. Database credentials are encrypted with the scram-sha-256 algorithm, and cold backups are encrypted with the AES-256 algorithm when using MinIO/S3. Admin Pages and dangerous APIs are protected with HTTPS, and access is restricted from specific admin/infra nodes.
Versatile Application
Lots of applications work well with PostgreSQL. Run them in one command with docker.
The database is usually the trickiest part of most software. Since Pigsty already provides the RDS, it is handy to have a series of Docker templates that run software in stateless mode and persist their data with Pigsty-managed HA PostgreSQL (or Redis, MinIO), including Gitlab, Gitea, Wiki.js, NocoDB, Odoo, Jira, Confluence, Harbor, Mastodon, Discourse, and KeyCloak.
Pigsty also provides a toolset to help you manage your database and build data applications in a low-code fashion: PGAdmin4, PGWeb, ByteBase, PostgREST, Kong, and higher-level "databases" that use Postgres as underlying storage, such as EdgeDB, FerretDB, and Supabase. And since you already have Grafana & Postgres, you can quickly build an interactive data application demo with them. In addition, advanced visualization can be achieved with the built-in ECharts panel.
Open Source & Free
Pigsty is a free & open source software under AGPLv3. It was built for PostgreSQL with love.
Pigsty allows you to run production-grade RDS on your own hardware without the burden of hiring rare database experts. You can achieve the same or even better reliability, performance, and maintainability at only 5% ~ 40% of the cost of Cloud RDS for PG; you may end up with an RDS priced even lower than bare ECS.
There will be no vendor lock-in, annoying license fee, and node/CPU/core limit. You can have as many RDS as possible and run them as long as possible. All your data belongs to you and is under your control.
Pigsty is free software under AGPLv3. It’s free of charge, but beware that freedom is not free, so use it at your own risk! It’s not very difficult, and we are glad to help. For those enterprise users who seek professional consulting services, we do have a subscription for that.
2.2 - Modules
This section lists the available feature modules within Pigsty, as well as planned future modules.
Basic Modules
Pigsty offers four PRIMARY modules, which are essential for providing PostgreSQL service:
PGSQL : Autonomous PostgreSQL cluster with HA, PITR, IaC, SOP, monitoring, and 335 extensions!
INFRA : Local software repository, Prometheus, Grafana, Loki, AlertManager, PushGateway, Blackbox Exporter, etc.
NODE : Adjusts the node to the desired state, name, time zone, NTP, ssh, sudo, haproxy, docker, promtail, keepalived.
ETCD : Distributed key-value store, serving as the DCS (Distributed Consensus System) for the highly available Postgres cluster: consensus leadership election, configuration management, service discovery.
Kernel Modules
Pigsty allows using four different PostgreSQL KERNEL modules as optional in-place replacements:
MSSQL: Microsoft SQL Server Wire Protocol Compatible kernel powered by AWS, WiltonDB & Babelfish.
IVORY: Oracle-compatible kernel powered by the IvorySQL project, supported by HighGo
POLAR: Oracle RAC / CloudNative kernel powered by Alibaba PolarDB for PostgreSQL
CITUS: Distributed PostgreSQL (also known as Azure Hyperscale) as an extension, with native Patroni HA support.
We also have an ORACLE module powered by PolarDB-O, a commercial kernel from Aliyun, Pro version only.
Extended Modules
Pigsty offers four OPTIONAL modules, which are not necessary for the core functionality but can enhance the capabilities of PostgreSQL:
MINIO: S3-compatible simple object storage server, serving as an optional PostgreSQL database backup repository with production deployment support and monitoring.
REDIS: High-performance data structure server supporting standalone m/s, sentinel, and cluster mode deployments, with comprehensive HA & monitoring support.
FERRET: Native FerretDB deployment, adding MongoDB wire-protocol API compatibility on top of an existing HA PostgreSQL cluster.
DOCKER: Docker Daemon allowing users to easily deploy containerized stateless software tool templates, adding various functionalities.
DBMS Modules
Pigsty allows using other PERIPHERAL modules around the PostgreSQL ecosystem:
DUCKDB: Pigsty has duckdb cli, fdw, and PostgreSQL integration extensions such as pg_duckdb, pg_lakehouse, and duckdb_fdw.
SUPABASE: Run firebase alternative on top of existing HA PostgreSQL Cluster!
GREENPLUM: MPP fork of PostgreSQL (PG 12 kernel), WIP
CLOUDBERRY: Greenplum OSS fork with PG 14 kernel, WIP
Pigsty is actively developing some new PILOT modules, which are not yet fully mature:
MYSQL: Pigsty is researching adding high-availability deployment support for MySQL as an optional extension feature (Beta).
KAFKA: Pigsty plans to offer message queue support (Beta).
KUBE: Pigsty plans to use SealOS to provide out-of-the-box production-grade Kubernetes deployment and monitoring support (Alpha).
VICTORIA: Prometheus & Loki replacement with VictoriaMetrics & VictoriaLogs for better performance (Alpha)
JUPYTER: Battery-included Jupyter Notebook environment for data analysis and machine learning scenarios (Alpha)
Monitoring Other Databases
Pigsty’s INFRA module can be used independently as a plug-and-play monitoring infrastructure for other nodes or existing PostgreSQL databases:
Existing PostgreSQL services: Pigsty can monitor external, non-Pigsty managed PostgreSQL services, still providing relatively complete monitoring support.
RDS PG: Cloud vendor-provided PostgreSQL RDS services, treated as standard external Postgres instances for monitoring.
PolarDB: Alibaba Cloud’s cloud-native database, treated as an external PostgreSQL 11 / 14 instance for monitoring.
KingBase: A domestic trusted database from Renmin University of China, treated as an external PostgreSQL 12 instance for monitoring.
Greenplum / YMatrixDB monitoring, currently treated as horizontally partitioned PostgreSQL clusters for monitoring.
2.3 - Roadmap
The Pigsty project roadmap, including new features, development plans, and versioning & release policy.
Release Schedule
Pigsty employs semantic versioning, denoted as <major version>.<minor version>.<patch>. Alpha/Beta/RC versions are indicated with a suffix, such as -a1, -b1, -c1.
Major updates signify foundational changes and a plethora of new features; minor updates typically introduce new features, software package version updates, and minor API changes, while patch updates are meant for bug fixes and documentation improvements.
Pigsty plans to release a major update annually, with minor updates usually following the rhythm of PostgreSQL minor releases, aiming to catch up within a month after a new PostgreSQL version is released, typically resulting in 4 - 6 minor updates annually. For a complete release history, refer to Release Notes.
Do not use the main branch
Please always use a version-specific release; do not use the GitHub main branch unless you know what you are doing.
New Features on the Radar
A command-line tool that’s actually good
ARM architecture support for infrastructure components
Adding more extensions to PostgreSQL
More pre-configured scenario-based templates
Migrating package repositories and download sources entirely to Cloudflare
Deploying and monitoring high-availability Kubernetes clusters with SealOS!
Support for PostgreSQL 17 alpha
Loki and Promtail seem a bit off; could VictoriaLogs and Vector step up?
Swapping Prometheus storage for VictoriaMetrics to handle time-series data
Monitoring deployments of MySQL databases
Monitoring databases within Kubernetes
Offering a richer variety of Docker application templates
2.4 - History
The origin and motivation behind the Pigsty project, its historical development, and future goals and visions.
Origin Story
The Pigsty project kicked off between 2018 and 2019, originating from Tantan, a dating app akin to China’s Tinder, now acquired by Momo. Tantan, a startup with a Nordic vibe, was founded by a team of Swedish engineers. Renowned for their tech sophistication, they chose PostgreSQL and Go as their core tech stack. Tantan’s architecture, inspired by Instagram, revolves around PostgreSQL. They managed to scale to millions of daily active users, millions of TPS, and hundreds of TBs of data using PostgreSQL exclusively. Almost all business logic was implemented using PG stored procedures, including recommendation algorithms with 100ms latency!
This unconventional development approach, deeply leveraging PostgreSQL features, demanded exceptional engineering and DBA skills. Pigsty emerged from these real-world, high-standard database cluster scenarios as an open-source project encapsulating our top-tier PostgreSQL expertise and best practices.
Dev Journey
Initially, Pigsty didn't have the vision, objectives, or scope it has today. It was meant to be a PostgreSQL monitoring system for our own use. After evaluating every available option (open-source, commercial, cloud-based; Datadog, pgwatch, and more), none met our observability bar. So we took matters into our own hands, creating a system based on Grafana and Prometheus, which became the precursor to Pigsty. As a monitoring system, it was remarkably effective, solving countless management issues.
Eventually, developers wanted the same monitoring capabilities on their local dev machines. We used Ansible to write provisioning scripts, transitioning from a one-off setup to a reusable software. New features allowed users to quickly set up local DevBoxes or production servers with Vagrant and Terraform, automating PostgreSQL and monitoring system deployment through Infra as Code.
We then redesigned the production PostgreSQL architecture, introducing Patroni and pgBackRest for high availability and point-in-time recovery. We developed a zero-downtime migration strategy based on logical replication, performing rolling updates across 200 database clusters to the latest major version using blue-green deployments. These capabilities were integrated into Pigsty.
Pigsty, built for our use, reflects our understanding of our needs, avoiding shortcuts. The greatest benefit of “eating our own dog food” is being both developers and users, deeply understanding and not compromising on our requirements.
We tackled one problem after another, incorporating solutions into Pigsty. Its role evolved from a monitoring system to a ready-to-use PostgreSQL distribution. At this stage, we decided to open-source Pigsty, initiating a series of technical talks and promotions, attracting feedback from users across various industries.
Full-time Startup
In 2022, Pigsty secured seed funding from Dr. Qi's MiraclePlus S22 (former YC China), enabling me to work on it full-time. As an open-source project, Pigsty has thrived. In the two years since going full-time, its GitHub stars skyrocketed from a few hundred to 2,400. On OSSRank, Pigsty ranks 37th among PostgreSQL ecosystem projects.
Originally only compatible with CentOS 7, Pigsty now supports all major Linux distros and PostgreSQL versions 12 - 16, integrating over 340 extensions from the ecosystem. I've personally compiled, packaged, and maintained some extensions not found in official PGDG repositories.
Pigsty’s identity has evolved from a PostgreSQL distribution to an open-source cloud database alternative, directly competing with entire cloud database services offered by cloud providers.
Cloud Rebel
Public cloud vendors like AWS, Azure, GCP, and Aliyun offer many conveniences to startups but are proprietary and lock users into high-cost infra rentals.
We believe that top-notch database services should be as accessible as the top-notch database kernel (PostgreSQL), not confined to costly rentals from cyber lords.
Cloud agility and elasticity are great, but it should be open-source, local-first and cheap enough.
We envision a cloud computing universe with an open-source solution, returning the control to users without sacrificing the benefits of the cloud.
Thus, we’re leading the “cloud-exit” movement in China, rebelling against public cloud norms to reshape industry values.
Our Vision
We’d like to see a world where everyone has the factual right to use top services freely, not just view the world from the pens provided by a few public cloud providers.
This is what Pigsty aims to achieve —— a superior, open-source, free RDS alternative. Enabling users to deploy a database service better than cloud RDS with just one click, anywhere (including on cloud servers).
Pigsty is a comprehensive enhancement for PostgreSQL and a spicy satire on cloud RDS. We offer “the Simple Data Stack”, which consists of PostgreSQL, Redis, MinIO, and more optional modules.
Pigsty is entirely open-source and free, sustained through consulting and sponsorship. A well-built system might run for years without issues, but when database problems arise, they’re serious. Often, expert advice can turn a dire situation around, and we offer such services to clients in need—a fairer and more rational model.
About Me
I’m Feng Ruohang, the creator of Pigsty. I’ve developed most of Pigsty’s code solo, with the community contributing specific features.
Unique individuals create unique works —— I hope Pigsty can be one of those creations.
If you are interested in the author, here’s my personal website: https://vonng.com/en/
2.5 - Event & News
Latest activity, event, and news about Pigsty and PostgreSQL.
Latest News
Upcoming: Pigsty 3.1.0 is about to come @ 2024-11-20, with PostgreSQL 17 as the default major version!
Pigsty 3.0.4 released: extension repo and Supabase self-hosting optimizations
The name of this project always makes me grin: PIGSTY is actually an acronym, standing for Postgres In Great STYle! It’s a Postgres distribution that includes lots of components and tools out of the box in areas like availability, deployment, and observability. The latest release pushes everything up to Postgres 16.2 standards and introduces new ParadeDB and DuckDB FDW extensions.
Best Practices for High Availability and Disaster Recovery in PG
Date | Type | Event | Topic
2023-03-23 | Public Livestream | Bytebase x Pigsty | Best Practices for Managing PostgreSQL: Bytebase x Pigsty
2023-03-04 | Tech Conference | PostgreSQL China Tech Conference | Bombarding RDS, Release of Pigsty v2.0
2023-02-01 | Tech Conference | DTCC 2022 | Open Source RDS Alternatives: Out-of-the-Box, Self-Driving Database Edition Pigsty
2022-07-21 | Live Debate | Can Open Source Fight Back Against Cloud Cannibalization? | Can Open Source Fight Back Against Cloud Cannibalization?
2022-07-04 | Exclusive Interview | Creators Speak | Post-90s, Quitting Job to Entrepreneur, Aiming to Outperform Cloud Databases
2022-06-28 | Public Livestream | Beth's Roundtable | SQL Review Best Practices
2022-06-12 | Public Roadshow | MiraclePlus S22 Demo Day | Cost-Effective Database Edition Pigsty
2022-06-05 | Video Livestream | PG Chinese Community Livestream Sharing | Quick Start with New Features of Pigsty v1.5 & Building Production Clusters
2.6 - Community
Pigsty is built in public. We have an active community on GitHub.
The Pigsty community already offers free WeChat/Discord/Telegram Q&A Office Hours, and we are also happy to provide more free value-added services to our supporters.
When you run into trouble with Pigsty, you can ask the community for help. Provide enough info & context; here's a template:
What happened? (REQUIRED)
Pigsty Version & OS Version (REQUIRED)
$ grep version pigsty.yml
$ cat /etc/os-release
If you are using a cloud provider, please tell us which cloud provider and what operating system image you are using.
If you have customized and modified the environment after installing the bare OS, or have specific security rules and firewall configurations in your WAN, please also tell us when troubleshooting.
Pigsty Config File (REQUIRED)
Don’t forget to remove sensitive information like passwords, etc…
cat ~/pigsty/pigsty.yml
What did you expect to happen?
Please describe what you expected to happen.
How to reproduce it?
Please tell us as much detail as possible about how to reproduce the problem.
Monitoring Screenshots
If you are using pigsty monitoring system, you can paste RELEVANT screenshots here.
Error Log
Please copy and paste any RELEVANT log output. Do not paste something like “Failed to start xxx service”
Syslog: /var/log/messages (rhel) or /var/log/syslog (debian)
The more information and context you provide, the more likely we are to be able to help you solve the problem.
2.7 - License
Pigsty is open sourced under AGPLv3, here’s the details about permissions, limitations, conditions and exemptions.
Pigsty is open sourced under the AGPLv3 license, which is a copyleft license.
Summary
Pigsty uses the AGPLv3 license,
which is a strong copyleft license that requires you to also distribute the source code of your derivative works under the same license.
If you distribute Pigsty, you must make the source code available under the same license, and you must make it clear that the source code is available.
Permissions:
Commercial use
Modification
Distribution
Patent use
Private use
Limitations:
Liability
Warranty
Conditions:
License and copyright notice
State changes
Disclose source
Network use is distribution
Same license
Beware that the Pigsty official website is also open sourced, under the CC BY 4.0 license.
Exemptions
While employing the AGPLv3 license for Pigsty, we extend exemptions to common end users under terms akin to the Apache 2.0 license.
Common end users are defined as all entities except public cloud and database service vendors.
These end users may utilize Pigsty for commercial activities and service provision without AGPL licensing concerns.
Our subscription includes written guarantees of these terms for additional assurance.
We encourage cloud & databases vendors adhering to AGPLv3 to use Pigsty for derivative works and to contribute back to the community.
Why AGPLv3
We don't like the idea that public cloud vendors take open-source code, provide it as a service, and do not give back equally (e.g., their controllers) to the community. This is a vulnerability in the GPL license that AGPLv3 was designed to close.
The AGPLv3 does not affect regular end users: using Pigsty internally is not “distributing” it,
so you don’t have to worry about whether your business code needs to be open-sourced. If you do worry about it, you can always choose the pro version with written guarantees.
You only need to consider AGPLv3 when you "distribute" Pigsty or modifications to it as part of a software/service offering, such as database/software/cloud vendors who provide Pigsty as a service or as part of their software to their customers.
Sponsors and Investors of Pigsty, Thank You for Your Support of This Project!
Investor
Pigsty is funded by MiraclePlus (former YC China), S22 batch.
Thanks to MiraclePlus and Dr.Qi’s support.
Sponsorship
Pigsty is a free & open-source software nurtured by the passion of PostgreSQL community members.
If our work has helped you, please consider sponsoring or supporting our project. Every penny counts, and advertisements are also a form of support:
Make Donation & Sponsor us.
Share your experiences and use cases of Pigsty through articles, lectures, and videos.
Allow us to mention your organization in “These users who use Pigsty”
Nominate/Recommend our project and services to your friends, colleagues, and clients in need.
Follow our WeChat Column and share technical articles with your friends.
2.9 - Privacy Policy
How we process your data & protect your privacy in this website and pigsty’s software.
Pigsty Software
When you install the Pigsty software, if you use offline packages in a network-isolated environment, we will not receive any data about you.
If you choose to install online, then when downloading relevant software packages, our server or the servers of our cloud providers will automatically log the visiting machine’s IP address and/or hostname, as well as the name of the software package you downloaded, in the logs.
We will not share this information with other organizations unless legally required to do so.
When you visit our website, our servers automatically log your IP address and/or host name.
We store information such as your email address, name and locality only if you decide to send us such information by completing a survey, or registering as a user on one of our sites
We collect this information to help us improve the content of our sites, customize the layout of our web pages and to contact people for technical and support purposes. We will not share your email address with other organisations unless required by law.
This website uses Google Analytics, a web analytics service provided by Google, Inc. (“Google”). Google Analytics uses “cookies”, which are text files placed on your computer, to help the website analyze how users use the site.
The information generated by the cookie about your use of the website (including your IP address) will be transmitted to and stored by Google on servers in the United States. Google will use this information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and internet usage. Google may also transfer this information to third parties where required to do so by law, or where such third parties process the information on Google’s behalf. Google will not associate your IP address with any other data held by Google. You may refuse the use of cookies by selecting the appropriate settings on your browser, however please note that if you do this you may not be able to use the full functionality of this website. By using this website, you consent to the processing of data about you by Google in the manner and for the purposes set out above.
If you have any questions or comments around this policy, or to request the deletion of personal data, you can contact us at rh@vonng.com
2.10 - Service
Pigsty has professional support which provides services & consulting to cover corner-cases!
Pigsty aims to gather the combined strength of the PG ecosystem and replace manual database operations with database autopilot software, helping users make the most of PostgreSQL.
We understand the importance of professional support services for enterprise customers, so we provide a commercial subscription plan to help users make better use of PostgreSQL and Pigsty.
If you have the following needs, consider our subscription:
Seeking backup & insurance for Pigsty and PostgreSQL-related issues.
Running databases in critical scenarios and needing strict SLA guarantees.
Wanting guidance on best practices for PostgreSQL/Pigsty production environments.
Needing experts to help interpret monitoring charts, analyze and locate performance bottlenecks and fault root causes, and provide opinions.
Planning a database architecture that meets security, disaster recovery, and compliance requirements based on existing resources and business needs.
Needing to migrate other databases to PostgreSQL, or migrate and transform legacy instances.
Building an observability system, data dashboard, and visualization application based on the Prometheus/Grafana tech stack.
Seeking support for domestic trusted operating systems/domestic trusted ARM chip architectures and providing Chinese/localized interface support.
Moving off the cloud and seeking an open-source alternative to RDS for PostgreSQL - a cloud-neutral, vendor-lock-in-free solution.
Seeking professional technical support for Redis/ETCD/MinIO/Greenplum/Citus/TimescaleDB.
Wanting to avoid the AGPLv3 restrictions of Pigsty itself, so that derivative works are not forced to adopt the same open-source license for secondary development and branding.
Considering selling Pigsty as SaaS/PaaS/DBaaS or providing technical/consulting services based on this distribution.
Subscription
In addition to the open-source version, Pigsty offers two subscription plans: Pro and Enterprise.
You can choose the appropriate subscription plan based on your actual situation and needs.
The price of Pigsty subscription is directly proportional to the number of nodes deployed, which is bound to the system’s complexity and scale.
A node can be a physical machine or a virtual machine, and the number of CPUs on the node does not affect the pricing.
The number of nodes is defined as the total number of independent IP addresses that exist as Hosts in the config inventory of a Pigsty deployment, i.e., the total number of nodes managed by Pigsty.
Pigsty OSS allows junior developers and operations engineers to create, manage and maintain a production-grade PostgreSQL database cluster with ease, even in the absence of any database experts.
Pigsty open-source version is licensed under AGPLv3, providing full core/extended functionality without any charge, but also without any warranty.
If you find any bugs in Pigsty, we welcome you to raise an Issue on GitHub.
For ordinary end-users (excluding public cloud vendors and database vendors), we actually comply with the more permissive Apache 2.0 license — even if you make modifications to Pigsty, we will not pursue you for it.
You can get a written guarantee from us in the form of a commercial subscription.
We have offline packages for PostgreSQL 17 on 3 precise OS minor versions: EL 9.4, Debian 12.7, and Ubuntu 22.04.5,
available on the GitHub Release page, including PostgreSQL 17 and all applicable extensions.
Pigsty Pro offers an expanded range of features, supporting a wider variety of operating system distributions, PostgreSQL major versions, and a richer set of extension plugins. It also includes offline software packages tailored for each OS minor version to ensure optimal compatibility.
Pigsty subscriptions operate on an annual payment model, providing users with an annual license for the Pigsty commercial version.
This includes access to the latest software versions and upgrade paths released within the year, along with comprehensive consulting, Q&A, and service support.
A larger scale implies more complex scenarios, more issues, and a higher chance of failure events: thus, each subscription comes with a node scale limit.
For example, if you are using the Pro subscription and manage 15 nodes, you will need to pay an additional subscription fee for each node beyond the limit (10,000 RMB per node).
Pigsty's pricing strategy ensures value for money: you can immediately obtain top-notch DBA database architecture solutions and management best practices, all backed by consulting, Q&A, and service support, at a cost that is highly competitive compared to finding and hiring a rare database guru or using cloud RDS.
Service subscriptions are divided into two different levels, Standard Service Agreement, and Enterprise Service Agreement, as shown in the table below:
Commercial support contact: Email: rh@vonng.com, WeChat: pigsty-cc / RuohangFeng
Miscellaneous
We offer retail expert days that can be used for database architecting, failure analysis, postmortem, troubleshooting, performance analysis, problem-solving, teaching, and training, which can be purchased as needed.
Top Expert: 3,000 $ / day
Senior Expert: 2,000 $ / day
The above prices are exclusive of taxes. The minimum unit is half a day, less than that will be charged as half a day.
Price is doubled outside regular working hours (5x8), and it’s tripled on public holidays.
Pricing & Discount may vary depending on the industry and the technical level of the client’s team.
Expert days need to be arranged at least one day before.
Emergency failure responding is not applicable here and only available to subscribed customers.
We offer teaching and training services on PostgreSQL, priced as follows:
PostgreSQL Application Development: 1 x expert day, up to 20 people.
PostgreSQL Management & Operation: 1 x expert day, up to 20 people.
PostgreSQL Kernel Architecture: 2 x expert day, up to 10 people.
We offer deployment consulting and architecting services, priced as follows:
Planning a deployment solution based on your existing resources and needs.
150 $/h, at least one hour per case, remote only. Delivery includes the pigsty.yml file.
3 - Concept
Learn about core concept about Pigsty: architecture, cluster models, infra, PG HA, PITR, and service access.
3.1 - Architecture
Pigsty’s modular architecture, compose modules in a declarative manner
Modular Architecture and Declarative Interface!
Pigsty deployment is described by config inventory and materialized with ansible playbooks.
Pigsty works on Linux x86_64 common nodes, i.e., bare metals or virtual machines.
Pigsty uses a modular design that can be freely composed for different scenarios.
The config controls where & how to install modules with parameters
The playbooks will adjust nodes into the desired status in an idempotent manner.
ETCD: Distributed key-value store will be used as DCS for high-available Postgres clusters.
REDIS: Redis servers in standalone master-replica, sentinel, cluster mode with Redis exporter.
MINIO: S3 compatible simple object storage server, can be used as an optional backup center for Postgres.
You can compose them freely in a declarative manner. If you want host monitoring, INFRA & NODE will suffice. Adding ETCD and PGSQL gives you HA PG clusters. Deploying them on multiple nodes will form an HA cluster. You can reuse Pigsty infra and develop your own modules, with the optional REDIS and MINIO modules as examples.
Singleton Meta
Pigsty will install on a single node (BareMetal / VirtualMachine) by default. The install.yml playbook will install INFRA, ETCD, PGSQL, and optional MINIO modules on the current node, which will give you a full-featured observability infrastructure (Prometheus, Grafana, Loki, AlertManager, PushGateway, BlackboxExporter, etc… ) and a battery-included PostgreSQL Singleton Instance (Named meta).
This node now has a self-monitoring system, visualization toolsets, and a Postgres database with autoconfigured PITR. You can use this node for devbox, testing, running demos, and doing data visualization & analysis. Or, furthermore, adding more nodes to it!
Monitoring
The installed Singleton Meta can be used as an admin node and monitoring center, to take more nodes & database servers under its surveillance & control.
If you want to install the Prometheus / Grafana observability stack, Pigsty delivers the best practice for you! It has fine-grained dashboards for nodes & PostgreSQL. Whether or not these nodes and PostgreSQL servers are managed by Pigsty, you can have production-grade monitoring & alerting immediately with simple configuration.
HA PG Cluster
With Pigsty, you can have your own local production-grade HA PostgreSQL RDS, as many as you want.
And to create such a HA PostgreSQL cluster, All you have to do is describe it & run the playbook:
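For example, a minimal sketch along the lines of the sandbox pg-test cluster (cluster name and IP addresses are illustrative):
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }
  vars: { pg_cluster: pg-test }
Then materialize it with the PGSQL playbook:
./pgsql.yml -l pg-test    # init the pgsql module on the pg-test group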
This will give you a cluster like the following, with monitoring, replicas, and backups all set.
Hardware failures are covered by self-healing HA architecture powered by patroni, etcd, and haproxy, which will perform auto failover in case of leader failure under 30 seconds. With the self-healing traffic control powered by haproxy, the client may not even notice there’s a failure at all, in case of a switchover or replica failure.
Software failures, human errors, and DC failures are covered by pgBackRest and optional MinIO clusters, which give you the ability to perform point-in-time recovery to any time (as long as your storage is capable).
Database as Code
Pigsty follows IaC & GitOPS philosophy: Pigsty deployment is described by declarative Config Inventory and materialized with idempotent playbooks.
The user describes the desired status with Parameters in a declarative manner, and the playbooks tune target nodes into that status in an idempotent manner. It’s like Kubernetes CRD & Operator but works on Bare Metals & Virtual Machines.
Take the default config snippet as an example, which describes a node 10.10.10.10 with modules INFRA, NODE, ETCD, and PGSQL installed.
# infra cluster for proxy, monitor, alert, etc...
infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }
# minio cluster, s3 compatible object storage
minio: { hosts: { 10.10.10.10: { minio_seq: 1 } }, vars: { minio_cluster: minio } }
# etcd cluster for ha postgres DCS
etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }
# postgres example cluster: pg-meta
pg-meta: { hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }, vars: { pg_cluster: pg-meta } }
To materialize it, use the following playbooks:
./infra.yml -l infra     # init infra module on group 'infra'
./etcd.yml  -l etcd      # init etcd module on group 'etcd'
./minio.yml -l minio     # init minio module on group 'minio'
./pgsql.yml -l pg-meta   # init pgsql module on group 'pgsql'
It would be straightforward to perform regular administration tasks. For example, if you wish to add a new replica/database/user to an existing HA PostgreSQL cluster, all you need to do is add a host in config & run that playbook on it, such as:
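For instance, to add a third instance to a hypothetical pg-test cluster, append the new host to its definition and run the playbook against that host only (IPs are illustrative):
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }   # newly added replica
  vars: { pg_cluster: pg-test }
./pgsql.yml -l 10.10.10.13    # init the new replica only; existing members are left untouched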
Pigsty abstracts different types of functionalities into modules & clusters.
PGSQL for production environments is organized in clusters; a cluster is a logical entity consisting of a set of database instances associated by primary-replica replication.
Each database cluster is an autonomous serving unit consisting of at least one database instance (primary).
ER Diagram
Let’s get started with ER diagram. There are four types of core entities in Pigsty’s PGSQL module:
PGSQL Cluster: An autonomous PostgreSQL business unit, used as the top-level namespace for other entities.
PGSQL Service: A named abstraction of cluster capability that routes traffic and exposes Postgres services via node ports.
PGSQL Instance: A single postgres server which is a group of running processes & database files on a single node.
PGSQL Node: An abstraction of hardware resources, which can be bare metal, virtual machine, or even k8s pods.
Naming Convention
The cluster name should be a valid domain name, without any dot: [a-zA-Z0-9-]+
Service name should be prefixed with the cluster name and suffixed with a single word such as primary, replica, offline, or delayed, joined by -.
Instance name is prefixed with the cluster name and suffixed with an integer, joined by -, e.g., ${cluster}-${seq}.
Node is identified by its IP address, and its hostname is usually the same as the instance name since they are 1:1 deployed.
Identity Parameter
Pigsty uses identity parameters to identify entities: PG_ID.
In addition to the node IP address, three parameters: pg_cluster, pg_role, and pg_seq are the minimum set of parameters necessary to define a postgres cluster.
Take the sandbox testing cluster pg-test as an example:
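A sketch of how these identity parameters might be spelled for such a cluster (IPs are illustrative); together they yield instance names pg-test-1, pg-test-2, and pg-test-3:
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }   # -> instance pg-test-1
    10.10.10.12: { pg_seq: 2, pg_role: replica }   # -> instance pg-test-2
    10.10.10.13: { pg_seq: 3, pg_role: offline }   # -> instance pg-test-3
  vars:
    pg_cluster: pg-test                            # cluster name shared by all members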
The architecture and implementation of Pigsty’s monitoring system, the service discovery details
3.4 - Self-Signed CA
Pigsty comes with a set of self-signed CA PKI for issuing SSL certs to encrypt network traffic.
Pigsty has some security best practices: encrypting network traffic with SSL and encrypting the Web interface with HTTPS.
To achieve this, Pigsty comes with a built-in local self-signed Certificate Authority (CA) for issuing SSL certificates to encrypt network communication.
By default, SSL and HTTPS are enabled but not enforced. For environments with higher security requirements, you can enforce the use of SSL and HTTPS.
Local CA
Pigsty, by default, generates a self-signed CA in the Pigsty source code directory (~/pigsty) on the ADMIN node during initialization. This CA is used when SSL, HTTPS, digital signatures, issuing database client certificates, and advanced security features are needed.
Hence, each Pigsty deployment uses a unique CA, and CAs from different Pigsty deployments do not trust each other.
The local CA consists of two files, typically located in the files/pki/ca directory:
ca.crt: The self-signed root CA certificate, which should be distributed and installed on all managed nodes for certificate verification.
ca.key: The CA private key, used to issue certificates and verify CA identity. It should be securely stored to prevent leaks!
Protect Your CA Private Key File
Please securely store the CA private key file, do not lose it or let it leak. We recommend encrypting and backing up this file after completing the Pigsty installation.
Using an Existing CA
If you already have a CA public and private key infrastructure, Pigsty can also be configured to use an existing CA.
Simply place your CA public and private key files in the files/pki/ca directory.
files/pki/ca/ca.key    # The essential CA private key file, must exist; if not, a new one will be randomly generated by default
files/pki/ca/ca.crt    # If a certificate file is absent, Pigsty will automatically generate a new root certificate file from the CA private key
When Pigsty executes the install.yml and infra.yml playbooks for installation, if the ca.key private key file is found in the files/pki/ca directory, the existing CA will be used. The ca.crt file can be generated from the ca.key private key, so if there is no certificate file, Pigsty will automatically generate a new root certificate file from the CA private key.
Note When Using an Existing CA
You can configure the ca_method parameter as copy to ensure that Pigsty reports an error and halts if it cannot find the local CA, rather than automatically regenerating a new self-signed CA.
Trust CA
During the Pigsty installation, ca.crt is distributed to all nodes under the /etc/pki/ca.crt path during the node_ca task in the node.yml playbook.
The default paths for trusted CA root certificates differ between EL family and Debian family operating systems, hence the distribution path and update methods also vary.
By default, Pigsty will issue HTTPS certificates for domain names used by web systems on infrastructure nodes, allowing you to access Pigsty’s web systems via HTTPS.
If you do not want your browser on the client computer to display “untrusted CA certificate” messages, you can distribute ca.crt to the trusted certificate directory on the client computer.
You can double-click the ca.crt file to add it to the system keychain, for example, on macOS systems, you need to open “Keychain Access,” search for pigsty-ca, and then “trust” this root certificate.
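For Linux clients, the usual OS trust-store commands look roughly like this (standard distro paths, not Pigsty-specific; verify on your system):
sudo cp files/pki/ca/ca.crt /etc/pki/ca-trust/source/anchors/pigsty-ca.crt && sudo update-ca-trust            # EL family
sudo cp files/pki/ca/ca.crt /usr/local/share/ca-certificates/pigsty-ca.crt && sudo update-ca-certificates     # Debian / Ubuntu family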
Check Cert
Use the following command to view the contents of the Pigsty CA certificate
openssl x509 -text -in /etc/pki/ca.crt
Local CA Root Cert Content
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
50:29:e3:60:96:93:f4:85:14:fe:44:81:73:b5:e1:09:2a:a8:5c:0a
Signature Algorithm: sha256WithRSAEncryption
Issuer: O=pigsty, OU=ca, CN=pigsty-ca
Validity
Not Before: Feb 7 00:56:27 2023 GMT
Not After : Jan 14 00:56:27 2123 GMT
Subject: O=pigsty, OU=ca, CN=pigsty-ca
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (4096 bit)
Modulus:
00:c1:41:74:4f:28:c3:3c:2b:13:a2:37:05:87:31:
....
e6:bd:69:a5:5b:e3:b4:c0:65:09:6e:84:14:e9:eb:
90:f7:61
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Subject Alternative Name:
DNS:pigsty-ca
X509v3 Key Usage:
Digital Signature, Certificate Sign, CRL Sign
X509v3 Basic Constraints: critical
CA:TRUE, pathlen:1
X509v3 Subject Key Identifier:
C5:F6:23:CE:BA:F3:96:F6:4B:48:A5:B1:CD:D4:FA:2B:BD:6F:A6:9C
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
89:9d:21:35:59:6b:2c:9b:c7:6d:26:5b:a9:49:80:93:81:18:
....
9e:dd:87:88:0d:c4:29:9e
-----BEGIN CERTIFICATE-----
...
cXyWAYcvfPae3YeIDcQpng==
-----END CERTIFICATE-----
Issue Database Client Certs
If you wish to authenticate via client certificates, you can manually issue PostgreSQL client certificates using the local CA and the cert.yml playbook.
Set the certificate’s CN field to the database username:
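For example, a sketch of issuing a client certificate for a hypothetical user dbuser_dba; the cn extra-var follows the pattern described here, while other variables accepted by cert.yml may differ by version:
./cert.yml -e cn=dbuser_dba    # issue files/pki/misc/dbuser_dba.{key,crt} signed by the local CA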
The issued certificates will default to being generated in the files/pki/misc/<cn>.{key,crt} path.
3.5 - Infra as Code
Pigsty treat infra & database as code. Manage them in a declarative manner
Infra as Code, Database as Code, Declarative API & Idempotent Playbooks, GitOPS works like a charm.
Pigsty provides a declarative interface: Describe everything in a config file,
and Pigsty operates it to the desired state with idempotent playbooks.
It works like Kubernetes CRDs & Operators but for databases and infrastructures on any nodes: bare metal or virtual machines.
Declare Module
Take the default config snippet as an example, which describes a node 10.10.10.10 with modules INFRA, NODE, ETCD, and PGSQL installed.
# infra cluster for proxy, monitor, alert, etc...
infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }
# minio cluster, s3 compatible object storage
minio: { hosts: { 10.10.10.10: { minio_seq: 1 } }, vars: { minio_cluster: minio } }
# etcd cluster for ha postgres DCS
etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }
# postgres example cluster: pg-meta
pg-meta: { hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }, vars: { pg_cluster: pg-meta } }
To materialize it, use the following playbooks:
./infra.yml -l infra     # init infra module on node 10.10.10.10
./etcd.yml  -l etcd      # init etcd module on node 10.10.10.10
./minio.yml -l minio     # init minio module on node 10.10.10.10
./pgsql.yml -l pg-meta   # init pgsql module on node 10.10.10.10
Declare Cluster
You can declare the PGSQL module on multiple nodes, and form a cluster.
For example, to create a three-node HA cluster based on streaming replication, just adding the following definition to the all.children section of the config file pigsty.yml:
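A sketch of such a definition (cluster name and IPs are illustrative; one primary plus two streaming replicas):
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }
  vars: { pg_cluster: pg-test }
Running ./pgsql.yml -l pg-test against this group will bring up the whole cluster.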
You can deploy different kinds of instance roles such as
primary, replica, offline, delayed, sync standby, and different kinds of clusters, such as standby clusters, Citus clusters, and even Redis / MinIO / Etcd clusters.
Declare Cluster Internal
Not only can you define clusters in a declarative manner, but you can also specify the databases, users, services, and HBA rules within the cluster. For example, the following configuration file deeply customizes the content of the default pg-meta single-node database cluster:
This includes declaring six business databases and seven business users, adding an additional standby service (a synchronous replica providing read capabilities with no replication delay), defining some extra pg_hba rules, an L2 VIP address pointing to the cluster’s primary database, and a customized backup strategy.
pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary , pg_offline_query: true }
  vars:
    pg_cluster: pg-meta
    pg_databases:                       # define business databases on this cluster, array of database definition
      - name: meta                      # REQUIRED, `name` is the only mandatory field of a database definition
        baseline: cmdb.sql              # optional, database sql baseline path, (relative path among ansible search path, e.g files/)
        pgbouncer: true                 # optional, add this database to pgbouncer database list? true by default
        schemas: [pigsty]               # optional, additional schemas to be created, array of schema names
        extensions:                     # optional, additional extensions to be installed: array of `{name[,schema]}`
          - { name: postgis , schema: public }
          - { name: timescaledb }
        comment: pigsty meta database   # optional, comment string for this database
        owner: postgres                 # optional, database owner, postgres by default
        template: template1             # optional, which template to use, template1 by default
        encoding: UTF8                  # optional, database encoding, UTF8 by default. (MUST same as template database)
        locale: C                       # optional, database locale, C by default. (MUST same as template database)
        lc_collate: C                   # optional, database collate, C by default. (MUST same as template database)
        lc_ctype: C                     # optional, database ctype, C by default. (MUST same as template database)
        tablespace: pg_default          # optional, default tablespace, 'pg_default' by default.
        allowconn: true                 # optional, allow connection, true by default. false will disable connect at all
        revokeconn: false               # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
        register_datasource: true       # optional, register this database to grafana datasources? true by default
        connlimit: -1                   # optional, database connection limit, default -1 disable limit
        pool_auth_user: dbuser_meta     # optional, all connection to this pgbouncer database will be authenticated by this user
        pool_mode: transaction          # optional, pgbouncer pool mode at database level, default transaction
        pool_size: 64                   # optional, pgbouncer pool size at database level, default 64
        pool_size_reserve: 32           # optional, pgbouncer pool size reserve at database level, default 32
        pool_size_min: 0                # optional, pgbouncer pool size min at database level, default 0
        pool_max_db_conn: 100           # optional, max database connections at database level, default 100
      - { name: grafana  ,owner: dbuser_grafana  ,revokeconn: true ,comment: grafana primary database }
      - { name: bytebase ,owner: dbuser_bytebase ,revokeconn: true ,comment: bytebase primary database }
      - { name: kong     ,owner: dbuser_kong     ,revokeconn: true ,comment: kong the api gateway database }
      - { name: gitea    ,owner: dbuser_gitea    ,revokeconn: true ,comment: gitea meta database }
      - { name: wiki     ,owner: dbuser_wiki     ,revokeconn: true ,comment: wiki meta database }
    pg_users:                           # define business users/roles on this cluster, array of user definition
      - name: dbuser_meta               # REQUIRED, `name` is the only mandatory field of a user definition
        password: DBUser.Meta           # optional, password, can be a scram-sha-256 hash string or plain text
        login: true                     # optional, can log in, true by default (new biz ROLE should be false)
        superuser: false                # optional, is superuser? false by default
        createdb: false                 # optional, can create database? false by default
        createrole: false               # optional, can create role? false by default
        inherit: true                   # optional, can this role use inherited privileges? true by default
        replication: false              # optional, can this role do replication? false by default
        bypassrls: false                # optional, can this role bypass row level security? false by default
        pgbouncer: true                 # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
        connlimit: -1                   # optional, user connection limit, default -1 disable limit
        expire_in: 3650                 # optional, now + n days when this role is expired (OVERWRITE expire_at)
        expire_at: '2030-12-31'         # optional, YYYY-MM-DD 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
        comment: pigsty admin user      # optional, comment string for this user/role
        roles: [dbrole_admin]           # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
        parameters: {}                  # optional, role level parameters with `ALTER ROLE SET`
        pool_mode: transaction          # optional, pgbouncer pool mode at user level, transaction by default
        pool_connlimit: -1              # optional, max database connections at user level, default -1 disable limit
      - { name: dbuser_view     ,password: DBUser.Viewer   ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
      - { name: dbuser_grafana  ,password: DBUser.Grafana  ,pgbouncer: true ,roles: [dbrole_admin] ,comment: admin user for grafana database }
      - { name: dbuser_bytebase ,password: DBUser.Bytebase ,pgbouncer: true ,roles: [dbrole_admin] ,comment: admin user for bytebase database }
      - { name: dbuser_kong     ,password: DBUser.Kong     ,pgbouncer: true ,roles: [dbrole_admin] ,comment: admin user for kong api gateway }
      - { name: dbuser_gitea    ,password: DBUser.Gitea    ,pgbouncer: true ,roles: [dbrole_admin] ,comment: admin user for gitea service }
      - { name: dbuser_wiki     ,password: DBUser.Wiki     ,pgbouncer: true ,roles: [dbrole_admin] ,comment: admin user for wiki.js service }
    pg_services:                        # extra services in addition to pg_default_services, array of service definition
      # standby service will route {ip|name}:5435 to sync replica's pgbouncer (5435->6432 standby)
      - name: standby                   # required, service name, the actual svc name will be prefixed with `pg_cluster`, e.g: pg-meta-standby
        port: 5435                      # required, service exposed port (work as kubernetes service node port mode)
        ip: "*"                         # optional, service bind ip address, `*` for all ip by default
        selector: "[]"                  # required, service member selector, use JMESPath to filter inventory
        dest: default                   # optional, destination port, default|postgres|pgbouncer|<port_number>, 'default' by default
        check: /sync                    # optional, health check url path, / by default
        backup: "[? pg_role == `primary`]"   # backup server selector
        maxconn: 3000                   # optional, max allowed front-end connection
        balance: roundrobin             # optional, haproxy load balance algorithm (roundrobin by default, other: leastconn)
        options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
    pg_hba_rules:
      - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.2/24
    pg_vip_interface: eth1
    node_crontab:                       # make a full backup 1 am everyday
      - '00 01 * * * postgres /pg/bin/pg-backup full'
Declare Access Control
You can also deeply customize Pigsty’s access control capabilities through declarative configuration. For example, the following configuration file provides deep security customization for the pg-meta cluster:
Uses the three-node core cluster template crit.yml to prioritize data consistency, with zero data loss during failover.
Enables L2 VIP, and restricts the database and connection pool listening addresses to three specific addresses: local loopback IP, internal network IP, and VIP.
The template enforces SSL for the Patroni API and for Pgbouncer, and the HBA rules require SSL for access to the database cluster.
Additionally, the $libdir/passwordcheck extension is enabled in pg_libs to enforce a password strength security policy.
Lastly, a separate pg-meta-delay cluster is declared as a delayed replica of pg-meta from one hour ago, for use in emergency data deletion recovery.
pg-meta:                                # 3 instance postgres cluster `pg-meta`
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
    10.10.10.11: { pg_seq: 2, pg_role: replica }
    10.10.10.12: { pg_seq: 3, pg_role: replica , pg_offline_query: true }
  vars:
    pg_cluster: pg-meta
    pg_conf: crit.yml
    pg_users:
      - { name: dbuser_meta , password: DBUser.Meta   , pgbouncer: true , roles: [ dbrole_admin ]    , comment: pigsty admin user }
      - { name: dbuser_view , password: DBUser.Viewer , pgbouncer: true , roles: [ dbrole_readonly ] , comment: read-only viewer for meta database }
    pg_databases:
      - { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [{name: postgis, schema: public}, {name: timescaledb}] }
    pg_default_service_dest: postgres
    pg_services:
      - { name: standby ,src_ip: "*" ,port: 5435 ,dest: default ,selector: "[]" ,backup: "[? pg_role == `primary` ]" }
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.2/24
    pg_vip_interface: eth1
    pg_listen: '${ip},${vip},${lo}'
    patroni_ssl_enabled: true
    pgbouncer_sslmode: require
    pgbackrest_method: minio
    pg_libs: 'timescaledb, $libdir/passwordcheck, pg_stat_statements, auto_explain' # add passwordcheck extension to enforce strong password
    pg_default_roles:                   # default roles and users in postgres cluster
      - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access }
      - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
      - { name: dbrole_readwrite ,login: false ,roles: [dbrole_readonly] ,comment: role for global read-write access }
      - { name: dbrole_admin     ,login: false ,roles: [pg_monitor, dbrole_readwrite] ,comment: role for object creation }
      - { name: postgres       ,superuser: true   ,expire_in: 7300 ,comment: system superuser }
      - { name: replicator     ,replication: true ,expire_in: 7300 ,roles: [pg_monitor, dbrole_readonly] ,comment: system replicator }
      - { name: dbuser_dba     ,superuser: true   ,expire_in: 7300 ,roles: [dbrole_admin] ,pgbouncer: true ,pool_mode: session ,pool_connlimit: 16 ,comment: pgsql admin user }
      - { name: dbuser_monitor ,roles: [pg_monitor] ,expire_in: 7300 ,pgbouncer: true ,parameters: { log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }
    pg_default_hba_rules:               # postgres host-based auth rules by default
      - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident' }
      - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident' }
      - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: ssl   ,title: 'replicator replication from localhost' }
      - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: ssl   ,title: 'replicator replication from intranet' }
      - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: ssl   ,title: 'replicator postgres db from intranet' }
      - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password' }
      - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: ssl   ,title: 'monitor from infra host with password' }
      - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl' }
      - { user: '${admin}'   ,db: all         ,addr: world     ,auth: cert  ,title: 'admin @ everywhere with ssl & cert' }
      - { user: '+dbrole_readonly' ,db: all   ,addr: localhost ,auth: ssl   ,title: 'pgbouncer read/write via local socket' }
      - { user: '+dbrole_readonly' ,db: all   ,addr: intra     ,auth: ssl   ,title: 'read/write biz user via password' }
      - { user: '+dbrole_offline'  ,db: all   ,addr: intra     ,auth: ssl   ,title: 'allow etl offline tasks from intranet' }
    pgb_default_hba_rules:              # pgbouncer host-based authentication rules
      - { user: '${dbsu}'    ,db: pgbouncer   ,addr: local     ,auth: peer  ,title: 'dbsu local admin access with os ident' }
      - { user: 'all'        ,db: all         ,addr: localhost ,auth: pwd   ,title: 'allow all user local access with pwd' }
      - { user: '${monitor}' ,db: pgbouncer   ,addr: intra     ,auth: ssl   ,title: 'monitor access via intranet with pwd' }
      - { user: '${monitor}' ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other monitor access addr' }
      - { user: '${admin}'   ,db: all         ,addr: intra     ,auth: ssl   ,title: 'admin access via intranet with pwd' }
      - { user: '${admin}'   ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other admin access addr' }
      - { user: 'all'        ,db: all         ,addr: intra     ,auth: ssl   ,title: 'allow all user intra access with pwd' }

# OPTIONAL delayed cluster for pg-meta
pg-meta-delay:                          # delayed instance for pg-meta (1 hour ago)
  hosts: { 10.10.10.13: { pg_seq: 1, pg_role: primary, pg_upstream: 10.10.10.10, pg_delay: 1h } }
  vars: { pg_cluster: pg-meta-delay }
Citus Distributed Cluster
Example: Citus Distributed Cluster: 5 Nodes
all:
  children:
    pg-citus0:                          # citus coordinator, pg_group = 0
      hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus0 , pg_group: 0 }
    pg-citus1:                          # citus data node 1
      hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2:                          # citus data node 2
      hosts: { 10.10.10.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3:                          # citus data node 3, with an extra replica
      hosts:
        10.10.10.13: { pg_seq: 1, pg_role: primary }
        10.10.10.14: { pg_seq: 2, pg_role: replica }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }
  vars:                                 # global parameters for all citus clusters
    pg_mode: citus                      # pgsql cluster mode: citus
    pg_shard: pg-citus                  # citus shard name: pg-citus
    patroni_citus_db: meta              # citus distributed database name
    pg_dbsu_password: DBUser.Postgres   # all dbsu password access for citus cluster
    pg_libs: 'citus, timescaledb, pg_stat_statements, auto_explain' # citus will be added by patroni automatically
    pg_extensions:
      - postgis34_${ pg_version }* timescaledb-2-postgresql-${ pg_version }* pgvector_${ pg_version }* citus_${ pg_version }*
    pg_users: [ { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: meta ,extensions: [ { name: citus }, { name: postgis }, { name: timescaledb } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all ,addr: 127.0.0.1/32 ,auth: ssl ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet' }
Etcd Cluster
etcd:                                   # dcs service for postgres/patroni ha consensus
  hosts:                                # 1 node for testing, 3 or 5 for production
    10.10.10.10: { etcd_seq: 1 }        # etcd_seq required
    10.10.10.11: { etcd_seq: 2 }        # assign from 1 ~ n
    10.10.10.12: { etcd_seq: 3 }        # odd number please
  vars:                                 # cluster level parameter override roles/etcd
    etcd_cluster: etcd                  # mark etcd cluster name etcd
    etcd_safeguard: false               # safeguard against purging
    etcd_clean: true                    # purge etcd during init process
MinIO Cluster
Example: Minio 3 Node Deployment
minio:
  hosts:
    10.10.10.10: { minio_seq: 1 }
    10.10.10.11: { minio_seq: 2 }
    10.10.10.12: { minio_seq: 3 }
  vars:
    minio_cluster: minio
    minio_data: '/data{1...2}'          # use two disk per node
    minio_node: '${minio_cluster}-${minio_seq}.pigsty' # minio node name pattern
    haproxy_services:
      - name: minio                     # [REQUIRED] service name, unique
        port: 9002                      # [REQUIRED] service port, unique
        options:
          - option httpchk
          - option http-keep-alive
          - http-check send meth OPTIONS uri /minio/health/live
          - http-check expect status 200
        servers:
          - { name: minio-1 ,ip: 10.10.10.10 ,port: 9000 ,options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
          - { name: minio-2 ,ip: 10.10.10.11 ,port: 9000 ,options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
          - { name: minio-3 ,ip: 10.10.10.12 ,port: 9000 ,options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
3.6 - High Availability
Pigsty uses Patroni to achieve high availability for PostgreSQL, ensuring automatic failover.
Pigsty’s PostgreSQL clusters come with batteries-included high availability, powered by Patroni, Etcd, and HAProxy.
When you have two or more instances in a PostgreSQL cluster, it can self-heal from hardware failures without any further configuration: as long as any instance within the cluster survives, the cluster keeps serving. Clients simply need to connect to any node in the cluster to obtain full service, without worrying about replication topology changes.
By default, the recovery time objective (RTO) for primary failure is approximately 30s ~ 60s, and the recovery point objective (RPO) is < 1MB; for standby failure, RPO = 0 and RTO ≈ 0 (instantaneous). In consistency-first mode, zero data loss during failover is guaranteed: RPO = 0. These metrics can be tuned based on your actual hardware conditions and reliability requirements.
Pigsty incorporates an HAProxy load balancer for automatic traffic switching and offers multiple access methods for clients, such as DNS/VIP/LVS. Failovers and switchovers are almost imperceptible to the business side apart from sporadic interruptions: applications do not need connection string changes or restarts.
What problems does High-Availability solve?
Elevates the Availability aspect of data safety (C/I/A) to a new level: RPO ≈ 0, RTO < 30s.
Enables seamless rolling maintenance capabilities, minimizing maintenance window requirements for great convenience.
Hardware failures can self-heal immediately without human intervention, allowing operations DBAs to sleep soundly.
Standbys can carry read-only requests, sharing the load with the primary to make full use of resources.
What are the costs of High Availability?
Infrastructure dependency: High availability relies on DCS (etcd/zk/consul) for consensus.
Increased entry barrier: A meaningful high-availability deployment environment requires at least three nodes.
Additional resource consumption: Each new standby consumes additional resources, which isn’t a major issue.
Since replication is real-time, all changes are immediately applied to the standby. Thus, high-availability solutions based on streaming replication cannot address human errors and software defects that cause data deletions or modifications. (e.g., DROP TABLE, or DELETE data)
Such failures require the use of Delayed Clusters or Point-In-Time Recovery using previous base backups and WAL archives.
| Strategy | RTO (Time to Recover) | RPO (Max Data Loss) |
|----------|-----------------------|---------------------|
| Standalone + Do Nothing | Permanent data loss, irrecoverable | Total data loss |
| Standalone + Basic Backup | Depends on backup size and bandwidth (hours) | Loss of data since last backup (hours to days) |
| Standalone + Basic Backup + WAL Archiving | Depends on backup size and bandwidth (hours) | Loss of last unarchived data (tens of MB) |
| Primary-Replica + Manual Failover | Dozens of minutes | Replication lag (about 100KB) |
| Primary-Replica + Auto Failover | Within a minute | Replication lag (about 100KB) |
| Primary-Replica + Auto Failover + Synchronous Commit | Within a minute | No data loss |
Implementation
In Pigsty, the high-availability architecture works as follows:
PostgreSQL uses standard streaming replication to set up physical standby databases. In case of a primary database failure, the standby takes over.
Patroni is responsible for managing PostgreSQL server processes and handles high-availability-related matters.
Etcd provides Distributed Configuration Store (DCS) capabilities and is used for leader election after a failure.
Patroni relies on Etcd to reach a consensus on cluster leadership and offers a health check interface to the outside.
HAProxy exposes cluster services externally and utilizes the Patroni health check interface to automatically route traffic to healthy nodes.
vip-manager offers an optional layer 2 VIP, retrieves leader information from Etcd, and binds the VIP to the node hosting the primary database.
Upon primary database failure, a new round of leader election is triggered. The healthiest standby in the cluster (with the highest LSN and least data loss) wins and is promoted to the new primary. After the promotion of the winning standby, read-write traffic is immediately routed to the new primary.
The impact of a primary failure is temporary unavailability of write services: from the primary’s failure to the promotion of a new primary, write requests will be blocked or directly fail, typically lasting 15 to 30 seconds, usually not exceeding 1 minute.
When a standby fails, read-only traffic is routed to other standbys. If all standbys fail, the primary will eventually carry the read-only traffic.
The impact of a standby failure is partial read-only query interruption: queries currently running on the failed standby will be aborted due to connection reset and immediately taken over by another available standby.
Failure detection is performed jointly by Patroni and Etcd. The cluster leader holds a lease;
if the leader fails to renew the lease in time (10s by default) due to a failure, the lease is released, triggering a failover and a new round of leader election.
Even without any failures, you can still proactively perform a Switchover to change the primary of the cluster. In this case, write queries on the primary will be interrupted and immediately routed to the new primary for execution. This operation can typically be used for rolling maintenance/upgrades of the database server.
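As a reference, a proactive switchover can be initiated through Patroni's own command-line tool. The sketch below assumes patronictl is available on a cluster node; the cluster name pg-test and the config file path are placeholders, not Pigsty-specific defaults:

patronictl -c /etc/patroni/patroni.yml list pg-test        # inspect cluster members and the current leader
patronictl -c /etc/patroni/patroni.yml switchover pg-test  # pick a candidate interactively and perform the switchover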
Trade Offs
The TTL can be tuned with pg_rto, which is 30s by default: increasing it leads to longer failover wait times, while decreasing it increases the false-positive failover rate (e.g., due to network jitter).
Pigsty uses availability-first mode by default, which means that when the primary fails, it fails over as soon as possible; data not yet replicated to the replica may be lost (usually around 100KB), and the maximum potential data loss is controlled by pg_rpo, which is 1MB by default.
Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are two parameters that need careful consideration when designing a high-availability cluster.
The default values of RTO and RPO used by Pigsty meet the reliability requirements for most scenarios. You can adjust them based on your hardware level, network quality, and business needs.
Smaller RTO and RPO are not always better!
A smaller RTO increases the likelihood of false positives, and a smaller RPO reduces the probability of successful automatic failovers.
The maximum duration of unavailability during a failover is controlled by the pg_rto parameter, with a default value of 30s. Increasing it will lead to a longer duration of unavailability for write operations during primary failover, while decreasing it will increase the rate of false failovers (e.g., due to brief network jitters).
The upper limit of potential data loss is controlled by the pg_rpo parameter, defaulting to 1MB. Lowering this value can reduce the upper limit of data loss during failovers but also increases the likelihood of refusing automatic failovers due to insufficiently healthy standbys (too far behind).
Pigsty defaults to an availability-first mode, meaning that it will proceed with a failover as quickly as possible when the primary fails, and data not yet replicated to the standby might be lost (under regular ten-gigabit networks, replication delay is usually between a few KB to 100KB).
If you need to ensure zero data loss during failover, you can use the crit.yml template, at the cost of some performance.
pg_rto sets the recovery time objective in seconds. It is used as the Patroni TTL value, which is 30s by default.
If a primary instance is missing for such a long time, a new leader election will be triggered.
Decreasing the value reduces the time the cluster is unavailable for writes during failover,
but it makes the cluster more sensitive to network jitter, thus increasing the chance of false-positive failover.
Configure it according to your network conditions and expectations to trade off between the chance and the impact of failover;
the default value is 30s, and it is populated into the following Patroni parameters:
# the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
ttl: {{ pg_rto }}

# the number of seconds the loop will sleep. Default value: 10 , this is patroni check loop interval
loop_wait: {{ (pg_rto / 3)|round(0, 'ceil')|int }}

# timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
retry_timeout: {{ (pg_rto / 3)|round(0, 'ceil')|int }}

# the amount of time a primary is allowed to recover from failures before failover is triggered (in seconds), Max RTO: 2 loop wait + primary_start_timeout
primary_start_timeout: {{ (pg_rto / 3)|round(0, 'ceil')|int }}
pg_rpo sets the recovery point objective in bytes, 1MiB by default.
The default value, 1048576, tolerates at most 1MiB of data loss during failover.
When the primary is down and all replicas are lagging, you have to make a tough choice between Availability and Consistency:
Promote a replica to be the new primary and bring the system back online ASAP, at the price of acceptable data loss (e.g., less than 1MB).
Wait for the primary to come back (which may never happen) or for human intervention, to avoid any data loss.
You can use the crit.yml template to ensure no data loss during failover, but it will sacrifice some performance.
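To make the trade-off concrete, the relevant knobs can be set per cluster in the inventory. The sketch below simply restates the documented defaults on a hypothetical pg-test cluster and switches it to the consistency-first template:

pg-test:
  hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-test
    pg_conf: crit.yml        # consistency-first template: RPO = 0 during failover
    pg_rto: 30               # Patroni TTL in seconds: larger = fewer false failovers, longer write outage
    pg_rpo: 1048576          # max tolerated replication lag in bytes before auto failover is refused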
3.7 - Point-in-Time Recovery
Pigsty utilizes pgBackRest for PostgreSQL point-in-time recovery, allowing users to roll back to any point within the backup policy limits.
Overview
You can roll back your cluster to any point in time, avoiding data loss caused by software defects and human errors.
Pigsty’s PostgreSQL clusters come with an automatically configured Point in Time Recovery (PITR) solution, based on the backup component pgBackRest and the optional object storage repository MinIO.
High Availability solutions can address hardware failures, but they are powerless against data loss caused by software defects and human errors, such as accidental deletion or overwriting of data. For such scenarios, Pigsty offers an out-of-the-box Point in Time Recovery (PITR) capability, enabled by default without any additional configuration.
Pigsty provides you with the default configuration for base backups and WAL archiving, allowing you to use local directories and disks, or dedicated MinIO clusters or S3 object storage services to store backups and achieve off-site disaster recovery. When using local disks, by default, you retain the ability to recover to any point in time within the past day. When using MinIO or S3, by default, you retain the ability to recover to any point in time within the past week. As long as storage space permits, you can keep a recoverable time span as long as desired, based on your budget.
What problems does Point in Time Recovery (PITR) solve?
Enhanced disaster recovery capability: RPO reduces from ∞ to a few MBs, RTO from ∞ to a few hours/minutes.
Ensures data security: Data Integrity among C/I/A: avoids data consistency issues caused by accidental deletions.
Ensures data security: Data Availability among C/I/A: provides a safety net for “permanently unavailable” disasters.
| Singleton Strategy | Event | RTO | RPO |
|--------------------|-------|-----|-----|
| Do nothing | Crash | Permanently lost | All lost |
| Basic backup | Crash | Depends on backup size and bandwidth (a few hours) | Loss of data after the last backup (a few hours to days) |
| Basic backup + WAL archiving | Crash | Depends on backup size and bandwidth (a few hours) | Loss of data not yet archived (a few dozen MBs) |
What are the costs of Point in Time Recovery?
Reduces Confidentiality among C/I/A: backups create additional leakage points and require extra protection.
Additional resource consumption: local storage or network traffic/bandwidth costs, usually not a problem.
Increased complexity cost: users need to invest in backup management.
Limitations of Point in Time Recovery
If PITR is the only method for fault recovery, the RTO and RPO metrics are inferior compared to High Availability solutions, and it’s usually best to use both in combination.
RTO: With only a single machine + PITR, recovery time depends on backup size and network/disk bandwidth, ranging from tens of minutes to several hours or days.
RPO: With only a single machine + PITR, a crash might result in the loss of a small amount of data, as one or several WAL log segments might not yet be archived, losing between 16 MB to several dozen MBs of data.
Apart from PITR, you can also use Delayed Clusters in Pigsty to address data deletion or alteration issues caused by human errors or software defects.
How does PITR work?
Point in Time Recovery allows you to roll back your cluster to any “specific moment” in the past, avoiding data loss caused by software defects and human errors. To achieve this, two key preparations are necessary: Base Backups and WAL Archiving. Having a Base Backup allows users to restore the database to the state at the time of the backup, while having WAL Archiving from a certain base backup enables users to restore the database to any point in time after the base backup.
Pigsty uses pgBackRest to manage PostgreSQL backups. pgBackRest will initialize an empty repository on all cluster instances, but it will only use the repository on the primary instance.
pgBackRest supports three backup modes: Full Backup, Incremental Backup, and Differential Backup, with the first two being the most commonly used. A Full Backup takes a complete physical snapshot of the database cluster at a current moment, while an Incremental Backup records the differences between the current database cluster and the last full backup.
Pigsty provides a wrapper command for backups: /pg/bin/pg-backup [full|incr]. You can make base backups periodically as needed through Crontab or any other task scheduling system.
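For example, the Monday-full-plus-daily-incremental schedule used by the MinIO/S3 strategy described under Implementation can be expressed with node_crontab on the target cluster. This is a sketch; adjust the timing and retention to your own policy:

node_crontab:                                           # crontab entries run as the postgres os user
  - '00 01 * * 1 postgres /pg/bin/pg-backup full'       # full backup at 01:00 every Monday
  - '00 01 * * 2,3,4,5,6,7 postgres /pg/bin/pg-backup'  # incremental backup on the other days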
WAL Archiving
By default, Pigsty enables WAL archiving on the primary instance of the cluster and continuously pushes WAL segment files to the backup repository using the pgbackrest command-line tool.
pgBackRest automatically manages the required WAL files and promptly cleans up expired backups and their corresponding WAL archive files according to the backup retention policy.
If you do not need PITR functionality, you can disable WAL archiving by configuring the cluster: archive_mode: off, and remove node_crontab to stop periodic backup tasks.
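A minimal sketch of such an opt-out is shown below. Note that the pg_parameters override used here for archive_mode is an assumption about how PostgreSQL parameters are overridden per cluster, so verify the mechanism against your Pigsty version before applying it:

pg-test:
  vars:
    pg_cluster: pg-test
    node_crontab: []          # drop the periodic pg-backup jobs
    pg_parameters:            # assumed hook for ad-hoc PostgreSQL parameter overrides; verify against your version
      archive_mode: 'off'     # stop pushing WAL segments to the pgbackrest repo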
Implementation
By default, Pigsty provides two preset backup strategies. The default strategy uses a local filesystem backup repository, taking a full backup every day so that you can roll back to any point within the last day. The alternative strategy uses a dedicated MinIO cluster or S3 storage, taking a full backup on Monday and incremental backups on the other days, keeping two weeks of backups and WAL archives by default.
Pigsty uses pgBackRest to manage backups, receive WAL archives, and perform PITR. The backup repository can be flexibly configured (pgbackrest_repo): by default, it uses the local filesystem (local) of the primary instance, but it can also use other disk paths, or the optional MinIO service (minio) and cloud-based S3 services.
pgbackrest_repo:                        # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
  local:                                # default pgbackrest repo with local posix fs
    path: /pg/backup                    # local backup directory, `/pg/backup` by default
    retention_full_type: count          # retention full backups by count
    retention_full: 2                   # keep 2, at most 3 full backup when using local fs repo
  minio:                                # optional minio repo for pgbackrest
    type: s3                            # minio is s3-compatible, so s3 is used
    s3_endpoint: sss.pigsty             # minio endpoint domain name, `sss.pigsty` by default
    s3_region: us-east-1                # minio region, us-east-1 by default, useless for minio
    s3_bucket: pgsql                    # minio bucket name, `pgsql` by default
    s3_key: pgbackrest                  # minio user access key for pgbackrest
    s3_key_secret: S3User.Backup        # minio user secret key for pgbackrest
    s3_uri_style: path                  # use path style uri for minio rather than host style
    path: /pgbackrest                   # minio backup path, default is `/pgbackrest`
    storage_port: 9000                  # minio port, 9000 by default
    storage_ca_file: /etc/pki/ca.crt    # minio ca file path, `/etc/pki/ca.crt` by default
    bundle: y                           # bundle small files into a single file
    cipher_type: aes-256-cbc            # enable AES encryption for remote backup repo
    cipher_pass: pgBackRest             # AES encryption password, default is 'pgBackRest'
    retention_full_type: time           # retention full backup by time on minio repo
    retention_full: 14                  # keep full backup for last 14 days
Pigsty has two built-in backup options: local file system repository with daily full backups or dedicated MinIO/S3 storage with weekly full and daily incremental backups, retaining two weeks’ worth by default.
The target repositories in Pigsty parameter pgbackrest_repo are translated into repository definitions in the /etc/pgbackrest/pgbackrest.conf configuration file.
For example, if you define a West US region S3 repository for cold backups, you could use the following reference configuration.
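The repository definition below is a sketch only: the endpoint, region, bucket, and credentials are hypothetical placeholders, and the field names mirror the minio repository shown above.

pgbackrest_repo:
  s3west:                               # hypothetical cold-backup repo on S3, us-west-1
    type: s3                            # s3-compatible object storage
    s3_endpoint: s3.us-west-1.amazonaws.com
    s3_region: us-west-1
    s3_bucket: pgsql-cold-backup        # placeholder bucket name
    s3_key: <your-access-key>           # placeholder credentials
    s3_key_secret: <your-secret-key>
    s3_uri_style: host                  # host-style uri for AWS S3
    path: /pgbackrest
    bundle: y                           # bundle small files into a single file
    cipher_type: aes-256-cbc            # encrypt backups before they leave the node
    cipher_pass: <your-cipher-pass>
    retention_full_type: time           # keep full backups by time
    retention_full: 14                  # keep full backups for the last 14 days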
You can use the following encapsulated commands for Point in Time Recovery of the PostgreSQL database cluster.
By default, Pigsty uses incremental, differential, parallel recovery, allowing you to restore to a specified point in time as quickly as possible.
pg-pitr # restore to wal archive stream end (e.g. used in case of entire DC failure)
pg-pitr -i # restore to the time of latest backup complete (not often used)
pg-pitr --time="2022-12-30 14:44:44+08" # restore to specific time point (in case of drop db, drop table)
pg-pitr --name="my-restore-point" # restore TO a named restore point created by pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X # restore right BEFORE an LSN
pg-pitr --xid="1234567" -X -P # restore right BEFORE a specific transaction id, then promote
pg-pitr --backup=latest # restore to latest backup set
pg-pitr --backup=20221108-105325 # restore to a specific backup set, which can be checked with pgbackrest info
pg-pitr # pgbackrest --stanza=pg-meta restore
pg-pitr -i # pgbackrest --stanza=pg-meta --type=immediate restore
pg-pitr -t "2022-12-30 14:44:44+08" # pgbackrest --stanza=pg-meta --type=time --target="2022-12-30 14:44:44+08" restore
pg-pitr -n "my-restore-point" # pgbackrest --stanza=pg-meta --type=name --target=my-restore-point restore
pg-pitr -b 20221108-105325F # pgbackrest --stanza=pg-meta --set=20221108-105325F restore
pg-pitr -l "0/7C82CB8" -X # pgbackrest --stanza=pg-meta --type=lsn --target="0/7C82CB8" --target-exclusive restore
pg-pitr -x 1234567 -X -P # pgbackrest --stanza=pg-meta --type=xid --target="1234567" --target-exclusive --target-action=promote restore
During PITR, you can observe the LSN point status of the cluster using the Pigsty monitoring system to determine if it has successfully restored to the specified time point, transaction point, LSN point, or other points.
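Besides the dashboards, standard PostgreSQL functions can be used to check replay progress directly on the recovering instance, for example:

SELECT pg_is_in_recovery();              -- true while the instance is still replaying WAL
SELECT pg_last_wal_replay_lsn();         -- the LSN replayed so far
SELECT pg_last_xact_replay_timestamp();  -- commit time of the last replayed transaction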
3.8 - Services & Access
Pigsty employs HAProxy for service access, offering optional pgBouncer for connection pooling, and optional L2 VIP and DNS access.
Split read & write, route traffic to the right place, and achieve stable & reliable access to the PostgreSQL cluster.
A service is an abstraction that hides the details of the underlying cluster, especially during cluster failover or switchover.
Personal User
Services are mostly meaningless to personal users. You can access the database with the raw IP address or whatever method you like.
psql postgres://dbuser_dba:DBUser.DBA@10.10.10.10/meta     # dbsu direct connect
psql postgres://dbuser_meta:DBUser.Meta@10.10.10.10/meta   # default business admin user
psql postgres://dbuser_view:DBUser.Viewer@pg-meta/meta     # default read-only user
Service Overview
We utilize a PostgreSQL database cluster based on replication in real-world production environments. Within the cluster, only one instance is the leader (primary) that can accept writes. Other instances (replicas) continuously fetch WAL from the leader to stay synchronized. Additionally, replicas can handle read-only queries and offload the primary in read-heavy, write-light scenarios. Thus, distinguishing between write and read-only requests is a common practice.
Moreover, we pool requests through a connection pooling middleware (Pgbouncer) for high-frequency, short-lived connections to reduce the overhead of connection and backend process creation. And, for scenarios like ETL and change execution, we need to bypass the connection pool and directly access the database servers.
Furthermore, high-availability clusters may undergo failover during failures, causing a change in the cluster leadership. Therefore, the RW requests should be re-routed automatically to the new leader.
These varied requirements (read-write separation, pooling vs. direct connection, and client request failover) have led to the abstraction of the service concept.
Typically, a database cluster must provide this basic service:
Read-write service (primary): Can read and write to the database.
For production database clusters, at least these two services should be provided:
Read-write service (primary): Write data: Only carried by the primary.
Read-only service (replica): Read data: Can be carried by replicas, but fallback to the primary if no replicas are available.
Additionally, there might be other services, such as:
Direct access service (default): Allows (admin) users to bypass the connection pool and directly access the database.
Offline replica service (offline): A dedicated replica that doesn’t handle online read traffic, used for ETL and analytical queries.
Synchronous replica service (standby): A read-only service with no replication delay, handled by synchronous standby/primary for read queries.
Delayed replica service (delayed): Accesses older data from the same cluster from a certain time ago, handled by delayed replicas.
Default Services
Pigsty will enable four default services for each PostgreSQL cluster:
Take the default pg-meta cluster as an example, you can access these services in the following ways:
psql postgres://dbuser_meta:DBUser.Meta@pg-meta:5433/meta   # pg-meta-primary : production read/write via primary pgbouncer(6432)
psql postgres://dbuser_meta:DBUser.Meta@pg-meta:5434/meta   # pg-meta-replica : production read-only via replica pgbouncer(6432)
psql postgres://dbuser_dba:DBUser.DBA@pg-meta:5436/meta     # pg-meta-default : direct connect primary via primary postgres(5432)
psql postgres://dbuser_stats:DBUser.Stats@pg-meta:5438/meta # pg-meta-offline : direct connect offline via offline postgres(5432)
Here the pg-meta domain name points to the cluster’s L2 VIP, which in turn points to the haproxy load balancer on the primary instance. HAProxy is responsible for routing traffic to different instances; check Access Services for details.
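If you are not running DNS, a static hosts entry mapping the cluster name to the L2 VIP is enough for the examples above to resolve. The sketch below assumes the 10.10.10.2 VIP configured earlier; the entry can also be distributed with the node_etc_hosts parameter:

# /etc/hosts on the client (or via node_etc_hosts in pigsty.yml)
10.10.10.2  pg-meta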
Primary Service
The primary service may be the most critical service for production usage.
It means all cluster members are included in the primary service (selector: "[]"), but only the one instance that passes the health check (check: /primary) is used as the primary instance.
Patroni will guarantee that only one instance is primary at any time, so the primary service will always route traffic to THE primary instance.
Example: pg-test-primary haproxy config
listen pg-test-primary
    bind *:5433
    mode tcp
    maxconn 5000
    balance roundrobin
    option httpchk
    option http-keep-alive
    http-check send meth OPTIONS uri /primary
    http-check expect status 200
    default-server inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100
    # servers
    server pg-test-1 10.10.10.11:6432 check port 8008 weight 100
    server pg-test-3 10.10.10.13:6432 check port 8008 weight 100
    server pg-test-2 10.10.10.12:6432 check port 8008 weight 100
Replica Service
The replica service is used for production read-only traffic.
There may be many more read-only queries than read-write queries in real-world scenarios. You may have many replicas.
The replica service traffic will try to use common pg instances with pg_role = replica to alleviate the load on the primary instance as much as possible. It will try NOT to use instances with pg_role = offline to avoid mixing OLAP & OLTP queries as much as possible.
All cluster members will be included in the replica service (selector: "[]") when it passes the read-only health check (check: /read-only).
Primary and offline instances are used as backup servers, which will take over in case all replica instances are down.
Example: pg-test-replica haproxy config
listen pg-test-replica
    bind *:5434
    mode tcp
    maxconn 5000
    balance roundrobin
    option httpchk
    option http-keep-alive
    http-check send meth OPTIONS uri /read-only
    http-check expect status 200
    default-server inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100
    # servers
    server pg-test-1 10.10.10.11:6432 check port 8008 weight 100 backup
    server pg-test-3 10.10.10.13:6432 check port 8008 weight 100
    server pg-test-2 10.10.10.12:6432 check port 8008 weight 100
Default Service
The default service will route to primary postgres (5432) by default.
It is quite like the primary service, except it will always bypass pgbouncer, regardless of pg_default_service_dest.
This is useful for administration connections, ETL writes, CDC (change data capture), and so on.
!> Remember to change these passwords in production deployments!
pg_dbsu: postgres                           # os user for the database
pg_replication_username: replicator         # system replication user
pg_replication_password: DBUser.Replicator  # system replication password
pg_monitor_username: dbuser_monitor         # system monitor user
pg_monitor_password: DBUser.Monitor         # system monitor password
pg_admin_username: dbuser_dba               # system admin user
pg_admin_password: DBUser.DBA               # system admin password
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE ON SCHEMAS TO dbrole_offline
- GRANT SELECT ON TABLES TO dbrole_offline
- GRANT SELECT ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE ON FUNCTIONS TO dbrole_offline
- GRANT INSERT ON TABLES TO dbrole_readwrite
- GRANT UPDATE ON TABLES TO dbrole_readwrite
- GRANT DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE ON SEQUENCES TO dbrole_readwrite
- GRANT UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE ON TABLES TO dbrole_admin
- GRANT REFERENCES ON TABLES TO dbrole_admin
- GRANT TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
Newly created objects will have the corresponding privileges when they are created by admin users.
The output of \ddp+ may look like this:
Type     | Access privileges
---------+----------------------
function | =X
         | dbrole_readonly=X
         | dbrole_offline=X
         | dbrole_admin=X
schema   | dbrole_readonly=U
         | dbrole_offline=U
         | dbrole_admin=UC
sequence | dbrole_readonly=r
         | dbrole_offline=r
         | dbrole_readwrite=wU
         | dbrole_admin=rwU
table    | dbrole_readonly=r
         | dbrole_offline=r
         | dbrole_readwrite=awd
         | dbrole_admin=arwdDxt
Default Privilege
ALTER DEFAULT PRIVILEGES allows you to set the privileges that will be applied to objects created in the future.
It does not affect privileges assigned to already-existing objects, nor objects created by non-admin users.
Pigsty will use the following default privileges:
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_dbsu }} {{ priv }};
{% endfor %}

{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_admin_username }} {{ priv }};
{% endfor %}

-- for additional business admin, they can SET ROLE to dbrole_admin
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" {{ priv }};
{% endfor %}
These statements will be rendered in pg-init-template.sql along with the ALTER DEFAULT PRIVILEGES statements for admin users.
These SQL commands are executed on postgres & template1 during cluster bootstrap, and newly created databases will inherit them from template1 by default.
That is to say, to maintain the correct object privilege, you have to run DDL with admin users, which could be:
It’s wise to use postgres as global object owner to perform DDL changes.
If you wish to create objects with business admin user, YOU MUST USE SET ROLE dbrole_admin before running that DDL to maintain the correct privileges.
You can also run ALTER DEFAULT PRIVILEGES FOR ROLE <some_biz_admin> ... to grant default privileges to a business admin user, too, as shown in the sketch below.
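A minimal SQL sketch of both approaches, using the dbuser_meta business admin defined earlier and a hypothetical table name:

-- run DDL as a business admin: switch to dbrole_admin first so the default privileges apply
SET ROLE dbrole_admin;
CREATE TABLE app_log (id bigint PRIMARY KEY, msg text);  -- hypothetical example table
RESET ROLE;

-- or grant default privileges for objects created directly by that business admin
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_meta GRANT SELECT ON TABLES TO dbrole_readonly;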
There are 3 database level privileges: CONNECT, CREATE, TEMP, and a special ‘privilege’: OWNERSHIP.
- name: meta         # required, `name` is the only mandatory field of a database definition
  owner: postgres    # optional, specify a database owner, {{ pg_dbsu }} by default
  allowconn: true    # optional, allow connection, true by default. false will disable connect at all
  revokeconn: false  # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
If owner exists, it will be used as database owner instead of default {{ pg_dbsu }}
If revokeconn is false, all users have the CONNECT privilege on the database; this is the default behavior.
If revokeconn is set to true explicitly:
CONNECT privilege of the database will be revoked from PUBLIC
CONNECT privilege will be granted to {{ pg_replication_username }}, {{ pg_monitor_username }} and {{ pg_admin_username }}
CONNECT privilege will be granted to database owner with GRANT OPTION
The revokeconn flag can be used for database access isolation: you can create a different business user as the owner of each database and set the revokeconn option for all of them, as sketched below.
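A sketch of such isolation, following the owner-per-database pattern used above (names and passwords are illustrative):

pg_users:
  - { name: dbuser_app1 , password: DBUser.App1 , pgbouncer: true , roles: [ dbrole_admin ] }
  - { name: dbuser_app2 , password: DBUser.App2 , pgbouncer: true , roles: [ dbrole_admin ] }
pg_databases:
  - { name: app1 , owner: dbuser_app1 , revokeconn: true }   # only owner, admin & monitor can connect
  - { name: app2 , owner: dbuser_app2 , revokeconn: true }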
Pigsty revokes the CREATE privilege on databases from PUBLIC by default, for security considerations.
This is also the default behavior since PostgreSQL 15.
The database owner has full capability to adjust these privileges as they see fit.
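For instance, the owner or a superuser can hand database-level privileges back out explicitly; the database and role names below are illustrative:

GRANT CONNECT ON DATABASE app1 TO dbuser_report;      -- re-grant connect to a specific role
GRANT CREATE  ON DATABASE app1 TO dbrole_admin;       -- allow admins to create schemas in this database
GRANT TEMP    ON DATABASE app1 TO dbrole_readwrite;   -- allow temporary tables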
4 - Configuration
Batteries-included config templates for specific scenarios, and detailed explanations.
meta: Default single-node installation template with extensive configuration parameter descriptions.
rich: Downloads all available PostgreSQL extensions and Docker, provisioning a series of databases to serve as backends for additional software.
pitr: A single-node configuration sample for continuous backups and Point-in-Time Recovery (PITR) using remote object storage.
demo: Configuration file used by the Pigsty demo site, configured to serve publicly with domain names and certificates.
Dual Node
dual: Dual-node template to set up a basic high-availability PostgreSQL cluster with primary-replica replication, tolerating the failure of one node.
slim: Minimal installation that avoids setting up local software repositories or infrastructure, relying only on etcd for a highly available PostgreSQL cluster.
pg16: A dual-node PostgreSQL cluster installation using PostgreSQL 16 as a substitute for the current default, PostgreSQL 17.
supa: Uses Docker Compose to start Supabase on a local primary-replica PostgreSQL cluster.
Three Node
trio: Three-node configuration template providing a standard High Availability (HA) architecture, tolerating failure of one out of three nodes.
safe: A three-node template with hardened security, following high-standard security best practices.
infra: Deploys a three-node observability infrastructure to monitor other RDS PostgreSQL instances.
minio: Installs a three-node high-availability MinIO cluster to provide S3-compatible object storage services.
Four Node
full: A four-node standard sandbox demonstration environment, featuring two PostgreSQL clusters, MinIO, Etcd, Redis, and a sample FerretDB cluster.
mssql: Replaces PostgreSQL with a Microsoft SQL Server-compatible kernel using WiltonDB or Babelfish.
polar: Substitutes native PostgreSQL with Alibaba Cloud’s PolarDB for PostgreSQL kernel.
ivory: Replaces native PostgreSQL with IvorySQL, an Oracle-compatible kernel by HighGo.
4.2 - 1-node: meta
Default single-node installation template with extensive configuration parameter descriptions.
The meta configuration template is Pigsty’s default template, designed to fulfill Pigsty’s core functionality—deploying PostgreSQL—on a single node.
To maximize compatibility, meta installs only the minimum required software set to ensure it runs across all operating system distributions and architectures.
all:

  #==============================================================#
  # Clusters, Nodes, and Modules
  #==============================================================#
  children:

    #----------------------------------#
    # infra: monitor, alert, repo, etc..
    #----------------------------------#
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }

    #----------------------------------#
    # etcd cluster for HA postgres DCS
    #----------------------------------#
    etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

    #----------------------------------#
    # minio (OPTIONAL backup repo)
    #----------------------------------#
    #minio: { hosts: { 10.10.10.10: { minio_seq: 1 } }, vars: { minio_cluster: minio } }

    #----------------------------------#
    # pgsql (singleton on current node)
    #----------------------------------#
    # this is an example single-node postgres cluster with postgis & timescaledb installed, with one biz database & two biz users
    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary } # <---- primary instance with read-write capability
        #x.xx.xx.xx: { pg_seq: 2, pg_role: replica } # <---- read only replica for read-only online traffic
        #x.xx.xx.xy: { pg_seq: 3, pg_role: offline } # <---- offline instance of ETL & interactive queries
      vars:
        pg_cluster: pg-meta             # required identity parameter, usually same as group name

        # define business databases here: https://pigsty.io/docs/pgsql/db/
        pg_databases:                   # define business databases on this cluster, array of database definition
          - name: meta                  # REQUIRED, `name` is the only mandatory field of a database definition
            baseline: cmdb.sql          # optional, database sql baseline path, (relative path among ansible search path, e.g. files/)
            schemas: [ pigsty ]         # optional, additional schemas to be created, array of schema names
            extensions:                 # optional, additional extensions to be installed: array of `{name[,schema]}`
              - { name: vector }        # install pgvector extension on this database by default
            comment: pigsty meta database # optional, comment string for this database
            #pgbouncer: true            # optional, add this database to pgbouncer database list? true by default
            #owner: postgres            # optional, database owner, postgres by default
            #template: template1        # optional, which template to use, template1 by default
            #encoding: UTF8             # optional, database encoding, UTF8 by default. (MUST same as template database)
            #locale: C                  # optional, database locale, C by default. (MUST same as template database)
            #lc_collate: C              # optional, database collate, C by default. (MUST same as template database)
            #lc_ctype: C                # optional, database ctype, C by default. (MUST same as template database)
            #tablespace: pg_default     # optional, default tablespace, 'pg_default' by default.
            #allowconn: true            # optional, allow connection, true by default. false will disable connect at all
            #revokeconn: false          # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
            #register_datasource: true  # optional, register this database to grafana datasources? true by default
            #connlimit: -1              # optional, database connection limit, default -1 disable limit
            #pool_auth_user: dbuser_meta # optional, all connection to this pgbouncer database will be authenticated by this user
            #pool_mode: transaction     # optional, pgbouncer pool mode at database level, default transaction
            #pool_size: 64              # optional, pgbouncer pool size at database level, default 64
            #pool_size_reserve: 32      # optional, pgbouncer pool size reserve at database level, default 32
            #pool_size_min: 0           # optional, pgbouncer pool size min at database level, default 0
            #pool_max_db_conn: 100      # optional, max database connections at database level, default 100
          #- { name: grafana ,owner: dbuser_grafana ,revokeconn: true ,comment: grafana primary database } # define another database

        # define business users here: https://pigsty.io/docs/pgsql/user/
        pg_users:                       # define business users/roles on this cluster, array of user definition
          - name: dbuser_meta           # REQUIRED, `name` is the only mandatory field of a user definition
            password: DBUser.Meta       # optional, password, can be a scram-sha-256 hash string or plain text
            login: true                 # optional, can log in, true by default (new biz ROLE should be false)
            superuser: false            # optional, is superuser? false by default
            createdb: false             # optional, can create database? false by default
            createrole: false           # optional, can create role? false by default
            inherit: true               # optional, can this role use inherited privileges? true by default
            replication: false          # optional, can this role do replication? false by default
            bypassrls: false            # optional, can this role bypass row level security? false by default
            pgbouncer: true             # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
            connlimit: -1               # optional, user connection limit, default -1 disable limit
            expire_in: 3650             # optional, now + n days when this role is expired (OVERWRITE expire_at)
            expire_at: '2030-12-31'     # optional, YYYY-MM-DD 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
            comment: pigsty admin user  # optional, comment string for this user/role
            roles: [ dbrole_admin ]     # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
            parameters: {}              # optional, role level parameters with `ALTER ROLE SET`
            pool_mode: transaction      # optional, pgbouncer pool mode at user level, transaction by default
            pool_connlimit: -1          # optional, max database connections at user level, default -1 disable limit
          - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [ dbrole_readonly ] ,comment: read-only viewer for meta database }

        # define pg extensions: https://pigsty.io/docs/pgext/ , and available alias: https://ext.pigsty.io/#/list
        pg_libs: 'pg_stat_statements, auto_explain' # add timescaledb to shared_preload_libraries
        pg_extensions: [ pgvector ]

        # define HBA rules here: https://pigsty.io/docs/pgsql/hba/#define-hba
        pg_hba_rules:                   # example hba rules
          - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }

        #pg_vip_enabled: true           # define a L2 VIP which bind to cluster primary instance
        #pg_vip_address: 10.10.10.2/24  # L2 VIP Address and netmask
        #pg_vip_interface: eth1         # L2 VIP Network interface, overwrite on host vars if member have different network interface names
        node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # make a full backup every 1am

    #----------------------------------#
    # example cluster (3-node pg-test)
    #----------------------------------#
    #pg-test:                           # define the new 3-node cluster pg-test
    #  hosts:
    #    10.10.10.11: { pg_seq: 1, pg_role: primary }  # primary instance, leader of cluster
    #    10.10.10.12: { pg_seq: 2, pg_role: replica }  # replica instance, follower of leader
    #    10.10.10.13: { pg_seq: 3, pg_role: replica, pg_offline_query: true } # replica with offline access
    #  vars:
    #    pg_cluster: pg-test            # define pgsql cluster name
    #    pg_users: [{ name: test , password: test , pgbouncer: true , roles: [ dbrole_admin ] }]
    #    pg_databases: [{ name: test }] # create a database and user named 'test'
    #    node_tune: tiny
    #    pg_conf: tiny.yml
    #    pg_vip_enabled: true
    #    pg_vip_address: 10.10.10.3/24
    #    pg_vip_interface: eth1
    #    node_crontab:                  # make a full backup on monday 1am, and an incremental backup during weekdays
    #      - '00 01 * * 1 postgres /pg/bin/pg-backup full'
    #      - '00 01 * * 2,3,4,5,6,7 postgres /pg/bin/pg-backup'

  #==============================================================#
  # Global Parameters
  #==============================================================#
  vars:

    #----------------------------------#
    # Meta Data
    #----------------------------------#
    version: v3.1.0                     # pigsty version string
    admin_ip: 10.10.10.10               # admin node ip address
    region: default                     # upstream mirror region: default|china|europe
    node_tune: tiny                     # node tuning specs: oltp,olap,tiny,crit
    pg_conf: tiny.yml                   # pgsql tuning specs: {oltp,olap,tiny,crit}.yml
    proxy_env:                          # global proxy env when downloading packages
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
      # http_proxy:  # set your proxy here: e.g http://user:pass@proxy.xxx.com
      # https_proxy: # set your proxy here: e.g http://user:pass@proxy.xxx.com
      # all_proxy:   # set your proxy here: e.g http://user:pass@proxy.xxx.com
    infra_portal:                       # domain names and upstream servers
      home         : { domain: h.pigsty }
      grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
      prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
      alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
      blackbox     : { endpoint: "${admin_ip}:9115" }
      loki         : { endpoint: "${admin_ip}:3100" }
      #minio       : { domain: sss.pigsty ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }

    #----------------------------------#
    # MinIO Related Options
    #----------------------------------#
    #pgbackrest_method: minio           # if you want to use minio as backup repo instead of 'local' fs, uncomment this
    #minio_users:                       # and configure `pgbackrest_repo` & `minio_users` accordingly
    #  - { access_key: dba , secret_key: S3User.DBA, policy: consoleAdmin }
    #  - { access_key: pgbackrest , secret_key: S3User.Backup, policy: readwrite }
    #pgbackrest_repo:                   # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
    #  minio: ...                       # optional minio repo for pgbackrest ...
    #    s3_key: pgbackrest             # minio user access key for pgbackrest
    #    s3_key_secret: S3User.Backup   # minio user secret key for pgbackrest
    #    cipher_pass: pgBackRest        # AES encryption password, default is 'pgBackRest'
    # if you want to use minio as backup repo instead of 'local' fs, uncomment this, and configure `pgbackrest_repo`
    #pgbackrest_method: minio
    #node_etc_hosts: [ '10.10.10.10 h.pigsty a.pigsty p.pigsty g.pigsty sss.pigsty' ]

    #----------------------------------#
    # Credential: CHANGE THESE PASSWORDS
    #----------------------------------#
    #grafana_admin_username: admin
    grafana_admin_password: pigsty
    #pg_admin_username: dbuser_dba
    pg_admin_password: DBUser.DBA
    #pg_monitor_username: dbuser_monitor
    pg_monitor_password: DBUser.Monitor
    #pg_replication_username: replicator
    pg_replication_password: DBUser.Replicator
    #patroni_username: postgres
    patroni_password: Patroni.API
    #haproxy_admin_username: admin
    haproxy_admin_password: pigsty

    #----------------------------------#
    # Safe Guard
    #----------------------------------#
    # you can enable these flags after bootstrap, to prevent purging running etcd / pgsql instances
    etcd_safeguard: false               # prevent purging running etcd instance?
    pg_safeguard: false                 # prevent purging running postgres instance? false by default

    #----------------------------------#
    # Repo, Node, Packages
    #----------------------------------#
    # if you wish to customize your own repo, change these settings:
    repo_modules: infra,node,pgsql      # install upstream repo during repo bootstrap
    repo_remove: true                   # remove existing repo on admin node during repo bootstrap
    node_repo_modules: local            # install the local module in repo_upstream for all nodes
    node_repo_remove: true              # remove existing node repo for node managed by pigsty
    repo_packages: [                    # default packages to be downloaded (if `repo_packages` is not explicitly set)
      node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-common #,docker
    ]
    repo_extra_packages: [              # default postgres packages to be downloaded
      pgsql-main
      #,pgsql-core,pgsql-time,pgsql-gis,pgsql-rag,pgsql-fts,pgsql-olap,pgsql-feat,pgsql-lang,pgsql-type,pgsql-func,pgsql-admin,pgsql-stat,pgsql-sec,pgsql-fdw,pgsql-sim,pgsql-etl,
      #,pg17-core,pg17-time,pg17-gis,pg17-rag,pg17-fts,pg17-olap,pg17-feat,pg17-lang,pg17-type,pg17-func,pg17-admin,pg17-stat,pg17-sec,pg17-fdw,pg17-sim,pg17-etl,
      #,pg16-core,pg16-time,pg16-gis,pg16-rag,pg16-fts,pg16-olap,pg16-feat,pg16-lang,pg16-type,pg16-func,pg16-admin,pg16-stat,pg16-sec,pg16-fdw,pg16-sim,pg16-etl,
    ]
Caveats
To maintain compatibility across operating systems and architectures, the meta template installs only the minimum required software. This is reflected in the choices for repo_packages and repo_extra_packages:
Docker is not downloaded by default.
Only essential PostgreSQL extensions—pg_repack, wal2json, and pgvector—are downloaded by default.
Tools classified under pgsql-utility but not part of pgsql-common—such as pg_activity, pg_timetable, pgFormatter, pg_filedump, pgxnclient, timescaledb-tools, pgcopydb, and pgloader—are not downloaded; see the sketch below for adding packages back.
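If you do need those components, you can add them back before running install.yml by extending the package lists. The sketch below simply re-enables aliases that already appear (commented out) in the template above, so verify the alias names against your Pigsty version:

repo_packages: [ node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-common, docker ]
repo_extra_packages: [ pgsql-main, pgsql-core, pgsql-time, pgsql-gis, pgsql-olap ]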
4.3 - 1-node: rich
Downloads all available PostgreSQL extensions and Docker, provisioning a series of databases to serve as backends for additional software.
The rich configuration template is designed for single-node deployments. Built upon meta, it downloads all available PostgreSQL extensions and Docker and preconfigures a set of databases to provide an out-of-the-box environment for software integrations.
all:
  #==============================================================#
  # Clusters, Nodes, and Modules
  #==============================================================#
  children:

    #----------------------------------#
    # infra: monitor, alert, repo, etc..
    #----------------------------------#
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }

    #----------------------------------#
    # etcd cluster for HA postgres DCS
    #----------------------------------#
    etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

    #----------------------------------#
    # minio (OPTIONAL backup repo)
    #----------------------------------#
    #minio: { hosts: { 10.10.10.10: { minio_seq: 1 } }, vars: { minio_cluster: minio } }

    #----------------------------------#
    # pgsql (singleton on current node)
    #----------------------------------#
    # this is an example single-node postgres cluster with postgis & timescaledb installed, with one biz database & two biz users
    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary }   # <---- primary instance with read-write capability
        #x.xx.xx.xx: { pg_seq: 2, pg_role: replica }   # <---- read only replica for read-only online traffic
        #x.xx.xx.xy: { pg_seq: 3, pg_role: offline }   # <---- offline instance of ETL & interactive queries
      vars:
        pg_cluster: pg-meta                            # required identity parameter, usually same as group name

        # define business databases here: https://pigsty.io/docs/pgsql/db/
        pg_databases:                                  # define business databases on this cluster, array of database definition
          - name: meta                                 # REQUIRED, `name` is the only mandatory field of a database definition
            baseline: cmdb.sql                         # optional, database sql baseline path, (relative path among ansible search path, e.g: files/)
            schemas: [ pigsty ]                        # optional, additional schemas to be created, array of schema names
            extensions:                                # optional, additional extensions to be installed: array of `{name[,schema]}`
              - { name: vector }                       # install pgvector extension on this database by default
            comment: pigsty meta database              # optional, comment string for this database
            #pgbouncer: true                           # optional, add this database to pgbouncer database list? true by default
            #owner: postgres                           # optional, database owner, postgres by default
            #template: template1                       # optional, which template to use, template1 by default
            #encoding: UTF8                            # optional, database encoding, UTF8 by default. (MUST same as template database)
            #locale: C                                 # optional, database locale, C by default. (MUST same as template database)
            #lc_collate: C                             # optional, database collate, C by default. (MUST same as template database)
            #lc_ctype: C                               # optional, database ctype, C by default. (MUST same as template database)
            #tablespace: pg_default                    # optional, default tablespace, 'pg_default' by default.
            #allowconn: true                           # optional, allow connection, true by default. false will disable connect at all
            #revokeconn: false                         # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
            #register_datasource: true                 # optional, register this database to grafana datasources? true by default
            #connlimit: -1                             # optional, database connection limit, default -1 disable limit
            #pool_auth_user: dbuser_meta               # optional, all connection to this pgbouncer database will be authenticated by this user
            #pool_mode: transaction                    # optional, pgbouncer pool mode at database level, default transaction
            #pool_size: 64                             # optional, pgbouncer pool size at database level, default 64
            #pool_size_reserve: 32                     # optional, pgbouncer pool size reserve at database level, default 32
            #pool_size_min: 0                          # optional, pgbouncer pool size min at database level, default 0
            #pool_max_db_conn: 100                     # optional, max database connections at database level, default 100
          #- { name: grafana ,owner: dbuser_grafana ,revokeconn: true ,comment: grafana primary database }  # define another database

        # define business users here: https://pigsty.io/docs/pgsql/user/
        pg_users:                                      # define business users/roles on this cluster, array of user definition
          - name: dbuser_meta                          # REQUIRED, `name` is the only mandatory field of a user definition
            password: DBUser.Meta                      # optional, password, can be a scram-sha-256 hash string or plain text
            login: true                                # optional, can log in, true by default (new biz ROLE should be false)
            superuser: false                           # optional, is superuser? false by default
            createdb: false                            # optional, can create database? false by default
            createrole: false                          # optional, can create role? false by default
            inherit: true                              # optional, can this role use inherited privileges? true by default
            replication: false                         # optional, can this role do replication? false by default
            bypassrls: false                           # optional, can this role bypass row level security? false by default
            pgbouncer: true                            # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
            connlimit: -1                              # optional, user connection limit, default -1 disable limit
            expire_in: 3650                            # optional, now + n days when this role is expired (OVERWRITE expire_at)
            expire_at: '2030-12-31'                    # optional, YYYY-MM-DD 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
            comment: pigsty admin user                 # optional, comment string for this user/role
            roles: [dbrole_admin]                      # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
            parameters: {}                             # optional, role level parameters with `ALTER ROLE SET`
            pool_mode: transaction                     # optional, pgbouncer pool mode at user level, transaction by default
            pool_connlimit: -1                         # optional, max database connections at user level, default -1 disable limit
          - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly], comment: read-only viewer for meta database }

        # define pg extensions: https://pigsty.io/docs/pgext/ , and available alias: https://ext.pigsty.io/#/list
        pg_libs: 'pg_stat_statements, auto_explain'    # add timescaledb to shared_preload_libraries
        pg_extensions: [ pgvector ]

        # define HBA rules here: https://pigsty.io/docs/pgsql/hba/#define-hba
        pg_hba_rules:                                  # example hba rules
          - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }
        #pg_vip_enabled: true                          # define a L2 VIP which bind to cluster primary instance
        #pg_vip_address: 10.10.10.2/24                 # L2 VIP Address and netmask
        #pg_vip_interface: eth1                        # L2 VIP Network interface, overwrite on host vars if member have different network interface names
        node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ]  # make a full backup every 1am

    #----------------------------------#
    # example cluster (3-node pg-test)
    #----------------------------------#
    #pg-test:                          # define the new 3-node cluster pg-test
    #  hosts:
    #    10.10.10.11: { pg_seq: 1, pg_role: primary }  # primary instance, leader of cluster
    #    10.10.10.12: { pg_seq: 2, pg_role: replica }  # replica instance, follower of leader
    #    10.10.10.13: { pg_seq: 3, pg_role: replica, pg_offline_query: true }  # replica with offline access
    #  vars:
    #    pg_cluster: pg-test           # define pgsql cluster name
    #    pg_users: [{ name: test , password: test , pgbouncer: true , roles: [ dbrole_admin ] }]
    #    pg_databases: [{ name: test }]  # create a database and user named 'test'
    #    node_tune: tiny
    #    pg_conf: tiny.yml
    #    pg_vip_enabled: true
    #    pg_vip_address: 10.10.10.3/24
    #    pg_vip_interface: eth1
    #    node_crontab:  # make a full backup on monday 1am, and an incremental backup during weekdays
    #      - '00 01 * * 1 postgres /pg/bin/pg-backup full'
    #      - '00 01 * * 2,3,4,5,6,7 postgres /pg/bin/pg-backup'

  #==============================================================#
  # Global Parameters
  #==============================================================#
  vars:

    #----------------------------------#
    # Meta Data
    #----------------------------------#
    version: v3.1.0                   # pigsty version string
    admin_ip: 10.10.10.10             # admin node ip address
    region: default                   # upstream mirror region: default|china|europe
    node_tune: tiny                   # node tuning specs: oltp,olap,tiny,crit
    pg_conf: tiny.yml                 # pgsql tuning specs: {oltp,olap,tiny,crit}.yml
    proxy_env:                        # global proxy env when downloading packages
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
      # http_proxy:  # set your proxy here: e.g http://user:pass@proxy.xxx.com
      # https_proxy: # set your proxy here: e.g http://user:pass@proxy.xxx.com
      # all_proxy:   # set your proxy here: e.g http://user:pass@proxy.xxx.com
    infra_portal:                     # domain names and upstream servers
      home         : { domain: h.pigsty }
      grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
      prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
      alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
      blackbox     : { endpoint: "${admin_ip}:9115" }
      loki         : { endpoint: "${admin_ip}:3100" }
      #minio       : { domain: sss.pigsty ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }

    #----------------------------------#
    # MinIO Related Options
    #----------------------------------#
    #pgbackrest_method: minio         # if you want to use minio as backup repo instead of 'local' fs, uncomment this
    #minio_users:                     # and configure `pgbackrest_repo` & `minio_users` accordingly
    #  - { access_key: dba , secret_key: S3User.DBA, policy: consoleAdmin }
    #  - { access_key: pgbackrest , secret_key: S3User.Backup, policy: readwrite }
    #pgbackrest_repo:                 # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
    #  minio: ...                     # optional minio repo for pgbackrest ...
    #    s3_key: pgbackrest           # minio user access key for pgbackrest
    #    s3_key_secret: S3User.Backup # minio user secret key for pgbackrest
    #    cipher_pass: pgBackRest      # AES encryption password, default is 'pgBackRest'
    # if you want to use minio as backup repo instead of 'local' fs, uncomment this, and configure `pgbackrest_repo`
    #pgbackrest_method: minio
    #node_etc_hosts: [ '10.10.10.10 h.pigsty a.pigsty p.pigsty g.pigsty sss.pigsty' ]

    #----------------------------------#
    # Credential: CHANGE THESE PASSWORDS
    #----------------------------------#
    #grafana_admin_username: admin
    grafana_admin_password: pigsty
    #pg_admin_username: dbuser_dba
    pg_admin_password: DBUser.DBA
    #pg_monitor_username: dbuser_monitor
    pg_monitor_password: DBUser.Monitor
    #pg_replication_username: replicator
    pg_replication_password: DBUser.Replicator
    #patroni_username: postgres
    patroni_password: Patroni.API
    #haproxy_admin_username: admin
    haproxy_admin_password: pigsty

    #----------------------------------#
    # Safe Guard
    #----------------------------------#
    # you can enable these flags after bootstrap, to prevent purging running etcd / pgsql instances
    etcd_safeguard: false             # prevent purging running etcd instance?
    pg_safeguard: false               # prevent purging running postgres instance? false by default

    #----------------------------------#
    # Repo, Node, Packages
    #----------------------------------#
    # if you wish to customize your own repo, change these settings:
    repo_modules: infra,node,pgsql    # install upstream repo during repo bootstrap
    repo_remove: true                 # remove existing repo on admin node during repo bootstrap
    node_repo_modules: local          # install the local module in repo_upstream for all nodes
    node_repo_remove: true            # remove existing node repo for node managed by pigsty
    repo_packages: [                  # default packages to be downloaded (if `repo_packages` is not explicitly set)
      node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-common #,docker
    ]
    repo_extra_packages: [            # default postgres packages to be downloaded
      pgsql-main
      #,pgsql-core,pgsql-time,pgsql-gis,pgsql-rag,pgsql-fts,pgsql-olap,pgsql-feat,pgsql-lang,pgsql-type,pgsql-func,pgsql-admin,pgsql-stat,pgsql-sec,pgsql-fdw,pgsql-sim,pgsql-etl,
      #,pg17-core,pg17-time,pg17-gis,pg17-rag,pg17-fts,pg17-olap,pg17-feat,pg17-lang,pg17-type,pg17-func,pg17-admin,pg17-stat,pg17-sec,pg17-fdw,pg17-sim,pg17-etl,
      #,pg16-core,pg16-time,pg16-gis,pg16-rag,pg16-fts,pg16-olap,pg16-feat,pg16-lang,pg16-type,pg16-func,pg16-admin,pg16-stat,pg16-sec,pg16-fdw,pg16-sim,pg16-etl,
    ]
Caveats
To maintain compatibility across operating systems and architectures, the meta template installs only the minimum required software. This is reflected in the choices for repo_packages and repo_extra_packages:
Docker is not downloaded by default.
Only essential PostgreSQL extensions—pg_repack, wal2json, and pgvector—are downloaded by default.
Tools classified under pgsql-utility but not part of pgsql-common—such as pg_activity, pg_timetable, pgFormatter, pg_filedump, pgxnclient, timescaledb-tools, pgcopydb, and pgloader—are not downloaded.
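If you build on meta but need Docker or more extensions later, one approach (a sketch, assuming the package aliases shown in the template above resolve for your OS) is to widen the download lists in the global vars before building the local repo:

repo_packages: [ node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-common, docker ]
repo_extra_packages: [ pgsql-main, pgsql-core, pgsql-time, pgsql-gis ]

The extra packages are only fetched when the local repo is (re)built, so adjust these lists before running the install playbook.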
4.6 - 2-node: dual
Dual-node template to set up a basic high-availability PostgreSQL cluster with master-slave replication, tolerating the failure of one node.
Description: Dual-node template to set up a basic high-availability PostgreSQL cluster with master-slave replication, tolerating the failure of one node.
Node A, 10.10.10.10, is the admin node by default, running the Infra infrastructure, a single-node etcd, and the PGSQL replica.
Node B, 10.10.10.11, serves only as the PGSQL primary.
In this setup, the two-node template tolerates a failure of node B and automatically fails over to node A.
However, if node A fails (the entire node goes down), manual intervention is required.
That said, if node A is not completely down and only its etcd or PostgreSQL has problems, the whole system can still keep running normally.
This template uses an L2 VIP for high-availability access. If your network does not allow L2 VIPs (for example, in restricted cloud environments or across switch broadcast domains), consider using DNS resolution or another access method instead.
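A minimal inventory sketch of this dual-node topology might look like the following (illustrative only: the IP addresses, VIP address, and interface name are placeholders to adapt to your own network):

all:
  children:
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }                              # node A: infra
    etcd:  { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } } # node A: single-node etcd
    pg-meta:
      hosts:
        10.10.10.11: { pg_seq: 1, pg_role: primary }   # node B: PGSQL primary
        10.10.10.10: { pg_seq: 2, pg_role: replica }   # node A: PGSQL replica
      vars:
        pg_cluster: pg-meta
        pg_vip_enabled: true            # optional L2 VIP for HA access
        pg_vip_address: 10.10.10.2/24   # placeholder VIP address / netmask
        pg_vip_interface: eth1          # placeholder network interface name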
4.7 - 2-node: slim
Minimal installation that avoids setting up local software repositories or infrastructure, relying only on etcd for a highly available PostgreSQL cluster.
This template uses a two-node deployment to provide slim installation: you can install PostgreSQL directly from the Internet without installing the Infra module.
Description: Minimal installation that avoids setting up local software repositories or infrastructure, relying only on etcd for a highly available PostgreSQL cluster.
all:
  children:

    # actually not used
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }

    # etcd cluster for ha postgres, still required in minimal installation
    etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

    # postgres cluster 'pg-meta' with 2 instances
    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary }
        10.10.10.11: { pg_seq: 2, pg_role: replica }
      vars:
        pg_cluster: pg-meta
        pg_databases: [ { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [ pigsty ] ,extensions: [ { name: vector } ] } ]
        pg_users:
          - { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] ,comment: pigsty admin user }
          - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [ dbrole_readonly ] ,comment: read-only viewer for meta database }
        node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ]  # make a full backup every 1am

  vars:                               # global parameters
    version: v3.1.0                   # pigsty version string
    admin_ip: 10.10.10.10             # admin node ip address
    region: default                   # upstream mirror region: default,china,europe
    node_tune: tiny                   # use tiny template for NODE in demo environment
    pg_conf: tiny.yml                 # use tiny template for PGSQL in demo environment

    # slim installation setup
    nginx_enabled: false              # nginx not exists
    dns_enabled: false                # dnsmasq not exists
    prometheus_enabled: false         # prometheus not exists
    grafana_enabled: false            # grafana not exists
    pg_exporter_enabled: false        # disable pg_exporter
    pgbouncer_exporter_enabled: false
    pg_vip_enabled: false

    #----------------------------------#
    # Repo, Node, Packages
    #----------------------------------#
    node_repo_modules: node,infra,pgsql  # use node_repo_modules instead of repo_modules
    node_repo_remove: true            # remove existing node repo for node managed by pigsty?
    repo_packages: [                  # default packages to be downloaded (if `repo_packages` is not explicitly set)
      node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-common #,docker
    ]
    repo_extra_packages: [            # default postgres packages to be downloaded
      pgsql-main
      #,pgsql-core,pgsql-time,pgsql-gis,pgsql-rag,pgsql-fts,pgsql-olap,pgsql-feat,pgsql-lang,pgsql-type,pgsql-func,pgsql-admin,pgsql-stat,pgsql-sec,pgsql-fdw,pgsql-sim,pgsql-etl,
      #,pg17-core,pg17-time,pg17-gis,pg17-rag,pg17-fts,pg17-olap,pg17-feat,pg17-lang,pg17-type,pg17-func,pg17-admin,pg17-stat,pg17-sec,pg17-fdw,pg17-sim,pg17-etl,
      #,pg16-core,pg16-time,pg16-gis,pg16-rag,pg16-fts,pg16-olap,pg16-feat,pg16-lang,pg16-type,pg16-func,pg16-admin,pg16-stat,pg16-sec,pg16-fdw,pg16-sim,pg16-etl,
    ]
Caveat
Since the monitoring infrastructure provided by the Infra module is absent, slim installation does not provide database monitoring capabilities.
4.8 - 3-node: trio
Three-node configuration template providing a standard High Availability (HA) architecture, tolerating failure of one out of three nodes.
Three nodes are the minimum spec for true high availability; with three nodes, the DCS (etcd) can tolerate the failure of one server.
This configuration uses the standard three-node HA architecture: the three core modules INFRA, ETCD, and PGSQL are each deployed on three nodes, tolerating the failure of any one of them.
Description: Three-node configuration template providing a standard High Availability (HA) architecture, tolerating failure of one out of three nodes.
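A skeleton of such a three-node inventory could look like this (a sketch only: the host IPs are placeholders, and the real trio template carries many more vars):

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }
        10.10.10.11: { infra_seq: 2 }
        10.10.10.12: { infra_seq: 3 }
    etcd:
      hosts:
        10.10.10.10: { etcd_seq: 1 }
        10.10.10.11: { etcd_seq: 2 }
        10.10.10.12: { etcd_seq: 3 }
      vars: { etcd_cluster: etcd }
    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary }
        10.10.10.11: { pg_seq: 2, pg_role: replica }
        10.10.10.12: { pg_seq: 3, pg_role: replica }
      vars: { pg_cluster: pg-meta }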
Pigsty does not use any virtualization or containerization technologies, running directly on the bare OS. Supported operating systems include EL 7/8/9 (RHEL, Rocky, CentOS, Alma, Oracle, Anolis,…),
Ubuntu 24.04 / 22.04 / 20.04 & Debian 11/12. EL is our long-term supported OS, while support for Ubuntu/Debian systems was introduced in the more recent v2.5 release.
The main difference between EL and Debian distributions is the significant variation in package names, as well as the default availability of PostgreSQL extensions.
If you have advanced compatibility requirements, such as using specific operating system distributions, major versions, or minor versions, we offer advanced compatibility support options.
Kernel & Arch Support
Currently, Pigsty supports the Linux kernel and the x86_64 / amd64 chip architecture.
MacOS and Windows operating systems can install Pigsty via Linux virtual machines/containers. We provide Vagrant local sandbox support, allowing you to use virtualization software like Vagrant and Virtualbox to effortlessly bring up the deployment environment required by Pigsty on other operating systems.
ARM64 Support
Partial support for the arm64 architecture (EL9, Debian 12, Ubuntu 22.04) is available through our pro service.
EL Distribution Support
The EL series operating systems are Pigsty’s primary support target, including compatible distributions such as Red Hat Enterprise Linux, RockyLinux, CentOS, AlmaLinux, OracleLinux, Anolis, etc. Pigsty supports the latest three major versions: 7, 8, 9
Note that EL7 is an EOLed OS: PG16/17, Rust, and many other extensions are unavailable on it.
RockyLinux 8.9 Recommended
Rocky 8.9 achieves a good balance between system reliability/stability and the novelty/comprehensiveness of software versions. It’s recommended for EL series users to default to this system version.
EL7 End of Life!
Red Hat Enterprise Linux 7 has reached end of maintenance, and Pigsty no longer provides official EL7 support in the open-source version. EL7 legacy support is available in our pro service.
Debian Distribution Support
Pigsty supports Ubuntu / Debian series operating systems and their compatible distributions, currently covering the recent LTS major versions, namely:
U24: Ubuntu 24.04 LTS noble (24.04.1)
U22: Ubuntu 22.04 LTS jammy (22.04.3, Recommended)
U20: Ubuntu 20.04 LTS focal (20.04.6)
D12: Debian 12 bookworm (12.4)
D11: Debian 11 bullseye (11.8)
| Code | Debian Distro        | Minor   | Limitations                                                     |
|------|----------------------|---------|-----------------------------------------------------------------|
| U24  | Ubuntu 24.04 (noble)   | 24.04.1 | Missing PGML, citus, topn, timescale_toolkit                  |
| U22  | Ubuntu 22.04 (jammy)   | 22.04.3 | Standard Debian series feature set                            |
| U20  | Ubuntu 20.04 (focal)   | 20.04.6 | EOL, some extensions require online installation              |
| D12  | Debian 12 (bookworm)   | 12.4    | Missing polardb, wiltondb/babelfish, and official PGML support |
| D11  | Debian 11 (bullseye)   | 11.8    | EOL                                                           |
Ubuntu 22.04 LTS Recommended
Ubuntu 22.04 comes with the most complete extension coverage among the Debian-series systems.
PostgresML officially supports the AI/ML extension pgml on Ubuntu 22.04, so users with related needs are advised to use Ubuntu 22.04.3.
Ubuntu 20.04 nearing EOL
Ubuntu 20.04 focal is not recommended: its standard support ends in Apr 2025, check the Ubuntu Release Cycle for details.
Besides, Ubuntu 20.04 has missing dependencies for the postgresql-16-postgis and postgresql-server-dev-16 packages, requiring online installation in a connected environment.
If your environment does not have Internet access and you need the PostGIS extension, use this operating system with caution.
Vagrant Boxes
When deploying Pigsty locally with Vagrant, you might consider using the following operating system boxes, which are also the images used for Pigsty’s development, testing, and building.
When deploying Pigsty on cloud servers, you might consider using the following operating system base images in Terraform, using Alibaba Cloud as an example:
Parameters about pigsty infrastructure components: local yum repo, nginx, dnsmasq, prometheus, grafana, loki, alertmanager, pushgateway, blackbox_exporter, etc…
META
This section contains some metadata of current pigsty deployments, such as the version string, admin node IP address, repo mirror region, and the http(s) proxy used when downloading packages.
version: v2.6.0                   # pigsty version string
admin_ip: 10.10.10.10             # admin node ip address
region: default                   # upstream mirror region: default,china,europe
proxy_env:                        # global proxy env when downloading packages
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
  # http_proxy:  # set your proxy here: e.g http://user:pass@proxy.xxx.com
  # https_proxy: # set your proxy here: e.g http://user:pass@proxy.xxx.com
  # all_proxy:   # set your proxy here: e.g http://user:pass@proxy.xxx.com
version
name: version, type: string, level: G
pigsty version string
default value:v2.6.0
It will be used for pigsty introspection & content rendering.
admin_ip
name: admin_ip, type: ip, level: G
admin node ip address
default value:10.10.10.10
Node with this ip address will be treated as the admin node, usually pointing to the first node where Pigsty is installed.
The default value 10.10.10.10 is a placeholder which will be replaced during configure
This parameter is referenced by many other parameters, such as repo_endpoint, dns_records, node_dns_servers, node_default_etc_hosts, and the endpoints in infra_portal.
The exact string ${admin_ip} will be replaced with the actual admin_ip for above parameters.
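For illustration, these are some of the places in this reference where the ${admin_ip} placeholder shows up (the values below are just the defaults documented later in this section):

repo_endpoint: http://${admin_ip}:80
node_dns_servers: [ '${admin_ip}' ]
node_default_etc_hosts:
  - "${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty"
infra_portal:
  grafana: { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }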
region
name: region, type: enum, level: G
upstream mirror region: default,china,europe
default value: default
If a region other than default is set, and there’s a corresponding entry in repo_upstream.[repo].baseurl, it will be used instead of default.
For example, if china is used, pigsty will use China mirrors designated in repo_upstream if applicable.
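The authoritative entries live in roles/node_id/vars; the entry below is only an illustrative sketch (the name and exact URLs are assumptions) of how a region-keyed baseurl is structured:

repo_upstream:
  - name: pigsty-infra                                        # hypothetical repo entry
    module: infra                                             # matched by repo_modules / node_repo_modules
    baseurl:
      default: 'https://repo.pigsty.io/yum/infra/$basearch'   # used when region = default
      china:   'https://repo.pigsty.cc/yum/infra/$basearch'   # used when region = china

With region: china, the china URL is picked for this entry; any region without a matching key falls back to default.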
proxy_env
name: proxy_env, type: dict, level: G
global proxy env when downloading packages
default value:
proxy_env:                        # global proxy env when downloading packages
  http_proxy: 'http://username:password@proxy.address.com'
  https_proxy: 'http://username:password@proxy.address.com'
  all_proxy: 'http://username:password@proxy.address.com'
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"
Using an HTTP proxy is quite important in restricted production environments, or when your Internet access is blocked (e.g. in Mainland China).
CA
Self-Signed CA used by pigsty. It is required to support advanced security features.
ca_method: create        # create,recreate,copy, create by default
ca_cn: pigsty-ca         # ca common name, fixed as pigsty-ca
cert_validity: 7300d     # cert validity, 20 years by default
ca_method
name: ca_method, type: enum, level: G
available options: create,recreate,copy
default value: create
create: Create a new CA public-private key pair if not exists, use if exists
recreate: Always re-create a new CA public-private key pair
copy: Copy the existing CA public and private keys from local files/pki/ca, abort if missing
If you already have a pair of ca.crt and ca.key, put them under files/pki/ca and set ca_method to copy.
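For instance, a sketch of reusing an existing CA (assuming ca.crt and ca.key are already placed under files/pki/ca):

ca_method: copy      # reuse the CA key pair from files/pki/ca
ca_cn: pigsty-ca     # keep the default common name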
ca_cn
name: ca_cn, type: string, level: G
CA common name; changing it is not recommended.
default value: pigsty-ca
you can check that with openssl x509 -text -in /etc/pki/ca.crt
cert_validity
name: cert_validity, type: interval, level: G
cert validity, 20 years by default, which is enough for most scenarios
default value: 7300d
INFRA_ID
Infrastructure identity and portal definition.
#infra_seq: 1               # infra node identity, explicitly required
infra_portal:               # infra services exposed via portal
  home         : { domain: h.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  loki         : { endpoint: "${admin_ip}:3100" }
infra_seq
name: infra_seq, type: int, level: I
infra node identity, REQUIRED, no default value, you have to assign it explicitly.
infra_portal
name: infra_portal, type: dict, level: G
infra services exposed via portal.
default value will expose home, grafana, prometheus, alertmanager via nginx with corresponding domain names.
infra_portal:               # infra services exposed via portal
  home         : { domain: h.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  loki         : { endpoint: "${admin_ip}:3100" }
Each record consists of a key-value dictionary, with name as the key representing the component name,
and the value containing the REQUIRED domain field representing the domain name.
domain is the domain name used for external access. It will be added to the Nginx SSL cert SAN field.
The name of the default record is fixed and referenced by other modules, so DO NOT modify the default entry names.
endpoint is a TCP socket that can be reached internally. If specified, Nginx will forward requests to the address specified by endpoint.
If the ${admin_ip} is included in the endpoint, it will be replaced with the actual admin_ip at runtime.
path is a local filesystem path. If specified, it will be used as the root of a local web server, and Nginx will serve files from that path.
endpoint and path are mutually exclusive, you can choose between being an upstream proxy or a local web server in one entry.
If websocket is set to true, the HTTP protocol will be auto-upgraded for WebSocket connections.
When the upstream uses WebSocket, you can enable this option (e.g. Grafana/Jupyter)
If scheme is given (http or https), it will be used as part of the proxy_pass URL.
When the upstream requires https instead of http for the proxy, use this option (e.g. MinIO).
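To illustrate these fields together, the snippet below combines the commented MinIO default from the snippet above (an https + websocket upstream) with a hypothetical path-based entry serving local static files (the report name, domain, and path are made up for illustration):

infra_portal:
  minio  : { domain: sss.pigsty ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }
  report : { domain: report.pigsty ,path: "/www/report" }   # hypothetical local web server entry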
REPO
This section is about the local software repo. Pigsty will create a local software repo (APT/YUM) when initializing an infra node.
In the initialization process, Pigsty will download all packages and their dependencies (specified by repo_packages) from the Internet upstream repo (specified by repo_upstream) to {{ nginx_home }} / {{ repo_name }} (default is /www/pigsty), and the total size of all dependent software is about 1GB or so.
When creating a local repo, Pigsty will skip the software download phase if the directory already exists and if there is a marker file named repo_complete in the dir.
If some packages download too slowly, you can set a download proxy via the proxy_env config entry to complete the first download, or directly download the pre-packaged offline package, which is essentially a local software repo built on the same operating system.
repo_enabled: true               # create a yum repo on this infra node?
repo_home: /www                  # repo home dir, `/www` by default
repo_name: pigsty                # repo name, pigsty by default
repo_endpoint: http://${admin_ip}:80 # access point to this repo by domain or ip:port
repo_remove: true                # remove existing upstream repo
repo_modules: infra,node,pgsql   # install upstream repo during repo bootstrap
#repo_upstream: []               # where to download
#repo_packages: []               # which packages to download
#repo_extra_packages: []         # extra packages to download
repo_url_packages:               # extra packages from url
  - { name: "pev.html"    ,url: "https://repo.pigsty.io/etc/pev-1.12.1.html"    }
  - { name: "chart.tgz"   ,url: "https://repo.pigsty.io/etc/chart-1.0.0.tgz"    }
  - { name: "plugins.tgz" ,url: "https://repo.pigsty.io/etc/plugins-11.3.0.tgz" }
repo_enabled
name: repo_enabled, type: bool, level: G/I
create a yum repo on this infra node? default value: true
If you have multiple infra nodes, you can disable yum repo on other standby nodes to reduce Internet traffic.
repo_home
name: repo_home, type: path, level: G
repo home dir, /www by default
repo_name
name: repo_name, type: string, level: G
repo name, pigsty by default, it is not wise to change this value
repo_endpoint
name: repo_endpoint, type: url, level: G
access point to this repo by domain or ip:port, default value: http://${admin_ip}:80
If you have changed the nginx_port or nginx_ssl_port, or use a different infra node from admin node, please adjust this parameter accordingly.
The ${admin_ip} will be replaced with actual admin_ip during runtime.
If you want to keep the existing upstream repo, set repo_remove to false.
repo_modules
name: repo_modules, type: string, level: G/A
which repo modules are installed in repo_upstream, default value: infra,node,pgsql
This is a comma separated value string, it is used to filter entries in repo_upstream with corresponding module field.
For Ubuntu / Debian users, you can add redis to the list: infra,node,pgsql,redis
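For example (a sketch; a module name only takes effect if matching entries exist in repo_upstream):

repo_modules: infra,node,pgsql,redis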
repo_upstream
name: repo_upstream, type: upstream[], level: G
This param defines the upstream software repos for Pigsty. It DOES NOT have a default value; you can specify it explicitly, or leave it empty if you want to use the default values.
When left empty, Pigsty will use the default values from repo_upstream_default defined in roles/node_id/vars according to your OS.
Pigsty Building template oss.yml has default values for different OS distros.
repo_packages
name: repo_packages, type: string[], level: G
This param is an array of strings; each string is a list of software packages separated by spaces, specifying which packages are to be included & downloaded.
This param DOES NOT have a default value; you can specify it explicitly, or leave it empty if you want to use the default values.
When left empty, Pigsty will use the default values from repo_packages_default defined in roles/node_id/vars according to your OS.
Each element in repo_packages will be translated into a list of package names according to the package_map defined in the above file, for specific OS distro version.
For example, on EL systems, each alias is translated into the corresponding list of RPM package names.
As a convention, repo_packages usually includes software packages that are not related to the major version of PostgreSQL (such as Infra, Node, and PGDG Common),
while software packages related to the major version of PostgreSQL (kernel, extensions) are usually specified in repo_extra_packages to facilitate switching between PG major versions.
repo_extra_packages
This parameter is similar to repo_packages, but it is used for additional software packages that need to be downloaded (usually packages specific to a PG major version).
The default value is an empty list. You can override it at the cluster & instance level to specify additional software packages that need to be downloaded.
If this parameter is not explicitly defined, Pigsty will load the default value from the repo_extra_packages_default defined in roles/node_id/vars, which is:
[pgsql-main ]
Each element in repo_extra_packages will be translated into a list of package names according to the package_map defined in the above file, for the specific OS distro version.
For example, on EL systems, it will be translated into package names containing the $v placeholder.
Here $v will be replaced with the actual PostgreSQL major version number pg_version, So you can add any PG version related packages here, and Pigsty will download them for you.
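For example, to also download PostgreSQL 16 packages alongside the current major version, you could extend the list with the aliases already shown (commented out) in the config templates above; which categories you actually need depends on the extensions you plan to install:

repo_extra_packages: [ pgsql-main, pg16-core, pg16-time, pg16-gis ]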
repo_url_packages
name: repo_url_packages, type: object[] | string[], level: G
extra packages from url, default values:
repo_url_packages:  # extra packages from url
  - { name: "pev.html"    ,url: "https://repo.pigsty.io/etc/pev-1.12.1.html"    }  # Explain Visualizer
  - { name: "chart.tgz"   ,url: "https://repo.pigsty.io/etc/chart-1.0.0.tgz"    }  # Grafana Maps
  - { name: "plugins.tgz" ,url: "https://repo.pigsty.io/etc/plugins-11.1.4.tgz" }  # Grafana Plugins
The default entries are pev.html, chart.tgz, and plugins.tgz, which are all optional add-ons and will be downloaded via URL from the Internet directly.
For example, if plugins.tgz is not downloaded here, Pigsty will download the Grafana plugins from the Internet during grafana setup instead.
You can use object list or string list in this parameter, in the latter case, Pigsty will use the url basename as the filename.
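For example, the string form of the first two default entries would be (filenames are then derived from the URL basename, e.g. pev-1.12.1.html):

repo_url_packages:
  - "https://repo.pigsty.io/etc/pev-1.12.1.html"
  - "https://repo.pigsty.io/etc/chart-1.0.0.tgz"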
Beware that if the region is set to china, the pigsty.io will be replaced with pigsty.cc automatically.
INFRA_PACKAGE
These packages are installed on infra nodes only, including common rpm/deb/pip packages.
infra_packages
name: infra_packages, type: string[], level: G
This param is an array of strings; each string is a list of common software packages separated by spaces, specifying which packages are to be installed on INFRA nodes.
This param DOES NOT have a default value; you can specify it explicitly, or leave it empty if you want to use the default values.
When left empty, Pigsty will use the default values from infra_packages_default defined in roles/node_id/vars according to your OS.
For EL (7/8/9) system, the default values are:
infra_packages:   # packages to be installed on infra nodes
  - grafana,loki,logcli,promtail,prometheus2,alertmanager,pushgateway
  - node_exporter,blackbox_exporter,nginx_exporter,pg_exporter
  - nginx,dnsmasq,ansible,etcd,python3-requests,redis,mcli
For Debian (11, 12) or Ubuntu (20.04, 22.04, 24.04) systems, the default values are:
infra_packages:   # packages to be installed on infra nodes
  - grafana,loki,logcli,promtail,prometheus2,alertmanager,pushgateway
  - node-exporter,blackbox-exporter,nginx-exporter,pg-exporter
  - nginx,dnsmasq,ansible,etcd,python3-requests,redis,mcli
infra_packages_pip
name: infra_packages_pip, type: string, level: G
pip installed packages for infra nodes, default value is empty string
NGINX
Pigsty exposes all Web services through Nginx: Home Page, Grafana, Prometheus, AlertManager, etc…,
and other optional tools such as PGWeb, Jupyter Lab, Pgadmin, Bytebase, and other static resources & reports such as pev, schemaspy & pgbadger.
This nginx also serves as a local yum/apt repo.
nginx_enabled: true            # enable nginx on this infra node?
nginx_exporter_enabled: true   # enable nginx_exporter on this infra node?
nginx_sslmode: enable          # nginx ssl mode? disable,enable,enforce
nginx_home: /www               # nginx content dir, `/www` by default
nginx_port: 80                 # nginx listen port, 80 by default
nginx_ssl_port: 443            # nginx ssl listen port, 443 by default
nginx_navbar:                  # nginx index page navigation links
  - { name: CA Cert ,url: '/ca.crt'          ,desc: 'pigsty self-signed ca.crt'    }
  - { name: Package ,url: '/pigsty'          ,desc: 'local yum repo packages'      }
  - { name: PG Logs ,url: '/logs'            ,desc: 'postgres raw csv logs'        }
  - { name: Reports ,url: '/report'          ,desc: 'pgbadger summary report'      }
  - { name: Explain ,url: '/pigsty/pev.html' ,desc: 'postgres explain visualizer'  }
nginx_enabled
name: nginx_enabled, type: bool, level: G/I
enable nginx on this infra node? default value: true
nginx_sslmode
nginx ssl mode: disable, enable, or enforce; enable by default
enforce: all links will be rendered as https:// by default
nginx_home
name: nginx_home, type: path, level: G
nginx web server static content dir, /www by default
Nginx root directory which contains static resource and repo resource. It’s wise to set this value same as repo_home so that local repo content is automatically served.
nginx_port
name: nginx_port, type: port, level: G
nginx listen port which serves the HTTP requests, 80 by default.
If your default 80 port is occupied or unavailable, you can consider using another port, and change repo_endpoint and repo_upstream (the local entry) accordingly.
nginx_navbar
Each record is rendered as a navigation link in the Pigsty home page App drop-down menu; the apps are all optional, mounted by default on the Pigsty default server under http://pigsty/.
The url parameter specifies the URL PATH for the app, with the exception that if the ${grafana} string is present in the URL, it will be automatically replaced with the Grafana domain name defined in infra_portal.
DNS
Pigsty will launch a default DNSMASQ server on infra nodes to serve DNS inquiries, resolving names such as h.pigsty, a.pigsty, p.pigsty, g.pigsty, and sss.pigsty for the optional MinIO service.
All records will be added to the infra node’s /etc/hosts.d/*.
You have to add nameserver {{ admin_ip }} to your /etc/resolv.conf to use this DNS server, and node_dns_servers will do the trick.
dns_enabled: true    # setup dnsmasq on this infra node?
dns_port: 53         # dns server listen port, 53 by default
dns_records:         # dynamic dns records resolved by dnsmasq
  - "${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty"
  - "${admin_ip} api.pigsty adm.pigsty cli.pigsty ddl.pigsty lab.pigsty git.pigsty sss.pigsty wiki.pigsty"
dns_enabled
name: dns_enabled, type: bool, level: G/I
setup dnsmasq on this infra node? default value: true
If you don’t want to use the default DNS server, you can set this value to false to disable it.
And use node_default_etc_hosts and node_etc_hosts instead.
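For example, a sketch of disabling dnsmasq while still resolving the portal domains through static /etc/hosts records (the record mirrors the default shown elsewhere in this reference):

dns_enabled: false
node_etc_hosts: [ '${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty' ]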
dns_port
name: dns_port, type: port, level: G
dns server listen port, 53 by default
dns_records
name: dns_records, type: string[], level: G
dynamic dns records resolved by dnsmasq. Some auxiliary domain names are written to /etc/hosts.d/default on infra nodes by default:
dns_records:         # dynamic dns records resolved by dnsmasq
  - "${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty"
  - "${admin_ip} api.pigsty adm.pigsty cli.pigsty ddl.pigsty lab.pigsty git.pigsty sss.pigsty wiki.pigsty"
PROMETHEUS
Prometheus is used as time-series database for metrics scrape, storage & analysis.
prometheus_enabled: true          # enable prometheus on this infra node?
prometheus_clean: true            # clean prometheus data during init?
prometheus_data: /data/prometheus # prometheus data dir, `/data/prometheus` by default
prometheus_sd_dir: /etc/prometheus/targets # prometheus file service discovery directory
prometheus_sd_interval: 5s        # prometheus target refresh interval, 5s by default
prometheus_scrape_interval: 10s   # prometheus scrape & eval interval, 10s by default
prometheus_scrape_timeout: 8s     # prometheus global scrape timeout, 8s by default
prometheus_options: '--storage.tsdb.retention.time=15d' # prometheus extra server options
pushgateway_enabled: true         # setup pushgateway on this infra node?
pushgateway_options: '--persistence.interval=1m' # pushgateway extra server options
blackbox_enabled: true            # setup blackbox_exporter on this infra node?
blackbox_options: ''              # blackbox_exporter extra server options
alertmanager_enabled: true        # setup alertmanager on this infra node?
alertmanager_options: ''          # alertmanager extra server options
exporter_metrics_path: /metrics   # exporter metric path, `/metrics` by default
exporter_install: none            # how to install exporter? none,yum,binary
exporter_repo_url: ''             # exporter repo file url if install exporter via yum
prometheus_enabled
name: prometheus_enabled, type: bool, level: G/I
enable prometheus on this infra node?
default value: true
prometheus_clean
name: prometheus_clean, type: bool, level: G/A
clean prometheus data during init? default value: true
alertmanager_enabled
setup alertmanager on this infra node? default value: true
alertmanager_options
name: alertmanager_options, type: arg, level: G
alertmanager extra server options, default value is empty string
exporter_metrics_path
name: exporter_metrics_path, type: path, level: G
exporter metric path, /metrics by default
exporter_install
name: exporter_install, type: enum, level: G
(OBSOLETE) how to install exporter? none,yum,binary
default value: none
Specify how to install Exporter:
none: No installation, (by default, the Exporter has been previously installed by the node_pkg task)
yum: Install using yum (if yum installation is enabled, run yum to install node_exporter and pg_exporter before deploying Exporter)
binary: Install using a copy binary (copy node_exporter and pg_exporter binary directly from the meta node, not recommended)
When installing with yum, if exporter_repo_url is specified (not empty), the installation will first install the REPO file under that URL into /etc/yum.repos.d. This feature allows you to install Exporter directly without initializing the node infrastructure.
It is not recommended for regular users to use binary installation. This mode is usually used for emergency troubleshooting and temporary problem fixes.
exporter_repo_url
(OBSOLETE) exporter repo file url if install exporter via yum
default value is empty string
Default is empty; when exporter_install is yum, the repo specified by this parameter will be added to the node source list.
GRAFANA
Grafana is the visualization platform for Pigsty’s monitoring system.
It can also be used as a low-code data visualization environment.
grafana_enabled: true               # enable grafana on this infra node?
grafana_clean: true                 # clean grafana data during init?
grafana_admin_username: admin       # grafana admin username, `admin` by default
grafana_admin_password: pigsty      # grafana admin password, `pigsty` by default
grafana_plugin_cache: /www/pigsty/plugins.tgz # path to grafana plugins cache tarball
grafana_plugin_list:                # grafana plugins to be downloaded with grafana-cli
  - volkovlabs-echarts-panel
  - volkovlabs-image-panel
  - volkovlabs-form-panel
  - volkovlabs-variable-panel
  - volkovlabs-grapi-datasource
  - marcusolsson-static-datasource
  - marcusolsson-json-datasource
  - marcusolsson-dynamictext-panel
  - marcusolsson-treemap-panel
  - marcusolsson-calendar-panel
  - marcusolsson-hourly-heatmap-panel
  - knightss27-weathermap-panel
loki_enabled: true                  # enable loki on this infra node?
loki_clean: false                   # whether remove existing loki data?
loki_data: /data/loki               # loki data dir, `/data/loki` by default
loki_retention: 15d                 # loki log retention period, 15d by default
grafana_enabled
name: grafana_enabled, type: bool, level: G/I
enable grafana on this infra node? default value: true
grafana_clean
name: grafana_clean, type: bool, level: G/A
clean grafana data during init? default value: true
grafana_admin_username
name: grafana_admin_username, type: username, level: G
grafana admin username, admin by default
grafana_admin_password
name: grafana_admin_password, type: password, level: G
grafana admin password, pigsty by default
default value: pigsty
WARNING: Change this to a strong password before deploying to production environment
grafana_plugin_cache
name: grafana_plugin_cache, type: path, level: G
path to grafana plugins cache tarball
default value: /www/pigsty/plugins.tgz
If that cache tarball exists, Pigsty will use it instead of downloading plugins from the Internet.
grafana_plugin_list
name: grafana_plugin_list, type: string[], level: G
grafana plugins to be downloaded with grafana-cli
default value:
grafana_plugin_list:                # grafana plugins to be downloaded with grafana-cli
  - volkovlabs-echarts-panel
  - volkovlabs-image-panel
  - volkovlabs-form-panel
  - volkovlabs-variable-panel
  - volkovlabs-grapi-datasource
  - marcusolsson-static-datasource
  - marcusolsson-json-datasource
  - marcusolsson-dynamictext-panel
  - marcusolsson-treemap-panel
  - marcusolsson-calendar-panel
  - marcusolsson-hourly-heatmap-panel
  - knightss27-weathermap-panel
LOKI
loki_enabled
name: loki_enabled, type: bool, level: G/I
enable loki on this infra node? default value: true
The NODE module tunes target nodes into the desired state and brings them into the Pigsty monitoring system.
NODE_ID
Each node has identity parameters that are configured through the parameters in <cluster>.hosts and <cluster>.vars. Check NODE Identity for details.
nodename
name: nodename, type: string, level: I
node instance identity, use hostname if missing, optional
no default value, Null or empty string means nodename will be set to node’s current hostname.
If node_id_from_pg is true (by default) and nodename is not explicitly defined, nodename will try to use ${pg_cluster}-${pg_seq} first, if PGSQL is not defined on this node, it will fall back to default HOSTNAME.
If nodename_overwrite is true, the node name will also be used as the HOSTNAME.
node_cluster
name: node_cluster, type: string, level: C
node cluster identity, use ’nodes’ if missing, optional
default values: nodes
If node_id_from_pg is true (by default) and node_cluster is not explicitly defined, node_cluster will try to use ${pg_cluster} first; if PGSQL is not defined on this node, it will fall back to the default value nodes.
nodename_overwrite
name: nodename_overwrite, type: bool, level: C
overwrite node’s hostname with nodename?
default value is true, a non-empty node name nodename will override the hostname of the current node.
When the nodename parameter is undefined or an empty string, but node_id_from_pg is true,
the node name will try to use {{ pg_cluster }}-{{ pg_seq }}, borrow identity from the 1:1 PostgreSQL Instance’s ins name.
No changes are made to the hostname if nodename is undefined or an empty string and node_id_from_pg is false.
nodename_exchange
name: nodename_exchange, type: bool, level: C
exchange nodename among play hosts?
default value is false
When this parameter is enabled, node names are exchanged between the same group of nodes executing the node.yml playbook, written to /etc/hosts.
node_id_from_pg
name: node_id_from_pg, type: bool, level: C
use postgres identity as node identity if applicable?
default value is true
Borrow the PostgreSQL cluster & instance identity if applicable.
It’s useful to use the same identity for postgres & node when there’s a 1:1 relationship.
NODE_DNS
Pigsty configs static DNS records and dynamic DNS resolver for nodes.
If you already have a DNS server, set node_dns_method to none to disable dynamic DNS setup.
node_write_etc_hosts: true        # modify `/etc/hosts` on target node?
node_default_etc_hosts:           # static dns records in `/etc/hosts`
  - "${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty"
node_etc_hosts: []                # extra static dns records in `/etc/hosts`
node_dns_method: add              # how to handle dns servers: add,none,overwrite
node_dns_servers: ['${admin_ip}'] # dynamic nameserver in `/etc/resolv.conf`
node_dns_options:                 # dns resolv options in `/etc/resolv.conf`
  - options single-request-reopen timeout:1
node_default_etc_hosts is an array. Each element is a DNS record with format <ip> <name>.
It is used for global static DNS records. You can use node_etc_hosts for ad hoc records for each cluster.
Make sure to write a DNS record like 10.10.10.10 h.pigsty a.pigsty p.pigsty g.pigsty to /etc/hosts to ensure that the local yum repo can be accessed using the domain name before the DNS Nameserver starts.
node_dns_method
how to handle DNS servers: add, none, overwrite; add by default
add: Append the records in node_dns_servers to /etc/resolv.conf and keep the existing DNS servers. (default)
overwrite: Overwrite /etc/resolv.conf with the records in node_dns_servers
none: Skip DNS server configuration, e.g. when a DNS server is already provided in the production env.
node_dns_servers
name: node_dns_servers, type: string[], level: C
dynamic nameserver in /etc/resolv.conf
default values: ["${admin_ip}"] , the default nameserver on admin node will be added to /etc/resolv.conf as the first nameserver.
node_dns_options
name: node_dns_options, type: string[], level: C
dns resolv options in /etc/resolv.conf, default value:
- options single-request-reopen timeout:1
NODE_PACKAGE
This section is about upstream yum repos & packages to be installed.
node_repo_modules: local          # upstream repo to be added on node, local by default
node_repo_remove: true            # remove existing repo on node?
node_packages: [openssh-server]   # packages to be installed current nodes with latest version
#node_default_packages: []        # default packages to be installed on infra nodes, (defaults are load from node_id/vars)
node_repo_modules
name: node_repo_modules, type: string, level: C/A
upstream repo to be added on node, default value: local
This parameter specifies the upstream repos to be added to the node. It is used to filter the repo_upstream entries: only the entries whose module value matches will be added to the node’s software source, similar to the repo_modules parameter.
node_repo_remove
name: node_repo_remove, type: bool, level: C/A
remove existing repo on node?
default value is true, and thus Pigsty will move existing repo file in /etc/yum.repos.d to a backup dir: /etc/yum.repos.d/backup before adding upstream repos
On Debian/Ubuntu, Pigsty will backup & move /etc/apt/sources.list(.d) to /etc/apt/backup.
node_packages
name: node_packages, type: string[], level: C
packages to be installed on current nodes, default value: [openssh-server].
Each element is a comma-separated list of package names, which will be installed on the current node in addition to node_default_packages.
Packages specified in this parameter will be upgraded to the latest version; the default value [openssh-server] upgrades sshd by default to avoid SSH CVEs.
This parameter is usually used to install additional software packages that are ad hoc for the current node/cluster.
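For example, a cluster-level override might look like the sketch below (the extra package names are placeholders for whatever that cluster happens to need):

pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    node_packages: [ 'openssh-server,curl' ]   # hypothetical ad hoc extras for this cluster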
node_default_packages
name: node_default_packages, type: string[], level: G
default packages to be installed on all nodes; the default value is not explicitly defined.
This param is an array of strings; each string is a comma-separated list of package names, which will be installed on all nodes by default.
This param DOES NOT have a default value; you can specify it explicitly, or leave it empty if you want to use the default values.
When left empty, Pigsty will use the default values from node_packages_default defined in roles/node_id/vars according to your OS.
NODE_TUNE
Configure tuned templates, features, kernel modules, and sysctl params on the node.
node_disable_firewall: true     # disable node firewall? true by default
node_disable_selinux: true      # disable node selinux? true by default
node_disable_numa: false        # disable node numa, reboot required
node_disable_swap: false        # disable node swap, use with caution
node_static_network: true       # preserve dns resolver settings after reboot
node_disk_prefetch: false       # setup disk prefetch on HDD to increase performance
node_kernel_modules: [ softdog, br_netfilter, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]
node_hugepage_count: 0          # number of 2MB hugepage, take precedence over ratio
node_hugepage_ratio: 0          # node mem hugepage ratio, 0 disable it by default
node_overcommit_ratio: 0        # node mem overcommit ratio, 0 disable it by default
node_tune: oltp                 # node tuned profile: none,oltp,olap,crit,tiny
node_sysctl_params: {}          # sysctl parameters in k:v format in addition to tuned
node_disable_firewall
name: node_disable_firewall, type: bool, level: C
disable node firewall? true by default
default value is true
node_disable_selinux
name: node_disable_selinux, type: bool, level: C
disable node selinux? true by default
default value is true
node_disable_numa
name: node_disable_numa, type: bool, level: C
disable node numa, reboot required
default value is false
Boolean flag, disabled by default (NUMA is left on). Note that disabling NUMA requires a reboot of the machine before it takes effect!
If you don’t know how to set the CPU affinity, it is recommended to turn off NUMA.
node_disable_swap
name: node_disable_swap, type: bool, level: C
disable node swap, use with caution
default value is false
Turning off SWAP is generally not recommended, but SWAP should be disabled when the node is used for a Kubernetes deployment.
If there is enough memory and the database is deployed exclusively, disabling swap may slightly improve performance.
node_static_network
name: node_static_network, type: bool, level: C
preserve dns resolver settings after reboot, default value is true
Enabling static networking means that machine reboots will not overwrite your DNS Resolv config with NIC changes. It is recommended to enable it in production environment.
node_disk_prefetch
name: node_disk_prefetch, type: bool, level: C
setup disk prefetch on HDD to increase performance
default value is false. Consider enabling this when using HDDs.
node_kernel_modules
name: node_kernel_modules, type: string[], level: C
node_hugepage_ratio
node mem hugepage ratio, 0 disables it by default.
For example, if you have the default 25% of memory for postgres shared buffers, you can set this value to 0.27 ~ 0.30. Wasted hugepages can be reclaimed later with /pg/bin/pg-tune-hugepage.
node_overcommit_ratio
name: node_overcommit_ratio, type: int, level: C
node mem overcommit ratio, 0 disables it by default. This is an integer from 0 to 100+.
default values: 0, which will set vm.overcommit_memory=0, otherwise vm.overcommit_memory=2 will be used,
and this value will be used as vm.overcommit_ratio.
It is recommended to set vm.overcommit_ratio on dedicated pgsql nodes, e.g. 50 ~ 100.
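For example, on a dedicated PGSQL node one might use something like the following (a sketch; pick a ratio that matches your actual memory headroom):

node_overcommit_ratio: 75   # results in vm.overcommit_memory=2 and vm.overcommit_ratio=75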
node_tune
node tuned profile: none, oltp, olap, crit, tiny; oltp by default
oltp: Regular OLTP templates with optimized latency
olap: Regular OLAP templates to optimize throughput
crit: Core financial business templates, optimizing the number of dirty pages
Usually, the database tuning template pg_conf should be paired with the node tuning template node_tune.
node_sysctl_params
name: node_sysctl_params, type: dict, level: C
sysctl parameters in k:v format in addition to tuned
default values: {}
Dictionary K-V structure, Key is kernel sysctl parameter name, Value is the parameter value.
You can also define sysctl parameters with tuned profile
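For example (the keys below are ordinary sysctl parameters chosen purely for illustration):

node_sysctl_params:
  vm.swappiness: 1
  net.ipv4.tcp_keepalive_time: 60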
NODE_ADMIN
This section is about admin users and their credentials.
node_data: /data                 # node main data directory, `/data` by default
node_admin_enabled: true         # create a admin user on target node?
node_admin_uid: 88               # uid and gid for node admin user
node_admin_username: dba         # name of node admin user, `dba` by default
node_admin_ssh_exchange: true    # exchange admin ssh key among node cluster
node_admin_pk_current: true      # add current user's ssh pk to admin authorized_keys
node_admin_pk_list: []           # ssh public keys to be added to admin user
node_data
name: node_data, type: path, level: C
node main data directory, /data by default
default values: /data
If specified, this path will be used as the major data disk mountpoint. The directory will be created, and a warning thrown, if the path does not exist.
The data dir is owned by root with mode 0777.
node_admin_enabled
name: node_admin_enabled, type: bool, level: C
create an admin user on target node?
default value is true
An admin user (password-free sudo and ssh) will be created on each node; this admin user is named dba (uid=88) by default,
and it can access other nodes in the env and perform sudo from the meta node via SSH, password-free.
node_admin_uid
name: node_admin_uid, type: int, level: C
uid and gid for node admin user
default values: 88
node_admin_username
name: node_admin_username, type: username, level: C
name of node admin user, dba by default
default values: dba
node_admin_ssh_exchange
name: node_admin_ssh_exchange, type: bool, level: C
exchange admin ssh key among node cluster
default value is true
When enabled, Pigsty will exchange SSH public keys between members during playbook execution, allowing the admin user node_admin_username on different nodes to access each other.
node_admin_pk_current
name: node_admin_pk_current, type: bool, level: C
add current user’s ssh pk to admin authorized_keys
default value is true
When enabled, on the current node, the SSH public key (~/.ssh/id_rsa.pub) of the current user is copied to the authorized_keys of the target node admin user.
When deploying in a production env, be sure to pay attention to this parameter, which installs the default public key of the user currently executing the command to the admin user of all machines.
node_admin_pk_list
name: node_admin_pk_list, type: string[], level: C
ssh public keys to be added to admin user
default values: []
Each element of the array is a string containing the key written to the admin user ~/.ssh/authorized_keys, and the user with the corresponding private key can log in as an admin user.
When deploying in production envs, be sure to note this parameter and add only trusted keys to this list.
NODE_TIME
node_timezone: ''                # setup node timezone, empty string to skip
node_ntp_enabled: true           # enable chronyd time sync service?
node_ntp_servers:                # ntp servers in `/etc/chrony.conf`
  - pool pool.ntp.org iburst
node_crontab_overwrite: true     # overwrite or append to `/etc/crontab`?
node_crontab: []                 # crontab entries in `/etc/crontab`
node_timezone
name: node_timezone, type: string, level: C
setup node timezone, empty string to skip
default value is empty string, which will not change the default timezone (usually UTC)
node_ntp_enabled
name: node_ntp_enabled, type: bool, level: C
enable chronyd time sync service?
default value is true, and thus Pigsty will override the node’s /etc/chrony.conf with the servers in node_ntp_servers.
If you already have an NTP server configured, just set this to false to leave it be.
node_ntp_servers
name: node_ntp_servers, type: string[], level: C
ntp servers in /etc/chrony.conf, default value: ["pool pool.ntp.org iburst"]
You can use ${admin_ip} to sync time with ntp server on admin node rather than public ntp server.
node_ntp_servers: [ 'pool ${admin_ip} iburst' ]
node_crontab_overwrite
name: node_crontab_overwrite, type: bool, level: C
overwrite or append to /etc/crontab?
default value is true, and pigsty will render records in node_crontab in overwrite mode rather than appending to it.
node_crontab
name: node_crontab, type: string[], level: C
crontab entries in /etc/crontab
default values: []
NODE_VIP
You can bind an optional L2 VIP among one node cluster, which is disabled by default.
L2 VIP can only be used in the same L2 LAN, which may incur extra restrictions on your network topology.
If enabled, You have to manually assign the vip_address and vip_vrid for each node cluster.
It is user’s responsibility to ensure that the address / vrid is unique among the same LAN.
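A sketch of a node cluster with an L2 VIP enabled (the cluster name, node IPs, VIP address, vrid, and interface are placeholders you must choose yourself and keep unique within the LAN):

proxy:   # hypothetical node cluster
  hosts:
    10.10.10.18: { nodename: proxy-1 }
    10.10.10.19: { nodename: proxy-2 }
  vars:
    node_cluster: proxy
    vip_enabled: true
    vip_address: 10.10.10.20
    vip_vrid: 20
    vip_interface: eth1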
vip_enabled: false               # enable vip on this node cluster?
# vip_address: [IDENTITY]        # node vip address in ipv4 format, required if vip is enabled
# vip_vrid: [IDENTITY]           # required, integer, 1-254, should be unique among same VLAN
vip_role: backup                 # optional, `master/backup`, backup by default, use as init role
vip_preempt: false               # optional, `true/false`, false by default, enable vip preemption
vip_interface: eth0              # node vip network interface to listen, `eth0` by default
vip_dns_suffix: ''               # node vip dns name suffix, empty string by default
vip_exporter_port: 9650          # keepalived exporter listen port, 9650 by default
vip_enabled
name: vip_enabled, type: bool, level: C
enable vip on this node cluster? default value is false, meaning no L2 VIP is created for this node cluster.
L2 VIP can only be used in the same L2 LAN, which may incur extra restrictions on your network topology.
vip_address
name: vip_address, type: ip, level: C
node vip address in IPv4 format, required if node vip_enabled.
no default value. This parameter must be explicitly assigned and unique in your LAN.
vip_vrid
name: vip_vrid, type: int, level: C
integer, 1-254, should be unique in same VLAN, required if node vip_enabled.
no default value. This parameter must be explicitly assigned and unique in your LAN.
vip_role
name: vip_role, type: enum, level: I
node vip role, could be master or backup, will be used as initial keepalived state.
vip_preempt
name: vip_preempt, type: bool, level: C/I
optional, true/false, false by default, enable vip preemption
default value is false, meaning no preemption happens even when a backup node has a higher priority than the living master.
vip_interface
name: vip_interface, type: string, level: C/I
node vip network interface to listen, eth0 by default.
It should be your node’s primary intranet interface, i.e. the one holding the IP address you used in the inventory file.
If your nodes have different interface names, you can override it with instance-level vars.
vip_dns_suffix
name: vip_dns_suffix, type: string, level: C/I
node vip dns name suffix, empty string by default. It will be used as the DNS name of the node VIP.
vip_exporter_port
name: vip_exporter_port, type: port, level: C/I
keepalived exporter listen port, 9650 by default.
HAPROXY
HAProxy is installed on every node by default, exposing services in a NodePort manner.
haproxy_enabled: true            # enable haproxy on this node?
haproxy_clean: false             # cleanup all existing haproxy config?
haproxy_reload: true             # reload haproxy after config?
haproxy_auth_enabled: true       # enable authentication for haproxy admin page
haproxy_admin_username: admin    # haproxy admin username, `admin` by default
haproxy_admin_password: pigsty   # haproxy admin password, `pigsty` by default
haproxy_exporter_port: 9101      # haproxy admin/exporter port, 9101 by default
haproxy_client_timeout: 24h      # client side connection timeout, 24h by default
haproxy_server_timeout: 24h      # server side connection timeout, 24h by default
haproxy_services: []             # list of haproxy service to be exposed on node
haproxy_enabled
name: haproxy_enabled, type: bool, level: C
enable haproxy on this node?
default value is true
haproxy_clean
name: haproxy_clean, type: bool, level: G/C/A
cleanup all existing haproxy config?
default value is false
haproxy_reload
name: haproxy_reload, type: bool, level: A
reload haproxy after config?
The default value is true: haproxy will be reloaded after a config change.
If you wish to review the generated config before applying it, you can turn this off with CLI args and check it manually.
haproxy_auth_enabled
name: haproxy_auth_enabled, type: bool, level: G
enable authentication for haproxy admin page
The default value is true, which requires HTTP basic auth for the admin page.
Disabling it is not recommended, since your traffic control page would be exposed.
haproxy_admin_username
name: haproxy_admin_username, type: username, level: G
haproxy admin username, admin by default
haproxy_admin_password
name: haproxy_admin_password, type: password, level: G
haproxy admin password, pigsty by default
PLEASE CHANGE IT IN YOUR PRODUCTION ENVIRONMENT!
haproxy_exporter_port
name: haproxy_exporter_port, type: port, level: C
haproxy admin/exporter port, 9101 by default
haproxy_client_timeout
name: haproxy_client_timeout, type: interval, level: C
client side connection timeout, 24h by default
haproxy_server_timeout
name: haproxy_server_timeout, type: interval, level: C
server side connection timeout, 24h by default
haproxy_services
name: haproxy_services, type: service[], level: C
list of haproxy service to be exposed on node, default values: []
Each element is a service definition, here is an ad hoc haproxy service example:
haproxy_services:                       # list of haproxy service
  # expose pg-test read only replicas
  - name: pg-test-ro                    # [REQUIRED] service name, unique
    port: 5440                          # [REQUIRED] service port, unique
    ip: "*"                             # [OPTIONAL] service listen addr, "*" by default
    protocol: tcp                       # [OPTIONAL] service protocol, 'tcp' by default
    balance: leastconn                  # [OPTIONAL] load balance algorithm, roundrobin by default (or leastconn)
    maxconn: 20000                      # [OPTIONAL] max allowed front-end connection, 20000 by default
    default: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
    options:
      - option httpchk
      - option http-keep-alive
      - http-check send meth OPTIONS uri /read-only
      - http-check expect status 200
    servers:
      - { name: pg-test-1 ,ip: 10.10.10.11 , port: 5432 , options: check port 8008 , backup: true }
      - { name: pg-test-2 ,ip: 10.10.10.12 , port: 5432 , options: check port 8008 }
      - { name: pg-test-3 ,ip: 10.10.10.13 , port: 5432 , options: check port 8008 }
It will be rendered to /etc/haproxy/<service.name>.cfg and take effect after reload.
NODE_EXPORTER
node_exporter_enabled: true       # setup node_exporter on this node?
node_exporter_port: 9100          # node exporter listen port, 9100 by default
node_exporter_options: '--no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes'
node_exporter_enabled
name: node_exporter_enabled, type: bool, level: C
setup node_exporter on this node? default value is true
node_exporter_port
name: node_exporter_port, type: port, level: C
node exporter listen port, 9100 by default
node_exporter_options
name: node_exporter_options, type: arg, level: C
extra server options for node_exporter, default value: --no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes
Pigsty enables the tcpstat and processes collectors and disables the nvme and softnet collectors by default.
PROMTAIL
Promtail collects logs from other modules and sends them to Loki:
INFRA: Infra logs, collected only on infra nodes.
nginx-access: /var/log/nginx/access.log
nginx-error: /var/log/nginx/error.log
grafana: /var/log/grafana/grafana.log
NODES: Host node logs, collected on all nodes.
syslog: /var/log/messages
dmesg: /var/log/dmesg
cron: /var/log/cron
PGSQL: PostgreSQL logs, collected when a node is defined with pg_cluster.
postgres: /pg/log/postgres/*
patroni: /pg/log/patroni.log
pgbouncer: /pg/log/pgbouncer/pgbouncer.log
pgbackrest: /pg/log/pgbackrest/*.log
REDIS: Redis logs, collected when a node is defined with redis_cluster.
promtail_enabled: true                        # enable promtail logging collector?
promtail_clean: false                         # purge existing promtail status file during init?
promtail_port: 9080                           # promtail listen port, 9080 by default
promtail_positions: /var/log/positions.yaml   # promtail position status file path
promtail_enabled
name: promtail_enabled, type: bool, level: C
enable promtail logging collector?
default value is true
promtail_clean
name: promtail_clean, type: bool, level: G/A
purge existing promtail status file during init?
default value is false, if you choose to clean, Pigsty will remove the existing state file defined by promtail_positions
which means that Promtail will recollect all logs on the current node and send them to Loki again.
promtail_port
name: promtail_port, type: port, level: C
promtail listen port, 9080 by default
default values: 9080
promtail_positions
name: promtail_positions, type: path, level: C
promtail position status file path
default values: /var/log/positions.yaml
Promtail records the consumption offsets of all logs, which are periodically written to the file specified by promtail_positions.
DOCKER
docker_image_cache
docker image cache tarball glob list, ["/tmp/docker/*.tgz"] by default.
Local docker image caches with a .tgz suffix matching this glob list will be loaded into docker one by one:
cat *.tgz | gzip -d -c - | docker load
ETCD
ETCD is a distributed, reliable key-value store for the most critical data of a distributed system,
and Pigsty uses etcd as the DCS, which is critical to PostgreSQL high availability.
Pigsty uses a hard-coded group name etcd for the etcd cluster; it can be an existing external etcd cluster, or a new etcd cluster created by Pigsty with etcd.yml.
#etcd_seq: 1                      # etcd instance identifier, explicitly required
#etcd_cluster: etcd               # etcd cluster & group name, etcd by default
etcd_safeguard: false             # prevent purging running etcd instance?
etcd_clean: true                  # purging existing etcd during initialization?
etcd_data: /data/etcd             # etcd data directory, /data/etcd by default
etcd_port: 2379                   # etcd client port, 2379 by default
etcd_peer_port: 2380              # etcd peer port, 2380 by default
etcd_init: new                    # etcd initial cluster state, new or existing
etcd_election_timeout: 1000       # etcd election timeout, 1000ms by default
etcd_heartbeat_interval: 100      # etcd heartbeat interval, 100ms by default
etcd_seq
name: etcd_seq, type: int, level: I
etcd instance identifier, REQUIRED
no default value, you have to specify it explicitly. Here is a 3-node etcd cluster example:
etcd:                             # dcs service for postgres/patroni ha consensus
  hosts:                          # 1 node for testing, 3 or 5 for production
    10.10.10.10: { etcd_seq: 1 }  # etcd_seq required
    10.10.10.11: { etcd_seq: 2 }  # assign from 1 ~ n
    10.10.10.12: { etcd_seq: 3 }  # odd number please
  vars:                           # cluster level parameter override roles/etcd
    etcd_cluster: etcd            # mark etcd cluster name etcd
    etcd_safeguard: false         # safeguard against purging
    etcd_clean: true              # purge etcd during init process
etcd_cluster
name: etcd_cluster, type: string, level: C
etcd cluster & group name, etcd by default
default values: etcd, which is a fixed group name. This can be useful when you want to deploy extra etcd clusters.
etcd_safeguard
name: etcd_safeguard, type: bool, level: G/C/A
prevent purging running etcd instance? default value is false
If enabled, running etcd instance will not be purged by etcd.yml playbook.
etcd_clean
name: etcd_clean, type: bool, level: G/C/A
purging existing etcd during initialization? default value is true
If enabled, running etcd instance will be purged by etcd.yml playbook, which makes the playbook fully idempotent.
But if etcd_safeguard is enabled, it will still abort on any running etcd instance.
etcd_data
name: etcd_data, type: path, level: C
etcd data directory, /data/etcd by default
etcd_port
name: etcd_port, type: port, level: C
etcd client port, 2379 by default
etcd_peer_port
name: etcd_peer_port, type: port, level: C
etcd peer port, 2380 by default
etcd_init
name: etcd_init, type: enum, level: C
etcd initial cluster state, new or existing
default values: new, which will create a standalone new etcd cluster.
The value existing is used when trying to add new member to existing etcd cluster.
etcd_election_timeout
name: etcd_election_timeout, type: int, level: C
etcd election timeout, 1000 (ms) by default
etcd_heartbeat_interval
name: etcd_heartbeat_interval, type: int, level: C
etcd heartbeat interval, 100 (ms) by default
MINIO
MinIO is an S3-compatible object storage service, used as an optional central backup storage repo for PostgreSQL.
But you can use it for other purposes, such as storing large files, documents, pictures & videos.
#minio_seq: 1                     # minio instance identifier, REQUIRED
minio_cluster: minio              # minio cluster name, minio by default
minio_clean: false                # cleanup minio during init?, false by default
minio_user: minio                 # minio os user, `minio` by default
minio_node: '${minio_cluster}-${minio_seq}.pigsty' # minio node name pattern
minio_data: '/data/minio'         # minio data dir(s), use {x...y} to specify multi drivers
minio_domain: sss.pigsty          # minio external domain name, `sss.pigsty` by default
minio_port: 9000                  # minio service port, 9000 by default
minio_admin_port: 9001            # minio console port, 9001 by default
minio_access_key: minioadmin      # root access key, `minioadmin` by default
minio_secret_key: minioadmin      # root secret key, `minioadmin` by default
minio_extra_vars: ''              # extra environment variables
minio_alias: sss                  # alias name for local minio deployment
minio_buckets: [ { name: pgsql }, { name: infra }, { name: redis } ]
minio_users:
  - { access_key: dba , secret_key: S3User.DBA, policy: consoleAdmin }
  - { access_key: pgbackrest , secret_key: S3User.Backup, policy: readwrite }
minio_seq
name: minio_seq, type: int, level: I
minio instance identifier, a REQUIRED identity parameter. No default value; you have to assign it manually.
minio_cluster
name: minio_cluster, type: string, level: C
minio cluster name, minio by default. This is useful when deploying multiple MinIO clusters
minio_clean
name: minio_clean, type: bool, level: G/C/A
cleanup minio during init?, false by default
minio_user
name: minio_user, type: username, level: C
minio os user name, minio by default
minio_node
name: minio_node, type: string, level: C
minio node name pattern, this is used for multi-node deployment
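As a hedged illustration of how minio_node (and minio_data) are used in a multi-node, multi-drive deployment (the hostnames, sequence numbers, and drive paths below are assumptions, not defaults):

minio:
  hosts:
    10.10.10.10: { minio_seq: 1 }
    10.10.10.11: { minio_seq: 2 }
    10.10.10.12: { minio_seq: 3 }
  vars:
    minio_cluster: minio
    minio_node: '${minio_cluster}-${minio_seq}.pigsty'   # resolves to minio-1.pigsty, minio-2.pigsty, ...
    minio_data: '/data{1...4}'                           # four drives per node via the {x...y} notation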
Two default users are created by minio_users for the PostgreSQL DBA and pgBackREST.
PLEASE ADJUST THESE USERS & CREDENTIALS IN YOUR DEPLOYMENT!
REDIS
#redis_cluster: <CLUSTER>         # redis cluster name, required identity parameter
#redis_node: 1       <NODE>       # redis node sequence number, node int id required
#redis_instances: {} <NODE>       # redis instances definition on this redis node
redis_fs_main: /data              # redis main data mountpoint, `/data` by default
redis_exporter_enabled: true      # install redis exporter on redis nodes?
redis_exporter_port: 9121         # redis exporter listen port, 9121 by default
redis_exporter_options: ''        # cli args and extra options for redis exporter
redis_safeguard: false            # prevent purging running redis instance?
redis_clean: true                 # purging existing redis during init?
redis_rmdata: true                # remove redis data when purging redis server?
redis_mode: standalone            # redis mode: standalone,cluster,sentinel
redis_conf: redis.conf            # redis config template path, except sentinel
redis_bind_address: '0.0.0.0'     # redis bind address, empty string will use host ip
redis_max_memory: 1GB             # max memory used by each redis instance
redis_mem_policy: allkeys-lru     # redis memory eviction policy
redis_password: ''                # redis password, empty string will disable password
redis_rdb_save: ['1200 1']        # redis rdb save directives, disable with empty list
redis_aof_enabled: false          # enable redis append only file?
redis_rename_commands: {}         # rename redis dangerous commands
redis_cluster_replicas: 1         # replica number for one master in redis cluster
redis_sentinel_monitor: []        # sentinel master list, works on sentinel cluster only
redis_cluster
name: redis_cluster, type: string, level: C
redis cluster name, required identity parameter.
no default value, you have to define it explicitly.
Comply with regexp [a-z][a-z0-9-]*, it is recommended to use the same name as the group name and start with redis-
redis_node
name: redis_node, type: int, level: I
redis node sequence number; a unique integer within the redis cluster is required.
You have to explicitly define the node id for each redis node, an integer starting from 0 or 1.
redis_instances
name: redis_instances, type: dict, level: I
redis instances definition on this redis node
no default value, you have to define redis instances on each redis node using this parameter explicitly.
Here is an example of a native redis cluster definition:
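(a minimal sketch; the IP addresses, ports, and memory limit below are illustrative)

redis-test:                        # a 3-node, 6-instance native redis cluster
  hosts:
    10.10.10.11: { redis_node: 1 , redis_instances: { 6379: {} , 6380: {} } }
    10.10.10.12: { redis_node: 2 , redis_instances: { 6379: {} , 6380: {} } }
    10.10.10.13: { redis_node: 3 , redis_instances: { 6379: {} , 6380: {} } }
  vars:
    redis_cluster: redis-test
    redis_mode: cluster            # native redis cluster mode
    redis_max_memory: 64MB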
redis_bind_address
name: redis_bind_address, type: ip, level: C
redis bind address, empty string will use the inventory hostname
default values: 0.0.0.0, which will bind to all available IPv4 addresses on this host
PLEASE bind to the intranet IP only in production environments, i.e. set this value to ''
redis_max_memory
name: redis_max_memory, type: size, level: C/I
max memory used by each redis instance, default values: 1GB
redis_mem_policy
name: redis_mem_policy, type: enum, level: C
redis memory eviction policy
default values: allkeys-lru, check redis eviction policy for more details
noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
volatile-lru: Removes least recently used keys with the expire field set to true.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with expire field set to true.
volatile-ttl: Removes keys with expire field set to true and the shortest remaining time-to-live (TTL) value.
redis_password
name: redis_password, type: password, level: C/N
redis password, empty string will disable password, which is the default behavior
Note that due to the implementation limitation of redis_exporter, you can only set one redis_password per node.
This is usually not a problem, because pigsty does not allow deploying two different redis cluster on the same node.
PLEASE use a strong password in production environment
redis_rdb_save
name: redis_rdb_save, type: string[], level: C
redis rdb save directives, disable with empty list, check redis persist for details.
the default value is ["1200 1"]: dump the dataset to disk every 20 minutes if at least 1 key changed:
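For example (the multi-rule schedule below is an illustrative sketch, not the Pigsty default):

redis_rdb_save: ['1200 1']                         # default: snapshot if >= 1 key changed in 20 minutes
# redis_rdb_save: ['900 1', '300 10', '60 10000']  # a more aggressive schedule
# redis_rdb_save: []                               # disable RDB snapshots entirely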
redis_aof_enabled
name: redis_aof_enabled, type: bool, level: C
enable redis append only file? default value is false.
redis_rename_commands
name: redis_rename_commands, type: dict, level: C
rename redis dangerous commands, which is a dict of k:v old: new
default values: {}, you can hide dangerous commands like FLUSHDB and FLUSHALL by setting this value, here’s an example:
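(the renamed command names below are an illustrative sketch, not defaults)

redis_rename_commands:
  flushall: op_flushall            # hide FLUSHALL behind an operator-only alias
  flushdb: op_flushdb              # hide FLUSHDB
  keys: ''                         # renaming to an empty string disables the command entirely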
redis_cluster_replicas
name: redis_cluster_replicas, type: int, level: C
replica number for one master/primary in redis cluster, default values: 1
redis_sentinel_monitor
name: redis_sentinel_monitor, type: master[], level: C
This can only be used when redis_mode is set to sentinel.
List of redis master to be monitored by this sentinel cluster. each master is defined as a dict with name, host, port, password, quorum keys.
redis_sentinel_monitor:   # primary list for redis sentinel, use cls as name, primary ip:port
  - { name: redis-src, host: 10.10.10.45, port: 6379 ,password: redis.src, quorum: 1 }
  - { name: redis-dst, host: 10.10.10.48, port: 6379 ,password: redis.dst, quorum: 1 }
The name and host are mandatory; port, password, and quorum are optional. quorum sets the quorum for this master, usually larger than half of the number of sentinel instances.
PGSQL
PGSQL module requires NODE module to be installed, and you also need a viable ETCD cluster to store cluster meta data.
Installing the PGSQL module on a single node will create a primary instance, i.e., a standalone PGSQL server.
Installing it on additional nodes will create replicas, which can serve read-only traffic or act as standbys.
You can also create offline instances for ETL/OLAP/interactive queries,
use Sync Standby and Quorum Commit to increase data consistency,
or even form a standby cluster and delayed standby cluster for disaster recovery.
You can define multiple PGSQL clusters and form a horizontal sharding cluster, which is a group of PGSQL clusters running on different nodes.
Pigsty has native citus cluster group support, which can extend your PGSQL cluster to a distributed database sharding cluster.
PG_ID
Here are some common parameters used to identify PGSQL entities: instance, service, etc…
# pg_cluster:           #CLUSTER  # pgsql cluster name, required identity parameter
# pg_seq: 0             #INSTANCE # pgsql instance seq number, required identity parameter
# pg_role: replica      #INSTANCE # pgsql role, required, could be primary,replica,offline
# pg_instances: {}      #INSTANCE # define multiple pg instances on node in `{port:ins_vars}` format
# pg_upstream:          #INSTANCE # repl upstream ip addr for standby cluster or cascade replica
# pg_shard:             #CLUSTER  # pgsql shard name, optional identity for sharding clusters
# pg_group: 0           #CLUSTER  # pgsql shard index number, optional identity for sharding clusters
# gp_role: master       #CLUSTER  # greenplum role of this cluster, could be master or segment
pg_offline_query: false #INSTANCE # set to true to enable offline query on this instance
You have to assign these identity parameters explicitly, there’s no default value for them.
pg_cluster: It identifies the name of the cluster, which is configured at the cluster level.
pg_role: Configured at the instance level, identifies the role of the instance. Only the primary role is handled specially; if not specified, an instance defaults to the replica role. There are also the special delayed and offline roles.
pg_seq: Used to identify the instance within the cluster, usually an integer incremented from 0 or 1, which is not changed once assigned.
{{ pg_cluster }}-{{ pg_seq }} is used to uniquely identify the ins, i.e. pg_instance.
{{ pg_cluster }}-{{ pg_role }} is used to identify the services within the cluster, i.e. pg_service.
pg_shard and pg_group are used for horizontally sharding clusters, for citus, greenplum, and matrixdb only.
pg_cluster, pg_role, pg_seq are core identity params, which are required for any Postgres cluster, and must be explicitly specified. Here’s an example:
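(a minimal sketch; the cluster name, addresses, and roles below are illustrative)

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: offline }
  vars:
    pg_cluster: pg-test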
All other params can be inherited from the global config or the default config, but the identity params must be explicitly specified and manually assigned.
pg_mode
name: pg_mode, type: enum, level: C
pgsql cluster mode, pgsql by default, i.e. standard PostgreSQL cluster.
pgsql: Standard PostgreSQL cluster, default value.
citus: Horizontal sharding cluster with citus extension.
If pg_mode is set to citus or gpsql, pg_shard and pg_group will be required for horizontal sharding clusters.
pg_cluster
name: pg_cluster, type: string, level: C
pgsql cluster name, REQUIRED identity parameter
The cluster name will be used as the namespace for PGSQL related resources within that cluster.
The naming needs to follow the specific naming pattern: [a-z][a-z0-9-]* to be compatible with the requirements of different constraints on the identity.
pg_seq
name: pg_seq, type: int, level: I
pgsql instance sequence number, REQUIRED identity parameter
A serial number of this instance, unique within its cluster, starting from 0 or 1.
pg_role
name: pg_role, type: enum, level: I
pgsql role, REQUIRED, could be primary,replica,offline
Roles for PGSQL instance, can be: primary, replica, standby or offline.
primary: Primary, there is one and only one primary in a cluster.
replica: Replica for carrying online read-only traffic; there may be a slight replication delay though (10ms~100ms, 100KB).
standby: Special replica that is always synced with primary, there’s no replication delay & data loss on this replica. (currently same as replica)
offline: Offline replica for taking on offline read-only traffic, such as statistical analysis/ETL/personal queries, etc.
pg_role is an identity parameter: it is required and must be set at the instance level.
pg_instances
name: pg_instances, type: dict, level: I
define multiple pg instances on node in {port:ins_vars} format.
This parameter is reserved for multi-instance deployment on a single node which is not implemented in Pigsty yet.
pg_upstream
name: pg_upstream, type: ip, level: I
Upstream ip address for standby cluster or cascade replica
Setting pg_upstream on a primary instance indicates that this cluster is a Standby Cluster and will receive changes from the upstream instance; the primary thus acts as a standby leader.
Setting pg_upstream on a non-primary instance explicitly sets its replication upstream; if it differs from the primary IP addr, this instance becomes a cascade replica, and it is the user's responsibility to ensure the upstream IP addr belongs to another instance in the same cluster.
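A hedged sketch of both cases (cluster names and addresses are illustrative): pg_upstream on the primary of pg-test2 turns it into a standby cluster of pg-test, while pg_upstream on a replica points it at another replica, forming a cascade.

pg-test2:                          # standby cluster of pg-test (illustrative)
  hosts:
    10.10.10.21: { pg_seq: 1, pg_role: primary, pg_upstream: 10.10.10.11 }  # standby leader, pulls from pg-test primary
    10.10.10.22: { pg_seq: 2, pg_role: replica }                            # replicates from 10.10.10.21
    10.10.10.23: { pg_seq: 3, pg_role: replica, pg_upstream: 10.10.10.22 }  # cascade replica of 10.10.10.22
  vars:
    pg_cluster: pg-test2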
pg_shard
name: pg_shard, type: string, level: C
pgsql shard name, required identity parameter for sharding clusters (e.g. citus cluster), optional for common pgsql clusters.
When multiple pgsql clusters serve the same business together in a horizontally sharding style, Pigsty will mark this group of clusters as a Sharding Group.
pg_shard is the name of the shard group name. It’s usually the prefix of pg_cluster.
For example, if we have a sharding group pg-citus with 4 clusters in it, their identity params will be:
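(a sketch with illustrative cluster names)

pg-citus0: { pg_cluster: pg-citus0, pg_shard: pg-citus, pg_group: 0 }   # shard 0
pg-citus1: { pg_cluster: pg-citus1, pg_shard: pg-citus, pg_group: 1 }   # shard 1
pg-citus2: { pg_cluster: pg-citus2, pg_shard: pg-citus, pg_group: 2 }   # shard 2
pg-citus3: { pg_cluster: pg-citus3, pg_shard: pg-citus, pg_group: 3 }   # shard 3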
pg_group
name: pg_group, type: int, level: C
pgsql shard index number, required identity for sharding clusters, optional for common pgsql clusters.
Sharding cluster index of sharding group, used in pair with pg_shard. You can use any non-negative integer as the index number.
gp_role
name: gp_role, type: enum, level: C
greenplum/matrixdb role of this cluster, could be master or segment
master: mark the postgres cluster as greenplum master, which is the default value
segment: mark the postgres cluster as a greenplum segment
This parameter is only used for greenplum/matrixdb database, and is ignored for common pgsql cluster.
pg_exporters
name: pg_exporters, type: dict, level: C
additional pg_exporters to monitor remote postgres instances, default values: {}
If you wish to monitor remote postgres instances, define them in pg_exporters and load them with the pgsql-monitor.yml playbook.
pg_exporters:  # list all remote instances here, alloc a unique unused local port as k
  20001: { pg_cluster: pg-foo, pg_seq: 1, pg_host: 10.10.10.10 }
  20004: { pg_cluster: pg-foo, pg_seq: 2, pg_host: 10.10.10.11 }
  20002: { pg_cluster: pg-bar, pg_seq: 1, pg_host: 10.10.10.12 }
  20003: { pg_cluster: pg-bar, pg_seq: 1, pg_host: 10.10.10.13 }
pg_offline_query
name: pg_offline_query, type: bool, level: I
set to true to enable offline query on this instance
default value is false
When set to true, members of the dbrole_offline role can connect to this instance and perform offline queries, regardless of its current role, just like an offline instance.
If you have only one replica, or even just a primary, in your postgres cluster, marking it this way lets it accept ETL, slow queries, and interactive access.
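For example, a hedged sketch marking the only replica of a small cluster for offline traffic (names and addresses are illustrative):

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica, pg_offline_query: true }  # also serves ETL / slow / interactive queries
  vars:
    pg_cluster: pg-test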
PG_BUSINESS
Database credentials and in-database objects that need to be taken care of by users.
WARNING: YOU HAVE TO CHANGE THESE DEFAULT PASSWORDs in production environment.
# postgres business object definition, overwrite in group vars
pg_users: []                      # postgres business users
pg_databases: []                  # postgres business databases
pg_services: []                   # postgres business services
pg_hba_rules: []                  # business hba rules for postgres
pgb_hba_rules: []                 # business hba rules for pgbouncer
# global credentials, overwrite in global vars
pg_dbsu_password: ''              # dbsu password, empty string means no dbsu password by default
pg_replication_username: replicator
pg_replication_password: DBUser.Replicator
pg_admin_username: dbuser_dba
pg_admin_password: DBUser.DBA
pg_monitor_username: dbuser_monitor
pg_monitor_password: DBUser.Monitor
pg_users
name: pg_users, type: user[], level: C
postgres business users, has to be defined at cluster level.
default values: [], each object in the array defines a User/Role. Examples:
- name: dbuser_meta               # REQUIRED, `name` is the only mandatory field of a user definition
  password: DBUser.Meta           # optional, password, can be a scram-sha-256 hash string or plain text
  login: true                     # optional, can log in, true by default (new biz ROLE should be false)
  superuser: false                # optional, is superuser? false by default
  createdb: false                 # optional, can create database? false by default
  createrole: false               # optional, can create role? false by default
  inherit: true                   # optional, can this role use inherited privileges? true by default
  replication: false              # optional, can this role do replication? false by default
  bypassrls: false                # optional, can this role bypass row level security? false by default
  pgbouncer: true                 # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
  connlimit: -1                   # optional, user connection limit, default -1 disable limit
  expire_in: 3650                 # optional, now + n days when this role is expired (OVERWRITE expire_at)
  expire_at: '2030-12-31'         # optional, YYYY-MM-DD 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
  comment: pigsty admin user      # optional, comment string for this user/role
  roles: [dbrole_admin]           # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
  parameters: {}                  # optional, role level parameters with `ALTER ROLE SET`
  pool_mode: transaction          # optional, pgbouncer pool mode at user level, transaction by default
  pool_connlimit: -1              # optional, max database connections at user level, default -1 disable limit
  search_path: public             # key value config parameters according to postgresql documentation (e.g: use pigsty as default search_path)
The only mandatory field of a user definition is name, and the rest are optional.
pg_databases
name: pg_databases, type: database[], level: C
postgres business databases, has to be defined at cluster level.
default values: [], each object in the array defines a Database. Examples:
- name: meta                      # REQUIRED, `name` is the only mandatory field of a database definition
  baseline: cmdb.sql              # optional, database sql baseline path, (relative path among ansible search path, e.g files/)
  pgbouncer: true                 # optional, add this database to pgbouncer database list? true by default
  schemas: [pigsty]               # optional, additional schemas to be created, array of schema names
  extensions:                     # optional, additional extensions to be installed: array of `{name[,schema]}`
    - { name: postgis , schema: public }
    - { name: timescaledb }
  comment: pigsty meta database   # optional, comment string for this database
  owner: postgres                 # optional, database owner, postgres by default
  template: template1             # optional, which template to use, template1 by default
  encoding: UTF8                  # optional, database encoding, UTF8 by default. (MUST same as template database)
  locale: C                       # optional, database locale, C by default. (MUST same as template database)
  lc_collate: C                   # optional, database collate, C by default. (MUST same as template database)
  lc_ctype: C                     # optional, database ctype, C by default. (MUST same as template database)
  tablespace: pg_default          # optional, default tablespace, 'pg_default' by default.
  allowconn: true                 # optional, allow connection, true by default. false will disable connect at all
  revokeconn: false               # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
  register_datasource: true       # optional, register this database to grafana datasources? true by default
  connlimit: -1                   # optional, database connection limit, default -1 disable limit
  pool_auth_user: dbuser_meta     # optional, all connection to this pgbouncer database will be authenticated by this user
  pool_mode: transaction          # optional, pgbouncer pool mode at database level, default transaction
  pool_size: 64                   # optional, pgbouncer pool size at database level, default 64
  pool_size_reserve: 32           # optional, pgbouncer pool size reserve at database level, default 32
  pool_size_min: 0                # optional, pgbouncer pool size min at database level, default 0
  pool_max_db_conn: 100           # optional, max database connections at database level, default 100
In each database definition, the DB name is mandatory and the rest are optional.
pg_services
name: pg_services, type: service[], level: C
postgres business services exposed via haproxy, has to be defined at cluster level.
default values: [], each object in the array defines a Service. Examples:
- name: standby                   # required, service name, the actual svc name will be prefixed with `pg_cluster`, e.g: pg-meta-standby
  port: 5435                      # required, service exposed port (work as kubernetes service node port mode)
  ip: "*"                         # optional, service bind ip address, `*` for all ip by default
  selector: "[]"                  # required, service member selector, use JMESPath to filter inventory
  dest: default                   # optional, destination port, default|postgres|pgbouncer|<port_number>, 'default' by default
  check: /sync                    # optional, health check url path, / by default
  backup: "[? pg_role == `primary` ]"  # backup server selector
  maxconn: 3000                   # optional, max allowed front-end connection
  balance: roundrobin             # optional, haproxy load balance algorithm (roundrobin by default, other: leastconn)
  options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
pg_hba_rules
name: pg_hba_rules, type: hba[], level: C
business hba rules for postgres
default values: [], each object in array is an HBA Rule definition:
These are arrays of hba objects; each hba object may look like:
# RAW HBA RULES
- title: allow intranet password access
  role: common
  rules:
    - host  all  all  10.0.0.0/8      md5
    - host  all  all  172.16.0.0/12   md5
    - host  all  all  192.168.0.0/16  md5
title: Rule Title, transform into comment in hba file
rules: Array of strings, each string is a raw hba rule record
role: Applied roles, where to install these hba rules
common: apply for all instances
primary, replica,standby, offline: apply on corresponding instances with that pg_role.
special case: HBA rule with role == 'offline' will be installed on instance with pg_offline_query flag
or you can use another alias form
- addr: 'intra'     # world|intra|infra|admin|local|localhost|cluster|<cidr>
  auth: 'pwd'       # trust|pwd|ssl|cert|deny|<official auth method>
  user: 'all'       # all|${dbsu}|${repl}|${admin}|${monitor}|<user>|<group>
  db: 'all'         # all|replication|....
  rules: []         # raw hba string precedence over above all
  title: allow intranet password access
pg_default_hba_rules is similar to this, but is used for global HBA rule settings
pgb_hba_rules
name: pgb_hba_rules, type: hba[], level: C
business hba rules for pgbouncer, default values: []
Similar to pg_hba_rules, array of hba rule object, except this is for pgbouncer.
pg_replication_username
name: pg_replication_username, type: username, level: G
postgres replication username, replicator by default
This parameter is used globally; it is not wise to change it.
pg_replication_password
name: pg_replication_password, type: password, level: G
postgres replication password, DBUser.Replicator by default
WARNING: CHANGE THIS IN PRODUCTION ENVIRONMENT!!!!
pg_admin_username
name: pg_admin_username, type: username, level: G
postgres admin username, dbuser_dba by default, which is a global postgres superuser.
default values: dbuser_dba
pg_admin_password
name: pg_admin_password, type: password, level: G
postgres admin password in plain text, DBUser.DBA by default
WARNING: CHANGE THIS IN PRODUCTION ENVIRONMENT!!!!
pg_monitor_username
name: pg_monitor_username, type: username, level: G
postgres monitor username, dbuser_monitor by default, which is a global monitoring user.
pg_monitor_password
name: pg_monitor_password, type: password, level: G
postgres monitor password, DBUser.Monitor by default.
Try not using the @:/ character in the password to avoid problems with PGURL string.
WARNING: CHANGE THIS IN PRODUCTION ENVIRONMENT!!!!
pg_dbsu_password
name: pg_dbsu_password, type: password, level: G/C
PostgreSQL dbsu password for pg_dbsu, empty string means no dbsu password, which is the default behavior.
WARNING: It's not recommended to set a dbsu password for common PGSQL clusters, except for pg_mode = citus.
PG_INSTALL
This section is responsible for installing PostgreSQL & Extensions.
If you wish to install a different major version, just make sure the repo packages exist and overwrite pg_version at the cluster level.
To install extra extensions, overwrite pg_extensions on cluster level. Beware that not all extensions are available with other major versions.
pg_dbsu: postgres                 # os dbsu name, postgres by default, better not change it
pg_dbsu_uid: 26                   # os dbsu uid and gid, 26 for default postgres users and groups
pg_dbsu_sudo: limit               # dbsu sudo privilege, none,limit,all,nopass. limit by default
pg_dbsu_home: /var/lib/pgsql      # postgresql home directory, `/var/lib/pgsql` by default
pg_dbsu_ssh_exchange: true        # exchange postgres dbsu ssh key among same pgsql cluster
pg_version: 16                    # postgres major version to be installed, 16 by default
pg_bin_dir: /usr/pgsql/bin        # postgres binary dir, `/usr/pgsql/bin` by default
pg_log_dir: /pg/log/postgres      # postgres log dir, `/pg/log/postgres` by default
pg_packages:                      # pg packages to be installed, alias can be used
  - postgresql
  - patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager wal2json pg_repack
pg_extensions:                    # pg extensions to be installed, alias can be used
  - postgis timescaledb pgvector
pg_dbsu
name: pg_dbsu, type: username, level: C
os dbsu name, postgres by default, it’s not wise to change it.
When installing Greenplum / MatrixDB, set this parameter to the corresponding default value: gpadmin|mxadmin.
pg_dbsu_uid
name: pg_dbsu_uid, type: int, level: C
os dbsu uid and gid, 26 for default postgres users and groups, which is consistent with the official pgdg RPM.
For Ubuntu/Debian, there’s no default postgres UID/GID, consider using another ad hoc value, such as 543 instead.
pg_dbsu_sudo
name: pg_dbsu_sudo, type: enum, level: C
dbsu sudo privilege, could be none, limit, all, nopass. limit by default
none: No Sudo privilege
limit: Limited sudo privilege to execute systemctl commands for database-related components, default.
all: Full sudo privilege, password required.
nopass: Full sudo privileges without a password (not recommended).
default values: limit, which only allow sudo systemctl <start|stop|reload> <postgres|patroni|pgbouncer|...>
pg_dbsu_home
name: pg_dbsu_home, type: path, level: C
postgresql home directory, /var/lib/pgsql by default, which is consistent with the official pgdg RPM.
pg_dbsu_ssh_exchange
name: pg_dbsu_ssh_exchange, type: bool, level: C
exchange postgres dbsu ssh key among same pgsql cluster?
default value is true, means the dbsu can ssh to each other among the same cluster.
pg_version
name: pg_version, type: enum, level: C
postgres major version to be installed, 16 by default
Note that PostgreSQL physical stream replication cannot cross major versions, so do not configure this on instance level.
You can use the parameters in pg_packages and pg_extensions to install rpms for the specific pg major version.
pg_bin_dir
name: pg_bin_dir, type: path, level: C
postgres binary dir, /usr/pgsql/bin by default
The default value is a soft link created manually during the installation process, pointing to the specific Postgres version dir installed.
For example, /usr/pgsql -> /usr/pgsql-16. Check PGSQL File Structure for more details.
pg_log_dir
name: pg_log_dir, type: path, level: C
postgres log dir, /pg/log/postgres by default.
caveat: if pg_log_dir is prefixed with pg_data, it will not be created explicitly (postgres itself will create it).
pg_packages
name: pg_packages, type: string[], level: C
PG packages to be installed (rpm/deb), this is an array of software package names, each element is a comma or space separated PG software package name.
The default value includes the PostgreSQL kernel, as well as patroni, pgbouncer, pg_exporter, …, and two important extensions: pg_repack and wal2json.
pg_packages:                      # pg packages to be installed, alias can be used
  - postgresql
  - patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager wal2json pg_repack
Starting from Pigsty v3, you can use package aliases specified by pg_package_map in roles/node_id/vars to perform an alias mapping.
The benefit of using package aliases is that you don’t have to worry about the package names, architectures, and major version of PostgreSQL-related packages on different OS platforms, thus sealing off the differences between different OSs.
You can also use the raw package name directly, the ${pg_version} or $v version placeholder in the package name will be replaced with the actual PG major version pg_version.
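For instance, a hedged sketch mixing aliases with a raw package name that uses the major-version placeholder (the raw EL-style name below is illustrative and may differ per OS platform):

pg_packages:                       # aliases are resolved via pg_package_map
  - postgresql                     # alias: maps to the proper kernel packages for this OS & pg_version
  - patroni pgbouncer pgbackrest pg_exporter
  - wal2json_$v*                   # illustrative raw name: `$v` is replaced with the actual PG major version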
pg_extensions
name: pg_extensions, type: string[], level: C
PG extensions to be installed (rpm/deb), this is an array of software package names, each element is a comma or space separated PG extension package name.
This parameter is similar to pg_packages, but is usually used to specify the extension to be installed @ global | cluster level, and the software packages specified here will be upgraded to the latest available version.
The default value of this parameter is the three most important extension plugins in the PG extension ecosystem: postgis, timescaledb, pgvector.
pg_extensions:                    # pg extensions to be installed, alias can be used
  - postgis timescaledb pgvector  # replace postgis with postgis33 when using el7
The complete list of available extensions can be found in the auto-generated config, e.g. the EL9 extension list.
PG_BOOTSTRAP
Bootstrap a postgres cluster with patroni, and set up the pgbouncer connection pool along with it.
It also initializes the cluster template databases with the default roles, schemas, extensions, and default privileges specified in PG_PROVISION.
pg_safeguard: false               # prevent purging running postgres instance? false by default
pg_clean: true                    # purging existing postgres during pgsql init? true by default
pg_data: /pg/data                 # postgres data directory, `/pg/data` by default
pg_fs_main: /data                 # mountpoint/path for postgres main data, `/data` by default
pg_fs_bkup: /data/backups         # mountpoint/path for pg backup data, `/data/backup` by default
pg_storage_type: SSD              # storage type for pg main data, SSD,HDD, SSD by default
pg_dummy_filesize: 64MiB          # size of `/pg/dummy`, hold 64MB disk space for emergency use
pg_listen: '0.0.0.0'              # postgres/pgbouncer listen addresses, comma separated list
pg_port: 5432                     # postgres listen port, 5432 by default
pg_localhost: /var/run/postgresql # postgres unix socket dir for localhost connection
patroni_enabled: true             # if disabled, no postgres cluster will be created during init
patroni_mode: default             # patroni working mode: default,pause,remove
pg_namespace: /pg                 # top level key namespace in etcd, used by patroni & vip
patroni_port: 8008                # patroni listen port, 8008 by default
patroni_log_dir: /pg/log/patroni  # patroni log dir, `/pg/log/patroni` by default
patroni_ssl_enabled: false        # secure patroni RestAPI communications with SSL?
patroni_watchdog_mode: off        # patroni watchdog mode: automatic,required,off. off by default
patroni_username: postgres        # patroni restapi username, `postgres` by default
patroni_password: Patroni.API     # patroni restapi password, `Patroni.API` by default
pg_primary_db: postgres           # primary database name, used by citus,etc... ,postgres by default
pg_parameters: {}                 # extra parameters in postgresql.auto.conf
pg_files: []                      # extra files to be copied to postgres data directory (e.g. license)
pg_conf: oltp.yml                 # config template: oltp,olap,crit,tiny. `oltp.yml` by default
pg_max_conn: auto                 # postgres max connections, `auto` will use recommended value
pg_shared_buffer_ratio: 0.25      # postgres shared buffers ratio, 0.25 by default, 0.1~0.4
pg_rto: 30                        # recovery time objective in seconds, `30s` by default
pg_rpo: 1048576                   # recovery point objective in bytes, `1MiB` at most by default
pg_libs: 'pg_stat_statements, auto_explain' # preloaded libraries, `pg_stat_statements,auto_explain` by default
pg_delay: 0                       # replication apply delay for standby cluster leader
pg_checksum: false                # enable data checksum for postgres cluster?
pg_pwd_enc: scram-sha-256         # passwords encryption algorithm: md5,scram-sha-256
pg_encoding: UTF8                 # database cluster encoding, `UTF8` by default
pg_locale: C                      # database cluster locale, `C` by default
pg_lc_collate: C                  # database cluster collate, `C` by default
pg_lc_ctype: en_US.UTF8           # database character type, `en_US.UTF8` by default
pgbouncer_enabled: true           # if disabled, pgbouncer will not be launched on pgsql host
pgbouncer_port: 6432              # pgbouncer listen port, 6432 by default
pgbouncer_log_dir: /pg/log/pgbouncer # pgbouncer log dir, `/pg/log/pgbouncer` by default
pgbouncer_auth_query: false       # query postgres to retrieve unlisted business users?
pgbouncer_poolmode: transaction   # pooling mode: transaction,session,statement, transaction by default
pgbouncer_sslmode: disable        # pgbouncer client ssl mode, disable by default
pg_safeguard
name: pg_safeguard, type: bool, level: G/C/A
prevent purging running postgres instance? false by default
If enabled, pgsql.yml & pgsql-rm.yml will abort immediately if any postgres instance is running.
pg_clean
name: pg_clean, type: bool, level: G/C/A
purging existing postgres during pgsql init? true by default
The default value is true: it will purge any existing postgres instance during pgsql.yml init, which makes the playbook idempotent.
If set to false, pgsql.yml will abort if there is already a running postgres instance, and pgsql-rm.yml will NOT remove postgres data (it only stops the server).
pg_data
name: pg_data, type: path, level: C
postgres data directory, /pg/data by default
default values: /pg/data, DO NOT CHANGE IT.
It is a soft link that points to the underlying data directory.
pg_fs_main
name: pg_fs_main, type: path, level: C
mountpoint/path for postgres main data, /data by default
default values: /data, which will be used as the parent dir of the postgres main data directory: /data/postgres.
It's recommended to use NVMe SSD for postgres main data storage; Pigsty is optimized for SSD storage by default.
But HDD is also supported: you can change pg_storage_type to HDD to optimize for HDD storage.
pg_fs_bkup
name: pg_fs_bkup, type: path, level: C
mountpoint/path for pg backup data, /data/backup by default
If you are using the default pgbackrest_method = local, it is recommended to have a separate disk for backup storage.
The backup disk should be large enough to hold all your backups, at least enough for 3 basebackups + 2 days WAL archive.
This is usually not a problem since you can use cheap & large HDD for that.
It’s recommended to use a separate disk for backup storage, otherwise pigsty will fall back to the main data disk.
pg_storage_type
name: pg_storage_type, type: enum, level: C
storage type for pg main data, SSD,HDD, SSD by default
default values: SSD, it will affect some tuning parameters, such as random_page_cost & effective_io_concurrency
pg_dummy_filesize
name: pg_dummy_filesize, type: size, level: C
size of /pg/dummy, default values: 64MiB, which hold 64MB disk space for emergency use
When the disk is full, removing the placeholder file can free up some space for emergency use, it is recommended to use at least 8GiB for production use.
pg_listen
name: pg_listen, type: ip, level: C
postgres/pgbouncer listen address, 0.0.0.0 (all ipv4 addr) by default
You can use placeholder in this variable:
${ip}: translate to inventory_hostname, which is primary private IP address in the inventory
pg_parameters
name: pg_parameters, type: dict, level: G/C/I
This parameter is used to specify and manage configuration parameters in postgresql.auto.conf.
After all instances in the cluster have completed initialization, the pg_param task will sequentially overwrite the key/value pairs in this dictionary to /pg/data/postgresql.auto.conf.
Note: Please do not manually modify this configuration file, or use ALTER SYSTEM to change cluster configuration parameters. Any changes will be overwritten during the next configuration sync.
This variable has a higher priority than the cluster configuration in Patroni/DCS (i.e., it has a higher priority than the cluster configuration edited by Patroni edit-config). Therefore, it can typically override the cluster default parameters at the instance level.
When your cluster members have different specifications (not recommended!), you can fine-tune the configuration of each instance using this parameter.
Please note that some important cluster parameters (which have requirements for primary and replica parameter values) are managed directly by Patroni through command-line parameters and have the highest priority. These cannot be overridden by this method. For these parameters, you must use Patroni edit-config for management and configuration.
PostgreSQL parameters that must remain consistent across primary and replicas (inconsistency will prevent the replica from starting!):
wal_level
max_connections
max_locks_per_transaction
max_worker_processes
max_prepared_transactions
track_commit_timestamp
Parameters that should ideally remain consistent across primary and replicas (considering the possibility of primary-replica switch):
listen_addresses
port
cluster_name
hot_standby
wal_log_hints
max_wal_senders
max_replication_slots
wal_keep_segments
wal_keep_size
You can set non-existent parameters (such as GUCs from extensions), but changing existing configurations to illegal values may prevent PostgreSQL from starting. Please configure with caution!
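To make the precedence concrete, here is a hedged sketch of pg_parameters at cluster and instance level (the cluster name, GUCs, and values are illustrative, not defaults):

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica, pg_parameters: { max_parallel_workers: 2 } }  # instance-level override
  vars:
    pg_cluster: pg-test
    pg_parameters:                 # written to postgresql.auto.conf on every member
      log_min_duration_statement: 100
      work_mem: 64MB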
pg_files
name: pg_files, type: path[], level: C
Designates a list of files to be copied to the {{ pg_data }} directory. The default value is an empty array: [].
Files specified in this parameter will be copied to the {{ pg_data }} directory. This is mainly used to distribute license files required by special commercial versions of the PostgreSQL kernel.
Currently, only the PolarDB (Oracle-compatible) kernel requires a license file. For example, you can place the license.lic file in the files/ directory and specify it in pg_files:
pg_files: [ license.lic ]
pg_conf
name: pg_conf, type: enum, level: C
config template: {oltp,olap,crit,tiny}.yml, oltp.yml by default
tiny.yml: optimize for tiny nodes, virtual machines, small demo, (1~8Core, 1~16GB)
oltp.yml: optimize for OLTP workloads and latency sensitive applications, (4C8GB+), which is the default template
olap.yml: optimize for OLAP workloads and throughput (4C8G+)
crit.yml: optimize for data consistency and critical applications (4C8G+)
default values: oltp.yml, but configure procedure will set this value to tiny.yml if current node is a tiny node.
You can have your own template, just put it under templates/<mode>.yml and set this value to the template name.
pg_max_conn
name: pg_max_conn, type: int, level: C
postgres max connections, You can specify a value between 50 and 5000, or use auto to use recommended value.
It’s not recommended to set this value greater than 5000, otherwise you have to increase the haproxy service connection limit manually as well.
Pgbouncer’s transaction pooling can alleviate the problem of too many OLTP connections, but it’s not recommended to use it in OLAP scenarios.
pg_shared_buffer_ratio
name: pg_shared_buffer_ratio, type: float, level: C
postgres shared buffer memory ratio, 0.25 by default, 0.1~0.4
default values: 0.25, meaning 25% of node memory will be used as PostgreSQL shared buffers.
Setting this value greater than 0.4 (40%) is usually not a good idea.
Note that shared buffer is only part of shared memory in PostgreSQL, to calculate the total shared memory, use show shared_memory_size_in_huge_pages;.
pg_rto
name: pg_rto, type: int, level: C
recovery time objective in seconds, This will be used as Patroni TTL value, 30s by default.
If a primary instance is missing for such a long time, a new leader election will be triggered.
Decreasing this value reduces the unavailable (write-disabled) time of the cluster during failover,
but it makes the cluster more sensitive to network jitter, thus increasing the chance of false-positive failover.
Configure this according to your network conditions and expectations to trade off between failover chance and impact;
the default value is 30s, and it will be populated to the following patroni parameters:
# the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
ttl: {{ pg_rto }}
# the number of seconds the loop will sleep. Default value: 10, this is the patroni check loop interval
loop_wait: {{ (pg_rto / 3)|round(0, 'ceil')|int }}
# timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
retry_timeout: {{ (pg_rto / 3)|round(0, 'ceil')|int }}
# the amount of time a primary is allowed to recover from failures before failover is triggered (in seconds), Max RTO: 2 loop wait + primary_start_timeout
primary_start_timeout: {{ (pg_rto / 3)|round(0, 'ceil')|int }}
pg_rpo
name: pg_rpo, type: int, level: C
recovery point objective in bytes, 1MiB at most by default
default values: 1048576, which will tolerate at most 1MiB data loss during failover.
when the primary is down and all replicas are lagged, you have to make a tough choice to trade off between Availability and Consistency:
Promote a replica to be the new primary and bring system back online ASAP, with the price of an acceptable data loss (e.g. less than 1MB).
Wait for the primary to come back (which may never be) or human intervention to avoid any data loss.
You can use the crit.yml conf template to ensure no data loss during failover, but it will sacrifice some performance.
pg_libs
name: pg_libs, type: string, level: C
shared preloaded libraries, pg_stat_statements,auto_explain by default.
They are two extensions that come with PostgreSQL, and it is strongly recommended to enable them.
For existing clusters, you can configure the shared_preload_libraries parameter of the cluster and apply it.
If you want to use TimescaleDB or Citus extensions, you need to add timescaledb or citus to this list. timescaledb and citus should be placed at the top of this list, for example:
citus,timescaledb,pg_stat_statements,auto_explain
Other extensions that need to be loaded can also be added to this list, such as pg_cron, pgml, etc.
Generally, citus and timescaledb have the highest priority and should be added to the top of the list.
pg_checksum
name: pg_checksum, type: bool, level: C
enable data checksum for postgres cluster? default value is false.
This parameter can only be set before PGSQL deployment (but you can enable it manually later).
If the crit.yml pg_conf template is used, data checksums are always enabled regardless of this parameter, to ensure data integrity.
pg_pwd_enc
name: pg_pwd_enc, type: enum, level: C
passwords encryption algorithm: md5,scram-sha-256
default values: scram-sha-256, if you have compatibility issues with old clients, you can set it to md5 instead.
pg_encoding
name: pg_encoding, type: enum, level: C
database cluster encoding, UTF8 by default
pg_locale
name: pg_locale, type: enum, level: C
database cluster locale, C by default
pg_lc_collate
name: pg_lc_collate, type: enum, level: C
database cluster collate, C by default, It’s not recommended to change this value unless you know what you are doing.
pg_lc_ctype
name: pg_lc_ctype, type: enum, level: C
database character type, en_US.UTF8 by default
pgbouncer_enabled
name: pgbouncer_enabled, type: bool, level: C
default value is true, if disabled, pgbouncer will not be launched on pgsql host
pgbouncer_port
name: pgbouncer_port, type: port, level: C
pgbouncer listen port, 6432 by default
pgbouncer_log_dir
name: pgbouncer_log_dir, type: path, level: C
pgbouncer log dir, /pg/log/pgbouncer by default, referenced by promtail the logging agent.
pgbouncer_auth_query
name: pgbouncer_auth_query, type: bool, level: C
query postgres to retrieve unlisted business users? default value is false
If enabled, pgbouncer user will be authenticated against postgres database with SELECT username, password FROM monitor.pgbouncer_auth($1), otherwise, only the users with pgbouncer: true will be allowed to connect to pgbouncer.
pgbouncer_poolmode
name: pgbouncer_poolmode, type: enum, level: C
Pgbouncer pooling mode: transaction, session, statement, transaction by default
session: Session-level pooling with the best compatibility.
transaction: Transaction-level pooling with better performance (lots of small conns), could break some session level features such as notify/listen, etc…
statement: Statement-level pooling, which is used for simple read-only queries.
If your application has compatibility issues with pgbouncer, you can try changing this value to session instead.
pgbouncer_sslmode
name: pgbouncer_sslmode, type: enum, level: C
pgbouncer client ssl mode, disable by default
default values: disable, beware that this may have a huge performance impact on your pgbouncer.
disable: Plain TCP. If client requests TLS, it’s ignored. Default.
allow: If client requests TLS, it is used. If not, plain TCP is used. If the client presents a client certificate, it is not validated.
prefer: Same as allow.
require: Client must use TLS. If not, the client connection is rejected. If the client presents a client certificate, it is not validated.
verify-ca: Client must use TLS with valid client certificate.
verify-full: Same as verify-ca.
PG_PROVISION
PG_BOOTSTRAP will bootstrap a new postgres cluster with patroni, while PG_PROVISION will create default objects in the cluster, including:
pg_provision: true                # provision postgres cluster after bootstrap
pg_init: pg-init                  # provision init script for cluster template, `pg-init` by default
pg_default_roles:                 # default roles and users in postgres cluster
  - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access     }
  - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
  - { name: dbrole_readwrite ,login: false ,roles: [dbrole_readonly] ,comment: role for global read-write access }
  - { name: dbrole_admin     ,login: false ,roles: [pg_monitor, dbrole_readwrite] ,comment: role for object creation }
  - { name: postgres     ,superuser: true  ,comment: system superuser }
  - { name: replicator ,replication: true  ,roles: [pg_monitor, dbrole_readonly] ,comment: system replicator }
  - { name: dbuser_dba   ,superuser: true  ,roles: [dbrole_admin]  ,pgbouncer: true ,pool_mode: session, pool_connlimit: 16 ,comment: pgsql admin user }
  - { name: dbuser_monitor ,roles: [pg_monitor] ,pgbouncer: true ,parameters: { log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }
pg_default_privileges:            # default privileges when created by admin user
  - GRANT USAGE      ON SCHEMAS   TO dbrole_readonly
  - GRANT SELECT     ON TABLES    TO dbrole_readonly
  - GRANT SELECT     ON SEQUENCES TO dbrole_readonly
  - GRANT EXECUTE    ON FUNCTIONS TO dbrole_readonly
  - GRANT USAGE      ON SCHEMAS   TO dbrole_offline
  - GRANT SELECT     ON TABLES    TO dbrole_offline
  - GRANT SELECT     ON SEQUENCES TO dbrole_offline
  - GRANT EXECUTE    ON FUNCTIONS TO dbrole_offline
  - GRANT INSERT     ON TABLES    TO dbrole_readwrite
  - GRANT UPDATE     ON TABLES    TO dbrole_readwrite
  - GRANT DELETE     ON TABLES    TO dbrole_readwrite
  - GRANT USAGE      ON SEQUENCES TO dbrole_readwrite
  - GRANT UPDATE     ON SEQUENCES TO dbrole_readwrite
  - GRANT TRUNCATE   ON TABLES    TO dbrole_admin
  - GRANT REFERENCES ON TABLES    TO dbrole_admin
  - GRANT TRIGGER    ON TABLES    TO dbrole_admin
  - GRANT CREATE     ON SCHEMAS   TO dbrole_admin
pg_default_schemas: [ monitor ]   # default schemas to be created
pg_default_extensions:            # default extensions to be created
  - { name: adminpack          ,schema: pg_catalog }
  - { name: pg_stat_statements ,schema: monitor }
  - { name: pgstattuple        ,schema: monitor }
  - { name: pg_buffercache     ,schema: monitor }
  - { name: pageinspect        ,schema: monitor }
  - { name: pg_prewarm         ,schema: monitor }
  - { name: pg_visibility      ,schema: monitor }
  - { name: pg_freespacemap    ,schema: monitor }
  - { name: postgres_fdw       ,schema: public  }
  - { name: file_fdw           ,schema: public  }
  - { name: btree_gist         ,schema: public  }
  - { name: btree_gin          ,schema: public  }
  - { name: pg_trgm            ,schema: public  }
  - { name: intagg             ,schema: public  }
  - { name: intarray           ,schema: public  }
  - { name: pg_repack }
pg_reload: true                   # reload postgres after hba changes
pg_default_hba_rules:             # postgres default host-based authentication rules
  - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'   }
  - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident'  }
  - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost' }
  - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet'  }
  - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet'  }
  - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password'  }
  - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password' }
  - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd'     }
  - { user: '+dbrole_readonly' ,db: all   ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket' }
  - { user: '+dbrole_readonly' ,db: all   ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password'      }
  - { user: '+dbrole_offline'  ,db: all   ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet' }
pgb_default_hba_rules:            # pgbouncer default host-based authentication rules
  - { user: '${dbsu}'    ,db: pgbouncer   ,addr: local     ,auth: peer  ,title: 'dbsu local admin access with os ident' }
  - { user: 'all'        ,db: all         ,addr: localhost ,auth: pwd   ,title: 'allow all user local access with pwd'  }
  - { user: '${monitor}' ,db: pgbouncer   ,addr: intra     ,auth: pwd   ,title: 'monitor access via intranet with pwd'  }
  - { user: '${monitor}' ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other monitor access addr'  }
  - { user: '${admin}'   ,db: all         ,addr: intra     ,auth: pwd   ,title: 'admin access via intranet with pwd'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other admin access addr'    }
  - { user: 'all'        ,db: all         ,addr: intra     ,auth: pwd   ,title: 'allow all user intra access with pwd'  }
pg_provision
name: pg_provision, type: bool, level: C
provision postgres cluster after bootstrap, default value is true.
If disabled, postgres cluster will not be provisioned after bootstrap.
pg_default_privileges:            # default privileges when created by admin user
  - GRANT USAGE      ON SCHEMAS   TO dbrole_readonly
  - GRANT SELECT     ON TABLES    TO dbrole_readonly
  - GRANT SELECT     ON SEQUENCES TO dbrole_readonly
  - GRANT EXECUTE    ON FUNCTIONS TO dbrole_readonly
  - GRANT USAGE      ON SCHEMAS   TO dbrole_offline
  - GRANT SELECT     ON TABLES    TO dbrole_offline
  - GRANT SELECT     ON SEQUENCES TO dbrole_offline
  - GRANT EXECUTE    ON FUNCTIONS TO dbrole_offline
  - GRANT INSERT     ON TABLES    TO dbrole_readwrite
  - GRANT UPDATE     ON TABLES    TO dbrole_readwrite
  - GRANT DELETE     ON TABLES    TO dbrole_readwrite
  - GRANT USAGE      ON SEQUENCES TO dbrole_readwrite
  - GRANT UPDATE     ON SEQUENCES TO dbrole_readwrite
  - GRANT TRUNCATE   ON TABLES    TO dbrole_admin
  - GRANT REFERENCES ON TABLES    TO dbrole_admin
  - GRANT TRIGGER    ON TABLES    TO dbrole_admin
  - GRANT CREATE     ON SCHEMAS   TO dbrole_admin
Pigsty has a built-in privilege system based on the default role system; check PGSQL Privileges for details.
postgres default host-based authentication rules, an array of hba rule objects.
default value provides a fair enough security level for common scenarios, check PGSQL Authentication for details.
pg_default_hba_rules:             # postgres default host-based authentication rules
  - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'   }
  - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident'  }
  - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost' }
  - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet'  }
  - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet'  }
  - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password'  }
  - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password' }
  - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd'     }
  - { user: '+dbrole_readonly' ,db: all   ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket' }
  - { user: '+dbrole_readonly' ,db: all   ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password'      }
  - { user: '+dbrole_offline'  ,db: all   ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet' }
pgbouncer default host-based authentication rules, an array of hba rule objects.
default value provides a fair enough security level for common scenarios, check PGSQL Authentication for details.
pgb_default_hba_rules:            # pgbouncer default host-based authentication rules
  - { user: '${dbsu}'    ,db: pgbouncer   ,addr: local     ,auth: peer  ,title: 'dbsu local admin access with os ident' }
  - { user: 'all'        ,db: all         ,addr: localhost ,auth: pwd   ,title: 'allow all user local access with pwd'  }
  - { user: '${monitor}' ,db: pgbouncer   ,addr: intra     ,auth: pwd   ,title: 'monitor access via intranet with pwd'  }
  - { user: '${monitor}' ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other monitor access addr'  }
  - { user: '${admin}'   ,db: all         ,addr: intra     ,auth: pwd   ,title: 'admin access via intranet with pwd'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other admin access addr'    }
  - { user: 'all'        ,db: all         ,addr: intra     ,auth: pwd   ,title: 'allow all user intra access with pwd'  }
PG_BACKUP
This section defines variables for pgBackRest, which is used for PGSQL PITR (Point-In-Time-Recovery).
pgbackrest_enabled: true          # enable pgbackrest on pgsql host?
pgbackrest_clean: true            # remove pg backup data during init?
pgbackrest_log_dir: /pg/log/pgbackrest # pgbackrest log dir, `/pg/log/pgbackrest` by default
pgbackrest_method: local          # pgbackrest repo method: local, minio, [user-defined...]
pgbackrest_repo:                  # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
  local:                          # default pgbackrest repo with local posix fs
    path: /pg/backup              # local backup directory, `/pg/backup` by default
    retention_full_type: count    # retention full backups by count
    retention_full: 2             # keep 2, at most 3 full backup when using local fs repo
  minio:                          # optional minio repo for pgbackrest
    type: s3                      # minio is s3-compatible, so s3 is used
    s3_endpoint: sss.pigsty       # minio endpoint domain name, `sss.pigsty` by default
    s3_region: us-east-1          # minio region, us-east-1 by default, useless for minio
    s3_bucket: pgsql              # minio bucket name, `pgsql` by default
    s3_key: pgbackrest            # minio user access key for pgbackrest
    s3_key_secret: S3User.Backup  # minio user secret key for pgbackrest
    s3_uri_style: path            # use path style uri for minio rather than host style
    path: /pgbackrest             # minio backup path, default is `/pgbackrest`
    storage_port: 9000            # minio port, 9000 by default
    storage_ca_file: /etc/pki/ca.crt # minio ca file path, `/etc/pki/ca.crt` by default
    bundle: y                     # bundle small files into a single file
    cipher_type: aes-256-cbc      # enable AES encryption for remote backup repo
    cipher_pass: pgBackRest       # AES encryption password, default is 'pgBackRest'
    retention_full_type: time     # retention full backup by time on minio repo
    retention_full: 14            # keep full backup for last 14 days
pgbackrest_enabled
name: pgbackrest_enabled, type: bool, level: C
enable pgBackRest on pgsql host? default value is true
pgbackrest_clean
name: pgbackrest_clean, type: bool, level: C
remove pg backup data during init? default value is true
pgbackrest_log_dir
name: pgbackrest_log_dir, type: path, level: C
pgBackRest log dir, /pg/log/pgbackrest by default, which is referenced by promtail, the logging agent.
pgbackrest_method
name: pgbackrest_method, type: enum, level: C
pgBackRest repo method: local, minio, or other user-defined methods, local by default
This parameter is used to determine which repo to use for pgBackRest, all available repo methods are defined in pgbackrest_repo.
Pigsty will use local backup repo by default, which will create a backup repo on primary instance’s /pg/backup directory. The underlying storage is specified by pg_fs_bkup.
default value includes two repo methods: local and minio, which are defined as follows:
pgbackrest_repo:                  # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
  local:                          # default pgbackrest repo with local posix fs
    path: /pg/backup              # local backup directory, `/pg/backup` by default
    retention_full_type: count    # retention full backups by count
    retention_full: 2             # keep 2, at most 3 full backup when using local fs repo
  minio:                          # optional minio repo for pgbackrest
    type: s3                      # minio is s3-compatible, so s3 is used
    s3_endpoint: sss.pigsty       # minio endpoint domain name, `sss.pigsty` by default
    s3_region: us-east-1          # minio region, us-east-1 by default, useless for minio
    s3_bucket: pgsql              # minio bucket name, `pgsql` by default
    s3_key: pgbackrest            # minio user access key for pgbackrest
    s3_key_secret: S3User.Backup  # minio user secret key for pgbackrest
    s3_uri_style: path            # use path style uri for minio rather than host style
    path: /pgbackrest             # minio backup path, default is `/pgbackrest`
    storage_port: 9000            # minio port, 9000 by default
    storage_ca_file: /etc/pki/ca.crt # minio ca file path, `/etc/pki/ca.crt` by default
    bundle: y                     # bundle small files into a single file
    cipher_type: aes-256-cbc      # enable AES encryption for remote backup repo
    cipher_pass: pgBackRest       # AES encryption password, default is 'pgBackRest'
    retention_full_type: time     # retention full backup by time on minio repo
    retention_full: 14            # keep full backup for last 14 days
PG_SERVICE
This section is about exposing PostgreSQL services to the outside world, including:
Exposing different PostgreSQL services on different ports with haproxy
Bind an optional L2 VIP to the primary instance with vip-manager
Register cluster/instance DNS records with dnsmasq on infra nodes
pg_weight: 100          #INSTANCE # relative load balance weight in service, 100 by default, 0-255
pg_default_service_dest: pgbouncer # default service destination if svc.dest='default'
pg_default_services:              # postgres default service definitions
  - { name: primary ,port: 5433 ,dest: default  ,check: /primary   ,selector: "[]" }
  - { name: replica ,port: 5434 ,dest: default  ,check: /read-only ,selector: "[]" , backup: "[? pg_role == `primary` || pg_role == `offline` ]" }
  - { name: default ,port: 5436 ,dest: postgres ,check: /primary   ,selector: "[]" }
  - { name: offline ,port: 5438 ,dest: postgres ,check: /replica   ,selector: "[? pg_role == `offline` || pg_offline_query ]" , backup: "[? pg_role == `replica` && !pg_offline_query]" }
pg_vip_enabled: false             # enable a l2 vip for pgsql primary? false by default
pg_vip_address: 127.0.0.1/24      # vip address in `<ipv4>/<mask>` format, require if vip is enabled
pg_vip_interface: eth0            # vip network interface to listen, eth0 by default
pg_dns_suffix: ''                 # pgsql dns suffix, '' by default
pg_dns_target: auto               # auto, primary, vip, none, or ad hoc ip
pg_weight
name: pg_weight, type: int, level: G
relative load balance weight in service, 100 by default, 0-255
default value: 100. You have to define it in instance vars and reload the service for it to take effect.
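For example, a minimal sketch (the instance layout and weight values are illustrative) that gives one replica a smaller share of read-only traffic, defined at the instance level:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica , pg_weight: 100 }  # illustrative: receives the normal share
    10.10.10.13: { pg_seq: 3, pg_role: replica , pg_weight: 50  }  # illustrative: receives half the traffic
  vars: { pg_cluster: pg-test }

After changing the weight, reload the service (e.g. bin/pgsql-svc pg-test, see Reload Service) for it to take effect.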
pg_vip_enabled
name: pg_vip_enabled, type: bool, level: C
enable an L2 VIP for the pgsql primary? default value is false, which means no L2 VIP is created for this cluster.
An L2 VIP can only be used within the same L2 network, which may impose extra restrictions on your network topology.
pg_vip_address
name: pg_vip_address, type: cidr4, level: C
vip address in <ipv4>/<mask> format, if vip is enabled, this parameter is required.
default value: 127.0.0.1/24. This value consists of two parts: ipv4 and mask, separated by /.
pg_vip_interface
name: pg_vip_interface, type: string, level: C/I
vip network interface to listen, eth0 by default.
It should be the primary intranet interface of your node, i.e. the one holding the IP address you used in the inventory file.
If your nodes have different interface names, you can override it in instance vars:
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: replica ,pg_vip_interface: eth0 }
    10.10.10.12: { pg_seq: 2, pg_role: primary ,pg_vip_interface: eth1 }
    10.10.10.13: { pg_seq: 3, pg_role: replica ,pg_vip_interface: eth2 }
  vars:
    pg_vip_enabled: true            # enable L2 VIP for this cluster, bind to primary instance by default
    pg_vip_address: 10.10.10.3/24   # the L2 network CIDR: 10.10.10.0/24, the vip address: 10.10.10.3
    # pg_vip_interface: eth1        # if your node have non-uniform interface, you can define it here
pg_dns_suffix
name: pg_dns_suffix, type: string, level: C
pgsql dns suffix, '' by default; the cluster DNS name is defined as {{ pg_cluster }}{{ pg_dns_suffix }}
For example, if you set pg_dns_suffix to .db.vip.company.tld for cluster pg-test, then the cluster DNS name will be pg-test.db.vip.company.tld
pg_dns_target
name: pg_dns_target, type: enum, level: C
Could be: auto, primary, vip, none, or an ad hoc ip address, which will be the target IP address of cluster DNS record.
default value: auto, which will bind to pg_vip_address if pg_vip_enabled, or fall back to the cluster primary instance IP address.
vip: bind to pg_vip_address
primary: resolve to cluster primary instance ip address
auto: resolve to pg_vip_address if pg_vip_enabled, or fallback to cluster primary instance ip address.
none: do not bind to any ip address
<ipv4>: bind to the given IP address
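As a minimal sketch (the suffix and VIP values are illustrative), resolving the cluster DNS record to the L2 VIP could look like:

pg_dns_suffix: '.db.vip.company.tld'  # illustrative: cluster DNS name becomes pg-test.db.vip.company.tld
pg_dns_target: vip                    # resolve the cluster record to pg_vip_address
pg_vip_enabled: true                  # an L2 VIP must be enabled for the `vip` target to be meaningful
pg_vip_address: 10.10.10.3/24         # illustrative VIP address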
PG_EXPORTER
pg_exporter_enabled: true              # enable pg_exporter on pgsql hosts?
pg_exporter_config: pg_exporter.yml    # pg_exporter configuration file name
pg_exporter_cache_ttls: '1,10,60,300'  # pg_exporter collector ttl stage in seconds, '1,10,60,300' by default
pg_exporter_port: 9630                 # pg_exporter listen port, 9630 by default
pg_exporter_params: 'sslmode=disable'  # extra url parameters for pg_exporter dsn
pg_exporter_url: ''                    # overwrite auto-generate pg dsn if specified
pg_exporter_auto_discovery: true       # enable auto database discovery? enabled by default
pg_exporter_exclude_database: 'template0,template1,postgres' # csv of database that WILL NOT be monitored during auto-discovery
pg_exporter_include_database: ''       # csv of database that WILL BE monitored during auto-discovery
pg_exporter_connect_timeout: 200       # pg_exporter connect timeout in ms, 200 by default
pg_exporter_options: ''                # overwrite extra options for pg_exporter
pgbouncer_exporter_enabled: true       # enable pgbouncer_exporter on pgsql hosts?
pgbouncer_exporter_port: 9631          # pgbouncer_exporter listen port, 9631 by default
pgbouncer_exporter_url: ''             # overwrite auto-generate pgbouncer dsn if specified
pgbouncer_exporter_options: ''         # overwrite extra options for pgbouncer_exporter
pg_exporter_enabled
name: pg_exporter_enabled, type: bool, level: C
enable pg_exporter on pgsql hosts?
default value is true, if you don’t want to install pg_exporter, set it to false.
pg_exporter_config
name: pg_exporter_config, type: string, level: C
pg_exporter configuration file name, used by pg_exporter & pgbouncer_exporter
default values: pg_exporter.yml, if you want to use a custom configuration file, you can specify its relative path here.
Your config file should be placed in files/<filename>.yml. For example, if you want to monitor a remote PolarDB instance, you can use the sample config: files/polar_exporter.yml.
pg_exporter_cache_ttls
name: pg_exporter_cache_ttls, type: string, level: C
pg_exporter collector ttl stage in seconds, '1,10,60,300' by default
default value: 1,10,60,300, which uses 1s, 10s, 60s, and 300s TTLs for the different metric collectors.
Pigsty's self-signed CA is located in the files/pki/ directory under the pigsty home.
YOU HAVE TO SECURE THE CA KEY PROPERLY: files/pki/ca/ca.key,
which is generated by the ca role during install.yml or infra.yml.
# pigsty/files/pki
#   ^-----@ca              # self-signed CA key & cert
#          ^-----@ca.key   # VERY IMPORTANT: keep it secret
#          ^-----@ca.crt   # VERY IMPORTANT: trusted everywhere
#   ^-----@csr             # signing request csr
#   ^-----@misc            # misc certs, issued certs
#   ^-----@etcd            # etcd server certs
#   ^-----@minio           # minio server certs
#   ^-----@nginx           # nginx SSL certs
#   ^-----@infra           # infra client certs
#   ^-----@pgsql           # pgsql server certs
#   ^-----@mongo           # mongodb/ferretdb server certs
#   ^-----@mysql           # mysql server certs
The managed nodes will have the following files installed:
/etc/pki/ca.crt # all nodes
/etc/pki/ca-trust/source/anchors/ca.crt # soft link and trusted anchor
All infra nodes will have the following certs:
/etc/pki/infra.crt # infra nodes cert
/etc/pki/infra.key # infra nodes key
In case of admin node failure, you have to keep files/pki and pigsty.yml safe.
You can rsync them to another admin node to make a backup admin node.
# run on meta-1, rsync to meta-2
cd ~/pigsty; rsync -avz ./ meta-2:~/pigsty
NODE FHS
Node main data dir is specified by node_data parameter, which is /data by default.
The data dir is owned by root with mode 0777. All modules’ local data will be stored under this directory by default.
/data
#   ^-----@postgres        # postgres main data dir
#   ^-----@backups         # postgres backup data dir (if no dedicated backup disk)
#   ^-----@redis           # redis data dir (shared by multiple redis instances)
#   ^-----@minio           # minio data dir (default when in single node single disk mode)
#   ^-----@etcd            # etcd main data dir
#   ^-----@prometheus      # prometheus time series data dir
#   ^-----@loki            # Loki data dir for logs
#   ^-----@docker          # Docker data dir
#   ^-----@...             # other modules
Prometheus FHS
The prometheus bin / rules are located in the files/prometheus/ directory under the pigsty home.
# real dirs
{{ pg_fs_main }}     /data                             # top level data directory, usually a SSD mountpoint
{{ pg_dir_main }}    /data/postgres                    # contains postgres data
{{ pg_cluster_dir }} /data/postgres/pg-test-15         # contains cluster `pg-test` data (of version 15)
                     /data/postgres/pg-test-15/bin     # bin scripts
                     /data/postgres/pg-test-15/log     # logs: postgres/pgbouncer/patroni/pgbackrest
                     /data/postgres/pg-test-15/tmp     # tmp, sql files, rendered results
                     /data/postgres/pg-test-15/cert    # postgres server certificates
                     /data/postgres/pg-test-15/conf    # patroni config, links to related config
                     /data/postgres/pg-test-15/data    # main data directory
                     /data/postgres/pg-test-15/meta    # identity information
                     /data/postgres/pg-test-15/stat    # stats information, summary, log report
                     /data/postgres/pg-test-15/change  # changing records
                     /data/postgres/pg-test-15/backup  # soft link to backup dir
{{ pg_fs_bkup }}     /data/backups                     # could be a cheap & large HDD mountpoint
                     /data/backups/postgres/pg-test-15/backup # local backup repo path

# soft links
/pg        -> /data/postgres/pg-test-15                # pg root link
/pg/data   -> /data/postgres/pg-test-15/data           # real data dir
/pg/backup -> /var/backups/postgres/pg-test-15/backup  # base backup
Binary FHS
On EL releases, the default path for PostgreSQL binaries is:
/usr/pgsql-${pg_version}/
Pigsty will create a softlink /usr/pgsql to the currently installed version specified by pg_version.
/usr/pgsql -> /usr/pgsql-15
Therefore, the default pg_bin_dir will be /usr/pgsql/bin/, and this path is added to the PATH environment via /etc/profile.d/pgsql.sh.
Both Alibaba Cloud RDS and AWS RDS are proprietary cloud database services, offered only on the public cloud through a leasing model. The following comparison is based on the latest PostgreSQL 16 main branch version, with the comparison cut-off date being February 2024.
Features
| Item                   | Pigsty                | Aliyun RDS      | AWS RDS          |
|------------------------|-----------------------|-----------------|------------------|
| Major Version          | 12 - 17               | 12 - 17         | 12 - 17          |
| Read on Standby        | Of course             | Not Readable    | Not Readable     |
| Separate R & W         | By Port               | Paid Proxy      | Paid Proxy       |
| Offline Instance       | Yes                   | Not Available   | Not Available    |
| Standby Cluster        | Yes                   | Multi-AZ        | Multi-AZ         |
| Delayed Instance       | Yes                   | Not Available   | Not Available    |
| Load Balancer          | HAProxy / LVS         | Paid ELB        | Paid ELB         |
| Connection Pooling     | Pgbouncer             | Paid Proxy      | Paid RDS Proxy   |
| High Availability      | Patroni / etcd        | HA Version Only | HA Version Only  |
| Point-in-Time Recovery | pgBackRest / MinIO    | Yes             | Yes              |
| Monitoring Metrics     | Prometheus / Exporter | About 9 Metrics | About 99 Metrics |
| Logging Collector      | Loki / Promtail       | Yes             | Yes              |
| Dashboards             | Grafana / Echarts     | Basic Support   | Basic Support    |
| Alerts                 | AlertManager          | Basic Support   | Basic Support    |
Extensions
Here are some important extensions in the PostgreSQL ecosystem. The comparison is based on PostgreSQL 16 and was completed on 2024-02-29.
Experience shows that the per-unit cost of hardware and software resources for RDS is 5 to 15 times that of self-built solutions, with the rent-to-own ratio typically being one month. For more details, please refer to Cost Analysis.
RDS / DBA Cost reference to help you evaluate the costs of self-hosting database
Cost Reference

| EC2                                                                     | vCPU-Month | RDS                                           | vCPU-Month       |
|-------------------------------------------------------------------------|------------|-----------------------------------------------|------------------|
| DHH’s self-hosted core-month price (192C 384G)                          | 25.32      | Junior open-source DBA reference salary       | 15K/person-month |
| IDC self-hosted data center (exclusive physical machine: 64C384G)       | 19.53      | Intermediate open-source DBA reference salary | 30K/person-month |
| IDC self-hosted data center (container, oversold 500%)                  | 7          | Senior open-source DBA reference salary       | 60K/person-month |
| UCloud Elastic Virtual Machine (8C16G, oversold)                        | 25         | ORACLE database license                       | 10000            |
| Alibaba Cloud Elastic Server 2x memory (exclusive without overselling)  | 107        | Alibaba Cloud RDS PG 2x memory (exclusive)    | 260              |
| Alibaba Cloud Elastic Server 4x memory (exclusive without overselling)  | 138        | Alibaba Cloud RDS PG 4x memory (exclusive)    | 320              |
| Alibaba Cloud Elastic Server 8x memory (exclusive without overselling)  | 180        | Alibaba Cloud RDS PG 8x memory (exclusive)    | 410              |
| AWS C5D.METAL 96C 200G (monthly without upfront)                        | 100        | AWS RDS PostgreSQL db.T2 (2x)                 | 440              |
For instance, using RDS for PostgreSQL on AWS, the price for a 64C / 256GB db.m5.16xlarge RDS for one month is $25,817, which is equivalent to about 180,000 yuan per month. The monthly rent is enough for you to buy two servers with even better performance and set them up on your own. The rent-to-buy ratio doesn’t even last a month; renting for just over ten days is enough to buy the whole server for yourself.
Comparing the costs of self-hosting versus using a cloud database:

| Payment Model                             | Price                 | Cost Per Year (¥10k) |
|-------------------------------------------|-----------------------|----------------------|
| Self-hosted IDC (Single Physical Server)  | ¥75k / 5 years        | 1.5                  |
| Self-hosted IDC (2-3 Server HA Cluster)   | ¥150k / 5 years       | 3.0 ~ 4.5            |
| Alibaba Cloud RDS (On-demand)             | ¥87.36/hour           | 76.5                 |
| Alibaba Cloud RDS (Monthly)               | ¥42k / month          | 50                   |
| Alibaba Cloud RDS (Yearly, 15% off)       | ¥425,095 / year       | 42.5                 |
| Alibaba Cloud RDS (3-year, 50% off)       | ¥750,168 / 3 years    | 25                   |
| AWS (On-demand)                           | $25,817 / month       | 217                  |
| AWS (1-year, no upfront)                  | $22,827 / month       | 191.7                |
| AWS (3-year, full upfront)                | $120k + $17.5k/month  | 175                  |
| AWS China/Ningxia (On-demand)             | ¥197,489 / month      | 237                  |
| AWS China/Ningxia (1-year, no upfront)    | ¥143,176 / month      | 171                  |
| AWS China/Ningxia (3-year, full upfront)  | ¥647k + ¥116k/month   | 160.6                |
Create a standby cluster of an existing PostgreSQL cluster.
Create a delayed cluster of another pgsql cluster?
Monitoring an existing postgres instance?
Migration from an external PostgreSQL with logical replication?
Use MinIO as a central pgBackRest repo.
Use a dedicated etcd cluster for DCS?
Use dedicated haproxy for exposing PostgreSQL service.
Deploy a multi-node MinIO cluster?
Use CMDB instead of Config as inventory.
Use PostgreSQL as grafana backend storage ?
Use PostgreSQL as prometheus backend storage ?
6.1 - Architecture
PostgreSQL cluster architectures and implementation details.
Component Overview
Here is an overview of the PostgreSQL module components and their interactions, from top to bottom:
Cluster DNS is resolved by DNSMASQ on infra nodes
Cluster VIP is managed by vip-manager, which will bind to the cluster primary.
vip-manager will acquire cluster leader info written by patroni from etcd cluster directly
Cluster services are exposed by Haproxy on nodes, services are distinguished by node ports (543x).
Haproxy port 9101: monitoring metrics & stats & admin page
Haproxy port 5433: default service that routes to primary pgbouncer: primary
Haproxy port 5434: default service that routes to replica pgbouncer: replica
Haproxy port 5436: default service that routes to primary postgres: default
Haproxy port 5438: default service that routes to offline postgres: offline
HAProxy will route traffic based on health check information provided by patroni.
Pgbouncer is a connection pool middleware that buffers connections, exposes extra metrics, and brings extra flexibility @ port 6432
Pgbouncer is stateless and deployed with the Postgres server in a 1:1 manner through a local unix socket.
Production traffic (Primary/Replica) will go through pgbouncer by default (can be skipped by pg_default_service_dest )
Default/Offline service will always bypass pgbouncer and connect to target Postgres directly.
Postgres provides relational database services @ port 5432
Installing the PGSQL module on multiple nodes will automatically form an HA cluster based on streaming replication
PostgreSQL is supervised by patroni by default.
Patroni will supervise PostgreSQL server @ port 8008 by default
Patroni spawns the postgres server as a child process
Patroni uses etcd as DCS: config storage, failure detection, and leader election.
Patroni exposes Postgres information through health checks, which are used by HAProxy
Patroni metrics will be scraped by prometheus on infra nodes
PG Exporter will expose postgres metrics @ port 9630
PostgreSQL’s metrics will be scraped by prometheus on infra nodes
Pgbouncer Exporter will expose pgbouncer metrics @ port 9631
Pgbouncer’s metrics will be scraped by prometheus on infra nodes
pgBackRest will work on the local repo by default (pgbackrest_method)
If local (default) is used as the backup repo, pgBackRest will create local repo under the primary’s pg_fs_bkup
If minio is used as the backup repo, pgBackRest will create the repo on the dedicated MinIO cluster in pgbackrest_repo.minio
Postgres-related logs (postgres,pgbouncer,patroni,pgbackrest) are exposed by promtail @ port 9080
Promtail will send logs to Loki on infra nodes
6.2 - Users
Define business users & roles in PostgreSQL, which are the objects created by SQL CREATE USER/ROLE
In this context, the User refers to objects created by SQL CREATE USER/ROLE.
Define User
There are two parameters related to users:
pg_users : Define business users & roles at cluster level
pg_default_roles : Define system-wide roles & global users at global level
They are both arrays of user/role definition. You can define multiple users/roles in one cluster.
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - { name: dbuser_meta     ,password: DBUser.Meta     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - { name: dbuser_view     ,password: DBUser.Viewer   ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
      - { name: dbuser_grafana  ,password: DBUser.Grafana  ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for grafana database }
      - { name: dbuser_bytebase ,password: DBUser.Bytebase ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for bytebase database }
      - { name: dbuser_kong     ,password: DBUser.Kong     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for kong api gateway }
      - { name: dbuser_gitea    ,password: DBUser.Gitea    ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for gitea service }
      - { name: dbuser_wiki     ,password: DBUser.Wiki     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for wiki.js service }
      - { name: dbuser_noco     ,password: DBUser.Noco     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for nocodb service }
And each user definition may look like:
- name: dbuser_meta               # REQUIRED, `name` is the only mandatory field of a user definition
  password: DBUser.Meta           # optional, password, can be a scram-sha-256 hash string or plain text
  login: true                     # optional, can log in, true by default (new biz ROLE should be false)
  superuser: false                # optional, is superuser? false by default
  createdb: false                 # optional, can create database? false by default
  createrole: false               # optional, can create role? false by default
  inherit: true                   # optional, can this role use inherited privileges? true by default
  replication: false              # optional, can this role do replication? false by default
  bypassrls: false                # optional, can this role bypass row level security? false by default
  pgbouncer: true                 # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
  connlimit: -1                   # optional, user connection limit, default -1 disable limit
  expire_in: 3650                 # optional, now + n days when this role is expired (OVERWRITE expire_at)
  expire_at: '2030-12-31'         # optional, YYYY-MM-DD 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
  comment: pigsty admin user      # optional, comment string for this user/role
  roles: [dbrole_admin]           # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
  parameters: {}                  # optional, role level parameters with `ALTER ROLE SET`
  pool_mode: transaction          # optional, pgbouncer pool mode at user level, transaction by default
  pool_connlimit: -1              # optional, max database connections at user level, default -1 disable limit
  search_path: public             # key value config parameters according to postgresql documentation (e.g: use pigsty as default search_path)
The only required field is name, which should be a valid & unique username in PostgreSQL.
Roles do not need a password, but a password is usually necessary for a login-able user.
The password can be plain text or a scram-sha-256 / md5 hash string.
Users/Roles are created one by one in array order, so make sure role/group definitions come before their members.
login, superuser, createdb, createrole, inherit, replication, bypassrls are boolean flags
pgbouncer is disabled by default. To add a business user to the pgbouncer user-list, you should set it to true explicitly.
ACL System
Pigsty has a battery-included ACL system, which can be easily used by assigning roles to users:
dbrole_readonly : The role for global read-only access
dbrole_readwrite : The role for global read-write access
dbrole_admin : The role for object creation
dbrole_offline : The role for restricted read-only access (offline instance)
If you wish to re-design your ACL system, check the following parameters & templates.
The playbook is idempotent, so it’s ok to run this multiple times on the existing cluster.
Create User with Playbook!
If you are using the default pgbouncer, you MUST create new users with the bin/pgsql-user util or the pgsql-user.yml playbook.
The playbook will add and configure the database user in the pgbouncer userlist for you.
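For example, assuming the user is already defined in pg_users of the pg-meta cluster (names are illustrative, and the playbook argument name is an assumption):

bin/pgsql-user pg-meta dbuser_meta                      # create/refresh user `dbuser_meta` on cluster `pg-meta`
# ./pgsql-user.yml -l pg-meta -e username=dbuser_meta   # assumed equivalent playbook invocation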
Pgbouncer User
Pgbouncer is enabled by default and serves as a connection pool middleware, and its userlist is managed by Pigsty by default.
Pigsty will add all users in pg_users with pgbouncer: true flag to the pgbouncer userlist by default.
The user is listed in /etc/pgbouncer/userlist.txt:
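For reference, each entry in that file is a quoted username / password-verifier pair, roughly like the following sketch (hashes redacted):

"dbuser_meta" "SCRAM-SHA-256$4096:<salt>$<stored-key>:<server-key>"
"dbuser_view" "SCRAM-SHA-256$4096:<salt>$<stored-key>:<server-key>"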
Each database definition is a dict with the following fields:
- name: meta                      # REQUIRED, `name` is the only mandatory field of a database definition
  baseline: cmdb.sql              # optional, database sql baseline path, (relative path among ansible search path, e.g files/)
  pgbouncer: true                 # optional, add this database to pgbouncer database list? true by default
  schemas: [pigsty]               # optional, additional schemas to be created, array of schema names
  extensions:                     # optional, additional extensions to be installed: array of `{name[,schema]}`
    - { name: postgis , schema: public }
    - { name: timescaledb }
  comment: pigsty meta database   # optional, comment string for this database
  owner: postgres                 # optional, database owner, postgres by default
  template: template1             # optional, which template to use, template1 by default
  encoding: UTF8                  # optional, database encoding, UTF8 by default. (MUST same as template database)
  locale: C                       # optional, database locale, C by default. (MUST same as template database)
  lc_collate: C                   # optional, database collate, C by default. (MUST same as template database)
  lc_ctype: C                     # optional, database ctype, C by default. (MUST same as template database)
  tablespace: pg_default          # optional, default tablespace, 'pg_default' by default
  allowconn: true                 # optional, allow connection, true by default. false will disable connect at all
  revokeconn: false               # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
  register_datasource: true       # optional, register this database to grafana datasources? true by default
  connlimit: -1                   # optional, database connection limit, default -1 disable limit
  pool_auth_user: dbuser_meta     # optional, all connection to this pgbouncer database will be authenticated by this user
  pool_mode: transaction          # optional, pgbouncer pool mode at database level, default transaction
  pool_size: 64                   # optional, pgbouncer pool size at database level, default 64
  pool_size_reserve: 32           # optional, pgbouncer pool size reserve at database level, default 32
  pool_size_min: 0                # optional, pgbouncer pool size min at database level, default 0
  pool_max_db_conn: 100           # optional, max database connections at database level, default 100
The only required field is name, which should be a valid and unique database name in PostgreSQL.
Newly created databases are forked from the template1 database by default, which is customized by PG_PROVISION during cluster bootstrap.
It’s usually not a good idea to execute this on the existing database again when a baseline script is used.
If you are using the default pgbouncer as the proxy middleware, YOU MUST create the new database with pgsql-db util or pgsql-db.yml playbook. Otherwise, the new database will not be added to the pgbouncer database list.
Remember, if your database definition has a non-trivial owner (dbsu postgres by default ), make sure the owner user exists.
That is to say, always create the user before the database.
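For example, assuming the database is already defined in pg_databases of the pg-meta cluster (names are illustrative, and the playbook argument name is an assumption):

bin/pgsql-db pg-meta meta                     # create/refresh database `meta` on cluster `pg-meta`
# ./pgsql-db.yml -l pg-meta -e dbname=meta    # assumed equivalent playbook invocation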
Pgbouncer Database
Pgbouncer is enabled by default and serves as a connection pool middleware.
Pigsty will add all databases in pg_databases to the pgbouncer database list by default.
You can disable the pgbouncer proxy for a specific database by setting pgbouncer: false in the database definition.
The database is listed in /etc/pgbouncer/database.txt, with extra database-level parameters such as:
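For reference, entries in that file follow pgbouncer's [databases] ini syntax; a sketch with illustrative values may look like:

meta = host=/var/run/postgresql dbname=meta auth_user=dbuser_meta pool_size=64
grafana = host=/var/run/postgresql dbname=grafana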
The Pgbouncer database list will be updated when creating databases with the Pigsty util & playbook.
To access pgbouncer administration functionality, you can use the pgb alias as dbsu.
There’s a util function defined in /etc/profile.d/pg-alias.sh, allowing you to reroute pgbouncer database traffic to a new host quickly, which can be used during zero-downtime migration.
# route pgbouncer traffic to another cluster member
function pgb-route(){
  local ip=${1-'\/var\/run\/postgresql'}
  sed -ie "s/host=[^[:space:]]\+/host=${ip}/g" /etc/pgbouncer/pgbouncer.ini
  cat /etc/pgbouncer/pgbouncer.ini
}
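A usage sketch (the target IP is illustrative); since the function only rewrites pgbouncer.ini, pgbouncer still has to be reloaded or restarted for the change to take effect:

pgb-route 10.10.10.12          # point all pgbouncer database entries at 10.10.10.12
pgb-route                      # no argument: route back to the local unix socket /var/run/postgresql
systemctl restart pgbouncer    # or reload pgbouncer via its admin console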
6.4 - Services
Define and create new services, and expose them via haproxy
Service Implementation
In Pigsty, services are implemented using haproxy on nodes, differentiated by different ports on the host node.
Every node has Haproxy enabled to expose services. From the database perspective, nodes in the cluster may be primary or replicas, but from the service perspective, all nodes are the same. This means even if you access a replica node, as long as you use the correct service port, you can still use the primary’s read-write service. This design seals the complexity: as long as you can access any instance on the PostgreSQL cluster, you can fully access all services.
This design is akin to the NodePort service in Kubernetes. Similarly, in Pigsty, every service includes these two core elements:
Access endpoints exposed via NodePort (port number, from where to access?)
Target instances chosen through Selectors (list of instances, who will handle it?)
The boundary of Pigsty’s service delivery stops at the cluster’s HAProxy. Users can access these load balancers in various ways. Please refer to Access Service.
All services are declared through configuration files. For instance, the default PostgreSQL service is defined by the pg_default_services parameter:
While you can define your extra PostgreSQL services with pg_services @ the global or cluster level.
These two parameters are both arrays of service objects. Each service definition will be rendered as a haproxy config in /etc/haproxy/<svcname>.cfg, check service.j2 for details.
Here is an example of an extra service definition: standby
- name: standby                   # required, service name, the actual svc name will be prefixed with `pg_cluster`, e.g: pg-meta-standby
  port: 5435                      # required, service exposed port (work as kubernetes service node port mode)
  ip: "*"                         # optional, service bind ip address, `*` for all ip by default
  selector: "[]"                  # required, service member selector, use JMESPath to filter inventory
  dest: default                   # optional, destination port, default|postgres|pgbouncer|<port_number>, 'default' by default
  check: /sync                    # optional, health check url path, / by default
  backup: "[? pg_role == `primary`]"  # backup server selector
  maxconn: 3000                   # optional, max allowed front-end connection
  balance: roundrobin             # optional, haproxy load balance algorithm (roundrobin by default, other: leastconn)
  options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
And it will be translated to a haproxy config file /etc/haproxy/pg-test-standby.conf:
#---------------------------------------------------------------------
# service: pg-test-standby @ 10.10.10.11:5435
#---------------------------------------------------------------------
# service instances 10.10.10.11, 10.10.10.13, 10.10.10.12
# service backups 10.10.10.11
listen pg-test-standby
    bind *:5435
    mode tcp
    maxconn 5000
    balance roundrobin
    option httpchk
    option http-keep-alive
    http-check send meth OPTIONS uri /sync   # <--- true for primary & sync standby
    http-check expect status 200
    default-server inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100
    # servers
    server pg-test-1 10.10.10.11:6432 check port 8008 weight 100 backup   # the primary is used as backup server
    server pg-test-3 10.10.10.13:6432 check port 8008 weight 100
    server pg-test-2 10.10.10.12:6432 check port 8008 weight 100
Reload Service
When cluster membership changes, such as appending/removing replicas, switchover/failover, or adjusting relative weights,
you have to reload the service to make the changes take effect.
bin/pgsql-svc <cls> [ip...]       # reload service for lb cluster or lb instance
# ./pgsql.yml -t pg_service      # the actual ansible task to reload service
Override Service
You can override the default service configuration in several ways:
Bypass Pgbouncer
When defining a service, if svc.dest='default', this parameter pg_default_service_dest will be used as the default value.
pgbouncer is used by default; you can use postgres instead, so the default primary & replica services will bypass pgbouncer and route traffic to postgres directly.
If you don’t need connection pooling at all, you can change pg_default_service_dest to postgres, and remove default and offline services.
If you don’t need read-only replicas for online traffic, you can remove replica from pg_default_services too.
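A minimal sketch of such a configuration (the trimmed service list is illustrative, based on the defaults above):

pg_default_service_dest: postgres   # primary/replica services now bypass pgbouncer
pg_default_services:                # keep only the read-write & read-only services
  - { name: primary ,port: 5433 ,dest: default ,check: /primary   ,selector: "[]" }
  - { name: replica ,port: 5434 ,dest: default ,check: /read-only ,selector: "[]" , backup: "[? pg_role == `primary` || pg_role == `offline` ]" }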
For example, this configuration will expose pg cluster primary service on haproxy node group proxy with port 10013.
pg_service_provider: proxy        # use load balancer on group `proxy` with port 10013
pg_default_services: [{ name: primary ,port: 10013 ,dest: postgres ,check: /primary ,selector: "[]" }]
It is the user's responsibility to make sure each delegated service port is unique within the proxy cluster.
6.5 - Extensions
Define, Create, Install, Enable Extensions in Pigsty
Extensions are the soul of PostgreSQL, and Pigsty deeply integrates the core extension plugins of the PostgreSQL ecosystem, providing you with battery-included distributed, temporal, geospatial, text, graph, and vector database capabilities! Check extension list for details.
Pigsty includes over 340 PostgreSQL extension plugins and has compiled, packaged, integrated, and maintained many extensions not included in the official PGDG source.
It also ensures through thorough testing that all these plugins can work together seamlessly. Including some potent extensions:
PostGIS: Add geospatial data support to PostgreSQL
TimescaleDB: Add time-series/continuous-aggregation support to PostgreSQL
PGVector: AI vector/embedding data type support, and ivfflat / hnsw index access method
Citus: Turn a standalone primary-replica postgres cluster into a horizontally scalable distributed cluster
Apache AGE: Add OpenCypher graph query language support to PostgreSQL, works like Neo4J
PG GraphQL: Add GraphQL language support to PostgreSQL
zhparser : Add Chinese word segmentation support to PostgreSQL, works like ElasticSearch
Supabase: Open-Source Firebase alternative based on PostgreSQL
FerretDB: Open-Source MongoDB alternative based on PostgreSQL
PostgresML: Use machine learning algorithms and pretrained models with SQL
ParadeDB: Open-Source ElasticSearch Alternative (based on PostgreSQL)
Plugins are already included and placed in the yum repo of the infra nodes, which can be directly enabled through PGSQL Cluster Config. Pigsty also introduces a complete compilation environment and infrastructure, allowing you to compile extensions not included in Pigsty & PGDG.
Some "databases" are not actual PostgreSQL extensions, but they are also supported by Pigsty, such as:
Supabase: Open-Source Firebase Alternative (based on PostgreSQL)
FerretDB: Open-Source MongoDB Alternative (based on PostgreSQL)
NocoDB: Open-Source Airtable Alternative (based on PostgreSQL)
DuckDB: Open-Source Analytical SQLite Alternative (PostgreSQL Compatible)
Install Extension
When you init a PostgreSQL cluster, the extensions listed in pg_packages & pg_extensions will be installed.
For default EL systems, the default values of pg_packages and pg_extensions are defined as follows:
pg_packages:                      # these extensions are always installed by default: pg_repack, wal2json, passwordcheck_cracklib
  - pg_repack_$v* wal2json_$v* passwordcheck_cracklib_$v*  # important extensions
pg_extensions:                    # install postgis, timescaledb, pgvector by default
  - postgis34_$v* timescaledb-2-postgresql-$v* pgvector_$v*
For ubuntu / debian, package names are different, and passwordcheck_cracklib is not available.
pg_packages:                      # these extensions are always installed by default: pg_repack, wal2json
  - postgresql-$v-repack postgresql-$v-wal2json
pg_extensions:                    # these extensions are installed by default:
  - postgresql-$v-postgis* timescaledb-2-postgresql-$v postgresql-$v-pgvector postgresql-$v-citus-12.1
Here, $v is a placeholder that will be replaced with the actual major version number pg_version of that PostgreSQL cluster
Therefore, the default configuration will install these extensions:
pg_repack: Extension for online table bloat processing.
wal2json: Extracts changes in JSON format through logical decoding.
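For instance, on an EL system with pg_version: 16, the default entries above would expand roughly as follows (a sketch, not an exhaustive package list):

pg_packages:   [ 'pg_repack_16* wal2json_16* passwordcheck_cracklib_16*' ]
pg_extensions: [ 'postgis34_16* timescaledb-2-postgresql-16* pgvector_16*' ]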
To install all available extensions in one pass, you can just specify pg_extensions: ['*$v*'], which is really a bold move.
Install Manually
After the PostgreSQL cluster is inited, you can manually install plugins via Ansible or Shell commands. For example, if you want to enable a specific extension on a cluster that has already been initialized:
cd ~/pigsty      # enter pigsty home dir and install the apache age extension for the pg-test cluster
ansible pg-test -m yum -b -a 'name=apache-age_16*'   # the extension name usually has a suffix like `_<pgmajorversion>`
Most plugins are already included in the yum repository on the infrastructure node and can be installed directly using the yum command. If not included, you can consider downloading from the PGDG upstream source using the repotrack / apt download command or compiling source code into RPMs for distribution.
After the extension installation, you should be able to see them in the pg_available_extensions view of the target database cluster. Next, execute in the database where you want to install the extension:
CREATE EXTENSION age;             -- install the graph database extension
6.6 - Authentication
Host-Based Authentication in Pigsty, how to manage HBA rules in Pigsty?
Host-Based Authentication in Pigsty
PostgreSQL has various authentication methods. You can use all of them, while Pigsty's battery-included ACL system focuses on HBA, password, and SSL authentication.
Client Authentication
To connect to a PostgreSQL database, the user has to be authenticated (with a password by default).
You can provide the password in the connection string (not secure) or use the PGPASSWORD env or .pgpass file. Check psql docs and PostgreSQL connection string for more details.
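As a quick sketch (host, database, user, and password are illustrative, taken from the sandbox defaults):

PGPASSWORD=DBUser.Meta psql -h 10.10.10.10 -p 5432 -U dbuser_meta -d meta -c 'SELECT 1'
# or keep it in ~/.pgpass, one `host:port:database:user:password` entry per line:
echo '10.10.10.10:5432:meta:dbuser_meta:DBUser.Meta' >> ~/.pgpass && chmod 0600 ~/.pgpass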
Typically, global HBA is defined in all.vars. If you want to modify the global default HBA rules, you can copy from the full.yml template to all.vars for modification.
Here are some examples of cluster HBA rule definitions.
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_hba_rules:
      - { user: dbuser_view ,db: all   ,addr: infra       ,auth: pwd  ,title: 'allow grafana dashboard access cmdb from infra nodes' }
      - { user: all         ,db: all   ,addr: 100.0.0.0/8 ,auth: pwd  ,title: 'all user access all db from kubernetes cluster' }
      - { user: '${admin}'  ,db: world ,addr: 0.0.0.0/0   ,auth: cert ,title: 'all admin world access with client cert' }
Reload HBA
To reload postgres/pgbouncer hba rules:
bin/pgsql-hba <cls>               # reload hba rules of cluster `<cls>`
bin/pgsql-hba <cls> ip1 ip2...    # reload hba rules of specific instances
Pigsty has a default set of HBA rules, which is pretty secure for most cases.
The rules are self-explained in alias form.
pg_default_hba_rules:             # postgres default host-based authentication rules
  - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'   }
  - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident'  }
  - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost' }
  - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet'  }
  - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet'  }
  - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password'  }
  - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password' }
  - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd'     }
  - { user: '+dbrole_readonly' ,db: all   ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket' }
  - { user: '+dbrole_readonly' ,db: all   ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password'      }
  - { user: '+dbrole_offline'  ,db: all   ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet' }
pgb_default_hba_rules:            # pgbouncer default host-based authentication rules
  - { user: '${dbsu}'    ,db: pgbouncer   ,addr: local     ,auth: peer  ,title: 'dbsu local admin access with os ident' }
  - { user: 'all'        ,db: all         ,addr: localhost ,auth: pwd   ,title: 'allow all user local access with pwd'  }
  - { user: '${monitor}' ,db: pgbouncer   ,addr: intra     ,auth: pwd   ,title: 'monitor access via intranet with pwd'  }
  - { user: '${monitor}' ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other monitor access addr'  }
  - { user: '${admin}'   ,db: all         ,addr: intra     ,auth: pwd   ,title: 'admin access via intranet with pwd'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other admin access addr'    }
  - { user: 'all'        ,db: all         ,addr: intra     ,auth: pwd   ,title: 'allow all user intra access with pwd'  }
Example: Rendered pg_hba.conf
#==============================================================#
# File      : pg_hba.conf
# Desc      : Postgres HBA Rules for pg-meta-1 [primary]
# Time      : 2023-01-11 15:19
# Host      : pg-meta-1 @ 10.10.10.10:5432
# Path      : /pg/data/pg_hba.conf
# Note      : ANSIBLE MANAGED, DO NOT CHANGE!
# Author    : Ruohang Feng (rh@vonng.com)
# License   : AGPLv3
#==============================================================#

# addr alias
# local     : /var/run/postgresql
# admin     : 10.10.10.10
# infra     : 10.10.10.10
# intra     : 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16

# user alias
# dbsu      : postgres
# repl      : replicator
# monitor   : dbuser_monitor
# admin     : dbuser_dba

# dbsu access via local os user ident [default]
local    all                postgres                              ident

# dbsu replication from local os ident [default]
local    replication        postgres                              ident

# replicator replication from localhost [default]
local    replication        replicator                            scram-sha-256
host     replication        replicator         127.0.0.1/32       scram-sha-256

# replicator replication from intranet [default]
host     replication        replicator         10.0.0.0/8         scram-sha-256
host     replication        replicator         172.16.0.0/12      scram-sha-256
host     replication        replicator         192.168.0.0/16     scram-sha-256

# replicator postgres db from intranet [default]
host     postgres           replicator         10.0.0.0/8         scram-sha-256
host     postgres           replicator         172.16.0.0/12      scram-sha-256
host     postgres           replicator         192.168.0.0/16     scram-sha-256

# monitor from localhost with password [default]
local    all                dbuser_monitor                        scram-sha-256
host     all                dbuser_monitor     127.0.0.1/32       scram-sha-256

# monitor from infra host with password [default]
host     all                dbuser_monitor     10.10.10.10/32     scram-sha-256

# admin @ infra nodes with pwd & ssl [default]
hostssl  all                dbuser_dba         10.10.10.10/32     scram-sha-256

# admin @ everywhere with ssl & pwd [default]
hostssl  all                dbuser_dba         0.0.0.0/0          scram-sha-256

# pgbouncer read/write via local socket [default]
local    all                +dbrole_readonly                      scram-sha-256
host     all                +dbrole_readonly   127.0.0.1/32       scram-sha-256

# read/write biz user via password [default]
host     all                +dbrole_readonly   10.0.0.0/8         scram-sha-256
host     all                +dbrole_readonly   172.16.0.0/12      scram-sha-256
host     all                +dbrole_readonly   192.168.0.0/16     scram-sha-256

# allow etl offline tasks from intranet [default]
host     all                +dbrole_offline    10.0.0.0/8         scram-sha-256
host     all                +dbrole_offline    172.16.0.0/12      scram-sha-256
host     all                +dbrole_offline    192.168.0.0/16     scram-sha-256

# allow application database intranet access [common] [DISABLED]
#host    kong               dbuser_kong        10.0.0.0/8         md5
#host    bytebase           dbuser_bytebase    10.0.0.0/8         md5
#host    grafana            dbuser_grafana     10.0.0.0/8         md5
Example: Rendered pgb_hba.conf
#==============================================================#
# File      : pgb_hba.conf
# Desc      : Pgbouncer HBA Rules for pg-meta-1 [primary]
# Time      : 2023-01-11 15:28
# Host      : pg-meta-1 @ 10.10.10.10:5432
# Path      : /etc/pgbouncer/pgb_hba.conf
# Note      : ANSIBLE MANAGED, DO NOT CHANGE!
# Author    : Ruohang Feng (rh@vonng.com)
# License   : AGPLv3
#==============================================================#

# PGBOUNCER HBA RULES FOR pg-meta-1 @ 10.10.10.10:6432
# ansible managed: 2023-01-11 14:30:58

# addr alias
# local     : /var/run/postgresql
# admin     : 10.10.10.10
# infra     : 10.10.10.10
# intra     : 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16

# user alias
# dbsu      : postgres
# repl      : replicator
# monitor   : dbuser_monitor
# admin     : dbuser_dba

# dbsu local admin access with os ident [default]
local    pgbouncer          postgres                              peer

# allow all user local access with pwd [default]
local    all                all                                   scram-sha-256
host     all                all                127.0.0.1/32       scram-sha-256

# monitor access via intranet with pwd [default]
host     pgbouncer          dbuser_monitor     10.0.0.0/8         scram-sha-256
host     pgbouncer          dbuser_monitor     172.16.0.0/12      scram-sha-256
host     pgbouncer          dbuser_monitor     192.168.0.0/16     scram-sha-256

# reject all other monitor access addr [default]
host     all                dbuser_monitor     0.0.0.0/0          reject

# admin access via intranet with pwd [default]
host     all                dbuser_dba         10.0.0.0/8         scram-sha-256
host     all                dbuser_dba         172.16.0.0/12      scram-sha-256
host     all                dbuser_dba         192.168.0.0/16     scram-sha-256

# reject all other admin access addr [default]
host     all                dbuser_dba         0.0.0.0/0          reject

# allow all user intra access with pwd [default]
host     all                all                10.0.0.0/8         scram-sha-256
host     all                all                172.16.0.0/12      scram-sha-256
host     all                all                192.168.0.0/16     scram-sha-256
Security Enhancement
For those critical cases, we have a security.yml template with the following hba rule set as a reference:
pg_default_hba_rules:             # postgres host-based auth rules by default
  - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'   }
  - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident'  }
  - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: ssl   ,title: 'replicator replication from localhost' }
  - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: ssl   ,title: 'replicator replication from intranet'  }
  - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: ssl   ,title: 'replicator postgres db from intranet'  }
  - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password'  }
  - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: ssl   ,title: 'monitor from infra host with password' }
  - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: cert  ,title: 'admin @ everywhere with ssl & cert'    }
  - { user: '+dbrole_readonly' ,db: all   ,addr: localhost ,auth: ssl   ,title: 'pgbouncer read/write via local socket' }
  - { user: '+dbrole_readonly' ,db: all   ,addr: intra     ,auth: ssl   ,title: 'read/write biz user via password'      }
  - { user: '+dbrole_offline'  ,db: all   ,addr: intra     ,auth: ssl   ,title: 'allow etl offline tasks from intranet' }
pgb_default_hba_rules:            # pgbouncer host-based authentication rules
  - { user: '${dbsu}'    ,db: pgbouncer   ,addr: local     ,auth: peer  ,title: 'dbsu local admin access with os ident' }
  - { user: 'all'        ,db: all         ,addr: localhost ,auth: pwd   ,title: 'allow all user local access with pwd'  }
  - { user: '${monitor}' ,db: pgbouncer   ,addr: intra     ,auth: ssl   ,title: 'monitor access via intranet with pwd'  }
  - { user: '${monitor}' ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other monitor access addr'  }
  - { user: '${admin}'   ,db: all         ,addr: intra     ,auth: ssl   ,title: 'admin access via intranet with pwd'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: deny  ,title: 'reject all other admin access addr'    }
  - { user: 'all'        ,db: all         ,addr: intra     ,auth: ssl   ,title: 'allow all user intra access with pwd'  }
6.7 - Configuration
Configure your PostgreSQL cluster & instances according to your needs
You can define different types of instances & clusters.
Identity: Parameters used for describing a PostgreSQL cluster
An offline instance works like a common replica instance, but it is used as a backup server in the pg-test-replica service.
That is to say, the offline and primary instances serve read-only traffic only when all replica instances are down.
Pigsty uses asynchronous streaming replication by default, which may have a small replication lag (10KB / 10ms).
A small window of data loss may occur when the primary fails (which can be controlled with pg_rpo), but it is acceptable for most scenarios.
But in some critical scenarios (e.g. financial transactions), data loss is totally unacceptable or read-your-write consistency is required.
In this case, you can enable synchronous commit to ensure that.
To enable sync standby mode, you can simply use crit.yml template in pg_conf
To enable sync standby on existing clusters, config the cluster and enable synchronous_mode:
$ pg edit-config pg-test          # run on admin node with admin user
+++
-synchronous_mode: false          # <--- old value
+synchronous_mode: true           # <--- new value
 synchronous_mode_strict: false

Apply these changes? [y/N]: y
If synchronous_mode: true, the synchronous_standby_names parameter will be managed by patroni.
It will choose a sync standby from all available replicas and write its name to the primary’s configuration file.
Quorum Commit
When sync standby is enabled, PostgreSQL will pick one replica as the standby instance, and all other replicas as candidates.
Primary will wait until the standby instance flushes to disk before a commit is confirmed, and the standby instance will always have the latest data without any lags.
However, you can achieve an even higher/lower consistency level with the quorum commit (trade-off with availability).
For example, to have all 2 replicas to confirm a commit:
synchronous_mode: true            # make sure synchronous mode is enabled
synchronous_node_count: 2         # at least 2 nodes to confirm a commit
If you have more replicas and wish to have more sync standby, increase synchronous_node_count accordingly.
Beware: adjust synchronous_node_count accordingly when you append or remove replicas.
The postgres synchronous_standby_names parameter will be managed by patroni:
The classic quorum commit uses a majority of replicas to confirm a commit.
synchronous_mode: quorum          # use quorum commit
postgresql:
  parameters:                     # change the PostgreSQL parameter `synchronous_standby_names`, use the `ANY n ()` notion
    synchronous_standby_names: 'ANY 1 (*)'  # you can specify a list of standby names, or use `*` to match them all
Example: Enable Quorum Commit
$ pg edit-config pg-test
+ synchronous_standby_names: 'ANY 1 (*)'  # You have to configure this manually
+ synchronous_mode: quorum        # use quorum commit mode, undocumented parameter
- synchronous_node_count: 2       # this parameter is no longer needed in quorum mode

Apply these changes? [y/N]: y
After applying the configuration, we can see that all replicas are no longer sync standby, but just normal replicas.
After that, when we check pg_stat_replication.sync_state, it becomes quorum instead of sync or async.
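You can verify this on the primary with a query like:

SELECT application_name, state, sync_state FROM pg_stat_replication;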
Standby Cluster
You can clone an existing cluster and create a standby cluster, which can be used for migration, horizontal split, multi-az deployment, or disaster recovery.
A standby cluster’s definition is just the same as any other normal cluster, except there’s a pg_upstream defined on the primary instance.
For example, you have a pg-test cluster, to create a standby cluster pg-test2, the inventory may look like this:
# pg-test is the original cluster
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
  vars: { pg_cluster: pg-test }

# pg-test2 is a standby cluster of pg-test.
pg-test2:
  hosts:
    10.10.10.12: { pg_seq: 1, pg_role: primary , pg_upstream: 10.10.10.11 }  # <--- pg_upstream is defined here
    10.10.10.13: { pg_seq: 2, pg_role: replica }
  vars: { pg_cluster: pg-test2 }
And pg-test2-1, the primary of pg-test2 will be a replica of pg-test and serve as a Standby Leader in pg-test2.
Just make sure the pg_upstream parameter is configured on the primary of the standby cluster so it pulls from the original upstream automatically.
bin/pgsql-add pg-test             # Creating the original cluster
bin/pgsql-add pg-test2            # Creating a Backup Cluster
Example: Change Replication Upstream
You can change the replication upstream of the standby cluster when necessary (e.g. upstream failover).
To do so, just change the standby_cluster.host to the new upstream IP address and apply.
$ pg edit-config pg-test2
standby_cluster:
create_replica_methods:
- basebackup
-   host: 10.10.10.13             # <--- The old upstream
+   host: 10.10.10.12             # <--- The new upstream
    port: 5432

Apply these changes? [y/N]: y
Example: Promote Standby Cluster
You can promote the standby cluster to a standalone cluster at any time.
To do so, you have to config the cluster and wipe the entire standby_cluster section then apply.
$ pg edit-config pg-test2
-standby_cluster:
- create_replica_methods:
- - basebackup
- host: 10.10.10.11
-   port: 5432

Apply these changes? [y/N]: y
Example: Cascade Replica
If the pg_upstream is specified for replica rather than primary, the replica will be configured as a cascade replica with the given upstream ip instead of the cluster primary
pg-test:
  hosts:                          # pg-test-1 ---> pg-test-2 ---> pg-test-3
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }  # <--- bridge instance
    10.10.10.13: { pg_seq: 3, pg_role: replica, pg_upstream: 10.10.10.12 }
    # ^--- replicate from pg-test-2 (the bridge) instead of pg-test-1 (the primary)
  vars: { pg_cluster: pg-test }
Delayed Cluster
A delayed cluster is a special type of standby cluster, which is used to recover from "drop-by-accident" incidents as soon as possible.
For example, if you wish to have a cluster pg-testdelay which has the same data as 1-day ago pg-test cluster:
# pg-test is the original cluster
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
  vars: { pg_cluster: pg-test }

# pg-testdelay is a delayed cluster of pg-test.
pg-testdelay:
  hosts:
    10.10.10.12: { pg_seq: 1, pg_role: primary , pg_upstream: 10.10.10.11, pg_delay: 1d }
    10.10.10.13: { pg_seq: 2, pg_role: replica }
  vars: { pg_cluster: pg-testdelay }
patroni_citus_db has to be defined to specify the database to be managed.
pg_dbsu_password has to be set to a non-empty plain-text password if you want to use the pg_dbsu postgres rather than the default pg_admin_username to perform admin commands.
Besides, extra HBA rules that allow SSL access from local & other data nodes are required, which may look like this:
all:
  children:
    pg-citus0: # citus data node 0
      hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus0 , pg_group: 0 }
    pg-citus1: # citus data node 1
      hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2: # citus data node 2
      hosts: { 10.10.10.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3: # citus data node 3, with an extra replica
      hosts:
        10.10.10.13: { pg_seq: 1, pg_role: primary }
        10.10.10.14: { pg_seq: 2, pg_role: replica }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }
  vars: # global parameters for all citus clusters
    pg_mode: citus                    # pgsql cluster mode: citus
    pg_shard: pg-citus                # citus shard name: pg-citus
    patroni_citus_db: meta            # citus distributed database name
    pg_dbsu_password: DBUser.Postgres # all dbsu password access for citus cluster
    pg_users: [ { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: meta ,extensions: [ { name: citus }, { name: postgis }, { name: timescaledb } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all ,addr: 127.0.0.1/32 ,auth: ssl ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet'  }
And you can create distributed table & reference table on the coordinator node. Any data node can be used as the coordinator node since citus 11.2.
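For instance, a minimal sketch of creating a distributed table and a reference table on the coordinator (table and column names are illustrative; the database is the patroni_citus_db meta from the example above):

-- connect to the citus database (meta) on the coordinator node
CREATE TABLE orders (id bigint, user_id bigint, payload jsonb);
SELECT create_distributed_table('orders', 'user_id');   -- shard the table by user_id

CREATE TABLE countries (code text PRIMARY KEY, name text);
SELECT create_reference_table('countries');             -- replicate the table to all data nodes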
Beware that these extensions are not included in Pigsty's default repo; you can still have them on older pg versions with proper configuration.
6.8 - Playbook
How to manage PostgreSQL cluster with ansible playbooks
Pigsty has a series of playbooks for PostgreSQL:
pgsql.yml : Init HA PostgreSQL clusters or add new replicas.
pgsql-rm.yml : Remove PostgreSQL cluster, or remove replicas
pgsql-user.yml : Add new business user to existing PostgreSQL cluster
pgsql-db.yml : Add new business database to existing PostgreSQL cluster
pgsql-monitor.yml : Monitor remote PostgreSQL instance with local exporters
pgsql-migration.yml : Generate Migration manual & scripts for existing PostgreSQL
Safeguard
Beware: the pgsql.yml and pgsql-rm.yml playbooks can pose a risk of accidentally deleting databases if misused!
When using pgsql.yml, please check the --tags|-t and --limit|-l parameters.
Adding the -l parameter when executing playbooks is strongly recommended to limit execution hosts.
Think thrice before proceeding.
To prevent accidental deletions, the PGSQL module offers a safeguard option controlled by the following two parameters:
pg_safeguard is set to false by default: do not prevent purging by default.
pg_clean is set to true by default, meaning it will clean existing instances.
Effects on the init playbook
When meeting a running instance with the same config during the execution of the pgsql.yml playbook:
pg_safeguard / pg_clean | pg_clean=true | pg_clean=false
pg_safeguard=false      | Purge         | Abort
pg_safeguard=true       | Abort         | Abort
If pg_safeguard is enabled, the playbook will abort to avoid purging the running instance.
If the safeguard is disabled, it will further decide whether to remove the existing instance according to the value of pg_clean.
If pg_clean is true, the playbook will directly clean up the existing instance to make room for the new instance. This is the default behavior.
If pg_clean is false, the playbook will abort, which requires explicit configuration.
Effects on the remove playbook
When meeting a running instance with the same config during the execution of the pgsql-rm.yml playbook:
pg_safeguard / pg_clean | pg_clean=true   | pg_clean=false
pg_safeguard=false      | Purge & rm data | Purge
pg_safeguard=true       | Abort           | Abort
If pg_safeguard is enabled, the playbook will abort to avoid purging the running instance.
If the safeguard is disabled, it purges the running instance and will further decide whether to remove the existing data along with the instance according to the value of pg_clean.
If pg_clean is true, the playbook will directly clean up the PostgreSQL data cluster.
If pg_clean is false, the playbook will skip data purging, which requires explicit configuration.
pgsql.yml
The pgsql.yml playbook is used to init HA PostgreSQL clusters or add new replicas.
This playbook contains the following subtasks:
# pg_clean      : cleanup existing postgres if necessary
# pg_dbsu       : setup os user sudo for postgres dbsu
# pg_install    : install postgres packages & extensions
#   - pg_pkg              : install postgres related packages
#   - pg_extension        : install postgres extensions only
#   - pg_path             : link pgsql version bin to /usr/pgsql
#   - pg_env               : add pgsql bin to system path
# pg_dir        : create postgres directories and setup fhs
# pg_util       : copy utils scripts, setup alias and env
#   - pg_bin               : sync postgres util scripts /pg/bin
#   - pg_alias             : write /etc/profile.d/pg-alias.sh
#   - pg_psql              : create psqlrc file for psql
#   - pg_dummy             : create dummy placeholder file
# patroni       : bootstrap postgres with patroni
#   - pg_config            : generate postgres config
#     - pg_conf            : generate patroni config
#     - pg_systemd         : generate patroni systemd config
#     - pgbackrest_config  : generate pgbackrest config
#   - pg_cert              : issues certificates for postgres
#   - pg_launch            : launch postgres primary & replicas
#     - pg_watchdog        : grant watchdog permission to postgres
#     - pg_primary         : launch patroni/postgres primary
#     - pg_init            : init pg cluster with roles/templates
#     - pg_pass            : write .pgpass file to pg home
#     - pg_replica         : launch patroni/postgres replicas
#     - pg_hba             : generate pg HBA rules
#     - patroni_reload     : reload patroni config
#     - pg_patroni         : pause or remove patroni if necessary
# pg_user       : provision postgres business users
#   - pg_user_config       : render create user sql
#   - pg_user_create       : create user on postgres
# pg_db         : provision postgres business databases
#   - pg_db_config         : render create database sql
#   - pg_db_create         : create database on postgres
# pg_backup     : init pgbackrest repo & basebackup
#   - pgbackrest_init      : init pgbackrest repo
#   - pgbackrest_backup    : make an initial backup after bootstrap
# pgbouncer     : deploy a pgbouncer sidecar with postgres
#   - pgbouncer_clean      : cleanup existing pgbouncer
#   - pgbouncer_dir        : create pgbouncer directories
#   - pgbouncer_config     : generate pgbouncer config
#     - pgbouncer_svc      : generate pgbouncer systemd config
#     - pgbouncer_ini      : generate pgbouncer main config
#     - pgbouncer_hba      : generate pgbouncer hba config
#     - pgbouncer_db       : generate pgbouncer database config
#     - pgbouncer_user     : generate pgbouncer user config
#   - pgbouncer_launch     : launch pgbouncer pooling service
#   - pgbouncer_reload     : reload pgbouncer config
# pg_vip        : bind vip to pgsql primary with vip-manager
#   - pg_vip_config        : generate config for vip-manager
#   - pg_vip_launch        : launch vip-manager to bind vip
# pg_dns        : register dns name to infra dnsmasq
#   - pg_dns_ins           : register pg instance name
#   - pg_dns_cls           : register pg cluster name
# pg_service    : expose pgsql service with haproxy
#   - pg_service_config    : generate local haproxy config for pg services
#   - pg_service_reload    : expose postgres services with haproxy
# pg_exporter   : expose pgsql metrics with pg_exporter & pgbouncer_exporter
#   - pg_exporter_config   : config pg_exporter & pgbouncer_exporter
#   - pg_exporter_launch   : launch pg_exporter
#   - pgbouncer_exporter_launch : launch pgbouncer exporter
# pg_register   : register postgres to pigsty infrastructure
#   - register_prometheus  : register pg as prometheus monitor targets
#   - register_grafana     : register pg database as grafana datasource
pgBackRest backup & restore command and shortcuts:
pb info                                  # print pgbackrest repo info
pg-backup                                # make a backup, incr, or full backup if necessary
pg-backup full                           # make a full backup
pg-backup diff                           # make a differential backup
pg-backup incr                           # make a incremental backup
pg-pitr -i                               # restore to the time of latest backup complete (not often used)
pg-pitr --time="2022-12-30 14:44:44+08"  # restore to specific time point (in case of drop db, drop table)
pg-pitr --name="my-restore-point"        # restore TO a named restore point create by pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X             # restore right BEFORE a LSN
pg-pitr --xid="1234567" -X -P            # restore right BEFORE a specific transaction id, then promote
pg-pitr --backup=latest                  # restore to latest backup set
pg-pitr --backup=20221108-105325         # restore to a specific backup set, which can be checked with pgbackrest info
To create a new database on the existing Postgres cluster, add the database definition to all.children.<cls>.pg_databases, then create the database as follows:
Note: If the database has specified an owner, the user should already exist, or you’ll have to Create User first.
Example: Create Business Database
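The original example body is omitted here; a hedged sketch (cluster pg-meta and database meta2 are illustrative) would add the definition and run the wrapper or playbook:

pg-meta:
  vars:
    pg_databases:
      - { name: meta }                        # existing database
      - { name: meta2, owner: dbuser_meta }   # <--- newly added database definition

bin/pgsql-db pg-meta meta2    # roughly equivalent to: ./pgsql-db.yml -l pg-meta -e dbname=meta2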
Reload Service
Services are exposed access points served by HAProxy.
This task is used when cluster membership has changed, e.g., appending/removing replicas, switchover/failover, exposing a new service, or updating an existing service's config (e.g., LB weight).
To create new services or reload existing services on entire proxy cluster or specific instances:
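A minimal sketch of the corresponding commands (the bin/pgsql-svc wrapper and the pg_service subtask also appear later in this doc; the <ip...> selector is optional):

bin/pgsql-svc <cls>            # reload services for the whole proxy cluster
bin/pgsql-svc <cls> <ip...>    # reload services on specific instances only
# roughly equivalent to: ./pgsql.yml -l <cls> -t pg_service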
Note: unsafe Patroni REST API access is limited to infra/admin nodes and protected with an HTTP basic auth username/password and an optional HTTPS mode.
Example: Config Cluster with patronictl
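The original example body is not reproduced here; a hedged sketch using the pg alias for patronictl (as used elsewhere in this section; parameter values are illustrative) could be:

pg edit-config pg-test                           # interactively edit cluster config in DCS
pg edit-config pg-test -p max_connections=200    # set a single PostgreSQL parameter
pg reload pg-test                                # apply parameters that do not need a restart
pg restart --force pg-test                       # restart the cluster when a restart is required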
Append Replica
To add a new replica to the existing Postgres cluster, you have to add its definition to the inventory: all.children.<cls>.hosts, then:
bin/node-add <ip>          # init node <ip> for the new replica
bin/pgsql-add <cls> <ip>   # init pgsql instances on <ip> for cluster <cls>
It will add node <ip> to pigsty and init it as a replica of the cluster <cls>.
Cluster services will be reloaded to adopt the new member
Example: Add replica to pg-test
For example, if you want to add a pg-test-3 / 10.10.10.13 to the existing cluster pg-test, you’ll have to update the inventory first:
pg-test:
hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }   # existing member
    10.10.10.12: { pg_seq: 2, pg_role: replica }   # existing member
    10.10.10.13: { pg_seq: 3, pg_role: replica }   # <--- new member
  vars: { pg_cluster: pg-test }
then apply the change as follows:
bin/node-add 10.10.10.13            # add node to pigsty
bin/pgsql-add pg-test 10.10.10.13   # init new replica on 10.10.10.13 for cluster pg-test
which is similar to cluster init but only works on a single instance.
[ OK ] init instances 10.10.10.11 to pgsql cluster 'pg-test':
[WARN]   reminder: add nodes to pigsty, then install additional module 'pgsql'
[HINT]     $ bin/node-add 10.10.10.11   # run this ahead, except infra nodes
[WARN]   init instances from cluster:
[ OK ]     $ ./pgsql.yml -l '10.10.10.11,&pg-test'
[WARN]   reload pg_service on existing instances:
[ OK ] $ ./pgsql.yml -l 'pg-test,!10.10.10.11' -t pg_service
Remove Replica
To remove a replica from the existing PostgreSQL cluster:
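The command itself is elided above; based on the example below, it should look roughly like:

bin/pgsql-rm <cls> <ip...>     # ./pgsql-rm.yml -l <ip>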
It will remove instance <ip> from cluster <cls>.
Cluster services will be reloaded to kick the removed instance from load balancer.
Example: Remove replica from pg-test
For example, if you want to remove pg-test-3 / 10.10.10.13 from the existing cluster pg-test:
bin/pgsql-rm pg-test 10.10.10.13   # remove pgsql instance 10.10.10.13 from pg-test
bin/node-rm 10.10.10.13            # remove that node from pigsty (optional)
vi pigsty.yml                      # remove instance definition from inventory
bin/pgsql-svc pg-test              # refresh pg_service on existing instances to kick removed instance from load balancer
[ OK ] remove pgsql instances from 10.10.10.13 of 'pg-test':
[WARN] remove instances from cluster:
[ OK ] $ ./pgsql-rm.yml -l '10.10.10.13,&pg-test'
And remove instance definition from the inventory:
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }   # <--- remove this after execution
  vars: { pg_cluster: pg-test }
Finally, you can update pg service and kick the removed instance from load balancer:
bin/pgsql-svc pg-test # reload pg service on pg-test
Remove Cluster
To remove the entire Postgres cluster, just run:
bin/pgsql-rm <cls> # ./pgsql-rm.yml -l <cls>
Example: Remove Cluster
Example: Force removing a cluster
Note: if pg_safeguard is configured for this cluster (or globally configured to true), pgsql-rm.yml will abort to avoid removing a cluster by accident.
You can use playbook command line args to explicitly overwrite it to force the purge:
./pgsql-rm.yml -l pg-meta -e pg_safeguard=false   # force removing pg cluster pg-meta
Switchover
You can perform a PostgreSQL cluster switchover with patroni cmd.
pg switchover <cls>   # interactive mode, you can skip that with following options
pg switchover --leader pg-test-1 --candidate=pg-test-2 --scheduled=now --force pg-test
Example: Switchover pg-test
$ pg switchover pg-test
Master [pg-test-1]:
Candidate ['pg-test-2', 'pg-test-3'][]: pg-test-2
When should the switchover take place (e.g. 2022-12-26T07:39 )[now]: now
Current cluster topology
+ Cluster: pg-test (7181325041648035869) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Leader  | running | 1  |           | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-2 | 10.10.10.12 | Replica | running | 1  | 0         | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-3 | 10.10.10.13 | Replica | running | 1  | 0         | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Are you sure you want to switchover cluster pg-test, demoting current master pg-test-1? [y/N]: y
2022-12-26 06:39:58.02468 Successfully switched over to "pg-test-2"
+ Cluster: pg-test (7181325041648035869) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | stopped |    | unknown   | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-2 | 10.10.10.12 | Leader  | running | 1  |           | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-3 | 10.10.10.13 | Replica | running | 1  | 0         | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
To do so with Patroni API (schedule a switchover from 2 to 1 at a specific time):
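The original snippet is omitted here; a minimal sketch against Patroni's /switchover REST endpoint (port 8008 is the Patroni default; the basic-auth credentials and the timestamp are placeholders) could be:

curl -u 'patroni:Patroni.API' \
     -d '{"leader":"pg-test-2", "candidate":"pg-test-1", "scheduled_at":"2022-12-26T14:00:00+08:00"}' \
     http://10.10.10.12:8008/switchover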
To create a backup with pgBackRest, run as local dbsu:
pg-backup        # make a postgres base backup
pg-backup full   # make a full backup
pg-backup diff   # make a differential backup
pg-backup incr   # make a incremental backup
pb info          # check backup information
You can add crontab to node_crontab to specify your backup policy.
# make a full backup 1 am everyday
- '00 01 * * * postgres /pg/bin/pg-backup full'

# rotate backup: make a full backup on monday 1am, and an incremental backup during weekdays
- '00 01 * * 1 postgres /pg/bin/pg-backup full'
- '00 01 * * 2,3,4,5,6,7 postgres /pg/bin/pg-backup'
Restore Cluster
To restore a cluster to a previous time point (PITR), run as local dbsu:
pg-pitr -i                               # restore to the time of latest backup complete (not often used)
pg-pitr --time="2022-12-30 14:44:44+08"  # restore to specific time point (in case of drop db, drop table)
pg-pitr --name="my-restore-point"        # restore TO a named restore point create by pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X             # restore right BEFORE a LSN
pg-pitr --xid="1234567" -X -P            # restore right BEFORE a specific transaction id, then promote
pg-pitr --backup=latest                  # restore to latest backup set
pg-pitr --backup=20221108-105325         # restore to a specific backup set, which can be checked with pgbackrest info
Then follow the instructions from the wizard. Check Backup & PITR for details.
Example: PITR with raw pgBackRest Command
# restore to the latest available point (e.g. hardware failure)
pgbackrest --stanza=pg-meta restore

# PITR to specific time point (e.g. drop table by accident)
pgbackrest --stanza=pg-meta --type=time --target="2022-11-08 10:58:48" \
  --target-action=promote restore

# restore specific backup point and then promote (or pause|shutdown)
pgbackrest --stanza=pg-meta --type=immediate --target-action=promote \
  --set=20221108-105325F_20221108-105938I restore
Then rebuild the repo on infra nodes with the ./infra.yml -t repo_build subtask. After that, you can install these packages with the ansible package module:
ansible pg-test -b -m package -a "name=pg_cron_15,topn_15,pg_stat_monitor_15*"   # install some packages
Update Packages Manually
# add repo upstream on admin node, then download them manually
cd ~/pigsty; ./infra.yml -t repo_upstream,repo_cache   # add upstream repo (internet)
cd /www/pigsty; repotrack "some_new_package_name"      # download the latest RPMs

# re-create local repo on admin node, then refresh yum/apt cache on all nodes
cd ~/pigsty; ./infra.yml -t repo_create                # recreate local repo on admin node
./node.yml -t node_repo                                # refresh yum/apt cache on all nodes

# alternatives: clean and remake cache on all nodes with ansible command
ansible all -b -a 'yum clean all'                      # clean node repo cache
ansible all -b -a 'yum makecache'                      # remake cache from the new repo
ansible all -b -a 'apt clean'                          # clean node repo cache (Ubuntu/Debian)
ansible all -b -a 'apt update'                         # remake cache from the new repo (Ubuntu/Debian)
For example, you can then install or upgrade packages with:
ansible pg-test -b -m package -a "name=postgresql15* state=latest"
Install Extension
If you want to install extensions on pg clusters, add them to pg_extensions and make sure they are installed with:
./pgsql.yml -t pg_extension # install extensions
Some extensions need to be loaded in shared_preload_libraries. You can add them to pg_libs, or Config an existing cluster.
Finally, CREATE EXTENSION <extname>; on the cluster primary instance to install it.
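For new clusters, a minimal sketch of the corresponding cluster vars (values below are illustrative):

pg-test:
  vars:
    pg_cluster: pg-test
    pg_libs: 'pg_cron, pg_stat_statements, auto_explain'   # preload libraries at cluster init
    pg_extensions: [ pg_cron_15 ]                          # extension packages to install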
Example: Install pg_cron on pg-test cluster
ansible pg-test -b -m package -a "name=pg_cron_15"   # install pg_cron packages on all nodes

# add pg_cron to shared_preload_libraries
pg edit-config --force -p shared_preload_libraries='timescaledb, pg_cron, pg_stat_statements, auto_explain'

pg restart --force pg-test                                   # restart cluster
psql -h pg-test -d postgres -c 'CREATE EXTENSION pg_cron;'   # install pg_cron on primary
The simplest way to achieve a major version upgrade is to create a new cluster with the new version, then migrate with logical replication & blue/green deployment.
You can also perform an in-place major upgrade, which is not recommended especially when certain extensions are installed. But it is possible.
Assume you want to upgrade PostgreSQL 14 to 15: you have to add the packages to the yum/apt repo, and guarantee that the extensions have exactly the same versions too.
!> Remember to change these passwords in production deployment!
pg_dbsu: postgres                            # os user for the database
pg_replication_username: replicator          # system replication user
pg_replication_password: DBUser.Replicator   # system replication password
pg_monitor_username: dbuser_monitor          # system monitor user
pg_monitor_password: DBUser.Monitor          # system monitor password
pg_admin_username: dbuser_dba                # system admin user
pg_admin_password: DBUser.DBA                # system admin password
- GRANT USAGE      ON SCHEMAS   TO dbrole_readonly
- GRANT SELECT     ON TABLES    TO dbrole_readonly
- GRANT SELECT     ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE    ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE      ON SCHEMAS   TO dbrole_offline
- GRANT SELECT     ON TABLES    TO dbrole_offline
- GRANT SELECT     ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE    ON FUNCTIONS TO dbrole_offline
- GRANT INSERT     ON TABLES    TO dbrole_readwrite
- GRANT UPDATE     ON TABLES    TO dbrole_readwrite
- GRANT DELETE     ON TABLES    TO dbrole_readwrite
- GRANT USAGE      ON SEQUENCES TO dbrole_readwrite
- GRANT UPDATE     ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE   ON TABLES    TO dbrole_admin
- GRANT REFERENCES ON TABLES    TO dbrole_admin
- GRANT TRIGGER    ON TABLES    TO dbrole_admin
- GRANT CREATE     ON SCHEMAS   TO dbrole_admin
Newly created objects will have the corresponding privileges when they are created by admin users.
The \ddp+ output may look like:
Type     | Access privileges
---------+----------------------
function | =X
         | dbrole_readonly=X
         | dbrole_offline=X
         | dbrole_admin=X
schema   | dbrole_readonly=U
         | dbrole_offline=U
         | dbrole_admin=UC
sequence | dbrole_readonly=r
         | dbrole_offline=r
         | dbrole_readwrite=wU
         | dbrole_admin=rwU
table    | dbrole_readonly=r
         | dbrole_offline=r
         | dbrole_readwrite=awd
         | dbrole_admin=arwdDxt
Default Privilege
ALTER DEFAULT PRIVILEGES allows you to set the privileges that will be applied to objects created in the future.
It does not affect privileges assigned to already-existing objects, nor objects created by non-admin users.
Pigsty will use the following default privileges:
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_dbsu }} {{ priv }};
{% endfor %}

{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_admin_username }} {{ priv }};
{% endfor %}

-- for additional business admin, they can SET ROLE to dbrole_admin
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" {{ priv }};
{% endfor %}
Which will be rendered in pg-init-template.sql along with the ALTER DEFAULT PRIVILEGES statements for admin users.
These SQL commands will be executed on postgres & template1 during cluster bootstrap, and newly created databases will inherit them from template1 by default.
That is to say, to maintain the correct object privilege, you have to run DDL with admin users, which could be:
It’s wise to use postgres as global object owner to perform DDL changes.
If you wish to create objects with business admin user, YOU MUST USE SET ROLE dbrole_admin before running that DDL to maintain the correct privileges.
You can also run ALTER DEFAULT PRIVILEGES FOR ROLE <some_biz_admin> XXX to grant default privileges to a business admin user, too.
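For example, a minimal sketch of running DDL as a business admin while keeping the default privileges intact (the table name is illustrative):

SET ROLE dbrole_admin;        -- so the default privileges defined for dbrole_admin apply
CREATE TABLE some_table (id bigint PRIMARY KEY, payload text);
RESET ROLE;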
There are 3 database level privileges: CONNECT, CREATE, TEMP, and a special ‘privilege’: OWNERSHIP.
- name: meta         # required, `name` is the only mandatory field of a database definition
  owner: postgres    # optional, specify a database owner, {{ pg_dbsu }} by default
  allowconn: true    # optional, allow connection, true by default. false will disable connect at all
  revokeconn: false  # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
If owner exists, it will be used as database owner instead of default {{ pg_dbsu }}
If revokeconn is false, all users have the CONNECT privilege of the database, this is the default behavior.
If revokeconn is set to true explicitly:
CONNECT privilege of the database will be revoked from PUBLIC
CONNECT privilege will be granted to {{ pg_replication_username }}, {{ pg_monitor_username }} and {{ pg_admin_username }}
CONNECT privilege will be granted to database owner with GRANT OPTION
revokeconn flag can be used for database access isolation, you can create different business users as the owners for each database and set the revokeconn option for all of them.
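In effect, setting revokeconn: true for the database meta corresponds roughly to the following SQL (database, owner and user names follow the defaults used in this doc):

REVOKE CONNECT ON DATABASE meta FROM PUBLIC;                                -- nobody can connect by default
GRANT CONNECT ON DATABASE meta TO replicator, dbuser_monitor, dbuser_dba;   -- infra users keep access
GRANT CONNECT ON DATABASE meta TO postgres WITH GRANT OPTION;               -- owner can connect and re-grant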
Pigsty revokes the CREATE privilege on database from PUBLIC by default, for security consideration.
And this is the default behavior since PostgreSQL 15.
The database owner has the full capability to adjust these privileges as they see fit.
6.11 - Backup & PITR
How to perform base backup & PITR with pgBackRest?
In the case of a hardware failure, a physical replica failover could be the best choice. Whereas for data corruption scenarios (whether machine or human errors), Point-in-Time Recovery (PITR) is often more appropriate.
# stanza name = {{ pg_cluster }} by default
pgbackrest --stanza=${stanza} --type=full|diff|incr backup

# you can also use the following command in pigsty (/pg/bin/pg-backup)
pg-backup        # make a backup, incr, or full backup if necessary
pg-backup full   # make a full backup
pg-backup diff   # make a differential backup
pg-backup incr   # make a incremental backup
pg-pitr # restore to wal archive stream end (e.g. used in case of entire DC failure)
pg-pitr -i # restore to the time of latest backup complete (not often used)
pg-pitr --time="2022-12-30 14:44:44+08" # restore to specific time point (in case of drop db, drop table)
pg-pitr --name="my-restore-point" # restore TO a named restore point create by pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X # restore right BEFORE a LSN
pg-pitr --xid="1234567" -X -P # restore right BEFORE a specific transaction id, then promote
pg-pitr --backup=latest # restore to latest backup set
pg-pitr --backup=20221108-105325 # restore to a specific backup set, which can be checked with pgbackrest info
pg-pitr # pgbackrest --stanza=pg-meta restore
pg-pitr -i # pgbackrest --stanza=pg-meta --type=immediate restore
pg-pitr -t "2022-12-30 14:44:44+08" # pgbackrest --stanza=pg-meta --type=time --target="2022-12-30 14:44:44+08" restore
pg-pitr -n "my-restore-point" # pgbackrest --stanza=pg-meta --type=name --target=my-restore-point restore
pg-pitr -b 20221108-105325F              # pgbackrest --stanza=pg-meta --set=20221108-105325F restore
pg-pitr -l "0/7C82CB8" -X # pgbackrest --stanza=pg-meta --type=lsn --target="0/7C82CB8" --target-exclusive restore
pg-pitr -x 1234567 -X -P                 # pgbackrest --stanza=pg-meta --type=xid --target="1234567" --target-exclusive --target-action=promote restore
The pg-pitr script will generate instructions for you to perform PITR.
For example, if you wish to rollback current cluster status back to "2023-02-07 12:38:00+08":
$ pg-pitr -t "2023-02-07 12:38:00+08"
pgbackrest --stanza=pg-meta --type=time --target='2023-02-07 12:38:00+08' restore
Perform time PITR on pg-meta
[1. Stop PostgreSQL] ===========================================
    1.1 Pause Patroni (if there are any replicas)
        $ pg pause <cls>    # pause patroni auto failover
    1.2 Shutdown Patroni
        $ pt-stop           # sudo systemctl stop patroni
    1.3 Shutdown Postgres
        $ pg-stop           # pg_ctl -D /pg/data stop -m fast

[2. Perform PITR] ===========================================
    2.1 Restore Backup
        $ pgbackrest --stanza=pg-meta --type=time --target='2023-02-07 12:38:00+08' restore
    2.2 Start PG to Replay WAL
        $ pg-start          # pg_ctl -D /pg/data start
    2.3 Validate and Promote
      - If database content is ok, promote it to finish recovery, otherwise goto 2.1
        $ pg-promote        # pg_ctl -D /pg/data promote

[3. Restart Patroni] ===========================================
    3.1 Start Patroni
        $ pt-start          # sudo systemctl start patroni
    3.2 Enable Archive Again
        $ psql -c 'ALTER SYSTEM SET archive_mode = on; SELECT pg_reload_conf();'
    3.3 Restart Patroni
        $ pt-restart        # sudo systemctl restart patroni

[4. Restore Cluster] ===========================================
    4.1 Re-Init All Replicas (if any replicas)
        $ pg reinit <cls> <ins>
    4.2 Resume Patroni
        $ pg resume <cls>   # resume patroni auto failover
    4.3 Make Full Backup (optional)
        $ pg-backup full    # pgbackrest --stanza=pg-meta backup --type=full
For example, the default pg-meta will take a full backup every day at 1 am.
node_crontab: # make a full backup 1 am everyday
- '00 01 * * * postgres /pg/bin/pg-backup full'
With the default local repo retention policy, it will keep at most two full backups and temporarily allow three during backup.
pgbackrest_repo:                  # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
  local:                          # default pgbackrest repo with local posix fs
    path: /pg/backup              # local backup directory, `/pg/backup` by default
    retention_full_type: count    # retention full backups by count
    retention_full: 2             # keep 2, at most 3 full backup when using local fs repo
Your backup disk storage should be at least 3x the database file size, plus the WAL archive of the last 3 days.
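As a rough, illustrative sizing example (the numbers are hypothetical): for a 100 GB database generating about 10 GB of WAL per day, plan for at least 3 x 100 GB + 3 x 10 GB ≈ 330 GB of backup storage.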
MinIO repo
When using MinIO, storage capacity is usually not a problem. You can keep backups as long as you want.
For example, the default pg-test will take a full backup on Monday and incr backup on other weekdays.
node_crontab:  # make a full backup on monday 1am, and an incremental backup on other weekdays
  - '00 01 * * 1 postgres /pg/bin/pg-backup full'
  - '00 01 * * 2,3,4,5,6,7 postgres /pg/bin/pg-backup'
And with a 14-day time retention policy, backup in the last two weeks will be kept. But beware, this guarantees a week’s PITR period only.
pgbackrest_repo:                     # pgbackrest repo: https://pgbackrest.org/configuration.html#section-repository
  minio:                             # optional minio repo for pgbackrest
    type: s3                         # minio is s3-compatible, so s3 is used
    s3_endpoint: sss.pigsty          # minio endpoint domain name, `sss.pigsty` by default
    s3_region: us-east-1             # minio region, us-east-1 by default, useless for minio
    s3_bucket: pgsql                 # minio bucket name, `pgsql` by default
    s3_key: pgbackrest               # minio user access key for pgbackrest
    s3_key_secret: S3User.Backup     # minio user secret key for pgbackrest
    s3_uri_style: path               # use path style uri for minio rather than host style
    path: /pgbackrest                # minio backup path, default is `/pgbackrest`
    storage_port: 9000               # minio port, 9000 by default
    storage_ca_file: /etc/pki/ca.crt # minio ca file path, `/etc/pki/ca.crt` by default
    bundle: y                        # bundle small files into a single file
    cipher_type: aes-256-cbc         # enable AES encryption for remote backup repo
    cipher_pass: pgBackRest          # AES encryption password, default is 'pgBackRest'
    retention_full_type: time        # retention full backup by time on minio repo
    retention_full: 14               # keep full backup for last 14 days
6.12 - Migration
How to migrate an existing postgres into a Pigsty-managed cluster with minimal downtime? The blue-green online migration playbook
Pigsty has a built-in playbook pgsql-migration.yml to perform online database migration based on logical replication.
With proper automation, the downtime could be minimized to several seconds. But beware that logical replication requires PostgreSQL 10+ to work. You can still use the facilities here with a pg_dump | psql instead of logical replication.
Define a Migration Task
You have to create a migration task definition file to use this playbook.
You have to tell pigsty where the source and destination clusters are: the databases to be migrated, and the primary IP addresses.
You should have superuser privileges on both sides to proceed.
You can overwrite the superuser connection to the source cluster with src_pg, and the logical replication connection string with sub_conn; otherwise, pigsty default admin & replicator credentials will be used.
---
#-----------------------------------------------------------------
# PG_MIGRATION
#-----------------------------------------------------------------
context_dir: ~/migration   # migration manuals & scripts
#-----------------------------------------------------------------
# SRC Cluster (The OLD Cluster)
#-----------------------------------------------------------------
src_cls: pg-meta        # src cluster name <REQUIRED>
src_db: meta            # src database name <REQUIRED>
src_ip: 10.10.10.10     # src cluster primary ip <REQUIRED>
#src_pg: ''             # if defined, use this as src dbsu pgurl instead of:
#                       # postgres://{{ pg_admin_username }}@{{ src_ip }}/{{ src_db }}
#                       # e.g. 'postgres://dbuser_dba:DBUser.DBA@10.10.10.10:5432/meta'
#sub_conn: ''           # if defined, use this as subscription connstr instead of:
#                       # host={{ src_ip }} dbname={{ src_db }} user={{ pg_replication_username }}'
#                       # e.g. 'host=10.10.10.10 dbname=meta user=replicator password=DBUser.Replicator'
#-----------------------------------------------------------------
# DST Cluster (The New Cluster)
#-----------------------------------------------------------------
dst_cls: pg-test        # dst cluster name <REQUIRED>
dst_db: test            # dst database name <REQUIRED>
dst_ip: 10.10.10.11     # dst cluster primary ip <REQUIRED>
#dst_pg: ''             # if defined, use this as dst dbsu pgurl instead of:
#                       # postgres://{{ pg_admin_username }}@{{ dst_ip }}/{{ dst_db }}
#                       # e.g. 'postgres://dbuser_dba:DBUser.DBA@10.10.10.11:5432/test'
#-----------------------------------------------------------------
# PGSQL
#-----------------------------------------------------------------
pg_dbsu: postgres
pg_replication_username: replicator
pg_replication_password: DBUser.Replicator
pg_admin_username: dbuser_dba
pg_admin_password: DBUser.DBA
pg_monitor_username: dbuser_monitor
pg_monitor_password: DBUser.Monitor
#-----------------------------------------------------------------
...
Generate Migration Plan
The playbook does not migrate src to dst, but it will generate everything you need to do so.
After the execution, you will find the migration context dir under ~/migration/pg-meta.meta by default.
Follow the README.md and execute these scripts one by one, and you will do the trick!
# this script will setup migration context with env vars
. ~/migration/pg-meta.meta/activate

# these scripts are used for check src cluster status
# and help generating new cluster definition in pigsty
./check-user     # check src users
./check-db       # check src databases
./check-hba      # check src hba rules
./check-repl     # check src replica identities
./check-misc     # check src special objects

# these scripts are used for building logical replication
# between existing src cluster and pigsty managed dst cluster
# schema, data will be synced in realtime, except for sequences
./copy-schema    # copy schema to dest
./create-pub     # create publication on src
./create-sub     # create subscription on dst
./copy-progress  # print logical replication progress
./copy-diff      # quick src & dst diff by counting tables

# these scripts will run in an online migration, which will
# stop src cluster, copy sequence numbers (which is not synced with logical replication)
# you have to reroute you app traffic according to your access method (dns,vip,haproxy,pgbouncer,etc...)
# then perform cleanup to drop subscription and publication
./copy-seq [n]   # sync sequence numbers, if n is given, an additional shift will applied
#./disable-src   # restrict src cluster access to admin node & new cluster (YOUR IMPLEMENTATION)
#./re-routing    # ROUTING APPLICATION TRAFFIC FROM SRC TO DST! (YOUR IMPLEMENTATION)
./drop-sub       # drop subscription on dst after migration
./drop-pub       # drop publication on src after migration
Caveats
You can use ./copy-seq 1000 to advance all sequences by a number (e.g. 1000) after syncing sequences.
Which may prevent potential serial primary key conflict in new clusters.
You have to implement your own ./re-routing script to route your application traffic from src to dst, since we don't know how your traffic is routed (e.g. dns, VIP, haproxy, or pgbouncer).
Of course, you can always do that by hand…
You have to implement your own ./disable-src script to restrict the src cluster.
You can do that by changing HBA rules & reload (recommended), or just shutting down postgres, pgbouncer, or haproxy…
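For example, a minimal sketch of a restrictive pg_hba.conf tail that keeps only the admin node and the new cluster primary (the IP addresses are the sandbox examples and must be adapted):

host all all 10.10.10.10/32   md5      # admin node
host all all 10.10.10.11/32   md5      # new cluster primary
host all all 0.0.0.0/0        reject   # reject everything else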
6.13 - Monitoring
How PostgreSQL monitoring works, and how to monitor remote (existing) PostgreSQL instances?
Overview
Pigsty uses the modern observability stack for PostgreSQL monitoring:
Grafana for metrics visualization and PostgreSQL datasource.
There are three identity labels: cls, ins, ip, which will be attached to all metrics & logs. node & haproxy will try to reuse the same identity to provide consistent metrics & logs.
Prometheus monitoring targets are defined in static files under /etc/prometheus/targets/pgsql/. Each instance will have a corresponding file. Take pg-meta-1 as an example:
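The file content itself is not reproduced here; a minimal sketch of what such a static target file typically looks like (ports 9630 / 9631 / 8008 are the usual pg_exporter / pgbouncer_exporter / patroni ports, adjust to your setup):

# /etc/prometheus/targets/pgsql/pg-meta-1.yml
- labels: { cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10 }
  targets:
    - 10.10.10.10:9630    # pg_exporter
    - 10.10.10.10:9631    # pgbouncer_exporter
    - 10.10.10.10:8008    # patroni (http mode)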
When the global flag patroni_ssl_enabled is set, the patroni target will be managed as /etc/prometheus/targets/patroni/<ins>.yml because it requires a different scrape endpoint (https).
Prometheus monitoring target will be removed when a cluster is removed by bin/pgsql-rm or pgsql-rm.yml. You can use playbook subtasks, or remove them manually:
bin/pgmon-rm <ins> # remove prometheus targets from all infra nodes
Remote RDS targets are managed as /etc/prometheus/targets/pgrds/<cls>.yml. It will be created by the pgsql-monitor.yml playbook or bin/pgmon-add script.
Monitor Mode
There are three ways to monitor PostgreSQL instances in Pigsty:
Suppose the target DB node can be managed by Pigsty (accessible via ssh and sudo is available). In that case, you can use the pg_exporter task in the pgsql.yml playbook to deploy the monitoring component PG Exporter on the target node in the same manner as a standard deployment.
You can also deploy the connection pool and its monitoring on existing instance nodes using the pgbouncer and pgbouncer_exporter tasks from the same playbook. Additionally, you can deploy host monitoring, load balancing, and log collection components using the node_exporter, haproxy, and promtail tasks from the node.yml playbook, achieving a similar user experience with the native Pigsty cluster.
The definition method for existing clusters is very similar to the normal clusters managed by Pigsty. Selectively run certain tasks from the pgsql.yml playbook instead of running the entire playbook.
./node.yml  -l <cls> -t node_repo,node_pkg            # Add YUM sources for INFRA nodes on host nodes and install packages.
./node.yml  -l <cls> -t node_exporter,node_register   # Configure host monitoring and add to Prometheus.
./node.yml  -l <cls> -t promtail                      # Configure host log collection and send to Loki.
./pgsql.yml -l <cls> -t pg_exporter,pg_register       # Configure PostgreSQL monitoring and register with Prometheus/Grafana.
If you can only access the target database via PGURL (database connection string), you can refer to the instructions here for configuration. In this mode, Pigsty deploys the corresponding PG Exporter on the INFRA node to fetch metrics from the remote database, as shown below:
The monitoring system will no longer have host/pooler/load balancer metrics. But the PostgreSQL metrics & catalog info are still available. Pigsty has two dedicated dashboards for that: PGRDS Cluster and PGRDS Instance. Overview and Database level dashboards are reused. Since Pigsty cannot manage your RDS, you have to setup monitor on the target database in advance.
Below, we use a sandbox environment as an example: now we assume that the pg-meta cluster is an RDS instance pg-foo-1 to be monitored, and the pg-test cluster is an RDS cluster pg-bar to be monitored:
Declare the cluster in the configuration list. For example, suppose we want to monitor the “remote” pg-meta & pg-test clusters:
infra:            # Infra cluster for proxies, monitoring, alerts, etc.
  hosts: { 10.10.10.10: { infra_seq: 1 } }
  vars:           # Install pg_exporter on 'infra' group for remote postgres RDS
    pg_exporters: # List all remote instances here, assign a unique unused local port for k
      20001: { pg_cluster: pg-foo, pg_seq: 1, pg_host: 10.10.10.10 , pg_databases: [ { name: meta } ] }   # Register meta database as Grafana data source
      20002: { pg_cluster: pg-bar, pg_seq: 1, pg_host: 10.10.10.11 , pg_port: 5432 }   # Several different connection string concatenation methods
      20003: { pg_cluster: pg-bar, pg_seq: 2, pg_host: 10.10.10.12 , pg_exporter_url: 'postgres://dbuser_monitor:DBUser.Monitor@10.10.10.12:5432/postgres?sslmode=disable' }
      20004: { pg_cluster: pg-bar, pg_seq: 3, pg_host: 10.10.10.13 , pg_monitor_username: dbuser_monitor, pg_monitor_password: DBUser.Monitor }
The databases listed in the pg_databases field will be registered in Grafana as a PostgreSQL data source, providing data support for the PGCAT monitoring panel. If you don’t want to use PGCAT and register the database in Grafana, set pg_databases to an empty array or leave it blank.
Execute the command to add monitoring: bin/pgmon-add <clsname>
bin/pgmon-add pg-foo # Bring the pg-foo cluster into monitoringbin/pgmon-add pg-bar # Bring the pg-bar cluster into monitoring
To remove a remote cluster from monitoring, use bin/pgmon-rm <clsname>
bin/pgmon-rm pg-foo # Remove pg-foo from Pigsty monitoringbin/pgmon-rm pg-bar # Remove pg-bar from Pigsty monitoring
You can use more parameters to override the default pg_exporter options. Here is an example for monitoring Aliyun RDS and PolarDB with Pigsty:
infra:            # infra cluster for proxy, monitor, alert, etc..
  hosts: { 10.10.10.10: { infra_seq: 1 } }
  vars:           # install pg_exporter for remote postgres RDS on a group 'infra'
    pg_exporters: # list all remote instances here, alloc a unique unused local port as k
      20001: { pg_cluster: pg-foo, pg_seq: 1, pg_host: 10.10.10.10 }
      20002: { pg_cluster: pg-bar, pg_seq: 1, pg_host: 10.10.10.11 , pg_port: 5432 }
      20003: { pg_cluster: pg-bar, pg_seq: 2, pg_host: 10.10.10.12 , pg_exporter_url: 'postgres://dbuser_monitor:DBUser.Monitor@10.10.10.12:5432/postgres?sslmode=disable' }
      20004: { pg_cluster: pg-bar, pg_seq: 3, pg_host: 10.10.10.13 , pg_monitor_username: dbuser_monitor, pg_monitor_password: DBUser.Monitor }

      20011:
        pg_cluster: pg-polar                       # RDS Cluster Name (Identity, Explicitly Assigned, used as 'cls')
        pg_seq: 1                                  # RDS Instance Seq (Identity, Explicitly Assigned, used as part of 'ins')
        pg_host: pxx.polardbpg.rds.aliyuncs.com    # RDS Host Address
        pg_port: 1921                              # RDS Port
        pg_exporter_include_database: 'test'       # Only monitoring database in this list
        pg_monitor_username: dbuser_monitor        # monitor username, overwrite default
        pg_monitor_password: DBUser_Monitor        # monitor password, overwrite default
        pg_databases: [ { name: test } ]           # database to be added to grafana datasource

      20012:
        pg_cluster: pg-polar                       # RDS Cluster Name (Identity, Explicitly Assigned, used as 'cls')
        pg_seq: 2                                  # RDS Instance Seq (Identity, Explicitly Assigned, used as part of 'ins')
        pg_host: pe-xx.polarpgmxs.rds.aliyuncs.com # RDS Host Address
        pg_port: 1521                              # RDS Port
        pg_databases: [ { name: test } ]           # database to be added to grafana datasource

      20014:
        pg_cluster: pg-rds
        pg_seq: 1
        pg_host: pgm-xx.pg.rds.aliyuncs.com
        pg_port: 5432
        pg_exporter_auto_discovery: true
        pg_exporter_include_database: 'rds'
        pg_monitor_username: dbuser_monitor
        pg_monitor_password: DBUser_Monitor
        pg_databases: [ { name: rds } ]

      20015:
        pg_cluster: pg-rdsha
        pg_seq: 1
        pg_host: pgm-2xx8wu.pg.rds.aliyuncs.com
        pg_port: 5432
        pg_exporter_auto_discovery: true
        pg_exporter_include_database: 'rds'
        pg_databases: [ { name: test }, { name: rds } ]

      20016:
        pg_cluster: pg-rdsha
        pg_seq: 2
        pg_host: pgr-xx.pg.rds.aliyuncs.com
        pg_exporter_auto_discovery: true
        pg_exporter_include_database: 'rds'
        pg_databases: [ { name: test }, { name: rds } ]
Monitor Setup
When you want to monitor existing instances, whether it’s RDS or a self-built PostgreSQL instance, you need to make some configurations on the target database so that Pigsty can access them.
To bring an external existing PostgreSQL instance into monitoring, you need a connection string that can access that instance/cluster. Any accessible connection string (business user, superuser) can be used, but we recommend using a dedicated monitoring user to avoid permission leaks.
Monitor User: The default username used is dbuser_monitor. This user belongs to the pg_monitor group, or ensure it has the necessary view permissions.
Monitor HBA: Default password is DBUser.Monitor. You need to ensure that the HBA policy allows the monitoring user to access the database from the infra nodes.
Monitor Schema: It's optional but recommended to create a dedicated schema monitor for monitoring views and extensions.
Monitor Extension: It is strongly recommended to enable the built-in extension pg_stat_statements.
Monitor View: Monitoring views are optional but can provide additional metrics, which is recommended.
Monitor User
Create a monitor user on the target database cluster. For example, dbuser_monitor is used by default in Pigsty.
CREATE USER dbuser_monitor;                                       -- create the monitor user
COMMENT ON ROLE dbuser_monitor IS 'system monitor user';          -- comment the monitor user
GRANT pg_monitor TO dbuser_monitor;                               -- grant system role pg_monitor to monitor user

ALTER USER dbuser_monitor PASSWORD 'DBUser.Monitor';              -- set password for monitor user
ALTER USER dbuser_monitor SET log_min_duration_statement = 1000;  -- set this to avoid log flooding
ALTER USER dbuser_monitor SET search_path = monitor,public;       -- set this to avoid pg_stat_statements extension not working
You also need to configure pg_hba.conf to allow monitoring user access from infra/admin nodes.
# allow local role monitor with password
local   all  dbuser_monitor                   md5
host    all  dbuser_monitor  127.0.0.1/32     md5
host    all  dbuser_monitor  <admin_ip>/32    md5
host    all  dbuser_monitor  <infra_ip>/32    md5
If your RDS does not support the RAW HBA format, add admin/infra node IP to the whitelist.
Monitor Schema
Monitor schema is optional, but we strongly recommend creating one.
CREATE SCHEMA IF NOT EXISTS monitor;               -- create dedicate monitor schema
GRANT USAGE ON SCHEMA monitor TO dbuser_monitor;   -- allow monitor user to use this schema
Monitor Extension
Monitor extension is optional, but we strongly recommend enabling pg_stat_statements extension.
Note that this extension must be listed in shared_preload_libraries to take effect, and changing this parameter requires a database restart.
You should create this extension inside the admin database: postgres. If your RDS does not grant CREATE on the database postgres, you can create that extension in the default public schema:
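A minimal sketch (the dedicated monitor schema is the recommended variant; fall back to public on a restricted RDS):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements WITH SCHEMA monitor;   -- preferred
-- or, if CREATE on database postgres is not granted:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements WITH SCHEMA public;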
As long as your monitor user can access pg_stat_statements view without schema qualification, it should be fine.
Monitor View
It’s recommended to create the monitor views in all databases that need to be monitored.
Monitor Schema & View Definition
----------------------------------------------------------------------
-- Table bloat estimate : monitor.pg_table_bloat
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_table_bloat CASCADE;
CREATE OR REPLACE VIEW monitor.pg_table_bloat AS
SELECT CURRENT_CATALOG AS datname, nspname, relname, tblid, bs * tblpages AS size,
       CASE WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages - est_tblpages_ff) / tblpages::FLOAT ELSE 0 END AS ratio
FROM (SELECT ceil(reltuples / ((bs - page_hdr) * fillfactor / (tpl_size * 100))) + ceil(toasttuples / 4) AS est_tblpages_ff,
             tblpages, fillfactor, bs, tblid, nspname, relname, is_na
      FROM (SELECT (4 + tpl_hdr_size + tpl_data_size + (2 * ma)
                        - CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
                        - CASE WHEN ceil(tpl_data_size)::INT % ma = 0 THEN ma ELSE ceil(tpl_data_size)::INT % ma END) AS tpl_size,
                   (heappages + toastpages) AS tblpages, heappages, toastpages, reltuples, toasttuples, bs, page_hdr, tblid, nspname, relname, fillfactor, is_na
            FROM (SELECT tbl.oid AS tblid, ns.nspname, tbl.relname, tbl.reltuples,
                         tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
                         coalesce(toast.reltuples, 0) AS toasttuples,
                         coalesce(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
                         current_setting('block_size')::numeric AS bs,
                         CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
                         24 AS page_hdr,
                         23 + CASE WHEN MAX(coalesce(s.null_frac, 0)) > 0 THEN (7 + count(s.attname)) / 8 ELSE 0::int END
                            + CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
                         sum((1 - coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0)) AS tpl_data_size,
                         bool_or(att.atttypid = 'pg_catalog.name'::regtype)
                             OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
                  FROM pg_attribute AS att
                           JOIN pg_class AS tbl ON att.attrelid = tbl.oid
                           JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
                           LEFT JOIN pg_stats AS s ON s.schemaname = ns.nspname AND s.tablename = tbl.relname AND s.inherited = false AND s.attname = att.attname
                           LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
                  WHERE NOT att.attisdropped AND tbl.relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema')
                  GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
            ) AS s
      ) AS s2
) AS s3
WHERE NOT is_na;
COMMENT ON VIEW monitor.pg_table_bloat IS 'postgres table bloat estimate';
GRANT SELECT ON monitor.pg_table_bloat TO pg_monitor;

----------------------------------------------------------------------
-- Index bloat estimate : monitor.pg_index_bloat
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_index_bloat CASCADE;
CREATE OR REPLACE VIEW monitor.pg_index_bloat AS
SELECT CURRENT_CATALOG AS datname, nspname, idxname AS relname, tblid, idxid, relpages::BIGINT * bs AS size,
       COALESCE((relpages - (reltuples * (6 + ma - (CASE WHEN index_tuple_hdr % ma = 0 THEN ma ELSE index_tuple_hdr % ma END)
                                              + nulldatawidth + ma - (CASE WHEN nulldatawidth % ma = 0 THEN ma ELSE nulldatawidth % ma END))
                                 / (bs - pagehdr)::FLOAT + 1)), 0) / relpages::FLOAT AS ratio
FROM (SELECT nspname, idxname, indrelid AS tblid, indexrelid AS idxid, reltuples, relpages,
             current_setting('block_size')::INTEGER                                                              AS bs,
             (CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END) AS ma,
             24                                                                                                  AS pagehdr,
             (CASE WHEN max(COALESCE(pg_stats.null_frac, 0)) = 0 THEN 2 ELSE 6 END)                              AS index_tuple_hdr,
             sum((1.0 - COALESCE(pg_stats.null_frac, 0.0)) * COALESCE(pg_stats.avg_width, 1024))::INTEGER        AS nulldatawidth
      FROM pg_attribute
               JOIN (SELECT pg_namespace.nspname, ic.relname AS idxname, ic.reltuples, ic.relpages,
                            pg_index.indrelid, pg_index.indexrelid, tc.relname AS tablename,
                            regexp_split_to_table(pg_index.indkey::TEXT, ' ')::INTEGER AS attnum,
                            pg_index.indexrelid AS index_oid
                     FROM pg_index
                              JOIN pg_class ic ON pg_index.indexrelid = ic.oid
                              JOIN pg_class tc ON pg_index.indrelid = tc.oid
                              JOIN pg_namespace ON pg_namespace.oid = ic.relnamespace
                              JOIN pg_am ON ic.relam = pg_am.oid
                     WHERE pg_am.amname = 'btree' AND ic.relpages > 0 AND nspname NOT IN ('pg_catalog', 'information_schema')
               ) ind_atts ON pg_attribute.attrelid = ind_atts.indexrelid AND pg_attribute.attnum = ind_atts.attnum
               JOIN pg_stats ON pg_stats.schemaname = ind_atts.nspname
          AND ((pg_stats.tablename = ind_atts.tablename AND pg_stats.attname = pg_get_indexdef(pg_attribute.attrelid, pg_attribute.attnum, TRUE))
              OR (pg_stats.tablename = ind_atts.idxname AND pg_stats.attname = pg_attribute.attname))
      WHERE pg_attribute.attnum > 0
      GROUP BY 1, 2, 3, 4, 5, 6
) est;
COMMENT ON VIEW monitor.pg_index_bloat IS 'postgres index bloat estimate (btree-only)';
GRANT SELECT ON monitor.pg_index_bloat TO pg_monitor;

----------------------------------------------------------------------
-- Relation Bloat : monitor.pg_bloat
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_bloat CASCADE;
CREATE OR REPLACE VIEW monitor.pg_bloat AS
SELECT coalesce(ib.datname, tb.datname)                                                   AS datname,
       coalesce(ib.nspname, tb.nspname)                                                   AS nspname,
       coalesce(ib.tblid, tb.tblid)                                                       AS tblid,
       coalesce(tb.nspname || '.' || tb.relname, ib.nspname || '.' || ib.tblid::RegClass) AS tblname,
       tb.size                                                                            AS tbl_size,
       CASE WHEN tb.ratio < 0 THEN 0 ELSE round(tb.ratio::NUMERIC, 6) END                 AS tbl_ratio,
       (tb.size * (CASE WHEN tb.ratio < 0 THEN 0 ELSE tb.ratio::NUMERIC END))::BIGINT     AS tbl_wasted,
       ib.idxid,
       ib.nspname || '.' || ib.relname                                                    AS idxname,
       ib.size                                                                            AS idx_size,
       CASE WHEN ib.ratio < 0 THEN 0 ELSE round(ib.ratio::NUMERIC, 5) END                 AS idx_ratio,
       (ib.size * (CASE WHEN ib.ratio < 0 THEN 0 ELSE ib.ratio::NUMERIC END))::BIGINT     AS idx_wasted
FROM monitor.pg_index_bloat ib
         FULL OUTER JOIN monitor.pg_table_bloat tb ON ib.tblid = tb.tblid;
COMMENT ON VIEW monitor.pg_bloat IS 'postgres relation bloat detail';
GRANT SELECT ON monitor.pg_bloat TO pg_monitor;

----------------------------------------------------------------------
-- monitor.pg_index_bloat_human
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_index_bloat_human CASCADE;
CREATE OR REPLACE VIEW monitor.pg_index_bloat_human AS
SELECT idxname                            AS name,
       tblname,
       idx_wasted                         AS wasted,
       pg_size_pretty(idx_size)           AS idx_size,
       round(100 * idx_ratio::NUMERIC, 2) AS idx_ratio,
       pg_size_pretty(idx_wasted)         AS idx_wasted,
       pg_size_pretty(tbl_size)           AS tbl_size,
       round(100 * tbl_ratio::NUMERIC, 2) AS tbl_ratio,
       pg_size_pretty(tbl_wasted)         AS tbl_wasted
FROM monitor.pg_bloat
WHERE idxname IS NOT NULL;
COMMENT ON VIEW monitor.pg_index_bloat_human IS 'postgres index bloat info in human-readable format';
GRANT SELECT ON monitor.pg_index_bloat_human TO pg_monitor;

----------------------------------------------------------------------
-- monitor.pg_table_bloat_human
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_table_bloat_human CASCADE;
CREATE OR REPLACE VIEW monitor.pg_table_bloat_human AS
SELECT tblname                                          AS name,
       idx_wasted + tbl_wasted                          AS wasted,
       pg_size_pretty(idx_wasted + tbl_wasted)          AS all_wasted,
       pg_size_pretty(tbl_wasted)                       AS tbl_wasted,
       pg_size_pretty(tbl_size)                         AS tbl_size,
       tbl_ratio,
       pg_size_pretty(idx_wasted)                       AS idx_wasted,
       pg_size_pretty(idx_size)                         AS idx_size,
       round(idx_wasted::NUMERIC * 100.0 / idx_size, 2) AS idx_ratio
FROM (SELECT datname, nspname, tblname,
             coalesce(max(tbl_wasted), 0)                          AS tbl_wasted,
             coalesce(max(tbl_size), 1)                            AS tbl_size,
             round(100 * coalesce(max(tbl_ratio), 0)::NUMERIC, 2)  AS tbl_ratio,
             coalesce(sum(idx_wasted), 0)                          AS idx_wasted,
             coalesce(sum(idx_size), 1)                            AS idx_size
      FROM monitor.pg_bloat
      WHERE tblname IS NOT NULL
      GROUP BY 1, 2, 3
) d;
COMMENT ON VIEW monitor.pg_table_bloat_human IS 'postgres table bloat info in human-readable format';
GRANT SELECT ON monitor.pg_table_bloat_human TO pg_monitor;

----------------------------------------------------------------------
-- Activity Overview: monitor.pg_session
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_session CASCADE;
CREATE OR REPLACE VIEW monitor.pg_session AS
SELECT coalesce(datname, 'all') AS datname, numbackends, active, idle, ixact, max_duration, max_tx_duration, max_conn_duration
FROM (SELECT datname,
             count(*)                                                                        AS numbackends,
             count(*) FILTER ( WHERE state = 'active' )                                      AS active,
             count(*) FILTER ( WHERE state = 'idle' )                                        AS idle,
             count(*) FILTER ( WHERE state = 'idle in transaction'
                 OR state = 'idle in transaction (aborted)' )                                AS ixact,
             max(extract(epoch from now() - state_change)) FILTER ( WHERE state = 'active' ) AS max_duration,
             max(extract(epoch from now() - xact_start))                                     AS max_tx_duration,
             max(extract(epoch from now() - backend_start))                                  AS max_conn_duration
      FROM pg_stat_activity
      WHERE backend_type = 'client backend' AND pid <> pg_backend_pid()
      GROUP BY ROLLUP (1)
      ORDER BY 1 NULLS FIRST
) t;
COMMENT ON VIEW monitor.pg_session IS 'postgres activity group by session';
GRANT SELECT ON monitor.pg_session TO pg_monitor;

----------------------------------------------------------------------
-- Sequential Scan: monitor.pg_seq_scan
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_seq_scan CASCADE;
CREATE OR REPLACE VIEW monitor.pg_seq_scan AS
SELECT schemaname                                                        AS nspname,
       relname,
       seq_scan,
       seq_tup_read,
       seq_tup_read / seq_scan                                           AS seq_tup_avg,
       idx_scan,
       n_live_tup + n_dead_tup                                           AS tuples,
       round(n_live_tup * 100.0::NUMERIC / (n_live_tup + n_dead_tup), 2) AS live_ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0 and (n_live_tup + n_dead_tup) > 0
ORDER BY seq_scan DESC;
COMMENT ON VIEW monitor.pg_seq_scan IS 'table that have seq scan';
GRANT SELECT ON monitor.pg_seq_scan TO pg_monitor;
Shmem allocation for PostgreSQL 13+
DROP FUNCTION IF EXISTS monitor.pg_shmem() CASCADE;
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF pg_shmem_allocations AS
$$ SELECT * FROM pg_shmem_allocations; $$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for system view pg_shmem';
REVOKE ALL ON FUNCTION monitor.pg_shmem() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION monitor.pg_shmem() TO pg_monitor;
6.14 - Dashboards
Grafana dashboards provided by Pigsty
Grafana Dashboards for PostgreSQL clusters: Demo & Gallery.
There are 26 default grafana dashboards about PostgreSQL, categorized into 4 levels and grouped into PGSQL, PGCAT & PGLOG by datasource.
Metric | Type | Labels | Description
------ | ---- | ------ | -----------
… | … | … | Client connections that have sent queries but have not yet got a server connection
pgbouncer_stat_avg_query_count | gauge | datname, job, ins, ip, instance, cls | Average queries per second in last stat period
pgbouncer_stat_avg_query_time | gauge | datname, job, ins, ip, instance, cls | Average query duration, in seconds
pgbouncer_stat_avg_recv | gauge | datname, job, ins, ip, instance, cls | Average received (from clients) bytes per second
pgbouncer_stat_avg_sent | gauge | datname, job, ins, ip, instance, cls | Average sent (to clients) bytes per second
pgbouncer_stat_avg_wait_time | gauge | datname, job, ins, ip, instance, cls | Time spent by clients waiting for a server, in seconds (average per second).
pgbouncer_stat_avg_xact_count | gauge | datname, job, ins, ip, instance, cls | Average transactions per second in last stat period
pgbouncer_stat_avg_xact_time | gauge | datname, job, ins, ip, instance, cls | Average transaction duration, in seconds
pgbouncer_stat_total_query_count | gauge | datname, job, ins, ip, instance, cls | Total number of SQL queries pooled by pgbouncer
pgbouncer_stat_total_query_time | counter | datname, job, ins, ip, instance, cls | Total number of seconds spent when executing queries
pgbouncer_stat_total_received | counter | datname, job, ins, ip, instance, cls | Total volume in bytes of network traffic received by pgbouncer
pgbouncer_stat_total_sent | counter | datname, job, ins, ip, instance, cls | Total volume in bytes of network traffic sent by pgbouncer
pgbouncer_stat_total_wait_time | counter | datname, job, ins, ip, instance, cls | Time spent by clients waiting for a server, in seconds
pgbouncer_stat_total_xact_count | gauge | datname, job, ins, ip, instance, cls | Total number of SQL transactions pooled by pgbouncer
pgbouncer_stat_total_xact_time | counter | datname, job, ins, ip, instance, cls | Total number of seconds spent when in a transaction
pgbouncer_up | gauge | job, ins, ip, instance, cls | last scrape was able to connect to the server: 1 for yes, 0 for no
pgbouncer_version | gauge | job, ins, ip, instance, cls | server version number
process_cpu_seconds_total | counter | job, ins, ip, instance, cls | Total user and system CPU time spent in seconds.
process_max_fds | gauge | job, ins, ip, instance, cls | Maximum number of open file descriptors.
process_open_fds | gauge | job, ins, ip, instance, cls | Number of open file descriptors.
process_resident_memory_bytes | gauge | job, ins, ip, instance, cls | Resident memory size in bytes.
process_start_time_seconds | gauge | job, ins, ip, instance, cls | Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes | gauge | job, ins, ip, instance, cls | Virtual memory size in bytes.
process_virtual_memory_max_bytes | gauge | job, ins, ip, instance, cls | Maximum amount of virtual memory available in bytes.
promhttp_metric_handler_requests_in_flight | gauge | job, ins, ip, instance, cls | Current number of scrapes being served.
promhttp_metric_handler_requests_total | counter | code, job, ins, ip, instance, cls | Total number of scrapes by HTTP status code.
scrape_duration_seconds | Unknown | job, ins, ip, instance, cls | N/A
scrape_samples_post_metric_relabeling | Unknown | job, ins, ip, instance, cls | N/A
scrape_samples_scraped | Unknown | job, ins, ip, instance, cls | N/A
scrape_series_added | Unknown | job, ins, ip, instance, cls | N/A
up | Unknown | job, ins, ip, instance, cls | N/A
6.16 - FAQ
Pigsty PGSQL module frequently asked questions
ABORT due to postgres exists
Set pg_clean = true and pg_safeguard = false to force clean postgres data during pgsql.yml
This happens when you run pgsql.yml on a node with postgres running, and pg_clean is set to false.
If pg_clean is true (and the pg_safeguard is false, too), the pgsql.yml playbook will remove the existing pgsql data and re-init it as a new one, which makes this playbook fully idempotent.
You can still purge the existing PostgreSQL data by using a special task tag pg_purge
./pgsql.yml -t pg_clean      # honor pg_clean and pg_safeguard
./pgsql.yml -t pg_purge      # ignore pg_clean and pg_safeguard
ABORT due to pg_safeguard enabled
Disable pg_safeguard to remove the Postgres instance.
If pg_safeguard is enabled, you cannot remove the running pgsql instance with bin/pgsql-rm or the pgsql-rm.yml playbook.
To disable pg_safeguard, you can set pg_safeguard to false in the inventory or pass -e pg_safeguard=false as cli arg to the playbook:
./pgsql-rm.yml -e pg_safeguard=false -l <cls_to_remove> # force override pg_safeguard
Fail to wait for postgres/patroni primary
There are several possible reasons for this error, and you need to check the system logs to determine the actual cause.
This usually happens when the cluster is misconfigured, or the previous primary is improperly removed. (e.g., trash metadata in DCS with the same cluster name).
You must check /pg/log/* to find the reason.
To delete trash meta from etcd, you can use etcdctl del --prefix /pg/<cls>, do with caution!
1: Misconfiguration. Identify the incorrect parameters, modify them, and apply the changes.
2: Another cluster with the same cls name already exists in the deployment.
3: The previous cluster on the node, or a previous cluster with the same name, was not correctly removed.
To remove obsolete cluster metadata, you can use etcdctl del --prefix /pg/<cls> to manually delete the residual data (see the sketch below).
4: The RPM/DEB packages required by PostgreSQL or the node were not successfully installed.
5: The watchdog kernel module is required but was not correctly enabled or loaded.
6: The locale or ctype specified by pg_lc_collate and pg_lc_ctype does not exist in the OS.
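As a rough troubleshooting sketch (log locations follow the Pigsty defaults under /pg/log/*; the etcd prefix and cluster name are placeholders, so double-check before deleting anything):

```bash
# inspect patroni / postgres logs on the failing node to locate the actual error
less /pg/log/patroni/*.log
less /pg/log/postgres/*.csv

# if stale metadata with the same cluster name is left in etcd, review and remove it (dangerous!)
etcdctl get --prefix /pg/<cls> --keys-only    # review the residual keys first
etcdctl del --prefix /pg/<cls>                # then delete them, with caution
```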
Feel free to submit an issue or seek help from the community.
Fail to wait for postgres/patroni replica
Failed Immediately: Usually, this happens because of misconfiguration, network issues, broken DCS metadata, etc…, you have to inspect /pg/log to find out the actual reason.
Failed After a While: This may be due to source instance data corruption. Check PGSQL FAQ: How to create replicas when data is corrupted?
Timeout: If the wait for postgres replica task takes 30 minutes or more and fails due to timeout, this is common for a huge cluster (e.g., 1TB+, which may take hours to create a replica). In this case, the underlying replica creation is still in progress. You can check the cluster status with pg list <cls>, wait until the replica catches up with the primary, and then continue the remaining tasks.
To install PostgreSQL 12 - 15, you have to set pg_version to 12, 13, 14, or 15 in the inventory. (usually at cluster level)
pg_version: 16                               # install pg 16 in this template
pg_libs: 'pg_stat_statements, auto_explain'  # remove timescaledb from pg 16 beta
pg_extensions: []                            # missing pg16 extensions for now
How to enable hugepages for PostgreSQL?
use node_hugepage_count and node_hugepage_ratio or /pg/bin/pg-tune-hugepage
If you plan to enable hugepages, consider using node_hugepage_count and node_hugepage_ratio and apply them with ./node.yml -t node_tune.
It's better to allocate enough hugepages before postgres starts, and use pg-tune-hugepage to shrink them later.
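For example, a minimal sketch (the values below are illustrative; size the hugepage ratio/count for your memory and shared_buffers):

```yaml
# node / cluster level parameters (illustrative values)
node_hugepage_ratio: 0.30    # reserve ~30% of node memory as hugepages (roughly shared_buffers plus a margin)
#node_hugepage_count: 3000   # or specify an absolute number of hugepages instead of a ratio
```

```bash
./node.yml -l <cls> -t node_tune   # apply the kernel tuning before postgres starts
```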
If your postgres is already running, you can use /pg/bin/pg-tune-hugepage to enable hugepage on the fly. Note that this only works on PostgreSQL 15+
sync; echo 3 > /proc/sys/vm/drop_caches    # drop system cache (ready for performance impact)
sudo /pg/bin/pg-tune-hugepage              # write nr_hugepages to /etc/sysctl.d/hugepage.conf
pg restart <cls>                           # restart postgres to use hugepage
How to guarantee zero data loss during failover?
Use the crit.yml template, set pg_rpo to 0, or configure the cluster in synchronous mode.
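A minimal sketch of the cluster-level config (pg_conf: crit.yml assumes the bundled crit tuning template; weigh the latency cost of synchronous replication before enabling):

```yaml
pg_conf: crit.yml   # use the crit template, which trades some latency/availability for consistency
pg_rpo: 0           # recovery point objective: no data loss is tolerated on failover
```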
The pg_dummy_filesize is set to 64MB by default. Consider increasing it to 8GB or larger in the production environment.
It is placed at /pg/dummy, on the same disk as the PGSQL main data. You can remove that file to free some emergency space; at least you will still be able to run some shell commands on that node.
How to create replicas when data is corrupted?
Disable clonefrom on bad instances and reload patroni config.
Pigsty sets the clonefrom: true tag on all instances' patroni config, which marks the instance as available for cloning replicas.
If this instance has corrupt data files, you can set clonefrom: false to avoid pulling data from the evil instance. To do so:
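A rough sketch of the procedure (the patroni config path and reload command below are the usual Pigsty defaults and may differ in your environment):

```bash
# on the instance with corrupted data files:
vi /pg/bin/patroni.yml      # find the `tags:` section and set `clonefrom: false`
systemctl reload patroni    # reload patroni so the changed tag takes effect
```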
bin/pgmon-rm <ins> # shortcut for removing prometheus targets of pgsql instance 'ins'
7 - Kernel: PGSQL
How to use another PostgreSQL “kernel” in Pigsty, such as Citus, Babelfish, IvorySQL, PolarDB, Neon, and Greenplum
You can use different “flavors” of PostgreSQL branches, forks and derivatives to replace the “native PG kernel” in Pigsty.
7.1 - Citus (Distributive)
Deploy native HA citus cluster with Pigsty, horizontal scaling PostgreSQL with better throughput and performance.
Beware that Citus support for the latest PostgreSQL major version (17) is still WIP.
Pigsty has native citus support:
Install
Citus is a standard PostgreSQL extension, which can be installed and enabled on a native PostgreSQL cluster by following the standard plugin installation process.
To install it manually, you can run the following command:
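(The exact command is omitted in this excerpt; as a hedged sketch, installing the Citus package for PostgreSQL 16 on an EL system would look like the following, with the Debian/Ubuntu package named along the lines of postgresql-16-citus.)

```bash
sudo yum install -y citus_16*                   # EL: citus packages for PostgreSQL 16 (adjust to your pg major version)
# sudo apt install -y postgresql-16-citus-12.1  # Debian/Ubuntu flavor; exact version suffix may differ
```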
pg_primary_db has to be defined to specify the database to be managed
pg_dbsu_password has to be set to a non-empty plain-text password if you want to use the pg_dbsu postgres rather than the default pg_admin_username to perform admin commands.
Besides, extra HBA rules that allow SSL access from local & other data nodes are required, which may look like this:
You can define each citus cluster separately within a group, like conf/dbms/citus.yml :
```yaml
all:
  children:
    pg-citus0: # citus data node 0
      hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus0 , pg_group: 0 }
    pg-citus1: # citus data node 1
      hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2: # citus data node 2
      hosts: { 10.10.10.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3: # citus data node 3, with an extra replica
      hosts:
        10.10.10.13: { pg_seq: 1, pg_role: primary }
        10.10.10.14: { pg_seq: 2, pg_role: replica }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }
  vars:                                 # global parameters for all citus clusters
    pg_mode: citus                      # pgsql cluster mode: citus
    pg_shard: pg-citus                  # citus shard name: pg-citus
    patroni_citus_db: meta              # citus distributed database name
    pg_dbsu_password: DBUser.Postgres   # all dbsu password access for citus cluster
    pg_users: [ { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: meta ,extensions: [ { name: citus }, { name: postgis }, { name: timescaledb } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all ,addr: 127.0.0.1/32 ,auth: ssl ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet'  }
```
You can also specify all citus cluster members within a group, take prod.yml for example.
```yaml
#==========================================================#
# pg-citus: 10 node citus cluster (5 x primary-replica pair)
#==========================================================#
pg-citus: # citus group
  hosts:
    10.10.10.50: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.51: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.52: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.53: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.54: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.55: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.56: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.57: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.58: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.59: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 1, pg_role: replica }
  vars:
    pg_mode: citus                      # pgsql cluster mode: citus
    pg_shard: pg-citus                  # citus shard name: pg-citus
    pg_primary_db: test                 # primary database used by citus
    pg_dbsu_password: DBUser.Postgres   # all dbsu password access for citus cluster
    pg_vip_enabled: true
    pg_vip_interface: eth1
    pg_extensions: [ 'citus postgis timescaledb pgvector' ]
    pg_libs: 'citus, timescaledb, pg_stat_statements, auto_explain' # citus will be added by patroni automatically
    pg_users: [ { name: test ,password: test ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: test ,owner: test ,extensions: [ { name: citus }, { name: postgis } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all ,addr: 10.10.10.0/24 ,auth: trust ,title: 'trust citus cluster members'        }
      - { user: 'all' ,db: all ,addr: 127.0.0.1/32  ,auth: ssl   ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all ,addr: intra         ,auth: ssl   ,title: 'all user ssl access from intranet'  }
```
And you can create distributed table & reference table on the coordinator node. Any data node can be used as the coordinator node since citus 11.2.
Usage
You can access any (primary) node in the cluster as you would with a regular cluster:
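For example, a sketch using the test database and user from the sample config above (the table names are made up for illustration):

```sql
-- connect to any primary node, e.g.: psql postgres://test:test@10.10.10.50:5432/test
CREATE TABLE items (id bigint PRIMARY KEY, payload text);
SELECT create_distributed_table('items', 'id');    -- shard the table by id across data nodes

CREATE TABLE countries (code text PRIMARY KEY, name text);
SELECT create_reference_table('countries');        -- replicate the table to every node
```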
And in case of primary node failure, the replica will take over with native patroni support:
```
test=# select * from pg_dist_node;
 nodeid | groupid |  nodename   | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       0 | 10.10.10.51 |     5432 | default  | t           | t        | primary  | default     | t              | f
      2 |       2 | 10.10.10.54 |     5432 | default  | t           | t        | primary  | default     | t              | t
      5 |       1 | 10.10.10.52 |     5432 | default  | t           | t        | primary  | default     | t              | t
      3 |       4 | 10.10.10.58 |     5432 | default  | t           | t        | primary  | default     | t              | t
      4 |       3 | 10.10.10.56 |     5432 | default  | t           | t        | primary  | default     | t              | t
```
7.2 - WiltonDB (MSSQL)
Create SQL Server Compatible PostgreSQL cluster with WiltonDB and Babelfish (Wire Protocol Level)
Pigsty allows users to create a Microsoft SQL Server compatible PostgreSQL cluster using Babelfish and WiltonDB!
Babelfish: an MSSQL (Microsoft SQL Server) compatibility extension, open-sourced by AWS
WiltonDB: A PostgreSQL kernel distribution focusing on integrating Babelfish
Babelfish is a PostgreSQL extension, but it works on a slightly modified PostgreSQL kernel Fork, WiltonDB provides compiled kernel binaries and extension binary packages on EL/Ubuntu systems.
Pigsty can replace the native PostgreSQL kernel with WiltonDB, providing an out-of-the-box MSSQL-compatible cluster along with all the features supported by common PostgreSQL clusters, such as HA, PITR, IaC, monitoring, etc.
WiltonDB is very similar to PostgreSQL 15, but it cannot use vanilla PostgreSQL extensions directly. WiltonDB ships several re-compiled extensions such as system_stats, pg_hint_plan and tds_fdw.
The cluster will listen on the default PostgreSQL port and the default MSSQL 1433 port, providing MSSQL services via the TDS WireProtocol on this port.
You can connect to the MSSQL service provided by Pigsty using any MSSQL client, such as SQL Server Management Studio, or using the sqlcmd command-line tool.
Notes
When installing and deploying the MSSQL module, please pay special attention to the following points:
WiltonDB is available on EL (7/8/9) and Ubuntu (20.04/22.04) but not available on Debian systems.
WiltonDB is currently compiled based on PostgreSQL 15, so you need to specify pg_version: 15.
On EL systems, the wiltondb binary is installed by default in the /usr/bin/ directory, while on Ubuntu systems, it is installed in the /usr/lib/postgresql/15/bin/ directory, which is different from the official PostgreSQL binary location.
In WiltonDB compatibility mode, the HBA password authentication rule needs to use md5 instead of scram-sha-256. Therefore, you need to override Pigsty’s default HBA rule set and insert the md5 authentication rule required by SQL Server before the dbrole_readonly wildcard authentication rule.
WiltonDB can only be enabled for a primary database, and you should designate a user as the Babelfish superuser, allowing Babelfish to create databases and users. The defaults are mssql and dbuser_mssql. If you change them, you should also modify the user in files/mssql.sql.
The WiltonDB TDS wire protocol compatibility extension babelfishpg_tds needs to be enabled in shared_preload_libraries.
After enabling the WiltonDB extension, it listens on the default MSSQL port 1433. You can override Pigsty's default service definitions to redirect the primary and replica services to port 1433 instead of the 5432 / 6432 ports.
The following parameters need to be configured for the MSSQL database cluster:
```yaml
#----------------------------------#
# PGSQL & MSSQL (Babelfish & Wilton)
#----------------------------------#
# PG Installation
node_repo_modules: local,node,mssql   # add mssql and os upstream repos
pg_mode: mssql                        # Microsoft SQL Server Compatible Mode
pg_libs: 'babelfishpg_tds, pg_stat_statements, auto_explain'   # add babelfishpg_tds to shared_preload_libraries
pg_version: 15                        # The current WiltonDB major version is 15
pg_packages:
  - wiltondb                          # install forked version of postgresql with babelfishpg support
  - patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager
pg_extensions: []                     # do not install any vanilla postgresql extensions

# PG Provision
pg_default_hba_rules:                 # overwrite default HBA rules for babelfish cluster
  - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'   }
  - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident'  }
  - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost' }
  - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet'  }
  - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet'  }
  - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password'  }
  - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password' }
  - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'    }
  - { user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd'     }
  - { user: dbuser_mssql ,db: mssql       ,addr: intra     ,auth: md5   ,title: 'allow mssql dbsu intranet access'      } # <--- use md5 auth method for mssql user
  - { user: '+dbrole_readonly' ,db: all   ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket' }
  - { user: '+dbrole_readonly' ,db: all   ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password'      }
  - { user: '+dbrole_offline'  ,db: all   ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet' }
pg_default_services:                  # route primary & replica service to mssql port 1433
  - { name: primary ,port: 5433 ,dest: 1433     ,check: /primary   ,selector: "[]" }
  - { name: replica ,port: 5434 ,dest: 1433     ,check: /read-only ,selector: "[]" , backup: "[? pg_role == `primary` || pg_role == `offline` ]" }
  - { name: default ,port: 5436 ,dest: postgres ,check: /primary   ,selector: "[]" }
  - { name: offline ,port: 5438 ,dest: postgres ,check: /replica   ,selector: "[? pg_role == `offline` || pg_offline_query ]" , backup: "[? pg_role == `replica` && !pg_offline_query]" }
```
You can define business database & users in the pg_databases and pg_users section:
```yaml
#----------------------------------#
# pgsql (singleton on current node)
#----------------------------------#
# this is an example single-node postgres cluster, with one biz database & one biz user for babelfish
pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }   # <---- primary instance with read-write capability
  vars:
    pg_cluster: pg-test
    pg_users:                 # create MSSQL superuser
      - { name: dbuser_mssql ,password: DBUser.MSSQL ,superuser: true, pgbouncer: true ,roles: [ dbrole_admin ], comment: superuser & owner for babelfish }
    pg_primary_db: mssql      # use `mssql` as the primary sql server database
    pg_databases:
      - name: mssql
        baseline: mssql.sql   # init babelfish database & user
        extensions:
          - { name: uuid-ossp          }
          - { name: babelfishpg_common }
          - { name: babelfishpg_tsql   }
          - { name: babelfishpg_tds    }
          - { name: babelfishpg_money  }
          - { name: pg_hint_plan       }
          - { name: system_stats       }
          - { name: tds_fdw            }
        owner: dbuser_mssql
        parameters: { 'babelfishpg_tsql.migration_mode' : 'multi-db' }
        comment: babelfish cluster, a MSSQL compatible pg cluster
```
Client Access
You can use any SQL Server compatible client tool to access this database cluster.
Microsoft provides sqlcmd as the official command-line tool.
Besides, they provide a Go version of the CLI tool: go-sqlcmd.
Install go-sqlcmd:
curl -LO https://github.com/microsoft/go-sqlcmd/releases/download/v1.4.0/sqlcmd-v1.4.0-linux-amd64.tar.bz2
tar xjvf sqlcmd-v1.4.0-linux-amd64.tar.bz2
sudo mv sqlcmd* /usr/bin/
Get started with go-sqlcmd
$ sqlcmd -S 10.10.10.10,1433 -U dbuser_mssql -P DBUser.MSSQL
1> select @@version
2> go
version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Babelfish for PostgreSQL with SQL Server Compatibility - 12.0.2000.8
Oct 22 2023 17:48:32
Copyright (c) Amazon Web Services
PostgreSQL 15.4 (EL 1:15.4.wiltondb3.3_2-2.el8) on x86_64-redhat-linux-gnu (Babelfish 3.3.0)
(1 row affected)
You can route service traffic to MSSQL 1433 port instead of 5433/5434:
# route 5433 on all members to 1433 on primary
sqlcmd -S 10.10.10.11,5433 -U dbuser_mssql -P DBUser.MSSQL
# route 5434 on all members to 1433 on replicas
sqlcmd -S 10.10.10.11,5434 -U dbuser_mssql -P DBUser.MSSQL
Install
If you have Internet access, you can add the WiltonDB repository to the node and install it as a node package directly:
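A sketch following the same pattern used by the other kernels in this chapter (the node_packages value is an assumption; adjust to your inventory):

```yaml
node_repo_modules: local,node,mssql   # add the mssql (WiltonDB) upstream repo
node_packages: [ wiltondb ]           # install wiltondb as a node package
```

```bash
./node.yml -t node_repo,node_pkg      # add the repo & install the package on target nodes
```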
It’s OK to install vanilla PostgreSQL and WiltonDB on the same node, but you can only run one of them at a time, and this is not recommended for production environments.
Extensions
Most of the PGSQL module’s extensions (non-SQL class) cannot be used directly on the WiltonDB core of the MSSQL module and need to be recompiled.
WiltonDB currently comes with the following extension plugins:
| Name | Version | Comment |
|------|---------|---------|
| dblink | 1.2 | connect to other PostgreSQL databases from within a database |
| adminpack | 2.1 | administrative functions for PostgreSQL |
| dict_int | 1.0 | text search dictionary template for integers |
| intagg | 1.1 | integer aggregator and enumerator (obsolete) |
| dict_xsyn | 1.0 | text search dictionary template for extended synonym processing |
| amcheck | 1.3 | functions for verifying relation integrity |
| autoinc | 1.0 | functions for autoincrementing fields |
| bloom | 1.0 | bloom access method - signature file based index |
| fuzzystrmatch | 1.1 | determine similarities and distance between strings |
| intarray | 1.5 | functions, operators, and index support for 1-D arrays of integers |
| btree_gin | 1.3 | support for indexing common datatypes in GIN |
| btree_gist | 1.7 | support for indexing common datatypes in GiST |
| hstore | 1.8 | data type for storing sets of (key, value) pairs |
| hstore_plperl | 1.0 | transform between hstore and plperl |
| isn | 1.2 | data types for international product numbering standards |
| hstore_plperlu | 1.0 | transform between hstore and plperlu |
| jsonb_plperl | 1.0 | transform between jsonb and plperl |
| citext | 1.6 | data type for case-insensitive character strings |
| jsonb_plperlu | 1.0 | transform between jsonb and plperlu |
| jsonb_plpython3u | 1.0 | transform between jsonb and plpython3u |
| cube | 1.5 | data type for multidimensional cubes |
| hstore_plpython3u | 1.0 | transform between hstore and plpython3u |
| earthdistance | 1.1 | calculate great-circle distances on the surface of the Earth |
| lo | 1.1 | Large Object maintenance |
| file_fdw | 1.0 | foreign-data wrapper for flat file access |
| insert_username | 1.0 | functions for tracking who changed a table |
| ltree | 1.2 | data type for hierarchical tree-like structures |
| ltree_plpython3u | 1.0 | transform between ltree and plpython3u |
| pg_walinspect | 1.0 | functions to inspect contents of PostgreSQL Write-Ahead Log |
| moddatetime | 1.0 | functions for tracking last modification time |
| old_snapshot | 1.0 | utilities in support of old_snapshot_threshold |
| pgcrypto | 1.3 | cryptographic functions |
| pgrowlocks | 1.2 | show row-level locking information |
| pageinspect | 1.11 | inspect the contents of database pages at a low level |
| pg_surgery | 1.0 | extension to perform surgery on a damaged relation |
| seg | 1.4 | data type for representing line segments or floating-point intervals |
| pgstattuple | 1.5 | show tuple-level statistics |
| pg_buffercache | 1.3 | examine the shared buffer cache |
| pg_freespacemap | 1.2 | examine the free space map (FSM) |
| postgres_fdw | 1.1 | foreign-data wrapper for remote PostgreSQL servers |
| pg_prewarm | 1.2 | prewarm relation data |
| tcn | 1.0 | Triggered change notifications |
| pg_trgm | 1.6 | text similarity measurement and index searching based on trigrams |
| xml2 | 1.1 | XPath querying and XSLT |
| refint | 1.0 | functions for implementing referential integrity (obsolete) |
| pg_visibility | 1.2 | examine the visibility map (VM) and page-level visibility info |
| pg_stat_statements | 1.10 | track planning and execution statistics of all SQL statements executed |
| sslinfo | 1.2 | information about SSL certificates |
| tablefunc | 1.0 | functions that manipulate whole tables, including crosstab |
| tsm_system_rows | 1.0 | TABLESAMPLE method which accepts number of rows as a limit |
| tsm_system_time | 1.0 | TABLESAMPLE method which accepts time in milliseconds as a limit |
| unaccent | 1.1 | text search dictionary that removes accents |
| uuid-ossp | 1.1 | generate universally unique identifiers (UUIDs) |
| plpgsql | 1.0 | PL/pgSQL procedural language |
| babelfishpg_money | 1.1.0 | babelfishpg_money |
| system_stats | 2.0 | EnterpriseDB system statistics for PostgreSQL |
| tds_fdw | 2.0.3 | Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server) |
| babelfishpg_common | 3.3.3 | Transact SQL Datatype Support |
| babelfishpg_tds | 1.0.0 | TDS protocol extension |
| pg_hint_plan | 1.5.1 | |
| babelfishpg_tsql | 3.3.1 | Transact SQL compatibility |
Pigsty Pro offers the offline installation ability for MSSQL compatible extensions
Pigsty Pro offers MSSQL compatible extension porting services, which can port available extensions in the PGSQL module to the MSSQL cluster.
7.3 - IvorySQL (Oracle)
Run “Oracle-Compatible” PostgreSQL cluster with the IvorySQL Kernel open sourced by HighGo
Pigsty allows you to create PostgreSQL clusters with the IvorySQL kernel, which is a PostgreSQL fork that is compatible with Oracle SQL dialects.
Beware that IvorySQL packages conflict with the vanilla PostgreSQL packages; they are mutually exclusive. Pigsty Professional Edition provides an offline installation solution that includes the IvorySQL kernel in a separate local repo.
The latest version of IvorySQL is 3.4, which is compatible with PostgreSQL 16.4. IvorySQL currently supports EL8/EL9 only.
The last version of IvorySQL that supports EL7 is 3.3, which corresponds to PostgreSQL 16.3.
Get Started
The following parameters need to be configured for the IvorySQL database cluster:
```yaml
#----------------------------------#
# Ivory SQL Configuration
#----------------------------------#
node_repo_modules: local,node,pgsql,ivory   # add ivorysql upstream repo
pg_mode: ivory                              # IvorySQL Oracle Compatible Mode
pg_packages: [ 'ivorysql patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager' ]
pg_libs: 'liboracle_parser, pg_stat_statements, auto_explain'
pg_extensions: []                           # do not install any vanilla postgresql extensions
```
You have to dynamically load the liboracle_parser library to enable Oracle SQL compatibility.
Client Access
IvorySQL 3 is equivalent to PostgreSQL 16; you can connect to the IvorySQL cluster using any PostgreSQL-compatible client tools.
Installation
If you have the Internet access, you can install the IvorySQL software package online by setting the following parameters:
And install the IvorySQL kernel and related software packages by running the following command:
./node.yml -t node_repo,node_pkg
Extensions
Most of the PGSQL module's extensions (the non-SQL ones) cannot be used directly on the IvorySQL kernel.
If you need them, you have to recompile and install them from source for the new kernel.
Currently, the IvorySQL kernel comes with the following 101 extension plugins:
| name | version | comment |
|------|---------|---------|
| hstore_plperl | 1.0 | transform between hstore and plperl |
| plisql | 1.0 | PL/iSQL procedural language |
| hstore_plperlu | 1.0 | transform between hstore and plperlu |
| adminpack | 2.1 | administrative functions for PostgreSQL |
| insert_username | 1.0 | functions for tracking who changed a table |
| dblink | 1.2 | connect to other PostgreSQL databases from within a database |
| dict_int | 1.0 | text search dictionary template for integers |
| amcheck | 1.3 | functions for verifying relation integrity |
| intagg | 1.1 | integer aggregator and enumerator (obsolete) |
| autoinc | 1.0 | functions for autoincrementing fields |
| bloom | 1.0 | bloom access method - signature file based index |
| dict_xsyn | 1.0 | text search dictionary template for extended synonym processing |
| btree_gin | 1.3 | support for indexing common datatypes in GIN |
| earthdistance | 1.1 | calculate great-circle distances on the surface of the Earth |
| file_fdw | 1.0 | foreign-data wrapper for flat file access |
| fuzzystrmatch | 1.2 | determine similarities and distance between strings |
| btree_gist | 1.7 | support for indexing common datatypes in GiST |
| intarray | 1.5 | functions, operators, and index support for 1-D arrays of integers |
| citext | 1.6 | data type for case-insensitive character strings |
| isn | 1.2 | data types for international product numbering standards |
| ivorysql_ora | 1.0 | Oracle Compatible extenison on Postgres Database |
| jsonb_plperl | 1.0 | transform between jsonb and plperl |
| cube | 1.5 | data type for multidimensional cubes |
| dummy_index_am | 1.0 | dummy_index_am - index access method template |
| dummy_seclabel | 1.0 | Test code for SECURITY LABEL feature |
| hstore | 1.8 | data type for storing sets of (key, value) pairs |
| jsonb_plperlu | 1.0 | transform between jsonb and plperlu |
| lo | 1.1 | Large Object maintenance |
| ltree | 1.2 | data type for hierarchical tree-like structures |
| moddatetime | 1.0 | functions for tracking last modification time |
| old_snapshot | 1.0 | utilities in support of old_snapshot_threshold |
| ora_btree_gin | 1.0 | support for indexing oracle datatypes in GIN |
| pg_trgm | 1.6 | text similarity measurement and index searching based on trigrams |
| ora_btree_gist | 1.0 | support for oracle indexing common datatypes in GiST |
| pg_visibility | 1.2 | examine the visibility map (VM) and page-level visibility info |
| pg_walinspect | 1.1 | functions to inspect contents of PostgreSQL Write-Ahead Log |
| pgcrypto | 1.3 | cryptographic functions |
| pgstattuple | 1.5 | show tuple-level statistics |
| pageinspect | 1.12 | inspect the contents of database pages at a low level |
| pgrowlocks | 1.2 | show row-level locking information |
| pg_buffercache | 1.4 | examine the shared buffer cache |
| pg_stat_statements | 1.10 | track planning and execution statistics of all SQL statements executed |
| pg_freespacemap | 1.2 | examine the free space map (FSM) |
| plsample | 1.0 | PL/Sample |
| pg_prewarm | 1.2 | prewarm relation data |
| pg_surgery | 1.0 | extension to perform surgery on a damaged relation |
| seg | 1.4 | data type for representing line segments or floating-point intervals |
| postgres_fdw | 1.1 | foreign-data wrapper for remote PostgreSQL servers |
| refint | 1.0 | functions for implementing referential integrity (obsolete) |
| test_ext_req_schema1 | 1.0 | Required extension to be referenced |
| spgist_name_ops | 1.0 | Test opclass for SP-GiST |
| test_ext_req_schema2 | 1.0 | Test schema referencing of required extensions |
| test_shm_mq | 1.0 | Test code for shared memory message queues |
| sslinfo | 1.2 | information about SSL certificates |
| test_slru | 1.0 | Test code for SLRU |
| tablefunc | 1.0 | functions that manipulate whole tables, including crosstab |
| bool_plperl | 1.0 | transform between bool and plperl |
| tcn | 1.0 | Triggered change notifications |
| test_ext_req_schema3 | 1.0 | Test schema referencing of 2 required extensions |
| test_bloomfilter | 1.0 | Test code for Bloom filter library |
| test_copy_callbacks | 1.0 | Test code for COPY callbacks |
| test_ginpostinglist | 1.0 | Test code for ginpostinglist.c |
| test_custom_rmgrs | 1.0 | Test code for custom WAL resource managers |
| test_integerset | 1.0 | Test code for integerset |
| test_ddl_deparse | 1.0 | Test code for DDL deparse feature |
| tsm_system_rows | 1.0 | TABLESAMPLE method which accepts number of rows as a limit |
| test_ext1 | 1.0 | Test extension 1 |
| tsm_system_time | 1.0 | TABLESAMPLE method which accepts time in milliseconds as a limit |
| test_ext2 | 1.0 | Test extension 2 |
| unaccent | 1.1 | text search dictionary that removes accents |
| test_ext3 | 1.0 | Test extension 3 |
| test_ext4 | 1.0 | Test extension 4 |
| uuid-ossp | 1.1 | generate universally unique identifiers (UUIDs) |
| test_ext5 | 1.0 | Test extension 5 |
| worker_spi | 1.0 | Sample background worker |
| test_ext6 | 1.0 | test_ext6 |
| test_lfind | 1.0 | Test code for optimized linear search functions |
| xml2 | 1.1 | XPath querying and XSLT |
| test_ext7 | 1.0 | Test extension 7 |
| plpgsql | 1.0 | PL/pgSQL procedural language |
| test_ext8 | 1.0 | Test extension 8 |
| test_parser | 1.0 | example of a custom parser for full-text search |
| test_pg_dump | 1.0 | Test pg_dump with an extension |
| test_ext_cine | 1.0 | Test extension using CREATE IF NOT EXISTS |
| test_predtest | 1.0 | Test code for optimizer/util/predtest.c |
| test_ext_cor | 1.0 | Test extension using CREATE OR REPLACE |
| test_rbtree | 1.0 | Test code for red-black tree library |
| test_ext_cyclic1 | 1.0 | Test extension cyclic 1 |
| test_ext_cyclic2 | 1.0 | Test extension cyclic 2 |
| test_ext_extschema | 1.0 | test @extschema@ |
| test_regex | 1.0 | Test code for backend/regex/ |
| test_ext_evttrig | 1.0 | Test extension - event trigger |
| bool_plperlu | 1.0 | transform between bool and plperlu |
| plperl | 1.0 | PL/Perl procedural language |
| plperlu | 1.0 | PL/PerlU untrusted procedural language |
| hstore_plpython3u | 1.0 | transform between hstore and plpython3u |
| jsonb_plpython3u | 1.0 | transform between jsonb and plpython3u |
| ltree_plpython3u | 1.0 | transform between ltree and plpython3u |
| plpython3u | 1.0 | PL/Python3U untrusted procedural language |
| pltcl | 1.0 | PL/Tcl procedural language |
| pltclu | 1.0 | PL/TclU untrusted procedural language |
7.4 - PolarDB PG (RAC)
Replace vanilla PostgreSQL with PolarDB PG, which is an OSS Aurora similar to Oracle RAC
You can deploy an Aurora flavor of PostgreSQL, PolarDB, in Pigsty.
PolarDB is a distributed, shared-nothing, and high-availability database system that is compatible with PostgreSQL 15, open sourced by Aliyun.
Notes
The following parameters need to be tuned to deploy a PolarDB cluster:
```yaml
#----------------------------------#
# PGSQL & PolarDB
#----------------------------------#
pg_version: 15
pg_packages: [ 'polardb patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager' ]
pg_extensions: []   # do not install any vanilla postgresql extensions
pg_mode: polar      # PolarDB Compatible Mode
pg_default_roles:   # default roles and users in postgres cluster
  - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access     }
  - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
  - { name: dbrole_readwrite ,login: false ,roles: [ dbrole_readonly ] ,comment: role for global read-write access }
  - { name: dbrole_admin     ,login: false ,roles: [ pg_monitor, dbrole_readwrite ] ,comment: role for object creation }
  - { name: postgres   ,superuser: true ,comment: system superuser }
  - { name: replicator ,superuser: true ,replication: true ,roles: [ pg_monitor, dbrole_readonly ] ,comment: system replicator } # <- superuser is required for replication
  - { name: dbuser_dba ,superuser: true ,roles: [ dbrole_admin ] ,pgbouncer: true ,pool_mode: session, pool_connlimit: 16 ,comment: pgsql admin user }
  - { name: dbuser_monitor ,roles: [ pg_monitor ] ,pgbouncer: true ,parameters: { log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }
```
Client Access
PolarDB for PostgreSQL is essentially equivalent to PostgreSQL 15, and any client tools compatible with the PostgreSQL wire protocol can access the PolarDB cluster.
Installation
If your environment has internet access, you can directly add the PolarDB repository to the node and install it as a node package:
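For example (the polar repo module name here is an assumption; check your repo configuration):

```yaml
node_repo_modules: local,node,pgsql,polar   # add the PolarDB upstream repo module
```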
And then install the PolarDB kernel package with the following command:
./node.yml -t node_repo,node_pkg
Extensions
Most of the PGSQL module's extensions (the non-pure-SQL ones) cannot be used directly on the PolarDB kernel.
If you need them, you have to recompile and install them from source for the new kernel.
Currently, the PolarDB kernel comes with the following 61 extension plugins. In addition to Contrib extensions, the additional extensions provided include:
polar_csn 1.0 : polar_csn
polar_monitor 1.2 : examine the polardb information
polar_monitor_preload 1.1 : examine the polardb information
polar_parameter_check 1.0 : kernel extension for parameter validation
polar_px 1.0 : Parallel Execution extension
polar_stat_env 1.0 : env stat functions for PolarDB
polar_stat_sql 1.3 : Kernel statistics gathering, and sql plan nodes information gathering
polar_tde_utils 1.0 : Internal extension for TDE
polar_vfs 1.0 : polar_vfs
polar_worker 1.0 : polar_worker
timetravel 1.0 : functions for implementing time travel
vector 0.5.1 : vector data type and ivfflat and hnsw access methods
smlar 1.0 : compute similary of any one-dimensional arrays
Here is the list of extensions provided by the PolarDB kernel:
| name | version | comment |
|------|---------|---------|
| hstore_plpython2u | 1.0 | transform between hstore and plpython2u |
| dict_int | 1.0 | text search dictionary template for integers |
| adminpack | 2.0 | administrative functions for PostgreSQL |
| hstore_plpython3u | 1.0 | transform between hstore and plpython3u |
| amcheck | 1.1 | functions for verifying relation integrity |
| hstore_plpythonu | 1.0 | transform between hstore and plpythonu |
| autoinc | 1.0 | functions for autoincrementing fields |
| insert_username | 1.0 | functions for tracking who changed a table |
| bloom | 1.0 | bloom access method - signature file based index |
| file_fdw | 1.0 | foreign-data wrapper for flat file access |
| dblink | 1.2 | connect to other PostgreSQL databases from within a database |
| btree_gin | 1.3 | support for indexing common datatypes in GIN |
| fuzzystrmatch | 1.1 | determine similarities and distance between strings |
| lo | 1.1 | Large Object maintenance |
| intagg | 1.1 | integer aggregator and enumerator (obsolete) |
| btree_gist | 1.5 | support for indexing common datatypes in GiST |
| hstore | 1.5 | data type for storing sets of (key, value) pairs |
| intarray | 1.2 | functions, operators, and index support for 1-D arrays of integers |
| citext | 1.5 | data type for case-insensitive character strings |
| cube | 1.4 | data type for multidimensional cubes |
| hstore_plperl | 1.0 | transform between hstore and plperl |
| isn | 1.2 | data types for international product numbering standards |
| jsonb_plperl | 1.0 | transform between jsonb and plperl |
| dict_xsyn | 1.0 | text search dictionary template for extended synonym processing |
| hstore_plperlu | 1.0 | transform between hstore and plperlu |
| earthdistance | 1.1 | calculate great-circle distances on the surface of the Earth |
| pg_prewarm | 1.2 | prewarm relation data |
| jsonb_plperlu | 1.0 | transform between jsonb and plperlu |
| pg_stat_statements | 1.6 | track execution statistics of all SQL statements executed |
| jsonb_plpython2u | 1.0 | transform between jsonb and plpython2u |
| jsonb_plpython3u | 1.0 | transform between jsonb and plpython3u |
| jsonb_plpythonu | 1.0 | transform between jsonb and plpythonu |
| pg_trgm | 1.4 | text similarity measurement and index searching based on trigrams |
| pgstattuple | 1.5 | show tuple-level statistics |
| ltree | 1.1 | data type for hierarchical tree-like structures |
| ltree_plpython2u | 1.0 | transform between ltree and plpython2u |
| pg_visibility | 1.2 | examine the visibility map (VM) and page-level visibility info |
| ltree_plpython3u | 1.0 | transform between ltree and plpython3u |
| ltree_plpythonu | 1.0 | transform between ltree and plpythonu |
| seg | 1.3 | data type for representing line segments or floating-point intervals |
| moddatetime | 1.0 | functions for tracking last modification time |
| pgcrypto | 1.3 | cryptographic functions |
| pgrowlocks | 1.2 | show row-level locking information |
| pageinspect | 1.7 | inspect the contents of database pages at a low level |
| pg_buffercache | 1.3 | examine the shared buffer cache |
| pg_freespacemap | 1.2 | examine the free space map (FSM) |
| tcn | 1.0 | Triggered change notifications |
| plperl | 1.0 | PL/Perl procedural language |
| uuid-ossp | 1.1 | generate universally unique identifiers (UUIDs) |
| plperlu | 1.0 | PL/PerlU untrusted procedural language |
| refint | 1.0 | functions for implementing referential integrity (obsolete) |
| xml2 | 1.1 | XPath querying and XSLT |
| plpgsql | 1.0 | PL/pgSQL procedural language |
| plpython3u | 1.0 | PL/Python3U untrusted procedural language |
| pltcl | 1.0 | PL/Tcl procedural language |
| pltclu | 1.0 | PL/TclU untrusted procedural language |
| polar_csn | 1.0 | polar_csn |
| sslinfo | 1.2 | information about SSL certificates |
| polar_monitor | 1.2 | examine the polardb information |
| polar_monitor_preload | 1.1 | examine the polardb information |
| polar_parameter_check | 1.0 | kernel extension for parameter validation |
| polar_px | 1.0 | Parallel Execution extension |
| tablefunc | 1.0 | functions that manipulate whole tables, including crosstab |
| polar_stat_env | 1.0 | env stat functions for PolarDB |
| smlar | 1.0 | compute similary of any one-dimensional arrays |
| timetravel | 1.0 | functions for implementing time travel |
| tsm_system_rows | 1.0 | TABLESAMPLE method which accepts number of rows as a limit |
| polar_stat_sql | 1.3 | Kernel statistics gathering, and sql plan nodes information gathering |
| tsm_system_time | 1.0 | TABLESAMPLE method which accepts time in milliseconds as a limit |
| polar_tde_utils | 1.0 | Internal extension for TDE |
| polar_vfs | 1.0 | polar_vfs |
| polar_worker | 1.0 | polar_worker |
| unaccent | 1.1 | text search dictionary that removes accents |
| postgres_fdw | 1.0 | foreign-data wrapper for remote PostgreSQL servers |
Pigsty Pro has offline installation support for PolarDB and its extensions
Pigsty has partnership with Aliyun and can provide PolarDB kernel enterprise support for enterprise users
7.5 - PolarDB O(racle)
The commercial version of PolarDB for Oracle, only available in Pigsty Enterprise Edition.
Oracle Compatible version, Fork of PolarDB PG.
This is not available in OSS version.
7.6 - Supabase (Firebase)
How to self-host Supabase with existing managed HA PostgreSQL cluster, and launch the stateless part with docker-compose?
Supabase is the open-source Firebase alternative based on PostgreSQL.
Pigsty allows you to self-host Supabase with your own HA PostgreSQL clusters.
cd app/supabase; make up # https://supabase.com/docs/guides/self-hosting/docker
If your IP address is not placeholder 10.10.10.10, change the .env accordingly before launching
Then you can access the supabase studio dashboard via http://<admin_ip>:8000 by default, the default dashboard username is supabase and password is pigsty.
You can also configure the infra_portal to expose the WebUI to the public through Nginx and SSL.
Postgres
Supabase require certain PostgreSQL extensions, schemas, and roles to work, which can be pre-configured by Pigsty: supa.yml.
It will create a single-node postgres cluster named pg-meta, with the default postgres database properly configured for supabase, and install some popular & necessary extensions.
Everything you need to care about is in the .env file, which contains the important settings for supabase. It is already configured to use the pg-meta.supa database by default; change it according to your actual deployment.
```bash
############
# Secrets - YOU MUST CHANGE THESE BEFORE GOING INTO PRODUCTION
############
# you have to change the JWT_SECRET to a random string with at least 32 characters long
# and issue new ANON_KEY/SERVICE_ROLE_KEY JWT with that new secret, check the tutorial:
# https://supabase.com/docs/guides/self-hosting/docker#securing-your-services
JWT_SECRET=your-super-secret-jwt-token-with-at-least-32-characters-long
ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJhbm9uIiwKICAgICJpc3MiOiAic3VwYWJhc2UtZGVtbyIsCiAgICAiaWF0IjogMTY0MTc2OTIwMCwKICAgICJleHAiOiAxNzk5NTM1NjAwCn0.dc_X5iR_VP_qT0zsiyj_I_OZ2T9FtRU2BBNWN8Bu4GE
SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJzZXJ2aWNlX3JvbGUiLAogICAgImlzcyI6ICJzdXBhYmFzZS1kZW1vIiwKICAgICJpYXQiOiAxNjQxNzY5MjAwLAogICAgImV4cCI6IDE3OTk1MzU2MDAKfQ.DaYlNEoUrrEn2Ig7tqibS-PHK5vgusbcbo7X36XVt4Q

############
# Dashboard - Credentials for the Supabase Studio WebUI
############
DASHBOARD_USERNAME=supabase     # change to your own username
DASHBOARD_PASSWORD=pigsty       # change to your own password

############
# Database - You can change these to any PostgreSQL database that has logical replication enabled.
############
POSTGRES_HOST=10.10.10.10       # change to Pigsty managed PostgreSQL cluster/instance VIP/IP/Hostname
POSTGRES_PORT=5432              # you can use other service port such as 5433, 5436, 6432, etc...
POSTGRES_DB=supa                # change to supabase database name, `supa` by default in pigsty
POSTGRES_PASSWORD=DBUser.Supa   # supabase dbsu password (shared by multiple supabase biz users)
```
Usually you’ll have to change these parameters accordingly. Here we’ll use fixed username, password and IP:Port database connstr for simplicity.
The postgres username is fixed as supabase_admin and the password is DBUser.Supa, change that according to your supa.yml
And the supabase studio WebUI credential is managed by DASHBOARD_USERNAME and DASHBOARD_PASSWORD, which is supabase and pigsty by default.
Once configured, you can launch the stateless part with docker-compose or make up shortcut:
cd ~/pigsty/app/supabase; make up # = docker compose up
Expose Service
The supabase studio dashboard is exposed on port 8000 by default, you can add this service to the infra_portal to expose it to the public through Nginx and SSL.
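For example, a sketch of such an infra_portal entry (the domain and endpoint are placeholders; adjust to your deployment):

```yaml
infra_portal:
  home : { domain: h.pigsty }
  supa : { domain: supa.pigsty ,endpoint: "10.10.10.10:8000" ,websocket: true }
```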
To expose the service, you can run the infra.yml playbook with the nginx tag:
./infra.yml -t nginx
Make sure supa.pigsty or your own domain resolves to the infra_portal server, then you can access the supabase studio dashboard via https://supa.pigsty.
7.7 - Greenplum (MPP)
Deploy and monitor Greenplum/YMatrix MPP clusters with Pigsty
Pigsty has native support for Greenplum and its derivative distribution YMatrixDB.
It can deploy Greenplum clusters and monitor them with Pigsty.
To define a Greenplum cluster, you need to specify the following parameters:
Set pg_mode = gpsql and the extra identity parameters pg_shard and gp_role.
```yaml
#================================================================#
#                        GPSQL Clusters                          #
#================================================================#

#----------------------------------#
# cluster: mx-mdw (gp master)
#----------------------------------#
mx-mdw:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary , nodename: mx-mdw-1 }
  vars:
    gp_role: master        # this cluster is used as greenplum master
    pg_shard: mx           # pgsql sharding name & gpsql deployment name
    pg_cluster: mx-mdw     # this master cluster name is mx-mdw
    pg_databases:
      - { name: matrixmgr , extensions: [ { name: matrixdbts } ] }
      - { name: meta }
    pg_users:
      - { name: meta , password: DBUser.Meta , pgbouncer: true }
      - { name: dbuser_monitor , password: DBUser.Monitor , roles: [ dbrole_readonly ], superuser: true }
    pgbouncer_enabled: true                # enable pgbouncer for greenplum master
    pgbouncer_exporter_enabled: false      # enable pgbouncer_exporter for greenplum master
    pg_exporter_params: 'host=127.0.0.1&sslmode=disable'   # use 127.0.0.1 as local monitor host

#----------------------------------#
# cluster: mx-sdw (gp segments)
#----------------------------------#
mx-sdw:
  hosts:
    10.10.10.11:
      nodename: mx-sdw-1          # greenplum segment node
      pg_instances:               # greenplum segment instances
        6000: { pg_cluster: mx-seg1, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg2, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
    10.10.10.12:
      nodename: mx-sdw-2
      pg_instances:
        6000: { pg_cluster: mx-seg2, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg3, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
    10.10.10.13:
      nodename: mx-sdw-3
      pg_instances:
        6000: { pg_cluster: mx-seg3, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg1, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
  vars:
    gp_role: segment          # these are nodes for gp segments
    pg_shard: mx              # pgsql sharding name & gpsql deployment name
    pg_cluster: mx-sdw        # these segment clusters name is mx-sdw
    pg_preflight_skip: true   # skip preflight check (since pg_seq & pg_role & pg_cluster not exists)
    pg_exporter_config: pg_exporter_basic.yml   # use basic config to avoid segment server crash
    pg_exporter_params: 'options=-c%20gp_role%3Dutility&sslmode=disable'   # use gp_role = utility to connect to segments
```
Besides, you’ll need extra parameters to connect to Greenplum Segment instances for monitoring.
Since Greenplum is no longer Open-Sourced, this feature is only available in the Professional/Enterprise version and is not open-sourced at this time.
7.8 - Cloudberry (MPP)
Deploy Cloudberry MPP cluster, which is forked from Greenplum.
7.9 - Neon (Serverless)
Self-host the serverless version of PostgreSQL from Neon, which is powerful, truly scalable, and elastic.
Neon adopts a storage and compute separation architecture, offering seamless features such as auto-scaling, Scale to Zero, and unique capabilities like database version branching.
Due to the substantial size of Neon’s compiled binaries, they are not currently available to open-source users. If you need them, please contact Pigsty sales.
8 - Extension: PGSQL
There are 340 PostgreSQL extensions available in Pigsty, with out-of-box RPM/DEB packages.
There are 340 PostgreSQL extensions available in Pigsty, including 326 RPM extensions available in EL and 312 DEB extensions available in Debian/Ubuntu.
There are 70 PostgreSQL built-in Contrib extensions, along with (109+133) DEB and (135+121) RPM extensions provided by PGDG and Pigsty.
We have an extension catalog and usage instructions. Even if you are not using Pigsty, you can still use our extension repository.
8.1 - Extension List
The complete list of 346 PostgreSQL extensions available in Pigsty
There are 340 available extensions in Pigsty, including 338 RPM extensions available in EL and 326 DEB extensions available in Debian/Ubuntu.
There are 70 Contrib extensions provided by PostgreSQL and 275 additional third-party extensions provided by PGDG & Pigsty.
8.1.1 - Metadata Desc
Available Metadata for PostgreSQL Extensions, and explain each attribute.
Each extension comes with several metadata attributes. Below are the descriptions of these attributes:
id
Extension identifier, a unique integer assigned to each extension for internal sorting.
name
Extension name, the name of the extension in the PostgreSQL system catalog, used in CREATE EXTENSION.
Extensions typically come with files like <name>.control, <name>*.so, and <name>*.sql.
alias
Extension alias, a normalized name assigned by Pigsty to each extension, usually matching the extension name.
However, there are exceptions: for example, extensions introduced by the same RPM package share a common alias, such as postgis.
version
Default version of the extension, usually the latest version. In some special cases, the available versions in RPM and Debian may slightly differ.
category
Extension category, used to distinguish the type of functionality provided by the extension, such as:
gis, time, rag, fts, olap, feat, lang, type, func, admin, stat, sec, fdw, sim, etl
tags
Tags describing the features of the extension.
repo
The source repository of the extension, CONTRIB means it’s a PostgreSQL built-in extension, PGDG denotes a PGDG first-party extension, and PIGSTY indicates a Pigsty third-party extension.
lang
The programming language used by the extension, usually C, but there are some written in C++ or Rust. There are also extensions purely composed of SQL and data.
need_load
Marked with Load, meaning the extension uses PostgreSQL hooks, requiring dynamic loading and a PostgreSQL restart to take effect. Only a few extensions need dynamic loading, most are statically loaded.
need_ddl
Marked with DDL, meaning the extension requires executing DDL statements: CREATE EXTENSION.
Most extensions need the CREATE EXTENSION DDL statement for creation, but there are exceptions like pg_stat_statements and wal2json.
trusted
Does installing this extension require superuser privileges? Or is the extension “trusted” — only providing functions internally within the database.
A few extensions only provide functions internally within the database and thus do not require superuser privileges to install (trusted). Any user with CREATE privileges can install trusted extensions.
relocatable
Can the extension be relocated? That is, can it be installed into other schemas? Most extensions are relocatable, but there are exceptions where extensions specify their schema explicitly.
schemas
If the extension is relocatable, it can be installed into a specified schema. This attribute specifies the default schema for the extension.
PostgreSQL typically allows extensions to use only one schema, but some extensions do not follow this rule, such as citus and timescaledb.
pg_ver
The PostgreSQL versions supported by the extension, typically only considering versions within the support lifecycle, i.e., 12 - 16.
requires
Other extensions this extension depends on, if any. An extension may depend on multiple other extensions, and these dependencies are usually declared in the requires field of the extension’s control file.
When installing an extension, dependencies can be automatically installed with the CREATE EXTENSION xxx CASCADE statement.
pkg
Extension package (RPM/DEB) name, using $v to replace the specific major PostgreSQL version number.
pkg_ver
The version number of the extension package (RPM/DEB), usually consistent with the extension's version (the version obtained from system views). However, there are rare exceptions where the package version and the extension version are inconsistent or independently managed.
pkg_deps
The dependencies of the extension package (RPM/DEB), different from the extension’s dependencies (requires), here referring to the specific dependencies of the RPM/DEB package.
url
The official website or source code repository of the extension.
license
The open-source license used by the extension, typically PostgreSQL, MIT, Apache, GPL, etc.
en_desc
The English description of the extension, describing its functions and uses.
zh_desc
The Chinese description of the extension, describing its functions and uses.
comment
Additional comments describing the features or considerations of the extension.
8.1.2 - RPM List
338 Available PostgreSQL Extension RPM in RHEL & Compatible Distributions
There are 338 extensions available on EL compatible systems, 19 of them are RPM exclusive, missing 7 DEB exclusive extensions.
There are 70 built-in contrib extensions, in addition to 134 rpm extensions provided by PGDG YUM repository, and 130 extensions provided by Pigsty.
There are 333 extensions available in the current major version PostgreSQL 16, and 297 ready for PostgreSQL 17.
8.1.3 - DEB List
326 Available PostgreSQL Extensions Deb in Debian / Ubuntu Distributions
There are 326 extensions available on Debian compatible systems, 7 of them are DEB only, missing 23 RPM only extensions.
There are 70 built-in contrib extensions, in addition to 109 deb extensions provided by PGDG APT repository, and 133 deb extensions provided by Pigsty.
There are 318 extensions available in the current major version PostgreSQL 16, and 286 ready for PostgreSQL 17.
When performing the default online installation in Pigsty, all available extensions for the current primary PostgreSQL version (16) are automatically downloaded.
If you do not need additional or niche extensions, you don’t need to worry about repo_upstream, repo_packages, or any issues related to extension downloads.
In the config template, a complete list of available extension alias is already included.
To install additional extensions, simply add/uncomment them to pg_packages and pg_extensions.
A small number of extensions that utilize PostgreSQL HOOKs need to be dynamically loaded and will only take effect after restarting the database server.
You should add these extensions to pg_libs, or manually overwrite shared_preload_libraries in pg_parameters or DCS, and ensure they are loaded upon restart.
Most extensions require the execution of the CREATE EXTENSION DDL statement after installation to actually create and enable them in a specific database.
You can manually execute this DDL, or explicitly specify the extensions in pg_databases.extensions, and the database will automatically enable these extensions during initialization.
Out-Of-The-Box
Pigsty hides the complexity of extension management from users. You don't need to know the RPM/DEB package names of these extensions,
nor how to download, install, load, or enable them. You only need to declare the extensions you require in the configuration file.
For example, the following configuration snippet declares a PostgreSQL cluster that installs all available extension plugins,
dynamically loads three extensions, and enables these 3 extensions.
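The original snippet is not reproduced in this excerpt; as an illustrative sketch of the pattern (using a handful of extension aliases rather than the full list):

```yaml
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_extensions: [ postgis, timescaledb, pgvector ]          # install extensions by alias, not by package name
    pg_libs: 'timescaledb, pg_stat_statements, auto_explain'   # dynamically load these three extensions
    pg_databases:
      - name: meta
        extensions: [ { name: postgis }, { name: timescaledb }, { name: vector } ]  # enable them in this database
```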
You might have noticed that the extension names here are not the RPM/DEB package names but rather normalized extension aliases that have been simplified and encapsulated by Pigsty.
Pigsty translates these standardized aliases into the corresponding RPM/DEB package names for the specific PostgreSQL major version on different operating system distributions.
This way, you don’t have to worry about the differences in extension package names across various OS distributions.
During the Pigsty configure process, the default configuration generated for your specific OS distro will already include the above list.
To install these extensions, you only need to uncomment the ones you need in the configuration file.
Please note that you can still directly use OS-specific RPM/DEB package names here if you prefer.
Predefined Stacks
If you are not sure which extensions to install, Pigsty provides you with some predefined extension collections (Stacks).
You can choose one of them and any combination according to your needs, and add them to pg_extensions.
When you specify these extension stack names in pg_extensions or pg_packages, Pigsty will automatically translate, expand, and install all the extension plugins in them.
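For example (the stack names below are illustrative; check your config template for the exact aliases available in your Pigsty version):

```yaml
pg_extensions:
  - gis-stack       # PostGIS & friends
  - rag-stack       # vector / RAG related extensions
  - fts-stack       # full-text search extensions
```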
Install, Load, and Create
Pigsty not only allows you to declare the extensions you need to install in the configuration file, but it also lets you directly specify the extensions that need to be loaded and enabled.
Here’s a concrete example: Supabase. Supabase is an “upstream abstract database” built on top of PostgreSQL, which heavily utilizes PostgreSQL’s extension mechanism.
Below is a sample configuration file for creating a PostgreSQL cluster required for Supabase using Pigsty:
# supabase example cluster: pg-meta
# this cluster needs to be migrated with app/supabase/migration.sql :
# psql postgres://supabase_admin:DBUser.Supa@10.10.10.10:5432/supa -v ON_ERROR_STOP=1 --no-psqlrc -f ~pigsty/app/supabase/migration.sql
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - { name: supabase_admin ,password: 'DBUser.Supa' ,pgbouncer: true ,inherit: true ,superuser: true ,replication: true ,createdb: true ,createrole: true ,bypassrls: true }
    pg_databases:
      - name: supa
        baseline: supa.sql          # the init-scripts: https://github.com/supabase/postgres/tree/develop/migrations/db/init-scripts
        owner: supabase_admin
        comment: supabase postgres database
        schemas: [ extensions ,auth ,realtime ,storage ,graphql_public ,supabase_functions ,_analytics ,_realtime ]
        extensions:
          - { name: pgcrypto  ,schema: extensions }   # 1.3   : cryptographic functions
          - { name: pg_net    ,schema: extensions }   # 0.9.2 : async HTTP
          - { name: pgjwt     ,schema: extensions }   # 0.2.0 : json web token API for postgres
          - { name: uuid-ossp ,schema: extensions }   # 1.1   : generate universally unique identifiers (UUIDs)
          - { name: pgsodium        }                 # 3.1.9 : pgsodium is a modern cryptography library for Postgres.
          - { name: supabase_vault  }                 # 0.2.8 : Supabase Vault Extension
          - { name: pg_graphql      }                 # 1.5.7 : pg_graphql: GraphQL support
          - { name: pg_jsonschema   }                 # 0.3.1 : pg_jsonschema: Validate json schema
          - { name: wrappers        }                 # 0.4.1 : wrappers: FDW collections
          - { name: http            }                 # 1.6   : http: allows web page retrieval inside the database.
          - { name: pg_cron         }
    # supabase required extensions
    pg_libs: 'pg_net, pg_cron, pg_stat_statements, auto_explain'   # add pg_net to shared_preload_libraries
    pg_extensions:
      - wal2json pg_repack
      - supa-stack    # pgvector pg_cron pgsodium pg_graphql pg_jsonschema wrappers pgjwt pgsql_http pg_net supautils
    pg_parameters:
      cron.database_name: supa
      pgsodium.enable_event_trigger: off
    pg_hba_rules:   # supabase hba rules, require access from docker network
      - { user: all ,db: supa ,addr: intra       ,auth: pwd ,title: 'allow supa database access from intranet' }
      - { user: all ,db: supa ,addr: 172.0.0.0/8 ,auth: pwd ,title: 'allow supa database access from docker network' }
      - { user: all ,db: supa ,addr: all         ,auth: pwd ,title: 'allow supa database access from entire world' }   # not safe!
In this example, we declare a PostgreSQL cluster named pg-meta, which contains a database called supa along with a set of extension plugins.
The supa-stack defined in pg_extensions translates to pgvector pg_cron pgsodium pg_graphql pg_jsonschema wrappers pgjwt pgsql_http pg_net supautils, which are automatically installed. Meanwhile, pg_libs specifies two extensions that need to be dynamically loaded: pg_net and pg_cron. Additionally, the necessary configuration parameters for the pgsodium and pg_cron extensions are pre-configured via pg_parameters.
Following that, these extensions are sequentially created and enabled in the specified or default schemas within pg_databases.extensions.
Finally, this out-of-the-box, highly available PostgreSQL cluster, ready to be used by stateless Supabase containers, can be fully launched with a single ./pgsql.yml command, providing a seamless experience.
8.2.2 - Download Extension
How to download new extension packages to Pigsty’s local software repository?
In Pigsty’s default installation mode, the downloading and installation of extension plugins are handled separately. Before installing extensions, you must ensure that the appropriate software repositories are added to the target node. Otherwise, the installation may fail due to missing packages.
During the installation process, Pigsty downloads all available extensions for the current major PG version (16) to the INFRA node and sets up a local software repository. This repository is used by all nodes, including the local machine. This approach accelerates installation, avoids redundant downloads, reduces network traffic, improves delivery reliability, and mitigates the risk of inconsistent version installations.
Alternatively, you can opt to add the upstream PostgreSQL software repository and its dependencies (OS software sources) directly to the target node managed by Pigsty. This method allows you to easily update plugins to the latest versions but requires internet access or an HTTP proxy. It may also be subject to network conditions and carries the potential risk of inconsistent installation versions.
Software Repo
During the initial installation, Pigsty downloads the packages specified by repo_upstream from the upstream software repository. The package names differ between EL systems and Debian/Ubuntu systems. The complete list can be found at the following links:
A few plugins are excluded by default due to various reasons. If you need these extensions, refer to the RPM/DEB package names in the extension list and add them to repo_upstream for download.
Heavy dependencies: pljava, plr
Niche overlap: repmgr, pgexporterext, pgpool
EL9 exclusives: pljava
Download Extension
To download new extension plugins, you can add them to repo_upstream and run the following tasks to update the local software repository and refresh the package cache on all nodes:
./infra.yml -t repo        # Re-download the specified packages to the local software repository
./node.yml  -t node_repo   # Refresh the metadata cache of the local software repository on all nodes
By default, Pigsty uses the local software source located on the INFRA node. If you prefer not to download these extensions to the local repository but instead use an online software repository for installation, you can directly add the upstream software source to the nodes:
./node.yml -t node_repo -e node_repo_modules=node,pgsql # Add the Postgres plugin repository and OS software sources (dependencies)
After completing these tasks, you can install PostgreSQL extension plugins using the standard OS package manager (yum/apt).
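For example (package names vary per distro and PG major version; these are illustrative):

sudo yum install pg_cron_16             # EL: install pg_cron for PostgreSQL 16
sudo apt install postgresql-16-cron     # Debian/Ubuntu: the same extension under its DEB name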
8.2.3 - Extension Repo
How to use the supplementary extension repositories (YUM/APT) provided by Pigsty?
YUM Repo
Pigsty currently offers a supplementary PG extension repository for EL systems, providing 121 additional RPM plugins in addition to the official PGDG YUM repository (135).
The Pigsty YUM repository only includes extensions not present in the PGDG YUM repository.
Once an extension is added to the PGDG YUM repository, Pigsty YUM repository will either remove it or align with the PGDG repository.
For EL 7/8/9 and compatible systems, use the following commands to add the GPG public key and the upstream repository file of the Pigsty repository:
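A sketch of the EL setup, assuming the repo.pigsty.io yum layout (verify the exact repo file contents against the Pigsty repo site before use):

# import the GPG public key
curl -fsSL https://repo.pigsty.io/key | sudo tee /etc/pki/rpm-gpg/RPM-GPG-KEY-pigsty >/dev/null

# add the pigsty repo file (illustrative layout)
sudo tee /etc/yum.repos.d/pigsty-io.repo > /dev/null <<-'EOF'
[pigsty-infra]
name=Pigsty Infra for $basearch
baseurl=https://repo.pigsty.io/yum/infra/$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-pigsty
module_hotfixes=1

[pigsty-pgsql]
name=Pigsty PGSQL for el$releasever.$basearch
baseurl=https://repo.pigsty.io/yum/pgsql/el$releasever.$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-pigsty
module_hotfixes=1
EOF

# refresh the yum cache
sudo yum makecache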
Pigsty currently offers a supplementary PG extension repository for Debian/Ubuntu systems, providing 133 additional DEB packages in addition to the official PGDG APT repository (109).
The Pigsty APT repository only includes extensions not present in the PGDG APT repository.
Once an extension is added to the PGDG APT repository, Pigsty APT repository will either remove it or align with the PGDG repository.
For Debian/Ubuntu and compatible systems, use the following commands to sequentially add the GPG public key and the upstream repository file of the Pigsty repository:
# add GPG key to keyring
curl -fsSL https://repo.pigsty.io/key | sudo gpg --dearmor -o /etc/apt/keyrings/pigsty.gpg

# get debian codename: distro_codename=jammy, focal, bullseye, bookworm
distro_codename=$(lsb_release -cs)
sudo tee /etc/apt/sources.list.d/pigsty-io.list > /dev/null <<EOF
deb [signed-by=/etc/apt/keyrings/pigsty.gpg] https://repo.pigsty.io/apt/infra generic main
deb [signed-by=/etc/apt/keyrings/pigsty.gpg] https://repo.pigsty.io/apt/pgsql/${distro_codename} ${distro_codename} main
EOF

# refresh APT repo cache
sudo apt update
All DEBs are signed with the GPG key fingerprint 9592A7BC7A682E7333376E09E7935D8DB9BD8B20 (B9BD8B20).
Repo of Repo
The build recipes, specs, and metadata are all open source; related GitHub repos:
pkg: The repository of RPM/DEB packages for PostgreSQL extensions
infra_pkg: Building observability stack & modules from tarball
pgsql-rpm: Building PostgreSQL RPM packages from source code
pgsql-deb: Building PostgreSQL DEB packages from source code
8.2.4 - Install Extension
How to install an extension from the local repo, or directly from the upstream Internet repo.
Pigsty uses the standard OS package managers (yum/apt) to install PostgreSQL extension plugins.
Parameters
You can specify which extensions to install using the following variables, both of which have similar effects:
Generally, pg_packages is used to globally specify the software packages that need to be installed across all PostgreSQL clusters in the environment. This includes essential components like the PostgreSQL core, high-availability setup with Patroni, connection pooling with pgBouncer, monitoring with pgExporter, etc. By default, Pigsty also includes two essential extensions here: pg_repack for bloat management and wal2json for CDC (Change Data Capture).
On the other hand, pg_extensions is typically used to specify extension plugins that need to be installed for a specific cluster. Pigsty defaults to installing three key PostgreSQL ecosystem extensions: postgis, timescaledb, and pgvector. You can also specify any additional extensions you need in this list.
pg_packages:     # pg packages to be installed, alias can be used
  - postgresql
  - patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager wal2json pg_repack
pg_extensions:   # pg extensions to be installed, alias can be used
  - postgis timescaledb pgvector
Another important distinction is that packages installed via pg_packages are merely ensured to be present, whereas those installed via pg_extensions are automatically upgraded to the latest available version. This is not an issue when using a local software repository, but when using upstream online repositories, consider this carefully and move extensions you do not want to be upgraded to pg_packages.
During PGSQL cluster initialization, Pigsty will automatically install the extensions specified in both pg_packages and pg_extensions.
Install on Existing Cluster
For a PostgreSQL cluster that has already been provisioned and initialized, you can first add the desired extensions to either pg_packages or pg_extensions, and then install the extensions using the following command:
./pgsql.yml -t pg_extension # install extensions specified in pg_extensions
Alias Translation
When specifying extensions in Pigsty, you can use the following formats in pg_packages and pg_extensions:
The original OS package name
A normalized extension name (alias)
postgis                    # Installs the PostGIS package for the current major PG version
postgis34_$v*              # Installs the PostGIS RPM package for the current major PG version
postgis34_15*              # Installs the PostGIS RPM package for PG 15
postgresql-$v-postgis-3*   # Installs the PostGIS DEB package for the current major PG version
postgresql-14-postgis-3*   # Installs the PostGIS DEB package for PG 14
We recommend using the standardized extension names (aliases) provided by Pigsty. Pigsty will translate these aliases into the appropriate RPM/DEB package names corresponding to the PG major version for different OS distributions. This way, users don’t need to worry about the differences in extension package names across various OS distributions:
Pigsty strives to align the PostgreSQL extensions available for EL and Debian-based systems.
However, a few extensions may be difficult to migrate or have not yet been ported due to various reasons.
For more information, please refer to the RPM Extension List and DEB Extension List.
8.2.5 - Load Extension
Some extensions that use hook mechanism must be preloaded and restarted to take effect.
After installing PostgreSQL extensions, you can view them in the pg_available_extensions view in PostgreSQL.
Aside from extensions written purely in SQL, most extensions provide a .so file, which is a dynamic shared library.
Most extensions do not require explicit loading and can be enabled simply with CREATE EXTENSION. However, a small subset of extensions use PostgreSQL’s hook mechanism. These extensions must be preloaded using the shared_preload_libraries parameter and require a PostgreSQL restart to take effect. Attempting to execute CREATE EXTENSION without preloading and restarting will result in an error.
In Pigsty, you can predefine the extensions that need to be loaded in the cluster by specifying them in the cluster’s pg_libs parameter, or modify the cluster configuration after initializing the cluster.
Extensions that Need Loading
In the Extension List, extensions marked with LOAD are the ones that need to be dynamically loaded and require a restart. These include:
In shared_preload_libraries, if multiple extensions need to be loaded, they can be separated by commas, for example:
'timescaledb, pg_stat_statements, auto_explain'
Note that both Citus and TimescaleDB explicitly require preloading in shared_preload_libraries, meaning they should be listed first.
While it is rare to use both Citus and TimescaleDB simultaneously, in such cases, it is recommended to list citus before timescaledb.
Pigsty, by default, will load two extensions: pg_stat_statements and auto_explain. These extensions are very useful for optimizing database performance and are strongly recommended.
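Putting the above together, a typical pg_libs value for a cluster that uses both Citus and TimescaleDB keeps them at the front, followed by the two default extensions:

pg_libs: 'citus, timescaledb, pg_stat_statements, auto_explain'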
8.2.6 - Create Extension
How to use CREATE EXTENSION to actually enable a PostgreSQL extension.
After installing PostgreSQL extensions, you can view them in the pg_available_extensions view. However, enabling these extensions typically requires additional steps:
Some extensions must be added to the shared_preload_libraries for dynamic loading, such as timescaledb and citus.
Most extensions need to be activated by running the SQL statement: CREATE EXTENSION <name>;. A few, like wal2json, do not require this step.
Modifying shared_preload_libraries:
Before initializing the database cluster: You can manually specify the required libraries using the pg_libs parameter.
After the database cluster has been initialized: You can modify the cluster configuration by directly editing the shared_preload_libraries parameter and applying the changes (a restart is required for them to take effect).
Typical extensions that require dynamic loading: citus, timescaledb, pg_cron, pg_net, pg_tle
Executing CREATE EXTENSION:
Before initializing the database cluster: You can specify the required extensions in the extensions list within pg_databases.
After the database cluster has been initialized: You can directly connect to the database and execute the SQL command, or manage extensions using other schema management tools.
Conceptually: PostgreSQL extensions usually consist of three parts: a control file (metadata, always present), an SQL file (optional SQL statements), and a .so file (optional binary shared library). Extensions that provide a .so file may need to be added to shared_preload_libraries to function properly, such as citus and timescaledb. However, many extensions do not require this, such as postgis and pgvector. Extensions that do not expose a SQL interface do not need a CREATE EXTENSION command to be executed, such as the wal2json extension, which provides CDC extraction capabilities.
To complete the extension creation, execute the CREATE EXTENSION SQL statement in the database where you wish to enable the extension.
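For example (database and extension names are illustrative):

-- run in the database where the extension should be enabled
CREATE EXTENSION IF NOT EXISTS vector;             -- enable pgvector
CREATE EXTENSION postgis WITH SCHEMA public;       -- optionally pin the target schema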
How to safely remove a PostgreSQL extension from a database cluster?
To uninstall an extension, you typically need to perform the following steps:
DROP EXTENSION "<extname>";
Note that if there are other extensions or database objects dependent on this extension, you will need to uninstall/remove those dependencies first before uninstalling the extension.
Alternatively, you can use the following statement to forcefully uninstall the extension and its dependencies in one go:
DROP EXTENSION "<extname>" CASCADE;
Note: The CASCADE option will delete all objects that depend on this extension, including database objects, functions, views, etc. Use with caution!
If you wish to remove the extension’s package, you can use your operating system’s package manager to uninstall it:
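For example (match the package name to what is actually installed; these are illustrative):

sudo yum remove pg_cron_16              # EL systems
sudo apt remove postgresql-16-cron      # Debian / Ubuntu systems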
8.2.9 - Pre-defined Stacks
How to use the predefined extension stacks in Pigsty?
8.3 - Build & Packaging
Prepare the PostgreSQL RPM/DEB package building environment, with some packaging hints.
8.3.1 - Building Environment
How to prepare a VM environment for building PostgreSQL RPM/DEB extensions on EL 8/9, Debian 12, and Ubuntu 22.
VM
To build PGML RPM packages in EL / Debian environments, you need to prepare a virtual machine environment. Pigsty provides an ext.yml template that can be used to prepare the VM environment required for building.
cd pigsty
make build
./node.yml -i files/pigsty/build-ext.yml -t node_repo,node_pkg
It will launch four virtual machines with EL8, EL9, Debian12, and Ubuntu22 respectively, and install the necessary dependencies for building.
Proxy
If you are in a network environment that requires a proxy, you need to configure the proxy environment variables.
Here we assume that you have a proxy server available in your local environment: http://192.168.0.106:8118 (replace with your OWN proxy server).
You’ll have to install the additional 'Development Tools' package group in EL 8 / EL 9 environments. On EL 8, you need to add the --nobest option to complete the installation due to dependency errors.
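For example, a sketch with dnf:

sudo dnf groupinstall -y 'Development Tools'              # EL 9
sudo dnf groupinstall -y 'Development Tools' --nobest     # EL 8: tolerate dependency errors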
fn main() {
    #[cfg(target_os = "macos")]
    {
        println!("cargo:rustc-link-search=/opt/homebrew/opt/openblas/lib");
        println!("cargo:rustc-link-search=/opt/homebrew/opt/libomp/lib");
    }

    // PostgreSQL is using dlopen(RTLD_GLOBAL). this will parse some
    // of symbols into the previous opened .so file, but the others will use a
    // relative offset in pgml.so, and will cause a null-pointer crash.
    //
    // hide all symbol to avoid symbol conflicts.
    //
    // append mode (link-args) only works with clang ld (lld)
    println!(
        "cargo:link-args=-Wl,--version-script={}/ld.map",
        std::env::current_dir().unwrap().to_string_lossy(),
    );
    println!("cargo:rustc-link-lib=static=stdc++fs");
    println!("cargo:rustc-link-search=native=/opt/rh/gcc-toolset-13/root/usr/lib/gcc/x86_64-redhat-linux/13");
    vergen::EmitBuilder::builder().all_git().emit().unwrap();
}
To define an infra cluster, use the hard-coded group name infra in your inventory file.
You can use multiple nodes to deploy the INFRA module, but at least one is required. You have to assign a unique infra_seq to each node.
# Single infra node
infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }

# Two INFRA nodes
infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }
Then you can init INFRA module with infra.yml playbook.
Administration
Here are some administration tasks related to INFRA module:
Install/Remove Infra Module
./infra.yml      # install infra/node module on `infra` group
./infra-rm.yml   # remove infra module from `infra` group
Manage Local Software Repo
./infra.yml -t repo            # setup local yum/apt repo
./infra.yml -t repo_dir        # create repo directory
./infra.yml -t repo_check      # check repo exists
./infra.yml -t repo_prepare    # use existing repo if exists
./infra.yml -t repo_build      # build repo from upstream if not exists
./infra.yml -t repo_upstream   # handle upstream repo files in /etc/yum.repos.d or /etc/apt/sources.list.d
./infra.yml -t repo_url_pkg    # download packages from internet defined by repo_url_packages
./infra.yml -t repo_cache      # make upstream yum/apt cache
./infra.yml -t repo_boot_pkg   # install bootstrap pkg such as createrepo_c, yum-utils, ... (or dpkg-dev in debian/ubuntu)
./infra.yml -t repo_pkg        # download packages & dependencies from upstream repo
./infra.yml -t repo_create     # create a local yum repo with createrepo_c & modifyrepo_c
./infra.yml -t repo_use        # add newly built repo
./infra.yml -t repo_nginx      # launch a nginx for repo if no nginx is serving
./infra.yml -t nginx_index                        # render Nginx homepage
./infra.yml -t nginx_config,nginx_reload          # render Nginx upstream server config
./infra.yml -t prometheus_conf,prometheus_reload  # render Prometheus main config and reload
./infra.yml -t prometheus_rule,prometheus_reload  # copy Prometheus rules & alert definition and reload
./infra.yml -t grafana_plugin                     # download Grafana plugins from the Internet
Playbook
install.yml : Install Pigsty on all nodes in one-pass
infra.yml : Init pigsty infrastructure on infra nodes
infra-rm.yml : Remove infrastructure components from infra nodes
infra.yml
The playbook infra.yml will init pigsty infrastructure on infra nodes.
It will also install the NODE module on infra nodes.
Here are available subtasks:
# ca : create self-signed CA on localhost files/pki
#   - ca_dir        : create CA directory
#   - ca_private    : generate ca private key: files/pki/ca/ca.key
#   - ca_cert       : signing ca cert: files/pki/ca/ca.crt
#
# id : generate node identity
#
# repo : bootstrap a local yum repo from internet or offline packages
#   - repo_dir      : create repo directory
#   - repo_check    : check repo exists
#   - repo_prepare  : use existing repo if exists
#   - repo_build    : build repo from upstream if not exists
#   - repo_upstream : handle upstream repo files in /etc/yum.repos.d
#     - repo_remove : remove existing repo file if repo_remove == true
#     - repo_add    : add upstream repo files to /etc/yum.repos.d
#   - repo_url_pkg  : download packages from internet defined by repo_url_packages
#   - repo_cache    : make upstream yum cache with yum makecache
#   - repo_boot_pkg : install bootstrap pkg such as createrepo_c, yum-utils, ...
#   - repo_pkg      : download packages & dependencies from upstream repo
#   - repo_create   : create a local yum repo with createrepo_c & modifyrepo_c
#   - repo_use      : add newly built repo into /etc/yum.repos.d
#   - repo_nginx    : launch a nginx for repo if no nginx is serving
#
# node/haproxy/docker/monitor : setup infra node as a common node (check node.yml)
#   - node_name, node_hosts, node_resolv, node_firewall, node_ca, node_repo, node_pkg
#   - node_feature, node_kernel, node_tune, node_sysctl, node_profile, node_ulimit
#   - node_data, node_admin, node_timezone, node_ntp, node_crontab, node_vip
#   - haproxy_install, haproxy_config, haproxy_launch, haproxy_reload
#   - docker_install, docker_admin, docker_config, docker_launch, docker_image
#   - haproxy_register, node_exporter, node_register, promtail
#
# infra : setup infra components
#   - infra_env      : env_dir, env_pg, env_var
#   - infra_pkg      : infra_pkg, infra_pkg_pip
#   - infra_user     : setup infra os user group
#   - infra_cert     : issue cert for infra components
#   - dns            : dns_config, dns_record, dns_launch
#   - nginx          : nginx_config, nginx_cert, nginx_static, nginx_launch, nginx_exporter
#   - prometheus     : prometheus_clean, prometheus_dir, prometheus_config, prometheus_launch, prometheus_reload
#   - alertmanager   : alertmanager_config, alertmanager_launch
#   - pushgateway    : pushgateway_config, pushgateway_launch
#   - blackbox       : blackbox_config, blackbox_launch
#   - grafana        : grafana_clean, grafana_config, grafana_plugin, grafana_launch, grafana_provision
#   - loki           : loki_clean, loki_dir, loki_config, loki_launch
#   - infra_register : register infra components to prometheus
infra-rm.yml
The playbook infra-rm.yml will remove infrastructure components from infra nodes
./infra-rm.yml              # remove INFRA module
./infra-rm.yml -t service   # stop INFRA services
./infra-rm.yml -t data      # remove INFRA data
./infra-rm.yml -t package   # uninstall INFRA packages
install.yml
The playbook install.yml will install Pigsty on all nodes in one pass.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which alertmanager was built, and the goos and goarch for the build.
alertmanager_cluster_alive_messages_total
counter
ins, instance, ip, peer, job, cls
Total number of received alive messages.
alertmanager_cluster_enabled
gauge
ins, instance, ip, job, cls
Indicates whether the clustering is enabled or not.
alertmanager_cluster_failed_peers
gauge
ins, instance, ip, job, cls
Number indicating the current number of failed peers in the cluster.
alertmanager_cluster_health_score
gauge
ins, instance, ip, job, cls
Health score of the cluster. Lower values are better and zero means ’totally healthy’.
alertmanager_cluster_members
gauge
ins, instance, ip, job, cls
Number indicating current number of members in cluster.
alertmanager_cluster_messages_pruned_total
counter
ins, instance, ip, job, cls
Total number of cluster messages pruned.
alertmanager_cluster_messages_queued
gauge
ins, instance, ip, job, cls
Number of cluster messages which are queued.
alertmanager_cluster_messages_received_size_total
counter
ins, instance, ip, msg_type, job, cls
Total size of cluster messages received.
alertmanager_cluster_messages_received_total
counter
ins, instance, ip, msg_type, job, cls
Total number of cluster messages received.
alertmanager_cluster_messages_sent_size_total
counter
ins, instance, ip, msg_type, job, cls
Total size of cluster messages sent.
alertmanager_cluster_messages_sent_total
counter
ins, instance, ip, msg_type, job, cls
Total number of cluster messages sent.
alertmanager_cluster_peer_info
gauge
ins, instance, ip, peer, job, cls
A metric with a constant ‘1’ value labeled by peer name.
alertmanager_cluster_peers_joined_total
counter
ins, instance, ip, job, cls
A counter of the number of peers that have joined.
alertmanager_cluster_peers_left_total
counter
ins, instance, ip, job, cls
A counter of the number of peers that have left.
alertmanager_cluster_peers_update_total
counter
ins, instance, ip, job, cls
A counter of the number of peers that have updated metadata.
alertmanager_cluster_reconnections_failed_total
counter
ins, instance, ip, job, cls
A counter of the number of failed cluster peer reconnection attempts.
alertmanager_cluster_reconnections_total
counter
ins, instance, ip, job, cls
A counter of the number of cluster peer reconnections.
alertmanager_cluster_refresh_join_failed_total
counter
ins, instance, ip, job, cls
A counter of the number of failed cluster peer joined attempts via refresh.
alertmanager_cluster_refresh_join_total
counter
ins, instance, ip, job, cls
A counter of the number of cluster peer joined via refresh.
alertmanager_config_hash
gauge
ins, instance, ip, job, cls
Hash of the currently loaded alertmanager configuration.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which blackbox_exporter was built, and the goos and goarch for the build.
Number of schedulers this frontend is connected to.
cortex_query_frontend_queries_in_progress
gauge
ins, instance, ip, job, cls
Number of queries in progress handled by this frontend.
cortex_query_frontend_retries_bucket
Unknown
ins, instance, ip, le, job, cls
N/A
cortex_query_frontend_retries_count
Unknown
ins, instance, ip, job, cls
N/A
cortex_query_frontend_retries_sum
Unknown
ins, instance, ip, job, cls
N/A
cortex_query_scheduler_connected_frontend_clients
gauge
ins, instance, ip, job, cls
Number of query-frontend worker clients currently connected to the query-scheduler.
cortex_query_scheduler_connected_querier_clients
gauge
ins, instance, ip, job, cls
Number of querier worker clients currently connected to the query-scheduler.
cortex_query_scheduler_inflight_requests
summary
ins, instance, ip, job, cls, quantile
Number of inflight requests (either queued or processing) sampled at a regular interval. Quantile buckets keep track of inflight requests over the last 60s.
A summary of the pause duration of garbage collection cycles.
go_gc_duration_seconds_count
Unknown
ins, instance, ip, job, cls
N/A
go_gc_duration_seconds_sum
Unknown
ins, instance, ip, job, cls
N/A
go_gc_gogc_percent
gauge
ins, instance, ip, job, cls
Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function.
go_gc_gomemlimit_bytes
gauge
ins, instance, ip, job, cls
Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function.
go_gc_heap_allocs_by_size_bytes_bucket
Unknown
ins, instance, ip, le, job, cls
N/A
go_gc_heap_allocs_by_size_bytes_count
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_allocs_by_size_bytes_sum
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_allocs_bytes_total
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_allocs_objects_total
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_frees_by_size_bytes_bucket
Unknown
ins, instance, ip, le, job, cls
N/A
go_gc_heap_frees_by_size_bytes_count
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_frees_by_size_bytes_sum
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_frees_bytes_total
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_frees_objects_total
Unknown
ins, instance, ip, job, cls
N/A
go_gc_heap_goal_bytes
gauge
ins, instance, ip, job, cls
Heap size target for the end of the GC cycle.
go_gc_heap_live_bytes
gauge
ins, instance, ip, job, cls
Heap memory occupied by live objects that were marked by the previous GC.
go_gc_heap_objects_objects
gauge
ins, instance, ip, job, cls
Number of objects, live or unswept, occupying heap memory.
go_gc_heap_tiny_allocs_objects_total
Unknown
ins, instance, ip, job, cls
N/A
go_gc_limiter_last_enabled_gc_cycle
gauge
ins, instance, ip, job, cls
GC cycle the last time the GC CPU limiter was enabled. This metric is useful for diagnosing the root cause of an out-of-memory error, because the limiter trades memory for CPU time when the GC’s CPU time gets too high. This is most likely to occur with use of SetMemoryLimit. The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled.
go_gc_pauses_seconds_bucket
Unknown
ins, instance, ip, le, job, cls
N/A
go_gc_pauses_seconds_count
Unknown
ins, instance, ip, job, cls
N/A
go_gc_pauses_seconds_sum
Unknown
ins, instance, ip, job, cls
N/A
go_gc_scan_globals_bytes
gauge
ins, instance, ip, job, cls
The total amount of global variable space that is scannable.
go_gc_scan_heap_bytes
gauge
ins, instance, ip, job, cls
The total amount of heap space that is scannable.
go_gc_scan_stack_bytes
gauge
ins, instance, ip, job, cls
The number of bytes of stack that were scanned last GC cycle.
go_gc_scan_total_bytes
gauge
ins, instance, ip, job, cls
The total amount space that is scannable. Sum of all metrics in /gc/scan.
Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime’s estimate of free address space that is backed by physical memory.
go_memory_classes_heap_objects_bytes
gauge
ins, instance, ip, job, cls
Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.
go_memory_classes_heap_released_bytes
gauge
ins, instance, ip, job, cls
Memory that is completely free and has been returned to the underlying system. This metric is the runtime’s estimate of free address space that is still mapped into the process, but is not backed by physical memory.
go_memory_classes_heap_stacks_bytes
gauge
ins, instance, ip, job, cls
Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. Currently, this represents all stack memory for goroutines. It also includes all OS thread stacks in non-cgo programs. Note that stacks may be allocated differently in the future, and this may change.
go_memory_classes_heap_unused_bytes
gauge
ins, instance, ip, job, cls
Memory that is reserved for heap objects but is not currently used to hold heap objects.
go_memory_classes_metadata_mcache_free_bytes
gauge
ins, instance, ip, job, cls
Memory that is reserved for runtime mcache structures, but not in-use.
go_memory_classes_metadata_mcache_inuse_bytes
gauge
ins, instance, ip, job, cls
Memory that is occupied by runtime mcache structures that are currently being used.
go_memory_classes_metadata_mspan_free_bytes
gauge
ins, instance, ip, job, cls
Memory that is reserved for runtime mspan structures, but not in-use.
go_memory_classes_metadata_mspan_inuse_bytes
gauge
ins, instance, ip, job, cls
Memory that is occupied by runtime mspan structures that are currently being used.
go_memory_classes_metadata_other_bytes
gauge
ins, instance, ip, job, cls
Memory that is reserved for or used to hold runtime metadata.
go_memory_classes_os_stacks_bytes
gauge
ins, instance, ip, job, cls
Stack memory allocated by the underlying operating system. In non-cgo programs this metric is currently zero. This may change in the future.In cgo programs this metric includes OS thread stacks allocated directly from the OS. Currently, this only accounts for one stack in c-shared and c-archive build modes, and other sources of stacks from the OS are not measured. This too may change in the future.
go_memory_classes_other_bytes
gauge
ins, instance, ip, job, cls
Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.
go_memory_classes_profiling_buckets_bytes
gauge
ins, instance, ip, job, cls
Memory that is used by the stack trace hash map used for profiling.
go_memory_classes_total_bytes
gauge
ins, instance, ip, job, cls
All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.
go_memstats_alloc_bytes
counter
ins, instance, ip, job, cls
Total number of bytes allocated, even if freed.
go_memstats_alloc_bytes_total
counter
ins, instance, ip, job, cls
Total number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total
counter
ins, instance, ip, job, cls
Total number of frees.
go_memstats_gc_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes
gauge
ins, instance, ip, job, cls
Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes
gauge
ins, instance, ip, job, cls
Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes
gauge
ins, instance, ip, job, cls
Number of heap bytes that are in use.
go_memstats_heap_objects
gauge
ins, instance, ip, job, cls
Number of allocated objects.
go_memstats_heap_released_bytes
gauge
ins, instance, ip, job, cls
Number of heap bytes released to OS.
go_memstats_heap_sys_bytes
gauge
ins, instance, ip, job, cls
Number of heap bytes obtained from system.
go_memstats_last_gc_time_seconds
gauge
ins, instance, ip, job, cls
Number of seconds since 1970 of last garbage collection.
go_memstats_lookups_total
counter
ins, instance, ip, job, cls
Total number of pointer lookups.
go_memstats_mallocs_total
counter
ins, instance, ip, job, cls
Total number of mallocs.
go_memstats_mcache_inuse_bytes
gauge
ins, instance, ip, job, cls
Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes
gauge
ins, instance, ip, job, cls
Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes
gauge
ins, instance, ip, job, cls
Number of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes
gauge
ins, instance, ip, job, cls
Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes obtained from system for stack allocator.
go_memstats_sys_bytes
gauge
ins, instance, ip, job, cls
Number of bytes obtained from system.
go_sched_gomaxprocs_threads
gauge
ins, instance, ip, job, cls
The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
go_sched_goroutines_goroutines
gauge
ins, instance, ip, job, cls
Count of live goroutines.
go_sched_latencies_seconds_bucket
Unknown
ins, instance, ip, le, job, cls
N/A
go_sched_latencies_seconds_count
Unknown
ins, instance, ip, job, cls
N/A
go_sched_latencies_seconds_sum
Unknown
ins, instance, ip, job, cls
N/A
go_sql_stats_connections_blocked_seconds
unknown
ins, instance, db_name, ip, job, cls
The total time blocked waiting for a new connection.
go_sql_stats_connections_closed_max_idle
unknown
ins, instance, db_name, ip, job, cls
The total number of connections closed due to SetMaxIdleConns.
go_sql_stats_connections_closed_max_idle_time
unknown
ins, instance, db_name, ip, job, cls
The total number of connections closed due to SetConnMaxIdleTime.
go_sql_stats_connections_closed_max_lifetime
unknown
ins, instance, db_name, ip, job, cls
The total number of connections closed due to SetConnMaxLifetime.
go_sql_stats_connections_idle
gauge
ins, instance, db_name, ip, job, cls
The number of idle connections.
go_sql_stats_connections_in_use
gauge
ins, instance, db_name, ip, job, cls
The number of connections currently in use.
go_sql_stats_connections_max_open
gauge
ins, instance, db_name, ip, job, cls
Maximum number of open connections to the database.
go_sql_stats_connections_open
gauge
ins, instance, db_name, ip, job, cls
The number of established connections both in use and idle.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which nginx_exporter was built, and the goos and goarch for the build.
nginx_http_requests_total
counter
ins, instance, ip, job, cls
Total http requests
nginx_up
gauge
ins, instance, ip, job, cls
Status of the last metric scrape
plugins_active_instances
gauge
ins, instance, ip, job, cls
The number of active plugin instances
plugins_datasource_instances_total
Unknown
ins, instance, ip, job, cls
N/A
process_cpu_seconds_total
counter
ins, instance, ip, job, cls
Total user and system CPU time spent in seconds.
process_max_fds
gauge
ins, instance, ip, job, cls
Maximum number of open file descriptors.
process_open_fds
gauge
ins, instance, ip, job, cls
Number of open file descriptors.
process_resident_memory_bytes
gauge
ins, instance, ip, job, cls
Resident memory size in bytes.
process_start_time_seconds
gauge
ins, instance, ip, job, cls
Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes
gauge
ins, instance, ip, job, cls
Virtual memory size in bytes.
process_virtual_memory_max_bytes
gauge
ins, instance, ip, job, cls
Maximum amount of virtual memory available in bytes.
prometheus_api_remote_read_queries
gauge
ins, instance, ip, job, cls
The current number of remote read queries being executed or waiting.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which prometheus was built, and the goos and goarch for the build.
The timestamp of the oldest exemplar stored in circular storage. Useful to check for what timerange the current exemplar buffer limit allows. This usually means the last timestamp for all exemplars for a typical setup. This is not true though if one of the series timestamp is in future compared to rest series.
prometheus_tsdb_exemplar_max_exemplars
gauge
ins, instance, ip, job, cls
Total number of exemplars the exemplar storage can store, resizeable.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which pushgateway was built, and the goos and goarch for the build.
pushgateway_http_requests_total
counter
job, cls, method, code, handler, instance, ins, ip
Total HTTP requests processed by the Pushgateway, excluding scrapes.
scrape_duration_seconds
Unknown
job, cls, instance, ins, ip
N/A
scrape_samples_post_metric_relabeling
Unknown
job, cls, instance, ins, ip
N/A
scrape_samples_scraped
Unknown
job, cls, instance, ins, ip
N/A
scrape_series_added
Unknown
job, cls, instance, ins, ip
N/A
up
Unknown
job, cls, instance, ins, ip
N/A
9.2 - FAQ
Pigsty INFRA module frequently asked questions
Which components are included in INFRA
Ansible for automation, deployment, and administration;
Nginx for exposing any WebUI service and serving the yum/apt repo;
Self-Signed CA for SSL/TLS certificates;
Prometheus for monitoring metrics
Grafana for monitoring/visualization
Loki for logging collection
AlertManager for alerts aggregation
Chronyd for NTP time sync on the admin node.
DNSMasq for DNS registration and resolution.
ETCD as DCS for PGSQL HA; (dedicated module)
PostgreSQL on meta nodes as CMDB; (optional)
Docker for stateless applications & tools (optional)
How to restore Prometheus targets
If you accidentally deleted the Prometheus targets dir, you can register monitoring targets to Prometheus again with the following commands:
./infra.yml -t register_prometheus   # register all infra targets to prometheus on infra nodes
./node.yml  -t register_prometheus   # register all node targets to prometheus on infra nodes
./etcd.yml  -t register_prometheus   # register all etcd targets to prometheus on infra nodes
./minio.yml -t register_prometheus   # register all minio targets to prometheus on infra nodes
./pgsql.yml -t register_prometheus   # register all pgsql targets to prometheus on infra nodes
How to restore Grafana datasource
PGSQL Databases in pg_databases are registered as Grafana datasource by default.
If you accidentally deleted the registered postgres datasource in Grafana, you can register them again with
./pgsql.yml -t register_grafana # register all pgsql database (in pg_databases) as grafana datasource
How to restore the HAProxy admin page proxy
The haproxy admin page is proxied by Nginx under the default server.
If you accidentally deleted the registered haproxy proxy settings in /etc/nginx/conf.d/haproxy, you can restore them again with
./node.yml -t register_nginx # register all haproxy admin page proxy settings to nginx on infra nodes
How to restore the DNS registration
PGSQL cluster/instance domain names are registered to /etc/hosts.d/<name> on infra nodes by default.
You can restore them again with the following:
./pgsql.yml -t pg_dns # register pg DNS names to dnsmasq on infra nodes
How to expose new Nginx upstream service
If you wish to expose a new WebUI service via the Nginx portal, you can add the service definition to the infra_portal parameter.
And re-run ./infra.yml -t nginx_config,nginx_launch to update & apply the Nginx configuration.
If you wish to access with HTTPS, you must remove files/pki/csr/pigsty.csr, files/pki/nginx/pigsty.{key,crt} to force re-generating the Nginx SSL/TLS certificate to include the new upstream’s domain name.
How to expose web service through Nginx?
While you can directly access services via IP:Port, we still recommend consolidating access points by using domain names and uniformly accessing various web-based services through the Nginx portal.
This approach helps centralize access, reduce the number of exposed ports, and facilitates access control and auditing.
If you wish to expose a new WebUI service through the Nginx portal, you can add the service definition to the infra_portal parameter.
For example, here is the config used by the public demo site, which exposes several additional web services:
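A sketch of what such an infra_portal definition looks like (the default entries plus one hypothetical extra upstream; domains and endpoints are illustrative, not the actual demo-site values):

infra_portal:                     # domain names and upstream servers served by the Nginx portal
  home         : { domain: h.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  loki         : { endpoint: "${admin_ip}:3100" }
  supa         : { domain: supa.pigsty ,endpoint: "10.10.10.10:8000" ,websocket: true }   # hypothetical extra web service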
If you wish to access via HTTPS, you must delete files/pki/csr/pigsty.csr and files/pki/nginx/pigsty.{key,crt} to force the regeneration of the Nginx SSL/TLS certificate to include the new upstream domain names. If you prefer to use an SSL certificate issued by an authoritative organization instead of a certificate issued by Pigsty’s self-signed CA, you can place it in the /etc/nginx/conf.d/cert/ directory and modify the corresponding configuration: /etc/nginx/conf.d/<name>.conf.
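For example, on the admin node (nginx tags taken from the infra.yml task list above):

rm -f files/pki/csr/pigsty.csr files/pki/nginx/pigsty.{key,crt}   # force regeneration of the Nginx SSL/TLS certificate
./infra.yml -t nginx_config,nginx_cert,nginx_launch               # re-render config, re-issue cert, relaunch Nginx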
How to manually add upstream repo files
Pigsty has a built-in wrapper script bin/repo-add, which invokes the ansible playbook node.yml to add repo files to the corresponding nodes.
bin/repo-add <selector> [modules]
bin/repo-add 10.10.10.10          # add node repos for node 10.10.10.10
bin/repo-add infra node,infra     # add node and infra repos for group infra
bin/repo-add infra node,local     # add node repos and local pigsty repo
bin/repo-add pg-test node,pgsql   # add node & pgsql repos for group pg-test
10 - Module: NODE
Tune nodes into the desired state and monitor them; manage nodes, VIP, haproxy, and exporters.
The admin node usually overlaps with the infra node. If there is more than one infra node,
the first one is often used as the default admin node, and the rest of the infra nodes can be used as backup admin nodes.
Common Node
You can manage nodes with Pigsty, and install modules on them. The node.yml playbook will adjust the node to desired state.
Some services will be added to all nodes by default:
Component           | Port | Description                      | Status
Node Exporter       | 9100 | Node Monitoring Metrics Exporter | Enabled
HAProxy Admin       | 9101 | HAProxy admin page               | Enabled
Promtail            | 9080 | Log collecting agent             | Enabled
Docker Daemon       | 9323 | Enable Container Service         | Disabled
Keepalived          | -    | Manage Node Cluster L2 VIP       | Disabled
Keepalived Exporter | 9650 | Monitoring Keepalived Status     | Disabled
Docker & Keepalived are optional components, enabled when required.
ADMIN Node
There is one and only one admin node in a pigsty deployment, which is specified by admin_ip. It is set to the local primary IP during configure.
The node will have ssh / sudo access to all other nodes, which is critical; ensure it’s fully secured.
INFRA Node
A pigsty deployment may have one or more infra nodes, usually 2 ~ 3, in a large production environment.
The infra group specifies infra nodes in the inventory. And infra nodes will have INFRA module installed (DNS, Nginx, Prometheus, Grafana, etc…),
The admin node is also the default and first infra node, and infra nodes can be used as ‘backup’ admin nodes.
Component        | Port | Domain   | Description
Nginx            | 80   | h.pigsty | Web Service Portal (YUM/APT Repo)
AlertManager     | 9093 | a.pigsty | Alert Aggregation and delivery
Prometheus       | 9090 | p.pigsty | Monitoring Time Series Database
Grafana          | 3000 | g.pigsty | Visualization Platform
Loki             | 3100 | -        | Logging Collection Server
PushGateway      | 9091 | -        | Collect One-Time Job Metrics
BlackboxExporter | 9115 | -        | Blackbox Probing
Dnsmasq          | 53   | -        | DNS Server
Chronyd          | 123  | -        | NTP Time Server
PostgreSQL       | 5432 | -        | Pigsty CMDB & default database
Ansible          | -    | -        | Run playbooks
PGSQL Node
The node with the PGSQL module installed is called a PGSQL node. Nodes and PG instances are deployed 1:1, and the node instance can borrow its identity from the corresponding PG instance via node_id_from_pg.
Component           | Port | Description                                      | Status
Postgres            | 5432 | Pigsty CMDB                                      | Enabled
Pgbouncer           | 6432 | Pgbouncer Connection Pooling Service             | Enabled
Patroni             | 8008 | Patroni HA Component                             | Enabled
Haproxy Primary     | 5433 | Primary connection pool: Read/Write Service      | Enabled
Haproxy Replica     | 5434 | Replica connection pool: Read-only Service       | Enabled
Haproxy Default     | 5436 | Primary Direct Connect Service                   | Enabled
Haproxy Offline     | 5438 | Offline Direct Connect: Offline Read Service     | Enabled
Haproxy service     | 543x | Customized PostgreSQL Services                   | On Demand
Haproxy Admin       | 9101 | Monitoring metrics and traffic management        | Enabled
PG Exporter         | 9630 | PG Monitoring Metrics Exporter                   | Enabled
PGBouncer Exporter  | 9631 | PGBouncer Monitoring Metrics Exporter            | Enabled
Node Exporter       | 9100 | Node Monitoring Metrics Exporter                 | Enabled
Promtail            | 9080 | Collect Postgres, Pgbouncer, Patroni logs        | Enabled
Docker Daemon       | 9323 | Docker Container Service (disable by default)    | Disabled
vip-manager         | -    | Bind VIP to the primary                          | Disabled
keepalived          | -    | Node Cluster L2 VIP manager (disable by default) | Disabled
Keepalived Exporter | 9650 | Keepalived Metrics Exporter (disable by default) | Disabled
Configuration
Each node has identity parameters that are configured through the parameters in <cluster>.hosts and <cluster>.vars.
Pigsty uses IP as the unique identifier for database nodes. This IP must be the IP that the database instance listens on and serves externally, but it would be inappropriate to use a public IP address!
This is very important. The IP is the inventory_hostname of the host in the inventory, which is reflected as the key in the <cluster>.hosts object.
You can use ansible_* parameters to overwrite ssh behavior, e.g. connect via domain name / alias, but the primary IPv4 is still the core identity of the node.
nodename and node_cluster are not mandatory; nodename will use the node’s current hostname by default, while node_cluster will use the fixed default value: nodes.
#nodename:                  # [INSTANCE] # node instance identity, use hostname if missing, optional
node_cluster: nodes         # [CLUSTER]  # node cluster identity, use 'nodes' if missing, optional
nodename_overwrite: true    # overwrite node's hostname with nodename?
nodename_exchange: false    # exchange nodename among play hosts?
node_id_from_pg: true       # use postgres identity as node identity if applicable?
Administration
Here are some common administration tasks for NODE module.
A metric with a constant ‘1’ value labeled by bios_date, bios_release, bios_vendor, bios_version, board_asset_tag, board_name, board_serial, board_vendor, board_version, chassis_asset_tag, chassis_serial, chassis_vendor, chassis_version, product_family, product_name, product_serial, product_sku, product_uuid, product_version, system_vendor if provided by DMI.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which node_exporter was built, and the goos and goarch for the build.
A metric with a constant ‘1’ value labeled by build_id, id, id_like, image_id, image_version, name, pretty_name, variant, variant_id, version, version_codename, version_id.
node_os_version
gauge
id, ip, ins, instance, job, id_like, cls
Metric containing the major.minor part of the OS version.
node_processes_max_processes
gauge
instance, ins, job, ip, cls
Number of max PIDs limit
node_processes_max_threads
gauge
instance, ins, job, ip, cls
Limit of threads in the system
node_processes_pids
gauge
instance, ins, job, ip, cls
Number of PIDs
node_processes_state
gauge
state, instance, ins, job, ip, cls
Number of processes in each state.
node_processes_threads
gauge
instance, ins, job, ip, cls
Allocated threads in system
node_processes_threads_state
gauge
instance, ins, job, thread_state, ip, cls
Number of threads in each state.
node_procs_blocked
gauge
instance, ins, job, ip, cls
Number of processes blocked waiting for I/O to complete.
node_procs_running
gauge
instance, ins, job, ip, cls
Number of processes in runnable state.
node_schedstat_running_seconds_total
counter
ip, ins, job, cpu, instance, cls
Number of seconds CPU spent running a process.
node_schedstat_timeslices_total
counter
ip, ins, job, cpu, instance, cls
Number of timeslices executed by CPU.
node_schedstat_waiting_seconds_total
counter
ip, ins, job, cpu, instance, cls
Number of seconds spent by processing waiting for this CPU.
node_scrape_collector_duration_seconds
gauge
ip, collector, ins, job, instance, cls
node_exporter: Duration of a collector scrape.
node_scrape_collector_success
gauge
ip, collector, ins, job, instance, cls
node_exporter: Whether a collector succeeded.
node_selinux_enabled
gauge
instance, ins, job, ip, cls
SELinux is enabled, 1 is true, 0 is false
node_sockstat_FRAG6_inuse
gauge
instance, ins, job, ip, cls
Number of FRAG6 sockets in state inuse.
node_sockstat_FRAG6_memory
gauge
instance, ins, job, ip, cls
Number of FRAG6 sockets in state memory.
node_sockstat_FRAG_inuse
gauge
instance, ins, job, ip, cls
Number of FRAG sockets in state inuse.
node_sockstat_FRAG_memory
gauge
instance, ins, job, ip, cls
Number of FRAG sockets in state memory.
node_sockstat_RAW6_inuse
gauge
instance, ins, job, ip, cls
Number of RAW6 sockets in state inuse.
node_sockstat_RAW_inuse
gauge
instance, ins, job, ip, cls
Number of RAW sockets in state inuse.
node_sockstat_TCP6_inuse
gauge
instance, ins, job, ip, cls
Number of TCP6 sockets in state inuse.
node_sockstat_TCP_alloc
gauge
instance, ins, job, ip, cls
Number of TCP sockets in state alloc.
node_sockstat_TCP_inuse
gauge
instance, ins, job, ip, cls
Number of TCP sockets in state inuse.
node_sockstat_TCP_mem
gauge
instance, ins, job, ip, cls
Number of TCP sockets in state mem.
node_sockstat_TCP_mem_bytes
gauge
instance, ins, job, ip, cls
Number of TCP sockets in state mem_bytes.
node_sockstat_TCP_orphan
gauge
instance, ins, job, ip, cls
Number of TCP sockets in state orphan.
node_sockstat_TCP_tw
gauge
instance, ins, job, ip, cls
Number of TCP sockets in state tw.
node_sockstat_UDP6_inuse
gauge
instance, ins, job, ip, cls
Number of UDP6 sockets in state inuse.
node_sockstat_UDPLITE6_inuse
gauge
instance, ins, job, ip, cls
Number of UDPLITE6 sockets in state inuse.
node_sockstat_UDPLITE_inuse
gauge
instance, ins, job, ip, cls
Number of UDPLITE sockets in state inuse.
node_sockstat_UDP_inuse
gauge
instance, ins, job, ip, cls
Number of UDP sockets in state inuse.
node_sockstat_UDP_mem
gauge
instance, ins, job, ip, cls
Number of UDP sockets in state mem.
node_sockstat_UDP_mem_bytes
gauge
instance, ins, job, ip, cls
Number of UDP sockets in state mem_bytes.
node_sockstat_sockets_used
gauge
instance, ins, job, ip, cls
Number of IPv4 sockets in use.
node_tcp_connection_states
gauge
state, instance, ins, job, ip, cls
Number of connection states.
node_textfile_scrape_error
gauge
instance, ins, job, ip, cls
1 if there was an error opening or reading a file, 0 otherwise
node_time_clocksource_available_info
gauge
ip, device, ins, clocksource, job, instance, cls
Available clocksources read from ‘/sys/devices/system/clocksource’.
node_time_clocksource_current_info
gauge
ip, device, ins, clocksource, job, instance, cls
Current clocksource read from ‘/sys/devices/system/clocksource’.
node_time_seconds
gauge
instance, ins, job, ip, cls
System time in seconds since epoch (1970).
node_time_zone_offset_seconds
gauge
instance, ins, job, time_zone, ip, cls
System time zone offset in seconds.
node_timex_estimated_error_seconds
gauge
instance, ins, job, ip, cls
Estimated error in seconds.
node_timex_frequency_adjustment_ratio
gauge
instance, ins, job, ip, cls
Local clock frequency adjustment.
node_timex_loop_time_constant
gauge
instance, ins, job, ip, cls
Phase-locked loop time constant.
node_timex_maxerror_seconds
gauge
instance, ins, job, ip, cls
Maximum error in seconds.
node_timex_offset_seconds
gauge
instance, ins, job, ip, cls
Time offset in between local system and reference clock.
node_timex_pps_calibration_total
counter
instance, ins, job, ip, cls
Pulse per second count of calibration intervals.
node_timex_pps_error_total
counter
instance, ins, job, ip, cls
Pulse per second count of calibration errors.
node_timex_pps_frequency_hertz
gauge
instance, ins, job, ip, cls
Pulse per second frequency.
node_timex_pps_jitter_seconds
gauge
instance, ins, job, ip, cls
Pulse per second jitter.
node_timex_pps_jitter_total
counter
instance, ins, job, ip, cls
Pulse per second count of jitter limit exceeded events.
node_timex_pps_shift_seconds
gauge
instance, ins, job, ip, cls
Pulse per second interval duration.
node_timex_pps_stability_exceeded_total
counter
instance, ins, job, ip, cls
Pulse per second count of stability limit exceeded events.
node_timex_pps_stability_hertz
gauge
instance, ins, job, ip, cls
Pulse per second stability, average of recent frequency changes.
node_timex_status
gauge
instance, ins, job, ip, cls
Value of the status array bits.
node_timex_sync_status
gauge
instance, ins, job, ip, cls
Is clock synchronized to a reliable server (1 = yes, 0 = no).
node_timex_tai_offset_seconds
gauge
instance, ins, job, ip, cls
International Atomic Time (TAI) offset.
node_timex_tick_seconds
gauge
instance, ins, job, ip, cls
Seconds between clock ticks.
node_udp_queues
gauge
ip, queue, ins, job, exported_ip, instance, cls
Number of allocated memory in the kernel for UDP datagrams in bytes.
A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which promtail was built, and the goos and goarch for the build.
promtail_config_reload_fail_total
Unknown
instance, ins, job, ip, cls
N/A
promtail_config_reload_success_total
Unknown
instance, ins, job, ip, cls
N/A
promtail_dropped_bytes_total
Unknown
host, ip, ins, job, reason, instance, cls
N/A
promtail_dropped_entries_total
Unknown
host, ip, ins, job, reason, instance, cls
N/A
promtail_encoded_bytes_total
Unknown
host, ip, ins, job, instance, cls
N/A
promtail_file_bytes_total
gauge
path, instance, ins, job, ip, cls
Number of bytes total.
promtail_files_active_total
gauge
instance, ins, job, ip, cls
Number of active files.
promtail_mutated_bytes_total
Unknown
host, ip, ins, job, reason, instance, cls
N/A
promtail_mutated_entries_total
Unknown
host, ip, ins, job, reason, instance, cls
N/A
promtail_read_bytes_total
gauge
path, instance, ins, job, ip, cls
Number of bytes read.
promtail_read_lines_total
Unknown
path, instance, ins, job, ip, cls
N/A
promtail_request_duration_seconds_bucket
Unknown
host, ip, ins, job, status_code, le, instance, cls
N/A
promtail_request_duration_seconds_count
Unknown
host, ip, ins, job, status_code, instance, cls
N/A
promtail_request_duration_seconds_sum
Unknown
host, ip, ins, job, status_code, instance, cls
N/A
promtail_sent_bytes_total
Unknown
host, ip, ins, job, instance, cls
N/A
promtail_sent_entries_total
Unknown
host, ip, ins, job, instance, cls
N/A
promtail_targets_active_total
gauge
instance, ins, job, ip, cls
Number of active total.
promtail_up
Unknown
instance, ins, job, ip, cls
N/A
request_duration_seconds_bucket
Unknown
instance, ins, job, status_code, route, ws, le, ip, cls, method
The max number of TCP connections that can be accepted (0 means no limit).
up
Unknown
instance, ins, job, ip, cls
N/A
10.2 - FAQ
Pigsty NODE module frequently asked questions
How to configure NTP service?
If NTP is not configured, use a public NTP service or sync time with the admin node.
If your nodes already have NTP configured, you can leave it there by setting node_ntp_enabled to false.
Otherwise, if you have Internet access, you can use public NTP services such as pool.ntp.org.
If you don’t have Internet access, at least you can sync time with the admin node with the following:
node_ntp_servers:                 # NTP servers in /etc/chrony.conf
  - pool cn.pool.ntp.org iburst
  - pool ${admin_ip} iburst       # assume non-admin nodes do not have internet access
How to force sync time on nodes?
Use chronyc to sync time. You have to configure the NTP service first.
ansible all -b -a 'chronyc -a makestep'    # sync time
You can replace all with any group or host IP address to limit execution scope.
Remote nodes are not accessible via SSH commands.
Consider using Ansible connection parameters if the target machine is hidden behind an SSH jump host,
or if some customizations have been made so that it cannot be accessed directly with ssh <ip>.
An alternative SSH port can be specified with ansible_port, and an SSH alias with ansible_host.
When performing deployments and changes, the admin user must have ssh and sudo privileges on all nodes. Passwordless access is not required.
You can pass in the ssh and sudo passwords via the -k|-K parameters when executing the playbook, or even run the playbook as another user via -e ansible_user=<another_user>.
However, Pigsty strongly recommends configuring SSH passwordless login with passwordless sudo for the admin user.
Create an admin user with the existing admin user.
This will create an admin user specified by node_admin_username with the existing one on that node.
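As a hedged sketch (assuming node.yml provides a node_admin subtask; verify the tag name and flags against your Pigsty version), the command looks roughly like:
./node.yml -k -K -e ansible_user=<another_admin> -t node_admin    # prompt for ssh/sudo password of the existing admin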
Pigsty will try to include all dependencies in the local yum repo on infra nodes. This repo file will be added according to node_repo_modules.
And existing repo files will be removed by default according to the default value of node_repo_remove. This prevents the node from using Internet repos or running into unexpected upstream repo issues.
If you want to keep existing repo files during node init, just set node_repo_remove to false.
If you want to keep existing repo files during infra node local repo bootstrap, just set repo_remove to false.
Why did my shell prompt change, and how do I restore it?
The pigsty prompt is defined with the environment variable PS1 in /etc/profile.d/node.sh.
To restore your existing prompt, just remove that file and login again.
Tencent OpenCloudOS Compatibility Issue
OpenCloudOS does not ship the softdog kernel module; override node_kernel_modules in the global vars:
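A hedged example of such an override, assuming the stock module list minus softdog (check the node_kernel_modules default shipped with your Pigsty version):
node_kernel_modules: [ nf_conntrack, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]   # softdog removed for OpenCloudOS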
Pigsty uses etcd as the DCS: Distributed Configuration Store (or distributed consensus service), which is critical to PostgreSQL high availability & auto-failover.
You have to install the ETCD module before any PGSQL modules, since patroni & vip-manager rely on etcd to work, unless you are using an external etcd cluster.
You don't need the NODE module to install ETCD, but it requires a valid CA in your local files/pki/ca. Check the ETCD Administration SOP for more details.
Configuration
You have to define an etcd cluster before deploying it. There are some parameters about etcd.
It is recommended to have at least 3 instances for a serious production environment.
Single Node
Define a group etcd in the inventory; it will create a singleton etcd instance.
# etcd cluster for ha postgres
etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }
This is good enough for development, testing & demonstration, but not recommended in a serious production environment.
Three Nodes
You can define etcd cluster with multiple nodes.
etcd:     # dcs service for postgres/patroni ha consensus
  hosts:  # 1 node for testing, 3 or 5 for production
    10.10.10.10: { etcd_seq: 1 }  # etcd_seq required
    10.10.10.11: { etcd_seq: 2 }  # assign from 1 ~ n
    10.10.10.12: { etcd_seq: 3 }  # odd number please
  vars:   # cluster level parameter override roles/etcd
    etcd_cluster: etcd    # mark etcd cluster name etcd
    etcd_safeguard: false # safeguard against purging
    etcd_clean: true      # purge etcd during init process
You can use more nodes for the production environment, but 3 or 5 nodes are recommended. Remember to use an odd number for the cluster size.
Administration
Here are some useful administration tasks for etcd:
To update etcd endpoints referenced by patroni:
./pgsql.yml -t pg_conf                             # regenerate patroni config
ansible all -f 1 -b -a 'systemctl reload patroni'  # reload patroni config
To update etcd endpoints reference on vip-manager, (optional, if you are using a L2 vip)
./pgsql.yml -t pg_vip_config                            # regenerate vip-manager config
ansible all -f 1 -b -a 'systemctl restart vip-manager'  # restart vip-manager to use new config
To add a new member to an existing etcd cluster, it usually takes 3 steps:
etcdctl member add <etcd-?> --learner=true --peer-urls=https://<new_ins_ip>:2380
./etcd.yml -l <new_ins_ip> -e etcd_init=existing
etcdctl member promote <new_ins_server_id>
Detail: Add member to etcd cluster
Here are the details; let's start from a single etcd instance.
etcd:
  hosts:
    10.10.10.10: { etcd_seq: 1 }  # <--- this is the existing instance
    10.10.10.11: { etcd_seq: 2 }  # <--- add this new member definition to inventory
  vars: { etcd_cluster: etcd }
Add a learner instance etcd-2 to cluster with etcd member add:
# tell the existing cluster that a new member etcd-2 is coming
$ etcdctl member add etcd-2 --learner=true --peer-urls=https://10.10.10.11:2380
Member 33631ba6ced84cf8 added to cluster 6646fbcf5debc68f

ETCD_NAME="etcd-2"
ETCD_INITIAL_CLUSTER="etcd-2=https://10.10.10.11:2380,etcd-1=https://10.10.10.10:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.10.10.11:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Check the member list with etcdctl member list (or em list); we can see an unstarted member. Initialize the new instance with the etcd.yml playbook (./etcd.yml -l 10.10.10.11 -e etcd_init=existing), then promote it once it has started:

$ etcdctl member promote 33631ba6ced84cf8   # promote the new learner
Member 33631ba6ced84cf8 promoted in cluster 6646fbcf5debc68f

$ em list                                   # check again, the new member is started
33631ba6ced84cf8, started, etcd-2, https://10.10.10.11:2380, https://10.10.10.11:2379, false
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false
The new member is added; don't forget to reload config.
Repeat the steps above to add more members. Remember to use at least 3 members for production.
Remove Member
To remove a member from existing etcd cluster, it usually takes 3 steps:
comment it out (or remove it) from the inventory and reload config
remove it with etcdctl member remove <server_id> command and kick it out of the cluster
temporarily add it back to inventory and purge that instance, then remove it from inventory permanently
Detail: Remove member from etcd cluster
Here are the details; let's start from a 3-instance etcd cluster:
etcd:
  hosts:
    10.10.10.10: { etcd_seq: 1 }
    10.10.10.11: { etcd_seq: 2 }
    10.10.10.12: { etcd_seq: 3 }  # <---- comment this line, then reload-config
  vars: { etcd_cluster: etcd }
Then, you’ll have to actually kick it from cluster with etcdctl member remove command:
$ etcdctl member list
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false
33631ba6ced84cf8, started, etcd-2, https://10.10.10.11:2380, https://10.10.10.11:2379, false
93fcf23b220473fb, started, etcd-3, https://10.10.10.12:2380, https://10.10.10.12:2379, false  # <--- remove this

$ etcdctl member remove 93fcf23b220473fb    # kick it from cluster
Member 93fcf23b220473fb removed from cluster 6646fbcf5debc68f
Finally, you have to shut down the instance and purge it from the node: temporarily uncomment the member in the inventory, then purge it with the etcd.yml playbook:
./etcd.yml -t etcd_purge -l 10.10.10.12 # purge it (the member is in inventory again)
After that, remove the member from inventory permanently, all clear!
Playbook
There’s a built-in playbook: etcd.yml for installing etcd cluster. But you have to define it first.
./etcd.yml # install etcd cluster on group 'etcd'
Here are available sub tasks:
etcd_assert : generate etcd identity
etcd_install : install etcd rpm packages
etcd_clean : cleanup existing etcd
etcd_check : check etcd instance is running
etcd_purge : remove running etcd instance & data
etcd_dir : create etcd data & conf dir
etcd_config : generate etcd config
etcd_conf : generate etcd main config
etcd_cert : generate etcd ssl cert
etcd_launch : launch etcd service
etcd_register : register etcd to prometheus
If etcd_safeguard is true, or etcd_clean is false,
the playbook will abort if any running etcd instance exists, to prevent purging etcd by accident.
Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes
gauge
cls, ins, instance, job, ip
Virtual memory size in bytes.
process_virtual_memory_max_bytes
gauge
cls, ins, instance, job, ip
Maximum amount of virtual memory available in bytes.
promhttp_metric_handler_requests_in_flight
gauge
cls, ins, instance, job, ip
Current number of scrapes being served.
promhttp_metric_handler_requests_total
counter
cls, ins, instance, job, ip, code
Total number of scrapes by HTTP status code.
scrape_duration_seconds
Unknown
cls, ins, instance, job, ip
N/A
scrape_samples_post_metric_relabeling
Unknown
cls, ins, instance, job, ip
N/A
scrape_samples_scraped
Unknown
cls, ins, instance, job, ip
N/A
scrape_series_added
Unknown
cls, ins, instance, job, ip
N/A
up
Unknown
cls, ins, instance, job, ip
N/A
11.2 - FAQ
Pigsty ETCD dcs module frequently asked questions
What is the role of the etcd in pigsty?
etcd is a distributed, reliable key-value store used to store the most critical config / consensus data in the deployment.
Pigsty uses etcd as the DCS (Distributed Configuration Store) service for Patroni, which will store the high availability status information of the PostgreSQL cluster.
How many etcd instances should I choose?
If half or more of the etcd instances are down, the etcd cluster and its service will be unavailable.
For example, a 3-node etcd cluster can tolerate at most one node failure, and the other two nodes can still work normally; while a 5-node etcd cluster can tolerate 2 node failures.
Beware that learner instances in the etcd cluster do not count as voting members; so in a 3-node etcd cluster, if one instance is a learner, the actual voting member count is 2, and no node failure can be tolerated.
It is advisable to choose an odd number of etcd instances to avoid split-brain scenarios. It is recommended to use 3 or 5 nodes for the production environment.
What is the impact of etcd failure?
If etcd cluster is unavailable, it will affect the control plane of Pigsty, but not the data plane - the existing PostgreSQL cluster will continue to serve, but admin operations through Patroni will not work.
During etcd failure, PostgreSQL HA is unable to perform automatic failover, and most of the Patroni operations will be blocked, such as edit-config, restart, switchover, etc…
Admin tasks through Ansible playbooks are usually not affected by etcd failure, such as create database, create user, reload HBA and Service, etc…, And you can always operate the PostgreSQL cluster directly to achieve most of the patroni functions.
Beware that the above description is only applicable to newer versions of Patroni (>=3.0, Pigsty >= 2.0). If you are using an older version of Patroni (<3.0, corresponding to Pigsty version 1.x), etcd / consul failure will cause a serious impact:
All PostgreSQL clusters will be demoted and reject write requests, and etcd failure will be amplified as a global PostgreSQL failure. After Patroni 3.0’s DCS Failsafe feature, this situation has been significantly improved.
What data is stored in the etcd cluster?
etcd is only used for PostgreSQL HA consensus in Pigsty, no other data is stored in etcd by default.
These consensus data are managed by Patroni, and when these data are lost in etcd, Patroni will automatically rebuild them.
Thus, by default, the etcd in Pigsty can be regarded as a “stateless service” that is disposable, which brings great convenience to maintenance work.
If you use etcd for other purposes, such as storing metadata for Kubernetes, or storing other data, you need to back up the etcd data yourself and restore the data after the etcd cluster is restored.
How to recover from etcd failure?
Since etcd is disposable in Pigsty, you can quickly stop the bleeding by “restarting” or “redeploying” etcd in case of failure.
To Restart the etcd cluster, you can use the following Ansible command (or systemctl restart etcd):
./etcd.yml -t etcd_launch
To Reset the etcd cluster, you can run this playbook, it will nuke the etcd cluster and redeploy it:
./etcd.yml
Beware that if you use etcd to store other data, don’t forget to backup etcd data before nuking the etcd cluster.
Is any maintenance work for etcd cluster?
In short: do not use all the quota of etcd.
etcd has a default quota for database size of 2GB, if your etcd database size exceeds this limit, etcd will reject write requests.
Meanwhile, as etcd’s data model illustrates, each write will generate a new version (a.k.a. revision),
so if your etcd cluster writes frequently, even with very few keys, the etcd database size may continue to grow, and may fail when it reaches the quota limit.
You can avoid this with auto compaction, manual compaction, defragmentation, quota increases, etc.; please refer to the etcd official maintenance guide.
Pigsty has auto compact enabled by default since v2.6, so you usually don’t have to worry about etcd full.
For versions before v2.6, we strongly recommend enabling etcd’s auto compact feature in the production environment.
Filling up etcd may lead to PostgreSQL failure!
For Pigsty v2.0 - v2.5 users, we strongly recommend upgrading to a newer version, or following the instructions below to enable etcd auto compaction!
How to enable etcd auto compaction?
If you are using an earlier version of Pigsty (v2.0 - v2.5), we strongly recommend that you enable etcd’s auto compaction feature in the production environment.
You can set all the PostgreSQL clusters to maintenance mode and then redeploy the etcd cluster with ./etcd.yml to apply these changes.
It will increase the etcd default quota from 2 GiB to 16 GiB, and ensure that only the most recent day’s write history is retained, avoiding the infinite growth of the etcd database size.
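For reference, these behaviors map to standard upstream etcd server options; a hedged sketch of the equivalent etcd configuration (these are etcd flag/config names, not Pigsty parameter names) would be:
# etcd config file entries (or the corresponding --flags), illustrative values only
auto-compaction-mode: periodic        # compact by time period
auto-compaction-retention: "24h"      # keep roughly one day of revision history
quota-backend-bytes: 17179869184      # raise the backend quota from 2 GiB to 16 GiB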
Where is the PostgreSQL HA data stored in etcd?
Patroni will use the pg_namespace (default is /pg) as the prefix for all metadata keys in etcd, followed by the PostgreSQL cluster name.
For example, a PG cluster named pg-meta, its metadata keys will be stored under /pg/pg-meta, which may look like this:
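A hedged illustration of what these keys typically look like (the exact key set depends on your Patroni version):
/pg/pg-meta/config
/pg/pg-meta/initialize
/pg/pg-meta/leader
/pg/pg-meta/members/pg-meta-1
/pg/pg-meta/status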
The hard-coded group, etcd, will be used as DCS servers for PGSQL. You can initialize them with etcd.yml or assume it is an existing external etcd cluster.
To use an existing external etcd cluster, define them as usual and make sure your current etcd cluster certificate is signed by the same CA as your self-signed CA for PGSQL.
How to add a new member to the existing etcd cluster?
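In short, follow the three-step procedure described in the ETCD administration section above: add the member as a learner, initialize it with the etcd.yml playbook, then promote it.
etcdctl member add <etcd-?> --learner=true --peer-urls=https://<new_ins_ip>:2380
./etcd.yml -l <new_ins_ip> -e etcd_init=existing
etcdctl member promote <new_ins_server_id>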
MinIO is an S3-compatible object storage server. It’s designed to be scalable, secure, and easy to use.
It has native multi-node multi-driver HA support and can store documents, pictures, videos, and backups.
Pigsty uses MinIO as an optional PostgreSQL backup storage repo, in addition to the default local posix FS repo.
If the MinIO repo is used, the MINIO module should be installed before any PGSQL modules.
MinIO requires a trusted CA to work, so you have to install it in addition to NODE module.
Beware that MinIO mandates HTTPS access, so please ensure that the MinIO service domain (default to sss.pigsty) correctly points to the MinIO server node:
You can ask your SA to add a DNS record in the intranet DNS server
If you enable the DNS server on the Infra node, you can add records in dns_records
You can also add static resolution records to all nodes in node_etc_hosts
The only required params are minio_seq and minio_cluster, which generate a unique identity for each MinIO instance.
Single-Node Single-Driver mode is for development purposes, so you can use a common dir as the data dir, which is /data/minio by default.
Beware that in multi-driver or multi-node mode, MinIO will refuse to start if using a common dir as the data dir rather than a mount point.
To use multiple disks on a single node, you have to specify the minio_data in the format of {{ prefix }}{x...y}, which defines a series of disk mount points.
minio:
  hosts: { 10.10.10.10: { minio_seq: 1 } }
  vars:
    minio_cluster: minio        # minio cluster name, minio by default
    minio_data: '/data{1...4}'  # minio data dir(s), use {x...y} to specify multi drivers
This example defines a single-node MinIO cluster with 4 drivers: /data1, /data2, /data3, /data4. You have to mount them properly before launching MinIO:
mkfs.xfs /dev/sdb; mkdir /data1; mount -t xfs /dev/sdb /data1;# mount 1st driver, ...
The extra minio_node param will be used for a multi-node deployment:
minio:
  hosts:
    10.10.10.10: { minio_seq: 1 }
    10.10.10.11: { minio_seq: 2 }
    10.10.10.12: { minio_seq: 3 }
  vars:
    minio_cluster: minio
    minio_data: '/data{1...2}'                           # use two disks per node
    minio_node: '${minio_cluster}-${minio_seq}.pigsty'   # minio node name pattern
The ${minio_cluster} and ${minio_seq} will be replaced with the value of minio_cluster and minio_seq respectively and used as MinIO nodename.
Expose Service
MinIO will serve on port 9000 by default. If a multi-node MinIO cluster is deployed, you can access its service via any node.
It would be better to expose MinIO service via a load balancer, such as the default haproxy on NODE, or use the L2 vip.
To expose MinIO service with haproxy, you have to define an extra service with haproxy_services:
minio:
  hosts:
    10.10.10.10: { minio_seq: 1 , nodename: minio-1 }
    10.10.10.11: { minio_seq: 2 , nodename: minio-2 }
    10.10.10.12: { minio_seq: 3 , nodename: minio-3 }
  vars:
    minio_cluster: minio
    node_cluster: minio
    minio_data: '/data{1...2}'                           # use two disks per node
    minio_node: '${minio_cluster}-${minio_seq}.pigsty'   # minio node name pattern
    haproxy_services:             # EXPOSING MINIO SERVICE WITH HAPROXY
      - name: minio               # [REQUIRED] service name, unique
        port: 9002                # [REQUIRED] service port, unique
        options:                  # [OPTIONAL] minio health check
          - option httpchk
          - option http-keep-alive
          - http-check send meth OPTIONS uri /minio/health/live
          - http-check expect status 200
        servers:
          - { name: minio-1 ,ip: 10.10.10.10 ,port: 9000 ,options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
          - { name: minio-2 ,ip: 10.10.10.11 ,port: 9000 ,options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
          - { name: minio-3 ,ip: 10.10.10.12 ,port: 9000 ,options: 'check-ssl ca-file /etc/pki/ca.crt check port 9000' }
# This is the newly added HA MinIO Repo definition, USE THIS INSTEAD!
minio_ha:
  type: s3
  s3_endpoint: minio-1.pigsty   # s3_endpoint could be any load balancer: 10.10.10.1{0,1,2}, or a domain name pointing to any of the 3 nodes
  s3_region: us-east-1          # you could use the external domain name: sss.pigsty, which resolves to any member (`minio_domain`)
  s3_bucket: pgsql              # instance & nodename can be used: minio-1.pigsty minio-2.pigsty minio-3.pigsty minio-1 minio-2 minio-3
  s3_key: pgbackrest            # better to use a new password for the MinIO pgbackrest user
  s3_key_secret: S3User.SomeNewPassWord
  s3_uri_style: path
  path: /pgbackrest
  storage_port: 9002            # use the load balancer port 9002 instead of the default 9000 (direct access)
  storage_ca_file: /etc/pki/ca.crt
  bundle: y
  cipher_type: aes-256-cbc      # better to use a new cipher password for your production environment
  cipher_pass: pgBackRest.With.Some.Extra.PassWord.And.Salt.${pg_cluster}
  retention_full_type: time
  retention_full: 14
Expose Admin
MinIO will serve an admin web portal on port 9001 by default.
It’s not wise to expose the admin portal to the public, but if you wish to do so, add MinIO to the infra_portal and refresh the nginx server:
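A hedged example of such an infra_portal entry (the m.pigsty domain is illustrative; the MinIO console mandates HTTPS and uses WebSocket), added alongside the existing entries:
infra_portal:
  minio: { domain: m.pigsty ,endpoint: "10.10.10.10:9001" ,scheme: https ,websocket: true }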
Check the MinIO demo config and special Vagrantfile for more details.
Administration
Here are some common MinIO mcli commands for reference, check MinIO Client for more details.
Create Cluster
To create a defined minio cluster, run the minio.yml playbook on minio group:
./minio.yml -l minio # install minio cluster on group 'minio'
Client Setup
To access MinIO servers, you have to configure client mcli alias first:
mcli alias ls                                                                 # list minio alias (there's a sss by default)
mcli alias set sss https://sss.pigsty:9000 minioadmin minioadmin              # root user
mcli alias set pgbackrest https://sss.pigsty:9000 pgbackrest S3User.Backup    # backup user
You can manage business users with mcli as well:
mcli admin user list sss     # list all users on sss
set +o history               # hide password in history and create minio user
mcli admin user add sss dba S3User.DBA
mcli admin user add sss pgbackrest S3User.Backup
set -o history
CRUD
You can CRUD minio bucket with mcli:
mcli ls sss/                          # list buckets of alias 'sss'
mcli mb --ignore-existing sss/hello   # create a bucket named 'hello'
mcli rb --force sss/hello             # remove bucket 'hello' with force
Or perform object CRUD:
mcli cp -r /www/pigsty/*.rpm sss/infra/repo/                # upload files to bucket 'infra' with prefix 'repo'
mcli cp sss/infra/repo/pg_exporter-0.5.0.x86_64.rpm /tmp/   # download file from minio to local
Playbook
There’s a built-in playbook: minio.yml for installing the MinIO cluster. But you have to define it first.
#minio_seq: 1                     # minio instance identifier, REQUIRED
minio_cluster: minio              # minio cluster name, minio by default
minio_clean: false                # cleanup minio during init?, false by default
minio_user: minio                 # minio os user, `minio` by default
minio_node: '${minio_cluster}-${minio_seq}.pigsty'  # minio node name pattern
minio_data: '/data/minio'         # minio data dir(s), use {x...y} to specify multi drivers
minio_domain: sss.pigsty          # minio external domain name, `sss.pigsty` by default
minio_port: 9000                  # minio service port, 9000 by default
minio_admin_port: 9001            # minio console port, 9001 by default
minio_access_key: minioadmin      # root access key, `minioadmin` by default
minio_secret_key: minioadmin      # root secret key, `minioadmin` by default
minio_extra_vars: ''              # extra environment variables
minio_alias: sss                  # alias name for local minio deployment
minio_buckets: [ { name: pgsql }, { name: infra }, { name: redis } ]
minio_users:
  - { access_key: dba , secret_key: S3User.DBA, policy: consoleAdmin }
  - { access_key: pgbackrest , secret_key: S3User.Backup, policy: readwrite }
Pigsty has built-in Redis support, which is a high-performance data-structure server. Deploy redis in standalone, cluster or sentinel mode.
Concept
The entity model of Redis is almost the same as that of PostgreSQL, which also includes the concepts of Cluster and Instance. The Cluster here does not refer to the native Redis Cluster mode.
The core difference between the REDIS module and the PGSQL module is that Redis uses a single-node multi-instance deployment rather than the 1:1 deployment: multiple Redis instances are typically deployed on a physical/virtual machine node to utilize multi-core CPUs fully. Therefore, the ways to configure and administer Redis instances are slightly different from PGSQL.
In Redis managed by Pigsty, nodes are entirely subordinate to the cluster, which means that currently, it is not allowed to deploy Redis instances of two different clusters on one node. However, this does not affect deploying multiple independent Redis primary replica instances on one node.
Configuration
Redis Identity
Redis identity parameters are required parameters when defining a Redis cluster.
A Redis node can only belong to one Redis cluster, which means you cannot assign a node to two different Redis clusters simultaneously.
On each Redis node, you need to assign a unique port number to the Redis instance to avoid port conflicts.
Typically, the same Redis cluster will use the same password, but multiple Redis instances on a Redis node cannot set different passwords (because redis_exporter only allows one password).
Redis Cluster has built-in HA, while standalone HA requires manual configuration in Sentinel, because we are unsure whether you have any sentinels available. Fortunately, configuring standalone Redis HA is straightforward: Configure HA with sentinel.
Administration
Here are some common administration tasks for Redis. Check FAQ: Redis for more details.
Init Redis
Init Cluster/Node/Instance
# init all redis instances on group <cluster>
./redis.yml -l <cluster>                                  # init redis cluster

# init redis node
./redis.yml -l 10.10.10.10                                # init redis node

# init one specific redis instance 10.10.10.11:6379
./redis.yml -l 10.10.10.11 -e redis_port=6379 -t redis
Beware that redis cannot be reloaded online; you have to restart redis to make config changes effective.
Use Redis Client Tools
Access redis instance with redis-cli:
$ redis-cli -h 10.10.10.10 -p 6379   # <--- connect with host and port
10.10.10.10:6379> auth redis.ms      # <--- auth with password
OK
10.10.10.10:6379> set a 10           # <--- set a key
OK
10.10.10.10:6379> get a              # <--- get a key back
"10"
Redis also ships with redis-benchmark, which can be used for benchmarking and generating load on a redis server:
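For example (a hedged invocation; adjust host, port, and password to your instance):
redis-benchmark -h 10.10.10.10 -p 6379 -a redis.ms -n 100000 -c 50   # 100k requests with 50 parallel clients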
# promote a redis instance to primary
> REPLICAOF NO ONE
"OK"

# make a redis instance replica of another instance
> REPLICAOF 127.0.0.1 6799
"OK"
Configure HA with Sentinel
You have to enable HA for redis standalone m-s cluster manually with your redis sentinel.
Take the 4-node sandbox as an example: a redis sentinel cluster redis-meta is used to manage the redis-ms standalone cluster.
# for each sentinel, add redis master to the sentinel with:
$ redis-cli -h 10.10.10.11 -p 26379 -a redis.meta
10.10.10.11:26379> SENTINEL MONITOR redis-ms 10.10.10.10 6379 1
10.10.10.11:26379> SENTINEL SET redis-ms auth-pass redis.ms   # if auth enabled, password has to be configured
If you wish to remove a redis master from sentinel, use SENTINEL REMOVE <name>.
You can configure multiple redis master on sentinel cluster with redis_sentinel_monitor.
redis_sentinel_monitor:  # primary list for redis sentinel, use cls as name, primary ip:port
  - { name: redis-src, host: 10.10.10.45, port: 6379 ,password: redis.src, quorum: 1 }
  - { name: redis-dst, host: 10.10.10.48, port: 6379 ,password: redis.dst, quorum: 1 }
And refresh master list on sentinel cluster with:
./redis.yml -l redis-meta -t redis-ha # replace redis-meta if your sentinel cluster has different name
#redis_cluster: <CLUSTER>         # redis cluster name, required identity parameter
#redis_node: 1       <NODE>       # redis node sequence number, node int id required
#redis_instances: {} <NODE>       # redis instances definition on this redis node
redis_fs_main: /data              # redis main data mountpoint, `/data` by default
redis_exporter_enabled: true      # install redis exporter on redis nodes?
redis_exporter_port: 9121         # redis exporter listen port, 9121 by default
redis_exporter_options: ''        # cli args and extra options for redis exporter
redis_safeguard: false            # prevent purging running redis instance?
redis_clean: true                 # purging existing redis during init?
redis_rmdata: true                # remove redis data when purging redis server?
redis_mode: standalone            # redis mode: standalone,cluster,sentinel
redis_conf: redis.conf            # redis config template path, except sentinel
redis_bind_address: '0.0.0.0'     # redis bind address, empty string will use host ip
redis_max_memory: 1GB             # max memory used by each redis instance
redis_mem_policy: allkeys-lru     # redis memory eviction policy
redis_password: ''                # redis password, empty string will disable password
redis_rdb_save: ['1200 1']        # redis rdb save directives, disable with empty list
redis_aof_enabled: false          # enable redis append only file?
redis_rename_commands: {}         # rename redis dangerous commands
redis_cluster_replicas: 1         # replica number for one master in redis cluster
redis_sentinel_monitor: []        # sentinel master list, works on sentinel cluster only
Start time of the Redis instance since unix epoch in seconds.
redis_target_scrape_request_errors_total
counter
cls, ip, instance, ins, job
Errors in requests to the exporter
redis_total_error_replies
counter
cls, ip, instance, ins, job
total_error_replies metric
redis_total_reads_processed
counter
cls, ip, instance, ins, job
total_reads_processed metric
redis_total_system_memory_bytes
gauge
cls, ip, instance, ins, job
total_system_memory_bytes metric
redis_total_writes_processed
counter
cls, ip, instance, ins, job
total_writes_processed metric
redis_tracking_clients
gauge
cls, ip, instance, ins, job
tracking_clients metric
redis_tracking_total_items
gauge
cls, ip, instance, ins, job
tracking_total_items metric
redis_tracking_total_keys
gauge
cls, ip, instance, ins, job
tracking_total_keys metric
redis_tracking_total_prefixes
gauge
cls, ip, instance, ins, job
tracking_total_prefixes metric
redis_unexpected_error_replies
counter
cls, ip, instance, ins, job
unexpected_error_replies metric
redis_up
gauge
cls, ip, instance, ins, job
Information about the Redis instance
redis_uptime_in_seconds
gauge
cls, ip, instance, ins, job
uptime_in_seconds metric
scrape_duration_seconds
Unknown
cls, ip, instance, ins, job
N/A
scrape_samples_post_metric_relabeling
Unknown
cls, ip, instance, ins, job
N/A
scrape_samples_scraped
Unknown
cls, ip, instance, ins, job
N/A
scrape_series_added
Unknown
cls, ip, instance, ins, job
N/A
up
Unknown
cls, ip, instance, ins, job
N/A
13.2 - FAQ
Pigsty REDIS module frequently asked questions
ABORT due to existing redis instance
use redis_clean = true and redis_safeguard = false to force clean redis data
This happens when you run redis.yml to init a redis instance that is already running, and redis_clean is set to false.
If redis_clean is set to true (and the redis_safeguard is set to false, too), the redis.yml playbook will remove the existing redis instance and re-init it as a new one, which makes the redis.yml playbook fully idempotent.
ABORT due to redis_safeguard enabled
This happens when removing a redis instance with redis_safeguard set to true.
You can disable redis_safeguard to remove the Redis instance; this is exactly what redis_safeguard is for.
How to add a single new redis instance on this node?
Use bin/redis-add <ip> <port> to deploy a new redis instance on node.
How to remove a single redis instance from the node?
Use bin/redis-rm <ip> <port> to remove a single redis instance from the node.
14 - Module: FERRET
Pigsty has built-in FerretDB support, which is a MongoDB-compatible middleware based on PostgreSQL.
MongoDB was once a stunning technology, allowing developers to cast aside the “schema constraints” of relational databases and quickly build applications. However, over time, MongoDB abandoned its open-source nature, changing its license to SSPL, which made it unusable for many open-source projects and early commercial projects. Most MongoDB users actually do not need the advanced features provided by MongoDB, but they do need an easy-to-use open-source document database solution. To fill this gap, FerretDB was born.
PostgreSQL’s JSON functionality is already well-rounded: binary storage JSONB, GIN arbitrary field indexing, various JSON processing functions, JSON PATH, and JSON Schema, it has long been a fully-featured, high-performance document database. However, providing alternative functionality and direct emulation are not the same. FerretDB can provide a smooth transition to PostgreSQL for applications driven by MongoDB drivers.
Pigsty provided a Docker-Compose support for FerretDB in 1.x, and native deployment support since v2.3. As an optional feature, it greatly benefits the enrichment of the PostgreSQL ecosystem. The Pigsty community has already become a partner with the FerretDB community, and we shall find more opportunities to work together in the future.
Configuration
You have to define a Mongo (FerretDB) cluster before deploying it. There are some parameters for it:
Here’s an example to utilize the default single-node pg-meta cluster as MongoDB:
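A hedged sketch of such a definition (the parameter names mongo_seq, mongo_cluster, and mongo_pgurl follow the FERRET module conventions; verify them against the parameter reference):
ferret:
  hosts: { 10.10.10.10: { mongo_seq: 1 } }
  vars:
    mongo_cluster: ferret
    mongo_pgurl: 'postgres://dbuser_meta:DBUser.Meta@10.10.10.10:5432/meta'   # underlying PostgreSQL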
To create a defined mongo/ferretdb cluster, run the mongo.yml playbook:
./mongo.yml -l ferret # install mongo/ferretdb on group 'ferret'
Since FerretDB saves all data in underlying PostgreSQL, it is safe to run the playbook multiple times.
Remove Cluster
To remove a mongo/ferretdb cluster, run the mongo.yml playbook with mongo_purge subtask and mongo_purge flag.
./mongo.yml -e mongo_purge=true -t mongo_purge
FerretDB Connect
You can connect to FerretDB with any MongoDB driver using the MongoDB connection string, here we use the mongosh command line tool installed above as an example:
Since Pigsty uses the scram-sha-256 as the default auth method, you must use the PLAIN auth mechanism to connect to FerretDB. Check FerretDB: authentication for details.
You can also use other PostgreSQL users to connect to FerretDB, just specify them in the connection string:
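For example (hedged; dbuser_meta / dbuser_dba are the sandbox default users, substitute your own credentials):
mongosh 'mongodb://dbuser_meta:DBUser.Meta@10.10.10.10:27017?authMechanism=PLAIN'
mongosh 'mongodb://dbuser_dba:DBUser.DBA@10.10.10.10:27017?authMechanism=PLAIN'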
To generate some load, you can run a simple benchmark with mongosh:
cat > benchmark.js <<'EOF'
const coll = "testColl";
const numDocs = 10000;
for (let i = 0; i < numDocs; i++) { // insert
db.getCollection(coll).insert({ num: i, name: "MongoDB Benchmark Test" });
}
for (let i = 0; i < numDocs; i++) { // select
db.getCollection(coll).find({ num: i });
}
for (let i = 0; i < numDocs; i++) { // update
db.getCollection(coll).update({ num: i }, { $set: { name: "Updated" } });
}
for (let i = 0; i < numDocs; i++) { // delete
db.getCollection(coll).deleteOne({ num: i });
}
EOF
mongosh 'mongodb://dbuser_meta:DBUser.Meta@10.10.10.10:27017?authMechanism=PLAIN' benchmark.js
You can check supported Mongo commands on ferretdb: supported commands, and there may be some differences between MongoDB and FerretDB. Check ferretdb: differences for details, it’s not a big deal for sane usage.
Playbook
There’s a built-in playbook mongo.yml for installing the FerretDB cluster. But you have to define it first.
mongo.yml
mongo.yml: Install MongoDB/FerretDB on the target host.
This playbook consists of the following sub-tasks:
mongo_check : check mongo identity
mongo_dbsu : create os user mongod
mongo_install : install mongo/ferretdb rpm
mongo_purge : purge mongo/ferretdb
mongo_config : config mongo/ferretdb
mongo_cert : issue mongo/ferretdb ssl certs
mongo_launch : launch mongo/ferretdb service
mongo_register : register mongo/ferretdb to prometheus
Here are some basic MySQL cluster management operations:
Create MySQL cluster with mysql.yml:
./mysql.yml -l my-test
Playbook
Pigsty has the following playbooks related to the MYSQL module:
mysql.yml: Deploy MySQL according to the inventory
mysql.yml
The playbook mysql.yml contains the following subtasks:
mysql-id : generate mysql instance identity
mysql_clean : remove existing mysql instance (DANGEROUS)
mysql_dbsu : create os user mysql
mysql_install : install mysql rpm/deb packages
mysql_dir : create mysql data & conf dir
mysql_config : generate mysql config file
mysql_boot : bootstrap mysql cluster
mysql_launch : launch mysql service
mysql_pass : write mysql password
mysql_db : create mysql biz database
mysql_user : create mysql biz user
mysql_exporter : launch mysql exporter
mysql_register : register mysql service to prometheus
#-----------------------------------------------------------------
# MYSQL_IDENTITY
#-----------------------------------------------------------------
# mysql_cluster:         #CLUSTER  # mysql cluster name, required identity parameter
# mysql_role: replica    #INSTANCE # mysql role, required, could be primary,replica
# mysql_seq: 0           #INSTANCE # mysql instance seq number, required identity parameter

#-----------------------------------------------------------------
# MYSQL_BUSINESS
#-----------------------------------------------------------------
# mysql business object definition, overwrite in group vars
mysql_users: []                      # mysql business users
mysql_databases: []                  # mysql business databases
mysql_services: []                   # mysql business services

# global credentials, overwrite in global vars
mysql_root_username: root
mysql_root_password: DBUser.Root
mysql_replication_username: replicator
mysql_replication_password: DBUser.Replicator
mysql_admin_username: dbuser_dba
mysql_admin_password: DBUser.DBA
mysql_monitor_username: dbuser_monitor
mysql_monitor_password: DBUser.Monitor

#-----------------------------------------------------------------
# MYSQL_INSTALL
#-----------------------------------------------------------------
# - install - #
mysql_dbsu: mysql                     # os dbsu name, mysql by default, better not change it
mysql_dbsu_uid: 27                    # os dbsu uid and gid, 306 for default mysql users and groups
mysql_dbsu_home: /var/lib/mysql       # mysql home directory, `/var/lib/mysql` by default
mysql_dbsu_ssh_exchange: true         # exchange mysql dbsu ssh key among same mysql cluster
mysql_packages:                       # mysql packages to be installed, `mysql-community*` by default
  - mysql-community*
  - mysqld_exporter

# - bootstrap - #
mysql_data: /data/mysql               # mysql data directory, `/data/mysql` by default
mysql_listen: '0.0.0.0'               # mysql listen addresses, comma separated IP list
mysql_port: 3306                      # mysql listen port, 3306 by default
mysql_sock: /var/lib/mysql/mysql.sock # mysql socket dir, `/var/lib/mysql/mysql.sock` by default
mysql_pid: /var/run/mysqld/mysqld.pid # mysql pid file, `/var/run/mysqld/mysqld.pid` by default
mysql_conf: /etc/my.cnf               # mysql config file, `/etc/my.cnf` by default
mysql_log_dir: /var/log               # mysql log dir, `/var/log/mysql` by default
mysql_exporter_port: 9104             # mysqld_exporter listen port, 9104 by default
mysql_parameters: {}                  # extra parameters for mysqld
mysql_default_parameters:             # default parameters for mysqld
16.2 - Module: Kafka
Deploy kafka with pigsty: open-source distributed event streaming platform
Kafka module is currently available in Pigsty Pro as a Beta Preview.
Kafka requires a Java runtime, so you need to install an available JDK when installing Kafka (OpenJDK 17 is used by default, but other JDKs and versions, such as 8 and 11, can also be used).
Single-node Kafka configuration example. Please note that in Pigsty's single-machine deployment mode,
the Kafka peer port 9093 is already occupied by AlertManager, so it is recommended to use another port, such as 9095.
kf-main:
  hosts:
    10.10.10.10: { kafka_seq: 1, kafka_role: controller }
  vars:
    kafka_cluster: kf-main
    kafka_data: /data/kafka
    kafka_peer_port: 9095   # 9093 is already held by alertmanager
TigerBeetle Requires Linux Kernel Version 5.5 or Higher!
Please note that TigerBeetle supports only Linux kernel version 5.5 or higher, making it incompatible by default with EL7 (3.10) and EL8 (4.18) systems.
To install TigerBeetle, please use EL9 (5.14), Ubuntu 22.04 (5.15), Debian 12 (6.1), Debian 11 (5.10), or another supported system.
16.5 - Module: Kubernetes
Deploy kubernetes, the Production-Grade Container Orchestration Platform
Kubernetes is a production-grade, open-source container orchestration platform. It helps you automate, deploy, scale, and manage containerized applications.
Overview
Pigsty has native support for ETCD clusters, which can be used by Kubernetes. Therefore, the Pro edition also provides the KUBE module for deploying production-grade Kubernetes clusters.
The KUBE module is currently in Beta status and only available for Pro edition customers.
SealOS
SealOS is a lightweight, high-performance, and easy-to-use Kubernetes distribution.
It is designed to simplify the deployment and management of Kubernetes clusters.
Pigsty provides SealOS 5.0 RPM and DEB packages in the Infra repository, which can be downloaded and installed directly.
Kubernetes support multiple container runtimes, if you want to use Containerd as container runtime, please make sure Containerd is installed on the node.
#kube_cluster:                        #IDENTITY# # define kubernetes cluster name
kube_role: node                       # default kubernetes role (master|node)
kube_version: 1.31.0                  # kubernetes version
kube_registry: registry.aliyuncs.com/google_containers  # aliyun k8s mirror repository
kube_pod_cidr: "10.11.0.0/16"         # kubernetes pod network cidr
kube_service_cidr: "10.12.0.0/16"     # kubernetes service network cidr
kube_dashboard_admin_user: dashboard-admin-sa  # kubernetes dashboard admin user name
16.6 - Module: Consul
Deploy consul, the alternative of Etcd, with Pigsty.
In the old version (1.x) of Pigsty, Consul was used as the default high-availability DCS. Now this support has been removed, but it will be provided as a separate module in the future.
#-----------------------------------------------------------------
# CONSUL
#-----------------------------------------------------------------
consul_role: node             # consul role, node or server, node by default
consul_dc: pigsty             # consul data center name, `pigsty` by default
consul_data: /data/consul     # consul data dir, `/data/consul`
consul_clean: true            # consul purge flag, if true, clean consul during init
consul_ui: false              # enable consul ui, the default value for consul server is true
16.7 - Module: Victoria
Deploy VictoriaMetrics & VictoriaLogs, the in-place replacement for Prometheus & Loki.
VictoriaMetrics is the in-place replacement for Prometheus, offering better performance and compression ratio.
Overview
Victoria is currently only available in the Pigsty Professional Edition Beta preview.
It includes the deployment and management of VictoriaMetrics and VictoriaLogs components.
Installation
Pigsty Infra Repo has the RPM / DEB packages for VictoriaMetrics, use the following command to install:
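A hedged example, assuming the package names victoria-metrics and victoria-metrics-cluster mentioned below are available in the configured Infra repo:
yum install victoria-metrics            # EL systems: standalone single-node version
apt install victoria-metrics            # Debian / Ubuntu systems
yum install victoria-metrics-cluster    # optional: cluster version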
For common users, installing the standalone version of VictoriaMetrics is sufficient.
If you need to deploy a cluster, you can install the victoria-metrics-cluster package.
16.8 - Module: Jupyter
Launch Jupyter notebook server with Pigsty, a web-based interactive scientific notebook.
Run jupyter notebook with docker, you have to:
change the default password in .env: JUPYTER_TOKEN
create data dir with proper permission: make dir, owned by 1000:100
make up to pull up jupyter with docker compose
cd ~/pigsty/app/jupyter ; make dir up
Visit http://lab.pigsty or http://10.10.10.10:8888, the default password is pigsty
import psycopg2
conn = psycopg2.connect('postgres://dbuser_dba:DBUser.DBA@10.10.10.10:5432/meta')
cursor = conn.cursor()
cursor.execute('SELECT * FROM pg_stat_activity')
for i in cursor.fetchall():
    print(i)
Alias
make up     # pull up jupyter with docker compose
make dir    # create required /data/jupyter and set owner
make run    # launch jupyter with docker
make view   # print jupyter access point
make log    # tail -f jupyter logs
make info   # introspect jupyter with jq
make stop   # stop jupyter container
make clean  # remove jupyter container
make pull   # pull latest jupyter image
make rmi    # remove jupyter image
make save   # save jupyter image to /tmp/docker/jupyter.tgz
make load   # load jupyter image from /tmp/docker/jupyter.tgz
17 - Task & Tutorial
Look up common tasks and how to perform them using a short sequence of steps
17.1 - Nginx: Expose Web Service
How to expose, proxy, and forward web services using Nginx?
Pigsty will install Nginx on INFRA Node, as a Web service proxy.
Nginx is the access entry for all WebUI services of Pigsty, and it uses ports 80/443 on INFRA nodes by default.
Pigsty provides a global parameter infra_portal to configure Nginx proxy rules and corresponding upstream services.
If you access Nginx directly through the ip:port, it will route to h.pigsty, which is the Pigsty homepage (/www/ directory, served as software repo).
Because Nginx provides multiple services through the same port, services must be distinguished by domain name (the HOST header set by the browser). Therefore, by default, Nginx only exposes services with the domain parameter.
And Pigsty will expose grafana, prometheus, and alertmanager services by default in addition to the home server.
How to configure nginx upstream?
Pigsty has a built-in configuration template full.yml, which can be used as a reference and which also exposes some Web services in addition to the default ones.
Each record in infra_portal is a key-value pair, where the key is the name of the service, and the value is a dictionary.
Currently, there are four available configuration items in the configuration dictionary:
endpoint: REQUIRED, specifies the address of the upstream service, which can be IP:PORT or DOMAIN:PORT.
In this parameter, you can use the placeholder ${admin_ip}, and Pigsty will fill in the value of admin_ip.
domain: OPTIONAL, specifies the domain name of the proxy. If not filled in, Nginx will not expose this service.
For services that need to know the endpoint address but do not want to expose them (such as Loki, Blackbox Exporter), you can leave the domain blank.
scheme: OPTIONAL, specifies the protocol (http/https) when forwarding, leave it blank to default to http.
For services that require HTTPS access (such as the MinIO management interface), you need to specify scheme: https.
websocket: OPTIONAL, specifies whether to enable WebSocket, leave it blank to default to off.
Services that require WebSocket (such as Grafana, Jupyter) need to be set to true to work properly.
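As a hedged illustration of these four items (the entries below mirror the stock services and the default *.pigsty domain names; adjust endpoints and domains to your environment):
infra_portal:   # domain names and upstream endpoints served by Nginx
  home         : { domain: h.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }   # endpoint only, not exposed via domain
  loki         : { endpoint: "${admin_ip}:3100" }   # endpoint only, not exposed via domain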
If you need to add a new Web service exposed by Nginx, you need to add the corresponding record in the infra_portal parameter in the pigsty.yml file, and then execute the playbook to take effect:
./infra.yml -t nginx # Tune nginx into desired state
To avoid service interruption, you can also execute the following tasks separately:
./infra.yml -t nginx_config   # re-generate nginx upstream config in /etc/nginx/conf.d
./infra.yml -t nginx_cert     # re-generate nginx ssl cert to include new domain names
nginx -s reload               # online reload nginx configuration
Nginx distinguishes between different services using the domain name in the HOST header set by the browser. Thus, by default, except for the software repository, you need to access services via domain name.
You can directly access these services via IP address + port, but we recommend accessing all components through Nginx on ports 80/443 using domain names.
When accessing the Pigsty WebUI via domain name, you need to configure DNS resolution or modify the local /etc/hosts file for static resolution. There are several typical methods:
If your service needs to be publicly accessible, you should resolve the internet domain name via a DNS provider (Cloudflare, Aliyun DNS, etc.). Note that in this case, you usually need to modify the Pigsty infra_portal parameter, as the default *.pigsty is not suitable for public use.
If your service needs to be shared within an office network, you should resolve the internal domain name via an internal DNS provider (company internal DNS server) and point it to the IP of the Nginx server. You can request the network administrator to add the appropriate resolution records in the internal DNS server, or ask the system users to manually configure static DNS resolution records.
If your service is only for personal use or a few users (e.g., DBA), you can ask these users to use static domain name resolution records. On Linux/MacOS systems, modify the /etc/hosts file (requires sudo permissions) or C:\Windows\System32\drivers\etc\hosts (Windows) file.
We recommend ordinary single-machine users use the third method, adding the following resolution records on the machine used to access the web system via a browser:
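For example (hedged; using the default *.pigsty domains and a placeholder IP — replace 10.10.10.10 with the public IP of your Pigsty node):
10.10.10.10 h.pigsty a.pigsty p.pigsty g.pigsty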
The IP address here is the public IP address where the Pigsty service is installed, and then you can access Pigsty subsystems in the browser via a domain like: http://g.pigsty.
Other web services and custom domains can be added similarly. For example, the following are possible domain resolution records for the Pigsty sandbox demo:
If nginx_sslmode is set to enabled or enforced, you can trust self-signed ca: files/pki/ca/ca.crt to use https in your browser.
Pigsty will generate self-signed certs for Nginx, if you wish to access via HTTPS without “Warning”, here are some options:
Apply & add real certs from trusted CA: such as Let’s Encrypt
Trust your generated CA crt as root ca in your OS and browser
Typing thisisunsafe in Chrome will suppress the warning
You can access these web UI directly via IP + port. While the common best practice would be access them through Nginx and distinguish via domain names. You’ll need configure DNS records, or use the local static records (/etc/hosts) for that.
How to access Pigsty Web UI by domain name?
There are several options:
Resolve internet domain names through a DNS service provider, suitable for systems accessible from the public internet.
Configure internal network DNS server resolution records for internal domain name resolution.
Modify the local machine's /etc/hosts file to add static resolution records (the Windows equivalent path is given below).
We recommend the third method for common users. On the machine (which runs the browser), add the following record into /etc/hosts (sudo required) or C:\Windows\System32\drivers\etc\hosts in Windows:
You have to use the external IP address of the node here.
How to configure server side domain names?
The server-side domain name is configured with Nginx. If you want to replace the default domain name, simply enter the domain you wish to use in the parameter infra_portal. When you access the Grafana monitoring homepage via http://g.pigsty, it is actually accessed through the Nginx proxy to Grafana’s WebUI:
If you wish to use a proxy server when Docker pulls images, you should specify the proxy_env parameter in the global variables of the pigsty.yml configuration file:
all:
  vars:
    proxy_env:   # global proxy env when downloading packages
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
      http_proxy: http://192.168.0.106:8118
      all_proxy: http://192.168.0.106:8118
      https_proxy: http://192.168.0.106:8118
And when the Docker playbook is executed, these configurations will be rendered as proxy configurations in /etc/docker/daemon.json:
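A hedged sketch of the resulting proxy section in /etc/docker/daemon.json (values taken from the proxy_env example above; the "proxies" key is supported by recent Docker Engine versions):
{
  "proxies": {
    "http-proxy": "http://192.168.0.106:8118",
    "https-proxy": "http://192.168.0.106:8118",
    "no-proxy": "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
  }
}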
Please note that Docker Daemon does not use the all_proxy parameter
If you wish to manually specify a proxy server, you can directly modify the proxies configuration in /etc/docker/daemon.json;
Or you can modify the service definition in /lib/systemd/system/docker.service (Debian/Ubuntu) or /usr/lib/systemd/system/docker.service (EL) to add environment variable declarations in the [Service] section
You can also log in to other mirror sites, such as quay.io, by executing:
docker login quay.io
username #>   # input your username
password #>   # input your password
17.3 - Use PostgreSQL as Ansible Config Inventory CMDB
Use PostgreSQL instead of static YAML config file as Ansible config inventory
You can use PostgreSQL as a configuration source for Pigsty, replacing the static YAML configuration file.
There are some advantages to using CMDB as a dynamic inventory: metadata is presented in a highly structured way in the form of data tables,
and consistency is ensured through database constraints. At the same time, using CMDB allows you to use third-party tools to edit and manage Pigsty metadata, making it easier to integrate with external systems.
Ansible Inventory
Pigsty’s default configuration file path is specified in ansible.cfg as inventory = pigsty.yml.
Changing this parameter will change the default configuration file path used by Ansible.
If you point it to an executable script file, Ansible will use the dynamic inventory mechanism, execute the script, and expect the script to return a configuration file.
Using CMDB is implemented by editing the ansible.cfg in the Pigsty directory:
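A hedged sketch of that change (inventory.sh is a hypothetical name for an executable dynamic inventory script that prints the config pulled from the CMDB):
[defaults]
; inventory = pigsty.yml        ; static YAML config file (default)
inventory = inventory.sh        ; executable dynamic inventory script backed by the CMDB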
We can define a new database grafana on pg-meta.
A Grafana-specific database cluster can also be created on a new machine node: pg-grafana.
Define Cluster
To create a new dedicated database cluster pg-grafana on two bare nodes 10.10.10.11, 10.10.10.12, define it in the config file.
pg-grafana:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
  vars:
    pg_cluster: pg-grafana
    pg_databases:
      - name: grafana
        owner: dbuser_grafana
        revokeconn: true
        comment: grafana primary database
    pg_users:
      - name: dbuser_grafana
        password: DBUser.Grafana
        pgbouncer: true
        roles: [dbrole_admin]
        comment: admin user for grafana database
Create Cluster
Complete the creation of the database cluster pg-grafana with the following command:
bin/createpg pg-grafana # Initialize the pg-grafana cluster
This command calls Ansible Playbook pgsql.yml to create the database cluster.
./pgsql.yml -l pg-grafana    # The actual equivalent Ansible playbook command executed
The business users and databases defined in pg_users and pg_databases are created automatically when the cluster is initialized. After creating the cluster with this configuration, the database can be accessed using the following connection strings.
postgres://dbuser_grafana:DBUser.Grafana@10.10.10.11:5432/grafana   # direct connection to the primary
postgres://dbuser_grafana:DBUser.Grafana@10.10.10.11:5436/grafana   # default service
postgres://dbuser_grafana:DBUser.Grafana@10.10.10.11:5433/grafana   # primary (read/write) service
postgres://dbuser_grafana:DBUser.Grafana@10.10.10.12:5432/grafana   # direct connection to the replica
postgres://dbuser_grafana:DBUser.Grafana@10.10.10.12:5436/grafana   # default service
postgres://dbuser_grafana:DBUser.Grafana@10.10.10.12:5433/grafana   # primary (read/write) service
By default, Pigsty is installed on a single meta node. Then the required users and databases for Grafana are created on the existing pg-meta database cluster instead of using the pg-grafana cluster.
Create Biz User
The convention for business object management is to create users first and then create the database.
Define User
To create a user dbuser_grafana on a pg-meta cluster, add the following user definition to pg-meta’s cluster definition.
Add location: all.children.pg-meta.vars.pg_users.
- name: dbuser_grafana
  password: DBUser.Grafana
  comment: admin user for grafana database
  pgbouncer: true
  roles: [ dbrole_admin ]
If you have defined a different password here, replace the corresponding parameter with the new password.
Create User
Complete the creation of the dbuser_grafana user with the following command.
bin/pgsql-user pg-meta dbuser_grafana # Create the `dbuser_grafana` user on the pg-meta cluster
Calls Ansible Playbook pgsql-user.yml to create the user
The dbrole_admin role has the privilege to perform DDL changes in the database, which is precisely what Grafana needs.
Create Biz Database
Define database
Create business databases in the same way as business users. First, add the definition of the new database grafana to the cluster definition of pg-meta.
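For example, add an entry like the following under all.children.pg-meta.vars.pg_databases (the same shape as the definition used in the pg-grafana cluster above):
- name: grafana
  owner: dbuser_grafana
  revokeconn: true
  comment: grafana primary database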
Use the following command to complete the creation of the grafana database.
bin/pgsql-db pg-meta grafana # Create the `grafana` database on the `pg-meta` cluster
Calls Ansible Playbook pgsql-db.yml to create the database.
./pgsql-db.yml -l pg-meta -e pg_database=grafana    # The actual Ansible playbook to execute
Access Database
Check Connectivity
You can access the database using different services or access methods.
postgres://dbuser_grafana:DBUser.Grafana@meta:5432/grafana   # Direct connection
postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana   # default service
postgres://dbuser_grafana:DBUser.Grafana@meta:5433/grafana   # primary service
We will use the default service that accesses the database directly from the primary through the LB.
First, check if the connection string is reachable and if you have privileges to execute DDL commands.
psql postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana -c \
'CREATE TABLE t(); DROP TABLE t;'
Config Grafana
For Grafana to use the Postgres data source, you need to edit /etc/grafana/grafana.ini and modify the config entries.
[database]
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
;url =
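A hedged example of the modified entries, pointing Grafana at the grafana database created above via the default service (grafana.ini also accepts a single url entry instead of discrete fields):
[database]
type = postgres
host = meta:5436
name = grafana
user = dbuser_grafana
password = DBUser.Grafana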
Once the monitoring system shows that the new grafana database is active, Grafana has started using Postgres as its primary backend database. However, the original Dashboards and Datasources in Grafana will have disappeared, and you need to re-import the Dashboards and Postgres Datasources.
Manage Dashboard
You can reload the Pigsty monitor dashboard by going to the files/ui dir in the Pigsty dir using the admin user and executing grafana.py init.
cd ~/pigsty/files/ui
./grafana.py init    # Initialize the Grafana monitor dashboard using the Dashboards in the current directory
This script detects the current environment (defined at ~/pigsty during installation), gets Grafana access information, and replaces the URL connection placeholder domain name (*.pigsty) in the monitor dashboard with the real one in use.
As a reminder, using grafana.py clean will clear the target monitor dashboard, and using grafana.py load will load all the monitor dashboards in the current dir. When Pigsty’s monitor dashboard changes, you can use these two commands to upgrade all the monitor dashboards.
Manage DataSources
When creating a new PostgreSQL cluster with pgsql.yml or a new business database with pgsql-db.yml, Pigsty will register the new PostgreSQL data source in Grafana, and you can access the target database instance directly through Grafana using the default admin user. Most of the functionality of the application pgcat relies on this.
To register a Postgres database, you can use the register_grafana task in pgsql.yml.
./pgsql.yml -t register_grafana              # Re-register all Postgres data sources in the current environment
./pgsql.yml -t register_grafana -l pg-test   # Re-register all the databases in the pg-test cluster
Update Grafana Database
You can directly change the backend data source used by Grafana by modifying the Pigsty config file. Edit the grafana_database and grafana_pgurl parameters in pigsty.yml and change them.
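For reference, the two parameters might be set along these lines (values are illustrative and match the grafana database created above):
grafana_database: postgres                                                   # switch grafana backend from sqlite3 to postgres
grafana_pgurl: postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana    # grafana postgres connection string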
Then re-execute the grafana task in infra.yml to complete the Grafana upgrade.
./infra.yml -t grafana
17.5 - Use PG as Prometheus Backend
Persist prometheus metrics with PostgreSQL + TimescaleDB through Promscale
It is not recommended to use PostgreSQL as a backend for Prometheus, but it is a good opportunity to understand the
Pigsty deployment system.
Postgres Preparation
vi pigsty.yml   # add the dbuser_prometheus user and the prometheus database

pg_databases:   # define business databases on this cluster, array of database definition
  - { name: prometheus, owner: dbuser_prometheus ,revokeconn: true ,comment: prometheus primary database }
pg_users:       # define business users/roles on this cluster, array of user definition
  - { name: dbuser_prometheus ,password: DBUser.Prometheus ,pgbouncer: true ,createrole: true ,roles: [dbrole_admin] ,comment: admin user for prometheus database }
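After adding these definitions, the user and database can presumably be created with the same wrapper scripts used earlier in this document:
bin/pgsql-user pg-meta dbuser_prometheus   # create the prometheus biz user
bin/pgsql-db   pg-meta prometheus          # create the prometheus biz database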
Beware that pg_vip_address must be a valid IP address with subnet and available in the current L2 network.
Beware that pg_vip_interface must be a valid network interface name, and it should be the interface that carries the node's IPv4 address listed in the inventory.
If the network interface name is different among cluster members, users should explicitly specify the pg_vip_interface parameter for each instance, for example:
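A sketch of such a per-instance override (host addresses and interface names are illustrative):
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary ,pg_vip_interface: eth0  }
    10.10.10.12: { pg_seq: 2, pg_role: replica ,pg_vip_interface: ens33 }
  vars:
    pg_cluster: pg-test
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.3/24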
To refresh the VIP configuration and restart the VIP-Manager, use the following command:
./pgsql.yml -t pg_vip
18 - Software & Tools
Software and tools that use PostgreSQL can be managed by the docker daemon
PostgreSQL is the most popular database in the world, and countless software is built on PostgreSQL, around PostgreSQL, or serves PostgreSQL itself, such as
“Application software” that uses PostgreSQL as the preferred database
“Tooling software” that serves PostgreSQL software development and management
“Database software” that derives, wraps, forks, modifies, or extends PostgreSQL
And Pigsty provides a series of Docker Compose templates for these kinds of software, applications, and databases:
Expose PostgreSQL & Pgbouncer Metrics for Prometheus
How to prepare Docker?
To run these docker compose templates, you need to install the DOCKER module on the node.
If you don't have Internet access or run into firewall issues, you may need to configure a DockerHub proxy; check the tutorial.
18.1 - Dify: AI Workflow and LLMOps
How to use Pigsty to build an AI Workflow LLMOps platform — Dify, and use external PostgreSQL, PGVector, Redis as storage?
Dify – The Innovation Engine for GenAI Applications
Dify is an open-source LLM app development platform. Orchestrate LLM apps from agents to complex AI workflows, with an RAG engine.
It claims to be more production-ready than LangChain.
Of course, a workflow orchestration software like this needs a database underneath — Dify uses PostgreSQL for metadata storage, as well as Redis for caching and a dedicated vector database.
You can pull the Docker images and play locally, but for production deployment, this setup won’t suffice — there’s no HA, backup, PITR, monitoring, and many other things.
Fortunately, Pigsty provides a battery-included, production-grade, highly available PostgreSQL cluster, along with the Redis and S3 (MinIO) capabilities that Dify needs, as well as Nginx to expose the Web service, making it the perfect companion for Dify.
Off-load the stateful parts to Pigsty, and you only need to pull up the stateless parts (the blue boxes in the architecture diagram) with a simple docker compose up.
BTW, I have to criticize the design of the Dify template. Since the metadata is already stored in PostgreSQL, why not add pgvector to use it as a vector database? What’s even more baffling is that pgvector is a separate image and container. Why not just use a PG image with pgvector included?
Dify “supports” a bunch of flashy vector databases, but since PostgreSQL is already chosen, using pgvector as the default vector database is the natural choice. Similarly, I think the Dify team should consider removing Redis. Celery task queues can use PostgreSQL as backend storage, so having multiple databases is unnecessary. Entities should not be multiplied without necessity.
Therefore, the Pigsty-provided Dify Docker Compose template has made some adjustments to the official example. It removes the db and redis database images, using instances managed by Pigsty. The vector database is fixed to use pgvector, reusing the same PostgreSQL instance.
In the end, the architecture is simplified to three stateless containers: dify-api, dify-web, and dify-worker, which can be created and destroyed at will. There are also two optional containers, ssrf_proxy and nginx, for providing proxy and some security features.
There’s a bit of state management left with file system volumes, storing things like private keys. Regular backups are sufficient.
Let's take the single-node installation of Pigsty as an example. Suppose you have a machine at IP address 10.10.10.10 with Pigsty already installed.
We need to define the database clusters required in the Pigsty configuration file pigsty.yml.
Here, we define a cluster named pg-meta, which includes a superuser named dbuser_dify (the implementation is a bit rough, as the migration script executes CREATE EXTENSION, which requires dbsu privilege for now),
and a database named dify with the pgvector extension installed, plus an HBA rule allowing the user to access the database from anywhere using a password (you can also restrict it to a more precise range, such as the Docker subnet 172.0.0.0/8).
Additionally, a standard single-instance Redis cluster redis-dify with the password redis.dify is defined.
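A sketch of such definitions under all.children, assuming the single-node layout above (passwords, HBA scope, and redis memory limit are illustrative):
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users:     [ { name: dbuser_dify ,password: DBUser.Dify ,superuser: true ,pgbouncer: true ,roles: [ dbrole_admin ] ,comment: dify superuser } ]
    pg_databases: [ { name: dify ,owner: dbuser_dify ,extensions: [ vector ] ,comment: dify main database } ]
    pg_hba_rules: [ { user: dbuser_dify ,db: all ,addr: world ,auth: pwd ,title: 'allow dify user password access from anywhere' } ]

redis-dify:
  hosts: { 10.10.10.10: { redis_node: 1 ,redis_instances: { 6379: {} } } }
  vars:  { redis_cluster: redis-dify ,redis_password: redis.dify ,redis_max_memory: 64MB }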
For demonstration purposes, we use single-instance configurations. You can refer to the Pigsty documentation to deploy high availability PG and Redis clusters. After defining the clusters, use the following commands to create the PG and Redis clusters:
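With the definitions in place, the clusters can presumably be created with the standard wrapper scripts:
bin/pgsql-add pg-meta      # create the pg-meta postgres cluster
bin/redis-add redis-dify   # create the redis-dify redis cluster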
Alternatively, you can define a new business user and business database on an existing PostgreSQL cluster, such as pg-meta, and create them with the following commands:
bin/pgsql-user pg-meta dbuser_dify   # create dify biz user
bin/pgsql-db   pg-meta dify          # create dify biz database
You should be able to access PostgreSQL and Redis with the following connection strings, adjusting the connection information as needed:
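For example (credentials match the illustrative definitions above):
psql postgres://dbuser_dify:DBUser.Dify@10.10.10.10:5432/dify -c 'SELECT 1'   # check postgres connectivity
redis-cli -h 10.10.10.10 -p 6379 -a redis.dify ping                           # check redis connectivity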
Once you confirm these connection strings are working, you’re all set to start deploying Dify.
For demonstration purposes, we’re using direct IP connections. For a multi-node high availability PG cluster, please refer to the service access section.
The above assumes you are already a Pigsty user familiar with deploying PostgreSQL and Redis clusters. You can skip the next section and proceed to see how to configure Dify.
Starting from Scratch
If you’re already familiar with setting up Pigsty, feel free to skip this section.
Prepare a fresh Linux x86_64 node that runs compatible OS, then run as a sudo-able user:
curl -fsSL https://repo.pigsty.io/get | bash
It will download Pigsty source to your home, then perform configure and install to finish the installation.
cd ~/pigsty      # get pigsty source and enter the dir
./bootstrap      # download bootstrap pkgs & ansible [optional]
./configure      # pre-check and config templating  [optional]
# edit pigsty.yml, adding the cluster definitions above into all.children
./install.yml    # install pigsty according to pigsty.yml
You should insert the above PostgreSQL cluster and Redis cluster definitions into the pigsty.yml file, then run install.yml to complete the installation.
Redis Deploy
Pigsty will not deploy redis in install.yml, so you have to run redis.yml playbook to install Redis explicitly:
./redis.yml
Docker Deploy
Pigsty will not deploy Docker by default, so you need to install Docker with the docker.yml playbook.
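For example, to install Docker on the node used in this tutorial (the -l limit is illustrative):
./docker.yml -l 10.10.10.10   # install the DOCKER module on the target node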
All parameters are self-explanatory and filled in with default values that work directly in the Pigsty sandbox env.
Fill in the database connection information according to your actual conf, consistent with the PG/Redis cluster configuration above.
Changing the SECRET_KEY field is recommended. You can generate a strong key with openssl rand -base64 42:
# meta parameter
DIFY_PORT=8001    # expose dify nginx service with port 8001 by default
LOG_LEVEL=INFO    # The log level for the application. Supported values are `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
SECRET_KEY=sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U   # A secret key for signing and encryption, gen with `openssl rand -base64 42`

# postgres credential
PG_USERNAME=dbuser_dify
PG_PASSWORD=DBUser.Dify
PG_HOST=10.10.10.10
PG_PORT=5432
PG_DATABASE=dify

# redis credential
REDIS_HOST=10.10.10.10
REDIS_PORT=6379
REDIS_USERNAME=''
REDIS_PASSWORD=redis.dify

# minio/s3 [OPTIONAL] when STORAGE_TYPE=s3
STORAGE_TYPE=local
S3_ENDPOINT='https://sss.pigsty'
S3_BUCKET_NAME='infra'
S3_ACCESS_KEY='dba'
S3_SECRET_KEY='S3User.DBA'
S3_REGION='us-east-1'
Now we can pull up dify with docker compose:
cd pigsty/app/dify && make up
Expose Dify Service via Nginx
Dify exposes web/api via its own Nginx on port 80 by default, while Pigsty uses port 80 for its own Nginx.
Therefore, we expose Dify on port 8001 by default and use Pigsty's Nginx to forward requests to this port.
Change infra_portal in pigsty.yml, with the new dify line:
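The entry might look like this (keep your existing entries; the domain and endpoint shown here are illustrative):
infra_portal:                          # domain names and upstream servers exposed by Pigsty Nginx
  home    : { domain: h.pigsty }
  grafana : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }
  dify    : { domain: dify.pigsty ,endpoint: "10.10.10.10:8001" ,websocket: true }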
Then expose dify web service via Pigsty’s Nginx server:
./infra.yml -t nginx
Don’t forget to add dify.pigsty to your DNS or local /etc/hosts / C:\Windows\System32\drivers\etc\hosts to access via domain name.
18.2 - Odoo: OSS ERP for Enterprise
How to self-host the open source ERP – Odoo
Odoo is an open-source enterprise resource planning (ERP) software
that provides a full suite of business applications, including CRM, sales, purchasing, inventory, production, accounting, and other management functions. Odoo is a typical web application that uses PostgreSQL as the underlying database.
All your business on one platform, Simple, efficient, yet affordable
Get Started
Check .env file for configurable environment variables:
make up      # pull up odoo with docker compose in minimal mode
make run     # launch odoo with docker, local data dir and external PostgreSQL
make view    # print odoo access point
make log     # tail -f odoo logs
make info    # introspect odoo with jq
make stop    # stop odoo container
make clean   # remove odoo container
make pull    # pull latest odoo image
make rmi     # remove odoo image
make save    # save odoo image to /tmp/docker/odoo.tgz
make load    # load odoo image from /tmp/docker/odoo.tgz
Use External PostgreSQL
You can use external PostgreSQL for Odoo. Odoo will create its own database during setup, so you don't need to create it in advance.
cd app/supabase; make up # https://supabase.com/docs/guides/self-hosting/docker
Then you can access the supabase studio dashboard via http://<admin_ip>:8000 by default, the default dashboard username is supabase and password is pigsty.
You can also configure the infra_portal to expose the WebUI to the public through Nginx and SSL.
Database
Supabase requires certain PostgreSQL extensions, schemas, and roles to work, which can be pre-configured by Pigsty: supa.yml.
The following example will configure the default pg-meta cluster as underlying postgres for supabase:
# supabase example cluster: pg-meta, this cluster needs to be migrated with ~/pigsty/app/supabase/migration.sql :
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_version: 15
    pg_users:
      # supabase roles: anon, authenticated, dashboard_user
      - { name: anon           ,login: false }
      - { name: authenticated  ,login: false }
      - { name: dashboard_user ,login: false ,replication: true ,createdb: true ,createrole: true }
      - { name: service_role   ,login: false ,bypassrls: true }
      # supabase users: please use the same password
      - { name: supabase_admin             ,password: 'DBUser.Supa' ,pgbouncer: true ,inherit: true  ,superuser: true ,replication: true ,createdb: true ,createrole: true ,bypassrls: true }
      - { name: authenticator              ,password: 'DBUser.Supa' ,pgbouncer: true ,inherit: false ,roles: [ authenticated ,anon ,service_role ] }
      - { name: supabase_auth_admin        ,password: 'DBUser.Supa' ,pgbouncer: true ,inherit: false ,createrole: true }
      - { name: supabase_storage_admin     ,password: 'DBUser.Supa' ,pgbouncer: true ,inherit: false ,createrole: true ,roles: [ authenticated ,anon ,service_role ] }
      - { name: supabase_functions_admin   ,password: 'DBUser.Supa' ,pgbouncer: true ,inherit: false ,createrole: true }
      - { name: supabase_replication_admin ,password: 'DBUser.Supa' ,replication: true }
      - { name: supabase_read_only_user    ,password: 'DBUser.Supa' ,bypassrls: true ,roles: [ pg_read_all_data ] }
    pg_databases:
      - { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [ pigsty ] }   # the pigsty cmdb, optional
      - name: supa
        baseline: supa.sql   # the init-scripts: https://github.com/supabase/postgres/tree/develop/migrations/db/init-scripts
        owner: supabase_admin
        comment: supabase postgres database
        schemas: [ extensions ,auth ,realtime ,storage ,graphql_public ,supabase_functions ,_analytics ,_realtime ]
        extensions:
          - { name: pgcrypto  ,schema: extensions }   # 1.3   : cryptographic functions
          - { name: pg_net    ,schema: extensions }   # 0.7.1 : async HTTP
          - { name: pgjwt     ,schema: extensions }   # 0.2.0 : JSON Web Token API for Postgresql
          - { name: uuid-ossp ,schema: extensions }   # 1.1   : generate universally unique identifiers (UUIDs)
          - { name: pgsodium       }                  # 3.1.8 : pgsodium is a modern cryptography library for Postgres.
          - { name: supabase_vault }                  # 0.2.8 : Supabase Vault Extension
          - { name: pg_graphql     }                  # 1.3.0 : pg_graphql: GraphQL support
    pg_hba_rules:
      - { user: all ,db: supa ,addr: intra       ,auth: pwd ,title: 'allow supa database access from intranet' }
      - { user: all ,db: supa ,addr: 172.0.0.0/8 ,auth: pwd ,title: 'allow supa database access from docker network' }
    pg_extensions:   # required extensions
      - pg_repack_${pg_version}* wal2json_${pg_version}* pgvector_${pg_version}* pg_cron_${pg_version}* pgsodium_${pg_version}*
      - vault_${pg_version}* pg_graphql_${pg_version}* pgjwt_${pg_version}* pg_net_${pg_version}* pgsql-http_${pg_version}*
    pg_libs: 'pg_net, pg_stat_statements, auto_explain'   # add pg_net to shared_preload_libraries
Beware that baseline: supa.sql parameter will use the files/supa.sql as database baseline schema, which is gathered from here.
You also have to run the migration script: migration.sql after the cluster provisioning, which is gathered from supabase/postgres/migrations/db/migrations in chronological order and slightly modified to fit Pigsty.
You can check the latest migration files and add them to migration.sql, the current script is synced with 20231013070755.
You can run migration on provisioned postgres cluster pg-meta with simple psql command:
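A plausible invocation, assuming the supa database and the supabase_admin user defined above:
psql postgres://supabase_admin:DBUser.Supa@10.10.10.10:5432/supa \
  -v ON_ERROR_STOP=1 --single-transaction \
  -f ~/pigsty/app/supabase/migration.sql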
pg edit-config pg-meta --force -p pgsodium.enable_event_trigger='off'   # setup pgsodium event trigger
psql ${PGURL} -c 'SHOW pgsodium.enable_event_trigger;'                  # should be off or false
pg restart pg-meta                                                      # restart pg-meta to enable the new configuration
Everything you need to care about is in the .env file, which contains important settings for supabase. It is already configured to use the pg-meta.supa database by default; you have to change that according to your actual deployment.
############
# Secrets - YOU MUST CHANGE THESE BEFORE GOING INTO PRODUCTION
############
# you have to change the JWT_SECRET to a random string with at least 32 characters long
# and issue new ANON_KEY/SERVICE_ROLE_KEY JWT with that new secret, check the tutorial:
# https://supabase.com/docs/guides/self-hosting/docker#securing-your-services
JWT_SECRET=your-super-secret-jwt-token-with-at-least-32-characters-long
ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJhbm9uIiwKICAgICJpc3MiOiAic3VwYWJhc2UtZGVtbyIsCiAgICAiaWF0IjogMTY0MTc2OTIwMCwKICAgICJleHAiOiAxNzk5NTM1NjAwCn0.dc_X5iR_VP_qT0zsiyj_I_OZ2T9FtRU2BBNWN8Bu4GE
SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJzZXJ2aWNlX3JvbGUiLAogICAgImlzcyI6ICJzdXBhYmFzZS1kZW1vIiwKICAgICJpYXQiOiAxNjQxNzY5MjAwLAogICAgImV4cCI6IDE3OTk1MzU2MDAKfQ.DaYlNEoUrrEn2Ig7tqibS-PHK5vgusbcbo7X36XVt4Q
############
# Dashboard - Credentials for the Supabase Studio WebUI
############
DASHBOARD_USERNAME=supabase     # change to your own username
DASHBOARD_PASSWORD=pigsty       # change to your own password

############
# Database - You can change these to any PostgreSQL database that has logical replication enabled.
############
POSTGRES_HOST=10.10.10.10       # change to Pigsty managed PostgreSQL cluster/instance VIP/IP/Hostname
POSTGRES_PORT=5432              # you can use other service port such as 5433, 5436, 6432, etc...
POSTGRES_DB=supa                # change to supabase database name, `supa` by default in pigsty
POSTGRES_PASSWORD=DBUser.Supa   # supabase dbsu password (shared by multiple supabase biz users)
Usually you’ll have to change these parameters accordingly. Here we’ll use fixed username, password and IP:Port database connstr for simplicity.
The postgres username is fixed as supabase_admin and the password is DBUser.Supa, change that according to your supa.yml
And the supabase studio WebUI credential is managed by DASHBOARD_USERNAME and DASHBOARD_PASSWORD, which is supabase and pigsty by default.
You can use the Primary Service of that cluster through DNS/VIP and other service ports, or whatever access method you like.
You can also configure the supabase.storage service to use the MinIO service managed by Pigsty.
Once configured, you can launch the stateless part with docker-compose or make up shortcut:
cd ~/pigsty/app/supabase; make up # = docker compose up
Expose Service
The supabase studio dashboard is exposed on port 8000 by default, you can add this service to the infra_portal to expose it to the public through Nginx and SSL.
To expose the service, you can run the infra.yml playbook with the nginx tag:
./infra.yml -t nginx
Make sure supa.pigsty or your own domain is resolvable to the infra_portal server, and you can access the supabase studio dashboard via https://supa.pigsty.
18.4 - Kong: the Nginx API Gateway
Learn how to deploy Kong, the API gateway, with Docker Compose and use external PostgreSQL as the backend database
TL;DR
cd app/kong ; docker-compose up -d
make up      # pull up kong with docker-compose
make ui      # run swagger ui container
make log     # tail -f kong logs
make info    # introspect kong with jq
make stop    # stop kong container
make clean   # remove kong container
make rmui    # remove swagger ui container
make pull    # pull latest kong image
make rmi     # remove kong image
make save    # save kong image to /tmp/kong.tgz
make load    # load kong image from /tmp
import psycopg2

conn = psycopg2.connect('postgres://dbuser_dba:DBUser.DBA@10.10.10.10:5432/meta')
cursor = conn.cursor()
cursor.execute('SELECT * FROM pg_stat_activity')
for i in cursor.fetchall():
    print(i)
Alias
make up      # pull up jupyter with docker compose
make dir     # create required /data/jupyter and set owner
make run     # launch jupyter with docker
make view    # print jupyter access point
make log     # tail -f jupyter logs
make info    # introspect jupyter with jq
make stop    # stop jupyter container
make clean   # remove jupyter container
make pull    # pull latest jupyter image
make rmi     # remove jupyter image
make save    # save jupyter image to /tmp/docker/jupyter.tgz
make load    # load jupyter image from /tmp/docker/jupyter.tgz
18.6 - Gitea: Simple Self-Hosting Git Service
Launch the self-hosting Git service with Gitea and Pigsty managed PostgreSQL
make up      # pull up gitea with docker-compose in minimal mode
make run     # launch gitea with docker, local data dir and external PostgreSQL
make view    # print gitea access point
make log     # tail -f gitea logs
make info    # introspect gitea with jq
make stop    # stop gitea container
make clean   # remove gitea container
make pull    # pull latest gitea image
make rmi     # remove gitea image
make save    # save gitea image to /tmp/gitea.tgz
make load    # load gitea image from /tmp
PostgreSQL Preparation
Gitea uses its built-in SQLite as the default metadata storage. You can let Gitea use an external PostgreSQL instead by setting connection string environment variables:
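For the Gitea container, the connection can presumably be supplied via GITEA__database__* environment variables (the database name and user here are illustrative; create them first with bin/pgsql-user / bin/pgsql-db as shown earlier):
GITEA__database__DB_TYPE=postgres
GITEA__database__HOST=10.10.10.10:5432
GITEA__database__NAME=gitea             # illustrative database name
GITEA__database__USER=dbuser_gitea      # illustrative business user
GITEA__database__PASSWD=DBUser.Gitea    # illustrative password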
docker run -p 9000:9000 -p 9001:9001 \
-e "MINIO_ROOT_USER=admin"\
-e "MINIO_ROOT_PASSWORD=pigsty.minio"\
minio/minio server /data --console-address ":9001"
visit http://10.10.10.10:9000 with user admin and password pigsty.minio
make up      # pull up minio with docker-compose
make run     # launch minio with docker
make view    # print minio access point
make log     # tail -f minio logs
make info    # introspect minio with jq
make stop    # stop minio container
make clean   # remove minio container
make pull    # pull latest minio image
make rmi     # remove minio image
make save    # save minio image to /tmp/minio.tgz
make load    # load minio image from /tmp
18.9 - ByteBase: PG Schema Migration
Self-hosting bytebase with PostgreSQL managed by Pigsty
ByteBase
ByteBase is a database schema change (migration) management tool. The following command will start ByteBase on port 8887 of the meta node by default.
make up      # pull up bytebase with docker-compose in minimal mode
make run     # launch bytebase with docker, local data dir and external PostgreSQL
make view    # print bytebase access point
make log     # tail -f bytebase logs
make info    # introspect bytebase with jq
make stop    # stop bytebase container
make clean   # remove bytebase container
make pull    # pull latest bytebase image
make rmi     # remove bytebase image
make save    # save bytebase image to /tmp/bytebase.tgz
make load    # load bytebase image from /tmp
PostgreSQL Preparation
Bytebase uses its internal PostgreSQL database by default. You can use an external PostgreSQL for higher durability.
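One approach, following the same convention as the other templates in this chapter, is to define a business user and database on an existing cluster and point ByteBase at it (names below are illustrative):
bin/pgsql-user pg-meta dbuser_bytebase   # create the bytebase biz user (illustrative name)
bin/pgsql-db   pg-meta bytebase          # create the bytebase biz database (illustrative name)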
If you wish to perform CRUD operations and design more fine-grained permission control, please refer
to Tutorial 1 - The Golden Key to generate a signed JWT.
This is an example of creating pigsty cmdb API with PostgREST
cd ~/pigsty/app/postgrest ; docker-compose up -d
http://10.10.10.10:8884 is the default endpoint for PostgREST
http://10.10.10.10:8883 is the default api docs for PostgREST
make up      # pull up postgrest with docker-compose
make run     # launch postgrest with docker
make ui      # run swagger ui container
make view    # print postgrest access point
make log     # tail -f postgrest logs
make info    # introspect postgrest with jq
make stop    # stop postgrest container
make clean   # remove postgrest container
make rmui    # remove swagger ui container
make pull    # pull latest postgrest image
make rmi     # remove postgrest image
make save    # save postgrest image to /tmp/postgrest.tgz
make load    # load postgrest image from /tmp
Swagger UI
Launch a swagger OpenAPI UI and visualize PostgREST API on 8883 with:
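As listed in the aliases above, the shortcut is:
make ui   # run the swagger ui container against the PostgREST endpoint on port 8883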
covid: Visualizes WHO COVID-19 data and allows you to check pandemic data by country.
isd: NOAA ISD, which provides access to meteorological observation records from 30,000 surface weather stations worldwide since 1901.
Structure
A Pigsty applet provides an installation script in its root directory: install or a related shortcut. You need to run this script as an admin user on the admin node to execute the installation. The installation script will detect the current environment (fetching METADB_URL, PIGSTY_HOME, GRAFANA_ENDPOINT, etc.) to complete the installation.
Typically, dashboards with the APP tag will be listed in the Pigsty Grafana homepage navigation under the Apps dropdown menu, and dashboards with both APP and OVERVIEW tags will be listed in the homepage panel navigation.
19.1 - Analyse CSVLOG Sample with the built-in PGLOG
Analyse CSVLOG Sample with the built-in PGLOG
PGLOG is a sample application included with Pigsty that uses the pglog.sample table in MetaDB as the data source.
Simply populate this table with logs and access the corresponding Dashboard.
Pigsty provides some handy commands to fetch CSV logs and load them into the sample table. The following shortcut commands are available on the master node by default:
catlog [node=localhost] [date=today]   # Print CSV logs to standard output
pglog                                  # Load CSVLOG from standard input
pglog12                                # Load CSVLOG in PG12 format
pglog13                                # Load CSVLOG in PG13 format
pglog14                                # Load CSVLOG in PG14 format (=pglog)

catlog | pglog                         # Analyze logs of the current node for today
catlog node-1 '2021-07-15' | pglog     # Analyze CSV logs of node-1 for 2021-07-15
Next, you can visit the following links to view sample log analysis dashboards:
The catlog command fetches CSV database logs from a specific node for a specific date and writes them to stdout.
By default, catlog fetches logs of the current node for today, but you can specify the node and date via parameters.
By combining pglog and catlog, you can quickly fetch and analyze database CSV logs.
catlog | pglog                         # Analyze logs of the current node for today
catlog node-1 '2021-07-15' | pglog     # Analyze CSV logs of node-1 for 2021-07-15
19.2 - NOAA ISD Station
Fetch, Parse, Analyze, and Visualize Integrated Surface Weather Station Dataset
Including 30,000 meteorological stations with daily and sub-hourly observation records from 1900 to 2023. https://github.com/Vonng/isd
It is recommended to use it with Pigsty, the battery-included PostgreSQL distribution with Grafana & echarts for visualization. It will set up everything for you with make all;
Otherwise, you’ll have to provide your own PostgreSQL instance, and setup grafana dashboards manually.
Export PGURL in your environment to specify the target postgres database:
# the default PGURL for pigsty's meta database, change that accordingly
export PGURL=postgres://dbuser_dba:DBUser.DBA@127.0.0.1:5432/meta?sslmode=disable
psql "${PGURL}" -c 'SELECT 1'# check if connection is ok
make sql # setup postgres schema on target database
Get isd station metadata
The basic station metadata can be downloaded and loaded with:
make reload-station # equivalent to get-station + load-station
Fetch and load isd.daily
To load the isd.daily dataset, which is organized as yearly tarball files,
you can download the raw data from NOAA and parse it with the isd parser:
make get-parser     # download parser binary from github, you can just build with: make build
make reload-daily   # download and reload latest daily data and re-calculate monthly/yearly data
Load Parsed Stable CSV Data
Or just load the pre-parsed stable part from GitHub,
which is well-formatted CSV that does not require the isd parser.
make get-stable    # download stable isd.daily dataset from Github
make load-stable   # load downloaded stable isd.daily dataset into database
More Data
Two parts of the isd dataset need to be regularly updated: the station metadata and isd.daily of the latest year. You can reload them with:
make reload # reload-station + reload-daily
You can download and load isd.daily in a specific year with:
bin/get-daily  2022              # get daily observation summary of a specific year (1900-2023)
bin/load-daily "${PGURL}" 2022   # load daily data of a specific year
You can also download and load isd.hourly in a specific year with:
bin/get-hourly  2022              # get hourly observation record of a specific year (1900-2023)
bin/load-hourly "${PGURL}" 2022   # load hourly data of a specific year
CREATE TABLE IF NOT EXISTS isd.daily
(
    station     VARCHAR(12) NOT NULL, -- station number 6USAF+5WBAN
    ts          DATE        NOT NULL, -- observation date
    -- temperature & dew point
    temp_mean   NUMERIC(3, 1),        -- mean temperature ℃
    temp_min    NUMERIC(3, 1),        -- min temperature ℃
    temp_max    NUMERIC(3, 1),        -- max temperature ℃
    dewp_mean   NUMERIC(3, 1),        -- mean dew point ℃
    -- pressure
    slp_mean    NUMERIC(5, 1),        -- sea level pressure (hPa)
    stp_mean    NUMERIC(5, 1),        -- station pressure (hPa)
    -- visible distance
    vis_mean    NUMERIC(6),           -- visible distance (m)
    -- wind speed
    wdsp_mean   NUMERIC(4, 1),        -- average wind speed (m/s)
    wdsp_max    NUMERIC(4, 1),        -- max wind speed (m/s)
    gust        NUMERIC(4, 1),        -- max wind gust (m/s)
    -- precipitation / snow depth
    prcp_mean   NUMERIC(5, 1),        -- precipitation (mm)
    prcp        NUMERIC(5, 1),        -- rectified precipitation (mm)
    sndp        NUMERIC(5, 1),        -- snow depth (mm)
    -- FRSHTT (Fog/Rain/Snow/Hail/Thunder/Tornado)
    is_foggy    BOOLEAN,              -- (F)og
    is_rainy    BOOLEAN,              -- (R)ain or Drizzle
    is_snowy    BOOLEAN,              -- (S)now or pellets
    is_hail     BOOLEAN,              -- (H)ail
    is_thunder  BOOLEAN,              -- (T)hunder
    is_tornado  BOOLEAN,              -- (T)ornado or Funnel Cloud
    -- record count
    temp_count  SMALLINT,             -- record count for temp
    dewp_count  SMALLINT,             -- record count for dew point
    slp_count   SMALLINT,             -- record count for sea level pressure
    stp_count   SMALLINT,             -- record count for station pressure
    wdsp_count  SMALLINT,             -- record count for wind speed
    visib_count SMALLINT,             -- record count for visible distance
    -- temp marks
    temp_min_f  BOOLEAN,              -- aggregate min temperature
    temp_max_f  BOOLEAN,              -- aggregate max temperature
    prcp_flag   CHAR,                 -- precipitation flag: ABCDEFGHI
    PRIMARY KEY (station, ts)
); -- PARTITION BY RANGE (ts);
ISD Hourly
CREATE TABLE IF NOT EXISTS isd.hourly
(
    station    VARCHAR(12) NOT NULL, -- station id
    ts         TIMESTAMP   NOT NULL, -- timestamp
    -- air
    temp       NUMERIC(3, 1),        -- [-93.2,+61.8]
    dewp       NUMERIC(3, 1),        -- [-98.2,+36.8]
    slp        NUMERIC(5, 1),        -- [8600,10900]
    stp        NUMERIC(5, 1),        -- [4500,10900]
    vis        NUMERIC(6),           -- [0,160000]
    -- wind
    wd_angle   NUMERIC(3),           -- [1,360]
    wd_speed   NUMERIC(4, 1),        -- [0,90]
    wd_gust    NUMERIC(4, 1),        -- [0,110]
    wd_code    VARCHAR(1),           -- code that denotes the character of the WIND-OBSERVATION.
    -- cloud
    cld_height NUMERIC(5),           -- [0,22000]
    cld_code   VARCHAR(2),           -- cloud code
    -- water
    sndp       NUMERIC(5, 1),        -- mm snow
    prcp       NUMERIC(5, 1),        -- mm precipitation
    prcp_hour  NUMERIC(2),           -- precipitation duration in hour
    prcp_code  VARCHAR(1),           -- precipitation type code
    -- sky
    mw_code    VARCHAR(2),           -- manual weather observation code
    aw_code    VARCHAR(2),           -- auto weather observation code
    pw_code    VARCHAR(1),           -- weather code of past period of time
    pw_hour    NUMERIC(2),           -- duration of pw_code period
    -- misc
    -- remark TEXT,
    -- eqd    TEXT,
    data       JSONB                 -- extra data
) PARTITION BY RANGE (ts);
Parser
There are two parsers: isdd and isdh, which take the noaa original yearly tarballs as input and generate CSV as output (which can be directly consumed by the PostgreSQL COPY command).
NAME
        isd -- Integrated Surface Dataset Parser

SYNOPSIS
        isd daily   [-i <input|stdin>] [-o <output|stdout>] [-v]
        isd hourly  [-i <input|stdin>] [-o <output|stdout>] [-v] [-d raw|ts-first|hour-first]

DESCRIPTION
        The isd program takes noaa isd daily/hourly raw tarball data as input,
        and generates parsed data in csv format as output. It works in pipe mode:

        cat data/daily/2023.tar.gz | bin/isd daily -v | psql ${PGURL} -AXtwqc "COPY isd.daily FROM STDIN CSV;"
        isd daily  -v -i data/daily/2023.tar.gz  | psql ${PGURL} -AXtwqc "COPY isd.daily FROM STDIN CSV;"
        isd hourly -v -i data/hourly/2023.tar.gz | psql ${PGURL} -AXtwqc "COPY isd.hourly FROM STDIN CSV;"

OPTIONS
        -i  <input>     input file, stdin by default
        -o  <output>    output file, stdout by default
        -p  <profpath>  pprof file path, enable if specified
        -d              de-duplicate rows for hourly dataset (raw, ts-first, hour-first)
        -v              verbose mode
        -h              print help
UI
ISD Overview
Show all stations on a world map.
ISD Country
Show all stations among a country.
ISD Station
Visualize station metadata and daily/monthly/yearly summary
make          # if local data is available
make all      # full installation, downloading data from the WHO website
make reload   # re-download and load the latest data

Some other subtasks:

make reload     # download latest data and pour it again
make ui         # install grafana dashboards
make sql        # install database schemas
make download   # download latest data
make load       # load downloaded data into database
make reload     # download latest data and pour it into database
curl -fsSL https://repo.pigsty.io/get | bash
cd ~/pigsty; ./bootstrap; ./configure; ./install.yml
Highlight Features
Extension Exploding:
Pigsty now has an unprecedented 340 available extensions for PostgreSQL, including 121 extension RPM packages and 133 DEB packages, surpassing the total number of extensions provided by the PGDG official repository (135 RPM / 109 DEB).
Pigsty has ported unique PG extensions from the EL/DEB system to each other, achieving a great alignment of extension ecosystems between the two major distributions.
A crude list of the extension ecosystem is as follows:
Pigsty v3 allows you to replace the PostgreSQL kernel, currently supporting Babelfish (SQL Server compatible, with wire protocol emulation), IvorySQL (Oracle compatible), and PolarDB for PostgreSQL (RAC).
Additionally, self-hosted Supabase is now available on Debian systems. You can emulate MSSQL (via WiltonDB), Oracle (via IvorySQL), Oracle RAC (via PolarDB), MongoDB (via FerretDB), and Firebase (via Supabase) in Pigsty
with production-grade PostgreSQL clusters featuring HA, IaC, PITR, and monitoring.
Pro Edition:
We now offer PGSTY Pro, a professional edition that provides value-added services on top of the open-source features.
The professional edition includes additional modules: MSSQL, Oracle, Mongo, K8S, Victoria, Kafka, etc., and offers broader support for PG major versions, operating systems, and chip architectures.
It provides offline installation packages customized for precise minor versions of all operating systems, and support for legacy systems like EL7, Debian 11, Ubuntu 20.04.
Major Changes
This Pigsty release updates the major version number from 2.x to 3.0, with several significant changes:
Primary supported operating systems updated to: EL 8 / EL 9 / Debian 12 / Ubuntu 22.04
EL7 / Debian 11 / Ubuntu 20.04 systems are now deprecated and no longer supported.
Online installation is now the default; offline packages are no longer provided, to avoid minor OS version compatibility issues.
The bootstrap process will no longer prompt for downloading offline packages, but if /tmp/pkg.tgz exists, it will still use the offline package automatically.
For offline installation needs, please create offline packages yourself or consider our subscription service.
Unified adjustment of upstream software repositories used by Pigsty, address changes, and GPG signing and verification for all packages.
Standard repository: https://repo.pigsty.io/{apt/yum}
Domestic mirror: https://repo.pigsty.cc/{apt/yum}
API parameter changes and configuration template changes
Configuration templates for EL and Debian systems are now consolidated, with differing parameters managed in the roles/node_id/vars/ directory.
Configuration directory changes, all configuration file templates are now placed in the conf directory and categorized into default, dbms, demo, build.
Docker is now completely treated as a separate module, and will not be downloaded by default
New beta module: KAFKA
New beta module: KUBE
Other New Features
Epic enhancement of PG OLAP analysis capabilities: DuckDB 1.0.0, DuckDB FDW, and PG Lakehouse, Hydra have been ported to the Debian system.
Resolved package build issues for ParadeDB, now available on Debian/Ubuntu.
All required extensions for Supabase are now available on Debian/Ubuntu, making Supabase self-hostable across all OSes.
Provided capability for scenario-based pre-configured extension stacks. If you’re unsure which extensions to install, we offer extension recommendation packages (Stacks) tailored for specific application scenarios.
Created metadata tables, documentation, indexes, and name mappings for all PostgreSQL ecosystem extensions, ensuring alignment and usability for both EL and Debian systems.
Enhanced proxy_env parameter functionality to mitigate DockerHub ban issues, simplifying configuration.
Established a new dedicated software repository offering all extension plugins for versions 12-17, with the PG16 extension repository implemented by default in Pigsty.
Upgraded existing software repositories, employing standard signing and verification mechanisms to ensure package integrity and security. The APT repository adopts a new standard layout built through reprepro.
Provided sandbox environments for 1, 2, 3, 4, 43 nodes: meta, dual, trio, full, prod, and quick configuration templates for 7 major OS Distros.
Add PostgreSQL 17 and pgBouncer 1.23 metrics support in pg_exporter config, adding related dashboard panels.
Add logs panel for PGSQL Pgbouncer / PGSQL Patroni Dashboard
Add new playbook cache.yml to make offline packages, instead of bash bin/cache and bin/release-pkg
API Changes
New parameter option: pg_mode now has several new options:
pgsql: Standard PostgreSQL high availability cluster.
citus: Citus horizontally distributed PostgreSQL native high availability cluster.
gpsql: Monitoring for Greenplum and GP compatible databases (Pro edition).
mssql: Install WiltonDB / Babelfish to provide Microsoft SQL Server compatibility mode for standard PostgreSQL high availability clusters, with wire protocol level support, extensions unavailable.
ivory: Install IvorySQL to provide Oracle compatibility for PostgreSQL high availability clusters, supporting Oracle syntax/data types/functions/stored procedures, extensions unavailable (Pro edition).
polar: Install PolarDB for PostgreSQL (PG RAC) open-source version to support localization database capabilities, extensions unavailable (Pro edition).
New parameter: pg_parameters, used to specify parameters in postgresql.auto.conf at the instance level, overriding cluster configurations for personalized settings on different instance members.
New parameter: pg_files, used to specify additional files to be written to the PostgreSQL data directory, to support license feature required by some kernel forks.
New parameter: repo_extra_packages, used to specify additional packages to download, to be used in conjunction with repo_packages, facilitating the specification of extension lists unique to OS versions.
Parameter renaming: patroni_citus_db renamed to pg_primary_db, used to specify the primary database in the cluster (used in Citus mode).
Parameter enhancement: Proxy server configurations in proxy_env will be written to the Docker Daemon to address internet access issues, and the configure -x option will automatically write the proxy server configuration of the current environment.
Parameter enhancement: Allow using path item in infra_portal entries, to expose local dir as web service rather than proxy to another upstream.
Parameter enhancement: The repo_url_packages in repo.pigsty.io will automatically switch to repo.pigsty.cc when the region is China, addressing internet access issues. Additionally, the downloaded file name can now be specified.
Parameter enhancement: The extension field in pg_databases.extensions now supports both dictionary and extension name string modes. The dictionary mode offers version support, allowing the installation of specific extension versions.
Parameter enhancement: If the repo_upstream parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
Parameter enhancement: If the repo_packages parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
Parameter enhancement: If the infra_packages parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
Parameter enhancement: If the node_default_packages parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
Parameter enhancement: The extensions specified in pg_packages and pg_extensions will now perform a lookup and translation from the pg_package_map defined in rpm.yml or deb.yml.
Parameter enhancement: Packages specified in node_packages and pg_extensions will be upgraded to the latest version upon installation. The default value in node_packages is now [openssh-server], helping to fix the OpenSSH CVE.
Parameter enhancement: pg_dbsu_uid will automatically adjust to 26 (EL) or 543 (Debian) based on the operating system type, avoiding manual adjustments.
Pgbouncer parameter update: max_prepared_statements = 128 enables prepared statement support in transaction pooling mode, and server_lifetime is set to 600.
Patroni template parameter update: uniformly increase max_worker_processes by 8 available backend processes, increase max_wal_senders and max_replication_slots to 50, and raise the OLAP template temporary file size limit to 1/5 of the main disk.
Software Upgrade
The main components of Pigsty are upgraded to the following versions (as of the release time):
Get the latest source with bash -c "$(curl -fsSL http://get.pigsty.cc/latest)"
Download & Extract packages with new download script.
Monitor Enhancement
Split monitoring system into 5 major categories: INFRA, NODES, REDIS, PGSQL, APP
Logging enabled by default
now loki and promtail are enabled by default, with prebuilt loki rpm packages
Models & Labels
A hidden ds prometheus datasource variable is added to all dashboards, so you can easily switch between datasources simply by selecting a new one rather than modifying Grafana datasources & dashboards
An ip label is added for all metrics, and will be used as join key between database metrics & nodes metrics
INFRA Monitoring
Home dashboard for infra: INFRA Overview
Add logging Dashboards : Logs Instance
PGLOG Analysis & PGLOG Session now treated as an example Pigsty APP.
NODES Monitoring Application
If you don't care about databases at all, Pigsty can now be used as standalone host monitoring software!
New Panels: PGSQL Cluster, add 10 key metrics panel (toggled by default)
New Panels: PGSQL Instance, add 10 key metrics panel (toggled by default)
Simplify & Redesign: PGSQL Service
Add cross-references between PGCAT & PGSQL dashboards
[ENHANCEMENT] monitor deploy
Now the grafana datasource is automatically registered during monitor-only (monly) deployment
[ENHANCEMENT] software upgrade
add PostgreSQL 13 to default package list
upgrade to PostgreSQL 14.1 by default
add greenplum rpm and dependencies
add redis rpm & source packages
add perf as default packages
v1.2.0
[ENHANCEMENT] Use PostgreSQL 14 as default version
[ENHANCEMENT] Use TimescaleDB 2.5 as default extension
now timescaledb & postgis are enabled in cmdb by default
[ENHANCEMENT] new monitor-only mode:
you can use pigsty to monitor existing pg instances with a connectable url only
pg_exporter will be deployed on meta node locally
new dashboard PGSQL Cluster Monly for remote clusters
[ENHANCEMENT] Software upgrade
grafana to 8.2.2
pev2 to v0.11.9
promscale to 0.6.2
pgweb to 0.11.9
Add new extensions: pglogical pg_stat_monitor orafce
[ENHANCEMENT] Automatic detect machine spec and use proper node_tune and pg_conf templates
[ENHANCEMENT] Rework on bloat related views, now more information are exposed
[ENHANCEMENT] Remove timescale & citus internal monitoring
[ENHANCEMENT] New playbook pgsql-audit.yml to create audit report.
[BUG FIX] now pgbouncer_exporter resource owner are {{ pg_dbsu }} instead of postgres
[BUG FIX] fix pg_exporter duplicate metrics on pg_table pg_index while executing REINDEX TABLE CONCURRENTLY
[CHANGE] now all config templates are minimize into two: auto & demo. (removed: pub4, pg14, demo4, tiny, oltp )
pigsty-demo is configured if vagrant is the default user, otherwise pigsty-auto is used.
How to upgrade from v1.1.1
There's no API change in 1.2.0. You can still use old pigsty.yml configuration files (PG13).
For the infrastructure part, re-execution of repo will do most of the work.
As for the database. You can still use the existing PG13 instances. In-place upgrade is quite
tricky especially when involving extensions such as PostGIS & Timescale. I would highly recommend
performing a database migration with logical replication.
The new playbook pgsql-migration.yml will make this a lot easier. It will create a series of
scripts which will help you to migrate your cluster with near-zero downtime.
v1.1.1
[ENHANCEMENT] replace timescaledb apache version with timescale version
[ENHANCEMENT] upgrade prometheus to 2.30
[BUG FIX] now pg_exporter config dir’s owner are {{ pg_dbsu }} instead of prometheus
How to upgrade from v1.1.0
The major change in this release is timescaledb, which replaces the old Apache-licensed version with the Timescale-licensed version.
promtail_clean remove existing promtail status during init?
promtail_port default port used by promtail, 9080 by default
promtail_status_file location of promtail status file
promtail_send_url endpoint of loki service which receives log data
v0.8.0
Service Provisioning support is added in this release
New Features
Service provision.
full locale support.
API Changes
Role vip and haproxy are merged into service.
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100                  # default load balance weight (instance level)

# - service - #
pg_services:                    # how to expose postgres service in cluster?
  # primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
  - name: primary               # service name {{ pg_cluster }}_primary
    src_ip: "*"
    src_port: 5433
    dst_port: pgbouncer         # 5433 route to pgbouncer
    check_url: /primary         # primary health check, success when instance is primary
    selector: "[]"              # select all instance as primary service candidate

  # replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
  - name: replica               # service name {{ pg_cluster }}_replica
    src_ip: "*"
    src_port: 5434
    dst_port: pgbouncer
    check_url: /read-only       # read-only health check. (including primary)
    selector: "[]"              # select all instance as replica service candidate
    selector_backup: "[? pg_role == `primary`]"   # primary are used as backup server in replica service

  # default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
  - name: default               # service's actual name is {{ pg_cluster }}-{{ service.name }}
    src_ip: "*"                 # service bind ip address, * for all, vip for cluster virtual ip address
    src_port: 5436              # bind port, mandatory
    dst_port: postgres          # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
    check_method: http          # health check method: only http is available for now
    check_port: patroni         # health check port: patroni|pg_exporter|port_number , patroni by default
    check_url: /primary         # health check url path, / as default
    check_code: 200             # health check http code, 200 as default
    selector: "[]"              # instance selector
    haproxy:                    # haproxy specific fields
      maxconn: 3000             # default front-end connection
      balance: roundrobin       # load balance algorithm (roundrobin by default)
      default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

  # offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
  - name: offline               # service name {{ pg_cluster }}_replica
    src_ip: "*"
    src_port: 5438
    dst_port: postgres
    check_url: /replica         # offline MUST be a replica
    selector: "[? pg_role == `offline` || pg_offline_query ]"          # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
    selector_backup: "[? pg_role == `replica` && !pg_offline_query]"   # replica are used as backup server in offline service

pg_services_extra: []           # extra services to be added

# - haproxy - #
haproxy_enabled: true                # enable haproxy among every cluster members
haproxy_reload: true                 # reload haproxy after config
haproxy_policy: roundrobin           # roundrobin, leastconn
haproxy_admin_auth_enabled: false    # enable authentication for haproxy admin?
haproxy_admin_username: admin        # default haproxy admin username
haproxy_admin_password: admin        # default haproxy admin password
haproxy_exporter_port: 9101          # default admin/exporter port
haproxy_client_timeout: 3h           # client side connection timeout
haproxy_server_timeout: 3h           # server side connection timeout

# - vip - #
vip_mode: none                       # none | l2 | l4
vip_reload: true                     # whether reload service after config
# vip_address: 127.0.0.1             # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24                   # virtual ip address cidr mask (l2 only)
# vip_interface: eth0                # virtual ip network interface (l2 only)
New Options
# - localization - #
pg_encoding: UTF8           # default to UTF8
pg_locale: C                # default to C
pg_lc_collate: C            # default to C
pg_lc_ctype: en_US.UTF8     # default to en_US.UTF8

pg_reload: true             # reload postgres after hba changes
vip_mode: none              # none | l2 | l4
vip_reload: true            # whether reload service after config
Remove Options
haproxy_check_port # covered by service options
haproxy_primary_port
haproxy_replica_port
haproxy_backend_port
haproxy_weight
haproxy_weight_fallback
vip_enabled # replace by vip_mode
Service
pg_services and pg_services_extra define the services in a cluster:
A service has some mandatory fields:
name: service’s name
src_port: which port to listen and expose service?
selector: which instances belonging to this service?
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default               # service's actual name is {{ pg_cluster }}-{{ service.name }}
  src_ip: "*"                 # service bind ip address, * for all, vip for cluster virtual ip address
  src_port: 5436              # bind port, mandatory
  dst_port: postgres          # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
  check_method: http          # health check method: only http is available for now
  check_port: patroni         # health check port: patroni|pg_exporter|port_number , patroni by default
  check_url: /primary         # health check url path, / as default
  check_code: 200             # health check http code, 200 as default
  selector: "[]"              # instance selector
  haproxy:                    # haproxy specific fields
    maxconn: 3000             # default front-end connection
    balance: roundrobin       # load balance algorithm (roundrobin by default)
    default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
Database
Add additional locale support: lc_ctype and lc_collate.
It’s mainly because of pg_trgm ’s weird behavior on i18n characters.
pg_databases:
  - name: meta                      # name is the only required field for a database
    # owner: postgres               # optional, database owner
    # template: template1           # optional, template1 by default
    # encoding: UTF8                # optional, UTF8 by default , must same as template database, leave blank to set to db default
    # locale: C                     # optional, C by default , must same as template database, leave blank to set to db default
    # lc_collate: C                 # optional, C by default , must same as template database, leave blank to set to db default
    # lc_ctype: C                   # optional, C by default , must same as template database, leave blank to set to db default
    allowconn: true                 # optional, true by default, false disable connect at all
    revokeconn: false               # optional, false by default, true revoke connect from public
                                    # (only default user and owner have connect privilege on database)
    # tablespace: pg_default        # optional, 'pg_default' is the default tablespace
    connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
    extensions:                     # optional, extension name and where to create
      - { name: postgis, schema: public }
    parameters:                     # optional, extra parameters with ALTER DATABASE
      enable_partitionwise_join: true
    pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
    comment: pigsty meta database   # optional, comment string for database
v0.7.0
Monitor only deployment support
Overview
Monitor Only Deployment
Now you can monitor existing postgres clusters without the Pigsty provisioning solution.
Integration with other provisioning solutions is available and under further testing.
Database/User Management
Update the user/database definition schema to cover more use cases.
Add pgsql-createdb.yml and pgsql-user.yml to manage users/databases on running clusters.
service_registry: consul              # none | consul | etcd | both
prometheus_options: '--storage.tsdb.retention=30d'   # prometheus cli opts
prometheus_sd_method: consul          # Prometheus service discovery method: static|consul
prometheus_sd_interval: 2s            # Prometheus service discovery refresh interval
pg_offline_query: false               # set to true to allow offline queries on this instance
node_exporter_enabled: true           # enabling Node Exporter
pg_exporter_enabled: true             # enabling PG Exporter
pgbouncer_exporter_enabled: true      # enabling Pgbouncer Exporter
export_binary_install: false          # install Node/PG Exporter via copy binary
dcs_disable_purge: false              # force dcs_exists_action = abort to avoid dcs purge
pg_disable_purge: false               # force pg_exists_action = abort to avoid pg purge
haproxy_weight: 100                   # relative lb weight for backend instance
haproxy_weight_fallback: 1            # primary server weight in replica service group
Obsolete Config Entries
prometheus_metrics_path   # duplicate with exporter_metrics_path
prometheus_retention      # covered by `prometheus_options`
pg_databases:                         # create a business database 'meta'
  - name: meta
    schemas: [meta]                   # create extra schema named 'meta'
    extensions: [{name: postgis}]     # create extra extension postgis
    parameters:                       # overwrite database meta's default search_path
      search_path: public, monitor
New Schema
pg_databases:
  - name: meta                      # name is the only required field for a database
    owner: postgres                 # optional, database owner
    template: template1             # optional, template1 by default
    encoding: UTF8                  # optional, UTF8 by default
    locale: C                       # optional, C by default
    allowconn: true                 # optional, true by default, false disable connect at all
    revokeconn: false               # optional, false by default, true revoke connect from public
                                    # (only default user and owner have connect privilege on database)
    tablespace: pg_default          # optional, 'pg_default' is the default tablespace
    connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
    extensions:                     # optional, extension name and where to create
      - { name: postgis, schema: public }
    parameters:                     # optional, extra parameters with ALTER DATABASE
      enable_partitionwise_join: true
    pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
    comment: pigsty meta database   # optional, comment string for database
Changes
Add new options: template , encoding, locale, allowconn, tablespace, connlimit
Add new option revokeconn, which revoke connect privileges from public for this database
Add comment field for database
Apply Changes
You can create a new database on running postgres clusters with the pgsql-createdb.yml playbook.
Define your new database in config files
Pass new database.name with option pg_database to playbook.
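An invocation along these lines (cluster and database names are illustrative, taken from the schema example above):
./pgsql-createdb.yml -l pg-meta -e pg_database=meta   # create the defined database 'meta' on cluster pg-meta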
pg_users:
  - username: test                  # example production user have read-write access
    password: test                  # example user's password
    options: LOGIN                  # extra options
    groups: [dbrole_readwrite]      # dborole_admin|dbrole_readwrite|dbrole_readonly
    comment: default test user for production usage
    pgbouncer: true                 # add to pgbouncer
New Schema
pg_users:
  # complete example of user/role definition for production user
  - name: dbuser_meta               # example production user have read-write access
    password: DBUser.Meta           # example user's password, can be encrypted
    login: true                     # can login, true by default (should be false for role)
    superuser: false                # is superuser? false by default
    createdb: false                 # can create database? false by default
    createrole: false               # can create role? false by default
    inherit: true                   # can this role use inherited privileges?
    replication: false              # can this role do replication? false by default
    bypassrls: false                # can this role bypass row level security? false by default
    connlimit: -1                   # connection limit, -1 disable limit
    expire_at: '2030-12-31'         # 'timestamp' when this role is expired
    expire_in: 365                  # now + n days when this role is expired (OVERWRITE expire_at)
    roles: [dbrole_readwrite]       # dborole_admin|dbrole_readwrite|dbrole_readonly
    pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
    parameters:                     # user's default search path
      search_path: public
    comment: test user
Changes
username field rename to name
groups field rename to roles
options now split into separate configuration entries:
login, superuser, createdb, createrole, inherit, replication,bypassrls,connlimit
expire_at and expire_in options
pgbouncer option for user is now false by default
Apply Changes
You can create new users on running postgres clusters with the pgsql-createuser.yml playbook.
Define your new users in config files (pg_users)
Pass new user.name with option pg_user to playbook.
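An invocation along these lines (cluster and user names are illustrative, taken from the schema example above):
./pgsql-createuser.yml -l pg-meta -e pg_user=dbuser_meta   # create the defined user 'dbuser_meta' on cluster pg-meta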
You can now customize the default role system, schemas, extensions, and privileges with the following variables (a small override example follows the template below):
Template Configuration
# - system roles - #
pg_replication_username: replicator           # system replication user
pg_replication_password: DBUser.Replicator    # system replication password
pg_monitor_username: dbuser_monitor           # system monitor user
pg_monitor_password: DBUser.Monitor           # system monitor password
pg_admin_username: dbuser_admin               # system admin user
pg_admin_password: DBUser.Admin               # system admin password

# - default roles - #
pg_default_roles:
  - username: dbrole_readonly                 # sample user
    options: NOLOGIN                          # role can not login
    comment: role for readonly access         # comment string

  - username: dbrole_readwrite                # sample user: one object for each user
    options: NOLOGIN
    comment: role for read-write access
    groups: [ dbrole_readonly ]               # read-write includes read-only access

  - username: dbrole_admin                    # sample user: one object for each user
    options: NOLOGIN BYPASSRLS                # admin can bypass row level security
    comment: role for object creation
    groups: [dbrole_readwrite, pg_monitor, pg_signal_backend]

  # NOTE: replicator, monitor, admin password are overwritten by separated config entry
  - username: postgres                        # reset dbsu password to NULL (if dbsu is not postgres)
    options: SUPERUSER LOGIN
    comment: system superuser

  - username: replicator
    options: REPLICATION LOGIN
    groups: [pg_monitor, dbrole_readonly]
    comment: system replicator

  - username: dbuser_monitor
    options: LOGIN CONNECTION LIMIT 10
    comment: system monitor user
    groups: [pg_monitor, dbrole_readonly]

  - username: dbuser_admin
    options: LOGIN BYPASSRLS
    comment: system admin user
    groups: [dbrole_admin]

  - username: dbuser_stats
    password: DBUser.Stats
    options: LOGIN
    comment: business read-only user for statistics
    groups: [dbrole_readonly]

# object created by dbsu and admin will have their privileges properly set
pg_default_privilegs:
  - GRANT USAGE ON SCHEMAS TO dbrole_readonly
  - GRANT SELECT ON TABLES TO dbrole_readonly
  - GRANT SELECT ON SEQUENCES TO dbrole_readonly
  - GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
  - GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
  - GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
  - GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
  - GRANT CREATE ON SCHEMAS TO dbrole_admin
  - GRANT USAGE ON TYPES TO dbrole_admin

# schemas
pg_default_schemas: [monitor]

# extension
pg_default_extensions:
  - {name: 'pg_stat_statements', schema: 'monitor'}
  - {name: 'pgstattuple', schema: 'monitor'}
  - {name: 'pg_qualstats', schema: 'monitor'}
  - {name: 'pg_buffercache', schema: 'monitor'}
  - {name: 'pageinspect', schema: 'monitor'}
  - {name: 'pg_prewarm', schema: 'monitor'}
  - {name: 'pg_visibility', schema: 'monitor'}
  - {name: 'pg_freespacemap', schema: 'monitor'}
  - {name: 'pg_repack', schema: 'monitor'}
  - name: postgres_fdw
  - name: file_fdw
  - name: btree_gist
  - name: btree_gin
  - name: pg_trgm
  - name: intagg
  - name: intarray

# postgres host-based authentication rules
pg_hba_rules:
  - title: allow meta node password access
    role: common
    rules:
      - host all all 10.10.10.10/32 md5

  - title: allow intranet admin password access
    role: common
    rules:
      - host all +dbrole_admin 10.0.0.0/8 md5
      - host all +dbrole_admin 172.16.0.0/12 md5
      - host all +dbrole_admin 192.168.0.0/16 md5

  - title: allow intranet password access
    role: common
    rules:
      - host all all 10.0.0.0/8 md5
      - host all all 172.16.0.0/12 md5
      - host all all 192.168.0.0/16 md5

  - title: allow local read-write access (local production user via pgbouncer)
    role: common
    rules:
      - local all +dbrole_readwrite md5
      - host all +dbrole_readwrite 127.0.0.1/32 md5

  - title: allow read-only user (stats, personal) password directly access
    role: replica
    rules:
      - local all +dbrole_readonly md5
      - host all +dbrole_readonly 127.0.0.1/32 md5

pg_hba_rules_extra: []

# pgbouncer host-based authentication rules
pgbouncer_hba_rules:
  - title: local password access
    role: common
    rules:
      - local all all md5
      - host all all 127.0.0.1/32 md5

  - title: intranet password access
    role: common
    rules:
      - host all all 10.0.0.0/8 md5
      - host all all 172.16.0.0/12 md5
      - host all all 192.168.0.0/16 md5

pgbouncer_hba_rules_extra: []
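As referenced above, a minimal override sketch: the *_extra variables are meant to hold site-specific additions on top of the defaults, so custom HBA rules can be declared without rewriting the whole pg_hba_rules list (the workstation address 10.10.10.88 is illustrative):

pg_hba_rules_extra:                   # assumption: extra rules supplement the default rules above
  - title: allow dba workstation password access
    role: common
    rules:
      - host all all 10.10.10.88/32 md5
pgbouncer_hba_rules_extra:            # same pattern for pgbouncer HBA rules
  - title: allow dba workstation password access
    role: common
    rules:
      - host all all 10.10.10.88/32 md5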
v0.4.0
The second public beta (v0.4.0) of Pigsty is now available! 🎉
Monitoring System
The slim version of the monitoring system consists of 10 essential dashboards:
PG Overview
PG Cluster
PG Service
PG Instance
PG Database
PG Query
PG Table
PG Table Catalog
PG Table Detail
Node
Software upgrade
Upgrade to PostgreSQL 13.1 and Patroni 2.0.1-4; add Citus to the repo.