
Module: INFRA

Standalone, optional infrastructure providing NTP, DNS, observability, and other foundational services for PostgreSQL.

Configuration | Administration | Playbooks | Monitoring | Parameters


Overview

Every Pigsty deployment ships with a set of infrastructure components that serve the managed nodes and database clusters, including:

| Component | Port | Domain | Description |
|-----------|------|--------|-------------|
| Nginx | 80/443 | i.pigsty | Web portal, local software repo, unified entry point |
| Grafana | 3000 | g.pigsty | Visualization platform: dashboards, inspections, data apps |
| VictoriaMetrics | 8428 | p.pigsty | Time-series database with VMUI, Prometheus-API compatible |
| VictoriaLogs | 9428 | - | Centralized log database receiving structured logs pushed by Vector |
| VictoriaTraces | 10428 | - | Trace and event store, usable for slow SQL / request tracing |
| VMAlert | 8880 | - | Alert rule evaluator, triggers alerts on VictoriaMetrics metrics |
| AlertManager | 9059 | a.pigsty | Alert aggregation and routing, receives notifications from VMAlert |
| BlackboxExporter | 9115 | - | ICMP/TCP/HTTP blackbox probing |
| DNSMASQ | 53 | - | DNS server for internal name resolution |
| Chronyd | 123 | - | NTP time server |
| PostgreSQL | 5432 | - | CMDB and default database |
| Ansible | - | - | Runs playbooks, orchestrates all infrastructure |

In Pigsty, the PGSQL module uses several services on INFRA nodes, specifically:

  • Domain names of database clusters and host nodes are resolved by DNSMASQ on the INFRA node.
  • Software installation on database nodes uses the local yum/apt repo hosted by Nginx on the INFRA node.
  • Monitoring metrics of database clusters/nodes are scraped and stored by VictoriaMetrics on the INFRA node, accessible via VMUI / PromQL.
  • Database and node logs are collected by Vector and pushed to VictoriaLogs on the INFRA node, searchable in Grafana.
  • VMAlert evaluates alert rules against metrics in VictoriaMetrics and forwards events to Alertmanager.
  • Users initiate administration of database nodes from the Infra/Admin node using Ansible or other tools:
    • cluster creation, scaling out/in, instance/cluster removal
    • creating business users and databases, modifying services and HBA rules
    • log collection, garbage cleanup, backups, health inspections, etc.
  • Database nodes sync time from the NTP server on the INFRA/ADMIN node by default.
  • Without a dedicated cluster, the HA component Patroni uses etcd on the INFRA node as its DCS.
  • Without a dedicated cluster, the backup component pgBackRest uses MinIO on the INFRA node as an optional centralized backup repository.

Nginx

Nginx is the access entry point for all WebUI services in Pigsty, listening on port 80 of the admin node by default.

Many infrastructure components with a WebUI are exposed through Nginx, such as Grafana, VictoriaMetrics (VMUI), AlertManager, and the HAProxy traffic console; static resources such as the yum/apt repo are also served by Nginx.

Nginx routes requests by domain name to the corresponding upstream component according to infra_portal. If you use other domains, or public domains, adjust them here:

infra_portal:  # domain names and upstream servers
  home         : { domain: i.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:8428" }   # VMUI
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9059" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  vmalert      : { endpoint: "${admin_ip}:8880" }
  #logs         : { domain: logs.pigsty ,endpoint: "${admin_ip}:9428" }
  #minio        : { domain: sss.pigsty  ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }

Pigsty strongly recommends accessing the Pigsty UI via domain names rather than raw IP+port, for several reasons:

  • Domain names make it easy to enable HTTPS encryption, funnel all access through Nginx, audit every request, and integrate authentication conveniently.
  • Some components listen only on 127.0.0.1 by default and can therefore only be reached through the Nginx proxy.
  • Domain names are easier to remember and offer extra configuration flexibility.

If you have no usable Internet domain or local DNS resolution, you can add static resolution records to /etc/hosts (macOS/Linux) or C:\Windows\System32\drivers\etc\hosts (Windows).
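For example, assuming the default placeholder IP 10.10.10.10 as the INFRA node address (substitute your own), the static records might look like:

```
# /etc/hosts — map Pigsty portal domains to the INFRA node IP (10.10.10.10 is an assumed example)
10.10.10.10 i.pigsty g.pigsty p.pigsty a.pigsty
```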

Nginx-related configuration parameters: Config: INFRA - NGINX


Local Software Repo

During installation, Pigsty first builds a local software repo to speed up subsequent software installs.

The repo is served by Nginx, located at /www/pigsty by default, and available at http://i.pigsty/pigsty.

Pigsty's offline package is simply the finished repo directory (yum/apt) packed into a tarball. When Pigsty builds the local repo, if the directory /www/pigsty already exists with the marker file /www/pigsty/repo_complete, the repo is considered complete and downloads from the original upstreams are skipped, removing the dependency on Internet access.

The repo definition file is at /www/pigsty.repo and is available at http://${admin_ip}/pigsty.repo by default:

curl -L http://i.pigsty/pigsty.repo -o /etc/yum.repos.d/pigsty.repo

You can also use the file-based local repo directly, without Nginx:

[pigsty-local]
name=Pigsty local $releasever - $basearch
baseurl=file:///www/pigsty/
enabled=1
gpgcheck=0

Local software repo parameters: Config: INFRA - REPO


Victoria Observability Stack

Pigsty v4.0 replaces Prometheus/Loki with the VictoriaMetrics family, providing unified metrics, logging, and tracing:

  • VictoriaMetrics listens on port 8428 by default; VMUI is available at http://p.pigsty or https://i.pigsty/vmetrics/, compatible with the Prometheus API.
  • VMAlert evaluates the alert rules in /infra/rules/*.yml, listens on port 8880, and sends alert events to Alertmanager.
  • VictoriaLogs listens on port 9428, with a query UI at https://i.pigsty/vlogs/. All nodes run Vector by default, structuring system logs, PostgreSQL logs, etc. and pushing them to VictoriaLogs.
  • VictoriaTraces listens on port 10428 for slow SQL / trace collection; Grafana accesses it as a Jaeger data source.
  • Alertmanager listens on port 9059; manage alert notifications at http://a.pigsty or https://i.pigsty/alertmgr/. Once SMTP, webhooks, etc. are configured, it can push messages.
  • Blackbox Exporter listens on port 9115 by default for Ping/TCP/HTTP probes, available at https://i.pigsty/blackbox/.

For more information, see: Config: INFRA - VICTORIA and Config: INFRA - PROMETHEUS
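Since VictoriaMetrics speaks the Prometheus query API, you can query it with plain HTTP. A minimal sketch, assuming the default port 8428 and an illustrative node address and metric name:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def instant_query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus-compatible /api/v1/query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

def parse_vector(payload: dict) -> list:
    """Extract (labels, value) pairs from a Prometheus 'vector' query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]

# Usage against a live INFRA node (address is an assumption):
#   with urlopen(instant_query_url("http://10.10.10.10:8428", "up")) as resp:
#       print(parse_vector(json.load(resp)))
```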


Grafana

Grafana is the core of Pigsty's WebUI, listening on port 3000 by default; access it directly via IP:3000 or the domain http://g.pigsty.

Pigsty ships preconfigured data sources for VictoriaMetrics / Logs / Traces (vmetrics-*, vlogs-*, vtraces-*) and a large set of dashboards, linked together through URL-based navigation for quickly locating problems.

Grafana can also serve as a general low-code visualization platform, so Pigsty installs plugins such as ECharts and victoriametrics-datasource by default, making it easy to build monitoring dashboards or inspection reports.

Grafana-related parameters: Config: INFRA - GRAFANA


Ansible

Pigsty installs Ansible on the admin node by default. Ansible is a popular ops tool whose declarative configuration style and idempotent playbook design greatly reduce the complexity of system maintenance.


DNSMASQ

DNSMASQ provides DNS resolution within the environment; domain names of other modules are registered with the DNSMASQ service on INFRA nodes.

DNS records are placed in the /etc/hosts.d/ directory on all INFRA nodes by default.

DNSMASQ-related parameters: Config: INFRA - DNS


Chronyd

The NTP service synchronizes time across all nodes in the environment (optional).

NTP-related parameters: Config: NODES - NTP


Configuration

To install the INFRA module on a node, first add it to the infra group in the inventory and assign an instance number infra_seq:

# single INFRA node
infra: { hosts: { 10.10.10.10: { infra_seq: 1 } }}

# two INFRA nodes
infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }

Then initialize the INFRA module on the node with the infra.yml playbook.


Administration

Here are some administration tasks related to the INFRA module:


Install / Remove the INFRA Module

./infra.yml     # install the INFRA module on the infra group
./infra-rm.yml  # remove the INFRA module from the infra group

Manage the Local Software Repo

Use the following playbook subtasks to manage the local yum/apt repo on Infra nodes:

./infra.yml -t repo              # create the local repo from the Internet or offline package

./infra.yml -t repo_dir          # create the local repo directory
./infra.yml -t repo_check        # check whether the local repo already exists
./infra.yml -t repo_prepare      # if it exists, use the existing local repo as-is
./infra.yml -t repo_build        # if it does not exist, build it from upstream
./infra.yml -t repo_upstream     # handle upstream repo files in /etc/yum.repos.d
./infra.yml -t repo_remove       # remove existing repo files if repo_remove == true
./infra.yml -t repo_add          # add upstream repo files to /etc/yum.repos.d (or /etc/apt/sources.list.d)
./infra.yml -t repo_url_pkg      # download packages defined by repo_url_packages from the Internet
./infra.yml -t repo_cache        # build upstream repo metadata cache with yum makecache / apt update
./infra.yml -t repo_boot_pkg     # install bootstrap packages such as createrepo_c, yum-utils, ... (or dpkg-)
./infra.yml -t repo_pkg          # download packages & dependencies from upstream repos
./infra.yml -t repo_create       # create the local repo with createrepo_c & modifyrepo_c
./infra.yml -t repo_use          # add the newly built repo to /etc/yum.repos.d | /etc/apt/sources.list.d and use it
./infra.yml -t repo_nginx        # if no nginx is serving, launch one as a web server

The most frequently used commands are:

./infra.yml -t repo_upstream     # add upstream repos defined in repo_upstream to the INFRA node
./infra.yml -t repo_pkg          # download packages and their dependencies from upstream repos
./infra.yml -t repo_create       # create/update the local yum repo with createrepo_c & modifyrepo_c

Manage Infrastructure Components

Use the following playbook subtasks to manage individual infrastructure components on Infra nodes:

./infra.yml -t infra           # configure infrastructure
./infra.yml -t infra_env       # configure env vars on the admin node: env_dir, env_pg, env_var
./infra.yml -t infra_pkg       # install packages required by INFRA: infra_pkg_yum, infra_pkg_pip
./infra.yml -t infra_user      # set up the infra OS user group
./infra.yml -t infra_cert      # issue certificates for infra components
./infra.yml -t dns             # configure DNSMasq: dns_config, dns_record, dns_launch
./infra.yml -t nginx           # configure Nginx: nginx_config, nginx_cert, nginx_static, nginx_launch, nginx_exporter
./infra.yml -t victoria        # configure VictoriaMetrics/Logs/Traces: vmetrics|vlogs|vtraces|vmalert
./infra.yml -t alertmanager    # configure AlertManager: alertmanager_config, alertmanager_launch
./infra.yml -t blackbox        # configure Blackbox Exporter: blackbox_launch
./infra.yml -t grafana         # configure Grafana: grafana_clean, grafana_config, grafana_plugin, grafana_launch, grafana_provision
./infra.yml -t infra_register  # register infra components with VictoriaMetrics / Grafana

Other common tasks include:

./infra.yml -t nginx_index                        # re-render the Nginx home page
./infra.yml -t nginx_config,nginx_reload          # re-render the Nginx portal config to expose new upstream services
./infra.yml -t vmetrics_config,vmetrics_launch    # regenerate the VictoriaMetrics config and restart the service
./infra.yml -t vlogs_config,vlogs_launch          # re-render the VictoriaLogs config
./infra.yml -t vmetrics_clean                     # wipe the VictoriaMetrics data directory
./infra.yml -t grafana_plugin                     # download Grafana plugins from the Internet (may require a proxy)

Playbooks

Pigsty provides three playbooks related to the INFRA module:

  • infra.yml: initialize pigsty infrastructure on infra nodes
  • infra-rm.yml: remove infrastructure components from infra nodes
  • install.yml: install Pigsty on the current node in one pass

infra.yml

The INFRA playbook infra.yml initializes pigsty infrastructure on Infra nodes.

Running this playbook performs the following tasks:

  • Configure directories and environment variables on the admin node
  • Download and build a local software repo to speed up later installs (the download phase is skipped when using the offline package)
  • Enroll the current admin node into Pigsty management as a regular node
  • Deploy infrastructure components, including VictoriaMetrics/Logs/Traces, VMAlert, Grafana, Alertmanager, Blackbox Exporter, etc.

The playbook runs on INFRA nodes by default:

  • Pigsty uses the node that runs this playbook as its Infra and Admin node by default.
  • During configure, Pigsty marks the current node as the Infra/Admin node and replaces the placeholder IP 10.10.10.10 in the config template with the node's primary IP address.
  • Apart from initiating administration and hosting infrastructure, this node is no different from a regular managed node.
  • In single-node installs, ETCD is also installed on this node to provide DCS service.

Caveats for this playbook:

  • The playbook is idempotent; re-running it will wipe the infrastructure components on the admin node.
  • To keep historical monitoring data, set vmetrics_clean, vlogs_clean, and vtraces_clean to false first.
  • When the offline repo marker /www/pigsty/repo_complete exists, the playbook skips downloading software from the Internet. A full run takes roughly 5-8 minutes, depending on hardware.
  • Downloading directly from the original upstreams without the offline package may take 10-20 minutes, depending on network conditions.



infra-rm.yml

The INFRA playbook infra-rm.yml removes pigsty infrastructure from Infra nodes.

Common subtasks include:

./infra-rm.yml               # remove the INFRA module
./infra-rm.yml -t service    # stop infrastructure services on INFRA
./infra-rm.yml -t data       # remove data left on INFRA
./infra-rm.yml -t package    # uninstall packages installed on INFRA

install.yml

The INFRA playbook install.yml installs Pigsty on all nodes in one pass.

It is described in more detail in Playbook: One-Pass Installation.


Monitoring

  • Pigsty Home: Pigsty monitoring system home page (pigsty.jpg)
  • INFRA Overview: Pigsty infrastructure self-monitoring overview (infra-overview.jpg)
  • Nginx Instance: Nginx metrics and logs (nginx-overview.jpg)
  • Grafana Instance: Grafana metrics and logs (grafana-overview.jpg)
  • VictoriaMetrics Instance: VictoriaMetrics scrape, query, and storage metrics
  • VMAlert Instance: alert rule evaluation and queue status
  • Alertmanager Instance: alert aggregation, notification pipeline, and silences
  • VictoriaLogs Instance: log ingestion rate, query load, and index hits
  • VictoriaTraces Instance: trace/KV storage and the Jaeger interface
  • Logs Instance: node log search based on Vector + VictoriaLogs (logs-instance.jpg)
  • CMDB Overview: CMDB visualization (cmdb-overview.jpg)
  • ETCD Overview: etcd metrics and logs (etcd-overview.jpg)


Parameters

The INFRA module has the following 10 parameter groups:

  • META: Pigsty metadata
  • CA: self-signed PKI / CA
  • INFRA_ID: infrastructure portal, Nginx domains
  • REPO: local software repo
  • INFRA_PACKAGE: infrastructure packages
  • NGINX: Nginx web server
  • DNS: DNSMASQ name server
  • VICTORIA: VictoriaMetrics / Logs / Traces stack
  • PROMETHEUS: Alertmanager and Blackbox Exporter
  • GRAFANA: Grafana observability suite

Parameter Quick Reference

To stay consistent with your Pigsty version, see the Parameter List for the latest defaults, types, and level descriptions.

1 - System Architecture

The overall architecture, functional components, and division of responsibilities of the INFRA module in Pigsty.

Architecture Overview

A standard Pigsty deployment carries one INFRA module serving the managed nodes and database clusters:

  • Nginx: as a web server, hosts the local software repo; as a reverse proxy, funnels access to the Grafana, VMUI, Alertmanager, and other WebUIs through a unified entry point.
  • Grafana: visualization platform presenting metrics, logs, and traces; hosts monitoring dashboards, inspection reports, and custom data applications.
  • VictoriaMetrics stack: the unified observability platform.
    • VictoriaMetrics: scrapes all monitoring metrics, compatible with the Prometheus API, with a query UI via VMUI.
    • VMAlert: evaluates alert rules and pushes events to Alertmanager.
    • VictoriaLogs: centralized log collection and storage; all nodes run Vector by default, pushing system and database logs here.
    • VictoriaTraces: collects slow SQL, service traces, and other tracing data.
    • AlertManager: aggregates alert events and dispatches notifications via email, webhook, and other channels.
    • BlackboxExporter: probes the reachability of IPs/VIPs/URLs.
  • DNSMASQ: DNS resolution for domain names used inside Pigsty.
  • Chronyd: NTP time synchronization, keeping all nodes' clocks consistent.

pigsty-arch.jpg

The INFRA module is not mandatory for highly-available PostgreSQL; for example, the slim install mode does not install it. But INFRA provides the supporting services needed to run production-grade HA PostgreSQL clusters, so enabling it is strongly recommended for the full Pigsty DBaaS experience.

If you already have your own infrastructure (Nginx, local repo, monitoring, DNS, NTP), you can also disable the INFRA module and point Pigsty at your existing infrastructure through configuration.

| Component | Port | Default Domain | Description |
|-----------|------|----------------|-------------|
| Nginx | 80/443 | i.pigsty | Web portal, local repo |
| Grafana | 3000 | g.pigsty | Visualization platform |
| VictoriaMetrics | 8428 | p.pigsty | Time-series database (VMUI, Prometheus-compatible) |
| VictoriaLogs | 9428 | - | Log database (receives Vector pushes) |
| VictoriaTraces | 10428 | - | Tracing / slow SQL store |
| VMAlert | 8880 | - | Metric computation, alert rule evaluation |
| AlertManager | 9059 | a.pigsty | Alert aggregation and routing |
| BlackboxExporter | 9115 | - | Blackbox monitoring probes |
| DNSMasq | 53 | - | DNS server |
| Chronyd | 123 | - | NTP time server |

Nginx

Nginx is the access entry point for all WebUI services in Pigsty, serving HTTP / HTTPS on ports 80 / 443 by default.

Infrastructure components with a WebUI can be uniformly exposed through Nginx, such as Grafana, VictoriaMetrics (VMUI), AlertManager, and the HAProxy console; static resources such as the local yum/apt repo are also served by Nginx.

Nginx configures local web servers and reverse proxy servers according to the infra_portal definition. By default, it exposes the Pigsty admin home page: i.pigsty

infra_portal:
  home : { domain: i.pigsty }

Pigsty allows rich customization of Nginx: use it as a local file server or reverse proxy, with self-signed or real HTTPS certificates.

Below is the Nginx configuration used by the public Pigsty demo site demo.pigsty.cc. You can listen on different domains on Nginx and expose different web services through a unified entry point via reverse proxying:

infra_portal:                     # domain names and upstream servers
  home         : { domain: home.pigsty.cc                                                 ,certbot: pigsty.demo }
  grafana      : { domain: demo.pigsty.cc ,endpoint: "${admin_ip}:3000", websocket: true  ,certbot: pigsty.demo }
  prometheus   : { domain: p.pigsty.cc    ,endpoint: "${admin_ip}:8428"                   ,certbot: pigsty.demo }
  alertmanager : { domain: a.pigsty.cc    ,endpoint: "${admin_ip}:9059"                   ,certbot: pigsty.demo }
  blackbox     : { endpoint: "${admin_ip}:9115"                                                               }
  vmalert      : { endpoint: "${admin_ip}:8880"                                                               }
  postgrest    : { domain: api.pigsty.cc  ,endpoint: "127.0.0.1:8884"                                         }
  pgadmin      : { domain: adm.pigsty.cc  ,endpoint: "127.0.0.1:8885"                                         }
  pgweb        : { domain: cli.pigsty.cc  ,endpoint: "127.0.0.1:8886"                                         }
  bytebase     : { domain: ddl.pigsty.cc  ,endpoint: "127.0.0.1:8887"                                         }
  jupyter      : { domain: lab.pigsty.cc  ,endpoint: "127.0.0.1:8888"   ,websocket: true                      }
  gitea        : { domain: git.pigsty.cc  ,endpoint: "127.0.0.1:8889"                     ,certbot: pigsty.cc }
  wiki         : { domain: wiki.pigsty.cc ,endpoint: "127.0.0.1:9002"                     ,certbot: pigsty.cc }
  noco         : { domain: noco.pigsty.cc ,endpoint: "127.0.0.1:9003"                     ,certbot: pigsty.cc }
  supa         : { domain: supa.pigsty.cc ,endpoint: "10.2.82.163:8000" ,websocket: true  ,certbot: pigsty.cc }
  dify         : { domain: dify.pigsty.cc ,endpoint: "10.2.82.163:8001" ,websocket: true  ,certbot: pigsty.cc }
  odoo         : { domain: odoo.pigsty.cc ,endpoint: "127.0.0.1:8069"   ,websocket: true  ,certbot: pigsty.cc }
  mm           : { domain: mm.pigsty.cc   ,endpoint: "10.2.82.163:8065" ,websocket: true                      }
  web.io:
    domain: en.pigsty.cc
    path: "/www/web.io"
    certbot: pigsty.doc
    enforce_https: true
    config: |
      # rewrite /zh/ to /
          location /zh/ {
              rewrite ^/zh/(.*)$ /$1 permanent;
          }
  web.cc:
    domain: pigsty.cc
    path: "/www/web.cc"
    domains: [ zh.pigsty.cc ]
    certbot: pigsty.doc
    config: |
      # rewrite /zh/ to /
          location /zh/ {
              rewrite ^/zh/(.*)$ /$1 permanent;
          }
  repo:
    domain: pro.pigsty.cc
    path: "/www/repo"
    index: true
    certbot: pigsty.doc

For more information, see: Tutorial: Nginx: Exposing Web Services via Proxy and Tutorial: Certbot: Requesting and Renewing HTTPS Certificates


Local Software Repo

During installation, Pigsty creates a local software repo on the Infra node by default to speed up subsequent software installs.

The repo lives in the /www/pigsty directory by default, is served by Nginx, and is available at http://i.pigsty/pigsty.

Pigsty's offline package is the finished repo directory packed into a tarball: when Pigsty builds the local repo, if /www/pigsty already exists with the marker file /www/pigsty/repo_complete, the repo is considered complete and downloads from the original upstreams are skipped, removing the dependency on Internet access.

The repo definition file is at /www/pigsty.repo, available at http://${admin_ip}/pigsty.repo by default:

curl -L http://i.pigsty/pigsty.repo -o /etc/yum.repos.d/pigsty.repo

You can also use the file-based local repo directly, without Nginx:

[pigsty-local]
name=Pigsty local $releasever - $basearch
baseurl=file:///www/pigsty/
enabled=1
gpgcheck=0

For more information, see: Config: INFRA - REPO


Victoria Observability Stack

Pigsty v4.0 replaces Prometheus/Loki with VictoriaMetrics-family components, providing a unified observability platform:

  • VictoriaMetrics: listens on port 8428 by default; VMUI at http://p.pigsty or https://i.pigsty/vmetrics/; compatible with PromQL, the remote read/write protocols, and the Alertmanager API.
  • VMAlert: runs alert rules on port 8880 and sends events to Alertmanager.
  • VictoriaLogs: listens on port 9428 by default; search logs at https://i.pigsty/vlogs/. Vector on each node structures system logs, PostgreSQL logs, etc. and pushes them here.
  • VictoriaTraces: listens on port 10428, with a Jaeger-compatible interface for analyzing slow SQL and traces.
  • Alertmanager: listens on port 9059; manage alert routing and notifications at http://a.pigsty or https://i.pigsty/alertmgr/.
  • Blackbox Exporter: listens on port 9115 by default for ICMP/TCP/HTTP blackbox probing.

For more information, see: Config: INFRA - VICTORIA and Config: INFRA - PROMETHEUS


Grafana

Grafana is the core of the Pigsty WebUI, listening on port 3000 by default, accessible via IP:3000 or http://g.pigsty.

Pigsty ships dashboards built on VictoriaMetrics / Logs / Traces, with one-click drill-down and roll-up via URL navigation to help locate faults quickly.

Grafana can also serve as a low-code visualization platform, so plugins such as ECharts, victoriametrics-datasource, and victorialogs-datasource are installed by default, and the Victoria data sources are registered uniformly as vmetrics-*, vlogs-*, and vtraces-*, making custom dashboards easy to extend.

For more information, see: Config: INFRA - GRAFANA


Ansible

Pigsty installs Ansible on the admin node by default. Ansible is a popular ops tool whose declarative configuration style and idempotent playbook design greatly reduce the complexity of system maintenance.


DNSMASQ

DNSMASQ provides DNS resolution within the environment; domain names of other modules are registered with the DNSMASQ service on INFRA nodes.

DNS records are placed in the /etc/hosts.d/ directory on all INFRA nodes by default.

For more information, see: Config: INFRA - DNS and Tutorial: DNS: Configuring Name Resolution


Chronyd

The NTP service synchronizes time across all nodes in the environment (optional).

For more information, see: Config: NODES - NTP

2 - Cluster Configuration

How to configure Infra nodes, customize the Nginx server and the contents of the local software repo, and configure DNS, NTP, and monitoring components.

Notes

INFRA mainly provides monitoring infrastructure and is optional for PostgreSQL databases.

Unless you have manually configured dependencies on the INFRA node's DNS/NTP services, an INFRA module failure normally does not affect running PostgreSQL clusters.

A single INFRA node is enough for most scenarios; for production, 2-3 INFRA nodes are recommended for high availability.

To improve resource utilization, the ETCD module that PostgreSQL HA depends on can share nodes with the INFRA module.

Using more than 3 INFRA nodes brings little benefit, but more ETCD nodes (e.g. 5) can improve DCS availability.


Configuration Examples

Add node IPs to the infra group in the inventory and assign Infra instance numbers infra_seq.

Default single-INFRA-node configuration:

all:
  children:
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } }}

By default, the 10.10.10.10 placeholder is replaced with the current node's primary IP address during configure.

Initialize the INFRA module on the node with the infra.yml playbook.

More Nodes

Two INFRA nodes:

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }
        10.10.10.11: { infra_seq: 2 }

Three INFRA nodes (with parameters):

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }
        10.10.10.11: { infra_seq: 2, repo_enabled: false }
        10.10.10.12: { infra_seq: 3, repo_enabled: false }
      vars:
        grafana_clean: false
        vmetrics_clean: false
        vlogs_clean: false
        vtraces_clean: false

Infra High Availability

Most components in the Infra module are "stateless or identically stateful"; for such components, high availability reduces to a load-balancing problem.

HA can be implemented with a Keepalived L2 VIP or HAProxy L4 load balancing. On an L2-connected network, a Keepalived L2 VIP is recommended.

Example configuration:

infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }
    10.10.10.12: { infra_seq: 3 }
  vars:
    vip_enabled: true
    vip_vrid: 128
    vip_address: 10.10.10.8
    vip_interface: eth1

    infra_portal:
      home         : { domain: i.pigsty }
      grafana      : { domain: g.pigsty ,endpoint: "10.10.10.8:3000" , websocket: true }
      prometheus   : { domain: p.pigsty ,endpoint: "10.10.10.8:8428" }
      alertmanager : { domain: a.pigsty ,endpoint: "10.10.10.8:9059" }
      blackbox     : { endpoint: "10.10.10.8:9115" }
      vmalert      : { endpoint: "10.10.10.8:8880" }

Set the VIP-related parameters and point each Infra service endpoint in infra_portal at the VIP.


Nginx Configuration

See Nginx parameters and Tutorial: Nginx.


Local Repo Configuration

See Repo parameters.


DNS Configuration

See DNS parameters and Tutorial: DNS.


NTP Configuration

See NTP parameters.

3 - Parameter List

The INFRA module provides 10 groups with 70+ configuration parameters.

The INFRA module configures Pigsty's infrastructure components: the local software repo, Nginx, DNSMasq, VictoriaMetrics, VictoriaLogs, Grafana, Alertmanager, Blackbox Exporter, and other monitoring and alerting infrastructure.

Pigsty v4.0 replaces Prometheus with VictoriaMetrics and Loki with VictoriaLogs for a better observability stack.

| Group | Description |
|-------|-------------|
| META | Pigsty metadata: version, admin IP, region, language, proxy |
| CA | self-signed CA certificate management |
| INFRA_ID | infrastructure node identity and service portal |
| REPO | local software repo configuration |
| INFRA_PACKAGE | packages installed on infrastructure nodes |
| NGINX | Nginx web server and reverse proxy configuration |
| DNS | DNSMasq name resolution service configuration |
| VICTORIA | VictoriaMetrics/Logs/Traces observability stack |
| PROMETHEUS | Alertmanager and Blackbox Exporter |
| GRAFANA | Grafana visualization platform configuration |

Parameter Overview

The META parameter group defines Pigsty metadata: version, admin node IP, repo mirror region, default language, and proxy settings.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| version | string | G | pigsty version string |
| admin_ip | ip | G | admin node IP address |
| region | enum | G | upstream mirror region: default, china, europe |
| language | enum | G | default language, en or zh |
| proxy_env | dict | G | global proxy env vars used when downloading packages |

The CA parameter group configures Pigsty's self-signed CA: whether to create one, the CA name, and certificate validity.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| ca_create | bool | G | create a CA if absent? default true |
| ca_cn | string | G | CA CN name, fixed as pigsty-ca |
| cert_validity | interval | G | certificate validity, 20 years by default |

The INFRA_ID parameter group defines infrastructure node identity: node sequence number, portal configuration, and data directory.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| infra_seq | int | I | infra node sequence number, REQUIRED identity parameter |
| infra_portal | dict | G | infrastructure services exposed through the Nginx portal |
| infra_data | path | G | infrastructure data directory, default /data/infra |

The REPO parameter group configures the local software repo: enable switch, directory paths, upstream definitions, and package lists.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| repo_enabled | bool | G/I | build a software repo on this infra node? |
| repo_home | path | G | repo home directory, default /www |
| repo_name | string | G | repo name, default pigsty |
| repo_endpoint | url | G | repo access point: domain or ip:port |
| repo_remove | bool | G/A | remove existing upstream repo definitions when building? |
| repo_modules | string | G/A | enabled upstream repo modules, comma-separated |
| repo_upstream | upstream[] | G | upstream repo definitions: where to download packages from |
| repo_packages | string[] | G | which packages to download from upstream |
| repo_extra_packages | string[] | G/C/I | extra packages to download from upstream |
| repo_url_packages | string[] | G | extra packages downloaded via URL |

The INFRA_PACKAGE parameter group defines packages installed on infrastructure nodes, including RPM/DEB and PIP packages.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| infra_packages | string[] | G | packages to install on infrastructure nodes |
| infra_packages_pip | string | G | packages to install via pip on infrastructure nodes |

The NGINX parameter group configures the Nginx web server and reverse proxy: enable switch, ports, SSL mode, certificates, and basic auth.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| nginx_enabled | bool | G/I | enable nginx on this infra node? |
| nginx_clean | bool | G/A | clean existing nginx config during init? |
| nginx_exporter_enabled | bool | G/I | enable nginx_exporter on this infra node? |
| nginx_exporter_port | port | G | nginx_exporter listen port, default 9113 |
| nginx_sslmode | enum | G | nginx SSL mode? disable, enable, enforce |
| nginx_cert_validity | duration | G | nginx self-signed cert validity, default 397d |
| nginx_home | path | G | nginx content dir, default /www, symlinked to nginx_data |
| nginx_data | path | G | actual nginx data dir, default /data/nginx |
| nginx_users | dict | G | nginx basic-auth users: username/password dict |
| nginx_port | port | G | nginx listen port, default 80 |
| nginx_ssl_port | port | G | nginx SSL listen port, default 443 |
| certbot_sign | bool | G/A | sign certificates with certbot? |
| certbot_email | string | G/A | certbot notification email address |
| certbot_options | string | G/A | extra certbot command-line options |

The DNS parameter group configures the DNSMasq name resolution service: enable switch, listen port, and dynamic DNS records.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| dns_enabled | bool | G/I | set up dnsmasq on this infra node? |
| dns_port | port | G | DNS server listen port, default 53 |
| dns_records | string[] | G | dynamic DNS records resolved by dnsmasq |

The VICTORIA parameter group configures the VictoriaMetrics/Logs/Traces observability stack: enable switches, ports, data retention, etc.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| vmetrics_enabled | bool | G/I | enable VictoriaMetrics on this infra node? |
| vmetrics_clean | bool | G/A | clean VictoriaMetrics data during init? |
| vmetrics_port | port | G | VictoriaMetrics listen port, default 8428 |
| vmetrics_scrape_interval | interval | G | global scrape interval, default 10s |
| vmetrics_scrape_timeout | interval | G | global scrape timeout, default 8s |
| vmetrics_options | arg | G | extra VictoriaMetrics command-line options |
| vlogs_enabled | bool | G/I | enable VictoriaLogs on this infra node? |
| vlogs_clean | bool | G/A | clean VictoriaLogs data during init? |
| vlogs_port | port | G | VictoriaLogs listen port, default 9428 |
| vlogs_options | arg | G | extra VictoriaLogs command-line options |
| vtraces_enabled | bool | G/I | enable VictoriaTraces on this infra node? |
| vtraces_clean | bool | G/A | clean VictoriaTraces data during init? |
| vtraces_port | port | G | VictoriaTraces listen port, default 10428 |
| vtraces_options | arg | G | extra VictoriaTraces command-line options |
| vmalert_enabled | bool | G/I | enable VMAlert on this infra node? |
| vmalert_port | port | G | VMAlert listen port, default 8880 |
| vmalert_options | arg | G | extra VMAlert command-line options |

The PROMETHEUS parameter group configures Alertmanager and Blackbox Exporter, providing alert management and network probing.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| blackbox_enabled | bool | G/I | set up blackbox_exporter on this infra node? |
| blackbox_port | port | G | blackbox_exporter listen port, default 9115 |
| blackbox_options | arg | G | extra blackbox_exporter command-line options |
| alertmanager_enabled | bool | G/I | set up alertmanager on this infra node? |
| alertmanager_port | port | G | AlertManager listen port, default 9059 |
| alertmanager_options | arg | G | extra alertmanager command-line options |
| exporter_metrics_path | path | G | exporter metrics path, default /metrics |

The GRAFANA parameter group configures the Grafana visualization platform: enable switch, port, admin credentials, and data sources.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| grafana_enabled | bool | G/I | enable Grafana on this infra node? |
| grafana_port | port | G | Grafana listen port, default 3000 |
| grafana_clean | bool | G/A | clean data during Grafana init? |
| grafana_admin_username | username | G | Grafana admin username, default admin |
| grafana_admin_password | password | G | Grafana admin password, default pigsty |
| grafana_auth_proxy | bool | G | enable the Grafana auth proxy? |
| grafana_pgurl | url | G | external PostgreSQL URL (for Grafana persistence) |
| grafana_view_password | password | G | PG data source password for the Grafana meta DB |

META

This section specifies the metadata of a Pigsty deployment: version number, admin node IP address, upstream mirror region, default language, and the http(s) proxy used when downloading packages.

version: v4.0.0                   # pigsty version string
admin_ip: 10.10.10.10             # admin node IP address
region: default                   # upstream mirror region: default,china,europe
language: en                      # default language: en or zh
proxy_env:                        # global HTTPS proxy for downloading & installing packages
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
  # http_proxy:  # set your proxy here: e.g http://user:pass@proxy.xxx.com
  # https_proxy: # set your proxy here: e.g http://user:pass@proxy.xxx.com
  # all_proxy:   # set your proxy here: e.g http://user:pass@proxy.xxx.com

version

Name: version, Type: string, Level: G

The Pigsty version string; the default is the current version: v4.0.0.

Pigsty uses the version number internally for feature gating and content rendering; do not modify this parameter casually.

Pigsty uses semantic versioning; the version string usually starts with the character v, e.g. v4.0.0.

admin_ip

Name: admin_ip, Type: ip, Level: G

IP address of the admin node; the default is the placeholder IP 10.10.10.10.

The node designated by this parameter is treated as the admin node, usually the first node where Pigsty is installed, i.e. the control node.

The default value 10.10.10.10 is a placeholder that is replaced with the actual admin node IP during configure.

Many parameters reference this parameter, for example:

In those parameters, the string ${admin_ip} is replaced with the real value of admin_ip. Through this mechanism, you can assign different control/admin nodes to different nodes.
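As a sketch of that substitution (a pure string replacement, with illustrative values):

```python
def resolve_admin_ip(value: str, admin_ip: str) -> str:
    """Replace the ${admin_ip} placeholder with the real admin node IP."""
    return value.replace("${admin_ip}", admin_ip)

# e.g. the default repo_endpoint
print(resolve_admin_ip("http://${admin_ip}:80", "10.10.10.10"))  # http://10.10.10.10:80
```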

region

Name: region, Type: enum, Level: G

Region of the upstream mirrors; valid values: default, china, europe; default: default.

If a region other than default is set and a matching entry exists in repo_upstream, that entry's baseurl is used instead of the one under default.

For example, if your region is set to china, Pigsty will try Chinese upstream mirrors to speed up downloads; if a repo has no China mirror, the default upstream is used instead. URLs defined in repo_url_packages are also rewritten from repo.pigsty.io to repo.pigsty.cc to use the domestic mirror.

language

Name: language, Type: enum, Level: G

Default language; valid values are en (English) and zh (Chinese); default: en.

This parameter affects the language preference of some configuration and content generated by Pigsty, such as the initial language of Grafana dashboards.

Chinese users are advised to set it to zh for a better localized experience.

proxy_env

Name: proxy_env, Type: dict, Level: G

Global proxy environment variables used when downloading packages; the default specifies no_proxy, the list of addresses that bypass the proxy:

proxy_env:
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"
  #http_proxy: 'http://username:password@proxy.address.com'
  #https_proxy: 'http://username:password@proxy.address.com'
  #all_proxy: 'http://username:password@proxy.address.com'

When installing from Internet upstreams in mainland China, certain packages may be blocked; a proxy can work around this.

Note that if the Docker module is used, the proxy configuration here is also written into the Docker daemon config file.

Note that if the -x flag is passed to ./configure, the proxy settings from the current environment are automatically filled into the generated pigsty.yaml.


CA

Pigsty uses a self-signed CA to support advanced security features such as HTTPS access and PostgreSQL SSL connections.

ca_create: true                   # create a CA if absent? default true
ca_cn: pigsty-ca                  # CA CN name, fixed as pigsty-ca
cert_validity: 7300d              # certificate validity, 20 years by default

ca_create

Name: ca_create, Type: bool, Level: G

Create a CA if one does not exist? Default: true.

When set to true, if no CA key pair exists in the files/pki/ca directory, Pigsty creates a new CA automatically.

If you already have a CA key pair, copy it into files/pki/ca:

  • files/pki/ca/ca.crt: CA public certificate
  • files/pki/ca/ca.key: CA private key

Pigsty will then use the existing CA key pair instead of creating a new one. If no CA exists and this parameter is false, the run aborts with an error.

Be sure to keep and back up the CA private key generated during deployment; it is essential for issuing new certificates later.

Note: Pigsty v3.x used the ca_method parameter (values create/recreate/copy); v4.0 simplifies it to the boolean ca_create.

ca_cn

Name: ca_cn, Type: string, Level: G

CA CN (Common Name), fixed as pigsty-ca; changing it is not recommended.

You can inspect the Pigsty CA certificate on a node with:

openssl x509 -text -in /etc/pki/ca.crt

cert_validity

Name: cert_validity, Type: interval, Level: G

Validity of issued certificates; the default of 20 years (7300d) is sufficient for most scenarios.

This parameter affects the validity of all certificates issued by the Pigsty CA, including:

  • PostgreSQL server certificates
  • Patroni API certificates
  • etcd server/client certificates
  • other internal service certificates

Note: the validity of the HTTPS certificate used by Nginx is controlled separately by nginx_cert_validity, since modern browsers enforce stricter limits on website certificates (397 days at most).


INFRA_ID

Infrastructure identity and portal definition.

#infra_seq: 1                     # infra node sequence number, REQUIRED identity parameter
infra_portal:                     # infrastructure services exposed through the Nginx portal
  home : { domain: i.pigsty }     # default home server definition
infra_data: /data/infra           # default infrastructure data directory

infra_seq

Name: infra_seq, Type: int, Level: I

Infra node sequence number: a required identity parameter that must be specified explicitly on infrastructure nodes, hence no default value.

It uniquely identifies each node in multi-infra-node deployments, usually with positive integers starting from 1.

Example configuration:

infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }

infra_portal

Name: infra_portal, Type: dict, Level: G

Infrastructure services exposed through the Nginx portal. The v4.0 default is minimal:

infra_portal:
  home : { domain: i.pigsty }     # default home server definition

Pigsty configures the corresponding reverse proxies automatically according to the components actually enabled; users usually only need to define the home domain.

Each record consists of a Key and a Value dict; the key is the component name, and the value is an object accepting the following parameters:

  • name: required, the name of the Nginx server
    • the default record home is a fixed name; do not change it
    • used as part of the Nginx config file name: /etc/nginx/conf.d/<name>.conf
    • Nginx servers without a domain field generate no config file but can still be referenced
  • domain: optional; required when the service is exposed through Nginx; the domain name to use
    • with Pigsty's self-signed Nginx HTTPS certificate, the domain is added to the certificate's SAN field
    • Pigsty web cross-references use the default domain set here
  • endpoint: usually an alternative to path; the upstream server address. Setting endpoint marks this as a reverse-proxy server
    • ${admin_ip} may be used as a placeholder and is dynamically replaced with admin_ip at deploy time
    • reverse-proxy servers use endpoint.conf as the default config template
    • reverse-proxy servers can also set the websocket and schema parameters
  • path: usually an alternative to endpoint; the local file-server path. Setting path marks this as a local web server
    • local web servers use path.conf as the config template
    • local web servers can also set the index parameter to enable a file index page
  • certbot: Certbot certificate name; if set, a certificate is requested via Certbot
    • if multiple servers specify the same certbot, Pigsty merges the certificate requests; the final certificate name is this certbot value
  • cert: certificate file path; overrides the default certificate path if set
  • key: certificate key file path; overrides the default key path if set
  • websocket: enable WebSocket support
    • reverse-proxy servers only; when enabled, upstream WebSocket connections are allowed
  • schema: protocol used by the upstream server; overrides the default if set
    • default http; setting https forces HTTPS connections to the upstream server
  • index: enable the file index page
    • local web servers only; when enabled, the autoindex directive generates directory listings automatically
  • log: Nginx log file path
    • if set, access logs are written to this file; otherwise the default log file for the server type is used
    • reverse-proxy servers default to /var/log/nginx/<name>.log
    • local web servers use the default access log
  • conf: Nginx config file path
  • config: Nginx config snippet
    • text injected verbatim into the Nginx server block
  • enforce_https: redirect HTTP to HTTPS
    • can be set globally via nginx_sslmode: enforce
    • does not affect the default home server, which always listens on both 80 and 443 for compatibility

infra_data

Name: infra_data, Type: path, Level: G

Infrastructure data directory; default: /data/infra.

This directory holds the data files of infrastructure components, including:

  • VictoriaMetrics time-series data
  • VictoriaLogs log data
  • VictoriaTraces trace data
  • persistent data of other infrastructure components

Placing this directory on a dedicated data disk is recommended for easier management and expansion.


REPO

This section configures the local software repo. Pigsty enables a local repo (APT / YUM) on infrastructure nodes by default.

During initialization, Pigsty downloads all packages and their dependencies (specified by repo_packages) from Internet upstream repos (specified by repo_upstream) into {{ nginx_home }}/{{ repo_name }} (default /www/pigsty); the total size of all software and dependencies is around 1GB.

When building the local repo, if it already exists (indicated by a marker file named repo_complete in the repo directory), Pigsty treats the repo as complete, skips the download phase, and uses the built repo directly.

If some packages download too slowly, set a download proxy via proxy_env to complete the first build, or download a prebuilt offline package, which is essentially a local repo built on the same OS.

repo_enabled: true                # build a local software repo on this Infra node?
repo_home: /www                   # repo home directory, default /www
repo_name: pigsty                 # repo name, default pigsty
repo_endpoint: http://${admin_ip}:80 # repo access endpoint
repo_remove: true                 # remove existing upstream repo definitions
repo_modules: infra,node,pgsql    # enabled upstream repo modules
#repo_upstream: []                # upstream repo definitions (inherited from OS vars)
#repo_packages: []                # packages to download (inherited from OS vars)
#repo_extra_packages: []          # extra packages to download
repo_url_packages: []             # extra packages downloaded via URL

repo_enabled

Name: repo_enabled, Type: bool, Level: G/I

Enable a local software repo on this infrastructure node? Default: true, i.e. every Infra node builds a local software repo.

If you have multiple infrastructure nodes, you can keep only 1-2 as repo nodes and set this to false on the others to avoid redundant downloads and builds.

repo_home

Name: repo_home, Type: path, Level: G

Home directory of the local software repo; defaults to the Nginx root: /www.

This directory is actually a symlink to nginx_data; changing it is not recommended. If you do, keep it consistent with nginx_home.

repo_name

Name: repo_name, Type: string, Level: G

Name of the local repo; default: pigsty. Changing this repo's name is unwise.

The final repo path is {{ repo_home }}/{{ repo_name }}, i.e. /www/pigsty by default.

repo_endpoint

Name: repo_endpoint, Type: url, Level: G

Endpoint other nodes use to access this repo; default: http://${admin_ip}:80.

Pigsty starts Nginx on ports 80/443 of infrastructure nodes by default, serving the local software repo as static files.

If you changed nginx_port or nginx_ssl_port, or use infrastructure nodes other than the control node, adjust this parameter accordingly.

If you use a domain name, you can add resolution in node_default_etc_hosts, node_etc_hosts, or dns_records.

repo_remove

Name: repo_remove, Type: bool, Level: G/A

Remove existing upstream repo definitions when building the local repo? Default: true.

When enabled, existing repo files in /etc/yum.repos.d are moved to the backup directory /etc/yum.repos.d/backup; on Debian-family systems, /etc/apt/sources.list and /etc/apt/sources.list.d are removed and backed up to /etc/apt/backup.

Since the OS's preexisting sources are uncontrolled, using Pigsty-validated upstream repos improves the success rate and speed of package downloads from the Internet.

In certain cases (e.g. an EL/Deb-compatible OS where many packages come from private sources), you may need to keep the existing upstream definitions; set this parameter to false in that case.

repo_modules

Name: repo_modules, Type: string, Level: G/A

Which upstream repo modules are added to the local software repo; default: infra,node,pgsql.

When Pigsty adds upstream repos, it filters the entries in repo_upstream by this value; only entries whose module field matches are added to the local repo.

Modules are comma-separated; see the definitions in repo_upstream for the available list. Common modules include:

  • local: the local Pigsty repo
  • infra: infrastructure packages (Nginx, Docker, etc.)
  • node: OS base packages
  • pgsql: PostgreSQL-related packages
  • extra: extra PostgreSQL extensions
  • docker: Docker-related
  • redis: Redis-related
  • mongo: MongoDB-related
  • mysql: MySQL-related
  • etc.

repo_upstream

Name: repo_upstream, Type: upstream[], Level: G

Where to download upstream packages from when building the local repo. This parameter has no default value; if not specified explicitly in the config file, it is loaded from the repo_upstream_default variable defined in roles/node_id/vars according to the current node's OS family.

Pigsty ships complete upstream repo definitions for the supported OS versions (EL8/9/10, Debian 11/12/13, Ubuntu 22/24), including:

  • OS base repos (BaseOS, AppStream, EPEL, etc.)
  • the official PostgreSQL PGDG repo
  • the Pigsty extension repo
  • various third-party repos (Docker, Nginx, Grafana, etc.)

Each upstream repo definition contains the following fields:

- name: pigsty-pgsql              # repo name
  description: 'Pigsty PGSQL'     # repo description
  module: pgsql                   # module it belongs to
  releases: [8,9,10]              # supported OS releases
  arch: [x86_64, aarch64]         # supported CPU architectures
  baseurl:                        # repo URL, per region
    default: 'https://repo.pigsty.io/yum/pgsql/el$releasever.$basearch'
    china: 'https://repo.pigsty.cc/yum/pgsql/el$releasever.$basearch'

Users normally do not need to modify this parameter unless they have special repo requirements. See the per-OS config files under roles/node_id/vars/ for the detailed repo definitions.

repo_packages

Name: repo_packages, Type: string[], Level: G

A string array; each line is a space-separated package list specifying the packages (and their dependencies) to download locally with repotrack or apt download.

This parameter has no default value, i.e. it is undefined by default. When not explicitly defined, Pigsty loads the default from the repo_packages_default variable defined in roles/node_id/vars:

[ node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-utility, extra-modules ]

Each element of this parameter is translated via the package_map defined in those files for the specific OS distribution major release. On EL systems, for example, they translate to:

node-bootstrap:          "ansible python3 python3-pip python3-virtualenv python3-requests python3-jmespath python3-cryptography dnf-utils modulemd-tools createrepo_c sshpass"
infra-package:           "nginx dnsmasq etcd haproxy vip-manager node_exporter keepalived_exporter pg_exporter pgbackrest_exporter redis_exporter redis minio mcli pig"
infra-addons:            "grafana grafana-plugins grafana-victoriametrics-ds grafana-victorialogs-ds victoria-metrics victoria-logs victoria-traces vlogscli vmutils vector alertmanager"

作为一个使用约定,repo_packages 中通常包括了那些与 PostgreSQL 大版本号无关的软件包(例如 Infra,Node 和 PGDG Common 等部分),而 PostgreSQL 大版本相关的软件包(内核,扩展),通常在 repo_extra_packages 中指定,方便用户切换 PG 大版本。
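作为示意,若希望在默认别名之外额外下载 Docker 模块的软件包,可以这样覆盖本参数(别名的实际可用性以 package_map 中的定义为准):

```yaml
# 示意配置:在默认包别名基础上追加 docker 别名(假设 package_map 中已有该定义)
repo_packages:
  - node-bootstrap infra-package infra-addons    # 节点引导与基础设施软件包
  - node-package1 node-package2 pgsql-utility    # 节点软件包与 PG 工具
  - extra-modules docker                         # 额外模块与 Docker
```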

repo_extra_packages

参数名称: repo_extra_packages, 类型: string[], 层次:G/C/I

用于在不修改 repo_packages 的基础上,指定额外需要下载的软件包(通常是 PG 大版本相关的软件包),默认值为空列表。

如果该参数没有被显式定义,那么 Pigsty 会从 roles/node_id/vars 中定义的 repo_extra_packages_default 变量中加载获取默认值,默认值为:

[ pgsql-main ]

该参数中的元素会进行包名翻译,其中 $v 会被替换为 pg_version,即当前 PG 大版本号(默认为 18)。

这里的 pgsql-main 在 EL 系统上会翻译为:

postgresql$v postgresql$v-server postgresql$v-libs postgresql$v-contrib postgresql$v-plperl postgresql$v-plpython3 postgresql$v-pltcl postgresql$v-llvmjit pg_repack_$v* wal2json_$v* pgvector_$v*

通常用户可以在这里指定 PostgreSQL 大版本相关的软件包,而不影响 repo_packages 中定义的其他 PG 大版本无关的软件包。

repo_url_packages

参数名称: repo_url_packages, 类型: object[] | string[], 层次:G

直接使用 URL 从互联网上下载的软件包,默认为空数组: []

您可以直接在本参数中使用 URL 字符串作为数组元素,也可以使用对象结构,显式指定 URL 与文件名称。

请注意,本参数会受到 region 变量的影响,如果您在中国大陆地区,Pigsty 会自动将 URL 替换为国内镜像站点,即将 URL 里的 repo.pigsty.io 替换为 repo.pigsty.cc
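下面的片段同时展示了字符串与对象两种元素写法(URL 与文件名均为假设的占位值,仅作示意):

```yaml
# 示意配置:repo_url_packages 的两种元素写法
repo_url_packages:
  - "https://repo.pigsty.io/pkg/example/example-1.0.0.el9.x86_64.rpm"   # 字符串写法:直接给出 URL
  - name: "example.rpm"                                                 # 对象写法:显式指定文件名
    url: "https://repo.pigsty.io/pkg/example/example-1.0.0.el9.x86_64.rpm"
```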


INFRA_PACKAGE

这些软件包只会在 INFRA 节点上安装,包括普通的 RPM/DEB 软件包,以及 PIP 软件包。

infra_packages

参数名称: infra_packages, 类型: string[], 层次:G

字符串数组类型,每一行都是 由空格分隔 的软件包列表字符串,指定将要在 Infra 节点上安装的软件包列表。

本参数没有默认值,即默认值为未定义状态。如果用户不在配置文件中显式指定本参数,则 Pigsty 会根据当前节点的操作系统族,从定义于 roles/node_id/vars 中的 infra_packages_default 变量中加载默认值。

v4.0 默认值(EL系操作系统):

infra_packages_default:
  - grafana,grafana-plugins,grafana-victorialogs-ds,grafana-victoriametrics-ds,victoria-metrics,victoria-logs,victoria-traces,vmutils,vlogscli,alertmanager
  - node_exporter,blackbox_exporter,nginx_exporter,pg_exporter,pev2,nginx,dnsmasq,ansible,etcd,python3-requests,redis,mcli,restic,certbot,python3-certbot-nginx

默认值(Debian/Ubuntu):

infra_packages_default:
  - grafana,grafana-plugins,grafana-victorialogs-ds,grafana-victoriametrics-ds,victoria-metrics,victoria-logs,victoria-traces,vmutils,vlogscli,alertmanager
  - node-exporter,blackbox-exporter,nginx-exporter,pg-exporter,pev2,nginx,dnsmasq,ansible,etcd,python3-requests,redis,mcli,restic,certbot,python3-certbot-nginx

注意:v4.0 使用 VictoriaMetrics 套件替代了 Prometheus 和 Loki,因此软件包列表与 v3.x 有显著差异。

infra_packages_pip

参数名称: infra_packages_pip, 类型: string, 层次:G

Infra 节点上要使用 pip 额外安装的软件包,包名使用逗号分隔,默认值是空字符串,即不安装任何额外的 python 包。

示例:

infra_packages_pip: 'requests,boto3,awscli'

NGINX

Pigsty 会通过 Nginx 代理所有的 Web 服务访问:Home Page、Grafana、VictoriaMetrics 等等,以及其他可选工具,如 PGWeb、Jupyter Lab、Pgadmin、Bytebase 等,还有一些静态资源和报告,如 pev、schemaspy 与 pgbadger。

最重要的是,Nginx 还作为本地软件仓库(Yum/Apt)的 Web 服务器,用于存储和分发 Pigsty 的软件包。

nginx_enabled: true               # 在此 Infra 节点上启用 Nginx?
nginx_clean: false                # 初始化时清理现有 Nginx 配置?
nginx_exporter_enabled: true      # 启用 nginx_exporter?
nginx_exporter_port: 9113         # nginx_exporter 监听端口
nginx_sslmode: enable             # SSL 模式:disable,enable,enforce
nginx_cert_validity: 397d         # 自签名证书有效期
nginx_home: /www                  # Nginx 内容目录(软链接)
nginx_data: /data/nginx           # Nginx 实际数据目录
nginx_users: {}                   # 基础认证用户字典
nginx_port: 80                    # HTTP 端口
nginx_ssl_port: 443               # HTTPS 端口
certbot_sign: false               # 使用 certbot 签署证书?
certbot_email: your@email.com     # certbot 邮箱
certbot_options: ''               # certbot 额外选项

nginx_enabled

参数名称: nginx_enabled, 类型: bool, 层次:G/I

是否在当前的 Infra 节点上启用 Nginx?默认值为: true

Nginx 是 Pigsty 基础设施的核心组件,负责:

  • 提供本地软件仓库服务
  • 反向代理 Grafana、VictoriaMetrics 等 Web 服务
  • 托管静态文件和报告

nginx_clean

参数名称: nginx_clean, 类型: bool, 层次:G/A

初始化时是否清理现有的 Nginx 配置?默认值为: false

当设置为 true 时,在 Nginx 初始化过程中会删除 /etc/nginx/conf.d/ 下的所有现有配置文件,确保一个干净的起点。

如果您是首次部署或希望完全重建 Nginx 配置,可以将此参数设置为 true

nginx_exporter_enabled

参数名称: nginx_exporter_enabled, 类型: bool, 层次:G/I

在此基础设施节点上启用 nginx_exporter ?默认值为: true

如果禁用此选项,还会一并禁用 /nginx 健康检查 stub,当您安装使用的 Nginx 版本不支持此功能时可以考虑关闭此开关。

nginx_exporter_port

参数名称: nginx_exporter_port, 类型: port, 层次:G

nginx_exporter 监听端口,默认值为 9113

nginx_exporter 用于收集 Nginx 的运行指标,供 VictoriaMetrics 抓取监控。

nginx_sslmode

参数名称: nginx_sslmode, 类型: enum, 层次:G

Nginx 的 SSL 工作模式?有三种选择:disable , enable , enforce, 默认值为 enable,即启用 SSL,但不强制使用。

  • disable:只监听 nginx_port 指定的端口服务 HTTP 请求。
  • enable:同时会监听 nginx_ssl_port 指定的端口服务 HTTPS 请求。
  • enforce:所有链接都会被渲染为默认使用 https://
    • 同时会将 infra_portal 中非默认服务器的 80 端口重定向到 443 端口

nginx_cert_validity

参数名称: nginx_cert_validity, 类型: duration, 层次:G

Nginx 自签名证书的有效期,默认值为 397d(约13个月)。

现代浏览器要求网站证书的有效期最多为 397 天,因此这是默认值。不建议设置更长的有效期,否则浏览器可能会拒绝信任该证书。

nginx_home

参数名称: nginx_home, 类型: path, 层次:G

Nginx 服务器静态文件目录,默认为: /www

这是一个软链接,实际指向 nginx_data 目录。此目录包含静态资源和软件仓库文件。

最好不要随意修改此参数,修改时需要与 repo_home 参数保持一致。

nginx_data

参数名称: nginx_data, 类型: path, 层次:G

Nginx 实际数据目录,默认为 /data/nginx

这是 Nginx 静态文件的实际存储位置,nginx_home 是指向此目录的软链接。

建议将此目录放置在数据盘上,以便于管理大量的软件包文件。

nginx_users

参数名称: nginx_users, 类型: dict, 层次:G

Nginx 基础认证(Basic Auth)用户字典,默认为空字典 {}

格式为 { username: password } 的键值对,例如:

nginx_users:
  admin: pigsty
  viewer: readonly

这些用户可用于保护某些需要认证的 Nginx 端点。

nginx_port

参数名称: nginx_port, 类型: port, 层次:G

Nginx 默认监听的端口(提供 HTTP 服务),默认为 80 端口,最好不要修改这个参数。

当您的服务器 80 端口被占用时,可以考虑修改此参数,但需要同时修改 repo_endpoint 与 node_repo_local_urls 所使用的端口,并与这里保持一致。

nginx_ssl_port

参数名称: nginx_ssl_port, 类型: port, 层次:G

Nginx SSL 默认监听的端口,默认为 443,最好不要修改这个参数。

certbot_sign

参数名称: certbot_sign, 类型: bool, 层次:G/A

是否在安装过程中使用 certbot 签署 Nginx 证书?默认值为 false

当设置为 true 时,Pigsty 会在执行 infra.yml 或 install.yml 剧本(nginx 角色)期间使用 certbot 自动从 Let’s Encrypt 申请免费 SSL 证书。

infra_portal 定义的域名中,如果定义了 certbot 参数,Pigsty 将使用 certbot 为 domain 域名申请证书,证书名称将是 certbot 参数的值。如果多个服务器/域名指定了相同的 certbot 参数,Pigsty 会合并并为这些域名申请一个证书,使用 certbot 参数的值作为证书名称。

启用此选项需要:

  • 当前节点可以通过公共域名访问,并且 DNS 解析已正确指向当前节点的公网 IP
  • 当前节点可以访问 Let’s Encrypt API 接口

此选项默认禁用,您可以在安装后执行 make cert 命令手动完成签发:它实际上调用渲染好的 /etc/nginx/sign-cert 脚本,使用 certbot 申请或更新证书。

certbot_email

参数名称: certbot_email, 类型: string, 层次:G/A

用于接收证书过期提醒邮件的电子邮件地址,默认值为 your@email.com

certbot_sign 设置为 true 时,建议提供此参数。Let’s Encrypt 会在证书即将过期时向此邮箱发送提醒邮件。

certbot_options

参数名称: certbot_options, 类型: string, 层次:G/A

传递给 certbot 的额外配置参数,默认值为空字符串。

您可以通过此参数向 certbot 传递额外的命令行选项,例如 --dry-run,则 certbot 不会实际申请证书,而是进行预览和测试。
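综合上述三个 certbot 参数,一个预演式(不实际签发证书)的配置示意如下(邮箱与取值均为示例):

```yaml
certbot_sign: true                 # 安装时自动申请证书
certbot_email: admin@example.com   # 接收过期提醒的邮箱(示例值)
certbot_options: '--dry-run'       # 预演模式:仅测试流程,不实际签发
```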


DNS

Pigsty 默认会在 Infra 节点上启用 DNSMASQ 服务,用于解析一些辅助域名,例如 i.pigsty、m.pigsty、api.pigsty 等,以及可选 MinIO 所使用的 sss.pigsty。

解析记录会记录在 Infra 节点的 /etc/hosts.d/default 文件中。 要使用这个 DNS 服务器,您必须将 nameserver <ip> 添加到 /etc/resolv.conf 中,node_dns_servers 参数可以解决这个问题。

dns_enabled: true                 # 在此 Infra 节点上设置 dnsmasq?
dns_port: 53                      # DNS 服务器监听端口
dns_records:                      # 动态 DNS 记录
  - "${admin_ip} i.pigsty"
  - "${admin_ip} m.pigsty supa.pigsty api.pigsty adm.pigsty cli.pigsty ddl.pigsty"

dns_enabled

参数名称: dns_enabled, 类型: bool, 层次:G/I

是否在这个 Infra 节点上启用 DNSMASQ 服务?默认值为: true

如果您不想使用默认的 DNS 服务器(比如已有外部 DNS 服务器,或者供应商不允许您自建 DNS 服务),可以将此值设置为 false 来禁用它,并使用 node_default_etc_hosts 与 node_etc_hosts 中的静态解析记录代替。
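例如,下面是一个禁用 dnsmasq、改用静态解析记录的配置示意(IP 与域名为示例值):

```yaml
dns_enabled: false                 # 不在 Infra 节点上启动 dnsmasq
node_etc_hosts:                    # 改用写入 /etc/hosts 的静态解析记录
  - "10.10.10.10 i.pigsty g.pigsty p.pigsty a.pigsty"
```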

dns_port

参数名称: dns_port, 类型: port, 层次:G

DNSMASQ 的默认监听端口,默认是 53,不建议修改 DNS 服务默认端口。

dns_records

参数名称: dns_records, 类型: string[], 层次:G

由 dnsmasq 负责解析的动态 DNS 记录,一般用于将一些辅助域名解析到管理节点。这些记录会被写入到基础设施节点的 /etc/hosts.d/default 文件中。

v4.0 默认值:

dns_records:
  - "${admin_ip} i.pigsty"
  - "${admin_ip} m.pigsty supa.pigsty api.pigsty adm.pigsty cli.pigsty ddl.pigsty"

这里使用 ${admin_ip} 占位符,在部署时会被替换为实际的 admin_ip 值。

常见的域名用途:

  • i.pigsty:Pigsty 首页
  • m.pigsty:VictoriaMetrics Web UI
  • api.pigsty:API 服务
  • adm.pigsty:管理服务
  • 其他根据实际部署需求自定义
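如需追加自定义解析记录,可以在保留默认记录的基础上扩展此参数,例如(追加的域名与 IP 为示例值):

```yaml
dns_records:
  - "${admin_ip} i.pigsty"
  - "${admin_ip} m.pigsty supa.pigsty api.pigsty adm.pigsty cli.pigsty ddl.pigsty"
  - "10.10.10.12 wiki.pigsty"      # 自定义记录:将 wiki.pigsty 指向另一台主机
```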

VICTORIA

Pigsty v4.0 使用 VictoriaMetrics 套件替代 Prometheus 和 Loki,提供更优秀的可观测性解决方案:

  • VictoriaMetrics:替代 Prometheus,作为时序数据库存储监控指标
  • VictoriaLogs:替代 Loki,作为日志聚合存储
  • VictoriaTraces:分布式追踪存储
  • VMAlert:替代 Prometheus Alerting,进行告警规则评估

vmetrics_enabled: true            # 启用 VictoriaMetrics?
vmetrics_clean: false             # 初始化时清理数据?
vmetrics_port: 8428               # 监听端口
vmetrics_scrape_interval: 10s     # 全局抓取间隔
vmetrics_scrape_timeout: 8s       # 全局抓取超时
vmetrics_options: >-
  -retentionPeriod=15d
  -promscrape.fileSDCheckInterval=5s
vlogs_enabled: true               # 启用 VictoriaLogs?
vlogs_clean: false                # 初始化时清理数据?
vlogs_port: 9428                  # 监听端口
vlogs_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB
  -insert.maxLineSizeBytes=1MB
  -search.maxQueryDuration=120s
vtraces_enabled: true             # 启用 VictoriaTraces?
vtraces_clean: false              # 初始化时清理数据?
vtraces_port: 10428               # 监听端口
vtraces_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB
vmalert_enabled: true             # 启用 VMAlert?
vmalert_port: 8880                # 监听端口
vmalert_options: ''               # 额外命令行参数

vmetrics_enabled

参数名称: vmetrics_enabled, 类型: bool, 层次:G/I

是否在当前 Infra 节点上启用 VictoriaMetrics?默认值为 true

VictoriaMetrics 是 Pigsty v4.0 的核心监控组件,替代 Prometheus 作为时序数据库,负责:

  • 从各个 Exporter 抓取监控指标
  • 存储时序数据
  • 提供 PromQL 兼容的查询接口
  • 支持 Grafana 数据源

vmetrics_clean

参数名称: vmetrics_clean, 类型: bool, 层次:G/A

初始化 VictoriaMetrics 时是否清理现有数据?默认值为 false

当设置为 true 时,在初始化过程中会删除已有的时序数据。谨慎使用此选项,除非您确定要重建监控数据。

vmetrics_port

参数名称: vmetrics_port, 类型: port, 层次:G

VictoriaMetrics 监听端口,默认值为 8428

此端口用于:

  • HTTP API 访问
  • Web UI 访问
  • Prometheus 兼容的远程写入/读取
  • Grafana 数据源连接

vmetrics_scrape_interval

参数名称: vmetrics_scrape_interval, 类型: interval, 层次:G

VictoriaMetrics 全局指标抓取周期,默认值为 10s

在生产环境,10秒 - 30秒是一个较为合适的抓取周期。如果您需要更精细的监控数据粒度,可以调整此参数,但会增加存储和 CPU 开销。

vmetrics_scrape_timeout

参数名称: vmetrics_scrape_timeout, 类型: interval, 层次:G

VictoriaMetrics 全局抓取超时,默认为 8s

设置抓取超时可以有效避免监控系统查询导致的雪崩,设置原则是本参数必须小于并接近 vmetrics_scrape_interval,确保每次抓取时长不超过抓取周期。

vmetrics_options

参数名称: vmetrics_options, 类型: arg, 层次:G

VictoriaMetrics 的额外命令行参数,默认值:

vmetrics_options: >-
  -retentionPeriod=15d
  -promscrape.fileSDCheckInterval=5s

常用参数说明:

  • -retentionPeriod=15d:数据保留期限,默认 15 天
  • -promscrape.fileSDCheckInterval=5s:文件服务发现刷新间隔

您可以根据需要添加其他 VictoriaMetrics 支持的参数。

vlogs_enabled

参数名称: vlogs_enabled, 类型: bool, 层次:G/I

是否在当前 Infra 节点上启用 VictoriaLogs?默认值为 true

VictoriaLogs 替代 Loki 作为日志聚合存储,负责:

  • 接收来自 Vector 的日志数据
  • 存储和索引日志
  • 提供日志查询接口
  • 支持 Grafana VictoriaLogs 数据源

vlogs_clean

参数名称: vlogs_clean, 类型: bool, 层次:G/A

初始化 VictoriaLogs 时是否清理现有数据?默认值为 false

vlogs_port

参数名称: vlogs_port, 类型: port, 层次:G

VictoriaLogs 监听端口,默认值为 9428

vlogs_options

参数名称: vlogs_options, 类型: arg, 层次:G

VictoriaLogs 的额外命令行参数,默认值:

vlogs_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB
  -insert.maxLineSizeBytes=1MB
  -search.maxQueryDuration=120s

常用参数说明:

  • -retentionPeriod=15d:日志保留期限,默认 15 天
  • -retention.maxDiskSpaceUsageBytes=50GiB:最大磁盘使用量
  • -insert.maxLineSizeBytes=1MB:单行日志最大大小
  • -search.maxQueryDuration=120s:查询最大执行时间

vtraces_enabled

参数名称: vtraces_enabled, 类型: bool, 层次:G/I

是否在当前 Infra 节点上启用 VictoriaTraces?默认值为 true

VictoriaTraces 用于分布式追踪数据的存储和查询,支持 Jaeger、Zipkin 等追踪协议。

vtraces_clean

参数名称: vtraces_clean, 类型: bool, 层次:G/A

初始化 VictoriaTraces 时是否清理现有数据?默认值为 false

vtraces_port

参数名称: vtraces_port, 类型: port, 层次:G

VictoriaTraces 监听端口,默认值为 10428

vtraces_options

参数名称: vtraces_options, 类型: arg, 层次:G

VictoriaTraces 的额外命令行参数,默认值:

vtraces_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB

vmalert_enabled

参数名称: vmalert_enabled, 类型: bool, 层次:G/I

是否在当前 Infra 节点上启用 VMAlert?默认值为 true

VMAlert 负责告警规则评估,替代 Prometheus Alerting 功能,与 Alertmanager 配合使用。

vmalert_port

参数名称: vmalert_port, 类型: port, 层次:G

VMAlert 监听端口,默认值为 8880

vmalert_options

参数名称: vmalert_options, 类型: arg, 层次:G

VMAlert 的额外命令行参数,默认值为空字符串。


PROMETHEUS

此部分现在主要包含 Blackbox Exporter 和 Alertmanager 的配置。

注意:Pigsty v4.0 使用 VictoriaMetrics 替代 Prometheus,原有的 prometheus_* 与 pushgateway_* 参数已移至 VICTORIA 部分。

blackbox_enabled: true            # 启用 blackbox_exporter?
blackbox_port: 9115               # blackbox_exporter 监听端口
blackbox_options: ''              # 额外命令行参数
alertmanager_enabled: true        # 启用 alertmanager?
alertmanager_port: 9059           # alertmanager 监听端口
alertmanager_options: ''          # 额外命令行参数
exporter_metrics_path: /metrics   # exporter 指标路径

blackbox_enabled

参数名称: blackbox_enabled, 类型: bool, 层次:G/I

是否在当前 Infra 节点上启用 BlackboxExporter?默认值为 true

BlackboxExporter 会向节点 IP 地址、VIP 地址、PostgreSQL VIP 地址发送 ICMP 报文测试网络连通性,还可以进行 HTTP、TCP、DNS 等探测。

blackbox_port

参数名称: blackbox_port, 类型: port, 层次:G

Blackbox Exporter 监听端口,默认值为 9115

blackbox_options

参数名称: blackbox_options, 类型: arg, 层次:G

BlackboxExporter 的额外命令行参数,默认值:空字符串。

alertmanager_enabled

参数名称: alertmanager_enabled, 类型: bool, 层次:G/I

是否在当前 Infra 节点上启用 AlertManager?默认值为 true

AlertManager 负责接收来自 VMAlert 的告警通知,并进行告警分组、抑制、静默、路由等处理。

alertmanager_port

参数名称: alertmanager_port, 类型: port, 层次:G

AlertManager 监听端口,默认值为 9059

如果您修改了此端口,请确保相应更新 infra_portal 中 alertmanager 条目的 endpoint 配置(如果有定义的话)。

alertmanager_options

参数名称: alertmanager_options, 类型: arg, 层次:G

AlertManager 的额外命令行参数,默认值:空字符串。

exporter_metrics_path

参数名称: exporter_metrics_path, 类型: path, 层次:G

监控 exporter 暴露指标的 HTTP 端点路径,默认为: /metrics,不建议修改此参数。

此参数定义了所有 Exporter 暴露监控指标的标准路径。


GRAFANA

Pigsty 使用 Grafana 作为监控系统前端。它也可以作为数据分析与可视化平台,或者用于低代码数据应用开发,制作数据应用原型等目的。

grafana_enabled: true             # 启用 Grafana?
grafana_port: 3000                # Grafana 监听端口
grafana_clean: true               # 初始化时清理数据?
grafana_admin_username: admin     # 管理员用户名
grafana_admin_password: pigsty    # 管理员密码
grafana_auth_proxy: false         # 启用身份代理?
grafana_pgurl: ''                 # 外部 PostgreSQL URL
grafana_view_password: DBUser.Viewer  # PG 数据源密码

grafana_enabled

参数名称: grafana_enabled, 类型: bool, 层次:G/I

是否在 Infra 节点上启用 Grafana?默认值为: true,即所有基础设施节点默认都会安装启用 Grafana。

grafana_port

参数名称: grafana_port, 类型: port, 层次:G

Grafana 监听端口,默认值为 3000

如果您需要直接访问 Grafana(不通过 Nginx 反向代理),可以使用此端口。

grafana_clean

参数名称: grafana_clean, 类型: bool, 层次:G/A

是否在初始化 Grafana 时一并清理其数据文件?默认为:true

该操作会移除 /var/lib/grafana/grafana.db,确保 Grafana 全新安装。

如果您希望保留现有的 Grafana 配置(如仪表盘、用户、数据源等),请将此参数设置为 false

grafana_admin_username

参数名称: grafana_admin_username, 类型: username, 层次:G

Grafana 管理员用户名,默认为 admin

grafana_admin_password

参数名称: grafana_admin_password, 类型: password, 层次:G

Grafana 管理员密码,默认为 pigsty

重要提示:请务必在生产部署中修改此密码参数!

grafana_auth_proxy

参数名称: grafana_auth_proxy, 类型: bool, 层次:G

是否启用 Grafana 身份代理?默认为 false

当启用时,Grafana 会信任反向代理(Nginx)传递的用户身份信息,实现单点登录(SSO)功能。

这通常用于与外部身份认证系统集成的场景。

grafana_pgurl

参数名称: grafana_pgurl, 类型: url, 层次:G

外部 PostgreSQL 数据库 URL,用于 Grafana 持久化存储。默认为空字符串。

如果指定,Grafana 将使用此 PostgreSQL 数据库替代默认的 SQLite 数据库存储其配置数据。

格式示例:postgres://grafana:password@pg-meta:5432/grafana?sslmode=disable

这对于需要 Grafana 高可用部署或数据持久化的场景非常有用。

grafana_view_password

参数名称: grafana_view_password, 类型: password, 层次:G

Grafana 元数据库 PG 数据源使用的只读用户密码,默认为 DBUser.Viewer

此密码用于 Grafana 连接 PostgreSQL CMDB 数据源,以只读方式查询元数据。

4 - 预置剧本

如何使用预置的 ansible 剧本来管理 INFRA 集群,常用管理命令速查。

Pigsty 提供了三个与 INFRA 模块相关的剧本:

  • infra.yml:在 infra 节点上初始化 pigsty 基础设施
  • infra-rm.yml:从 infra 节点移除基础设施组件
  • install.yml:在当前节点上一次性完整安装 Pigsty

infra.yml

在配置文件的 infra 分组所定义的 Infra节点 上初始化基础设施模块。

执行该剧本将完成以下任务:

  • 配置 Infra节点 的目录与环境变量
  • 下载并创建本地软件仓库,加速后续安装
  • 将当前 Infra节点 作为普通节点纳入 Pigsty 管理
  • 部署基础设施组件(VictoriaMetrics/Logs/Traces、VMAlert、Grafana、Alertmanager、Blackbox Exporter 等)

剧本注意事项

  • 本剧本为幂等剧本,重复执行会抹除 Infra 节点上的基础设施组件
  • 如需保留历史监控数据,请先将 vmetrics_clean、vlogs_clean、vtraces_clean 设置为 false
  • 除非将 grafana_clean 设置为 false,否则 Grafana 监控面板与配置修改会丢失
  • 当本地软件仓库 /www/pigsty/repo_complete 存在时,本剧本会跳过从互联网下载软件的任务
  • 完整执行该剧本耗时约1~3分钟,视机器配置与网络条件而异

可用任务列表

# ca: create self-signed CA on localhost files/pki
#   - ca_dir        : create CA directory
#   - ca_private    : generate ca private key: files/pki/ca/ca.key
#   - ca_cert       : signing ca cert: files/pki/ca/ca.crt
#
# id: generate node identity
#
# repo: bootstrap a local yum repo from internet or offline packages
#   - repo_dir      : create repo directory
#   - repo_check    : check repo exists
#   - repo_prepare  : use existing repo if exists
#   - repo_build    : build repo from upstream if not exists
#     - repo_upstream    : handle upstream repo files in /etc/yum.repos.d
#       - repo_remove    : remove existing repo file if repo_remove == true
#       - repo_add       : add upstream repo files to /etc/yum.repos.d
#     - repo_url_pkg     : download packages from internet defined by repo_url_packages
#     - repo_cache       : make upstream yum cache with yum makecache
#     - repo_boot_pkg    : install bootstrap pkg such as createrepo_c,yum-utils,...
#     - repo_pkg         : download packages & dependencies from upstream repo
#     - repo_create      : create a local yum repo with createrepo_c & modifyrepo_c
#     - repo_use         : add newly built repo into /etc/yum.repos.d
#   - repo_nginx    : launch a nginx for repo if no nginx is serving
#
# node/haproxy/docker/monitor: setup infra node as a common node
#   - node_name, node_hosts, node_resolv, node_firewall, node_ca, node_repo, node_pkg
#   - node_feature, node_kernel, node_tune, node_sysctl, node_profile, node_ulimit
#   - node_data, node_admin, node_timezone, node_ntp, node_crontab, node_vip
#   - haproxy_install, haproxy_config, haproxy_launch, haproxy_reload
#   - docker_install, docker_admin, docker_config, docker_launch, docker_image
#   - haproxy_register, node_exporter, node_register, vector
#
# infra: setup infra components
#   - infra_env      : env_dir, env_pg, env_pgadmin, env_var
#   - infra_pkg      : infra_pkg_yum, infra_pkg_pip
#   - infra_user     : setup infra os user group
#   - infra_cert     : issue cert for infra components
#   - dns            : dns_config, dns_record, dns_launch
#   - nginx          : nginx_config, nginx_cert, nginx_static, nginx_launch, nginx_certbot, nginx_reload, nginx_exporter
#   - victoria       : vmetrics_config, vmetrics_launch, vlogs_config, vlogs_launch, vtraces_config, vtraces_launch, vmalert_config, vmalert_launch
#   - alertmanager   : alertmanager_config, alertmanager_launch
#   - blackbox       : blackbox_config, blackbox_launch
#   - grafana        : grafana_clean, grafana_config, grafana_launch, grafana_provision
#   - infra_register : register infra components to victoria

infra-rm.yml

从配置文件 infra 分组定义的 Infra节点 上移除 Pigsty 基础设施。

常用子任务包括:

./infra-rm.yml               # 移除 INFRA 模块
./infra-rm.yml -t service    # 停止 INFRA 上的基础设施服务
./infra-rm.yml -t data       # 移除 INFRA 上的存留数据
./infra-rm.yml -t package    # 卸载 INFRA 上安装的软件包

install.yml

在所有节点上一次性完整安装 Pigsty。

该剧本在 剧本:一次性安装 中有更详细的介绍。

5 - 管理预案

基础设施组件与 Infra 集群管理 SOP:创建,销毁,扩容,缩容,证书,仓库……

本章节介绍 Pigsty 部署的日常管理和运维操作。

5.1 - Ansible

使用 Ansible 运行管理命令

所有 INFRA 节点上都默认安装了 Ansible,可以用于管理整套部署。

Pigsty 基于 Ansible 实现自动化管理,它遵循 基础设施即代码(Infrastructure-as-Code) 的理念。

对于管理数据库与基础设施而言,Ansible 的知识很有用,但并非必需。您只需知道如何执行 剧本(Playbook) —— 那些定义了一系列自动化任务的 YAML 文件即可。


安装

Pigsty 在 引导过程 中会尽力自动安装 ansible 及其依赖项。 如需手动安装,请使用以下命令:

# Debian / Ubuntu
sudo apt install -y ansible python3-jmespath

# EL 10
sudo dnf install -y ansible python-jmespath

# EL 8/9
sudo dnf install -y ansible python3.12-jmespath

# EL 7
sudo yum install -y ansible python-jmespath

macOS

macOS 用户可使用 Homebrew 安装:

brew install ansible
pip3 install jmespath

基础用法

运行剧本只需执行 ./path/to/playbook.yml 即可。以下是最常用的 Ansible 命令行参数:

| 用途 | 参数 | 说明 |
|------|------|------|
| 在哪里 | `-l` / `--limit <pattern>` | 限定目标主机/分组/匹配模式 |
| 做什么 | `-t` / `--tags <tags>` | 仅运行带有指定标签的任务 |
| 怎么做 | `-e` / `--extra-vars <vars>` | 传递额外的命令行变量 |
| 用什么配置 | `-i` / `--inventory <path>` | 指定配置清单文件路径 |

限定主机

使用 -l|--limit <pattern> 参数可将执行范围限定到特定的分组、主机或匹配模式:

./node.yml                      # 在所有节点上执行
./pgsql.yml -l pg-test          # 仅在 pg-test 集群上执行
./pgsql.yml -l pg-*             # 在所有以 pg- 开头的集群上执行
./pgsql.yml -l 10.10.10.10      # 仅在特定 IP 的主机上执行

不指定主机限制直接运行剧本可能非常危险!缺省情况下,大多数剧本会在所有(all)主机上执行,务必谨慎使用!


限定任务

使用 -t|--tags <tags> 参数可仅执行带有指定标签的任务子集:

./infra.yml -t repo           # 仅执行创建本地仓库的任务
./infra.yml -t repo_upstream  # 仅执行添加上游仓库的任务
./node.yml -t node_pkg        # 仅执行安装节点软件包的任务
./pgsql.yml -t pg_hba         # 仅执行渲染 pg_hba.conf 的任务

传递变量

使用 -e|--extra-vars <key=value> 参数可在运行时覆盖变量:

./pgsql.yml -e pg_clean=true         # 强制清理现有的 PG 实例
./pgsql-rm.yml -e pg_rm_pkg=false    # 卸载时保留软件包
./node.yml -e '{"node_tune":"tiny"}' # 使用 JSON 格式传递变量
./pgsql.yml -e @/path/to/config.yml  # 从 YAML 文件加载变量

指定配置清单

默认情况下,Ansible 会使用当前目录下的 pigsty.yml 作为配置清单。 使用 -i|--inventory <path> 参数可指定其他配置文件:

./pgsql.yml -i files/pigsty/full.yml -l pg-test

[!NOTE]

若要永久更改默认配置文件的路径,可修改 ansible.cfg 中的 inventory 参数。
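一个最小的 ansible.cfg 片段示意如下(路径为示例值):

```ini
# ansible.cfg(示意):将默认配置清单指向其他文件
[defaults]
inventory = files/pigsty/full.yml
```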

5.2 - 剧本

Pigsty 内置的 Ansible 剧本

Pigsty 使用幂等的 Ansible 剧本实现管理控制。执行剧本需要将 ansible-playbook 添加到系统 PATH 中,用户需要先 安装 Ansible 才能执行剧本。

可用剧本

| 模块 | 剧本 | 功能 |
|------|------|------|
| INFRA | install.yml | 一键式安装 Pigsty |
| INFRA | infra.yml | 在基础设施节点上初始化 Pigsty 基础设施 |
| INFRA | infra-rm.yml | 从基础设施节点移除基础设施组件 |
| INFRA | cache.yml | 从目标节点制作离线安装包 |
| INFRA | cert.yml | 使用 Pigsty 自签名 CA 签发证书 |
| NODE | node.yml | 初始化节点,将节点配置到预期状态 |
| NODE | node-rm.yml | 从 Pigsty 中移除节点 |
| PGSQL | pgsql.yml | 初始化高可用 PostgreSQL 集群,或添加新从库 |
| PGSQL | pgsql-rm.yml | 移除 PostgreSQL 集群,或移除从库 |
| PGSQL | pgsql-db.yml | 向现有 PostgreSQL 集群添加新业务数据库 |
| PGSQL | pgsql-user.yml | 向现有 PostgreSQL 集群添加新业务用户 |
| PGSQL | pgsql-pitr.yml | 在现有 PostgreSQL 集群上执行时间点恢复(PITR) |
| PGSQL | pgsql-monitor.yml | 使用本地导出器监控远程 PostgreSQL 实例 |
| PGSQL | pgsql-migration.yml | 为现有 PostgreSQL 生成迁移手册和脚本 |
| PGSQL | slim.yml | 以最小化组件安装 Pigsty |
| REDIS | redis.yml | 初始化 Redis 集群/节点/实例 |
| REDIS | redis-rm.yml | 移除 Redis 集群/节点/实例 |
| ETCD | etcd.yml | 初始化 ETCD 集群,或添加新成员 |
| ETCD | etcd-rm.yml | 移除 ETCD 集群,或移除现有成员 |
| MINIO | minio.yml | 初始化 MinIO 集群 |
| MINIO | minio-rm.yml | 移除 MinIO 集群 |
| DOCKER | docker.yml | 在节点上安装 Docker |
| DOCKER | app.yml | 使用 Docker Compose 安装应用 |
| FERRET | mongo.yml | 在节点上安装 Mongo/FerretDB |

部署策略

install.yml 剧本会按照以下分组顺序协调各个专用剧本,完成完整部署:

  • infra 分组:infra.yml(-l infra)
  • nodes 分组:node.yml
  • etcd 分组:etcd.yml(-l etcd)
  • minio 分组:minio.yml(-l minio)
  • pgsql 分组:pgsql.yml

循环依赖说明:NODE 和 INFRA 之间存在弱循环依赖:要将 NODE 注册到 INFRA,INFRA 必须已经存在;而 INFRA 模块又依赖 NODE 才能工作。 解决方法是先初始化 infra 节点,然后再添加其他节点。如果希望一次性完成所有部署,使用 install.yml 即可。


安全须知

大多数剧本都是幂等的,这意味着某些部署剧本在未开启保护选项的情况下,可能会擦除现有数据库并创建新数据库。 使用 pgsqlminioinfra 剧本时请特别小心。请仔细阅读文档,谨慎操作。

最佳实践

  1. 执行前仔细阅读剧本文档
  2. 发现异常时立即按 Ctrl-C 停止
  3. 先在非生产环境中进行测试
  4. 使用 -l 参数限定执行主机,避免影响非目标主机
  5. 使用 -t 参数指定特定标签,仅执行部分任务

预演模式

使用 --check --diff 选项可以预览将要进行的更改,而不实际执行:

# 预览将要进行的更改,但不实际执行
./pgsql.yml -l pg-test --check --diff

# 结合标签检查特定任务
./pgsql.yml -l pg-test -t pg_config --check --diff

5.3 - Nginx 管理

Nginx 管理,Web 门户配置,Web Server,暴露上游服务

Pigsty 在 INFRA 节点上安装 Nginx 作为所有 Web 服务的入口,默认监听在 80/443 标准端口上。

在 Pigsty 中,你可以通过修改配置清单,让 nginx 对外提供多种服务:

  • 对外暴露 Grafana、VictoriaMetrics(VMUI)、Alertmanager、VictoriaLogs 等监控组件的 Web 界面
  • 提供静态文件服务(如软件仓库、文档站,网站等)
  • 代理自定义的应用服务(如内部应用、数据库管理界面,Docker 应用的界面等)
  • 自动签发自签名的 HTTPS 证书,或者使用 certbot 申请免费的 Let’s Encrypt 证书
  • 通过不同的子域名,使用单一端口对外暴露服务

基础配置

您可以通过 infra_portal 参数定制 Nginx 的行为,例如:

infra_portal:
  home: { domain: i.pigsty }

服务器参数

基本参数

| 参数 | 说明 |
|------|------|
| domain | 可选的代理域名 |
| endpoint | 上游服务地址(IP:PORT 或 socket) |
| path | 静态内容的本地目录 |
| scheme | 协议类型(http/https/tcp/udp) |

SSL/TLS 选项

| 参数 | 说明 |
|------|------|
| certbot | 启用 Let’s Encrypt 证书管理 |
| cert | 自定义证书文件路径 |
| key | 自定义私钥文件路径 |

高级设置

| 参数 | 说明 |
|------|------|
| conf | 自定义 Nginx 配置模板 |
| domains | 额外的域名列表 |
| index | 启用目录列表 |
| log | 自定义日志文件配置 |
| websocket | 启用 WebSocket 支持 |

配置示例

静态文件与目录列表

repo: { domain: repo.pigsty.cc, path: "/www/repo", index: true }

自定义 SSL 证书

secure_app: {
  domain: secure.pigsty.cc,
  endpoint: "${admin_ip}:8443",
  cert: "/etc/ssl/certs/custom.crt",
  key: "/etc/ssl/private/custom.key"
}

TCP 流代理

pg_primary: { domain: pg.pigsty.cc, endpoint: "10.10.10.11:5432", scheme: tcp }

管理命令

./infra.yml -t nginx           # 完整重新配置 Nginx
./infra.yml -t nginx_config    # 重新生成配置文件
./infra.yml -t nginx_launch    # 重启 Nginx 服务
./infra.yml -t nginx_cert      # 重新生成 SSL 证书

域名解析

有三种方式将域名解析到 Pigsty 服务器:

  1. 公网域名:通过 DNS 服务商配置
  2. 内网 DNS 服务器:配置内部 DNS 解析
  3. 本地 hosts 文件:修改 /etc/hosts

本地开发时,在 /etc/hosts 中添加:

<your_public_ip_address> i.pigsty g.pigsty p.pigsty a.pigsty

HTTPS 配置

通过 nginx_sslmode 参数配置 HTTPS:

| 模式 | 说明 |
|------|------|
| disable | 仅监听 HTTP(nginx_port) |
| enable | 同时监听 HTTPS(nginx_ssl_port),默认签发自签名证书 |
| enforce | 强制跳转到 HTTPS,所有 80 端口请求都会 301 重定向 |

对于自签名证书,有以下几种访问方式:

  • 在浏览器中信任自签名 CA
  • 使用浏览器安全绕过(Chrome 中输入 “thisisunsafe”)
  • 为生产环境配置正规 CA 签发的证书

最佳实践

  • 使用域名而非 IP:PORT 访问
  • 正确配置 DNS 解析或 hosts 文件
  • 为实时服务启用 WebSocket
  • 生产环境部署 HTTPS
  • 使用有意义的子域名组织服务
  • 监控证书过期时间
  • 集中化代理管理
  • 利用静态文件服务托管文档

5.4 - 软件仓库

管理本地 APT/YUM 软件仓库

Pigsty 支持创建和管理本地 APT/YUM 软件仓库,用于在离线环境中部署或加速软件包安装。


快速开始

向本地仓库添加软件包:

  1. 将软件包添加到 repo_packages(默认软件包)
  2. 将软件包添加到 repo_extra_packages(额外软件包)
  3. 执行构建命令:
./infra.yml -t repo_build   # 从上游构建本地仓库
./node.yml -t node_repo     # 刷新节点仓库缓存

软件包别名

Pigsty 预定义了常用的软件包组合,方便批量安装:

EL 系统(RHEL/CentOS/Rocky)

| 别名 | 说明 |
|------|------|
| node-bootstrap | Ansible、Python3 工具、SSH 相关 |
| infra-package | Nginx、etcd、HAProxy、监控导出器、MinIO 等 |
| pgsql-utility | Patroni、pgBouncer、pgBackRest、PG 工具 |
| pgsql | 完整 PostgreSQL(服务端、客户端、扩展) |
| pgsql-mini | 最小化 PostgreSQL 安装 |

Debian/Ubuntu 系统

| 别名 | 说明 |
|------|------|
| node-bootstrap | Ansible、开发工具 |
| infra-package | 基础设施组件(使用 Debian 命名规范) |
| pgsql-client | PostgreSQL 客户端 |
| pgsql-server | PostgreSQL 服务端及相关包 |

剧本任务

主要任务

| 任务 | 说明 |
|------|------|
| repo | 从互联网或离线包创建本地仓库 |
| repo_build | 如不存在则从上游构建 |
| repo_upstream | 添加上游仓库文件 |
| repo_pkg | 下载软件包及依赖 |
| repo_create | 创建/更新 YUM 或 APT 仓库 |
| repo_nginx | 启动 Nginx 文件服务器 |

完整任务列表

./infra.yml -t repo_dir          # 创建本地软件仓库目录
./infra.yml -t repo_check        # 检查本地仓库是否存在
./infra.yml -t repo_prepare      # 直接使用已有仓库
./infra.yml -t repo_build        # 从上游构建仓库
./infra.yml -t repo_upstream     # 添加上游仓库
./infra.yml -t repo_remove       # 删除现有仓库文件
./infra.yml -t repo_add          # 添加仓库到系统目录
./infra.yml -t repo_url_pkg      # 从互联网下载包
./infra.yml -t repo_cache        # 创建元数据缓存
./infra.yml -t repo_boot_pkg     # 安装引导包
./infra.yml -t repo_pkg          # 下载包及依赖
./infra.yml -t repo_create       # 创建本地仓库
./infra.yml -t repo_use          # 添加新建仓库到系统
./infra.yml -t repo_nginx        # 启动 Nginx 文件服务器

常用操作

添加新软件包

# 1. 配置上游仓库
./infra.yml -t repo_upstream

# 2. 下载软件包及依赖
./infra.yml -t repo_pkg

# 3. 构建本地仓库元数据
./infra.yml -t repo_create

刷新节点仓库

./node.yml -t node_repo    # 刷新所有节点的仓库缓存

完整重建仓库

./infra.yml -t repo        # 从互联网或离线包创建仓库

5.5 - 模块管理

Infra 模块本身的管理 SOP:定义,创建,销毁,扩容,缩容

本文介绍 INFRA 模块的日常管理操作,包括安装、卸载、扩容、以及各组件的管理维护。


安装 Infra 模块

使用 infra.yml 剧本在 infra 分组上安装 INFRA 模块:

./infra.yml     # 在 infra 分组上安装 INFRA 模块

卸载 Infra 模块

使用 infra-rm.yml 剧本从 infra 分组上卸载 INFRA 模块:

./infra-rm.yml  # 从 infra 分组上卸载 INFRA 模块

扩容 Infra 模块

在配置清单中为新节点分配 infra_seq 并加入 infra 分组:

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }  # 原有节点
        10.10.10.11: { infra_seq: 2 }  # 新节点

使用限制选项 -l 仅在新节点上执行剧本:

./infra.yml -l 10.10.10.11    # 在新节点上安装 INFRA 模块

管理本地软件仓库

本地软件仓库相关的管理任务:

./infra.yml -t repo              # 从互联网或离线包创建仓库
./infra.yml -t repo_upstream     # 添加上游仓库
./infra.yml -t repo_pkg          # 下载包及依赖
./infra.yml -t repo_create       # 创建本地 yum/apt 仓库

完整子任务列表:

./infra.yml -t repo_dir          # 创建本地软件仓库
./infra.yml -t repo_check        # 检查本地软件仓库是否存在
./infra.yml -t repo_prepare      # 直接使用已有仓库
./infra.yml -t repo_build        # 从上游构建仓库
./infra.yml -t repo_upstream     # 添加上游仓库
./infra.yml -t repo_remove       # 删除现有仓库文件
./infra.yml -t repo_add          # 添加仓库到系统目录
./infra.yml -t repo_url_pkg      # 从互联网下载包
./infra.yml -t repo_cache        # 创建元数据缓存
./infra.yml -t repo_boot_pkg     # 安装引导包
./infra.yml -t repo_pkg          # 下载包及依赖
./infra.yml -t repo_create       # 创建本地仓库
./infra.yml -t repo_use          # 添加新建仓库到系统
./infra.yml -t repo_nginx        # 启动 nginx 文件服务器

管理 Nginx

Nginx 相关的管理任务:

./infra.yml -t nginx                       # 重置 Nginx 组件
./infra.yml -t nginx_index                 # 重新渲染首页
./infra.yml -t nginx_config,nginx_reload   # 重新渲染配置并重载

申请 HTTPS 证书:

./infra.yml -t nginx_certbot,nginx_reload -e certbot_sign=true

管理基础设施组件

基础设施各组件的管理命令:

./infra.yml -t infra           # 配置基础设施
./infra.yml -t infra_env       # 配置环境变量
./infra.yml -t infra_pkg       # 安装软件包
./infra.yml -t infra_user      # 设置操作系统用户
./infra.yml -t infra_cert      # 颁发证书
./infra.yml -t dns             # 配置 DNSMasq
./infra.yml -t nginx           # 配置 Nginx
./infra.yml -t victoria        # 配置 VictoriaMetrics/Logs/Traces
./infra.yml -t alertmanager    # 配置 AlertManager
./infra.yml -t blackbox        # 配置 Blackbox Exporter
./infra.yml -t grafana         # 配置 Grafana
./infra.yml -t infra_register  # 注册到 VictoriaMetrics/Grafana

常用维护命令:

./infra.yml -t nginx_index                        # 重新渲染首页
./infra.yml -t nginx_config,nginx_reload          # 重新配置并重载
./infra.yml -t vmetrics_config,vmetrics_launch    # 重新生成 VictoriaMetrics 配置并重启
./infra.yml -t vlogs_config,vlogs_launch          # 更新 VictoriaLogs 配置
./infra.yml -t grafana_plugin                     # 下载 Grafana 插件

5.6 - 域名管理

配置本地或公网域名访问 Pigsty 服务

使用域名代替 IP 地址访问 Pigsty 的各项 Web 服务。


快速开始

将以下静态解析记录添加到 /etc/hosts

10.10.10.10 i.pigsty g.pigsty p.pigsty a.pigsty

将 IP 地址替换为实际 Pigsty 节点的 IP。


为什么使用域名

  • 比 IP 地址更易于记忆
  • 灵活指向不同 IP
  • 通过 Nginx 统一管理服务
  • 支持 HTTPS 加密
  • 防止某些地区的 ISP 劫持
  • 允许通过代理访问内部绑定的服务

DNS 机制

DNS 协议:将域名解析为 IP 地址。多个域名可以指向同一个 IP。

HTTP 协议:使用 Host 头将请求路由到同一端口(80/443)上的不同站点。


默认域名

Pigsty 预定义了以下默认域名:

| 域名 | 服务 | 端口 | 用途 |
|------|------|------|------|
| i.pigsty | Nginx | 80/443 | 默认首页、本地仓库与统一入口 |
| g.pigsty | Grafana | 3000 | 监控与可视化 |
| p.pigsty | VictoriaMetrics | 8428 | VMUI/PromQL 入口 |
| a.pigsty | AlertManager | 9059 | 告警路由 |

解析方式

本地静态解析

在客户端机器的 /etc/hosts 中添加条目:

# Linux/macOS
sudo vim /etc/hosts

# Windows
notepad C:\Windows\System32\drivers\etc\hosts

添加内容:

10.10.10.10 i.pigsty g.pigsty p.pigsty a.pigsty

内网动态解析

在内部 DNS 服务器上配置这些域名记录。

公网域名

购买域名并添加 DNS A 记录指向公网 IP。


HTTPS 证书

Pigsty 默认使用自签名证书。可选方案包括:

  • 忽略警告,使用 HTTP
  • 信任自签名 CA 证书
  • 使用真实 CA 或通过 Certbot 获取免费公网域名证书

详见 CA 与证书 文档。


扩展域名

Pigsty 扩展预留了以下域名:

| 域名 | 用途 |
|------|------|
| adm.pigsty | 管理界面 |
| ddl.pigsty | DDL 管理 |
| cli.pigsty | 命令行界面 |
| api.pigsty | API 服务 |
| lab.pigsty | 实验环境 |
| git.pigsty | Git 服务 |
| wiki.pigsty | Wiki 文档 |
| noco.pigsty | NocoDB |
| supa.pigsty | Supabase |
| dify.pigsty | Dify AI |
| odoo.pigsty | Odoo ERP |
| mm.pigsty | Mattermost |

5.7 - CA 与证书

使用自签名或真实 HTTPS 证书

本文介绍如何使用 Certbot 和 Let’s Encrypt 为 Pigsty 获取和管理 HTTPS 证书。

前置条件

  • 拥有公网域名
  • DNS 记录指向服务器的公网 IP
  • Nginx 已正确配置

配置步骤

第一步:域名配置

infra_portal 中配置需要证书的服务域名(Grafana、VictoriaMetrics、AlertManager、MinIO 等):

infra_portal:
  home: { domain: pigsty.cc }
  grafana: { domain: g.pigsty.cc, endpoint: "${admin_ip}:3000", websocket: true }
  prometheus: { domain: p.pigsty.cc, endpoint: "${admin_ip}:8428" }
  alertmanager: { domain: a.pigsty.cc, endpoint: "${admin_ip}:9059" }

第二步:DNS 配置

通过 A 记录将所有域名指向服务器的公网 IP。使用 nslookup 或 dig 验证:

nslookup g.pigsty.cc
dig +short g.pigsty.cc

第三步:申请证书

交互式方式:

certbot --nginx -d pigsty.cc -d g.pigsty.cc -d p.pigsty.cc -d a.pigsty.cc

非交互式方式:

certbot --nginx --agree-tos --email admin@pigsty.cc -n \
  -d pigsty.cc -d g.pigsty.cc -d p.pigsty.cc -d a.pigsty.cc

第四步:Nginx 配置

在 portal 条目中添加 certbot: true 参数,然后重新生成配置:

./infra.yml -t nginx_config,nginx_launch
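一个带有 certbot 参数的 infra_portal 条目示意如下(域名为示例值,certbot 参数的值将作为证书名称):

```yaml
infra_portal:
  grafana: { domain: g.pigsty.cc, endpoint: "${admin_ip}:3000", websocket: true, certbot: pigsty.cc }
```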

第五步:自动续期

测试续期(预演模式):

certbot renew --dry-run

设置 cron 定时任务(每月 1 日凌晨 2 点):

0 2 1 * * certbot renew --quiet

或启用 systemd 定时器:

systemctl enable certbot.timer

管理命令

| 命令 | 说明 |
|------|------|
| certbot certificates | 列出所有证书 |
| certbot renew --cert-name domain.com | 续期指定证书 |
| certbot delete --cert-name domain.com | 删除证书 |
| certbot revoke --cert-path /path/to/cert.pem | 吊销证书 |

故障排查

| 问题 | 解决方案 |
|------|------|
| 域名无法访问 | 验证 DNS 传播是否完成 |
| 端口 80 被阻止 | 确保验证时端口 80 开放 |
| 请求频率限制 | 避免短时间内多次申请证书 |
| 防火墙问题 | 开放 80 和 443 端口 |

最佳实践

  • 使用通配符证书管理子域名
  • 设置证书过期告警监控
  • 定期测试续期流程
  • 备份证书文件
  • 记录域名配置文档

6 - 监控告警

如何在 Pigsty 中对基础设施进行自监控?

本文介绍 Pigsty 中 INFRA 模块的监控面板与告警规则。


监控面板

Pigsty 针对 Infra 模块提供了以下监控面板:

| 面板 | 描述 |
|------|------|
| Pigsty Home | Pigsty 监控系统主页 |
| INFRA Overview | Pigsty 基础设施自监控概览 |
| Nginx Instance | Nginx 监控指标与日志 |
| Grafana Instance | Grafana 监控指标与日志 |
| VictoriaMetrics Instance | VictoriaMetrics 抓取/查询状态 |
| VMAlert Instance | 告警规则执行情况 |
| Alertmanager Instance | 告警聚合与通知 |
| VictoriaLogs Instance | 日志写入、查询与索引 |
| Logs Instance | 查阅单个节点上的日志信息 |
| VictoriaTraces Instance | Trace 存储与查询 |
| Inventory CMDB | CMDB 可视化 |
| ETCD Overview | etcd 集群监控 |

告警规则

Pigsty 针对 INFRA 模块提供了以下两条告警规则:

| 告警规则 | 描述 |
|------|------|
| InfraDown | 基础设施组件出现宕机 |
| AgentDown | 监控 Agent 代理出现宕机 |

可在 files/victoria/rules/infra.yml 中修改或添加新的基础设施告警规则。

告警规则配置

################################################################
#                Infrastructure Alert Rules                    #
################################################################
- name: infra-alert
  rules:

    #==============================================================#
    #                       Infra Aliveness                        #
    #==============================================================#
    # infra components (victoria,grafana) down for 1m triggers a P1 alert
    - alert: InfraDown
      expr: infra_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: infra }
      annotations:
        summary: "CRIT InfraDown {{ $labels.type }}@{{ $labels.instance }}"
        description: |
          infra_up[type={{ $labels.type }}, instance={{ $labels.instance }}] = {{ $value  | printf "%.2f" }} < 1

    #==============================================================#
    #                       Agent Aliveness                        #
    #==============================================================#

    # agent aliveness are determined directly by exporter aliveness
    # including: node_exporter, pg_exporter, pgbouncer_exporter, haproxy_exporter
    - alert: AgentDown
      expr: agent_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: infra }
      annotations:
        summary: 'CRIT AgentDown {{ $labels.ins }}@{{ $labels.instance }}'
        description: |
          agent_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value  | printf "%.2f" }} < 1

7 - 指标列表

Pigsty INFRA 模块提供的完整监控指标列表与释义

注意:Pigsty v4.0 已将 Prometheus/Loki 替换为 VictoriaMetrics/Logs/Traces。以下指标清单仍基于 v3.x 生成,仅供排查旧版本问题参考。若需获取最新指标,请在 https://p.pigsty (VMUI) 或 Grafana 中直接查询,后续版本会重新生成与 Victoria 套件一致的指标速查表。
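VMUI 兼容 Prometheus 的即时查询接口 `/api/v1/query`,因此可以直接用 HTTP 请求查询当前指标。下面用 Python 拼出查询 URL 作示意(其中 10.10.10.10:8428 为 Pigsty 默认管理节点地址与本文所列端口,按实际部署调整):

```python
# 构造 VictoriaMetrics(Prometheus 兼容)即时查询 URL
from urllib.parse import urlencode

def vm_query_url(metric_expr, endpoint="http://10.10.10.10:8428"):
    """拼出 /api/v1/query 即时查询 URL,可用 curl 或 urllib.request 直接请求"""
    return f"{endpoint}/api/v1/query?{urlencode({'query': metric_expr})}"

print(vm_query_url('infra_up{type="grafana"}'))
# http://10.10.10.10:8428/api/v1/query?query=infra_up%7Btype%3D%22grafana%22%7D
```

返回结果为 JSON,`data.result` 中即为匹配的时间序列及其当前值。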

INFRA 指标

INFRA 模块包含有 964 类可用监控指标。

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| alertmanager_alerts | gauge | ins, instance, ip, job, cls, state | How many alerts by state. |
| alertmanager_alerts_invalid_total | counter | version, ins, instance, ip, job, cls | The total number of received alerts that were invalid. |
| alertmanager_alerts_received_total | counter | version, ins, instance, ip, status, job, cls | The total number of received alerts. |
| alertmanager_build_info | gauge | revision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goos | A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which alertmanager was built, and the goos and goarch for the build. |
| alertmanager_cluster_alive_messages_total | counter | ins, instance, ip, peer, job, cls | Total number of received alive messages. |
| alertmanager_cluster_enabled | gauge | ins, instance, ip, job, cls | Indicates whether the clustering is enabled or not. |
| alertmanager_cluster_failed_peers | gauge | ins, instance, ip, job, cls | Number indicating the current number of failed peers in the cluster. |
| alertmanager_cluster_health_score | gauge | ins, instance, ip, job, cls | Health score of the cluster. Lower values are better and zero means ’totally healthy’. |
| alertmanager_cluster_members | gauge | ins, instance, ip, job, cls | Number indicating current number of members in cluster. |
| alertmanager_cluster_messages_pruned_total | counter | ins, instance, ip, job, cls | Total number of cluster messages pruned. |
| alertmanager_cluster_messages_queued | gauge | ins, instance, ip, job, cls | Number of cluster messages which are queued. |
| alertmanager_cluster_messages_received_size_total | counter | ins, instance, ip, msg_type, job, cls | Total size of cluster messages received. |
| alertmanager_cluster_messages_received_total | counter | ins, instance, ip, msg_type, job, cls | Total number of cluster messages received. |
| alertmanager_cluster_messages_sent_size_total | counter | ins, instance, ip, msg_type, job, cls | Total size of cluster messages sent. |
| alertmanager_cluster_messages_sent_total | counter | ins, instance, ip, msg_type, job, cls | Total number of cluster messages sent. |
| alertmanager_cluster_peer_info | gauge | ins, instance, ip, peer, job, cls | A metric with a constant ‘1’ value labeled by peer name. |
| alertmanager_cluster_peers_joined_total | counter | ins, instance, ip, job, cls | A counter of the number of peers that have joined. |
| alertmanager_cluster_peers_left_total | counter | ins, instance, ip, job, cls | A counter of the number of peers that have left. |
| alertmanager_cluster_peers_update_total | counter | ins, instance, ip, job, cls | A counter of the number of peers that have updated metadata. |
| alertmanager_cluster_reconnections_failed_total | counter | ins, instance, ip, job, cls | A counter of the number of failed cluster peer reconnection attempts. |
| alertmanager_cluster_reconnections_total | counter | ins, instance, ip, job, cls | A counter of the number of cluster peer reconnections. |
| alertmanager_cluster_refresh_join_failed_total | counter | ins, instance, ip, job, cls | A counter of the number of failed cluster peer joined attempts via refresh. |
| alertmanager_cluster_refresh_join_total | counter | ins, instance, ip, job, cls | A counter of the number of cluster peer joined via refresh. |
| alertmanager_config_hash | gauge | ins, instance, ip, job, cls | Hash of the currently loaded alertmanager configuration. |
| alertmanager_config_last_reload_success_timestamp_seconds | gauge | ins, instance, ip, job, cls | Timestamp of the last successful configuration reload. |
| alertmanager_config_last_reload_successful | gauge | ins, instance, ip, job, cls | Whether the last configuration reload attempt was successful. |
| alertmanager_dispatcher_aggregation_groups | gauge | ins, instance, ip, job, cls | Number of active aggregation groups |
| alertmanager_dispatcher_alert_processing_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_dispatcher_alert_processing_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_http_concurrency_limit_exceeded_total | counter | ins, instance, method, ip, job, cls | Total number of times an HTTP request failed because the concurrency limit was reached. |
| alertmanager_http_request_duration_seconds_bucket | Unknown | ins, instance, method, ip, le, job, cls, handler | N/A |
| alertmanager_http_request_duration_seconds_count | Unknown | ins, instance, method, ip, job, cls, handler | N/A |
| alertmanager_http_request_duration_seconds_sum | Unknown | ins, instance, method, ip, job, cls, handler | N/A |
| alertmanager_http_requests_in_flight | gauge | ins, instance, method, ip, job, cls | Current number of HTTP requests being processed. |
| alertmanager_http_response_size_bytes_bucket | Unknown | ins, instance, method, ip, le, job, cls, handler | N/A |
| alertmanager_http_response_size_bytes_count | Unknown | ins, instance, method, ip, job, cls, handler | N/A |
| alertmanager_http_response_size_bytes_sum | Unknown | ins, instance, method, ip, job, cls, handler | N/A |
| alertmanager_integrations | gauge | ins, instance, ip, job, cls | Number of configured integrations. |
| alertmanager_marked_alerts | gauge | ins, instance, ip, job, cls, state | How many alerts by state are currently marked in the Alertmanager regardless of their expiry. |
| alertmanager_nflog_gc_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_nflog_gc_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_nflog_gossip_messages_propagated_total | counter | ins, instance, ip, job, cls | Number of received gossip messages that have been further gossiped. |
| alertmanager_nflog_maintenance_errors_total | counter | ins, instance, ip, job, cls | How many maintenances were executed for the notification log that failed. |
| alertmanager_nflog_maintenance_total | counter | ins, instance, ip, job, cls | How many maintenances were executed for the notification log. |
| alertmanager_nflog_queries_total | counter | ins, instance, ip, job, cls | Number of notification log queries were received. |
| alertmanager_nflog_query_duration_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| alertmanager_nflog_query_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_nflog_query_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_nflog_query_errors_total | counter | ins, instance, ip, job, cls | Number notification log received queries that failed. |
| alertmanager_nflog_snapshot_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_nflog_snapshot_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_nflog_snapshot_size_bytes | gauge | ins, instance, ip, job, cls | Size of the last notification log snapshot in bytes. |
| alertmanager_notification_latency_seconds_bucket | Unknown | integration, ins, instance, ip, le, job, cls | N/A |
| alertmanager_notification_latency_seconds_count | Unknown | integration, ins, instance, ip, job, cls | N/A |
| alertmanager_notification_latency_seconds_sum | Unknown | integration, ins, instance, ip, job, cls | N/A |
| alertmanager_notification_requests_failed_total | counter | integration, ins, instance, ip, job, cls | The total number of failed notification requests. |
| alertmanager_notification_requests_total | counter | integration, ins, instance, ip, job, cls | The total number of attempted notification requests. |
| alertmanager_notifications_failed_total | counter | integration, ins, instance, ip, reason, job, cls | The total number of failed notifications. |
| alertmanager_notifications_total | counter | integration, ins, instance, ip, job, cls | The total number of attempted notifications. |
| alertmanager_oversize_gossip_message_duration_seconds_bucket | Unknown | ins, instance, ip, le, key, job, cls | N/A |
| alertmanager_oversize_gossip_message_duration_seconds_count | Unknown | ins, instance, ip, key, job, cls | N/A |
| alertmanager_oversize_gossip_message_duration_seconds_sum | Unknown | ins, instance, ip, key, job, cls | N/A |
| alertmanager_oversized_gossip_message_dropped_total | counter | ins, instance, ip, key, job, cls | Number of oversized gossip messages that were dropped due to a full message queue. |
| alertmanager_oversized_gossip_message_failure_total | counter | ins, instance, ip, key, job, cls | Number of oversized gossip message sends that failed. |
| alertmanager_oversized_gossip_message_sent_total | counter | ins, instance, ip, key, job, cls | Number of oversized gossip message sent. |
| alertmanager_peer_position | gauge | ins, instance, ip, job, cls | Position the Alertmanager instance believes it’s in. The position determines a peer’s behavior in the cluster. |
| alertmanager_receivers | gauge | ins, instance, ip, job, cls | Number of configured receivers. |
| alertmanager_silences | gauge | ins, instance, ip, job, cls, state | How many silences by state. |
| alertmanager_silences_gc_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_silences_gc_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_silences_gossip_messages_propagated_total | counter | ins, instance, ip, job, cls | Number of received gossip messages that have been further gossiped. |
| alertmanager_silences_maintenance_errors_total | counter | ins, instance, ip, job, cls | How many maintenances were executed for silences that failed. |
| alertmanager_silences_maintenance_total | counter | ins, instance, ip, job, cls | How many maintenances were executed for silences. |
| alertmanager_silences_queries_total | counter | ins, instance, ip, job, cls | How many silence queries were received. |
| alertmanager_silences_query_duration_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| alertmanager_silences_query_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_silences_query_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_silences_query_errors_total | counter | ins, instance, ip, job, cls | How many silence received queries did not succeed. |
| alertmanager_silences_snapshot_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_silences_snapshot_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| alertmanager_silences_snapshot_size_bytes | gauge | ins, instance, ip, job, cls | Size of the last silence snapshot in bytes. |
| blackbox_exporter_build_info | gauge | revision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goos | A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which blackbox_exporter was built, and the goos and goarch for the build. |
| blackbox_exporter_config_last_reload_success_timestamp_seconds | gauge | ins, instance, ip, job, cls | Timestamp of the last successful configuration reload. |
| blackbox_exporter_config_last_reload_successful | gauge | ins, instance, ip, job, cls | Blackbox exporter config loaded successfully. |
| blackbox_module_unknown_total | counter | ins, instance, ip, job, cls | Count of unknown modules requested by probes |
| cortex_distributor_ingester_clients | gauge | ins, instance, ip, job, cls | The current number of ingester clients. |
| cortex_dns_failures_total | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_dns_lookups_total | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_frontend_query_range_duration_seconds_bucket | Unknown | ins, instance, method, ip, le, job, cls, status_code | N/A |
| cortex_frontend_query_range_duration_seconds_count | Unknown | ins, instance, method, ip, job, cls, status_code | N/A |
| cortex_frontend_query_range_duration_seconds_sum | Unknown | ins, instance, method, ip, job, cls, status_code | N/A |
| cortex_ingester_flush_queue_length | gauge | ins, instance, ip, job, cls | The total number of series pending in the flush queue. |
| cortex_kv_request_duration_seconds_bucket | Unknown | ins, instance, role, ip, le, kv_name, type, operation, job, cls, status_code | N/A |
| cortex_kv_request_duration_seconds_count | Unknown | ins, instance, role, ip, kv_name, type, operation, job, cls, status_code | N/A |
| cortex_kv_request_duration_seconds_sum | Unknown | ins, instance, role, ip, kv_name, type, operation, job, cls, status_code | N/A |
| cortex_member_consul_heartbeats_total | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_prometheus_notifications_alertmanagers_discovered | gauge | ins, instance, ip, user, job, cls | The number of alertmanagers discovered and active. |
| cortex_prometheus_notifications_dropped_total | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_prometheus_notifications_queue_capacity | gauge | ins, instance, ip, user, job, cls | The capacity of the alert notifications queue. |
| cortex_prometheus_notifications_queue_length | gauge | ins, instance, ip, user, job, cls | The number of alert notifications in the queue. |
| cortex_prometheus_rule_evaluation_duration_seconds | summary | ins, instance, ip, user, job, cls, quantile | The duration for a rule to execute. |
| cortex_prometheus_rule_evaluation_duration_seconds_count | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_prometheus_rule_evaluation_duration_seconds_sum | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_prometheus_rule_group_duration_seconds | summary | ins, instance, ip, user, job, cls, quantile | The duration of rule group evaluations. |
| cortex_prometheus_rule_group_duration_seconds_count | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_prometheus_rule_group_duration_seconds_sum | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_query_frontend_connected_schedulers | gauge | ins, instance, ip, job, cls | Number of schedulers this frontend is connected to. |
| cortex_query_frontend_queries_in_progress | gauge | ins, instance, ip, job, cls | Number of queries in progress handled by this frontend. |
| cortex_query_frontend_retries_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| cortex_query_frontend_retries_count | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_query_frontend_retries_sum | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_query_scheduler_connected_frontend_clients | gauge | ins, instance, ip, job, cls | Number of query-frontend worker clients currently connected to the query-scheduler. |
| cortex_query_scheduler_connected_querier_clients | gauge | ins, instance, ip, job, cls | Number of querier worker clients currently connected to the query-scheduler. |
| cortex_query_scheduler_inflight_requests | summary | ins, instance, ip, job, cls, quantile | Number of inflight requests (either queued or processing) sampled at a regular interval. Quantile buckets keep track of inflight requests over the last 60s. |
| cortex_query_scheduler_inflight_requests_count | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_query_scheduler_inflight_requests_sum | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_query_scheduler_queue_duration_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| cortex_query_scheduler_queue_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_query_scheduler_queue_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_query_scheduler_queue_length | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_query_scheduler_running | gauge | ins, instance, ip, job, cls | Value will be 1 if the scheduler is in the ReplicationSet and actively receiving/processing requests |
| cortex_ring_member_heartbeats_total | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_ring_member_tokens_owned | gauge | ins, instance, ip, job, cls | The number of tokens owned in the ring. |
| cortex_ring_member_tokens_to_own | gauge | ins, instance, ip, job, cls | The number of tokens to own in the ring. |
| cortex_ring_members | gauge | ins, instance, ip, job, cls, state | Number of members in the ring |
| cortex_ring_oldest_member_timestamp | gauge | ins, instance, ip, job, cls, state | Timestamp of the oldest member in the ring. |
| cortex_ring_tokens_total | gauge | ins, instance, ip, job, cls | Number of tokens in the ring |
| cortex_ruler_clients | gauge | ins, instance, ip, job, cls | The current number of ruler clients in the pool. |
| cortex_ruler_config_last_reload_successful | gauge | ins, instance, ip, user, job, cls | Boolean set to 1 whenever the last configuration reload attempt was successful. |
| cortex_ruler_config_last_reload_successful_seconds | gauge | ins, instance, ip, user, job, cls | Timestamp of the last successful configuration reload. |
| cortex_ruler_config_updates_total | Unknown | ins, instance, ip, user, job, cls | N/A |
| cortex_ruler_managers_total | gauge | ins, instance, ip, job, cls | Total number of managers registered and running in the ruler |
| cortex_ruler_ring_check_errors_total | Unknown | ins, instance, ip, job, cls | N/A |
| cortex_ruler_sync_rules_total | Unknown | ins, instance, ip, reason, job, cls | N/A |
| deprecated_flags_inuse_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cgo_go_to_c_calls_calls_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_gc_mark_assist_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_gc_mark_dedicated_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_gc_mark_idle_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_gc_pause_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_gc_total_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_idle_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_scavenge_assist_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_scavenge_background_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_scavenge_total_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_total_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_cpu_classes_user_cpu_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_cycles_automatic_gc_cycles_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_cycles_forced_gc_cycles_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_cycles_total_gc_cycles_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_duration_seconds | summary | ins, instance, ip, job, cls, quantile | A summary of the pause duration of garbage collection cycles. |
| go_gc_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_gogc_percent | gauge | ins, instance, ip, job, cls | Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. |
| go_gc_gomemlimit_bytes | gauge | ins, instance, ip, job, cls | Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. |
| go_gc_heap_allocs_by_size_bytes_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| go_gc_heap_allocs_by_size_bytes_count | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_allocs_by_size_bytes_sum | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_allocs_bytes_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_allocs_objects_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_frees_by_size_bytes_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| go_gc_heap_frees_by_size_bytes_count | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_frees_by_size_bytes_sum | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_frees_bytes_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_frees_objects_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_heap_goal_bytes | gauge | ins, instance, ip, job, cls | Heap size target for the end of the GC cycle. |
| go_gc_heap_live_bytes | gauge | ins, instance, ip, job, cls | Heap memory occupied by live objects that were marked by the previous GC. |
| go_gc_heap_objects_objects | gauge | ins, instance, ip, job, cls | Number of objects, live or unswept, occupying heap memory. |
| go_gc_heap_tiny_allocs_objects_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_limiter_last_enabled_gc_cycle | gauge | ins, instance, ip, job, cls | GC cycle the last time the GC CPU limiter was enabled. This metric is useful for diagnosing the root cause of an out-of-memory error, because the limiter trades memory for CPU time when the GC’s CPU time gets too high. This is most likely to occur with use of SetMemoryLimit. The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled. |
| go_gc_pauses_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| go_gc_pauses_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_pauses_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| go_gc_scan_globals_bytes | gauge | ins, instance, ip, job, cls | The total amount of global variable space that is scannable. |
| go_gc_scan_heap_bytes | gauge | ins, instance, ip, job, cls | The total amount of heap space that is scannable. |
| go_gc_scan_stack_bytes | gauge | ins, instance, ip, job, cls | The number of bytes of stack that were scanned last GC cycle. |
| go_gc_scan_total_bytes | gauge | ins, instance, ip, job, cls | The total amount space that is scannable. Sum of all metrics in /gc/scan. |
| go_gc_stack_starting_size_bytes | gauge | ins, instance, ip, job, cls | The stack size of new goroutines. |
| go_godebug_non_default_behavior_execerrdot_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_gocachehash_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_gocachetest_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_gocacheverify_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_http2client_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_http2server_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_installgoroot_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_jstmpllitinterp_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_multipartmaxheaders_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_multipartmaxparts_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_multipathtcp_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_panicnil_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_randautoseed_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_tarinsecurepath_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_tlsmaxrsasize_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_x509sha1_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_x509usefallbackroots_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_godebug_non_default_behavior_zipinsecurepath_events_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_goroutines | gauge | ins, instance, ip, job, cls | Number of goroutines that currently exist. |
| go_info | gauge | version, ins, instance, ip, job, cls | Information about the Go environment. |
| go_memory_classes_heap_free_bytes | gauge | ins, instance, ip, job, cls | Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime’s estimate of free address space that is backed by physical memory. |
| go_memory_classes_heap_objects_bytes | gauge | ins, instance, ip, job, cls | Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector. |
| go_memory_classes_heap_released_bytes | gauge | ins, instance, ip, job, cls | Memory that is completely free and has been returned to the underlying system. This metric is the runtime’s estimate of free address space that is still mapped into the process, but is not backed by physical memory. |
| go_memory_classes_heap_stacks_bytes | gauge | ins, instance, ip, job, cls | Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. Currently, this represents all stack memory for goroutines. It also includes all OS thread stacks in non-cgo programs. Note that stacks may be allocated differently in the future, and this may change. |
| go_memory_classes_heap_unused_bytes | gauge | ins, instance, ip, job, cls | Memory that is reserved for heap objects but is not currently used to hold heap objects. |
| go_memory_classes_metadata_mcache_free_bytes | gauge | ins, instance, ip, job, cls | Memory that is reserved for runtime mcache structures, but not in-use. |
| go_memory_classes_metadata_mcache_inuse_bytes | gauge | ins, instance, ip, job, cls | Memory that is occupied by runtime mcache structures that are currently being used. |
| go_memory_classes_metadata_mspan_free_bytes | gauge | ins, instance, ip, job, cls | Memory that is reserved for runtime mspan structures, but not in-use. |
| go_memory_classes_metadata_mspan_inuse_bytes | gauge | ins, instance, ip, job, cls | Memory that is occupied by runtime mspan structures that are currently being used. |
| go_memory_classes_metadata_other_bytes | gauge | ins, instance, ip, job, cls | Memory that is reserved for or used to hold runtime metadata. |
| go_memory_classes_os_stacks_bytes | gauge | ins, instance, ip, job, cls | Stack memory allocated by the underlying operating system. In non-cgo programs this metric is currently zero. This may change in the future.In cgo programs this metric includes OS thread stacks allocated directly from the OS. Currently, this only accounts for one stack in c-shared and c-archive build modes, and other sources of stacks from the OS are not measured. This too may change in the future. |
| go_memory_classes_other_bytes | gauge | ins, instance, ip, job, cls | Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more. |
| go_memory_classes_profiling_buckets_bytes | gauge | ins, instance, ip, job, cls | Memory that is used by the stack trace hash map used for profiling. |
| go_memory_classes_total_bytes | gauge | ins, instance, ip, job, cls | All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes. |
| go_memstats_alloc_bytes | counter | ins, instance, ip, job, cls | Total number of bytes allocated, even if freed. |
| go_memstats_alloc_bytes_total | counter | ins, instance, ip, job, cls | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | ins, instance, ip, job, cls | Total number of frees. |
| go_memstats_gc_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | ins, instance, ip, job, cls | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | ins, instance, ip, job, cls | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | ins, instance, ip, job, cls | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | ins, instance, ip, job, cls | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | ins, instance, ip, job, cls | Number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | gauge | ins, instance, ip, job, cls | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | gauge | ins, instance, ip, job, cls | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | counter | ins, instance, ip, job, cls | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | ins, instance, ip, job, cls | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | ins, instance, ip, job, cls | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | gauge | ins, instance, ip, job, cls | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | gauge | ins, instance, ip, job, cls | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | ins, instance, ip, job, cls | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | gauge | ins, instance, ip, job, cls | Number of bytes obtained from system. |
| go_sched_gomaxprocs_threads | gauge | ins, instance, ip, job, cls | The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. |
| go_sched_goroutines_goroutines | gauge | ins, instance, ip, job, cls | Count of live goroutines. |
| go_sched_latencies_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| go_sched_latencies_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| go_sched_latencies_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| go_sql_stats_connections_blocked_seconds | unknown | ins, instance, db_name, ip, job, cls | The total time blocked waiting for a new connection. |
| go_sql_stats_connections_closed_max_idle | unknown | ins, instance, db_name, ip, job, cls | The total number of connections closed due to SetMaxIdleConns. |
| go_sql_stats_connections_closed_max_idle_time | unknown | ins, instance, db_name, ip, job, cls | The total number of connections closed due to SetConnMaxIdleTime. |
| go_sql_stats_connections_closed_max_lifetime | unknown | ins, instance, db_name, ip, job, cls | The total number of connections closed due to SetConnMaxLifetime. |
| go_sql_stats_connections_idle | gauge | ins, instance, db_name, ip, job, cls | The number of idle connections. |
| go_sql_stats_connections_in_use | gauge | ins, instance, db_name, ip, job, cls | The number of connections currently in use. |
| go_sql_stats_connections_max_open | gauge | ins, instance, db_name, ip, job, cls | Maximum number of open connections to the database. |
| go_sql_stats_connections_open | gauge | ins, instance, db_name, ip, job, cls | The number of established connections both in use and idle. |
| go_sql_stats_connections_waited_for | unknown | ins, instance, db_name, ip, job, cls | The total number of connections waited for. |
| go_sync_mutex_wait_total_seconds_total | Unknown | ins, instance, ip, job, cls | N/A |
| go_threads | gauge | ins, instance, ip, job, cls | Number of OS threads created. |
| grafana_access_evaluation_count | unknown | ins, instance, ip, job, cls | number of evaluation calls |
| grafana_access_evaluation_duration_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| grafana_access_evaluation_duration_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_access_evaluation_duration_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_access_permissions_duration_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| grafana_access_permissions_duration_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_access_permissions_duration_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_aggregator_discovery_aggregation_count_total | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_active_alerts | gauge | ins, instance, ip, job, cls | amount of active alerts |
| grafana_alerting_active_configurations | gauge | ins, instance, ip, job, cls | The number of active Alertmanager configurations. |
| grafana_alerting_alertmanager_config_match | gauge | ins, instance, ip, job, cls | The total number of match |
| grafana_alerting_alertmanager_config_match_re | gauge | ins, instance, ip, job, cls | The total number of matchRE |
| grafana_alerting_alertmanager_config_matchers | gauge | ins, instance, ip, job, cls | The total number of matchers |
| grafana_alerting_alertmanager_config_object_matchers | gauge | ins, instance, ip, job, cls | The total number of object_matchers |
| grafana_alerting_discovered_configurations | gauge | ins, instance, ip, job, cls | The number of organizations we’ve discovered that require an Alertmanager configuration. |
| grafana_alerting_dispatcher_aggregation_groups | gauge | ins, instance, ip, job, cls | Number of active aggregation groups |
| grafana_alerting_dispatcher_alert_processing_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_dispatcher_alert_processing_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_execution_time_milliseconds | summary | ins, instance, ip, job, cls, quantile | summary of alert execution duration |
| grafana_alerting_execution_time_milliseconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_execution_time_milliseconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_gc_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_gc_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_gossip_messages_propagated_total | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_queries_total | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_query_duration_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| grafana_alerting_nflog_query_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_query_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_query_errors_total | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_snapshot_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_snapshot_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| grafana_alerting_nflog_snapshot_size_bytes | gauge | ins, instance, ip, job, cls | Size of the last notification log snapshot in bytes. |
| grafana_alerting_notification_latency_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| grafana_alerting_notification_latency_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
grafana_alerting_notification_latency_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_alert_rulesgaugeins, instance, ip, job, clsThe number of alert rules that could be considered for evaluation at the next tick.
grafana_alerting_schedule_alert_rules_hashgaugeins, instance, ip, job, clsA hash of the alert rules that could be considered for evaluation at the next tick.
grafana_alerting_schedule_periodic_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_schedule_periodic_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_periodic_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_query_alert_rules_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_schedule_query_alert_rules_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_query_alert_rules_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_scheduler_behind_secondsgaugeins, instance, ip, job, clsThe total number of seconds the scheduler is behind.
grafana_alerting_silences_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_gossip_messages_propagated_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_queries_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_query_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_silences_query_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_query_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_query_errors_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_snapshot_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_snapshot_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_snapshot_size_bytesgaugeins, instance, ip, job, clsSize of the last silence snapshot in bytes.
grafana_alerting_state_calculation_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_state_calculation_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_state_calculation_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_state_history_writes_bytes_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_ticker_interval_secondsgaugeins, instance, ip, job, clsInterval at which the ticker is meant to tick.
grafana_alerting_ticker_last_consumed_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last consumed tick in seconds.
grafana_alerting_ticker_next_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the next tick in seconds before it is consumed.
grafana_api_admin_user_created_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_get_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dashboard get duration
grafana_api_dashboard_get_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_get_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_save_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dashboard save duration
grafana_api_dashboard_save_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_save_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_search_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dashboard search duration
grafana_api_dashboard_search_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_search_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_snapshot_create_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_snapshot_external_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_snapshot_get_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dataproxy_request_all_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dataproxy request duration
grafana_api_dataproxy_request_all_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dataproxy_request_all_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_login_oauth_totalUnknownins, instance, ip, job, clsN/A
grafana_api_login_post_totalUnknownins, instance, ip, job, clsN/A
grafana_api_login_saml_totalUnknownins, instance, ip, job, clsN/A
grafana_api_models_dashboard_insert_totalUnknownins, instance, ip, job, clsN/A
grafana_api_org_create_totalUnknownins, instance, ip, job, clsN/A
grafana_api_response_status_totalUnknownins, instance, ip, job, cls, codeN/A
grafana_api_user_signup_completed_totalUnknownins, instance, ip, job, clsN/A
grafana_api_user_signup_invite_totalUnknownins, instance, ip, job, clsN/A
grafana_api_user_signup_started_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_audit_event_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_audit_requests_rejected_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_client_certificate_expiration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_apiserver_client_certificate_expiration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_apiserver_client_certificate_expiration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_apiserver_envelope_encryption_dek_cache_fill_percentgaugeins, instance, ip, job, cls[ALPHA] Percent of the cache slots currently occupied by cached DEKs.
grafana_apiserver_flowcontrol_seat_fair_fracgaugeins, instance, ip, job, cls[ALPHA] Fair fraction of server’s concurrency to allocate to each priority level that can use it
grafana_apiserver_storage_data_key_generation_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_apiserver_storage_data_key_generation_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_apiserver_storage_data_key_generation_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_apiserver_storage_data_key_generation_failures_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_storage_envelope_transformation_cache_misses_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_tls_handshake_errors_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_webhooks_x509_insecure_sha1_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_webhooks_x509_missing_san_totalUnknownins, instance, ip, job, clsN/A
grafana_authn_authn_failed_authentication_totalUnknownins, instance, ip, job, clsN/A
grafana_authn_authn_successful_authentication_totalUnknownins, instance, ip, client, job, clsN/A
grafana_authn_authn_successful_login_totalUnknownins, instance, ip, client, job, clsN/A
grafana_aws_cloudwatch_get_metric_data_totalUnknownins, instance, ip, job, clsN/A
grafana_aws_cloudwatch_get_metric_statistics_totalUnknownins, instance, ip, job, clsN/A
grafana_aws_cloudwatch_list_metrics_totalUnknownins, instance, ip, job, clsN/A
grafana_build_infogaugerevision, version, ins, instance, edition, ip, goversion, job, cls, branchA metric with a constant ‘1’ value labeled by version, revision, branch, and goversion from which Grafana was built
grafana_build_timestampgaugerevision, version, ins, instance, edition, ip, goversion, job, cls, branchA metric exposing when the binary was built in epoch
grafana_cardinality_enforcement_unexpected_categorizations_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_idlegaugeins, instance, ip, job, clsThe number of idle connections
grafana_database_conn_in_usegaugeins, instance, ip, job, clsThe number of connections currently in use
grafana_database_conn_max_idle_closed_secondsunknownins, instance, ip, job, clsThe total number of connections closed due to SetConnMaxIdleTime
grafana_database_conn_max_idle_closed_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_max_lifetime_closed_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_max_opengaugeins, instance, ip, job, clsMaximum number of open connections to the database
grafana_database_conn_opengaugeins, instance, ip, job, clsThe number of established connections both in use and idle
grafana_database_conn_wait_count_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_wait_duration_secondsunknownins, instance, ip, job, clsThe total time blocked waiting for a new connection
grafana_datasource_request_duration_seconds_bucketUnknowndatasource, ins, instance, method, ip, le, datasource_type, job, cls, codeN/A
grafana_datasource_request_duration_seconds_countUnknowndatasource, ins, instance, method, ip, datasource_type, job, cls, codeN/A
grafana_datasource_request_duration_seconds_sumUnknowndatasource, ins, instance, method, ip, datasource_type, job, cls, codeN/A
grafana_datasource_request_in_flightgaugedatasource, ins, instance, ip, datasource_type, job, clsA gauge of outgoing data source requests currently being sent by Grafana
grafana_datasource_request_totalUnknowndatasource, ins, instance, method, ip, datasource_type, job, cls, codeN/A
grafana_datasource_response_size_bytes_bucketUnknowndatasource, ins, instance, ip, le, datasource_type, job, clsN/A
grafana_datasource_response_size_bytes_countUnknowndatasource, ins, instance, ip, datasource_type, job, clsN/A
grafana_datasource_response_size_bytes_sumUnknowndatasource, ins, instance, ip, datasource_type, job, clsN/A
grafana_db_datasource_query_by_id_totalUnknownins, instance, ip, job, clsN/A
grafana_disabled_metrics_totalUnknownins, instance, ip, job, clsN/A
grafana_emails_sent_failedunknownins, instance, ip, job, clsNumber of emails Grafana failed to send
grafana_emails_sent_totalUnknownins, instance, ip, job, clsN/A
grafana_encryption_cache_reads_totalUnknownins, instance, method, ip, hit, job, clsN/A
grafana_encryption_ops_totalUnknownins, instance, ip, success, operation, job, clsN/A
grafana_environment_infogaugeversion, ins, instance, ip, job, cls, commitA metric with a constant ‘1’ value labeled by environment information about the running instance.
grafana_feature_toggles_infogaugeins, instance, ip, job, clsinfo metric that exposes what feature toggles are enabled or not
grafana_frontend_boot_css_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_css_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_css_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_contentful_paint_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_first_contentful_paint_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_contentful_paint_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_paint_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_first_paint_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_paint_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_js_done_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_js_done_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_js_done_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_load_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_load_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_load_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_plugins_preload_ms_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_plugins_preload_ms_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_plugins_preload_ms_sumUnknownins, instance, ip, job, clsN/A
grafana_hidden_metrics_totalUnknownins, instance, ip, job, clsN/A
grafana_http_request_duration_seconds_bucketUnknownins, instance, method, ip, le, job, cls, status_code, handlerN/A
grafana_http_request_duration_seconds_countUnknownins, instance, method, ip, job, cls, status_code, handlerN/A
grafana_http_request_duration_seconds_sumUnknownins, instance, method, ip, job, cls, status_code, handlerN/A
grafana_http_request_in_flightgaugeins, instance, ip, job, clsA gauge of requests currently being served by Grafana.
grafana_idforwarding_idforwarding_failed_token_signing_totalUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_from_cache_totalUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_totalUnknownins, instance, ip, job, clsN/A
grafana_instance_start_totalUnknownins, instance, ip, job, clsN/A
grafana_ldap_users_sync_execution_timesummaryins, instance, ip, job, cls, quantilesummary for LDAP users sync execution duration
grafana_ldap_users_sync_execution_time_countUnknownins, instance, ip, job, clsN/A
grafana_ldap_users_sync_execution_time_sumUnknownins, instance, ip, job, clsN/A
grafana_live_client_command_duration_secondssummaryins, instance, method, ip, job, cls, quantileClient command duration summary.
grafana_live_client_command_duration_seconds_countUnknownins, instance, method, ip, job, clsN/A
grafana_live_client_command_duration_seconds_sumUnknownins, instance, method, ip, job, clsN/A
grafana_live_client_num_reply_errorsunknownins, instance, method, ip, job, cls, codeNumber of errors in replies sent to clients.
grafana_live_client_num_server_disconnectsunknownins, instance, ip, job, cls, codeNumber of server initiated disconnects.
grafana_live_client_recoverunknownins, instance, ip, recovered, job, clsCount of recover operations.
grafana_live_node_action_countunknownaction, ins, instance, ip, job, clsNumber of node actions called.
grafana_live_node_buildgaugeversion, ins, instance, ip, job, clsNode build info.
grafana_live_node_messages_received_countunknownins, instance, ip, type, job, clsNumber of messages received.
grafana_live_node_messages_sent_countunknownins, instance, ip, type, job, clsNumber of messages sent.
grafana_live_node_num_channelsgaugeins, instance, ip, job, clsNumber of channels with one or more subscribers.
grafana_live_node_num_clientsgaugeins, instance, ip, job, clsNumber of clients connected.
grafana_live_node_num_nodesgaugeins, instance, ip, job, clsNumber of nodes in cluster.
grafana_live_node_num_subscriptionsgaugeins, instance, ip, job, clsNumber of subscriptions.
grafana_live_node_num_usersgaugeins, instance, ip, job, clsNumber of unique users connected.
grafana_live_transport_connect_countunknownins, instance, ip, transport, job, clsNumber of connections to specific transport.
grafana_live_transport_messages_sentunknownins, instance, ip, transport, job, clsNumber of messages sent over specific transport.
grafana_loki_plugin_parse_response_duration_seconds_bucketUnknownendpoint, ins, instance, ip, le, status, job, clsN/A
grafana_loki_plugin_parse_response_duration_seconds_countUnknownendpoint, ins, instance, ip, status, job, clsN/A
grafana_loki_plugin_parse_response_duration_seconds_sumUnknownendpoint, ins, instance, ip, status, job, clsN/A
grafana_page_response_status_totalUnknownins, instance, ip, job, cls, codeN/A
grafana_plugin_build_infogaugeversion, signature_status, ins, instance, plugin_type, ip, plugin_id, job, clsA metric with a constant ‘1’ value labeled by pluginId, pluginType and version from which Grafana plugin was built
grafana_plugin_request_duration_milliseconds_bucketUnknownendpoint, ins, instance, target, ip, le, plugin_id, job, clsN/A
grafana_plugin_request_duration_milliseconds_countUnknownendpoint, ins, instance, target, ip, plugin_id, job, clsN/A
grafana_plugin_request_duration_milliseconds_sumUnknownendpoint, ins, instance, target, ip, plugin_id, job, clsN/A
grafana_plugin_request_duration_seconds_bucketUnknownendpoint, ins, instance, target, ip, le, status, plugin_id, source, job, clsN/A
grafana_plugin_request_duration_seconds_countUnknownendpoint, ins, instance, target, ip, status, plugin_id, source, job, clsN/A
grafana_plugin_request_duration_seconds_sumUnknownendpoint, ins, instance, target, ip, status, plugin_id, source, job, clsN/A
grafana_plugin_request_size_bytes_bucketUnknownendpoint, ins, instance, target, ip, le, plugin_id, source, job, clsN/A
grafana_plugin_request_size_bytes_countUnknownendpoint, ins, instance, target, ip, plugin_id, source, job, clsN/A
grafana_plugin_request_size_bytes_sumUnknownendpoint, ins, instance, target, ip, plugin_id, source, job, clsN/A
grafana_plugin_request_totalUnknownendpoint, ins, instance, target, ip, status, plugin_id, job, clsN/A
grafana_process_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
grafana_process_max_fdsgaugeins, instance, ip, job, clsMaximum number of open file descriptors.
grafana_process_open_fdsgaugeins, instance, ip, job, clsNumber of open file descriptors.
grafana_process_resident_memory_bytesgaugeins, instance, ip, job, clsResident memory size in bytes.
grafana_process_start_time_secondsgaugeins, instance, ip, job, clsStart time of the process since unix epoch in seconds.
grafana_process_virtual_memory_bytesgaugeins, instance, ip, job, clsVirtual memory size in bytes.
grafana_process_virtual_memory_max_bytesgaugeins, instance, ip, job, clsMaximum amount of virtual memory available in bytes.
grafana_prometheus_plugin_backend_request_countunknownendpoint, ins, instance, ip, status, errorSource, job, clsThe total amount of prometheus backend plugin requests
grafana_proxy_response_status_totalUnknownins, instance, ip, job, cls, codeN/A
grafana_public_dashboard_request_countunknownins, instance, ip, job, clscounter for public dashboards requests
grafana_registered_metrics_totalUnknownins, instance, ip, stability_level, deprecated_version, job, clsN/A
grafana_rendering_queue_sizegaugeins, instance, ip, job, clssize of rendering queue
grafana_search_dashboard_search_failures_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_search_dashboard_search_failures_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_search_dashboard_search_failures_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_search_dashboard_search_successes_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_search_dashboard_search_successes_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_search_dashboard_search_successes_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_stat_active_usersgaugeins, instance, ip, job, clsnumber of active users
grafana_stat_total_orgsgaugeins, instance, ip, job, clstotal amount of orgs
grafana_stat_total_playlistsgaugeins, instance, ip, job, clstotal amount of playlists
grafana_stat_total_service_account_tokensgaugeins, instance, ip, job, clstotal amount of service account tokens
grafana_stat_total_service_accountsgaugeins, instance, ip, job, clstotal amount of service accounts
grafana_stat_total_service_accounts_role_nonegaugeins, instance, ip, job, clstotal amount of service accounts with no role
grafana_stat_total_teamsgaugeins, instance, ip, job, clstotal amount of teams
grafana_stat_total_usersgaugeins, instance, ip, job, clstotal amount of users
grafana_stat_totals_active_adminsgaugeins, instance, ip, job, clstotal amount of active admins
grafana_stat_totals_active_editorsgaugeins, instance, ip, job, clstotal amount of active editors
grafana_stat_totals_active_viewersgaugeins, instance, ip, job, clstotal amount of active viewers
grafana_stat_totals_adminsgaugeins, instance, ip, job, clstotal amount of admins
grafana_stat_totals_alert_rulesgaugeins, instance, ip, job, clstotal amount of alert rules in the database
grafana_stat_totals_annotationsgaugeins, instance, ip, job, clstotal amount of annotations in the database
grafana_stat_totals_correlationsgaugeins, instance, ip, job, clstotal amount of correlations
grafana_stat_totals_dashboardgaugeins, instance, ip, job, clstotal amount of dashboards
grafana_stat_totals_dashboard_versionsgaugeins, instance, ip, job, clstotal amount of dashboard versions in the database
grafana_stat_totals_data_keysgaugeins, instance, ip, job, cls, activetotal amount of data keys in the database
grafana_stat_totals_datasourcegaugeins, instance, ip, plugin_id, job, clstotal number of defined datasources, labeled by pluginId
grafana_stat_totals_editorsgaugeins, instance, ip, job, clstotal amount of editors
grafana_stat_totals_foldergaugeins, instance, ip, job, clstotal amount of folders
grafana_stat_totals_library_panelsgaugeins, instance, ip, job, clstotal amount of library panels in the database
grafana_stat_totals_library_variablesgaugeins, instance, ip, job, clstotal amount of library variables in the database
grafana_stat_totals_public_dashboardgaugeins, instance, ip, job, clstotal amount of public dashboards
grafana_stat_totals_rule_groupsgaugeins, instance, ip, job, clstotal amount of alert rule groups in the database
grafana_stat_totals_viewersgaugeins, instance, ip, job, clstotal amount of viewers
infra_upUnknownins, instance, ip, job, clsN/A
jaeger_tracer_baggage_restrictions_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_baggage_truncations_totalUnknownins, instance, ip, job, clsN/A
jaeger_tracer_baggage_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_finished_spans_totalUnknownins, instance, ip, sampled, job, clsN/A
jaeger_tracer_reporter_queue_lengthgaugeins, instance, ip, job, clsCurrent number of spans in the reporter queue
jaeger_tracer_reporter_spans_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_sampler_queries_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_sampler_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_span_context_decoding_errors_totalUnknownins, instance, ip, job, clsN/A
jaeger_tracer_started_spans_totalUnknownins, instance, ip, sampled, job, clsN/A
jaeger_tracer_throttled_debug_spans_totalUnknownins, instance, ip, job, clsN/A
jaeger_tracer_throttler_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_traces_totalUnknownins, instance, ip, sampled, job, cls, stateN/A
kv_request_duration_seconds_bucketUnknownins, instance, role, ip, le, kv_name, type, operation, job, cls, status_codeN/A
kv_request_duration_seconds_countUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
kv_request_duration_seconds_sumUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
legacy_grafana_alerting_ticker_interval_secondsgaugeins, instance, ip, job, clsInterval at which the ticker is meant to tick.
legacy_grafana_alerting_ticker_last_consumed_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last consumed tick in seconds.
legacy_grafana_alerting_ticker_next_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the next tick in seconds before it is consumed.
logql_query_duration_seconds_bucketUnknownins, instance, query_type, ip, le, job, clsN/A
logql_query_duration_seconds_countUnknownins, instance, query_type, ip, job, clsN/A
logql_query_duration_seconds_sumUnknownins, instance, query_type, ip, job, clsN/A
loki_azure_blob_egress_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_boltdb_shipper_apply_retention_last_successful_run_timestamp_secondsgaugeins, instance, ip, job, clsUnix timestamp of the last successful retention run
loki_boltdb_shipper_compact_tables_operation_duration_secondsgaugeins, instance, ip, job, clsTime (in seconds) spent in compacting all the tables
loki_boltdb_shipper_compact_tables_operation_last_successful_run_timestamp_secondsgaugeins, instance, ip, job, clsUnix timestamp of the last successful compaction run
loki_boltdb_shipper_compact_tables_operation_totalUnknownins, instance, ip, status, job, clsN/A
loki_boltdb_shipper_compactor_runninggaugeins, instance, ip, job, clsValue will be 1 if compactor is currently running on this instance
loki_boltdb_shipper_open_existing_file_failures_totalUnknownins, instance, ip, component, job, clsN/A
loki_boltdb_shipper_query_time_table_download_duration_secondsunknownins, instance, ip, component, job, cls, tableTime (in seconds) spent in downloading of files per table at query time
loki_boltdb_shipper_request_duration_seconds_bucketUnknownins, instance, ip, le, component, operation, job, cls, status_codeN/A
loki_boltdb_shipper_request_duration_seconds_countUnknownins, instance, ip, component, operation, job, cls, status_codeN/A
loki_boltdb_shipper_request_duration_seconds_sumUnknownins, instance, ip, component, operation, job, cls, status_codeN/A
loki_boltdb_shipper_tables_download_operation_duration_secondsgaugeins, instance, ip, component, job, clsTime (in seconds) spent in downloading updated files for all the tables
loki_boltdb_shipper_tables_sync_operation_totalUnknownins, instance, ip, status, component, job, clsN/A
loki_boltdb_shipper_tables_upload_operation_totalUnknownins, instance, ip, status, component, job, clsN/A
loki_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which loki was built, and the goos and goarch for the build.
loki_bytes_per_line_bucketUnknownins, instance, ip, le, job, clsN/A
loki_bytes_per_line_countUnknownins, instance, ip, job, clsN/A
loki_bytes_per_line_sumUnknownins, instance, ip, job, clsN/A
loki_cache_corrupt_chunks_totalUnknownins, instance, ip, job, clsN/A
loki_cache_fetched_keysunknownins, instance, ip, job, clsTotal count of keys requested from cache.
loki_cache_hitsunknownins, instance, ip, job, clsTotal count of keys found in cache.
loki_cache_request_duration_seconds_bucketUnknownins, instance, method, ip, le, job, cls, status_codeN/A
loki_cache_request_duration_seconds_countUnknownins, instance, method, ip, job, cls, status_codeN/A
loki_cache_request_duration_seconds_sumUnknownins, instance, method, ip, job, cls, status_codeN/A
loki_cache_value_size_bytes_bucketUnknownins, instance, method, ip, le, job, clsN/A
loki_cache_value_size_bytes_countUnknownins, instance, method, ip, job, clsN/A
loki_cache_value_size_bytes_sumUnknownins, instance, method, ip, job, clsN/A
loki_chunk_fetcher_cache_dequeued_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_fetcher_cache_enqueued_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_fetcher_cache_skipped_buffer_full_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_fetcher_fetched_size_bytes_bucketUnknownins, instance, ip, le, source, job, clsN/A
loki_chunk_fetcher_fetched_size_bytes_countUnknownins, instance, ip, source, job, clsN/A
loki_chunk_fetcher_fetched_size_bytes_sumUnknownins, instance, ip, source, job, clsN/A
loki_chunk_store_chunks_per_query_bucketUnknownins, instance, ip, le, job, clsN/A
loki_chunk_store_chunks_per_query_countUnknownins, instance, ip, job, clsN/A
loki_chunk_store_chunks_per_query_sumUnknownins, instance, ip, job, clsN/A
loki_chunk_store_deduped_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_store_deduped_chunks_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_store_fetched_chunk_bytes_totalUnknownins, instance, ip, user, job, clsN/A
loki_chunk_store_fetched_chunks_totalUnknownins, instance, ip, user, job, clsN/A
loki_chunk_store_index_entries_per_chunk_bucketUnknownins, instance, ip, le, job, clsN/A
| Metric | Type | Labels | Description |
|:---|:---|:---|:---|
| loki_chunk_store_index_entries_per_chunk_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_index_entries_per_chunk_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_index_lookups_per_query_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_chunk_store_index_lookups_per_query_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_index_lookups_per_query_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_series_post_intersection_per_query_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_chunk_store_series_post_intersection_per_query_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_series_post_intersection_per_query_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_series_pre_intersection_per_query_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_chunk_store_series_pre_intersection_per_query_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_series_pre_intersection_per_query_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_chunk_store_stored_chunk_bytes_total | Unknown | ins, instance, ip, user, job, cls | N/A |
| loki_chunk_store_stored_chunks_total | Unknown | ins, instance, ip, user, job, cls | N/A |
| loki_consul_request_duration_seconds_bucket | Unknown | ins, instance, ip, le, kv_name, operation, job, cls, status_code | N/A |
| loki_consul_request_duration_seconds_count | Unknown | ins, instance, ip, kv_name, operation, job, cls, status_code | N/A |
| loki_consul_request_duration_seconds_sum | Unknown | ins, instance, ip, kv_name, operation, job, cls, status_code | N/A |
| loki_delete_request_lookups_failed_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_delete_request_lookups_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_discarded_bytes_total | Unknown | ins, instance, ip, reason, job, cls, tenant | N/A |
| loki_discarded_samples_total | Unknown | ins, instance, ip, reason, job, cls, tenant | N/A |
| loki_distributor_bytes_received_total | Unknown | ins, instance, retention_hours, ip, job, cls, tenant | N/A |
| loki_distributor_ingester_appends_total | Unknown | ins, instance, ip, ingester, job, cls | N/A |
| loki_distributor_lines_received_total | Unknown | ins, instance, ip, job, cls, tenant | N/A |
| loki_distributor_replication_factor | gauge | ins, instance, ip, job, cls | The configured replication factor. |
| loki_distributor_structured_metadata_bytes_received_total | Unknown | ins, instance, retention_hours, ip, job, cls, tenant | N/A |
| loki_experimental_features_in_use_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_index_chunk_refs_total | Unknown | ins, instance, ip, status, job, cls | N/A |
| loki_index_request_duration_seconds_bucket | Unknown | ins, instance, ip, le, component, operation, job, cls, status_code | N/A |
| loki_index_request_duration_seconds_count | Unknown | ins, instance, ip, component, operation, job, cls, status_code | N/A |
| loki_index_request_duration_seconds_sum | Unknown | ins, instance, ip, component, operation, job, cls, status_code | N/A |
| loki_inflight_requests | gauge | ins, instance, method, ip, route, job, cls | Current number of inflight requests. |
| loki_ingester_autoforget_unhealthy_ingesters_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_blocks_per_chunk_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_blocks_per_chunk_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_blocks_per_chunk_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_creations_failed_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_creations_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_deletions_failed_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_deletions_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_duration_seconds | summary | ins, instance, ip, job, cls, quantile | Time taken to create a checkpoint. |
| loki_ingester_checkpoint_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_checkpoint_logged_bytes_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_age_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_age_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_age_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_bounds_hours_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_bounds_hours_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_bounds_hours_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_compression_ratio_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_compression_ratio_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_compression_ratio_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_encode_time_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_encode_time_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_encode_time_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_entries_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_entries_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_entries_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_size_bytes_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_size_bytes_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_size_bytes_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_stored_bytes_total | Unknown | ins, instance, ip, job, cls, tenant | N/A |
| loki_ingester_chunk_utilization_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_chunk_utilization_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunk_utilization_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunks_created_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_chunks_flushed_total | Unknown | ins, instance, ip, reason, job, cls | N/A |
| loki_ingester_chunks_stored_total | Unknown | ins, instance, ip, job, cls, tenant | N/A |
| loki_ingester_client_request_duration_seconds_bucket | Unknown | ins, instance, ip, le, operation, job, cls, status_code | N/A |
| loki_ingester_client_request_duration_seconds_count | Unknown | ins, instance, ip, operation, job, cls, status_code | N/A |
| loki_ingester_client_request_duration_seconds_sum | Unknown | ins, instance, ip, operation, job, cls, status_code | N/A |
| loki_ingester_limiter_enabled | gauge | ins, instance, ip, job, cls | Whether the ingester’s limiter is enabled |
| loki_ingester_memory_chunks | gauge | ins, instance, ip, job, cls | The total number of chunks in memory. |
| loki_ingester_memory_streams | gauge | ins, instance, ip, job, cls, tenant | The total number of streams in memory per tenant. |
| loki_ingester_memory_streams_labels_bytes | gauge | ins, instance, ip, job, cls | Total bytes of labels of the streams in memory. |
| loki_ingester_received_chunks | unknown | ins, instance, ip, job, cls | The total number of chunks received by this ingester whilst joining. |
| loki_ingester_samples_per_chunk_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_ingester_samples_per_chunk_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_samples_per_chunk_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_sent_chunks | unknown | ins, instance, ip, job, cls | The total number of chunks sent by this ingester whilst leaving. |
| loki_ingester_shutdown_marker | gauge | ins, instance, ip, job, cls | 1 if prepare shutdown has been called, 0 otherwise |
| loki_ingester_streams_created_total | Unknown | ins, instance, ip, job, cls, tenant | N/A |
| loki_ingester_streams_removed_total | Unknown | ins, instance, ip, job, cls, tenant | N/A |
| loki_ingester_wal_bytes_in_use | gauge | ins, instance, ip, job, cls | Total number of bytes in use by the WAL recovery process. |
| loki_ingester_wal_disk_full_failures_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_duplicate_entries_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_logged_bytes_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_records_logged_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_recovered_bytes_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_recovered_chunks_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_recovered_entries_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_recovered_streams_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_ingester_wal_replay_active | gauge | ins, instance, ip, job, cls | Whether the WAL is replaying |
| loki_ingester_wal_replay_duration_seconds | gauge | ins, instance, ip, job, cls | Time taken to replay the checkpoint and the WAL. |
| loki_ingester_wal_replay_flushing | gauge | ins, instance, ip, job, cls | Whether the wal replay is in a flushing phase due to backpressure |
| loki_internal_log_messages_total | Unknown | ins, instance, ip, level, job, cls | N/A |
| loki_kv_request_duration_seconds_bucket | Unknown | ins, instance, role, ip, le, kv_name, type, operation, job, cls, status_code | N/A |
| loki_kv_request_duration_seconds_count | Unknown | ins, instance, role, ip, kv_name, type, operation, job, cls, status_code | N/A |
| loki_kv_request_duration_seconds_sum | Unknown | ins, instance, role, ip, kv_name, type, operation, job, cls, status_code | N/A |
| loki_log_flushes_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_log_flushes_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_log_flushes_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_log_messages_total | Unknown | ins, instance, ip, level, job, cls | N/A |
| loki_logql_querystats_bytes_processed_per_seconds_bucket | Unknown | ins, instance, range, ip, le, sharded, type, job, cls, status_code, latency_type | N/A |
| loki_logql_querystats_bytes_processed_per_seconds_count | Unknown | ins, instance, range, ip, sharded, type, job, cls, status_code, latency_type | N/A |
| loki_logql_querystats_bytes_processed_per_seconds_sum | Unknown | ins, instance, range, ip, sharded, type, job, cls, status_code, latency_type | N/A |
| loki_logql_querystats_chunk_download_latency_seconds_bucket | Unknown | ins, instance, range, ip, le, type, job, cls, status_code | N/A |
| loki_logql_querystats_chunk_download_latency_seconds_count | Unknown | ins, instance, range, ip, type, job, cls, status_code | N/A |
| loki_logql_querystats_chunk_download_latency_seconds_sum | Unknown | ins, instance, range, ip, type, job, cls, status_code | N/A |
| loki_logql_querystats_downloaded_chunk_total | Unknown | ins, instance, range, ip, type, job, cls, status_code | N/A |
| loki_logql_querystats_duplicates_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_logql_querystats_ingester_sent_lines_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_logql_querystats_latency_seconds_bucket | Unknown | ins, instance, range, ip, le, type, job, cls, status_code | N/A |
| loki_logql_querystats_latency_seconds_count | Unknown | ins, instance, range, ip, type, job, cls, status_code | N/A |
| loki_logql_querystats_latency_seconds_sum | Unknown | ins, instance, range, ip, type, job, cls, status_code | N/A |
| loki_panic_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_index_cache_corruptions_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_index_cache_encode_errors_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_index_cache_gets_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_index_cache_hits_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_index_cache_puts_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_query_frontend_clients | gauge | ins, instance, ip, job, cls | The current number of clients connected to query-frontend. |
| loki_querier_query_frontend_request_duration_seconds_bucket | Unknown | ins, instance, ip, le, operation, job, cls, status_code | N/A |
| loki_querier_query_frontend_request_duration_seconds_count | Unknown | ins, instance, ip, operation, job, cls, status_code | N/A |
| loki_querier_query_frontend_request_duration_seconds_sum | Unknown | ins, instance, ip, operation, job, cls, status_code | N/A |
| loki_querier_tail_active | gauge | ins, instance, ip, job, cls | Number of active tailers |
| loki_querier_tail_active_streams | gauge | ins, instance, ip, job, cls | Number of active streams being tailed |
| loki_querier_tail_bytes_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_querier_worker_concurrency | gauge | ins, instance, ip, job, cls | Number of concurrent querier workers |
| loki_querier_worker_inflight_queries | gauge | ins, instance, ip, job, cls | Number of queries being processed by the querier workers |
| loki_query_frontend_log_result_cache_hit_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_query_frontend_log_result_cache_miss_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_query_frontend_partitions_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_query_frontend_partitions_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_query_frontend_partitions_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_query_frontend_shard_factor_bucket | Unknown | ins, instance, ip, le, mapper, job, cls | N/A |
| loki_query_frontend_shard_factor_count | Unknown | ins, instance, ip, mapper, job, cls | N/A |
| loki_query_frontend_shard_factor_sum | Unknown | ins, instance, ip, mapper, job, cls | N/A |
| loki_query_scheduler_enqueue_count | Unknown | ins, instance, ip, level, user, job, cls | N/A |
| loki_rate_store_expired_streams_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_rate_store_max_stream_rate_bytes | gauge | ins, instance, ip, job, cls | The maximum stream rate for any stream reported by ingesters during a sync operation. Sharded Streams are combined. |
| loki_rate_store_max_stream_shards | gauge | ins, instance, ip, job, cls | The number of shards for a single stream reported by ingesters during a sync operation. |
| loki_rate_store_max_unique_stream_rate_bytes | gauge | ins, instance, ip, job, cls | The maximum stream rate for any stream reported by ingesters during a sync operation. Sharded Streams are considered separate. |
| loki_rate_store_stream_rate_bytes_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_rate_store_stream_rate_bytes_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_rate_store_stream_rate_bytes_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_rate_store_stream_shards_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| loki_rate_store_stream_shards_count | Unknown | ins, instance, ip, job, cls | N/A |
| loki_rate_store_stream_shards_sum | Unknown | ins, instance, ip, job, cls | N/A |
| loki_rate_store_streams | gauge | ins, instance, ip, job, cls | The number of unique streams reported by all ingesters. Sharded streams are combined |
| loki_request_duration_seconds_bucket | Unknown | ins, instance, method, ip, le, ws, route, job, cls, status_code | N/A |
| loki_request_duration_seconds_count | Unknown | ins, instance, method, ip, ws, route, job, cls, status_code | N/A |
| loki_request_duration_seconds_sum | Unknown | ins, instance, method, ip, ws, route, job, cls, status_code | N/A |
| loki_request_message_bytes_bucket | Unknown | ins, instance, method, ip, le, route, job, cls | N/A |
| loki_request_message_bytes_count | Unknown | ins, instance, method, ip, route, job, cls | N/A |
| loki_request_message_bytes_sum | Unknown | ins, instance, method, ip, route, job, cls | N/A |
| loki_response_message_bytes_bucket | Unknown | ins, instance, method, ip, le, route, job, cls | N/A |
| loki_response_message_bytes_count | Unknown | ins, instance, method, ip, route, job, cls | N/A |
| loki_response_message_bytes_sum | Unknown | ins, instance, method, ip, route, job, cls | N/A |
| loki_results_cache_version_comparisons_total | Unknown | ins, instance, ip, job, cls | N/A |
| loki_store_chunks_downloaded_total | Unknown | ins, instance, ip, status, job, cls | N/A |
| loki_store_chunks_per_batch_bucket | Unknown | ins, instance, ip, le, status, job, cls | N/A |
| loki_store_chunks_per_batch_count | Unknown | ins, instance, ip, status, job, cls | N/A |
| loki_store_chunks_per_batch_sum | Unknown | ins, instance, ip, status, job, cls | N/A |
| loki_store_series_total | Unknown | ins, instance, ip, status, job, cls | N/A |
| loki_stream_sharding_count | unknown | ins, instance, ip, job, cls | Total number of times the distributor has sharded streams |
| loki_tcp_connections | gauge | ins, instance, ip, protocol, job, cls | Current number of accepted TCP connections. |
| loki_tcp_connections_limit | gauge | ins, instance, ip, protocol, job, cls | The max number of TCP connections that can be accepted (0 means no limit). |
| net_conntrack_dialer_conn_attempted_total | counter | ins, instance, ip, dialer_name, job, cls | Total number of connections attempted by the given dialer a given name. |
| net_conntrack_dialer_conn_closed_total | counter | ins, instance, ip, dialer_name, job, cls | Total number of connections closed which originated from the dialer of a given name. |
| net_conntrack_dialer_conn_established_total | counter | ins, instance, ip, dialer_name, job, cls | Total number of connections successfully established by the given dialer a given name. |
| net_conntrack_dialer_conn_failed_total | counter | ins, instance, ip, dialer_name, reason, job, cls | Total number of connections failed to dial by the dialer a given name. |
| net_conntrack_listener_conn_accepted_total | counter | ins, instance, ip, listener_name, job, cls | Total number of connections opened to the listener of a given name. |
| net_conntrack_listener_conn_closed_total | counter | ins, instance, ip, listener_name, job, cls | Total number of connections closed that were made to the listener of a given name. |
| nginx_connections_accepted | counter | ins, instance, ip, job, cls | Accepted client connections |
| nginx_connections_active | gauge | ins, instance, ip, job, cls | Active client connections |
| nginx_connections_handled | counter | ins, instance, ip, job, cls | Handled client connections |
| nginx_connections_reading | gauge | ins, instance, ip, job, cls | Connections where NGINX is reading the request header |
| nginx_connections_waiting | gauge | ins, instance, ip, job, cls | Idle client connections |
| nginx_connections_writing | gauge | ins, instance, ip, job, cls | Connections where NGINX is writing the response back to the client |
| nginx_exporter_build_info | gauge | revision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goos | A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which nginx_exporter was built, and the goos and goarch for the build. |
| nginx_http_requests_total | counter | ins, instance, ip, job, cls | Total http requests |
| nginx_up | gauge | ins, instance, ip, job, cls | Status of the last metric scrape |
| plugins_active_instances | gauge | ins, instance, ip, job, cls | The number of active plugin instances |
| plugins_datasource_instances_total | Unknown | ins, instance, ip, job, cls | N/A |
| process_cpu_seconds_total | counter | ins, instance, ip, job, cls | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | ins, instance, ip, job, cls | Maximum number of open file descriptors. |
| process_open_fds | gauge | ins, instance, ip, job, cls | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | ins, instance, ip, job, cls | Resident memory size in bytes. |
| process_start_time_seconds | gauge | ins, instance, ip, job, cls | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | ins, instance, ip, job, cls | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | ins, instance, ip, job, cls | Maximum amount of virtual memory available in bytes. |
| prometheus_api_remote_read_queries | gauge | ins, instance, ip, job, cls | The current number of remote read queries being executed or waiting. |
| prometheus_build_info | gauge | revision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goos | A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which prometheus was built, and the goos and goarch for the build. |
| prometheus_config_last_reload_success_timestamp_seconds | gauge | ins, instance, ip, job, cls | Timestamp of the last successful configuration reload. |
| prometheus_config_last_reload_successful | gauge | ins, instance, ip, job, cls | Whether the last configuration reload attempt was successful. |
| prometheus_engine_queries | gauge | ins, instance, ip, job, cls | The current number of queries being executed or waiting. |
| prometheus_engine_queries_concurrent_max | gauge | ins, instance, ip, job, cls | The max number of concurrent queries. |
| prometheus_engine_query_duration_seconds | summary | ins, instance, ip, job, cls, quantile, slice | Query timings |
| prometheus_engine_query_duration_seconds_count | Unknown | ins, instance, ip, job, cls, slice | N/A |
| prometheus_engine_query_duration_seconds_sum | Unknown | ins, instance, ip, job, cls, slice | N/A |
| prometheus_engine_query_log_enabled | gauge | ins, instance, ip, job, cls | State of the query log. |
| prometheus_engine_query_log_failures_total | counter | ins, instance, ip, job, cls | The number of query log failures. |
| prometheus_engine_query_samples_total | counter | ins, instance, ip, job, cls | The total number of samples loaded by all queries. |
| prometheus_http_request_duration_seconds_bucket | Unknown | ins, instance, ip, le, job, cls, handler | N/A |
| prometheus_http_request_duration_seconds_count | Unknown | ins, instance, ip, job, cls, handler | N/A |
| prometheus_http_request_duration_seconds_sum | Unknown | ins, instance, ip, job, cls, handler | N/A |
| prometheus_http_requests_total | counter | ins, instance, ip, job, cls, code, handler | Counter of HTTP requests. |
| prometheus_http_response_size_bytes_bucket | Unknown | ins, instance, ip, le, job, cls, handler | N/A |
| prometheus_http_response_size_bytes_count | Unknown | ins, instance, ip, job, cls, handler | N/A |
| prometheus_http_response_size_bytes_sum | Unknown | ins, instance, ip, job, cls, handler | N/A |
| prometheus_notifications_alertmanagers_discovered | gauge | ins, instance, ip, job, cls | The number of alertmanagers discovered and active. |
| prometheus_notifications_dropped_total | counter | ins, instance, ip, job, cls | Total number of alerts dropped due to errors when sending to Alertmanager. |
| prometheus_notifications_errors_total | counter | ins, instance, ip, alertmanager, job, cls | Total number of errors sending alert notifications. |
| prometheus_notifications_latency_seconds | summary | ins, instance, ip, alertmanager, job, cls, quantile | Latency quantiles for sending alert notifications. |
| prometheus_notifications_latency_seconds_count | Unknown | ins, instance, ip, alertmanager, job, cls | N/A |
| prometheus_notifications_latency_seconds_sum | Unknown | ins, instance, ip, alertmanager, job, cls | N/A |
| prometheus_notifications_queue_capacity | gauge | ins, instance, ip, job, cls | The capacity of the alert notifications queue. |
| prometheus_notifications_queue_length | gauge | ins, instance, ip, job, cls | The number of alert notifications in the queue. |
| prometheus_notifications_sent_total | counter | ins, instance, ip, alertmanager, job, cls | Total number of alerts sent. |
| prometheus_ready | gauge | ins, instance, ip, job, cls | Whether Prometheus startup was fully completed and the server is ready for normal operation. |
| prometheus_remote_storage_exemplars_in_total | counter | ins, instance, ip, job, cls | Exemplars in to remote storage, compare to exemplars out for queue managers. |
| prometheus_remote_storage_highest_timestamp_in_seconds | gauge | ins, instance, ip, job, cls | Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch. |
| prometheus_remote_storage_histograms_in_total | counter | ins, instance, ip, job, cls | HistogramSamples in to remote storage, compare to histograms out for queue managers. |
| prometheus_remote_storage_samples_in_total | counter | ins, instance, ip, job, cls | Samples in to remote storage, compare to samples out for queue managers. |
| prometheus_remote_storage_string_interner_zero_reference_releases_total | counter | ins, instance, ip, job, cls | The number of times release has been called for strings that are not interned. |
| prometheus_rule_evaluation_duration_seconds | summary | ins, instance, ip, job, cls, quantile | The duration for a rule to execute. |
| prometheus_rule_evaluation_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_rule_evaluation_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_rule_evaluation_failures_total | counter | ins, instance, ip, job, cls, rule_group | The total number of rule evaluation failures. |
| prometheus_rule_evaluations_total | counter | ins, instance, ip, job, cls, rule_group | The total number of rule evaluations. |
| prometheus_rule_group_duration_seconds | summary | ins, instance, ip, job, cls, quantile | The duration of rule group evaluations. |
| prometheus_rule_group_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_rule_group_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_rule_group_interval_seconds | gauge | ins, instance, ip, job, cls, rule_group | The interval of a rule group. |
| prometheus_rule_group_iterations_missed_total | counter | ins, instance, ip, job, cls, rule_group | The total number of rule group evaluations missed due to slow rule group evaluation. |
| prometheus_rule_group_iterations_total | counter | ins, instance, ip, job, cls, rule_group | The total number of scheduled rule group evaluations, whether executed or missed. |
| prometheus_rule_group_last_duration_seconds | gauge | ins, instance, ip, job, cls, rule_group | The duration of the last rule group evaluation. |
| prometheus_rule_group_last_evaluation_samples | gauge | ins, instance, ip, job, cls, rule_group | The number of samples returned during the last rule group evaluation. |
| prometheus_rule_group_last_evaluation_timestamp_seconds | gauge | ins, instance, ip, job, cls, rule_group | The timestamp of the last rule group evaluation in seconds. |
| prometheus_rule_group_rules | gauge | ins, instance, ip, job, cls, rule_group | The number of rules. |
| prometheus_sd_azure_cache_hit_total | counter | ins, instance, ip, job, cls | Number of cache hit during refresh. |
| prometheus_sd_azure_failures_total | counter | ins, instance, ip, job, cls | Number of Azure service discovery refresh failures. |
| prometheus_sd_consul_rpc_duration_seconds | summary | endpoint, ins, instance, ip, job, cls, call, quantile | The duration of a Consul RPC call in seconds. |
| prometheus_sd_consul_rpc_duration_seconds_count | Unknown | endpoint, ins, instance, ip, job, cls, call | N/A |
| prometheus_sd_consul_rpc_duration_seconds_sum | Unknown | endpoint, ins, instance, ip, job, cls, call | N/A |
| prometheus_sd_consul_rpc_failures_total | counter | ins, instance, ip, job, cls | The number of Consul RPC call failures. |
| prometheus_sd_discovered_targets | gauge | ins, instance, ip, config, job, cls | Current number of discovered targets. |
| prometheus_sd_dns_lookup_failures_total | counter | ins, instance, ip, job, cls | The number of DNS-SD lookup failures. |
| prometheus_sd_dns_lookups_total | counter | ins, instance, ip, job, cls | The number of DNS-SD lookups. |
| prometheus_sd_failed_configs | gauge | ins, instance, ip, job, cls | Current number of service discovery configurations that failed to load. |
| prometheus_sd_file_mtime_seconds | gauge | ins, instance, ip, filename, job, cls | Timestamp (mtime) of files read by FileSD. Timestamp is set at read time. |
| prometheus_sd_file_read_errors_total | counter | ins, instance, ip, job, cls | The number of File-SD read errors. |
| prometheus_sd_file_scan_duration_seconds | summary | ins, instance, ip, job, cls, quantile | The duration of the File-SD scan in seconds. |
| prometheus_sd_file_scan_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_sd_file_scan_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_sd_file_watcher_errors_total | counter | ins, instance, ip, job, cls | The number of File-SD errors caused by filesystem watch failures. |
| prometheus_sd_http_failures_total | counter | ins, instance, ip, job, cls | Number of HTTP service discovery refresh failures. |
| prometheus_sd_kubernetes_events_total | counter | event, ins, instance, role, ip, job, cls | The number of Kubernetes events handled. |
| prometheus_sd_kuma_fetch_duration_seconds | summary | ins, instance, ip, job, cls, quantile | The duration of a Kuma MADS fetch call. |
| prometheus_sd_kuma_fetch_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_sd_kuma_fetch_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_sd_kuma_fetch_failures_total | counter | ins, instance, ip, job, cls | The number of Kuma MADS fetch call failures. |
| prometheus_sd_kuma_fetch_skipped_updates_total | counter | ins, instance, ip, job, cls | The number of Kuma MADS fetch calls that result in no updates to the targets. |
| prometheus_sd_linode_failures_total | counter | ins, instance, ip, job, cls | Number of Linode service discovery refresh failures. |
| prometheus_sd_nomad_failures_total | counter | ins, instance, ip, job, cls | Number of nomad service discovery refresh failures. |
| prometheus_sd_received_updates_total | counter | ins, instance, ip, job, cls | Total number of update events received from the SD providers. |
| prometheus_sd_updates_total | counter | ins, instance, ip, job, cls | Total number of update events sent to the SD consumers. |
| prometheus_target_interval_length_seconds | summary | ins, instance, interval, ip, job, cls, quantile | Actual intervals between scrapes. |
| prometheus_target_interval_length_seconds_count | Unknown | ins, instance, interval, ip, job, cls | N/A |
| prometheus_target_interval_length_seconds_sum | Unknown | ins, instance, interval, ip, job, cls | N/A |
| prometheus_target_metadata_cache_bytes | gauge | ins, instance, ip, scrape_job, job, cls | The number of bytes that are currently used for storing metric metadata in the cache |
| prometheus_target_metadata_cache_entries | gauge | ins, instance, ip, scrape_job, job, cls | Total number of metric metadata entries in the cache |
| prometheus_target_scrape_pool_exceeded_label_limits_total | counter | ins, instance, ip, job, cls | Total number of times scrape pools hit the label limits, during sync or config reload. |
| prometheus_target_scrape_pool_exceeded_target_limit_total | counter | ins, instance, ip, job, cls | Total number of times scrape pools hit the target limit, during sync or config reload. |
| prometheus_target_scrape_pool_reloads_failed_total | counter | ins, instance, ip, job, cls | Total number of failed scrape pool reloads. |
| prometheus_target_scrape_pool_reloads_total | counter | ins, instance, ip, job, cls | Total number of scrape pool reloads. |
| prometheus_target_scrape_pool_sync_total | counter | ins, instance, ip, scrape_job, job, cls | Total number of syncs that were executed on a scrape pool. |
| prometheus_target_scrape_pool_target_limit | gauge | ins, instance, ip, scrape_job, job, cls | Maximum number of targets allowed in this scrape pool. |
| prometheus_target_scrape_pool_targets | gauge | ins, instance, ip, scrape_job, job, cls | Current number of targets in this scrape pool. |
| prometheus_target_scrape_pools_failed_total | counter | ins, instance, ip, job, cls | Total number of scrape pool creations that failed. |
| prometheus_target_scrape_pools_total | counter | ins, instance, ip, job, cls | Total number of scrape pool creation attempts. |
| prometheus_target_scrapes_cache_flush_forced_total | counter | ins, instance, ip, job, cls | How many times a scrape cache was flushed due to getting big while scrapes are failing. |
| prometheus_target_scrapes_exceeded_body_size_limit_total | counter | ins, instance, ip, job, cls | Total number of scrapes that hit the body size limit |
| prometheus_target_scrapes_exceeded_native_histogram_bucket_limit_total | counter | ins, instance, ip, job, cls | Total number of scrapes that hit the native histogram bucket limit and were rejected. |
| prometheus_target_scrapes_exceeded_sample_limit_total | counter | ins, instance, ip, job, cls | Total number of scrapes that hit the sample limit and were rejected. |
| prometheus_target_scrapes_exemplar_out_of_order_total | counter | ins, instance, ip, job, cls | Total number of exemplar rejected due to not being out of the expected order. |
| prometheus_target_scrapes_sample_duplicate_timestamp_total | counter | ins, instance, ip, job, cls | Total number of samples rejected due to duplicate timestamps but different values. |
| prometheus_target_scrapes_sample_out_of_bounds_total | counter | ins, instance, ip, job, cls | Total number of samples rejected due to timestamp falling outside of the time bounds. |
| prometheus_target_scrapes_sample_out_of_order_total | counter | ins, instance, ip, job, cls | Total number of samples rejected due to not being out of the expected order. |
| prometheus_target_sync_failed_total | counter | ins, instance, ip, scrape_job, job, cls | Total number of target sync failures. |
| prometheus_target_sync_length_seconds | summary | ins, instance, ip, scrape_job, job, cls, quantile | Actual interval to sync the scrape pool. |
| prometheus_target_sync_length_seconds_count | Unknown | ins, instance, ip, scrape_job, job, cls | N/A |
| prometheus_target_sync_length_seconds_sum | Unknown | ins, instance, ip, scrape_job, job, cls | N/A |
| prometheus_template_text_expansion_failures_total | counter | ins, instance, ip, job, cls | The total number of template text expansion failures. |
| prometheus_template_text_expansions_total | counter | ins, instance, ip, job, cls | The total number of template text expansions. |
| prometheus_treecache_watcher_goroutines | gauge | ins, instance, ip, job, cls | The current number of watcher goroutines. |
| prometheus_treecache_zookeeper_failures_total | counter | ins, instance, ip, job, cls | The total number of ZooKeeper failures. |
| prometheus_tsdb_blocks_loaded | gauge | ins, instance, ip, job, cls | Number of currently loaded data blocks |
| prometheus_tsdb_checkpoint_creations_failed_total | counter | ins, instance, ip, job, cls | Total number of checkpoint creations that failed. |
| prometheus_tsdb_checkpoint_creations_total | counter | ins, instance, ip, job, cls | Total number of checkpoint creations attempted. |
| prometheus_tsdb_checkpoint_deletions_failed_total | counter | ins, instance, ip, job, cls | Total number of checkpoint deletions that failed. |
| prometheus_tsdb_checkpoint_deletions_total | counter | ins, instance, ip, job, cls | Total number of checkpoint deletions attempted. |
| prometheus_tsdb_clean_start | gauge | ins, instance, ip, job, cls | -1: lockfile is disabled. 0: a lockfile from a previous execution was replaced. 1: lockfile creation was clean |
| prometheus_tsdb_compaction_chunk_range_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_range_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_range_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_samples_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_samples_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_samples_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_size_bytes_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_size_bytes_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_chunk_size_bytes_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_duration_seconds_bucket | Unknown | ins, instance, ip, le, job, cls | N/A |
| prometheus_tsdb_compaction_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_compaction_populating_block | gauge | ins, instance, ip, job, cls | Set to 1 when a block is currently being written to the disk. |
| prometheus_tsdb_compactions_failed_total | counter | ins, instance, ip, job, cls | Total number of compactions that failed for the partition. |
| prometheus_tsdb_compactions_skipped_total | counter | ins, instance, ip, job, cls | Total number of skipped compactions due to disabled auto compaction. |
| prometheus_tsdb_compactions_total | counter | ins, instance, ip, job, cls | Total number of compactions that were executed for the partition. |
| prometheus_tsdb_compactions_triggered_total | counter | ins, instance, ip, job, cls | Total number of triggered compactions for the partition. |
| prometheus_tsdb_data_replay_duration_seconds | gauge | ins, instance, ip, job, cls | Time taken to replay the data on disk. |
| prometheus_tsdb_exemplar_exemplars_appended_total | counter | ins, instance, ip, job, cls | Total number of appended exemplars. |
| prometheus_tsdb_exemplar_exemplars_in_storage | gauge | ins, instance, ip, job, cls | Number of exemplars currently in circular storage. |
| prometheus_tsdb_exemplar_last_exemplars_timestamp_seconds | gauge | ins, instance, ip, job, cls | The timestamp of the oldest exemplar stored in circular storage. Useful to check for what timerange the current exemplar buffer limit allows. This usually means the last timestamp for all exemplars for a typical setup. This is not true though if one of the series timestamp is in future compared to rest series. |
| prometheus_tsdb_exemplar_max_exemplars | gauge | ins, instance, ip, job, cls | Total number of exemplars the exemplar storage can store, resizeable. |
| prometheus_tsdb_exemplar_out_of_order_exemplars_total | counter | ins, instance, ip, job, cls | Total number of out of order exemplar ingestion failed attempts. |
| prometheus_tsdb_exemplar_series_with_exemplars_in_storage | gauge | ins, instance, ip, job, cls | Number of series with exemplars currently in circular storage. |
| prometheus_tsdb_head_active_appenders | gauge | ins, instance, ip, job, cls | Number of currently active appender transactions |
| prometheus_tsdb_head_chunks | gauge | ins, instance, ip, job, cls | Total number of chunks in the head block. |
| prometheus_tsdb_head_chunks_created_total | counter | ins, instance, ip, job, cls | Total number of chunks created in the head |
| prometheus_tsdb_head_chunks_removed_total | counter | ins, instance, ip, job, cls | Total number of chunks removed in the head |
| prometheus_tsdb_head_chunks_storage_size_bytes | gauge | ins, instance, ip, job, cls | Size of the chunks_head directory. |
| prometheus_tsdb_head_gc_duration_seconds_count | Unknown | ins, instance, ip, job, cls | N/A |
| prometheus_tsdb_head_gc_duration_seconds_sum | Unknown | ins, instance, ip, job, cls | N/A |
prometheus_tsdb_head_max_timegaugeins, instance, ip, job, clsMaximum timestamp of the head block. The unit is decided by the library consumer.
prometheus_tsdb_head_max_time_secondsgaugeins, instance, ip, job, clsMaximum timestamp of the head block.
prometheus_tsdb_head_min_timegaugeins, instance, ip, job, clsMinimum time bound of the head block. The unit is decided by the library consumer.
prometheus_tsdb_head_min_time_secondsgaugeins, instance, ip, job, clsMinimum time bound of the head block.
prometheus_tsdb_head_out_of_order_samples_appended_totalcounterins, instance, ip, job, clsTotal number of appended out of order samples.
prometheus_tsdb_head_samples_appended_totalcounterins, instance, ip, type, job, clsTotal number of appended samples.
prometheus_tsdb_head_seriesgaugeins, instance, ip, job, clsTotal number of series in the head block.
prometheus_tsdb_head_series_created_totalcounterins, instance, ip, job, clsTotal number of series created in the head
prometheus_tsdb_head_series_not_found_totalcounterins, instance, ip, job, clsTotal number of requests for series that were not found.
prometheus_tsdb_head_series_removed_totalcounterins, instance, ip, job, clsTotal number of series removed in the head
prometheus_tsdb_head_truncations_failed_totalcounterins, instance, ip, job, clsTotal number of head truncations that failed.
prometheus_tsdb_head_truncations_totalcounterins, instance, ip, job, clsTotal number of head truncations attempted.
prometheus_tsdb_isolation_high_watermarkgaugeins, instance, ip, job, clsThe highest TSDB append ID that has been given out.
prometheus_tsdb_isolation_low_watermarkgaugeins, instance, ip, job, clsThe lowest TSDB append ID that is still referenced.
prometheus_tsdb_lowest_timestampgaugeins, instance, ip, job, clsLowest timestamp value stored in the database. The unit is decided by the library consumer.
prometheus_tsdb_lowest_timestamp_secondsgaugeins, instance, ip, job, clsLowest timestamp value stored in the database.
prometheus_tsdb_mmap_chunk_corruptions_totalcounterins, instance, ip, job, clsTotal number of memory-mapped chunk corruptions.
prometheus_tsdb_mmap_chunks_totalcounterins, instance, ip, job, clsTotal number of chunks that were memory-mapped.
prometheus_tsdb_out_of_bound_samples_totalcounterins, instance, ip, type, job, clsTotal number of out of bound samples ingestion failed attempts with out of order support disabled.
prometheus_tsdb_out_of_order_samples_totalcounterins, instance, ip, type, job, clsTotal number of out of order samples ingestion failed attempts due to out of order being disabled.
prometheus_tsdb_reloads_failures_totalcounterins, instance, ip, job, clsNumber of times the database failed to reloadBlocks block data from disk.
prometheus_tsdb_reloads_totalcounterins, instance, ip, job, clsNumber of times the database reloaded block data from disk.
prometheus_tsdb_retention_limit_bytesgaugeins, instance, ip, job, clsMax number of bytes to be retained in the tsdb blocks, configured 0 means disabled
prometheus_tsdb_retention_limit_secondsgaugeins, instance, ip, job, clsHow long to retain samples in storage.
prometheus_tsdb_size_retentions_totalcounterins, instance, ip, job, clsThe number of times that blocks were deleted because the maximum number of bytes was exceeded.
prometheus_tsdb_snapshot_replay_error_totalcounterins, instance, ip, job, clsTotal number snapshot replays that failed.
prometheus_tsdb_storage_blocks_bytesgaugeins, instance, ip, job, clsThe number of bytes that are currently used for local storage by all blocks.
prometheus_tsdb_symbol_table_size_bytesgaugeins, instance, ip, job, clsSize of symbol table in memory for loaded blocks
prometheus_tsdb_time_retentions_totalcounterins, instance, ip, job, clsThe number of times that blocks were deleted because the maximum time limit was exceeded.
prometheus_tsdb_tombstone_cleanup_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
prometheus_tsdb_tombstone_cleanup_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_tombstone_cleanup_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_too_old_samples_totalcounterins, instance, ip, type, job, clsTotal number of out of order samples ingestion failed attempts with out of support enabled, but sample outside of time window.
prometheus_tsdb_vertical_compactions_totalcounterins, instance, ip, job, clsTotal number of compactions done on overlapping blocks.
prometheus_tsdb_wal_completed_pages_totalcounterins, instance, ip, job, clsTotal number of completed pages.
prometheus_tsdb_wal_corruptions_totalcounterins, instance, ip, job, clsTotal number of WAL corruptions.
prometheus_tsdb_wal_fsync_duration_secondssummaryins, instance, ip, job, cls, quantileDuration of write log fsync.
prometheus_tsdb_wal_fsync_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_fsync_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_page_flushes_totalcounterins, instance, ip, job, clsTotal number of page flushes.
prometheus_tsdb_wal_segment_currentgaugeins, instance, ip, job, clsWrite log segment index that TSDB is currently writing to.
prometheus_tsdb_wal_storage_size_bytesgaugeins, instance, ip, job, clsSize of the write log directory.
prometheus_tsdb_wal_truncate_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_truncate_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_truncations_failed_totalcounterins, instance, ip, job, clsTotal number of write log truncations that failed.
prometheus_tsdb_wal_truncations_totalcounterins, instance, ip, job, clsTotal number of write log truncations attempted.
prometheus_tsdb_wal_writes_failed_totalcounterins, instance, ip, job, clsTotal number of write log writes that failed.
prometheus_web_federation_errors_totalcounterins, instance, ip, job, clsTotal number of errors that occurred while sending federation responses.
prometheus_web_federation_warnings_totalcounterins, instance, ip, job, clsTotal number of warnings that occurred while sending federation responses.
promhttp_metric_handler_requests_in_flightgaugeins, instance, ip, job, clsCurrent number of scrapes being served.
promhttp_metric_handler_requests_totalcounterins, instance, ip, job, cls, codeTotal number of scrapes by HTTP status code.
pushgateway_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which pushgateway was built, and the goos and goarch for the build.
pushgateway_http_requests_totalcounterins, instance, method, ip, job, cls, code, handlerTotal HTTP requests processed by the Pushgateway, excluding scrapes.
querier_cache_added_new_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_added_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_entriesgaugeins, instance, ip, job, cache, clsThe total number of entries
querier_cache_evicted_totalUnknownins, instance, ip, job, reason, cache, clsN/A
querier_cache_gets_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_memory_bytesgaugeins, instance, ip, job, cache, clsThe current cache size in bytes
querier_cache_misses_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_stale_gets_totalUnknownins, instance, ip, job, cache, clsN/A
ring_member_heartbeats_totalUnknownins, instance, ip, job, clsN/A
ring_member_tokens_ownedgaugeins, instance, ip, job, clsThe number of tokens owned in the ring.
ring_member_tokens_to_owngaugeins, instance, ip, job, clsThe number of tokens to own in the ring.
scrape_duration_secondsUnknownins, instance, ip, job, clsN/A
scrape_samples_post_metric_relabelingUnknownins, instance, ip, job, clsN/A
scrape_samples_scrapedUnknownins, instance, ip, job, clsN/A
scrape_series_addedUnknownins, instance, ip, job, clsN/A
upUnknownins, instance, ip, job, clsN/A

PING Metrics

The PING job includes 54 types of available metrics, provided by blackbox_exporter.

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| agent_up | Unknown | ins, ip, job, instance, cls | N/A |
| probe_dns_lookup_time_seconds | gauge | ins, ip, job, instance, cls | Returns the time taken for probe dns lookup in seconds |
| probe_duration_seconds | gauge | ins, ip, job, instance, cls | Returns how long the probe took to complete in seconds |
| probe_icmp_duration_seconds | gauge | ins, ip, job, phase, instance, cls | Duration of icmp request by phase |
| probe_icmp_reply_hop_limit | gauge | ins, ip, job, instance, cls | Replied packet hop limit (TTL for ipv4) |
| probe_ip_addr_hash | gauge | ins, ip, job, instance, cls | Specifies the hash of IP address. It's useful to detect if the IP address changes. |
| probe_ip_protocol | gauge | ins, ip, job, instance, cls | Specifies whether probe ip protocol is IP4 or IP6 |
| probe_success | gauge | ins, ip, job, instance, cls | Displays whether or not the probe was a success |
| scrape_duration_seconds | Unknown | ins, ip, job, instance, cls | N/A |
| scrape_samples_post_metric_relabeling | Unknown | ins, ip, job, instance, cls | N/A |
| scrape_samples_scraped | Unknown | ins, ip, job, instance, cls | N/A |
| scrape_series_added | Unknown | ins, ip, job, instance, cls | N/A |
| up | Unknown | ins, ip, job, instance, cls | N/A |
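As a concrete illustration of how these probe metrics look on the wire, here is a minimal sketch that parses Prometheus exposition-format text, such as the plain-text output of blackbox_exporter's `/probe` endpoint. The sample payload below is hypothetical; the metric names mirror the PING table above.

```python
# Hypothetical sample of what blackbox_exporter's /probe endpoint returns.
sample = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.013
probe_ip_protocol 4
"""

def parse_metrics(text: str) -> dict:
    """Return a {metric_name: value} dict, skipping comments and blank lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)  # split metric name from its value
        metrics[name] = float(value)
    return metrics

result = parse_metrics(sample)
print(result["probe_success"])  # → 1.0
```

Note that real exposition output may also carry label sets (e.g. `probe_icmp_duration_seconds{phase="rtt"}`); a production parser should use an established client library rather than this sketch.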

PUSH Metrics

PushGateway provides 44 types of metrics.

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| agent_up | Unknown | job, cls, instance, ins, ip | N/A |
| go_gc_duration_seconds | summary | job, cls, instance, ins, quantile, ip | A summary of the pause duration of garbage collection cycles. |
| go_gc_duration_seconds_count | Unknown | job, cls, instance, ins, ip | N/A |
| go_gc_duration_seconds_sum | Unknown | job, cls, instance, ins, ip | N/A |
| go_goroutines | gauge | job, cls, instance, ins, ip | Number of goroutines that currently exist. |
| go_info | gauge | job, cls, instance, ins, ip, version | Information about the Go environment. |
| go_memstats_alloc_bytes | counter | job, cls, instance, ins, ip | Total number of bytes allocated, even if freed. |
| go_memstats_alloc_bytes_total | counter | job, cls, instance, ins, ip | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | job, cls, instance, ins, ip | Total number of frees. |
| go_memstats_gc_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | job, cls, instance, ins, ip | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | job, cls, instance, ins, ip | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | job, cls, instance, ins, ip | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | job, cls, instance, ins, ip | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | job, cls, instance, ins, ip | Number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | gauge | job, cls, instance, ins, ip | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | gauge | job, cls, instance, ins, ip | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | counter | job, cls, instance, ins, ip | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | job, cls, instance, ins, ip | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | job, cls, instance, ins, ip | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | gauge | job, cls, instance, ins, ip | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | gauge | job, cls, instance, ins, ip | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | job, cls, instance, ins, ip | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | gauge | job, cls, instance, ins, ip | Number of bytes obtained from system. |
| go_threads | gauge | job, cls, instance, ins, ip | Number of OS threads created. |
| process_cpu_seconds_total | counter | job, cls, instance, ins, ip | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | job, cls, instance, ins, ip | Maximum number of open file descriptors. |
| process_open_fds | gauge | job, cls, instance, ins, ip | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | job, cls, instance, ins, ip | Resident memory size in bytes. |
| process_start_time_seconds | gauge | job, cls, instance, ins, ip | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | job, cls, instance, ins, ip | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | job, cls, instance, ins, ip | Maximum amount of virtual memory available in bytes. |
| pushgateway_build_info | gauge | job, goversion, cls, branch, instance, tags, revision, goarch, ins, ip, version, goos | A metric with a constant '1' value labeled by version, revision, branch, goversion from which pushgateway was built, and the goos and goarch for the build. |
| pushgateway_http_requests_total | counter | job, cls, method, code, handler, instance, ins, ip | Total HTTP requests processed by the Pushgateway, excluding scrapes. |
| scrape_duration_seconds | Unknown | job, cls, instance, ins, ip | N/A |
| scrape_samples_post_metric_relabeling | Unknown | job, cls, instance, ins, ip | N/A |
| scrape_samples_scraped | Unknown | job, cls, instance, ins, ip | N/A |
| scrape_series_added | Unknown | job, cls, instance, ins, ip | N/A |
| up | Unknown | job, cls, instance, ins, ip | N/A |

8 - FAQ

Frequently asked questions about the Pigsty INFRA infrastructure module

Which components are included in the INFRA module?

  • Ansible: automates configuration, deployment, and day-to-day operations.
  • Nginx: exposes WebUIs such as Grafana, VictoriaMetrics (VMUI), and Alertmanager, and hosts the local YUM/APT repository.
  • Self-signed CA: issues SSL/TLS certificates for Nginx, Patroni, pgBackRest, and other components.
  • VictoriaMetrics suite: replaces Prometheus/Loki; includes VictoriaMetrics (TSDB), VMAlert (alert evaluation), VictoriaLogs (centralized logging), and VictoriaTraces (distributed tracing).
  • Vector: node-side log collector that ships system/database logs to VictoriaLogs.
  • AlertManager: aggregates and routes alert notifications.
  • Grafana: monitoring and visualization platform, preloaded with numerous dashboards and data sources.
  • Chronyd: provides NTP time synchronization.
  • DNSMasq: provides DNS registration and resolution.
  • ETCD: serves as the DCS for PostgreSQL high availability (can also be deployed as a dedicated cluster).
  • PostgreSQL: serves as the CMDB on the admin node (optional).
  • Docker: runs stateless tools and applications on nodes (optional).

How can I re-register monitoring targets with VictoriaMetrics?

VictoriaMetrics performs static service discovery through the /infra/targets/<job>/*.yml directories. If a target file is accidentally deleted, re-register it with the following commands:

./infra.yml  -t infra_register   # re-render infra self-monitoring targets
./node.yml   -t node_register    # re-render node / HAProxy / Vector targets
./etcd.yml   -t etcd_register    # re-render etcd targets
./minio.yml  -t minio_register   # re-render MinIO targets
./pgsql.yml  -t pg_register      # re-render PGSQL/Patroni targets
./redis.yml  -t redis_register   # re-render Redis targets

Other modules (such as pg_monitor.yml, mongo.yml, and mysql.yml) also provide corresponding *_register tags, which you can run as needed.
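Each target file follows the Prometheus file-based service discovery format: a list of label sets, each with its own targets. As a hypothetical illustration (the path, labels, and port are examples only; check your /infra/targets directory for the actual layout), a node target file might look like:

```yaml
# /infra/targets/node/10.10.10.10.yml  -- illustrative example
- labels: { ip: 10.10.10.10 , ins: pg-meta-1 , cls: pg-meta }
  targets: [ 10.10.10.10:9100 ]   # node_exporter endpoint
```

The *_register playbook tags simply re-render these files from the inventory, so re-running them is idempotent and safe.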


How can I re-register PostgreSQL data sources with Grafana?

PGSQL databases defined in pg_databases are registered as Grafana data sources by default (for use by the PGCAT application).

If you accidentally delete a postgres data source registered in Grafana, you can register it again with the following command:

# register all pgsql databases (defined in pg_databases) as grafana data sources
./pgsql.yml -t register_grafana

How can I re-register a node's HAProxy admin page with Nginx?

If you accidentally delete the registered haproxy proxy settings in /etc/nginx/conf.d/haproxy, you can restore them with the following command:

./node.yml -t register_nginx     # register all haproxy admin page proxy settings with nginx on infra nodes

How can I restore domain name records in DNSMASQ?

PGSQL cluster/instance domain names are registered to /etc/hosts.d/<name> on infra nodes by default. You can restore them with the following command:

./pgsql.yml -t pg_dns    # register pg DNS names with dnsmasq on infra nodes
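The registered records are plain additional-hosts files consumed by dnsmasq. As a hypothetical illustration (the IP and names below are examples, not values from your deployment), a /etc/hosts.d/pg-meta file might contain:

```
10.10.10.10 pg-meta pg-meta-1
```

After such a file changes, dnsmasq needs to reload or restart for the new records to take effect.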

How can I expose a new upstream service via Nginx?

Although you can access services directly via IP:Port, we still recommend consolidating access through a single entry point: use domain names and reach all Web-UI services through the Nginx proxy. This narrows the access surface, reduces the number of exposed ports, and simplifies access control and auditing.

To expose a new WebUI service through the Nginx portal, add the service definition to the infra_portal parameter. For example, here is the infra portal configuration used by the official Pigsty demo, which exposes several additional services:

infra_portal:
  home         : { domain: home.pigsty.cc }
  grafana      : { domain: demo.pigsty.cc ,endpoint: "${admin_ip}:3000" ,websocket: true }
  prometheus   : { domain: p.pigsty.cc ,endpoint: "${admin_ip}:8428" }
  alertmanager : { domain: a.pigsty.cc ,endpoint: "${admin_ip}:9059" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  vmalert      : { endpoint: "${admin_ip}:8880" }
  # newly added web portals
  minio        : { domain: sss.pigsty  ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }
  postgrest    : { domain: api.pigsty.cc  ,endpoint: "127.0.0.1:8884"   }
  pgadmin      : { domain: adm.pigsty.cc  ,endpoint: "127.0.0.1:8885"   }
  pgweb        : { domain: cli.pigsty.cc  ,endpoint: "127.0.0.1:8886"   }
  bytebase     : { domain: ddl.pigsty.cc  ,endpoint: "127.0.0.1:8887"   }
  gitea        : { domain: git.pigsty.cc  ,endpoint: "127.0.0.1:8889"   }
  wiki         : { domain: wiki.pigsty.cc ,endpoint: "127.0.0.1:9002"   }
  noco         : { domain: noco.pigsty.cc ,endpoint: "127.0.0.1:9003"   }
  supa         : { domain: supa.pigsty.cc ,endpoint: "127.0.0.1:8000", websocket: true }

After defining the Nginx upstream service, use the following commands to register the new service with Nginx.

./infra.yml -t nginx_config           # regenerate the Nginx config files
./infra.yml -t nginx_launch           # update and apply the Nginx configuration

# you can also reload the Nginx configuration manually with Ansible
ansible infra -b -a 'nginx -s reload'  # reload Nginx configuration

To access the service via HTTPS, you must delete files/pki/csr/pigsty.csr and files/pki/nginx/pigsty.{key,crt} to force regeneration of the Nginx SSL/TLS certificate so that it covers the new upstream's domain name. If you prefer an SSL certificate issued by a trusted CA instead of one signed by Pigsty's self-signed CA, place it in the /etc/nginx/conf.d/cert/ directory and adjust the corresponding configuration: /etc/nginx/conf.d/<name>.conf.


How can I manually add an upstream repo file to a node?

Pigsty ships a built-in wrapper script, bin/repo-add, which invokes the node.yml ansible playbook to add repo files to the given nodes.

bin/repo-add <selector> [modules]
bin/repo-add 10.10.10.10           # add the node repo for node 10.10.10.10
bin/repo-add infra   node,infra    # add the node and infra repos for the infra group
bin/repo-add infra   node,local    # add the node repo and the local pigsty repo for the infra group
bin/repo-add pg-test node,pgsql    # add the node and pgsql repos for the pg-test group