
Module: NODE

Provision target servers: bring host nodes under Pigsty's management and adjust them to the described state, including node-level VIPs, HAProxy, and monitoring components.

1 - Core Concepts

An introduction to the key concepts involved in Node clusters.

A node is an abstraction of hardware resources: a physical machine, bare-metal server, virtual machine, container, or K8s pod. Anything running a Linux OS (with the systemd init daemon) and offering standard CPU/memory/disk/network resources qualifies.

Pigsty distinguishes several types of nodes, which differ only in which modules are installed on them.

In a single-node Pigsty installation, the four roles coincide: the current node serves simultaneously as a common node, the admin node, the infra node, and a database node.


Common Node

Nodes managed by Pigsty can host modules; the node.yml playbook adjusts a node to the desired state. The following services are added to every node by default:

| Component           | Port | Description                | Status             |
|---------------------|------|----------------------------|--------------------|
| Node Exporter       | 9100 | Node metrics exporter      | Enabled by default |
| HAProxy Admin       | 9101 | HAProxy admin page         | Enabled by default |
| Vector              | 9598 | Log collection agent       | Enabled by default |
| Docker Daemon       | 9323 | Container support          | Enabled on demand  |
| Keepalived          | -    | Manages the cluster L2 VIP | Enabled on demand  |
| Keepalived Exporter | 9650 | Monitors Keepalived status | Enabled on demand  |

Docker and Keepalived (with its monitoring component) are optional extras for a node; both are disabled by default.


ADMIN Node

A Pigsty deployment has exactly one admin node, designated by admin_ip. During a single-node installation it is set to the primary IP address of that machine.

The admin node has ssh/sudo access to all other nodes. Its security is critical: make sure access to it is strictly controlled.

The admin node usually coincides with an infrastructure (INFRA) node. When there are multiple INFRA nodes, the admin node is typically the first of them, with the others serving as backups.
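As a minimal sketch (the IP address is a placeholder), the admin node is designated by the global admin_ip variable, typically pointing at the first member of the infra group:

```yaml
all:
  vars:
    admin_ip: 10.10.10.10              # the one and only admin node
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }  # admin node doubles as infra node #1
```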


INFRA Node

A Pigsty deployment may have one or more INFRA nodes; large production environments typically have 2-3.

The infra group in the inventory specifies which nodes are INFRA nodes; these nodes get the INFRA module (DNS, Nginx, Prometheus, Grafana, etc.).

The admin node is usually the first member of the INFRA group, and the other INFRA nodes can serve as standby admin nodes.
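The layout above can be sketched as an inventory fragment (IP addresses are hypothetical):

```yaml
infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }   # first infra node, usually the admin node
    10.10.10.11: { infra_seq: 2 }   # additional infra node, standby admin
```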

| Component        | Port  | Domain   | Description                            |
|------------------|-------|----------|----------------------------------------|
| Nginx            | 80    | i.pigsty | Web service portal (software repo)     |
| Grafana          | 3000  | -        | Visualization platform                 |
| VictoriaMetrics  | 9090  | -        | Time-series database (metrics storage) |
| VictoriaLogs     | 9428  | -        | Log collection server                  |
| VictoriaTraces   | 10428 | -        | Distributed tracing server             |
| vmalert          | 8880  | -        | Alerting / recording rule evaluation   |
| AlertManager     | 9093  | -        | Alert aggregation and dispatch         |
| BlackboxExporter | 9115  | -        | Blackbox probing                       |
| DNSMASQ          | 53    | -        | DNS server                             |
| Chronyd          | 123   | -        | NTP time server                        |
| Ansible          | -     | -        | Playbook execution                     |

PGSQL Node

A node with the PGSQL module installed is called a PGSQL node. Nodes and PostgreSQL instances are deployed 1:1.

A PGSQL node can borrow its identity from the corresponding PostgreSQL instance, controlled by the node_id_from_pg parameter.

| Component           | Port | Description                                      | Status             |
|---------------------|------|--------------------------------------------------|--------------------|
| Postgres            | 5432 | PostgreSQL database                              | Enabled by default |
| Pgbouncer           | 6432 | Pgbouncer connection pooler                      | Enabled by default |
| Patroni             | 8008 | Patroni high-availability component              | Enabled by default |
| HAProxy Primary     | 5433 | Primary pooled connection: read/write service    | Enabled by default |
| HAProxy Replica     | 5434 | Replica pooled connection: read-only service     | Enabled by default |
| HAProxy Default     | 5436 | Primary direct connection service                | Enabled by default |
| HAProxy Offline     | 5438 | Offline direct connection: offline read service  | Enabled by default |
| HAProxy service     | 543x | Customized PostgreSQL services                   | On demand          |
| HAProxy Admin       | 9101 | Metrics and traffic management                   | Enabled by default |
| PG Exporter         | 9630 | PG metrics exporter                              | Enabled by default |
| PGBouncer Exporter  | 9631 | Pgbouncer metrics exporter                       | Enabled by default |
| Node Exporter       | 9100 | Node metrics exporter                            | Enabled by default |
| Vector              | 9598 | Collects database component and host logs        | Enabled by default |
| vip-manager         | -    | Binds the VIP to the primary node                | Enabled on demand  |
| Docker Daemon       | 9323 | Docker daemon                                    | Enabled on demand  |
| keepalived          | -    | Binds an L2 VIP for the whole cluster            | Enabled on demand  |
| Keepalived Exporter | 9650 | Keepalived metrics exporter                      | Enabled on demand  |

2 - Cluster Configuration

Choose the appropriate Node deployment type for your scenario, and provide reliable access to it.

Pigsty uses the IP address as a node's unique identity. It should be the internal IP address that the database instance listens on and serves from.

node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1 }
    10.10.10.12: { nodename: node-test-2 }
    10.10.10.13: { nodename: node-test-3 }
  vars:
    node_cluster: node-test

This IP address must be the address the database instance listens on and serves from, but it should not be a public IP. That said, you do not have to connect through this address: managing the target node indirectly via an SSH tunnel or jump host also works. Nevertheless, the primary IPv4 address remains the node's core identifier. This is important, and you should make sure it holds when writing the configuration.

The IP address is the host's inventory_hostname in the inventory, i.e. the key in the <cluster>.hosts object. Beyond that, each node has two additional identity parameters:

| Name               | Type   | Level | Required | Description       |
|--------------------|--------|-------|----------|-------------------|
| inventory_hostname | ip     | -     | Required | Node IP address   |
| nodename           | string | I     | Optional | Node name         |
| node_cluster       | string | C     | Optional | Node cluster name |

nodename and node_cluster are optional: when omitted, the node's existing hostname and the fixed value nodes are used as defaults. In Pigsty's monitoring system these two serve as the node's cluster label (cls) and instance label (ins).

For PGSQL nodes, since Pigsty defaults to exclusive 1:1 PG-to-node deployment, the node_id_from_pg parameter lets a node borrow the identity parameters of its PostgreSQL instance (pg_cluster and pg_seq) for its ins and cls labels, so that database and node metrics carry the same labels and can easily be cross-analyzed.

#nodename:                # [INSTANCE] # node instance identity, use existing hostname if missing, optional, no default
node_cluster: nodes       # [CLUSTER]  # node cluster identity, use fixed value 'nodes' if missing, optional
nodename_overwrite: true          # overwrite node's hostname with nodename?
nodename_exchange: false          # exchange nodename among playbook hosts?
node_id_from_pg: true             # use postgres identity as node identity if applicable?

You can also configure rich functionality at the node-cluster level: for example, use HAProxy on a node cluster to expose services with load balancing, or bind an L2 VIP to the cluster.

3 - Parameter List

The NODE module provides 60+ parameters in 11 groups.

The NODE module adjusts host nodes to the desired target state and brings them into Pigsty's monitoring system.

| Parameter Group | Description                         |
|-----------------|-------------------------------------|
| NODE_ID         | Node identity parameters            |
| NODE_DNS        | DNS resolution parameters           |
| NODE_PACKAGE    | Software repo & package parameters  |
| NODE_TUNE       | Kernel & tuning parameters          |
| NODE_SEC        | Security-related parameters         |
| NODE_ADMIN      | Admin user parameters               |
| NODE_TIME       | Time & NTP parameters               |
| NODE_VIP        | L2 VIP parameters                   |
| HAPROXY         | HAProxy load balancer parameters    |
| NODE_EXPORTER   | Node monitoring exporter parameters |
| VECTOR          | Log collection parameters           |

Parameter Overview

The NODE_ID group defines node identity: node name, cluster name, and whether to borrow identity from PostgreSQL.

| Parameter          | Type   | Level | Description                                               |
|--------------------|--------|-------|-----------------------------------------------------------|
| nodename           | string | I     | node instance identity, use hostname if missing, optional |
| node_cluster       | string | C     | node cluster identity, use 'nodes' if missing, optional   |
| nodename_overwrite | bool   | C     | overwrite node's hostname with nodename?                  |
| nodename_exchange  | bool   | C     | exchange nodename among playbook hosts?                   |
| node_id_from_pg    | bool   | C     | use postgres identity as node identity if applicable?     |

The NODE_DNS group configures node DNS resolution: static hosts records and dynamic DNS servers.

| Parameter              | Type     | Level | Description                                         |
|------------------------|----------|-------|-----------------------------------------------------|
| node_write_etc_hosts   | bool     | G/C/I | modify /etc/hosts on target node?                   |
| node_default_etc_hosts | string[] | G     | static DNS records in /etc/hosts                    |
| node_etc_hosts         | string[] | C     | extra static DNS records in /etc/hosts              |
| node_dns_method        | enum     | C     | how to handle existing DNS servers: add,none,overwrite |
| node_dns_servers       | string[] | C     | dynamic nameserver list in /etc/resolv.conf         |
| node_dns_options       | string[] | C     | DNS resolution options in /etc/resolv.conf          |

The NODE_PACKAGE group configures node software repositories and package installation.

| Parameter             | Type     | Level | Description                                         |
|-----------------------|----------|-------|-----------------------------------------------------|
| node_repo_modules     | enum     | C     | which repo modules to enable on node? local by default |
| node_repo_remove      | bool     | C     | remove existing repos on node when configuring?     |
| node_packages         | string[] | C     | packages to install on the current node             |
| node_default_packages | string[] | G     | packages installed on all nodes by default          |

The NODE_TUNE group configures node kernel parameters, feature switches, and tuning templates.

| Parameter             | Type     | Level | Description                                                |
|-----------------------|----------|-------|------------------------------------------------------------|
| node_disable_numa     | bool     | C     | disable node NUMA, reboot required                         |
| node_disable_swap     | bool     | C     | disable node swap, use with caution                        |
| node_static_network   | bool     | C     | preserve DNS resolver settings after reboot, on by default |
| node_disk_prefetch    | bool     | C     | setup disk prefetch on HDD to increase performance         |
| node_kernel_modules   | string[] | C     | kernel modules to be enabled on this node                  |
| node_hugepage_count   | int      | C     | number of 2MB hugepages, takes precedence over ratio       |
| node_hugepage_ratio   | float    | C     | fraction of node memory as hugepages, 0 disables it        |
| node_overcommit_ratio | int      | C     | node memory overcommit ratio (50-100), 0 disables it       |
| node_tune             | enum     | C     | node tuned profile: none, oltp, olap, crit, tiny           |
| node_sysctl_params    | dict     | C     | extra sysctl parameters in k:v format, on top of tuned     |

The NODE_SEC group configures node security options, including SELinux and the firewall.

| Parameter                 | Type   | Level | Description                                       |
|---------------------------|--------|-------|---------------------------------------------------|
| node_selinux_mode         | enum   | C     | SELinux mode: disabled, permissive, enforcing     |
| node_firewall_mode        | enum   | C     | firewall mode: off, none, zone                    |
| node_firewall_intranet    | cidr[] | C     | intranet CIDR list used for firewall rules        |
| node_firewall_public_port | port[] | C     | public ports, [22, 80, 443, 5432] by default      |

The NODE_ADMIN group configures the node admin user, data directory, and command aliases.

| Parameter               | Type     | Level | Description                                              |
|-------------------------|----------|-------|----------------------------------------------------------|
| node_data               | path     | C     | node main data directory, /data by default               |
| node_admin_enabled      | bool     | C     | create an admin user on target node?                     |
| node_admin_uid          | int      | C     | uid and gid for the node admin user                      |
| node_admin_username     | username | C     | name of the node admin user, dba by default              |
| node_admin_sudo         | enum     | C     | admin user sudo privilege: limited, nopass, all, none    |
| node_admin_ssh_exchange | bool     | C     | exchange admin ssh keys among node cluster?              |
| node_admin_pk_current   | bool     | C     | add current user's ssh pk to admin authorized_keys?      |
| node_admin_pk_list      | string[] | C     | ssh public keys to be added to the admin user            |
| node_aliases            | dict     | C     | shell alias commands on the host, as a KV dict           |

The NODE_TIME group configures node timezone, NTP time sync, and crontab jobs.

| Parameter              | Type     | Level | Description                                    |
|------------------------|----------|-------|------------------------------------------------|
| node_timezone          | string   | C     | set host node timezone, empty string to skip   |
| node_ntp_enabled       | bool     | C     | enable chronyd time sync service?              |
| node_ntp_servers       | string[] | C     | ntp server list in /etc/chrony.conf            |
| node_crontab_overwrite | bool     | C     | append to or overwrite /etc/crontab?           |
| node_crontab           | string[] | C     | crontab entries in /etc/crontab                |

The NODE_VIP group configures an L2 VIP for the node cluster, implemented by keepalived.

| Parameter         | Type   | Level | Description                                               |
|-------------------|--------|-------|-----------------------------------------------------------|
| vip_enabled       | bool   | C     | enable an L2 vip on this node cluster?                    |
| vip_address       | ip     | C     | node vip address in ipv4 format, required if vip enabled  |
| vip_vrid          | int    | C     | required integer, 1-254, unique within the same VLAN      |
| vip_role          | enum   | I     | optional, master/backup, backup by default                |
| vip_preempt       | bool   | C/I   | optional, true/false, false by default, vip preemption    |
| vip_interface     | string | C/I   | node vip network interface to listen on, eth0 by default  |
| vip_dns_suffix    | string | C     | node vip DNS name suffix, empty string by default         |
| vip_exporter_port | port   | C     | keepalived exporter listen port, 9650 by default          |

The HAPROXY group configures the HAProxy load balancer on nodes and service exposure.

| Parameter              | Type      | Level | Description                                       |
|------------------------|-----------|-------|---------------------------------------------------|
| haproxy_enabled        | bool      | C     | enable haproxy on this node?                      |
| haproxy_clean          | bool      | G/C/A | cleanup all existing haproxy config?              |
| haproxy_reload         | bool      | A     | reload haproxy after config?                      |
| haproxy_auth_enabled   | bool      | G     | enable authentication for haproxy admin page?     |
| haproxy_admin_username | username  | G     | haproxy admin username, admin by default          |
| haproxy_admin_password | password  | G     | haproxy admin password, pigsty by default         |
| haproxy_exporter_port  | port      | C     | haproxy exporter port, 9101 by default            |
| haproxy_client_timeout | interval  | C     | client connection timeout, 24h by default         |
| haproxy_server_timeout | interval  | C     | server connection timeout, 24h by default         |
| haproxy_services       | service[] | C     | haproxy services to be exposed on the node        |

The NODE_EXPORTER group configures the node monitoring exporter.

| Parameter             | Type | Level | Description                               |
|-----------------------|------|-------|-------------------------------------------|
| node_exporter_enabled | bool | C     | setup node_exporter on this node?         |
| node_exporter_port    | port | C     | node exporter listen port, 9100 by default |
| node_exporter_options | arg  | C     | extra server options for node_exporter    |

The VECTOR group configures the Vector log collector.

| Parameter           | Type     | Level | Description                                    |
|---------------------|----------|-------|------------------------------------------------|
| vector_enabled      | bool     | C     | enable the vector log collector?               |
| vector_clean        | bool     | G/A   | purge vector data dir during init?             |
| vector_data         | path     | C     | vector data dir, /data/vector by default       |
| vector_port         | port     | C     | vector metrics listen port, 9598 by default    |
| vector_read_from    | enum     | C     | read logs from the beginning or the end        |
| vector_log_endpoint | string[] | C     | log sink endpoints, the infra group by default |

NODE_ID

Each node has identity parameters, configured via the relevant entries in <cluster>.hosts and <cluster>.vars.

Pigsty uses the IP address as the unique identifier of a database node. It must be the address the database instance listens on and serves from, but should not be a public IP. That said, you do not have to connect through this address: managing the target node indirectly via an SSH tunnel or jump host also works. Nevertheless, the primary IPv4 address remains the node's core identifier, so make sure this holds when writing the configuration. The IP address is the host's inventory_hostname in the inventory, i.e. the key in the <cluster>.hosts object.

node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1 }
    10.10.10.12: { nodename: node-test-2 }
    10.10.10.13: { nodename: node-test-3 }
  vars:
    node_cluster: node-test

Beyond that, the Pigsty monitoring system gives each node two more important identity parameters: nodename and node_cluster, which serve as the node's instance label (ins) and cluster label (cls) in the monitoring system.

node_load1{cls="pg-meta", ins="pg-meta-1", ip="10.10.10.10", job="nodes"}
node_load1{cls="pg-test", ins="pg-test-1", ip="10.10.10.11", job="nodes"}
node_load1{cls="pg-test", ins="pg-test-2", ip="10.10.10.12", job="nodes"}
node_load1{cls="pg-test", ins="pg-test-3", ip="10.10.10.13", job="nodes"}

When performing a default PostgreSQL deployment, since Pigsty uses exclusive 1:1 node deployment by default, the node_id_from_pg parameter lets the node borrow the database instance's identity parameters (pg_cluster and pg_seq) for its ins and cls labels.

| Name               | Type   | Level | Required | Description       |
|--------------------|--------|-------|----------|-------------------|
| inventory_hostname | ip     | -     | Required | Node IP address   |
| nodename           | string | I     | Optional | Node name         |
| node_cluster       | string | C     | Optional | Node cluster name |

#nodename:                # [INSTANCE] # node instance identity, use existing hostname if missing, optional, no default
node_cluster: nodes       # [CLUSTER]  # node cluster identity, use fixed value 'nodes' if missing, optional
nodename_overwrite: true          # overwrite node's hostname with nodename?
nodename_exchange: false          # exchange nodename among playbook hosts?
node_id_from_pg: true             # use postgres identity as node identity if applicable?

nodename

Parameter: nodename, Type: string, Level: I

The node's identity parameter. If not set explicitly, the existing hostname is used as the node name. Although this is an identity parameter, it is optional because it has a sensible default.

If node_id_from_pg is enabled (the default) and nodename is not explicitly given, nodename falls back to ${pg_cluster}-${pg_seq} as the instance identity; if no PGSQL module is defined on the cluster, it falls back to the node's HOSTNAME.
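As a hedged illustration (cluster name and IP are placeholders), a PGSQL host defined without an explicit nodename would derive its instance identity like this:

```yaml
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-test
# with node_id_from_pg: true and no nodename set,
# this node's instance label (ins) becomes "pg-test-1"
```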

node_cluster

Parameter: node_cluster, Type: string, Level: C

This option explicitly assigns a cluster name to a node; it is usually only meaningful when defined at the node-cluster level. With the default empty value, the fixed value nodes is used as the node cluster identity.

If node_id_from_pg is enabled (the default) and node_cluster is not explicitly given, node_cluster falls back to ${pg_cluster} as the cluster identity; if no PGSQL module is defined on the cluster, it falls back to the default value nodes.

nodename_overwrite

Parameter: nodename_overwrite, Type: bool, Level: C

Overwrite the node's hostname with nodename? Default: true, in which case a non-empty nodename is used as the current host's HOSTNAME.

When nodename is empty and node_id_from_pg is true (the default), Pigsty tries to borrow the identity of the PostgreSQL instance deployed 1:1 on the node as the node name, i.e. {{ pg_cluster }}-{{ pg_seq }}. If the node has no PGSQL module installed, it falls back to doing nothing.

So if you leave nodename empty and do not enable node_id_from_pg, Pigsty makes no change to the existing hostname.

nodename_exchange

Parameter: nodename_exchange, Type: bool, Level: C

Exchange hostnames among playbook nodes? Default: false.

When enabled, nodes executing the node.yml playbook in the same batch exchange their node names with each other and write them into /etc/hosts.

node_id_from_pg

Parameter: node_id_from_pg, Type: bool, Level: C

Borrow identity parameters from the PostgreSQL instance/cluster deployed 1:1 on this node? Default: true.

PostgreSQL instances and nodes are deployed 1:1 by default in Pigsty, so you can "borrow" identity parameters from the database instance. This parameter is enabled by default, meaning that unless configured otherwise, a PostgreSQL cluster's host nodes get cluster and instance identities matching the database's identity parameters. This is a real convenience for troubleshooting and for processing monitoring data.


NODE_DNS

Pigsty configures static DNS records and dynamic DNS resolvers for nodes.

If your node provider has already configured DNS servers for you, set node_dns_method to none to skip DNS setup.

node_write_etc_hosts: true        # modify `/etc/hosts` on target node?
node_default_etc_hosts:           # static dns records in `/etc/hosts`
  - "${admin_ip} i.pigsty"
node_etc_hosts: []                # extra static dns records in `/etc/hosts`
node_dns_method: add              # how to handle dns servers: add,none,overwrite
node_dns_servers: ['${admin_ip}'] # dynamic nameserver in `/etc/resolv.conf`
node_dns_options:                 # dns resolv options in `/etc/resolv.conf`
  - options single-request-reopen timeout:1

node_write_etc_hosts

Parameter: node_write_etc_hosts, Type: bool, Level: G|C|I

Modify /etc/hosts on the target node? In container environments, for example, modifying this file is usually not allowed.

node_default_etc_hosts

Parameter: node_default_etc_hosts, Type: string[], Level: G

Static DNS records written to /etc/hosts on all nodes by default. Default value:

["${admin_ip} i.pigsty"]

node_default_etc_hosts is an array; each element is a DNS record of the form <ip> <name>, where multiple space-separated names may be given.

This parameter configures global static DNS records. To configure specific static records for a single cluster or instance, use the node_etc_hosts parameter instead.

node_etc_hosts

Parameter: node_etc_hosts, Type: string[], Level: C

Extra static DNS records written to the node's /etc/hosts. Default: [] (empty array).

This parameter has the same form as node_default_etc_hosts but a different purpose: it is suited to configuration at the cluster/instance level.
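As a hedged sketch (names and addresses are hypothetical), a cluster could pin its own service records like this:

```yaml
pg-test:
  hosts:
    10.10.10.11: {}
  vars:
    node_etc_hosts:                          # cluster-level static records
      - "10.10.10.2 pg-test.example.local"   # hypothetical service VIP record
```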

node_dns_method

Parameter: node_dns_method, Type: enum, Level: C

How to configure DNS servers. Three options: add, none, overwrite. Default: add.

  • add: append the records in node_dns_servers to /etc/resolv.conf, keeping existing DNS servers. (default)
  • overwrite: overwrite /etc/resolv.conf with the records in node_dns_servers.
  • none: skip DNS server configuration; use this if your environment already has DNS servers configured.
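For instance, the default combination below keeps whatever resolvers the node already has and puts the admin node in front of them (a sketch of the default behavior):

```yaml
node_dns_method: add                # keep existing nameservers
node_dns_servers: ['${admin_ip}']   # prepend the admin node's DNS service
```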

node_dns_servers

Parameter: node_dns_servers, Type: string[], Level: C

The dynamic DNS server list in /etc/resolv.conf. Default: ["${admin_ip}"], i.e. use the admin node as the primary DNS server.

node_dns_options

Parameter: node_dns_options, Type: string[], Level: C

DNS resolution options in /etc/resolv.conf. Default value:

- "options single-request-reopen timeout:1"

If node_dns_method is set to add or overwrite, the records in this option are written into /etc/resolv.conf first. See the Linux documentation on /etc/resolv.conf for the exact format.


NODE_PACKAGE

Pigsty configures software repositories on managed nodes and installs packages.

node_repo_modules: local          # upstream repo to be added on node, local by default.
node_repo_remove: true            # remove existing repo on node?
node_packages: [openssh-server]   # packages to be installed current nodes with latest version
#node_default_packages:           # default packages to be installed on all nodes

node_repo_modules

Parameter: node_repo_modules, Type: string, Level: C/A

List of repo modules to be added on the node, in the same form as repo_modules. Default: local, i.e. use the local software repo designated by the local module of repo_upstream.

When Pigsty brings a node under management, it filters the entries of repo_upstream by this parameter's value: only entries whose module field matches are added to the node's software sources.
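As a hedged example, multiple modules can be enabled with a comma-separated string (which module names are available depends on your repo_upstream definition):

```yaml
node_repo_modules: local,node   # add entries whose module is `local` or `node`
```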

node_repo_remove

Parameter: node_repo_remove, Type: bool, Level: C/A

Remove the node's existing repo definitions? Default: true.

When enabled, Pigsty removes the original config files under /etc/yum.repos.d and backs them up to /etc/yum.repos.d/backup. On Debian/Ubuntu systems, /etc/apt/sources.list(.d) is backed up to /etc/apt/backup.

node_packages

Parameter: node_packages, Type: string[], Level: C

Packages to install and upgrade on the current node. Default: [openssh-server], i.e. sshd is upgraded to the latest version at install time (avoiding known security vulnerabilities).

Each array element is a string of comma-separated package names, in the same form as node_default_packages. This parameter is typically used at the node/cluster level to specify extra packages to install.

Packages listed here are upgraded to the latest available version. If you want existing package versions to remain unchanged (presence is enough), use node_default_packages instead.

node_default_packages

Parameter: node_default_packages, Type: string[], Level: G

Packages installed on all nodes by default. This is a string array; each element is a string listing package names separated by spaces or commas.

Packages specified here are only required to be present, not up to date. If you need the latest version of a package, use the node_packages parameter instead.

This parameter has no default value, i.e. it is left undefined. If it is not explicitly set in the configuration, Pigsty loads a default from the node_packages_default variable defined in roles/node_id/vars, according to the node's OS family.

Default value (EL-family operating systems):

- lz4,unzip,bzip2,pv,jq,git,ncdu,make,patch,bash,lsof,wget,uuid,tuned,nvme-cli,numactl,sysstat,iotop,htop,rsync,tcpdump
- python3,python3-pip,socat,lrzsz,net-tools,ipvsadm,telnet,ca-certificates,openssl,keepalived,etcd,haproxy,chrony,pig
- zlib,yum,audit,bind-utils,readline,vim-minimal,node_exporter,grubby,openssh-server,openssh-clients,chkconfig,vector

Default value (Debian/Ubuntu):

- lz4,unzip,bzip2,pv,jq,git,ncdu,make,patch,bash,lsof,wget,uuid,tuned,nvme-cli,numactl,sysstat,iotop,htop,rsync
- python3,python3-pip,socat,lrzsz,net-tools,ipvsadm,telnet,ca-certificates,openssl,keepalived,etcd,haproxy,chrony,pig
- zlib1g,acl,dnsutils,libreadline-dev,vim-tiny,node-exporter,openssh-server,openssh-client,vector

This parameter has the same form as node_packages, but is typically used at the global level to specify the default packages that every node must have installed.


NODE_TUNE

Host node features, kernel modules, and tuning templates.

node_disable_numa: false          # disable node numa, reboot required
node_disable_swap: false          # disable node swap, use with caution
node_static_network: true         # preserve dns resolver settings after reboot
node_disk_prefetch: false         # setup disk prefetch on HDD to increase performance
node_kernel_modules: [ softdog, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]
node_hugepage_count: 0            # number of 2MB hugepage, take precedence over ratio
node_hugepage_ratio: 0            # node mem hugepage ratio, 0 disable it by default
node_overcommit_ratio: 0          # node mem overcommit ratio, 0 disable it by default
node_tune: oltp                   # node tuned profile: none,oltp,olap,crit,tiny
node_sysctl_params: { }           # sysctl parameters in k:v format in addition to tuned

node_disable_numa

Parameter: node_disable_numa, Type: bool, Level: C

Disable NUMA? Default: false (NUMA left enabled).

Note that disabling NUMA requires a reboot to take effect! If you are not sure how to pin cores, disabling NUMA is recommended for production database nodes.

node_disable_swap

Parameter: node_disable_swap, Type: bool, Level: C

Disable swap? Default: false (swap left enabled).

Disabling swap is generally not recommended; the exception is when you have enough memory for an exclusive PostgreSQL deployment, in which case disabling swap can improve performance.

Another exception: nodes used to deploy the Kubernetes module should have swap disabled.

node_static_network

Parameter: node_static_network, Type: bool, Level: C

Use static networking, i.e. preserve DNS resolver settings across reboots? Default: true.

Enabling static networking means your DNS resolver configuration will not be overwritten by machine reboots or NIC changes. Enabling it is recommended, or have a network engineer own this configuration.

node_disk_prefetch

Parameter: node_disk_prefetch, Type: bool, Level: C

Enable disk read-ahead? Default: false.

This can improve performance for instances deployed on HDDs; enabling it is recommended when using mechanical disks.

node_kernel_modules

Parameter: node_kernel_modules, Type: string[], Level: C

Which kernel modules to enable. The following are enabled by default:

node_kernel_modules: [ softdog, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]

This is an array of kernel module names declaring which kernel modules should be installed on the node.

node_hugepage_count

Parameter: node_hugepage_count, Type: int, Level: C

Number of 2MB huge pages to allocate on the node. Default: 0. The related parameter is node_hugepage_ratio.

If both node_hugepage_count and node_hugepage_ratio are 0 (the default), huge pages are disabled entirely. This parameter takes precedence over node_hugepage_ratio because it is more precise.

A non-zero value is written to /etc/sysctl.d/hugepage.conf and applied; negative values have no effect, and values exceeding 90% of node memory are capped at 90%.

If non-zero, it should be slightly larger than the amount implied by pg_shared_buffer_ratio, so that PostgreSQL can actually use the huge pages.
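As an illustrative back-of-envelope check (not Pigsty code; the 5% slack factor is an assumption), the number of 2MB huge pages needed to cover a given shared_buffers size can be estimated like this:

```python
import math

def hugepages_for_shared_buffers(shared_buffers_mb: int, page_mb: int = 2,
                                 slack: float = 1.05) -> int:
    """Huge pages needed to cover shared_buffers, with ~5% slack so PG fits."""
    return math.ceil(shared_buffers_mb * slack / page_mb)

# e.g. shared_buffers = 25% of a 64 GiB node = 16384 MB:
print(hugepages_for_shared_buffers(16384))  # 8602 pages ~= 16.8 GiB
```

After initialization, /pg/bin/pg-tune-hugepage can then reclaim any over-allocation, as noted below for node_hugepage_ratio.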

node_hugepage_ratio

Parameter: node_hugepage_ratio, Type: float, Level: C

Fraction of node memory to allocate as huge pages. Default: 0. Valid range: 0 ~ 0.40.

This fraction of memory is allocated as huge pages and reserved for PostgreSQL. node_hugepage_count is the higher-priority, higher-precision version of this parameter.

Default: 0, which sets vm.nr_hugepages=0 and uses no huge pages at all.

If non-zero, it should be equal to or slightly larger than pg_shared_buffer_ratio.

For example, if you allocate the default 25% of memory to the Postgres shared buffers, you can set this value to 0.27 ~ 0.30, then use /pg/bin/pg-tune-hugepage after initialization to precisely reclaim the wasted huge pages.

node_overcommit_ratio

Parameter: node_overcommit_ratio, Type: int, Level: C

Node memory overcommit ratio. Default: 0. This is an integer from 0 to 100+.

Default: 0, which sets vm.overcommit_memory=0; any other value sets vm.overcommit_memory=2 and uses this value as vm.overcommit_ratio.

Setting vm.overcommit_ratio on dedicated pgsql nodes is recommended to avoid memory overcommit.
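For example, setting a 90% ratio on a dedicated PGSQL node would conceptually result in the following kernel settings (a sketch of the mapping described above, not literal Pigsty output):

```yaml
node_overcommit_ratio: 90   # dedicated pgsql node
# resulting kernel settings:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio  = 90
```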

node_tune

Parameter: node_tune, Type: enum, Level: C

Pre-built machine tuning profiles, served via tuned. Four presets are available:

  • tiny: tiny virtual machines
  • oltp: regular OLTP template, optimized for latency (default)
  • olap: regular OLAP template, optimized for throughput
  • crit: critical financial workload template, optimized to minimize dirty pages

Normally the database tuning template pg_conf should match the machine tuning template.

node_sysctl_params

Parameter: node_sysctl_params, Type: dict, Level: C

Extra sysctl kernel parameters in K:V form, appended to the tuned profile. Default: {} (empty object).

This is a KV dictionary: keys are sysctl parameter names, values are the parameter values. Alternatively, you can define extra sysctl parameters directly in the tuned templates under roles/node/templates.
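A hedged example of such cluster-level extras (the parameter values here are illustrative, not recommendations):

```yaml
node_sysctl_params:              # extra sysctl entries merged into the tuned profile
  net.ipv4.tcp_tw_reuse: 1
  vm.dirty_background_ratio: 5
```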


NODE_SEC

Node security parameters, covering SELinux and firewall configuration.

node_selinux_mode: permissive             # selinux mode: disabled, permissive, enforcing
node_firewall_mode: zone                  # firewall mode: off, none, zone
node_firewall_intranet:           # which intranet cidr considered as internal network
  - 10.0.0.0/8
  - 192.168.0.0/16
  - 172.16.0.0/12
node_firewall_public_port:        # expose these ports to public network in (zone, strict) mode
  - 22                            # enable ssh access
  - 80                            # enable http access
  - 443                           # enable https access
  - 5432                          # enable postgresql access (think twice before exposing it!)

node_selinux_mode

Parameter: node_selinux_mode, Type: enum, Level: C

SELinux mode. Default: permissive.

Options:

  • disabled: disable SELinux entirely (equivalent to the legacy node_disable_selinux: true)
  • permissive: log violations without blocking (recommended, default)
  • enforcing: strictly enforce SELinux policies

If you do not have dedicated OS/security expertise, permissive or disabled mode is recommended.

Note that SELinux is enabled by default only on EL-family systems; to use SELinux on Debian/Ubuntu, install and enable it yourself. Also, changing the SELinux mode may require a system reboot to take full effect.

node_firewall_mode

Parameter: node_firewall_mode, Type: enum, Level: C

Firewall mode. Default: zone.

Options:

  • off: stop and disable the firewall (equivalent to the legacy node_disable_firewall: true)
  • none: do nothing; leave existing firewall rules unchanged.
  • zone: configure rules via firewalld / ufw: trust the intranet, and open only the specified ports to the public network.

The firewalld service is used on EL systems, the ufw service on Debian/Ubuntu.

If you deploy in a fully trusted intranet, or control access through cloud provider security groups, choose none to preserve existing firewall configuration, or off to disable the firewall entirely.

For production, zone mode is recommended, combined with node_firewall_intranet and node_firewall_public_port for fine-grained access control.

Note that zone mode does not automatically enable the firewall service for you.
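Putting the pieces together, a hardened zone-mode setup might look like this (the values are illustrative):

```yaml
node_firewall_mode: zone
node_firewall_intranet: [ 10.0.0.0/8 ]   # only this range is fully trusted
node_firewall_public_port: [ 22, 443 ]   # no public 80 / 5432 exposure
```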

node_firewall_intranet

Parameter: node_firewall_intranet, Type: cidr[], Level: C

List of intranet CIDR blocks, introduced in v4.0. Default value:

node_firewall_intranet:
  - 10.0.0.0/8
  - 172.16.0.0/12
  - 192.168.0.0/16

This parameter defines the IP ranges considered "internal network". Traffic from these networks may access all service ports without per-port open rules.

Hosts within these CIDR ranges are treated as trusted intranet hosts with looser firewall rules. The same ranges are also treated as the "intranet" in PG/PGB HBA rules.

node_firewall_public_port

Parameter: node_firewall_public_port, Type: port[], Level: C

Ports open to the public network. Default: [22, 80, 443, 5432].

This parameter defines the list of ports open to the public network (anything outside the intranet CIDRs). The defaults are:

  • 22: SSH
  • 80: HTTP
  • 443: HTTPS
  • 5432: PostgreSQL

Adjust this list to your actual needs; for example, remove 5432 if you do not need to expose the database:

node_firewall_public_port: [22, 80, 443]

Pigsty's default PostgreSQL security policy only allows the admin user to access the database port from the public network. If you want other users to reach the database over the public network, make sure the corresponding access is configured in the PG/PGB HBA rules.

You can likewise add other service ports to this list to expose them publicly, or tighten the rules by removing the 5432 database port, so that only the services you truly need are open.

Note that this parameter takes effect only when node_firewall_mode is set to zone.


NODE_ADMIN

This section covers the admin user on host nodes: who can log in, and how.

node_data: /data                  # node main data directory, `/data` by default
node_admin_enabled: true          # create an admin user on target node?
node_admin_uid: 88                # uid and gid for node admin user
node_admin_username: dba          # name of node admin user, `dba` by default
node_admin_sudo: nopass           # admin user's sudo privilege: limited, nopass, all, none
node_admin_ssh_exchange: true     # exchange admin ssh key among node cluster
node_admin_pk_current: true       # add current user's ssh pk to admin authorized_keys
node_admin_pk_list: []            # ssh public keys to be added to admin user
node_aliases: {}                  # alias name -> IP address dict for `/etc/hosts`

node_data

Parameter: node_data, Type: path, Level: C

The node's main data directory, /data by default.

The directory is created if it does not exist. It should be owned by root with 777 permissions.

node_admin_enabled

Parameter: node_admin_enabled, Type: bool, Level: C

Create a dedicated admin user on this node? Default: true.

By default Pigsty creates an admin user (with passwordless sudo and ssh) on each node: a user named dba (uid=88) that can ssh from the admin node to the other nodes in the environment and run passwordless sudo there.

node_admin_uid

Parameter: node_admin_uid, Type: int, Level: C

UID of the admin user. Default: 88.

Keep the UID identical across all nodes wherever possible to avoid needless permission issues.

If the default UID 88 is already taken, pick a different UID; when assigning manually, watch out for UID namespace conflicts.

node_admin_username

Parameter: node_admin_username, Type: username, Level: C

Name of the admin user, dba by default.

node_admin_sudo

Parameter: node_admin_sudo, Type: enum, Level: C

The admin user's sudo privilege level. Default: nopass (passwordless sudo).

Options:

  • none: no sudo privilege
  • limited: limited sudo privilege (only specific commands allowed)
  • nopass: passwordless sudo (default; any command, no password required)
  • all: full sudo privilege (password required)

Pigsty defaults to nopass: the admin user can run any sudo command without a password, which is very convenient for automated operations.

In security-sensitive production environments you may want to set this to limited or all to restrict the admin's privileges.
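A hedged sketch of such a hardened setup (whether limited covers the commands your playbooks need depends on the sudoers rules Pigsty ships):

```yaml
node_admin_enabled: true
node_admin_username: dba
node_admin_sudo: limited    # only whitelisted commands, instead of full nopass sudo
```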

node_admin_ssh_exchange

Parameter: node_admin_ssh_exchange, Type: bool, Level: C

Exchange the admin user's SSH keys within the node cluster? Default: true.

When enabled, Pigsty exchanges SSH public keys among members during playbook execution, allowing the admin user node_admin_username to reach cluster members from one another.

node_admin_pk_current

Parameter: node_admin_pk_current, Type: bool, Level: C

Add the current node & user's public key to the admin account? Default: true.

When enabled, the SSH public key (~/.ssh/id_rsa.pub) of the admin user running this playbook on the current node is copied into the target node admin user's authorized_keys.

Pay close attention to this parameter in production deployments: it installs the default public key of the user running the command onto the admin user of all machines.

node_admin_pk_list

Parameter: node_admin_pk_list, Type: string[], Level: C

Public keys that may log in as the admin user. Default: [] (empty array).

Each array element is a string: a public key written into the admin user's ~/.ssh/authorized_keys. Holders of the corresponding private keys can log in as the admin.

Pay close attention to this parameter in production deployments; add only trusted keys to this list.

node_aliases

Parameter: node_aliases, Type: dict, Level: C

Shell aliases written to the host's /etc/profile.d/node.alias.sh. Default: {} (empty dict).

This parameter lets you configure convenient shell aliases for the host. The K:V dict defined here is written into the target node's profile.d file as alias k=v.

For example, the following declares an alias named dp for quickly running docker compose pull:

node_aliases:
  dp: 'docker compose pull'

NODE_TIME

Host time, timezone, NTP, and crontab configuration.

Time synchronization is very important for database services; make sure the system chronyd time service is running properly.

node_timezone: ''                 # set node timezone, empty string to skip
node_ntp_enabled: true            # enable chronyd time sync service?
node_ntp_servers:                 # ntp servers in `/etc/chrony.conf`
  - pool pool.ntp.org iburst
node_crontab_overwrite: true      # overwrite or append to `/etc/crontab`?
node_crontab: [ ]                 # crontab entries in `/etc/crontab`

node_timezone

Parameter: node_timezone, Type: string, Level: C

Set the node timezone; an empty string skips this step. The default is an empty string, i.e. the timezone is left unchanged (usually the default, UTC).

When used in the China region, Asia/Hong_Kong or Asia/Shanghai is recommended.

node_ntp_enabled

Parameter: node_ntp_enabled, Type: bool, Level: C

Enable the chronyd time sync service? Default: true.

When enabled, Pigsty overwrites the node's /etc/chrony.conf with the NTP server list from node_ntp_servers.

If your nodes already have NTP servers configured, set this to false to skip time sync configuration.

node_ntp_servers

Parameter: node_ntp_servers, Type: string[], Level: C

NTP server list used in /etc/chrony.conf. Default: ["pool pool.ntp.org iburst"].

This is an array; each element is a string representing one line of NTP server configuration. It takes effect only when node_ntp_enabled is on.

Pigsty uses the global NTP servers at pool.ntp.org by default. Adjust this to your network environment, e.g. cn.pool.ntp.org iburst, or an internal clock service.

You can also use the ${admin_ip} placeholder to use the time server on the admin node:

node_ntp_servers: [ 'pool ${admin_ip} iburst' ]

node_crontab_overwrite

Parameter: node_crontab_overwrite, Type: bool, Level: C

When writing the jobs in node_crontab, append or overwrite? Default: true (overwrite).

If you want to append jobs to a node's crontab instead, set this to false: Pigsty will then append to the node's crontab rather than overwriting all entries.
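For example (the job itself is hypothetical), appending an extra job while leaving existing entries untouched:

```yaml
node_crontab_overwrite: false      # append instead of overwrite
node_crontab:
  - '*/5 * * * * root /usr/local/bin/node-healthcheck'  # hypothetical extra job
```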

node_crontab

Parameter: node_crontab, Type: string[], Level: C

Crontab jobs defined in the node's /etc/crontab. Default: [] (empty array).

Each array element is a string representing one job line, in standard cron format.

For example, the following runs a full backup as the postgres user at 1 AM every day:

node_crontab:
  - '00 01 * * * postgres /pg/bin/pg-backup full'   # make a full backup at 1am daily

NODE_VIP

You can bind an optional L2 VIP to a node cluster; this feature is disabled by default. An L2 VIP is only meaningful for a group of nodes: it floats among cluster members according to the configured priorities, providing high availability for node services.

Note that an L2 VIP can only be used within the same L2 network segment, which may impose extra constraints on your network topology. To avoid that constraint, consider DNS load balancing or HAProxy for similar functionality.

When enabling this feature, you must explicitly assign an available vip_address and vip_vrid to the L2 VIP, and you are responsible for ensuring both are unique within the network segment.

Note that a NODE VIP differs from a PG VIP: a PG VIP serves PostgreSQL instances, is managed by the vip-manager component, and is bound to the PG cluster primary, while a NODE VIP is managed by the Keepalived component and bound to a node cluster. It can run in master-backup or load-balancing mode, and the two kinds of VIP can coexist.

vip_enabled: false                # enable vip on this node cluster?
# vip_address:         [IDENTITY] # node vip address in ipv4 format, required if vip is enabled
# vip_vrid:            [IDENTITY] # required, integer, 1-254, should be unique among same VLAN
vip_role: backup                  # optional, `master/backup`, backup by default, use as init role
vip_preempt: false                # optional, `true/false`, false by default, enable vip preemption
vip_interface: eth0               # node vip network interface to listen, `eth0` by default
vip_dns_suffix: ''                # node vip dns name suffix, empty string by default
vip_exporter_port: 9650           # keepalived exporter listen port, 9650 by default

vip_enabled

Parameter: vip_enabled, Type: bool, Level: C

Configure a Keepalived-managed L2 VIP on this node cluster? Default: false.

vip_address

Parameter: vip_address, Type: ip, Level: C

The node VIP address in IPv4 format (without a CIDR suffix). Required when vip_enabled is on.

This parameter has no default value: you must explicitly assign a unique VIP address to the node cluster.

vip_vrid

Parameter: vip_vrid, Type: int, Level: C

The VRID is a positive integer from 1 to 254 identifying a VIP within a network. Required when vip_enabled is on.

This parameter has no default value: you must explicitly assign an ID unique within the network segment to the node cluster.

vip_role

Parameter: vip_role, Type: enum, Level: I

Node VIP role: master or backup. Default: backup.

The value is used as keepalived's initial state.

vip_preempt

Parameter: vip_preempt, Type: bool, Level: C/I

Enable VIP preemption? Optional; default: false (no preemption).

Preemption means that a backup node with a higher priority than the currently live, healthy master takes the VIP over from it.

vip_interface

Parameter: vip_interface, Type: string, Level: C/I

The network interface the node VIP listens on, eth0 by default.

It should be the same interface that carries the node's primary IP address, i.e. the address you put in the inventory.

If your nodes have differently named interfaces, override this at the instance/node level.
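A hedged sketch of such a per-host override (addresses, VRID, and interface names are placeholders):

```yaml
node-test:
  hosts:
    10.10.10.11: {}
    10.10.10.12: { vip_interface: eth1 }   # this host uses a different NIC
  vars:
    vip_enabled: true
    vip_address: 10.10.10.3
    vip_vrid: 128
    vip_interface: eth0                    # default NIC for the other members
```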

vip_dns_suffix

Parameter: vip_dns_suffix, Type: string, Level: C/I

DNS name suffix for the node cluster's L2 VIP. Default: an empty string, i.e. the cluster name itself is used as the DNS name.

vip_exporter_port

Parameter: vip_exporter_port, Type: port, Level: C/I

Listen port for the keepalived exporter. Default: 9650.


HAPROXY

HAProxy is installed and enabled on all nodes by default, exposing services in a manner similar to Kubernetes NodePorts.

The PGSQL module uses HAProxy for its external services.

haproxy_enabled: true             # enable haproxy on this node?
haproxy_clean: false              # cleanup all existing haproxy config?
haproxy_reload: true              # reload haproxy after config?
haproxy_auth_enabled: true        # enable authentication for haproxy admin page
haproxy_admin_username: admin     # haproxy admin username, `admin` by default
haproxy_admin_password: pigsty    # haproxy admin password, `pigsty` by default
haproxy_exporter_port: 9101       # haproxy admin/exporter port, 9101 by default
haproxy_client_timeout: 24h       # client connection timeout, 24h by default
haproxy_server_timeout: 24h       # server connection timeout, 24h by default
haproxy_services: []              # list of haproxy services to expose on the node

haproxy_enabled

Parameter: haproxy_enabled, Type: bool, Level: C

Enable haproxy on this node? Default: true.

haproxy_clean

Parameter: haproxy_clean, Type: bool, Level: G/C/A

Clean up all existing haproxy configuration? Default: false.

haproxy_reload

Parameter: haproxy_reload, Type: bool, Level: A

Reload haproxy after configuration? Default: true, i.e. haproxy is reloaded after config changes.

If you prefer to inspect the configuration manually before applying it, disable this option via a command-line parameter, review, and then apply.

haproxy_auth_enabled

Parameter: haproxy_auth_enabled, Type: bool, Level: G

Enable authentication for the haproxy admin page? Default: true, requiring HTTP basic authentication on the admin page.

Disabling authentication is not recommended, since the traffic control page would be exposed, which is risky.

haproxy_admin_username

Parameter: haproxy_admin_username, Type: username, Level: G

haproxy admin username. Default: admin.

haproxy_admin_password

Parameter: haproxy_admin_password, Type: password, Level: G

haproxy admin password. Default: pigsty.

Be sure to change this password in production environments!

haproxy_exporter_port

Parameter: haproxy_exporter_port, Type: port, Level: C

Port exposing haproxy traffic management / metrics. Default: 9101.

haproxy_client_timeout

Parameter: haproxy_client_timeout, Type: interval, Level: C

Client connection timeout. Default: 24h.

Having a timeout avoids long-lived connections that are hard to clean up; if you genuinely need a long connection, set it to a longer duration.

haproxy_server_timeout

Parameter: haproxy_server_timeout, Type: interval, Level: C

Server-side connection timeout. Default: 24h.

Having a timeout avoids long-lived connections that are hard to clean up; if you genuinely need a long connection, set it to a longer duration.

haproxy_services

Parameter: haproxy_services, Type: service[], Level: C

List of services to expose through HAProxy on this node. Default: [] (empty array).

Each array element is a service definition; here is an example:

haproxy_services:                   # list of haproxy service

  # expose pg-test read only replicas
  - name: pg-test-ro                # [REQUIRED] service name, unique
    port: 5440                      # [REQUIRED] service port, unique
    ip: "*"                         # [OPTIONAL] service listen addr, "*" by default
    protocol: tcp                   # [OPTIONAL] service protocol, 'tcp' by default
    balance: leastconn              # [OPTIONAL] load balance algorithm, roundrobin by default (or leastconn)
    maxconn: 20000                  # [OPTIONAL] max allowed front-end connection, 20000 by default
    default: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
    options:
      - option httpchk
      - option http-keep-alive
      - http-check send meth OPTIONS uri /read-only
      - http-check expect status 200
    servers:
      - { name: pg-test-1 ,ip: 10.10.10.11 , port: 5432 , options: check port 8008 , backup: true }
      - { name: pg-test-2 ,ip: 10.10.10.12 , port: 5432 , options: check port 8008 }
      - { name: pg-test-3 ,ip: 10.10.10.13 , port: 5432 , options: check port 8008 }

Each service definition is rendered as a /etc/haproxy/<service.name>.cfg config file and takes effect after HAProxy reloads.


NODE_EXPORTER

node_exporter_enabled: true       # setup node_exporter on this node?
node_exporter_port: 9100          # node exporter listen port, 9100 by default
node_exporter_options: '--no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes'

node_exporter_enabled

Parameter: node_exporter_enabled, Type: bool, Level: C

Enable the node metrics exporter on this node? Default: true.

node_exporter_port

Parameter: node_exporter_port, Type: port, Level: C

Port used to expose node metrics. Default: 9100.

node_exporter_options

Parameter: node_exporter_options, Type: arg, Level: C

Command-line options for the node metrics exporter. Default value:

--no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes

These options enable/disable certain metric collectors; adjust them to your needs.


VECTOR

Vector is the log collection component used by Pigsty v4.0. It collects the logs produced by each module and ships them to the VictoriaLogs service on the infra nodes.

  • INFRA: infrastructure component logs, collected only on infra nodes.

    • nginx-access: /var/log/nginx/access.log
    • nginx-error: /var/log/nginx/error.log
    • grafana: /var/log/grafana/grafana.log
  • NODES: host-related logs, collected on all nodes.

    • syslog: /var/log/messages (/var/log/syslog on Debian)
    • dmesg: /var/log/dmesg
    • cron: /var/log/cron
  • PGSQL: PostgreSQL-related logs, collected only on nodes with the PGSQL module configured.

    • postgres: /pg/log/postgres/*
    • patroni: /pg/log/patroni.log
    • pgbouncer: /pg/log/pgbouncer/pgbouncer.log
    • pgbackrest: /pg/log/pgbackrest/*.log
  • REDIS: Redis-related logs, collected only on nodes with the REDIS module configured.

    • redis: /var/log/redis/*.log

Log directories adjust automatically according to these parameters: pg_log_dir, patroni_log_dir, pgbouncer_log_dir, pgbackrest_log_dir.

vector_enabled: true              # enable the vector log collector?
vector_clean: false               # purge vector data dir during init?
vector_data: /data/vector         # vector data dir, `/data/vector` by default
vector_port: 9598                 # vector metrics port, 9598 by default
vector_read_from: beginning       # read logs from the beginning or the end
vector_log_endpoint: [ infra ]    # log sink endpoints, the `infra` group by default

vector_enabled

Parameter: vector_enabled, Type: bool, Level: C

Enable the Vector log collection service? Default: true.

Vector is the log collection agent used by Pigsty v4.0, replacing Promtail from earlier versions; it collects node and service logs and ships them to VictoriaLogs.

vector_clean

Parameter: vector_clean, Type: bool, Level: G/A

Purge the existing data directory when installing Vector? Default: false.

Nothing is purged by default. When you opt in, Pigsty removes the existing vector_data directory while deploying Vector, meaning Vector will re-collect all logs on the current node and send them to VictoriaLogs.

vector_data

Parameter: vector_data, Type: path, Level: C

Vector data directory path. Default: /data/vector.

Vector stores its log read offsets and buffered data in this directory.

vector_port

Parameter: vector_port, Type: port, Level: C

Vector metrics listen port. Default: 9598.

This port exposes Vector's own monitoring metrics, which can be scraped by VictoriaMetrics.

vector_read_from

Parameter: vector_read_from, Type: enum, Level: C

Where Vector starts reading logs. Default: beginning.

Options are beginning (read existing log files in full) and end (read only newly produced logs).
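For example, to skip historical logs when first enrolling a long-running node (a judgment call, not a recommendation):

```yaml
vector_read_from: end   # only ship logs produced after vector starts
```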

vector_log_endpoint

参数名称: vector_log_endpoint, 类型: string[], 层次:C

日志发送目标端点列表,默认值为:[ infra ]

指定将日志发送至哪个节点组的 VictoriaLogs 服务。默认发送至 infra 组的节点。
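上述 Vector 参数通常在全局或集群层次统一覆盖。下面是一个示意配置片段(参数取值仅作演示,并非推荐默认值):

```yaml
# 在 pigsty.yml 的 vars 中覆盖 Vector 日志收集参数(示意)
all:
  vars:
    vector_enabled: true            # 启用 Vector 日志收集
    vector_data: /data/vector       # 读取偏移量与缓冲数据的存放目录
    vector_read_from: end           # 只收集新产生的日志,避免重复回灌历史日志
    vector_log_endpoint: [ infra ]  # 将日志发送至 infra 分组节点上的 VictoriaLogs
```

修改后可通过 `./node.yml -t monitor` 重新应用日志收集配置。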

4 - 预置剧本

如何使用预置的 ansible 剧本来管理 Node 集群,常用管理命令速查。

Pigsty 提供两个与 NODE 模块相关的剧本:

  • node.yml:纳管节点,调整节点到期望状态
  • node-rm.yml:从 Pigsty 中移除纳管节点

另提供两个包装命令工具:bin/node-add 与 bin/node-rm,用于快速调用剧本。


node.yml

向 Pigsty 添加节点的 node.yml 包含以下子任务:

node-id       :生成节点身份标识
node_name     :设置主机名
node_hosts    :配置 /etc/hosts 记录
node_resolv   :配置 DNS 解析器 /etc/resolv.conf
node_firewall :设置防火墙 & selinux
node_ca       :添加并信任CA证书
node_repo     :添加上游软件仓库
node_pkg      :安装 rpm/deb 软件包
node_feature  :配置 numa、grub、静态网络等特性
node_kernel   :配置操作系统内核模块
node_tune     :配置 tuned 调优模板
node_sysctl   :设置额外的 sysctl 参数
node_profile  :写入 /etc/profile.d/node.sh
node_ulimit   :配置资源限制
node_data     :配置数据目录
node_admin    :配置管理员用户和ssh密钥
node_timezone :配置时区
node_ntp      :配置 NTP 服务器/客户端
node_crontab  :添加/覆盖 crontab 定时任务
node_vip      :为节点集群设置可选的 L2 VIP
haproxy       :在节点上设置 haproxy 以暴露服务
monitor       :配置节点监控:node_exporter & vector

node-rm.yml

从 Pigsty 中移除节点的剧本 node-rm.yml 包含以下子任务:

register       : 从 prometheus & nginx 中移除节点注册信息
  - prometheus : 移除已注册的 prometheus 监控目标
  - nginx      : 移除用于 haproxy 管理界面的 nginx 代理记录
vip            : 移除节点的 keepalived 与 L2 VIP(如果启用 VIP)
haproxy        : 移除 haproxy 负载均衡器
node_exporter  : 移除节点监控:Node Exporter
vip_exporter   : 移除 keepalived_exporter (如果启用 VIP)
vector         : 移除日志收集代理 vector
profile        : 移除 /etc/profile.d/node.sh 环境配置文件

常用命令速查

# 基础节点管理
./node.yml -l <cls|ip|group>          # 向 Pigsty 中添加节点
./node-rm.yml -l <cls|ip|group>       # 从 Pigsty 中移除节点

# 节点管理快捷命令
bin/node-add node-test                 # 初始化节点集群 'node-test'
bin/node-add 10.10.10.10               # 初始化节点 '10.10.10.10'
bin/node-rm node-test                  # 移除节点集群 'node-test'
bin/node-rm 10.10.10.10                # 移除节点 '10.10.10.10'

# 节点主体初始化
./node.yml -t node                     # 完成节点主体初始化(haproxy,监控除外)
./node.yml -t haproxy                  # 在节点上设置 haproxy
./node.yml -t monitor                  # 配置节点监控:node_exporter & vector

# VIP 管理
./node.yml -t node_vip                 # 为节点集群设置可选的 L2 VIP
./node.yml -t vip_config,vip_reload    # 刷新节点 L2 VIP 配置

# HAProxy 管理
./node.yml -t haproxy_config,haproxy_reload   # 刷新节点上的服务定义

# 注册管理
./node.yml -t register_prometheus      # 重新将节点注册到 Prometheus 中
./node.yml -t register_nginx           # 重新将节点 haproxy 管控界面注册到 Nginx 中

# 具体任务
./node.yml -t node-id                  # 生成节点身份标识
./node.yml -t node_name                # 设置主机名
./node.yml -t node_hosts               # 配置节点 /etc/hosts 记录
./node.yml -t node_resolv              # 配置节点 DNS 解析器 /etc/resolv.conf
./node.yml -t node_firewall            # 配置防火墙 & selinux
./node.yml -t node_ca                  # 配置节点的CA证书
./node.yml -t node_repo                # 配置节点上游软件仓库
./node.yml -t node_pkg                 # 在节点上安装 yum 软件包
./node.yml -t node_feature             # 配置 numa、grub、静态网络等特性
./node.yml -t node_kernel              # 配置操作系统内核模块
./node.yml -t node_tune                # 配置 tuned 调优模板
./node.yml -t node_sysctl              # 设置额外的 sysctl 参数
./node.yml -t node_profile             # 配置节点环境变量:/etc/profile.d/node.sh
./node.yml -t node_ulimit              # 配置节点资源限制
./node.yml -t node_data                # 配置节点首要数据目录
./node.yml -t node_admin               # 配置管理员用户和ssh密钥
./node.yml -t node_timezone            # 配置节点时区
./node.yml -t node_ntp                 # 配置节点 NTP 服务器/客户端
./node.yml -t node_crontab             # 添加/覆盖 crontab 定时任务

5 - 管理预案

Node 集群管理 SOP:创建,销毁,扩容,缩容,节点故障与磁盘故障的处理。

下面是 Node 模块中常用的管理操作:

更多问题请参考 FAQ:NODE


添加节点

要将节点添加到 Pigsty,您需要对该节点具有无密码的 ssh/sudo 访问权限。

您也可以选择一次性添加一个集群,或使用通配符匹配配置清单中要加入 Pigsty 的节点。

# ./node.yml -l <cls|ip|group>        # 向 Pigsty 中添加节点的实际剧本
# bin/node-add <selector|ip...>       # 向 Pigsty 中添加节点
bin/node-add node-test                # 初始化节点集群 'node-test'
bin/node-add 10.10.10.10              # 初始化节点 '10.10.10.10'
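执行 bin/node-add 之前,目标节点需要先在配置清单中声明。下面是一个声明节点集群的示意片段(IP 地址与集群名均为假设值),其结构与后文 VIP 绑定一节中的 proxy 集群示例一致:

```yaml
# 在配置清单中声明待纳管的节点集群(示意,IP 为假设值)
node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1 }
    10.10.10.12: { nodename: node-test-2 }
  vars:
    node_cluster: node-test   # 节点集群名称,作为集群身份标识
```

声明完成后,即可使用 bin/node-add node-test 将整个集群一次性纳管。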

移除节点

要从 Pigsty 中移除一个节点,您可以使用以下命令:

# ./node-rm.yml -l <cls|ip|group>    # 从 pigsty 中移除节点的实际剧本
# bin/node-rm <cls|ip|selector> ...  # 从 pigsty 中移除节点
bin/node-rm node-test                # 移除节点集群 'node-test'
bin/node-rm 10.10.10.10              # 移除节点 '10.10.10.10'

您也可以选择一次性移除一个集群,或使用通配符匹配配置清单中要从 Pigsty 移除的节点。


创建管理员

如果当前用户没有对节点的无密码 ssh/sudo 访问权限,您可以使用另一个管理员用户来初始化该节点:

node.yml -t node_admin -k -K -e ansible_user=<另一个管理员>   # 为另一个管理员输入 ssh/sudo 密码以完成此任务

绑定VIP

您可以在节点集群上绑定一个可选的 L2 VIP,使用 vip_enabled 参数。

proxy:
  hosts:
    10.10.10.29: { nodename: proxy-1 }   # 可显式指定初始 VIP 角色:MASTER / BACKUP
    10.10.10.30: { nodename: proxy-2 }   # 例如:{ nodename: proxy-2, vip_role: master }
  vars:
    node_cluster: proxy
    vip_enabled: true
    vip_vrid: 128
    vip_address: 10.10.10.99
    vip_interface: eth1

./node.yml -l proxy -t node_vip     # 首次启用 VIP
./node.yml -l proxy -t vip_refresh  # 刷新 VIP 配置(例如指定 master)

添加节点监控

如果您想要在现有节点上添加或重新配置监控,可以使用以下命令:

./node.yml -t node_exporter,node_register  # 配置监控并注册
./node.yml -t vector                       # 配置日志收集

其他常见任务

# Play
./node.yml -t node                            # 完成节点主体初始化(haproxy,监控除外)
./node.yml -t haproxy                         # 在节点上设置 haproxy
./node.yml -t monitor                         # 配置节点监控:node_exporter & vector
./node.yml -t node_vip                        # 为没启用过 VIP 的集群安装、配置、启用 L2 VIP
./node.yml -t vip_config,vip_reload           # 刷新节点 L2 VIP 配置
./node.yml -t haproxy_config,haproxy_reload   # 刷新节点上的服务定义
./node.yml -t register_prometheus             # 重新将节点注册到 Prometheus 中
./node.yml -t register_nginx                  # 重新将节点 haproxy 管控界面注册到 Nginx 中

# Task
./node.yml -t node-id        # 生成节点身份标识
./node.yml -t node_name      # 设置主机名
./node.yml -t node_hosts     # 配置节点 /etc/hosts 记录
./node.yml -t node_resolv    # 配置节点 DNS 解析器 /etc/resolv.conf
./node.yml -t node_firewall  # 配置防火墙 & selinux
./node.yml -t node_ca        # 配置节点的CA证书
./node.yml -t node_repo      # 配置节点上游软件仓库
./node.yml -t node_pkg       # 在节点上安装 yum 软件包
./node.yml -t node_feature   # 配置 numa、grub、静态网络等特性
./node.yml -t node_kernel    # 配置操作系统内核模块
./node.yml -t node_tune      # 配置 tuned 调优模板
./node.yml -t node_sysctl    # 设置额外的 sysctl 参数
./node.yml -t node_profile   # 配置节点环境变量:/etc/profile.d/node.sh
./node.yml -t node_ulimit    # 配置节点资源限制
./node.yml -t node_data      # 配置节点首要数据目录
./node.yml -t node_admin     # 配置管理员用户和ssh密钥
./node.yml -t node_timezone  # 配置节点时区
./node.yml -t node_ntp       # 配置节点 NTP 服务器/客户端
./node.yml -t node_crontab   # 添加/覆盖 crontab 定时任务
./node.yml -t node_vip       # 为节点集群设置可选的 L2 VIP

6 - 监控告警

如何在 Pigsty 中监控 Node?如何使用 Node 本身的管控面板?有哪些告警规则值得关注?

Pigsty 中的 NODE 模块提供了 6 个监控面板和完善的告警规则。


监控面板

NODE 模块提供 6 个监控仪表板:

NODE Overview

展示当前环境所有主机节点的总体情况概览。

node-overview.jpg

NODE Cluster

显示特定主机集群的详细监控数据。

node-cluster.jpg

NODE Instance

呈现单个主机节点的详细监控信息。

node-instance.jpg

NODE Alert

集中展示环境中所有主机的告警信息。

node-alert.jpg

NODE VIP

监控 L2 虚拟 IP 的详细状态。

node-vip.jpg

NODE Haproxy

追踪 HAProxy 负载均衡器的运行情况。

node-haproxy.jpg


告警规则

Pigsty 针对 NODE 实现了以下告警规则:

可用性告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeDown | CRIT | 节点离线 |
| HaproxyDown | CRIT | HAProxy 服务离线 |
| PromtailDown | WARN | 日志收集代理离线(Vector) |
| DockerDown | WARN | 容器引擎离线 |
| KeepalivedDown | WARN | Keepalived 守护进程离线 |

CPU 告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeCpuHigh | WARN | CPU 使用率超过 70% |

调度告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeLoadHigh | WARN | 标准化负载超过 100% |

内存告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeOutOfMem | WARN | 可用内存少于 10% |
| NodeMemSwapped | WARN | Swap 使用率超过 1% |

文件系统告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeFsSpaceFull | WARN | 磁盘使用率超过 90% |
| NodeFsFilesFull | WARN | Inode 使用率超过 90% |
| NodeFdFull | WARN | 文件描述符使用率超过 90% |

磁盘告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeDiskSlow | WARN | 读写延迟超过 32ms |

网络协议告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeTcpErrHigh | WARN | TCP 错误率超过 1/分钟 |
| NodeTcpRetransHigh | WARN | TCP 重传率超过 1% |

时间同步告警

| 规则 | 级别 | 说明 |
|------|------|------|
| NodeTimeDrift | WARN | 系统时间未同步 |
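这些告警规则在基础设施节点上由告警评估组件统一计算。以 NodeCpuHigh 为例,此类规则的 YAML 定义形态大致如下(示意写法:表达式、持续时间与标签均为假设,Pigsty 实际内置的规则定义可能有所不同):

```yaml
# Prometheus / vmalert 风格的告警规则示意
groups:
  - name: node-alert-example
    rules:
      - alert: NodeCpuHigh
        # 近 5 分钟非 idle CPU 时间占比超过 70%,且持续 3 分钟则触发
        expr: 1 - avg by (ins) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.70
        for: 3m
        labels: { level: WARN, category: node }
        annotations:
          summary: 'CPU usage of {{ $labels.ins }} exceeds 70%'
```

触发的告警会经由 AlertManager(端口 9093)聚合分发,并集中展示在 NODE Alert 监控面板中。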

7 - 指标列表

Pigsty NODE 模块提供的完整监控指标列表与释义

NODE 模块包含有 747 类可用监控指标。

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| ALERTS | Unknown | alertname, ip, level, severity, ins, job, alertstate, category, instance, cls | N/A |
| ALERTS_FOR_STATE | Unknown | alertname, ip, level, severity, ins, job, category, instance, cls | N/A |
| deprecated_flags_inuse_total | Unknown | instance, ins, job, ip, cls | N/A |
| go_gc_duration_seconds | summary | quantile, instance, ins, job, ip, cls | A summary of the pause duration of garbage collection cycles. |
| go_gc_duration_seconds_count | Unknown | instance, ins, job, ip, cls | N/A |
| go_gc_duration_seconds_sum | Unknown | instance, ins, job, ip, cls | N/A |
| go_goroutines | gauge | instance, ins, job, ip, cls | Number of goroutines that currently exist. |
| go_info | gauge | version, instance, ins, job, ip, cls | Information about the Go environment. |
| go_memstats_alloc_bytes | gauge | instance, ins, job, ip, cls | Number of bytes allocated and still in use. |
| go_memstats_alloc_bytes_total | counter | instance, ins, job, ip, cls | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | instance, ins, job, ip, cls | Total number of frees. |
| go_memstats_gc_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | instance, ins, job, ip, cls | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | instance, ins, job, ip, cls | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | instance, ins, job, ip, cls | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | instance, ins, job, ip, cls | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | instance, ins, job, ip, cls | Number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | gauge | instance, ins, job, ip, cls | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | gauge | instance, ins, job, ip, cls | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | counter | instance, ins, job, ip, cls | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | instance, ins, job, ip, cls | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | instance, ins, job, ip, cls | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | gauge | instance, ins, job, ip, cls | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | gauge | instance, ins, job, ip, cls | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | instance, ins, job, ip, cls | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | gauge | instance, ins, job, ip, cls | Number of bytes obtained from system. |
| go_threads | gauge | instance, ins, job, ip, cls | Number of OS threads created. |
| haproxy:cls:usage | Unknown | job, cls | N/A |
| haproxy:ins:uptime | Unknown | instance, ins, job, ip, cls | N/A |
| haproxy:ins:usage | Unknown | instance, ins, job, ip, cls | N/A |
| haproxy_backend_active_servers | gauge | proxy, instance, ins, job, ip, cls | Total number of active UP servers with a non-zero weight |
| haproxy_backend_agg_check_status | gauge | state, proxy, instance, ins, job, ip, cls | Backend's aggregated gauge of servers' state check status |
| haproxy_backend_agg_server_check_status | gauge | state, proxy, instance, ins, job, ip, cls | [DEPRECATED] Backend's aggregated gauge of servers' status |
| haproxy_backend_agg_server_status | gauge | state, proxy, instance, ins, job, ip, cls | Backend's aggregated gauge of servers' status |
| haproxy_backend_backup_servers | gauge | proxy, instance, ins, job, ip, cls | Total number of backup UP servers with a non-zero weight |
| haproxy_backend_bytes_in_total | counter | proxy, instance, ins, job, ip, cls | Total number of request bytes since process started |
| haproxy_backend_bytes_out_total | counter | proxy, instance, ins, job, ip, cls | Total number of response bytes since process started |
| haproxy_backend_check_last_change_seconds | gauge | proxy, instance, ins, job, ip, cls | How long ago the last server state changed, in seconds |
| haproxy_backend_check_up_down_total | counter | proxy, instance, ins, job, ip, cls | Total number of failed checks causing UP to DOWN server transitions, per server/backend, since the worker process started |
| haproxy_backend_client_aborts_total | counter | proxy, instance, ins, job, ip, cls | Total number of requests or connections aborted by the client since the worker process started |
| haproxy_backend_connect_time_average_seconds | gauge | proxy, instance, ins, job, ip, cls | Avg. connect time for last 1024 successful connections. |
| haproxy_backend_connection_attempts_total | counter | proxy, instance, ins, job, ip, cls | Total number of outgoing connection attempts on this backend/server since the worker process started |
| haproxy_backend_connection_errors_total | counter | proxy, instance, ins, job, ip, cls | Total number of failed connections to server since the worker process started |
| haproxy_backend_connection_reuses_total | counter | proxy, instance, ins, job, ip, cls | Total number of reused connection on this backend/server since the worker process started |
| haproxy_backend_current_queue | gauge | proxy, instance, ins, job, ip, cls | Number of current queued connections |
| haproxy_backend_current_sessions | gauge | proxy, instance, ins, job, ip, cls | Number of current sessions on the frontend, backend or server |
| haproxy_backend_downtime_seconds_total | counter | proxy, instance, ins, job, ip, cls | Total time spent in DOWN state, for server or backend |
| haproxy_backend_failed_header_rewriting_total | counter | proxy, instance, ins, job, ip, cls | Total number of failed HTTP header rewrites since the worker process started |
| haproxy_backend_http_cache_hits_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests not found in the cache on this frontend/backend since the worker process started |
| haproxy_backend_http_cache_lookups_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests looked up in the cache on this frontend/backend since the worker process started |
| haproxy_backend_http_comp_bytes_bypassed_total | counter | proxy, instance, ins, job, ip, cls | Total number of bytes that bypassed HTTP compression for this object since the worker process started (CPU/memory/bandwidth limitation) |
| haproxy_backend_http_comp_bytes_in_total | counter | proxy, instance, ins, job, ip, cls | Total number of bytes submitted to the HTTP compressor for this object since the worker process started |
| haproxy_backend_http_comp_bytes_out_total | counter | proxy, instance, ins, job, ip, cls | Total number of bytes emitted by the HTTP compressor for this object since the worker process started |
| haproxy_backend_http_comp_responses_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP responses that were compressed for this object since the worker process started |
| haproxy_backend_http_requests_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests processed by this object since the worker process started |
| haproxy_backend_http_responses_total | counter | ip, proxy, ins, code, job, instance, cls | Total number of HTTP responses with status 100-199 returned by this object since the worker process started |
| haproxy_backend_internal_errors_total | counter | proxy, instance, ins, job, ip, cls | Total number of internal errors since process started |
| haproxy_backend_last_session_seconds | gauge | proxy, instance, ins, job, ip, cls | How long ago some traffic was seen on this object on this worker process, in seconds |
| haproxy_backend_limit_sessions | gauge | proxy, instance, ins, job, ip, cls | Frontend/listener/server's maxconn, backend's fullconn |
| haproxy_backend_loadbalanced_total | counter | proxy, instance, ins, job, ip, cls | Total number of requests routed by load balancing since the worker process started (ignores queue pop and stickiness) |
| haproxy_backend_max_connect_time_seconds | gauge | proxy, instance, ins, job, ip, cls | Maximum observed time spent waiting for a connection to complete |
| haproxy_backend_max_queue | gauge | proxy, instance, ins, job, ip, cls | Highest value of queued connections encountered since process started |
| haproxy_backend_max_queue_time_seconds | gauge | proxy, instance, ins, job, ip, cls | Maximum observed time spent in the queue |
| haproxy_backend_max_response_time_seconds | gauge | proxy, instance, ins, job, ip, cls | Maximum observed time spent waiting for a server response |
| haproxy_backend_max_session_rate | gauge | proxy, instance, ins, job, ip, cls | Highest value of sessions per second observed since the worker process started |
| haproxy_backend_max_sessions | gauge | proxy, instance, ins, job, ip, cls | Highest value of current sessions encountered since process started |
| haproxy_backend_max_total_time_seconds | gauge | proxy, instance, ins, job, ip, cls | Maximum observed total request+response time (request+queue+connect+response+processing) |
| haproxy_backend_queue_time_average_seconds | gauge | proxy, instance, ins, job, ip, cls | Avg. queue time for last 1024 successful connections. |
| haproxy_backend_redispatch_warnings_total | counter | proxy, instance, ins, job, ip, cls | Total number of server redispatches due to connection failures since the worker process started |
| haproxy_backend_requests_denied_total | counter | proxy, instance, ins, job, ip, cls | Total number of denied requests since process started |
| haproxy_backend_response_errors_total | counter | proxy, instance, ins, job, ip, cls | Total number of invalid responses since the worker process started |
| haproxy_backend_response_time_average_seconds | gauge | proxy, instance, ins, job, ip, cls | Avg. response time for last 1024 successful connections. |
| haproxy_backend_responses_denied_total | counter | proxy, instance, ins, job, ip, cls | Total number of denied responses since process started |
| haproxy_backend_retry_warnings_total | counter | proxy, instance, ins, job, ip, cls | Total number of server connection retries since the worker process started |
| haproxy_backend_server_aborts_total | counter | proxy, instance, ins, job, ip, cls | Total number of requests or connections aborted by the server since the worker process started |
| haproxy_backend_sessions_total | counter | proxy, instance, ins, job, ip, cls | Total number of sessions since process started |
| haproxy_backend_status | gauge | state, proxy, instance, ins, job, ip, cls | Current status of the service, per state label value. |
| haproxy_backend_total_time_average_seconds | gauge | proxy, instance, ins, job, ip, cls | Avg. total time for last 1024 successful connections. |
| haproxy_backend_uweight | gauge | proxy, instance, ins, job, ip, cls | Server's user weight, or sum of active servers' user weights for a backend |
| haproxy_backend_weight | gauge | proxy, instance, ins, job, ip, cls | Server's effective weight, or sum of active servers' effective weights for a backend |
| haproxy_frontend_bytes_in_total | counter | proxy, instance, ins, job, ip, cls | Total number of request bytes since process started |
| haproxy_frontend_bytes_out_total | counter | proxy, instance, ins, job, ip, cls | Total number of response bytes since process started |
| haproxy_frontend_connections_rate_max | gauge | proxy, instance, ins, job, ip, cls | Highest value of connections per second observed since the worker process started |
| haproxy_frontend_connections_total | counter | proxy, instance, ins, job, ip, cls | Total number of new connections accepted on this frontend since the worker process started |
| haproxy_frontend_current_sessions | gauge | proxy, instance, ins, job, ip, cls | Number of current sessions on the frontend, backend or server |
| haproxy_frontend_denied_connections_total | counter | proxy, instance, ins, job, ip, cls | Total number of incoming connections blocked on a listener/frontend by a tcp-request connection rule since the worker process started |
| haproxy_frontend_denied_sessions_total | counter | proxy, instance, ins, job, ip, cls | Total number of incoming sessions blocked on a listener/frontend by a tcp-request connection rule since the worker process started |
| haproxy_frontend_failed_header_rewriting_total | counter | proxy, instance, ins, job, ip, cls | Total number of failed HTTP header rewrites since the worker process started |
| haproxy_frontend_http_cache_hits_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests not found in the cache on this frontend/backend since the worker process started |
| haproxy_frontend_http_cache_lookups_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests looked up in the cache on this frontend/backend since the worker process started |
| haproxy_frontend_http_comp_bytes_bypassed_total | counter | proxy, instance, ins, job, ip, cls | Total number of bytes that bypassed HTTP compression for this object since the worker process started (CPU/memory/bandwidth limitation) |
| haproxy_frontend_http_comp_bytes_in_total | counter | proxy, instance, ins, job, ip, cls | Total number of bytes submitted to the HTTP compressor for this object since the worker process started |
| haproxy_frontend_http_comp_bytes_out_total | counter | proxy, instance, ins, job, ip, cls | Total number of bytes emitted by the HTTP compressor for this object since the worker process started |
| haproxy_frontend_http_comp_responses_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP responses that were compressed for this object since the worker process started |
| haproxy_frontend_http_requests_rate_max | gauge | proxy, instance, ins, job, ip, cls | Highest value of http requests observed since the worker process started |
| haproxy_frontend_http_requests_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests processed by this object since the worker process started |
| haproxy_frontend_http_responses_total | counter | ip, proxy, ins, code, job, instance, cls | Total number of HTTP responses with status 100-199 returned by this object since the worker process started |
| haproxy_frontend_intercepted_requests_total | counter | proxy, instance, ins, job, ip, cls | Total number of HTTP requests intercepted on the frontend (redirects/stats/services) since the worker process started |
| haproxy_frontend_internal_errors_total | counter | proxy, instance, ins, job, ip, cls | Total number of internal errors since process started |
| haproxy_frontend_limit_session_rate | gauge | proxy, instance, ins, job, ip, cls | Limit on the number of sessions accepted in a second (frontend only, 'rate-limit sessions' setting) |
| haproxy_frontend_limit_sessions | gauge | proxy, instance, ins, job, ip, cls | Frontend/listener/server's maxconn, backend's fullconn |
| haproxy_frontend_max_session_rate | gauge | proxy, instance, ins, job, ip, cls | Highest value of sessions per second observed since the worker process started |
| haproxy_frontend_max_sessions | gauge | proxy, instance, ins, job, ip, cls | Highest value of current sessions encountered since process started |
| haproxy_frontend_request_errors_total | counter | proxy, instance, ins, job, ip, cls | Total number of invalid requests since process started |
| haproxy_frontend_requests_denied_total | counter | proxy, instance, ins, job, ip, cls | Total number of denied requests since process started |
| haproxy_frontend_responses_denied_total | counter | proxy, instance, ins, job, ip, cls | Total number of denied responses since process started |
| haproxy_frontend_sessions_total | counter | proxy, instance, ins, job, ip, cls | Total number of sessions since process started |
| haproxy_frontend_status | gauge | state, proxy, instance, ins, job, ip, cls | Current status of the service, per state label value. |
| haproxy_process_active_peers | gauge | instance, ins, job, ip, cls | Current number of verified active peers connections on the current worker process |
| haproxy_process_build_info | gauge | version, instance, ins, job, ip, cls | Build info |
| haproxy_process_busy_polling_enabled | gauge | instance, ins, job, ip, cls | 1 if busy-polling is currently in use on the worker process, otherwise zero (config.busy-polling) |
| haproxy_process_bytes_out_rate | gauge | instance, ins, job, ip, cls | Number of bytes emitted by current worker process over the last second |
| haproxy_process_bytes_out_total | counter | instance, ins, job, ip, cls | Total number of bytes emitted by current worker process since started |
| haproxy_process_connected_peers | gauge | instance, ins, job, ip, cls | Current number of peers having passed the connection step on the current worker process |
| haproxy_process_connections_total | counter | instance, ins, job, ip, cls | Total number of connections on this worker process since started |
| haproxy_process_current_backend_ssl_key_rate | gauge | instance, ins, job, ip, cls | Number of SSL keys created on backends in this worker process over the last second |
| haproxy_process_current_connection_rate | gauge | instance, ins, job, ip, cls | Number of front connections created on this worker process over the last second |
| haproxy_process_current_connections | gauge | instance, ins, job, ip, cls | Current number of connections on this worker process |
| haproxy_process_current_frontend_ssl_key_rate | gauge | instance, ins, job, ip, cls | Number of SSL keys created on frontends in this worker process over the last second |
| haproxy_process_current_run_queue | gauge | instance, ins, job, ip, cls | Total number of active tasks+tasklets in the current worker process |
| haproxy_process_current_session_rate | gauge | instance, ins, job, ip, cls | Number of sessions created on this worker process over the last second |
| haproxy_process_current_ssl_connections | gauge | instance, ins, job, ip, cls | Current number of SSL endpoints on this worker process (front+back) |
| haproxy_process_current_ssl_rate | gauge | instance, ins, job, ip, cls | Number of SSL connections created on this worker process over the last second |
| haproxy_process_current_tasks | gauge | instance, ins, job, ip, cls | Total number of tasks in the current worker process (active + sleeping) |
| haproxy_process_current_zlib_memory | gauge | instance, ins, job, ip, cls | Amount of memory currently used by HTTP compression on the current worker process (in bytes) |
| haproxy_process_dropped_logs_total | counter | instance, ins, job, ip, cls | Total number of dropped logs for current worker process since started |
| haproxy_process_failed_resolutions | counter | instance, ins, job, ip, cls | Total number of failed DNS resolutions in current worker process since started |
| haproxy_process_frontend_ssl_reuse | gauge | instance, ins, job, ip, cls | Percent of frontend SSL connections which did not require a new key |
| haproxy_process_hard_max_connections | gauge | instance, ins, job, ip, cls | Hard limit on the number of per-process connections (imposed by Memmax_MB or Ulimit-n) |
| haproxy_process_http_comp_bytes_in_total | counter | instance, ins, job, ip, cls | Number of bytes submitted to the HTTP compressor in this worker process over the last second |
| haproxy_process_http_comp_bytes_out_total | counter | instance, ins, job, ip, cls | Number of bytes emitted by the HTTP compressor in this worker process over the last second |
| haproxy_process_idle_time_percent | gauge | instance, ins, job, ip, cls | Percentage of last second spent waiting in the current worker thread |
| haproxy_process_jobs | gauge | instance, ins, job, ip, cls | Current number of active jobs on the current worker process (frontend connections, master connections, listeners) |
| haproxy_process_limit_connection_rate | gauge | instance, ins, job, ip, cls | Hard limit for ConnRate (global.maxconnrate) |
| haproxy_process_limit_http_comp | gauge | instance, ins, job, ip, cls | Limit of CompressBpsOut beyond which HTTP compression is automatically disabled |
| haproxy_process_limit_session_rate | gauge | instance, ins, job, ip, cls | Hard limit for SessRate (global.maxsessrate) |
| haproxy_process_limit_ssl_rate | gauge | instance, ins, job, ip, cls | Hard limit for SslRate (global.maxsslrate) |
| haproxy_process_listeners | gauge | instance, ins, job, ip, cls | Current number of active listeners on the current worker process |
| haproxy_process_max_backend_ssl_key_rate | gauge | instance, ins, job, ip, cls | Highest SslBackendKeyRate reached on this worker process since started (in SSL keys per second) |
| haproxy_process_max_connection_rate | gauge | instance, ins, job, ip, cls | Highest ConnRate reached on this worker process since started (in connections per second) |
| haproxy_process_max_connections | gauge | instance, ins, job, ip, cls | Hard limit on the number of per-process connections (configured or imposed by Ulimit-n) |
| haproxy_process_max_fds | gauge | instance, ins, job, ip, cls | Hard limit on the number of per-process file descriptors |
| haproxy_process_max_frontend_ssl_key_rate | gauge | instance, ins, job, ip, cls | Highest SslFrontendKeyRate reached on this worker process since started (in SSL keys per second) |
| haproxy_process_max_memory_bytes | gauge | instance, ins, job, ip, cls | Worker process's hard limit on memory usage in bytes (-m on command line) |
| haproxy_process_max_pipes | gauge | instance, ins, job, ip, cls | Hard limit on the number of pipes for splicing, 0=unlimited |
| haproxy_process_max_session_rate | gauge | instance, ins, job, ip, cls | Highest SessRate reached on this worker process since started (in sessions per second) |
| haproxy_process_max_sockets | gauge | instance, ins, job, ip, cls | Hard limit on the number of per-process sockets |
| haproxy_process_max_ssl_connections | gauge | instance, ins, job, ip, cls | Hard limit on the number of per-process SSL endpoints (front+back), 0=unlimited |
| haproxy_process_max_ssl_rate | gauge | instance, ins, job, ip, cls | Highest SslRate reached on this worker process since started (in connections per second) |
| haproxy_process_max_zlib_memory | gauge | instance, ins, job, ip, cls | Limit on the amount of memory used by HTTP compression above which it is automatically disabled (in bytes, see global.maxzlibmem) |
| haproxy_process_nbproc | gauge | instance, ins, job, ip, cls | Number of started worker processes (historical, always 1) |
| haproxy_process_nbthread | gauge | instance, ins, job, ip, cls | Number of started threads (global.nbthread) |
| haproxy_process_pipes_free_total | counter | instance, ins, job, ip, cls | Current number of allocated and available pipes in this worker process |
| haproxy_process_pipes_used_total | counter | instance, ins, job, ip, cls | Current number of pipes in use in this worker process |
| haproxy_process_pool_allocated_bytes | gauge | instance, ins, job, ip, cls | Amount of memory allocated in pools (in bytes) |
| haproxy_process_pool_failures_total | counter | instance, ins, job, ip, cls | Number of failed pool allocations since this worker was started |
| haproxy_process_pool_used_bytes | gauge | instance, ins, job, ip, cls | Amount of pool memory currently used (in bytes) |
| haproxy_process_recv_logs_total | counter | instance, ins, job, ip, cls | Total number of log messages received by log-forwarding listeners on this worker process since started |
| haproxy_process_relative_process_id | gauge | instance, ins, job, ip, cls | Relative worker process number (1) |
| haproxy_process_requests_total | counter | instance, ins, job, ip, cls | Total number of requests on this worker process since started |
| haproxy_process_spliced_bytes_out_total | counter | instance, ins, job, ip, cls | Total number of bytes emitted by current worker process through a kernel pipe since started |
| haproxy_process_ssl_cache_lookups_total | counter | instance, ins, job, ip, cls | Total number of SSL session ID lookups in the SSL session cache on this worker since started |
| haproxy_process_ssl_cache_misses_total | counter | instance, ins, job, ip, cls | Total number of SSL session ID lookups that didn't find a session in the SSL session cache on this worker since started |
| haproxy_process_ssl_connections_total | counter | instance, ins, job, ip, cls | Total number of SSL endpoints on this worker process since started (front+back) |
| haproxy_process_start_time_seconds | gauge | instance, ins, job, ip, cls | Start time in seconds |
| haproxy_process_stopping | gauge | instance, ins, job, ip, cls | 1 if the worker process is currently stopping, otherwise zero |
| haproxy_process_unstoppable_jobs | gauge | instance, ins, job, ip, cls | Current number of unstoppable jobs on the current worker process (master connections) |
| haproxy_process_uptime_seconds | gauge | instance, ins, job, ip, cls | How long ago this worker process was started (seconds) |
| haproxy_server_bytes_in_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of request bytes since process started |
| haproxy_server_bytes_out_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of response bytes since process started |
| haproxy_server_check_code | gauge | proxy, instance, ins, job, server, ip, cls | layer5-7 code, if available of the last health check. |
| haproxy_server_check_duration_seconds | gauge | proxy, instance, ins, job, server, ip, cls | Total duration of the latest server health check, in seconds. |
| haproxy_server_check_failures_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of failed individual health checks per server/backend, since the worker process started |
| haproxy_server_check_last_change_seconds | gauge | proxy, instance, ins, job, server, ip, cls | How long ago the last server state changed, in seconds |
| haproxy_server_check_status | gauge | state, proxy, instance, ins, job, server, ip, cls | Status of last health check, per state label value. |
| haproxy_server_check_up_down_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of failed checks causing UP to DOWN server transitions, per server/backend, since the worker process started |
| haproxy_server_client_aborts_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of requests or connections aborted by the client since the worker process started |
| haproxy_server_connect_time_average_seconds | gauge | proxy, instance, ins, job, server, ip, cls | Avg. connect time for last 1024 successful connections. |
| haproxy_server_connection_attempts_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of outgoing connection attempts on this backend/server since the worker process started |
| haproxy_server_connection_errors_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of failed connections to server since the worker process started |
| haproxy_server_connection_reuses_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of reused connection on this backend/server since the worker process started |
| haproxy_server_current_queue | gauge | proxy, instance, ins, job, server, ip, cls | Number of current queued connections |
| haproxy_server_current_sessions | gauge | proxy, instance, ins, job, server, ip, cls | Number of current sessions on the frontend, backend or server |
| haproxy_server_current_throttle | gauge | proxy, instance, ins, job, server, ip, cls | Throttling ratio applied to a server's maxconn and weight during the slowstart period (0 to 100%) |
| haproxy_server_downtime_seconds_total | counter | proxy, instance, ins, job, server, ip, cls | Total time spent in DOWN state, for server or backend |
| haproxy_server_failed_header_rewriting_total | counter | proxy, instance, ins, job, server, ip, cls | Total number of failed HTTP header rewrites since the worker process started |
| haproxy_server_idle_connections_current | gauge | proxy, instance, ins, job, server, ip, cls | Current number of idle connections available for reuse on this server |
haproxy_server_idle_connections_limitgaugeproxy, instance, ins, job, server, ip, clsLimit on the number of available idle connections on this server (server ‘pool_max_conn’ directive)
haproxy_server_internal_errors_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of internal errors since process started
haproxy_server_last_session_secondsgaugeproxy, instance, ins, job, server, ip, clsHow long ago some traffic was seen on this object on this worker process, in seconds
haproxy_server_limit_sessionsgaugeproxy, instance, ins, job, server, ip, clsFrontend/listener/server’s maxconn, backend’s fullconn
haproxy_server_loadbalanced_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of requests routed by load balancing since the worker process started (ignores queue pop and stickiness)
haproxy_server_max_connect_time_secondsgaugeproxy, instance, ins, job, server, ip, clsMaximum observed time spent waiting for a connection to complete
haproxy_server_max_queuegaugeproxy, instance, ins, job, server, ip, clsHighest value of queued connections encountered since process started
haproxy_server_max_queue_time_secondsgaugeproxy, instance, ins, job, server, ip, clsMaximum observed time spent in the queue
haproxy_server_max_response_time_secondsgaugeproxy, instance, ins, job, server, ip, clsMaximum observed time spent waiting for a server response
haproxy_server_max_session_rategaugeproxy, instance, ins, job, server, ip, clsHighest value of sessions per second observed since the worker process started
haproxy_server_max_sessionsgaugeproxy, instance, ins, job, server, ip, clsHighest value of current sessions encountered since process started
haproxy_server_max_total_time_secondsgaugeproxy, instance, ins, job, server, ip, clsMaximum observed total request+response time (request+queue+connect+response+processing)
haproxy_server_need_connections_currentgaugeproxy, instance, ins, job, server, ip, clsEstimated needed number of connections
haproxy_server_queue_limitgaugeproxy, instance, ins, job, server, ip, clsLimit on the number of connections in queue, for servers only (maxqueue argument)
haproxy_server_queue_time_average_secondsgaugeproxy, instance, ins, job, server, ip, clsAvg. queue time for last 1024 successful connections.
haproxy_server_redispatch_warnings_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of server redispatches due to connection failures since the worker process started
haproxy_server_response_errors_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of invalid responses since the worker process started
haproxy_server_response_time_average_secondsgaugeproxy, instance, ins, job, server, ip, clsAvg. response time for last 1024 successful connections.
haproxy_server_responses_denied_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of denied responses since process started
haproxy_server_retry_warnings_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of server connection retries since the worker process started
haproxy_server_safe_idle_connections_currentgaugeproxy, instance, ins, job, server, ip, clsCurrent number of safe idle connections
haproxy_server_server_aborts_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of requests or connections aborted by the server since the worker process started
haproxy_server_sessions_totalcounterproxy, instance, ins, job, server, ip, clsTotal number of sessions since process started
haproxy_server_statusgaugestate, proxy, instance, ins, job, server, ip, clsCurrent status of the service, per state label value.
haproxy_server_total_time_average_secondsgaugeproxy, instance, ins, job, server, ip, clsAvg. total time for last 1024 successful connections.
haproxy_server_unsafe_idle_connections_currentgaugeproxy, instance, ins, job, server, ip, clsCurrent number of unsafe idle connections
haproxy_server_used_connections_currentgaugeproxy, instance, ins, job, server, ip, clsCurrent number of connections in use
haproxy_server_uweightgaugeproxy, instance, ins, job, server, ip, clsServer’s user weight, or sum of active servers’ user weights for a backend
haproxy_server_weightgaugeproxy, instance, ins, job, server, ip, clsServer’s effective weight, or sum of active servers’ effective weights for a backend
haproxy_upUnknowninstance, ins, job, ip, clsN/A
inflight_requestsgaugeinstance, ins, job, route, ip, cls, methodCurrent number of inflight requests.
jaeger_tracer_baggage_restrictions_updates_totalUnknowninstance, ins, job, result, ip, clsN/A
jaeger_tracer_baggage_truncations_totalUnknowninstance, ins, job, ip, clsN/A
jaeger_tracer_baggage_updates_totalUnknowninstance, ins, job, result, ip, clsN/A
jaeger_tracer_finished_spans_totalUnknowninstance, ins, job, sampled, ip, clsN/A
jaeger_tracer_reporter_queue_lengthgaugeinstance, ins, job, ip, clsCurrent number of spans in the reporter queue
jaeger_tracer_reporter_spans_totalUnknowninstance, ins, job, result, ip, clsN/A
jaeger_tracer_sampler_queries_totalUnknowninstance, ins, job, result, ip, clsN/A
jaeger_tracer_sampler_updates_totalUnknowninstance, ins, job, result, ip, clsN/A
jaeger_tracer_span_context_decoding_errors_totalUnknowninstance, ins, job, ip, clsN/A
jaeger_tracer_started_spans_totalUnknowninstance, ins, job, sampled, ip, clsN/A
jaeger_tracer_throttled_debug_spans_totalUnknowninstance, ins, job, ip, clsN/A
jaeger_tracer_throttler_updates_totalUnknowninstance, ins, job, result, ip, clsN/A
jaeger_tracer_traces_totalUnknownstate, instance, ins, job, sampled, ip, clsN/A
loki_experimental_features_in_use_totalUnknowninstance, ins, job, ip, clsN/A
loki_internal_log_messages_totalUnknownlevel, instance, ins, job, ip, clsN/A
loki_log_flushes_bucketUnknowninstance, ins, job, le, ip, clsN/A
loki_log_flushes_countUnknowninstance, ins, job, ip, clsN/A
loki_log_flushes_sumUnknowninstance, ins, job, ip, clsN/A
loki_log_messages_totalUnknownlevel, instance, ins, job, ip, clsN/A
loki_logql_querystats_duplicates_totalUnknowninstance, ins, job, ip, clsN/A
loki_logql_querystats_ingester_sent_lines_totalUnknowninstance, ins, job, ip, clsN/A
loki_querier_index_cache_corruptions_totalUnknowninstance, ins, job, ip, clsN/A
loki_querier_index_cache_encode_errors_totalUnknowninstance, ins, job, ip, clsN/A
loki_querier_index_cache_gets_totalUnknowninstance, ins, job, ip, clsN/A
loki_querier_index_cache_hits_totalUnknowninstance, ins, job, ip, clsN/A
loki_querier_index_cache_puts_totalUnknowninstance, ins, job, ip, clsN/A
net_conntrack_dialer_conn_attempted_totalcounterip, ins, job, instance, cls, dialer_nameTotal number of connections attempted by the given dialer a given name.
net_conntrack_dialer_conn_closed_totalcounterip, ins, job, instance, cls, dialer_nameTotal number of connections closed which originated from the dialer of a given name.
net_conntrack_dialer_conn_established_totalcounterip, ins, job, instance, cls, dialer_nameTotal number of connections successfully established by the given dialer a given name.
net_conntrack_dialer_conn_failed_totalcounterip, ins, job, reason, instance, cls, dialer_nameTotal number of connections failed to dial by the dialer a given name.
node:cls:avail_bytesUnknownjob, clsN/A
node:cls:cpu_countUnknownjob, clsN/A
node:cls:cpu_usageUnknownjob, clsN/A
node:cls:cpu_usage_15mUnknownjob, clsN/A
node:cls:cpu_usage_1mUnknownjob, clsN/A
node:cls:cpu_usage_5mUnknownjob, clsN/A
node:cls:disk_io_bytes_rate1mUnknownjob, clsN/A
node:cls:disk_iops_1mUnknownjob, clsN/A
node:cls:disk_mreads_rate1mUnknownjob, clsN/A
node:cls:disk_mreads_ratio1mUnknownjob, clsN/A
node:cls:disk_mwrites_rate1mUnknownjob, clsN/A
node:cls:disk_mwrites_ratio1mUnknownjob, clsN/A
node:cls:disk_read_bytes_rate1mUnknownjob, clsN/A
node:cls:disk_reads_rate1mUnknownjob, clsN/A
node:cls:disk_write_bytes_rate1mUnknownjob, clsN/A
node:cls:disk_writes_rate1mUnknownjob, clsN/A
node:cls:free_bytesUnknownjob, clsN/A
node:cls:mem_usageUnknownjob, clsN/A
node:cls:network_io_bytes_rate1mUnknownjob, clsN/A
node:cls:network_rx_bytes_rate1mUnknownjob, clsN/A
node:cls:network_rx_pps1mUnknownjob, clsN/A
node:cls:network_tx_bytes_rate1mUnknownjob, clsN/A
node:cls:network_tx_pps1mUnknownjob, clsN/A
node:cls:size_bytesUnknownjob, clsN/A
node:cls:space_usageUnknownjob, clsN/A
node:cls:space_usage_maxUnknownjob, clsN/A
node:cls:stdload1Unknownjob, clsN/A
node:cls:stdload15Unknownjob, clsN/A
node:cls:stdload5Unknownjob, clsN/A
node:cls:time_drift_maxUnknownjob, clsN/A
node:cpu:idle_time_irate1mUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:sched_timeslices_rate1mUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:sched_wait_rate1mUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:time_irate1mUnknownip, mode, ins, job, cpu, instance, clsN/A
node:cpu:total_time_irate1mUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:usageUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:usage_avg15mUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:usage_avg1mUnknownip, ins, job, cpu, instance, clsN/A
node:cpu:usage_avg5mUnknownip, ins, job, cpu, instance, clsN/A
node:dev:disk_avg_queue_sizeUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_io_batch_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_io_bytes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_io_rt_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_io_time_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_iops_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_mreads_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_mreads_ratio1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_mwrites_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_mwrites_ratio1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_read_batch_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_read_bytes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_read_rt_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_read_time_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_reads_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_util_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_write_batch_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_write_bytes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_write_rt_1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_write_time_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:disk_writes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:network_io_bytes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:network_rx_bytes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:network_rx_pps1mUnknownip, device, ins, job, instance, clsN/A
node:dev:network_tx_bytes_rate1mUnknownip, device, ins, job, instance, clsN/A
node:dev:network_tx_pps1mUnknownip, device, ins, job, instance, clsN/A
node:env:avail_bytesUnknownjobN/A
node:env:cpu_countUnknownjobN/A
node:env:cpu_usageUnknownjobN/A
node:env:cpu_usage_15mUnknownjobN/A
node:env:cpu_usage_1mUnknownjobN/A
node:env:cpu_usage_5mUnknownjobN/A
node:env:device_space_usage_maxUnknowndevice, mountpoint, job, fstypeN/A
node:env:free_bytesUnknownjobN/A
node:env:mem_availUnknownjobN/A
node:env:mem_totalUnknownjobN/A
node:env:mem_usageUnknownjobN/A
node:env:size_bytesUnknownjobN/A
node:env:space_usageUnknownjobN/A
node:env:stdload1UnknownjobN/A
node:env:stdload15UnknownjobN/A
node:env:stdload5UnknownjobN/A
node:fs:avail_bytesUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:free_bytesUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:inode_freeUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:inode_totalUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:inode_usageUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:inode_usedUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:size_bytesUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:space_deriv1hUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:space_exhaustUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:space_predict_1dUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:fs:space_usageUnknownip, device, mountpoint, ins, cls, job, instance, fstypeN/A
node:insUnknownid, ip, ins, job, nodename, instance, clsN/A
node:ins:avail_bytesUnknowninstance, ins, job, ip, clsN/A
node:ins:cpu_countUnknowninstance, ins, job, ip, clsN/A
node:ins:cpu_usageUnknowninstance, ins, job, ip, clsN/A
node:ins:cpu_usage_15mUnknowninstance, ins, job, ip, clsN/A
node:ins:cpu_usage_1mUnknowninstance, ins, job, ip, clsN/A
node:ins:cpu_usage_5mUnknowninstance, ins, job, ip, clsN/A
node:ins:ctx_switch_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_io_bytes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_iops_1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_mreads_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_mreads_ratio1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_mwrites_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_mwrites_ratio1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_read_bytes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_reads_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_write_bytes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:disk_writes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:fd_alloc_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:fd_usageUnknowninstance, ins, job, ip, clsN/A
node:ins:forks_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:free_bytesUnknowninstance, ins, job, ip, clsN/A
node:ins:inode_usageUnknowninstance, ins, job, ip, clsN/A
node:ins:interrupt_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:mem_availUnknowninstance, ins, job, ip, clsN/A
node:ins:mem_commit_ratioUnknowninstance, ins, job, ip, clsN/A
node:ins:mem_kernelUnknowninstance, ins, job, ip, clsN/A
node:ins:mem_rssUnknowninstance, ins, job, ip, clsN/A
node:ins:mem_usageUnknowninstance, ins, job, ip, clsN/A
node:ins:network_io_bytes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:network_rx_bytes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:network_rx_pps1mUnknowninstance, ins, job, ip, clsN/A
node:ins:network_tx_bytes_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:network_tx_pps1mUnknowninstance, ins, job, ip, clsN/A
node:ins:pagefault_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:pagein_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:pageout_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:pgmajfault_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:sched_wait_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:size_bytesUnknowninstance, ins, job, ip, clsN/A
node:ins:space_usage_maxUnknowninstance, ins, job, ip, clsN/A
node:ins:stdload1Unknowninstance, ins, job, ip, clsN/A
node:ins:stdload15Unknowninstance, ins, job, ip, clsN/A
node:ins:stdload5Unknowninstance, ins, job, ip, clsN/A
node:ins:swap_usageUnknowninstance, ins, job, ip, clsN/A
node:ins:swapin_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:swapout_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_active_opens_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_dropped_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_errorUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_error_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_insegs_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_outsegs_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_overflow_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_passive_opens_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_retrans_ratio1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_retranssegs_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:tcp_segs_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:time_driftUnknowninstance, ins, job, ip, clsN/A
node:ins:udp_in_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:udp_out_rate1mUnknowninstance, ins, job, ip, clsN/A
node:ins:uptimeUnknowninstance, ins, job, ip, clsN/A
node_arp_entriesgaugeip, device, ins, job, instance, clsARP entries by device
node_boot_time_secondsgaugeinstance, ins, job, ip, clsNode boot time, in unixtime.
node_context_switches_totalcounterinstance, ins, job, ip, clsTotal number of context switches.
node_cooling_device_cur_stategaugeinstance, ins, job, type, ip, clsCurrent throttle state of the cooling device
node_cooling_device_max_stategaugeinstance, ins, job, type, ip, clsMaximum throttle state of the cooling device
node_cpu_guest_seconds_totalcounterip, mode, ins, job, cpu, instance, clsSeconds the CPUs spent in guests (VMs) for each mode.
node_cpu_seconds_totalcounterip, mode, ins, job, cpu, instance, clsSeconds the CPUs spent in each mode.
node_disk_discard_time_seconds_totalcounterip, device, ins, job, instance, clsThis is the total number of seconds spent by all discards.
node_disk_discarded_sectors_totalcounterip, device, ins, job, instance, clsThe total number of sectors discarded successfully.
node_disk_discards_completed_totalcounterip, device, ins, job, instance, clsThe total number of discards completed successfully.
node_disk_discards_merged_totalcounterip, device, ins, job, instance, clsThe total number of discards merged.
node_disk_filesystem_infogaugeip, usage, version, device, uuid, ins, type, job, instance, clsInfo about disk filesystem.
node_disk_infogaugeminor, ip, major, revision, device, model, serial, path, ins, job, instance, clsInfo of /sys/block/<block_device>.
node_disk_io_nowgaugeip, device, ins, job, instance, clsThe number of I/Os currently in progress.
node_disk_io_time_seconds_totalcounterip, device, ins, job, instance, clsTotal seconds spent doing I/Os.
node_disk_io_time_weighted_seconds_totalcounterip, device, ins, job, instance, clsThe weighted # of seconds spent doing I/Os.
node_disk_read_bytes_totalcounterip, device, ins, job, instance, clsThe total number of bytes read successfully.
node_disk_read_time_seconds_totalcounterip, device, ins, job, instance, clsThe total number of seconds spent by all reads.
node_disk_reads_completed_totalcounterip, device, ins, job, instance, clsThe total number of reads completed successfully.
node_disk_reads_merged_totalcounterip, device, ins, job, instance, clsThe total number of reads merged.
node_disk_write_time_seconds_totalcounterip, device, ins, job, instance, clsThis is the total number of seconds spent by all writes.
node_disk_writes_completed_totalcounterip, device, ins, job, instance, clsThe total number of writes completed successfully.
node_disk_writes_merged_totalcounterip, device, ins, job, instance, clsThe number of writes merged.
node_disk_written_bytes_totalcounterip, device, ins, job, instance, clsThe total number of bytes written successfully.
node_dmi_infogaugebios_vendor, ip, product_family, product_version, product_uuid, system_vendor, bios_version, ins, bios_date, cls, job, product_name, instance, chassis_version, chassis_vendor, product_serialA metric with a constant ‘1’ value labeled by bios_date, bios_release, bios_vendor, bios_version, board_asset_tag, board_name, board_serial, board_vendor, board_version, chassis_asset_tag, chassis_serial, chassis_vendor, chassis_version, product_family, product_name, product_serial, product_sku, product_uuid, product_version, system_vendor if provided by DMI.
node_entropy_available_bitsgaugeinstance, ins, job, ip, clsBits of available entropy.
node_entropy_pool_size_bitsgaugeinstance, ins, job, ip, clsBits of entropy pool.
node_exporter_build_infogaugeip, version, revision, goversion, branch, ins, goarch, job, tags, instance, cls, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which node_exporter was built, and the goos and goarch for the build.
node_filefd_allocatedgaugeinstance, ins, job, ip, clsFile descriptor statistics: allocated.
node_filefd_maximumgaugeinstance, ins, job, ip, clsFile descriptor statistics: maximum.
node_filesystem_avail_bytesgaugeip, device, mountpoint, ins, cls, job, instance, fstypeFilesystem space available to non-root users in bytes.
node_filesystem_device_errorgaugeip, device, mountpoint, ins, cls, job, instance, fstypeWhether an error occurred while getting statistics for the given device.
node_filesystem_filesgaugeip, device, mountpoint, ins, cls, job, instance, fstypeFilesystem total file nodes.
node_filesystem_files_freegaugeip, device, mountpoint, ins, cls, job, instance, fstypeFilesystem total free file nodes.
node_filesystem_free_bytesgaugeip, device, mountpoint, ins, cls, job, instance, fstypeFilesystem free space in bytes.
node_filesystem_readonlygaugeip, device, mountpoint, ins, cls, job, instance, fstypeFilesystem read-only status.
node_filesystem_size_bytesgaugeip, device, mountpoint, ins, cls, job, instance, fstypeFilesystem size in bytes.
node_forks_totalcounterinstance, ins, job, ip, clsTotal number of forks.
node_hwmon_chip_namesgaugechip_name, ip, ins, chip, job, instance, clsAnnotation metric for human-readable chip names
node_hwmon_energy_joule_totalcountersensor, ip, ins, chip, job, instance, clsHardware monitor for joules used so far (input)
node_hwmon_sensor_labelgaugesensor, ip, ins, chip, job, label, instance, clsLabel for given chip and sensor
node_intr_totalcounterinstance, ins, job, ip, clsTotal number of interrupts serviced.
node_ipvs_connections_totalcounterinstance, ins, job, ip, clsThe total number of connections made.
node_ipvs_incoming_bytes_totalcounterinstance, ins, job, ip, clsThe total amount of incoming data.
node_ipvs_incoming_packets_totalcounterinstance, ins, job, ip, clsThe total number of incoming packets.
node_ipvs_outgoing_bytes_totalcounterinstance, ins, job, ip, clsThe total amount of outgoing data.
node_ipvs_outgoing_packets_totalcounterinstance, ins, job, ip, clsThe total number of outgoing packets.
node_load1gaugeinstance, ins, job, ip, cls1m load average.
node_load15gaugeinstance, ins, job, ip, cls15m load average.
node_load5gaugeinstance, ins, job, ip, cls5m load average.
node_memory_Active_anon_bytesgaugeinstance, ins, job, ip, clsMemory information field Active_anon_bytes.
node_memory_Active_bytesgaugeinstance, ins, job, ip, clsMemory information field Active_bytes.
node_memory_Active_file_bytesgaugeinstance, ins, job, ip, clsMemory information field Active_file_bytes.
node_memory_AnonHugePages_bytesgaugeinstance, ins, job, ip, clsMemory information field AnonHugePages_bytes.
node_memory_AnonPages_bytesgaugeinstance, ins, job, ip, clsMemory information field AnonPages_bytes.
node_memory_Bounce_bytesgaugeinstance, ins, job, ip, clsMemory information field Bounce_bytes.
node_memory_Buffers_bytesgaugeinstance, ins, job, ip, clsMemory information field Buffers_bytes.
node_memory_Cached_bytesgaugeinstance, ins, job, ip, clsMemory information field Cached_bytes.
node_memory_CommitLimit_bytesgaugeinstance, ins, job, ip, clsMemory information field CommitLimit_bytes.
node_memory_Committed_AS_bytesgaugeinstance, ins, job, ip, clsMemory information field Committed_AS_bytes.
node_memory_DirectMap1G_bytesgaugeinstance, ins, job, ip, clsMemory information field DirectMap1G_bytes.
node_memory_DirectMap2M_bytesgaugeinstance, ins, job, ip, clsMemory information field DirectMap2M_bytes.
node_memory_DirectMap4k_bytesgaugeinstance, ins, job, ip, clsMemory information field DirectMap4k_bytes.
node_memory_Dirty_bytesgaugeinstance, ins, job, ip, clsMemory information field Dirty_bytes.
node_memory_FileHugePages_bytesgaugeinstance, ins, job, ip, clsMemory information field FileHugePages_bytes.
node_memory_FilePmdMapped_bytesgaugeinstance, ins, job, ip, clsMemory information field FilePmdMapped_bytes.
node_memory_HardwareCorrupted_bytesgaugeinstance, ins, job, ip, clsMemory information field HardwareCorrupted_bytes.
node_memory_HugePages_Freegaugeinstance, ins, job, ip, clsMemory information field HugePages_Free.
node_memory_HugePages_Rsvdgaugeinstance, ins, job, ip, clsMemory information field HugePages_Rsvd.
node_memory_HugePages_Surpgaugeinstance, ins, job, ip, clsMemory information field HugePages_Surp.
node_memory_HugePages_Totalgaugeinstance, ins, job, ip, clsMemory information field HugePages_Total.
node_memory_Hugepagesize_bytesgaugeinstance, ins, job, ip, clsMemory information field Hugepagesize_bytes.
node_memory_Hugetlb_bytesgaugeinstance, ins, job, ip, clsMemory information field Hugetlb_bytes.
node_memory_Inactive_anon_bytesgaugeinstance, ins, job, ip, clsMemory information field Inactive_anon_bytes.
node_memory_Inactive_bytesgaugeinstance, ins, job, ip, clsMemory information field Inactive_bytes.
node_memory_Inactive_file_bytesgaugeinstance, ins, job, ip, clsMemory information field Inactive_file_bytes.
node_memory_KReclaimable_bytesgaugeinstance, ins, job, ip, clsMemory information field KReclaimable_bytes.
node_memory_KernelStack_bytesgaugeinstance, ins, job, ip, clsMemory information field KernelStack_bytes.
node_memory_Mapped_bytesgaugeinstance, ins, job, ip, clsMemory information field Mapped_bytes.
node_memory_MemAvailable_bytesgaugeinstance, ins, job, ip, clsMemory information field MemAvailable_bytes.
node_memory_MemFree_bytesgaugeinstance, ins, job, ip, clsMemory information field MemFree_bytes.
node_memory_MemTotal_bytesgaugeinstance, ins, job, ip, clsMemory information field MemTotal_bytes.
node_memory_Mlocked_bytesgaugeinstance, ins, job, ip, clsMemory information field Mlocked_bytes.
node_memory_NFS_Unstable_bytesgaugeinstance, ins, job, ip, clsMemory information field NFS_Unstable_bytes.
node_memory_PageTables_bytesgaugeinstance, ins, job, ip, clsMemory information field PageTables_bytes.
node_memory_Percpu_bytesgaugeinstance, ins, job, ip, clsMemory information field Percpu_bytes.
node_memory_SReclaimable_bytesgaugeinstance, ins, job, ip, clsMemory information field SReclaimable_bytes.
node_memory_SUnreclaim_bytesgaugeinstance, ins, job, ip, clsMemory information field SUnreclaim_bytes.
node_memory_ShmemHugePages_bytesgaugeinstance, ins, job, ip, clsMemory information field ShmemHugePages_bytes.
node_memory_ShmemPmdMapped_bytesgaugeinstance, ins, job, ip, clsMemory information field ShmemPmdMapped_bytes.
node_memory_Shmem_bytesgaugeinstance, ins, job, ip, clsMemory information field Shmem_bytes.
node_memory_Slab_bytesgaugeinstance, ins, job, ip, clsMemory information field Slab_bytes.
node_memory_SwapCached_bytesgaugeinstance, ins, job, ip, clsMemory information field SwapCached_bytes.
node_memory_SwapFree_bytesgaugeinstance, ins, job, ip, clsMemory information field SwapFree_bytes.
node_memory_SwapTotal_bytesgaugeinstance, ins, job, ip, clsMemory information field SwapTotal_bytes.
node_memory_Unevictable_bytesgaugeinstance, ins, job, ip, clsMemory information field Unevictable_bytes.
node_memory_VmallocChunk_bytesgaugeinstance, ins, job, ip, clsMemory information field VmallocChunk_bytes.
node_memory_VmallocTotal_bytesgaugeinstance, ins, job, ip, clsMemory information field VmallocTotal_bytes.
node_memory_VmallocUsed_bytesgaugeinstance, ins, job, ip, clsMemory information field VmallocUsed_bytes.
node_memory_WritebackTmp_bytesgaugeinstance, ins, job, ip, clsMemory information field WritebackTmp_bytes.
node_memory_Writeback_bytesgaugeinstance, ins, job, ip, clsMemory information field Writeback_bytes.
node_netstat_Icmp6_InErrorsunknowninstance, ins, job, ip, clsStatistic Icmp6InErrors.
node_netstat_Icmp6_InMsgsunknowninstance, ins, job, ip, clsStatistic Icmp6InMsgs.
node_netstat_Icmp6_OutMsgsunknowninstance, ins, job, ip, clsStatistic Icmp6OutMsgs.
node_netstat_Icmp_InErrorsunknowninstance, ins, job, ip, clsStatistic IcmpInErrors.
| 指标名称 | 类型 | 标签 | 描述 |
|---|---|---|---|
| node_netstat_Icmp_InMsgs | unknown | instance, ins, job, ip, cls | Statistic IcmpInMsgs. |
| node_netstat_Icmp_OutMsgs | unknown | instance, ins, job, ip, cls | Statistic IcmpOutMsgs. |
| node_netstat_Ip6_InOctets | unknown | instance, ins, job, ip, cls | Statistic Ip6InOctets. |
| node_netstat_Ip6_OutOctets | unknown | instance, ins, job, ip, cls | Statistic Ip6OutOctets. |
| node_netstat_IpExt_InOctets | unknown | instance, ins, job, ip, cls | Statistic IpExtInOctets. |
| node_netstat_IpExt_OutOctets | unknown | instance, ins, job, ip, cls | Statistic IpExtOutOctets. |
| node_netstat_Ip_Forwarding | unknown | instance, ins, job, ip, cls | Statistic IpForwarding. |
| node_netstat_TcpExt_ListenDrops | unknown | instance, ins, job, ip, cls | Statistic TcpExtListenDrops. |
| node_netstat_TcpExt_ListenOverflows | unknown | instance, ins, job, ip, cls | Statistic TcpExtListenOverflows. |
| node_netstat_TcpExt_SyncookiesFailed | unknown | instance, ins, job, ip, cls | Statistic TcpExtSyncookiesFailed. |
| node_netstat_TcpExt_SyncookiesRecv | unknown | instance, ins, job, ip, cls | Statistic TcpExtSyncookiesRecv. |
| node_netstat_TcpExt_SyncookiesSent | unknown | instance, ins, job, ip, cls | Statistic TcpExtSyncookiesSent. |
| node_netstat_TcpExt_TCPSynRetrans | unknown | instance, ins, job, ip, cls | Statistic TcpExtTCPSynRetrans. |
| node_netstat_TcpExt_TCPTimeouts | unknown | instance, ins, job, ip, cls | Statistic TcpExtTCPTimeouts. |
| node_netstat_Tcp_ActiveOpens | unknown | instance, ins, job, ip, cls | Statistic TcpActiveOpens. |
| node_netstat_Tcp_CurrEstab | unknown | instance, ins, job, ip, cls | Statistic TcpCurrEstab. |
| node_netstat_Tcp_InErrs | unknown | instance, ins, job, ip, cls | Statistic TcpInErrs. |
| node_netstat_Tcp_InSegs | unknown | instance, ins, job, ip, cls | Statistic TcpInSegs. |
| node_netstat_Tcp_OutRsts | unknown | instance, ins, job, ip, cls | Statistic TcpOutRsts. |
| node_netstat_Tcp_OutSegs | unknown | instance, ins, job, ip, cls | Statistic TcpOutSegs. |
| node_netstat_Tcp_PassiveOpens | unknown | instance, ins, job, ip, cls | Statistic TcpPassiveOpens. |
| node_netstat_Tcp_RetransSegs | unknown | instance, ins, job, ip, cls | Statistic TcpRetransSegs. |
| node_netstat_Udp6_InDatagrams | unknown | instance, ins, job, ip, cls | Statistic Udp6InDatagrams. |
| node_netstat_Udp6_InErrors | unknown | instance, ins, job, ip, cls | Statistic Udp6InErrors. |
| node_netstat_Udp6_NoPorts | unknown | instance, ins, job, ip, cls | Statistic Udp6NoPorts. |
| node_netstat_Udp6_OutDatagrams | unknown | instance, ins, job, ip, cls | Statistic Udp6OutDatagrams. |
| node_netstat_Udp6_RcvbufErrors | unknown | instance, ins, job, ip, cls | Statistic Udp6RcvbufErrors. |
| node_netstat_Udp6_SndbufErrors | unknown | instance, ins, job, ip, cls | Statistic Udp6SndbufErrors. |
| node_netstat_UdpLite6_InErrors | unknown | instance, ins, job, ip, cls | Statistic UdpLite6InErrors. |
| node_netstat_UdpLite_InErrors | unknown | instance, ins, job, ip, cls | Statistic UdpLiteInErrors. |
| node_netstat_Udp_InDatagrams | unknown | instance, ins, job, ip, cls | Statistic UdpInDatagrams. |
| node_netstat_Udp_InErrors | unknown | instance, ins, job, ip, cls | Statistic UdpInErrors. |
| node_netstat_Udp_NoPorts | unknown | instance, ins, job, ip, cls | Statistic UdpNoPorts. |
| node_netstat_Udp_OutDatagrams | unknown | instance, ins, job, ip, cls | Statistic UdpOutDatagrams. |
| node_netstat_Udp_RcvbufErrors | unknown | instance, ins, job, ip, cls | Statistic UdpRcvbufErrors. |
| node_netstat_Udp_SndbufErrors | unknown | instance, ins, job, ip, cls | Statistic UdpSndbufErrors. |
| node_network_address_assign_type | gauge | ip, device, ins, job, instance, cls | Network device property: address_assign_type |
| node_network_carrier | gauge | ip, device, ins, job, instance, cls | Network device property: carrier |
| node_network_carrier_changes_total | counter | ip, device, ins, job, instance, cls | Network device property: carrier_changes_total |
| node_network_carrier_down_changes_total | counter | ip, device, ins, job, instance, cls | Network device property: carrier_down_changes_total |
| node_network_carrier_up_changes_total | counter | ip, device, ins, job, instance, cls | Network device property: carrier_up_changes_total |
| node_network_device_id | gauge | ip, device, ins, job, instance, cls | Network device property: device_id |
| node_network_dormant | gauge | ip, device, ins, job, instance, cls | Network device property: dormant |
| node_network_flags | gauge | ip, device, ins, job, instance, cls | Network device property: flags |
| node_network_iface_id | gauge | ip, device, ins, job, instance, cls | Network device property: iface_id |
| node_network_iface_link | gauge | ip, device, ins, job, instance, cls | Network device property: iface_link |
| node_network_iface_link_mode | gauge | ip, device, ins, job, instance, cls | Network device property: iface_link_mode |
| node_network_info | gauge | broadcast, ip, device, operstate, ins, job, adminstate, duplex, address, instance, cls | Non-numeric data from /sys/class/net/, value is always 1. |
| node_network_mtu_bytes | gauge | ip, device, ins, job, instance, cls | Network device property: mtu_bytes |
| node_network_name_assign_type | gauge | ip, device, ins, job, instance, cls | Network device property: name_assign_type |
| node_network_net_dev_group | gauge | ip, device, ins, job, instance, cls | Network device property: net_dev_group |
| node_network_protocol_type | gauge | ip, device, ins, job, instance, cls | Network device property: protocol_type |
| node_network_receive_bytes_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_bytes. |
| node_network_receive_compressed_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_compressed. |
| node_network_receive_drop_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_drop. |
| node_network_receive_errs_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_errs. |
| node_network_receive_fifo_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_fifo. |
| node_network_receive_frame_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_frame. |
| node_network_receive_multicast_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_multicast. |
| node_network_receive_nohandler_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_nohandler. |
| node_network_receive_packets_total | counter | ip, device, ins, job, instance, cls | Network device statistic receive_packets. |
| node_network_speed_bytes | gauge | ip, device, ins, job, instance, cls | Network device property: speed_bytes |
| node_network_transmit_bytes_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_bytes. |
| node_network_transmit_carrier_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_carrier. |
| node_network_transmit_colls_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_colls. |
| node_network_transmit_compressed_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_compressed. |
| node_network_transmit_drop_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_drop. |
| node_network_transmit_errs_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_errs. |
| node_network_transmit_fifo_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_fifo. |
| node_network_transmit_packets_total | counter | ip, device, ins, job, instance, cls | Network device statistic transmit_packets. |
| node_network_transmit_queue_length | gauge | ip, device, ins, job, instance, cls | Network device property: transmit_queue_length |
| node_network_up | gauge | ip, device, ins, job, instance, cls | Value is 1 if operstate is 'up', 0 otherwise. |
| node_nf_conntrack_entries | gauge | instance, ins, job, ip, cls | Number of currently allocated flow entries for connection tracking. |
| node_nf_conntrack_entries_limit | gauge | instance, ins, job, ip, cls | Maximum size of connection tracking table. |
| node_nf_conntrack_stat_drop | gauge | instance, ins, job, ip, cls | Number of packets dropped due to conntrack failure. |
| node_nf_conntrack_stat_early_drop | gauge | instance, ins, job, ip, cls | Number of dropped conntrack entries to make room for new ones, if maximum table size was reached. |
| node_nf_conntrack_stat_found | gauge | instance, ins, job, ip, cls | Number of searched entries which were successful. |
| node_nf_conntrack_stat_ignore | gauge | instance, ins, job, ip, cls | Number of packets seen which are already connected to a conntrack entry. |
| node_nf_conntrack_stat_insert | gauge | instance, ins, job, ip, cls | Number of entries inserted into the list. |
| node_nf_conntrack_stat_insert_failed | gauge | instance, ins, job, ip, cls | Number of entries for which list insertion was attempted but failed. |
| node_nf_conntrack_stat_invalid | gauge | instance, ins, job, ip, cls | Number of packets seen which can not be tracked. |
| node_nf_conntrack_stat_search_restart | gauge | instance, ins, job, ip, cls | Number of conntrack table lookups which had to be restarted due to hashtable resizes. |
| node_os_info | gauge | id, ip, version, version_id, ins, instance, job, pretty_name, id_like, cls | A metric with a constant '1' value labeled by build_id, id, id_like, image_id, image_version, name, pretty_name, variant, variant_id, version, version_codename, version_id. |
| node_os_version | gauge | id, ip, ins, instance, job, id_like, cls | Metric containing the major.minor part of the OS version. |
| node_processes_max_processes | gauge | instance, ins, job, ip, cls | Number of max PIDs limit |
| node_processes_max_threads | gauge | instance, ins, job, ip, cls | Limit of threads in the system |
| node_processes_pids | gauge | instance, ins, job, ip, cls | Number of PIDs |
| node_processes_state | gauge | state, instance, ins, job, ip, cls | Number of processes in each state. |
| node_processes_threads | gauge | instance, ins, job, ip, cls | Allocated threads in system |
| node_processes_threads_state | gauge | instance, ins, job, thread_state, ip, cls | Number of threads in each state. |
| node_procs_blocked | gauge | instance, ins, job, ip, cls | Number of processes blocked waiting for I/O to complete. |
| node_procs_running | gauge | instance, ins, job, ip, cls | Number of processes in runnable state. |
| node_schedstat_running_seconds_total | counter | ip, ins, job, cpu, instance, cls | Number of seconds CPU spent running a process. |
| node_schedstat_timeslices_total | counter | ip, ins, job, cpu, instance, cls | Number of timeslices executed by CPU. |
| node_schedstat_waiting_seconds_total | counter | ip, ins, job, cpu, instance, cls | Number of seconds spent by processing waiting for this CPU. |
| node_scrape_collector_duration_seconds | gauge | ip, collector, ins, job, instance, cls | node_exporter: Duration of a collector scrape. |
| node_scrape_collector_success | gauge | ip, collector, ins, job, instance, cls | node_exporter: Whether a collector succeeded. |
| node_selinux_enabled | gauge | instance, ins, job, ip, cls | SELinux is enabled, 1 is true, 0 is false |
| node_sockstat_FRAG6_inuse | gauge | instance, ins, job, ip, cls | Number of FRAG6 sockets in state inuse. |
| node_sockstat_FRAG6_memory | gauge | instance, ins, job, ip, cls | Number of FRAG6 sockets in state memory. |
| node_sockstat_FRAG_inuse | gauge | instance, ins, job, ip, cls | Number of FRAG sockets in state inuse. |
| node_sockstat_FRAG_memory | gauge | instance, ins, job, ip, cls | Number of FRAG sockets in state memory. |
| node_sockstat_RAW6_inuse | gauge | instance, ins, job, ip, cls | Number of RAW6 sockets in state inuse. |
| node_sockstat_RAW_inuse | gauge | instance, ins, job, ip, cls | Number of RAW sockets in state inuse. |
| node_sockstat_TCP6_inuse | gauge | instance, ins, job, ip, cls | Number of TCP6 sockets in state inuse. |
| node_sockstat_TCP_alloc | gauge | instance, ins, job, ip, cls | Number of TCP sockets in state alloc. |
| node_sockstat_TCP_inuse | gauge | instance, ins, job, ip, cls | Number of TCP sockets in state inuse. |
| node_sockstat_TCP_mem | gauge | instance, ins, job, ip, cls | Number of TCP sockets in state mem. |
| node_sockstat_TCP_mem_bytes | gauge | instance, ins, job, ip, cls | Number of TCP sockets in state mem_bytes. |
| node_sockstat_TCP_orphan | gauge | instance, ins, job, ip, cls | Number of TCP sockets in state orphan. |
| node_sockstat_TCP_tw | gauge | instance, ins, job, ip, cls | Number of TCP sockets in state tw. |
| node_sockstat_UDP6_inuse | gauge | instance, ins, job, ip, cls | Number of UDP6 sockets in state inuse. |
| node_sockstat_UDPLITE6_inuse | gauge | instance, ins, job, ip, cls | Number of UDPLITE6 sockets in state inuse. |
| node_sockstat_UDPLITE_inuse | gauge | instance, ins, job, ip, cls | Number of UDPLITE sockets in state inuse. |
| node_sockstat_UDP_inuse | gauge | instance, ins, job, ip, cls | Number of UDP sockets in state inuse. |
| node_sockstat_UDP_mem | gauge | instance, ins, job, ip, cls | Number of UDP sockets in state mem. |
| node_sockstat_UDP_mem_bytes | gauge | instance, ins, job, ip, cls | Number of UDP sockets in state mem_bytes. |
| node_sockstat_sockets_used | gauge | instance, ins, job, ip, cls | Number of IPv4 sockets in use. |
| node_tcp_connection_states | gauge | state, instance, ins, job, ip, cls | Number of connection states. |
| node_textfile_scrape_error | gauge | instance, ins, job, ip, cls | 1 if there was an error opening or reading a file, 0 otherwise |
| node_time_clocksource_available_info | gauge | ip, device, ins, clocksource, job, instance, cls | Available clocksources read from '/sys/devices/system/clocksource'. |
| node_time_clocksource_current_info | gauge | ip, device, ins, clocksource, job, instance, cls | Current clocksource read from '/sys/devices/system/clocksource'. |
| node_time_seconds | gauge | instance, ins, job, ip, cls | System time in seconds since epoch (1970). |
| node_time_zone_offset_seconds | gauge | instance, ins, job, time_zone, ip, cls | System time zone offset in seconds. |
| node_timex_estimated_error_seconds | gauge | instance, ins, job, ip, cls | Estimated error in seconds. |
| node_timex_frequency_adjustment_ratio | gauge | instance, ins, job, ip, cls | Local clock frequency adjustment. |
| node_timex_loop_time_constant | gauge | instance, ins, job, ip, cls | Phase-locked loop time constant. |
| node_timex_maxerror_seconds | gauge | instance, ins, job, ip, cls | Maximum error in seconds. |
| node_timex_offset_seconds | gauge | instance, ins, job, ip, cls | Time offset in between local system and reference clock. |
| node_timex_pps_calibration_total | counter | instance, ins, job, ip, cls | Pulse per second count of calibration intervals. |
| node_timex_pps_error_total | counter | instance, ins, job, ip, cls | Pulse per second count of calibration errors. |
| node_timex_pps_frequency_hertz | gauge | instance, ins, job, ip, cls | Pulse per second frequency. |
| node_timex_pps_jitter_seconds | gauge | instance, ins, job, ip, cls | Pulse per second jitter. |
| node_timex_pps_jitter_total | counter | instance, ins, job, ip, cls | Pulse per second count of jitter limit exceeded events. |
| node_timex_pps_shift_seconds | gauge | instance, ins, job, ip, cls | Pulse per second interval duration. |
| node_timex_pps_stability_exceeded_total | counter | instance, ins, job, ip, cls | Pulse per second count of stability limit exceeded events. |
| node_timex_pps_stability_hertz | gauge | instance, ins, job, ip, cls | Pulse per second stability, average of recent frequency changes. |
| node_timex_status | gauge | instance, ins, job, ip, cls | Value of the status array bits. |
| node_timex_sync_status | gauge | instance, ins, job, ip, cls | Is clock synchronized to a reliable server (1 = yes, 0 = no). |
| node_timex_tai_offset_seconds | gauge | instance, ins, job, ip, cls | International Atomic Time (TAI) offset. |
| node_timex_tick_seconds | gauge | instance, ins, job, ip, cls | Seconds between clock ticks. |
| node_udp_queues | gauge | ip, queue, ins, job, exported_ip, instance, cls | Number of allocated memory in the kernel for UDP datagrams in bytes. |
| node_uname_info | gauge | ip, sysname, version, domainname, release, ins, job, nodename, instance, cls, machine | Labeled system information as provided by the uname system call. |
| node_up | Unknown | instance, ins, job, ip, cls | N/A |
| node_vmstat_oom_kill | unknown | instance, ins, job, ip, cls | /proc/vmstat information field oom_kill. |
| node_vmstat_pgfault | unknown | instance, ins, job, ip, cls | /proc/vmstat information field pgfault. |
| node_vmstat_pgmajfault | unknown | instance, ins, job, ip, cls | /proc/vmstat information field pgmajfault. |
| node_vmstat_pgpgin | unknown | instance, ins, job, ip, cls | /proc/vmstat information field pgpgin. |
| node_vmstat_pgpgout | unknown | instance, ins, job, ip, cls | /proc/vmstat information field pgpgout. |
| node_vmstat_pswpin | unknown | instance, ins, job, ip, cls | /proc/vmstat information field pswpin. |
| node_vmstat_pswpout | unknown | instance, ins, job, ip, cls | /proc/vmstat information field pswpout. |
| process_cpu_seconds_total | counter | instance, ins, job, ip, cls | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | instance, ins, job, ip, cls | Maximum number of open file descriptors. |
| process_open_fds | gauge | instance, ins, job, ip, cls | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | instance, ins, job, ip, cls | Resident memory size in bytes. |
| process_start_time_seconds | gauge | instance, ins, job, ip, cls | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | instance, ins, job, ip, cls | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | instance, ins, job, ip, cls | Maximum amount of virtual memory available in bytes. |
| prometheus_remote_storage_exemplars_in_total | counter | instance, ins, job, ip, cls | Exemplars in to remote storage, compare to exemplars out for queue managers. |
| prometheus_remote_storage_histograms_in_total | counter | instance, ins, job, ip, cls | HistogramSamples in to remote storage, compare to histograms out for queue managers. |
| prometheus_remote_storage_samples_in_total | counter | instance, ins, job, ip, cls | Samples in to remote storage, compare to samples out for queue managers. |
| prometheus_remote_storage_string_interner_zero_reference_releases_total | counter | instance, ins, job, ip, cls | The number of times release has been called for strings that are not interned. |
| prometheus_sd_azure_failures_total | counter | instance, ins, job, ip, cls | Number of Azure service discovery refresh failures. |
| prometheus_sd_consul_rpc_duration_seconds | summary | ip, call, quantile, ins, job, instance, cls, endpoint | The duration of a Consul RPC call in seconds. |
| prometheus_sd_consul_rpc_duration_seconds_count | Unknown | ip, call, ins, job, instance, cls, endpoint | N/A |
| prometheus_sd_consul_rpc_duration_seconds_sum | Unknown | ip, call, ins, job, instance, cls, endpoint | N/A |
| prometheus_sd_consul_rpc_failures_total | counter | instance, ins, job, ip, cls | The number of Consul RPC call failures. |
| prometheus_sd_consulagent_rpc_duration_seconds | summary | ip, call, quantile, ins, job, instance, cls, endpoint | The duration of a Consul Agent RPC call in seconds. |
| prometheus_sd_consulagent_rpc_duration_seconds_count | Unknown | ip, call, ins, job, instance, cls, endpoint | N/A |
| prometheus_sd_consulagent_rpc_duration_seconds_sum | Unknown | ip, call, ins, job, instance, cls, endpoint | N/A |
| prometheus_sd_consulagent_rpc_failures_total | Unknown | instance, ins, job, ip, cls | N/A |
| prometheus_sd_dns_lookup_failures_total | counter | instance, ins, job, ip, cls | The number of DNS-SD lookup failures. |
| prometheus_sd_dns_lookups_total | counter | instance, ins, job, ip, cls | The number of DNS-SD lookups. |
| prometheus_sd_file_read_errors_total | counter | instance, ins, job, ip, cls | The number of File-SD read errors. |
| prometheus_sd_file_scan_duration_seconds | summary | quantile, instance, ins, job, ip, cls | The duration of the File-SD scan in seconds. |
| prometheus_sd_file_scan_duration_seconds_count | Unknown | instance, ins, job, ip, cls | N/A |
| prometheus_sd_file_scan_duration_seconds_sum | Unknown | instance, ins, job, ip, cls | N/A |
| prometheus_sd_file_watcher_errors_total | counter | instance, ins, job, ip, cls | The number of File-SD errors caused by filesystem watch failures. |
| prometheus_sd_kubernetes_events_total | counter | ip, event, ins, job, role, instance, cls | The number of Kubernetes events handled. |
| prometheus_target_scrape_pool_exceeded_label_limits_total | counter | instance, ins, job, ip, cls | Total number of times scrape pools hit the label limits, during sync or config reload. |
| prometheus_target_scrape_pool_exceeded_target_limit_total | counter | instance, ins, job, ip, cls | Total number of times scrape pools hit the target limit, during sync or config reload. |
| prometheus_target_scrape_pool_reloads_failed_total | counter | instance, ins, job, ip, cls | Total number of failed scrape pool reloads. |
| prometheus_target_scrape_pool_reloads_total | counter | instance, ins, job, ip, cls | Total number of scrape pool reloads. |
| prometheus_target_scrape_pools_failed_total | counter | instance, ins, job, ip, cls | Total number of scrape pool creations that failed. |
| prometheus_target_scrape_pools_total | counter | instance, ins, job, ip, cls | Total number of scrape pool creation attempts. |
| prometheus_target_scrapes_cache_flush_forced_total | counter | instance, ins, job, ip, cls | How many times a scrape cache was flushed due to getting big while scrapes are failing. |
| prometheus_target_scrapes_exceeded_body_size_limit_total | counter | instance, ins, job, ip, cls | Total number of scrapes that hit the body size limit |
| prometheus_target_scrapes_exceeded_sample_limit_total | counter | instance, ins, job, ip, cls | Total number of scrapes that hit the sample limit and were rejected. |
| prometheus_target_scrapes_exemplar_out_of_order_total | counter | instance, ins, job, ip, cls | Total number of exemplar rejected due to not being out of the expected order. |
| prometheus_target_scrapes_sample_duplicate_timestamp_total | counter | instance, ins, job, ip, cls | Total number of samples rejected due to duplicate timestamps but different values. |
| prometheus_target_scrapes_sample_out_of_bounds_total | counter | instance, ins, job, ip, cls | Total number of samples rejected due to timestamp falling outside of the time bounds. |
| prometheus_target_scrapes_sample_out_of_order_total | counter | instance, ins, job, ip, cls | Total number of samples rejected due to not being out of the expected order. |
| prometheus_template_text_expansion_failures_total | counter | instance, ins, job, ip, cls | The total number of template text expansion failures. |
| prometheus_template_text_expansions_total | counter | instance, ins, job, ip, cls | The total number of template text expansions. |
| prometheus_treecache_watcher_goroutines | gauge | instance, ins, job, ip, cls | The current number of watcher goroutines. |
| prometheus_treecache_zookeeper_failures_total | counter | instance, ins, job, ip, cls | The total number of ZooKeeper failures. |
| promhttp_metric_handler_errors_total | counter | ip, cause, ins, job, instance, cls | Total number of internal errors encountered by the promhttp metric handler. |
| promhttp_metric_handler_requests_in_flight | gauge | instance, ins, job, ip, cls | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | counter | ip, ins, code, job, instance, cls | Total number of scrapes by HTTP status code. |
| promtail_batch_retries_total | Unknown | host, ip, ins, job, instance, cls | N/A |
| promtail_build_info | gauge | ip, version, revision, goversion, branch, ins, goarch, job, tags, instance, cls, goos | A metric with a constant '1' value labeled by version, revision, branch, goversion from which promtail was built, and the goos and goarch for the build. |
| promtail_config_reload_fail_total | Unknown | instance, ins, job, ip, cls | N/A |
| promtail_config_reload_success_total | Unknown | instance, ins, job, ip, cls | N/A |
| promtail_dropped_bytes_total | Unknown | host, ip, ins, job, reason, instance, cls | N/A |
| promtail_dropped_entries_total | Unknown | host, ip, ins, job, reason, instance, cls | N/A |
| promtail_encoded_bytes_total | Unknown | host, ip, ins, job, instance, cls | N/A |
| promtail_file_bytes_total | gauge | path, instance, ins, job, ip, cls | Number of bytes total. |
| promtail_files_active_total | gauge | instance, ins, job, ip, cls | Number of active files. |
| promtail_mutated_bytes_total | Unknown | host, ip, ins, job, reason, instance, cls | N/A |
| promtail_mutated_entries_total | Unknown | host, ip, ins, job, reason, instance, cls | N/A |
| promtail_read_bytes_total | gauge | path, instance, ins, job, ip, cls | Number of bytes read. |
| promtail_read_lines_total | Unknown | path, instance, ins, job, ip, cls | N/A |
| promtail_request_duration_seconds_bucket | Unknown | host, ip, ins, job, status_code, le, instance, cls | N/A |
| promtail_request_duration_seconds_count | Unknown | host, ip, ins, job, status_code, instance, cls | N/A |
| promtail_request_duration_seconds_sum | Unknown | host, ip, ins, job, status_code, instance, cls | N/A |
| promtail_sent_bytes_total | Unknown | host, ip, ins, job, instance, cls | N/A |
| promtail_sent_entries_total | Unknown | host, ip, ins, job, instance, cls | N/A |
| promtail_targets_active_total | gauge | instance, ins, job, ip, cls | Number of active total. |
| promtail_up | Unknown | instance, ins, job, ip, cls | N/A |
| request_duration_seconds_bucket | Unknown | instance, ins, job, status_code, route, ws, le, ip, cls, method | N/A |
| request_duration_seconds_count | Unknown | instance, ins, job, status_code, route, ws, ip, cls, method | N/A |
| request_duration_seconds_sum | Unknown | instance, ins, job, status_code, route, ws, ip, cls, method | N/A |
| request_message_bytes_bucket | Unknown | instance, ins, job, route, le, ip, cls, method | N/A |
| request_message_bytes_count | Unknown | instance, ins, job, route, ip, cls, method | N/A |
| request_message_bytes_sum | Unknown | instance, ins, job, route, ip, cls, method | N/A |
| response_message_bytes_bucket | Unknown | instance, ins, job, route, le, ip, cls, method | N/A |
| response_message_bytes_count | Unknown | instance, ins, job, route, ip, cls, method | N/A |
| response_message_bytes_sum | Unknown | instance, ins, job, route, ip, cls, method | N/A |
| scrape_duration_seconds | Unknown | instance, ins, job, ip, cls | N/A |
| scrape_samples_post_metric_relabeling | Unknown | instance, ins, job, ip, cls | N/A |
| scrape_samples_scraped | Unknown | instance, ins, job, ip, cls | N/A |
| scrape_series_added | Unknown | instance, ins, job, ip, cls | N/A |
| tcp_connections | gauge | instance, ins, job, protocol, ip, cls | Current number of accepted TCP connections. |
| tcp_connections_limit | gauge | instance, ins, job, protocol, ip, cls | The max number of TCP connections that can be accepted (0 means no limit). |
| up | Unknown | instance, ins, job, ip, cls | N/A |

8 - 常见问题

Pigsty NODE 主机节点模块常见问题答疑

如何配置主机节点上的NTP服务?

NTP 对生产环境中的各项服务非常重要。如果尚未配置 NTP,您可以使用公共 NTP 服务,或以管理节点上的 Chronyd 作为标准时间源。

如果您的节点已经配置了 NTP,可以通过将 node_ntp_enabled 设置为 false 来保留现有配置,不进行任何变更。

否则,如果您有互联网访问权限,可以使用公共 NTP 服务,例如 pool.ntp.org

如果您没有互联网访问权限,可以使用以下配置,确保环境内所有节点与管理节点的时间保持同步;或者使用内网环境中其他的 NTP 授时服务。

node_ntp_servers:                 # /etc/chrony.conf 中的 ntp 服务器列表
  - pool cn.pool.ntp.org iburst
  - pool ${admin_ip} iburst       # 假设其他节点都没有互联网访问,那么至少与 Admin 节点保持时间同步。
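反之,如果节点上已有可用的 NTP 配置且不希望 Pigsty 改动,可以在配置清单的全局或集群变量中直接关闭 NTP 管理。以下是一个示意性的配置片段:

```yaml
node_ntp_enabled: false   # 保留节点上现有的 NTP 配置,Pigsty 不做任何变更
```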

如何在节点上强制同步时间?

要使用 chronyc 同步时间,您首先需要配置 NTP 服务。

ansible all -b -a 'chronyc -a makestep'     # 同步时间

您可以用任何组或主机 IP 地址替换 all,以限制执行范围。
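例如,可以将 all 替换为配置清单中的集群分组名或具体的主机 IP,以缩小执行范围(以下命令为示意,集群名 pg-test 与 IP 均为假设):

```bash
ansible pg-test -b -a 'chronyc -a makestep'      # 仅同步 pg-test 集群的节点
ansible 10.10.10.11 -b -a 'chronyc -a makestep'  # 仅同步某台指定主机
```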


远程节点无法通过SSH访问怎么办?

如果目标机器隐藏在 SSH 跳板机后面,或者做了一些无法直接使用 ssh ip 访问的定制配置,可以使用 ansible_host、ansible_port 这一类 Ansible 连接参数来指定各种 SSH 连接信息,如下所示:

pg-test:
  vars: { pg_cluster: pg-test }
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1 }
    10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_port: 22223, ansible_user: admin }
    10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_port: 22224 }
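除了在配置清单中指定 Ansible 连接参数,也可以在管理节点的 ~/.ssh/config 中配置主机别名与跳板机,这样清单中无需额外参数。以下是一个示意片段(主机名与跳板机地址均为假设):

```
# ~/.ssh/config
Host node-1
    HostName 10.10.10.11
    Port 22223
    User admin
    ProxyJump jumpuser@jump.example.com   # 经由跳板机访问目标节点
```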

远程节点SSH与SUDO需要密码怎么办?

执行部署和变更时,使用的管理员用户必须对所有节点拥有 ssh 与 sudo 权限,且二者均需配置为免密。

您可以在执行剧本时通过 -k|-K 参数传入 ssh 与 sudo 密码,甚至可以通过 -e ansible_user=<another_user> 使用另一个用户来运行剧本。

但是,Pigsty强烈建议为管理员用户配置SSH无密码登录以及无密码的sudo
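免密 sudo 通常通过 /etc/sudoers.d 下的配置文件实现。以下是一个示意片段(用户名 vagrant 为假设,请替换为实际的管理员用户):

```
# /etc/sudoers.d/vagrant
vagrant ALL=(ALL) NOPASSWD: ALL
```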


如何使用现有管理员创建专用管理员用户?

使用以下命令,可以借助节点上现有的管理员用户,创建由 node_admin_username 定义的标准管理员用户:

./node.yml -k -K -e ansible_user=<another_admin> -t node_admin

如何使用节点上的HAProxy对外暴露服务?

您可以在配置中使用 haproxy_services 来暴露服务,并使用 node.yml -t haproxy_config,haproxy_reload 来更新配置。

使用它暴露 MinIO 服务的具体示例,请参考:暴露 MinIO 服务。
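haproxy_services 的大致形态如下。此处以一个假设的 minio 服务为例,端口与后端地址均为示意,具体字段以参数参考文档为准:

```yaml
haproxy_services:                    # 在节点 HAProxy 上额外暴露的服务列表
  - name: minio                      # 服务名称(必选)
    port: 9002                       # 对外暴露的端口(必选)
    options:                         # HAProxy 配置选项(可选)
      - option httpchk
      - http-check expect status 200
    servers:                         # 后端服务器列表
      - { name: minio-1, ip: 10.10.10.10, port: 9000, options: check }
```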


为什么我的 /etc/yum.repos.d/* 全没了?

Pigsty 会在 Infra 节点上构建一个包含所有依赖项的本地软件仓库。所有普通节点会根据 node_repo_modules 的默认值 local,引用并使用 Infra 节点上的这一本地软件源。

这一设计避免了安装过程对互联网的依赖,增强了安装过程的稳定性与可靠性。所有原有的源定义文件会被移动到 /etc/yum.repos.d/backup 目录中,您只需按需复制回来即可。
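如需恢复原有的源定义文件,将备份目录中的文件复制回去并刷新缓存即可(以下命令为示意,适用于 EL 系发行版):

```bash
sudo cp /etc/yum.repos.d/backup/*.repo /etc/yum.repos.d/   # 恢复备份的源定义
sudo yum makecache                                          # 重建软件源缓存
```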

如果您想在普通节点安装过程中保留原有的源定义文件,将 node_repo_remove 设置为 false 即可。

如果您想在 Infra 节点构建本地源的过程中保留原有的源定义文件,将 repo_remove 设置为 false 即可。


为什么我的命令行提示符变样了?怎么恢复?

Pigsty 使用的 Shell 命令行提示符是由环境变量 PS1 指定,定义在 /etc/profile.d/node.sh 文件中。

如果您不喜欢,想要修改或恢复原样,可以将这个文件移除,重新登录即可。


为什么我的主机名变了?

在两种情况下,Pigsty 会修改您的节点主机名:一是通过 nodename 参数显式指定了主机名,二是 PGSQL 节点启用 node_id_from_pg,从 PostgreSQL 实例借用了身份。

如果您不希望修改主机名,可以在全局/集群/实例层面修改 nodename_overwrite 参数为 false (默认值为 true)。

详情请参考 NODE_ID 一节。


腾讯云的 OpenCloudOS 有什么兼容性问题?

OpenCloudOS 上的 softdog 内核模块不可用,需要从 node_kernel_modules 中移除。在配置文件全局变量中添加以下配置项以覆盖:

node_kernel_modules: [ ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]

Debian 系统有哪些常见问题?

在 Debian/Ubuntu 系统上使用 Pigsty 时,可能遇到以下问题:

本地语言环境缺失

如果系统提示 locale 相关错误,可以使用以下命令修复:

localedef -i en_US -f UTF-8 en_US.UTF-8

缺少 rsync 工具

Pigsty 依赖 rsync 进行文件同步,如果系统未安装,可以使用以下命令安装:

apt-get install rsync