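Monitoring and Alerting

How do you self-monitor the infrastructure in Pigsty?

This page describes the monitoring dashboards and alert rules of the INFRA module in Pigsty.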


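Dashboards

Pigsty provides the following dashboards for the INFRA module:

Dashboard                 Description
Pigsty Home               Home page of the Pigsty monitoring system
INFRA Overview            Overview of Pigsty infrastructure self-monitoring
Nginx Instance            Nginx metrics and logs
Grafana Instance          Grafana metrics and logs
VictoriaMetrics Instance  VictoriaMetrics scrape and query status
VMAlert Instance          Alert rule evaluation status
Alertmanager Instance     Alert aggregation and notification
VictoriaLogs Instance     Log ingestion, querying, and indexing
Logs Instance             Logs from a single node
VictoriaTraces Instance   Trace storage and querying
Inventory CMDB            CMDB visualization
ETCD Overview             etcd cluster monitoring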

Alert Rules

Pigsty provides the following two alert rules for the INFRA module:

Alert Rule  Description
InfraDown   An infrastructure component is down
AgentDown   A monitoring agent is down

You can modify the existing infrastructure alert rules, or add new ones, in files/victoria/rules/infra.yml.
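As an illustration of adding a rule to that file, the sketch below appends one extra entry to the `infra-alert` group. The rule name `InfraMetricAbsent`, the 5m duration, and the `level: 1` / `severity: WARN` labels are hypothetical choices made for this example and are not rules shipped with Pigsty. It relies only on the `infra_up` metric used by the existing rules and on the standard `absent()` query function.

```yaml
- name: infra-alert
  rules:

    # ... existing InfraDown / AgentDown rules stay unchanged ...

    # Hypothetical example rule (not part of Pigsty): fires when no infra_up
    # series exists at all, for example because scraping has stopped entirely.
    # In that case `infra_up < 1` returns nothing, so InfraDown cannot fire.
    - alert: InfraMetricAbsent
      expr: absent(infra_up)
      for: 5m
      # assumed label convention: level 1 / WARN for a non-critical alert
      labels: { level: 1, severity: WARN, category: infra }
      annotations:
        summary: "WARN InfraMetricAbsent"
        description: |
          no infra_up series has been observed for 5m
```

The new entry follows the layout of the existing rules (`alert`, `expr`, `for`, `labels`, `annotations`), so it can sit alongside them in the same group. How an edited rules file gets reloaded depends on the deployment and is not covered here.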

Alert Rule Configuration

################################################################
#                Infrastructure Alert Rules                    #
################################################################
- name: infra-alert
  rules:

    #==============================================================#
    #                       Infra Aliveness                        #
    #==============================================================#
    # an infra component (victoria, grafana) down for 1m triggers a CRIT (level 0) alert
    - alert: InfraDown
      expr: infra_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: infra }
      annotations:
        summary: "CRIT InfraDown {{ $labels.type }}@{{ $labels.instance }}"
        description: |
          infra_up[type={{ $labels.type }}, instance={{ $labels.instance }}] = {{ $value  | printf "%.2f" }} < 1

    #==============================================================#
    #                       Agent Aliveness                        #
    #==============================================================#

    # agent aliveness is determined directly by exporter aliveness
    # including: node_exporter, pg_exporter, pgbouncer_exporter, haproxy_exporter
    - alert: AgentDown
      expr: agent_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: infra }
      annotations:
        summary: 'CRIT AgentDown {{ $labels.ins }}@{{ $labels.instance }}'
        description: |
          agent_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value  | printf "%.2f" }} < 1

Last modified 2025-12-20: update some docs to v4.0 (6c231c3)