
Sending TiDB Alerts to a WeCom (企业微信) Group in Practice

  • 2023-06-30
    Beijing

Author: 像风一样的男子
Original source: https://tidb.net/blog/199877a0

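Background

TiDB ships with Prometheus + Grafana to monitor the performance metrics of the TiKV, TiDB, and PD components in a cluster. What this setup does not provide is real-time alert notification. This article describes how to send alerts to a WeCom (企业微信, Enterprise WeChat) group through a webhook.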

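Tool installation

PrometheusAlert is an open-source message forwarding system for operations alerting. It accepts alert messages from the monitoring system Prometheus, the logging system Graylog, and the data visualization system Grafana. Supported notification channels include DingTalk, WeChat, WeCom, Huawei Cloud SMS, Tencent Cloud SMS, Tencent Cloud voice calls, Alibaba Cloud SMS, and Alibaba Cloud voice calls.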


Project URL: https://github.com/feiyu563/PrometheusAlert


Install it step by step following the README.


After installation, open the PrometheusAlert web page:



Make a note of the template address shown on that page; it is needed later.


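Alerting option 1: Grafana + PrometheusAlert (did not get it working)

A TiDB cluster comes with a built-in Grafana instance, which supports alerting and ships with default alert rules.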



Add a notification channel, choose the webhook type, and set the URL to http://10.20.10.118:8080/prometheusalert?type=wx&tpl=grafana-wx&wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxxxxxxx&at=xxxxxxx


Here key is the key of the alert robot in the WeCom group.
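
The address nests the WeCom robot URL inside the PrometheusAlert URL, which makes it easy to mistype. The Python sketch below assembles the address from its parts and parses it back to show which query parameter carries which value. The function names and placeholder values are illustrative assumptions; the parameter names (type, tpl, wxurl, at) are the ones that appear in the address above.

```python
from urllib.parse import urlsplit, parse_qs

WECOM_SEND_URL = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send"


def build_prometheusalert_url(base, tpl, robot_key, at=None):
    """Assemble the webhook address in the same unencoded form used in this article.

    base      -- PrometheusAlert address, e.g. "http://10.20.10.118:8080"
    tpl       -- template name, e.g. "grafana-wx" or "prometheus-wx"
    robot_key -- key of the WeCom group robot
    at        -- optional value for the "at" parameter
    """
    url = f"{base}/prometheusalert?type=wx&tpl={tpl}&wxurl={WECOM_SEND_URL}?key={robot_key}"
    if at:
        url += f"&at={at}"
    return url


def explain(url):
    """Return the query parameters of the address as a flat dict."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


if __name__ == "__main__":
    address = build_prometheusalert_url(
        "http://10.20.10.118:8080", "grafana-wx", "xxxxxxxxxxxxx", at="xxxxxxx"
    )
    print(address)
    for name, value in explain(address).items():
        print(f"{name:6} = {value}")
```

Parsing the address this way shows that everything up to the next & belongs to wxurl, so the robot key travels inside wxurl while at is a separate PrometheusAlert parameter.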




Test result: the WeCom group receives the test message as expected.
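
If the test message does not arrive, one way to narrow the problem down is to post to the WeCom robot webhook directly, bypassing Grafana and PrometheusAlert. The sketch below uses only the Python standard library. The function names and the message text are illustrative assumptions, and the key must be replaced with a real robot key before anything is sent.

```python
import json
import urllib.request

WECOM_SEND_URL = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send"


def build_text_message(content, mentioned=None):
    """Build the JSON body of a plain-text WeCom group robot message."""
    text = {"content": content}
    if mentioned:
        text["mentioned_list"] = list(mentioned)
    return {"msgtype": "text", "text": text}


def build_request(robot_key, message):
    """Prepare (but do not send) the POST request for the robot webhook."""
    return urllib.request.Request(
        f"{WECOM_SEND_URL}?key={robot_key}",
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the prepared request and return the decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(build_request("<robot key>", build_text_message("test"))) from a host with network access should make the message appear in the group if the key is valid. If it does, the remaining problem lies in the Grafana or PrometheusAlert side of the chain.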



Going back to the Grafana alert rule editing page, however, a red warning appears: "Template variables are not supported in alert queries".



The official response:


https://github.com/grafana/grafana/issues/9334


Template variables are not supported in alerting.


Template variables should be used for discovery and drill down. Not controlling alert rules


The default monitoring dashboards use template variables, which Grafana alerting does not support.


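Workaround: copy the dashboard's JSON and replace every variable in it with a constant.


That is a fair amount of work, so I gave up on this option. Interested readers are welcome to try it.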
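
For readers who do want to try it, the replacement can be scripted rather than done by hand. The sketch below is a minimal, untested-against-TiDB illustration. It assumes the queries are stored under "expr" keys in the exported dashboard JSON (as Prometheus panel targets usually are) and that variables are written as $name, ${name} or [[name]]. The function names and the constant values are hypothetical.

```python
import json
import re


def substitute_variables(text, constants):
    """Replace $name, ${name} and [[name]] references with constant values."""
    for name, value in constants.items():
        escaped = re.escape(name)
        alternatives = [
            r"\$\{" + escaped + r"\}",    # ${name}
            r"\[\[" + escaped + r"\]\]",  # [[name]]
            r"\$" + escaped + r"\b",      # $name, but not $name_suffix
        ]
        pattern = re.compile("|".join(alternatives))
        # A function replacement avoids backslash handling in the value.
        text = pattern.sub(lambda _match, v=value: v, text)
    return text


def freeze_queries(node, constants):
    """Return a copy of the dashboard with every "expr" string rewritten."""
    if isinstance(node, dict):
        return {
            key: substitute_variables(val, constants)
            if key == "expr" and isinstance(val, str)
            else freeze_queries(val, constants)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [freeze_queries(item, constants) for item in node]
    return node


def freeze_dashboard_file(src_path, dst_path, constants):
    """Read an exported dashboard JSON file and write the rewritten copy."""
    with open(src_path, encoding="utf-8") as src:
        dashboard = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(freeze_queries(dashboard, constants), dst, ensure_ascii=False, indent=2)
```

Only the query expressions are rewritten; panel titles and the variable definitions themselves are left alone. Whether the rewritten dashboard then passes Grafana's alert validation still has to be checked after importing it.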

Alerting option 2: Prometheus Alertmanager + PrometheusAlert

Use the Alertmanager that ships with the TiDB cluster to forward alerts to PrometheusAlert.


Alertmanager address: http://IP:9093/#/alerts



Edit the Prometheus configuration file prometheus.yml.


Add the following configuration:


alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - '10.20.10.61:9093'



Edit the Alertmanager configuration file alertmanager.yml so that the default receiver is a webhook pointing at PrometheusAlert:


global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: "localhost:25"
  smtp_from: "alertmanager@example.org"
  smtp_auth_username: "alertmanager"
  smtp_auth_password: "password"
  # smtp_require_tls: true

  # The Slack webhook URL.
  # slack_api_url: ''

route:
  # A default receiver
  # receiver: "blackhole"
  receiver: "webhook1"

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ["env", "instance", "alertname", "type", "group", "job"]

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 3m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3m

  routes:
  # - match:
  #   receiver: webhook-kafka-adapter
  #   continue: true
  # - match:
  #     env: test-cluster
  #   receiver: db-alert-slack
  # - match:
  #     env: test-cluster
  #   receiver: db-alert-email

receivers:
  # - name: 'webhook-kafka-adapter'
  #   webhook_configs:
  #   - send_resolved: true
  #     url: 'http://10.0.3.6:28082/v1/alertmanager'

  # - name: 'db-alert-slack'
  #   slack_configs:
  #   - channel: '#alerts'
  #     username: 'db-alert'
  #     icon_emoji: ':bell:'
  #     title: '{{ .CommonLabels.alertname }}'
  #     text: '{{ .CommonAnnotations.summary }} {{ .CommonAnnotations.description }} expr: {{ .CommonLabels.expr }} http://172.0.0.1:9093/#/alerts'

  # - name: "db-alert-email"
  #   email_configs:
  #   - send_resolved: true
  #     to: "example@example.com"

  - name: webhook1
    webhook_configs:
    - url: 'http://10.20.10.118:8080/prometheusalert?type=wx&tpl=prometheus-wx&wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxxxxx&at=xxxxxxxxx'
      send_resolved: true  # whether to also notify when an alert is resolved

  # This doesn't alert anything, please configure your own receiver
  # - name: "blackhole"
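
To exercise the webhook1 receiver without waiting for a real alert, the request that Alertmanager sends to a webhook can be imitated by hand. The sketch below builds a payload in Alertmanager's webhook JSON format (version 4) and prepares a POST to the PrometheusAlert address from the configuration. The alert name, labels, annotation text and groupKey are made-up sample values, the function names are hypothetical, and how the prometheus-wx template renders the fields depends on the PrometheusAlert installation.

```python
import json
import urllib.request
from datetime import datetime, timezone


def sample_alertmanager_payload(alertname="TestAlert", status="firing", receiver="webhook1"):
    """Build a hand-made payload shaped like an Alertmanager webhook request."""
    labels = {"alertname": alertname, "env": "test-cluster", "instance": "10.20.10.61:9090"}
    annotations = {"summary": "sample alert", "description": "sent by hand to test the receiver"}
    started = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    alert = {
        "status": status,
        "labels": dict(labels),
        "annotations": dict(annotations),
        "startsAt": started,
        "endsAt": "0001-01-01T00:00:00Z",  # placeholder zero time: not ended yet
        "generatorURL": "",
    }
    return {
        "version": "4",
        "groupKey": "manual-test",  # arbitrary sample value
        "receiver": receiver,
        "status": status,
        "alerts": [alert],
        "groupLabels": dict(labels),
        "commonLabels": dict(labels),
        "commonAnnotations": dict(annotations),
        "externalURL": "http://10.20.10.61:9093",
    }


def prepare_post(url, payload):
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Passing the url value from the webhook1 receiver to prepare_post and then calling send should produce a message in the WeCom group if PrometheusAlert and the robot key are set up correctly. This isolates the PrometheusAlert half of the chain from Prometheus and Alertmanager.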


Restart the Alertmanager and Prometheus services:


tiup cluster restart cluster-name -N ip:9093


tiup cluster restart cluster-name -N ip:9090
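
After the restart, one way to confirm that Prometheus has picked up the Alertmanager target is to query the Prometheus HTTP API endpoint /api/v1/alertmanagers. The sketch below parses that response and lists the active Alertmanager URLs. The function names are illustrative, and the Prometheus address passed in is an assumption to be replaced with the real host and port.

```python
import json
import urllib.request


def active_alertmanagers(api_response):
    """Extract Alertmanager URLs from a /api/v1/alertmanagers response body."""
    if api_response.get("status") != "success":
        raise ValueError(f"unexpected response status: {api_response.get('status')!r}")
    return [am["url"] for am in api_response["data"]["activeAlertmanagers"]]


def fetch_active_alertmanagers(prometheus_base, timeout=10):
    """Query a running Prometheus, e.g. prometheus_base="http://IP:9090"."""
    url = f"{prometheus_base}/api/v1/alertmanagers"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return active_alertmanagers(json.load(resp))
```

If the list returned by fetch_active_alertmanagers does not contain an entry for 10.20.10.61:9093, the alerting section in prometheus.yml was probably not loaded.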


Resulting alert in the WeCom group:


