Monitoring Microservice Applications with Prometheus

Published: September 3, 2020
﻿
﻿
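Introduction
Prometheus is an open-source system monitoring and alerting framework. Inspired by Google's Borgmon monitoring system, it was created in 2012 by former Google engineers working at SoundCloud, developed as a community open-source project, and publicly released in 2015. In 2016 Prometheus joined the Cloud Native Computing Foundation (CNCF) as its second hosted project after Kubernetes.

Our company's distributed systems are currently built on Spring Cloud, whose simplicity and efficiency make life much easier for developers. The Spring stack also ships basic monitoring support: Actuator, for example, exposes a range of basic system information. For business developers, however, those metrics are too coarse. To describe the state of each microservice clearly and in detail, we use Prometheus to collect and analyze application metrics.

What you need:
1. Spring Boot (the microservice application being monitored)
2. Prometheus (the time-series database that collects the metrics)
3. Alertmanager (the alerting component)
4. Micrometer (the library that exposes the microservice's metrics)
5. Grafana (rich graphical dashboards for the metrics)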
﻿
▌Setting up Prometheus
﻿
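As a new-generation monitoring framework, Prometheus has the following characteristics:

A powerful multi-dimensional data model:
1. Time series are identified by a metric name and a set of key-value pairs (labels).
2. Any metric can carry arbitrary multi-dimensional labels.
3. The data model is flexible: metric names do not have to be encoded as dot-separated strings.
4. Data can be aggregated, sliced, and diced along its labels.
5. Sample values are double-precision floats, and label values may contain any Unicode characters.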
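
To make the data model concrete, here is a minimal Java sketch of how a metric name plus a label set identifies one time series. The class `SeriesId` and the metric `http_requests_total` are hypothetical names chosen for illustration; they are not part of Prometheus or of this article's application, and escaping of special characters in label values is left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: renders a series identity in the notation PromQL uses. */
final class SeriesId {
    private final String metric;
    private final TreeMap<String, String> labels; // sorted so the rendering is stable

    SeriesId(String metric, Map<String, String> labels) {
        this.metric = metric;
        this.labels = new TreeMap<>(labels);
    }

    /** Returns the metric name followed by its labels, e.g. name{k="v",...}. */
    String render() {
        if (labels.isEmpty()) {
            return metric;
        }
        String body = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return metric + "{" + body + "}";
    }

    public static void main(String[] args) {
        // Same metric name, different label values: two distinct time series.
        SeriesId get = new SeriesId("http_requests_total", Map.of("method", "GET", "status", "200"));
        SeriesId post = new SeriesId("http_requests_total", Map.of("method", "POST", "status", "200"));
        System.out.println(get.render());
        System.out.println(post.render());
    }
}
```

Because the labels are part of the identity, a query can later select or aggregate along any of them (for example all series with `status="200"`) without the metric name having to encode those dimensions.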
﻿
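A flexible and powerful query language (PromQL): a single query can combine multiple metrics with operations such as multiplication, addition, joins, and quantile calculation.

Easy to operate: the Prometheus server is a single binary that runs locally and does not depend on distributed storage.

Efficient: each sample takes about 3.5 bytes on average, and a single Prometheus server can handle millions of time series.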
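
As a rough worked example of what 3.5 bytes per sample means in practice, the sketch below estimates daily storage. The series count of one million is an assumption made up for illustration; the 3-second interval is the `scrape_interval` used in this article's configuration. Real usage also depends on compression ratios, indexes, and retention, which this estimate ignores.

```java
import java.util.Locale;

/** Back-of-the-envelope storage estimate; all inputs are illustrative assumptions. */
final class StorageEstimate {

    /** Bytes of sample data written per day for the given number of series. */
    static double bytesPerDay(long seriesCount, int scrapeIntervalSeconds, double bytesPerSample) {
        long samplesPerSeriesPerDay = 86_400L / scrapeIntervalSeconds;
        return seriesCount * samplesPerSeriesPerDay * bytesPerSample;
    }

    public static void main(String[] args) {
        // 1,000,000 series, one sample every 3 s, 3.5 bytes per sample
        double bytes = bytesPerDay(1_000_000L, 3, 3.5);
        System.out.printf(Locale.ROOT, "%.1f GB per day%n", bytes / 1e9);
    }
}
```

One series scraped every 3 seconds produces 28,800 samples a day, or about 100.8 KB. A million such series come to roughly 100.8 GB a day, which is why a longer scrape interval is usually chosen outside of demos.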
﻿
Time series are collected with a pull model over HTTP, which makes local testing easy and prevents a misbehaving server from pushing bad metrics.
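
The pull model can be sketched with nothing but the JDK: the application exposes its current state on an HTTP endpoint, and the scraper decides when to read it. `PullTarget`, the counter `demo_requests_total`, and the use of an OS-assigned port are all assumptions for this sketch. A real service would use a metrics library such as Micrometer rather than hand-written rendering.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal pull-model target: it only exposes state and never pushes anything. */
final class PullTarget {
    // Hypothetical counter that the application would increment as it works.
    static final AtomicLong REQUESTS = new AtomicLong();

    /** Renders the current value in the Prometheus text exposition format. */
    static String render() {
        return "# TYPE demo_requests_total counter\n"
                + "demo_requests_total " + REQUESTS.get() + "\n";
    }

    /** Starts an HTTP server that answers scrapes on /metrics. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        REQUESTS.incrementAndGet();
        HttpServer server = start(0); // port 0 lets the OS pick a free port
        int port = server.getAddress().getPort();
        // Play the role of the Prometheus server: pull the current state once.
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + "/metrics")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.print(response.body());
        server.stop(0);
    }
}
```

Since the target never initiates a connection, the same endpoint can be inspected from a browser or `curl` during local testing, and a broken instance simply shows up as a failed scrape.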
﻿
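Time series can also be pushed to an intermediate Pushgateway, which the Prometheus server then scrapes.

Multiple options for graphing and dashboards.

Easy to scale.

Note that because samples can be lost during collection, Prometheus is not suitable when the collected data must be 100% accurate. For recording time-series data, however, it offers strong querying capabilities, and it fits microservice architectures well.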
﻿
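# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz
# Extract and enter the directory
tar -zxvf prometheus-2.3.2.linux-amd64.tar.gz
cd prometheus-2.3.2.linux-amd64
# Edit the configuration file
vim prometheus.yml

global:
  scrape_interval: 3s       # how often targets are scraped
  evaluation_interval: 3s   # how often rules are evaluated
rule_files:
  - "/home/prometheus/rules/*.rules"
scrape_configs:
  - job_name: 'prometheus'  # Prometheus scrapes its own metrics
    static_configs:
    - targets: ['127.0.0.1:9091']

# Start (--web.enable-lifecycle enables the /-/reload endpoint used later)
./prometheus --web.listen-address=0.0.0.0:9091 --web.enable-lifecycle
Open http://127.0.0.1:9091/; if the Prometheus web UI loads, the installation succeeded.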
﻿
▌Setting up Alertmanager

1. Download Alertmanager

# Download
wget https://github.com/prometheus/alertmanager/releases/download/v0.15.2/alertmanager-0.15.2.linux-amd64.tar.gz
# Extract and enter the directory
tar -zxvf alertmanager-0.15.2.linux-amd64.tar.gz
cd alertmanager-0.15.2.linux-amd64

2. Configure alertmanager.yml

vim alertmanager.yml
global:
  resolve_timeout: 2h
  smtp_from: '{{ template "email.from" . }}'
  smtp_smarthost: 'smtp.exmail.qq.com:25'
  smtp_auth_username: 'xxxx@analysys.com.cn'
  smtp_auth_password: 'yourpassword'
templates:
- '/home/prometheus/alertmanager/template/*.tmpl'
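# A configuration has exactly one root route; further routes are nested under "routes".
route:
  group_by: ['node_up']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 1h
  receiver: 'dingding'        # default receiver
  routes:
  # Assumption: the second top-level "route" block of the original listing was
  # meant as a child route for alerts named webhook_http; adjust the matcher
  # to your own labels.
  - match:
      alertname: 'webhook_http'
    group_by: ['webhook_http']
    group_wait: 5s
    group_interval: 5s
    repeat_interval: 1h
    receiver: 'email'
inhibit_rules:
# While a critical alert is firing, mute warning alerts that have the same
# values for the labels listed under "equal".
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['node_up']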
receivers:
- name: 'wechat'
  wechat_configs:
  - to_user: '@all'
    agent_id: '1000002'
    corp_id: 'corp_id'
    api_secret: 'api_secret'
    message: '{{ template "wechat.text" . }}'
    send_resolved: true
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    text: '{{ template "email.to.text" . }}'
    send_resolved: true
- name: 'dingding'
  webhook_configs:
  - url: 'http://127.0.0.1:9101/hook/dingding'
    send_resolved: true
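
The `inhibit_rules` entry in alertmanager.yml is easier to reason about as code. The following Java sketch models one rule with plain maps; `InhibitRule` is a hypothetical class, not Alertmanager's implementation, and it uses an `instance` label in `equal` purely for illustration. It treats a missing label like an empty value and leaves out details such as regex matchers.

```java
import java.util.List;
import java.util.Map;

/** Sketch of how one inhibit rule decides whether an alert is muted. */
final class InhibitRule {
    private final Map<String, String> sourceMatch; // labels of the alert that does the muting
    private final Map<String, String> targetMatch; // labels of the alert that gets muted
    private final List<String> equal;              // label names whose values must agree

    InhibitRule(Map<String, String> sourceMatch, Map<String, String> targetMatch, List<String> equal) {
        this.sourceMatch = sourceMatch;
        this.targetMatch = targetMatch;
        this.equal = equal;
    }

    /** True if every matcher entry equals the alert's label value (missing = empty). */
    private static boolean matches(Map<String, String> matcher, Map<String, String> labels) {
        return matcher.entrySet().stream()
                .allMatch(e -> e.getValue().equals(labels.getOrDefault(e.getKey(), "")));
    }

    /** True if the target alert is muted by at least one currently firing alert. */
    boolean inhibits(Map<String, String> target, List<Map<String, String>> firing) {
        if (!matches(targetMatch, target)) {
            return false;
        }
        for (Map<String, String> source : firing) {
            if (!matches(sourceMatch, source)) {
                continue;
            }
            boolean sameValues = equal.stream().allMatch(
                    name -> source.getOrDefault(name, "").equals(target.getOrDefault(name, "")));
            if (sameValues) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InhibitRule rule = new InhibitRule(
                Map.of("severity", "critical"), Map.of("severity", "warning"), List.of("instance"));
        List<Map<String, String>> firing = List.of(Map.of("severity", "critical", "instance", "a"));
        // A warning on the same instance is muted, a warning on another instance is not.
        System.out.println(rule.inhibits(Map.of("severity", "warning", "instance", "a"), firing));
        System.out.println(rule.inhibits(Map.of("severity", "warning", "instance", "b"), firing));
    }
}
```

Note that `equal` lists label names, not alert names. If none of the listed labels exists on either alert, the values compare as equal and the rule applies to every matching pair.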
﻿
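3. Example template file

Save the file with a .tmpl extension under /home/prometheus/alertmanager/template/ so that the templates glob in alertmanager.yml picks it up. The `__my_text_alert_list` template it references is not shown here and must be defined in one of the template files.

{{ define "email.from" }}xxxx@analysys.com.cn{{ end}}
{{ define "email.to" }}xxxx@analysys.com.cn{{ end}}
{{ define "email.to.text" }}
Alert:
{{ template "__my_text_alert_list" .Alerts.Firing }}
{{ end}}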
﻿
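4. Configure the Prometheus alerting rule

# Save the file under /home/prometheus/rules/ so that the rule_files glob matches it
vim node_alert.rules
groups:
- name: node_up
  rules:
  - alert: node_up
    expr: up{job="node"} == 0   # the last scrape of the "node" job failed
    for: 15s                    # the condition must hold for 15s before the alert fires
    labels:
    annotations:
      summary: "{{ $labels.instance }} has stopped running!"

# Start Alertmanager
./alertmanager --config.file=alertmanager.yml

# Reload the Prometheus configuration
curl -X POST http://127.0.0.1:9091/-/reload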
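
The `for: 15s` clause in the rule means the alert does not fire on the first failed evaluation. A minimal Java sketch of that pending/firing behaviour follows. `AlertState` is a hypothetical class written for this article, not Prometheus code, and it assumes evaluation timestamps in whole seconds.

```java
import java.time.Duration;

/** Sketch of the state machine behind an alerting rule's "for" clause. */
final class AlertState {
    enum State { INACTIVE, PENDING, FIRING }

    private final Duration holdFor;
    private State state = State.INACTIVE;
    private long activeSinceSeconds;

    AlertState(Duration holdFor) {
        this.holdFor = holdFor;
    }

    /** Feeds one rule evaluation; nowSeconds is the evaluation timestamp. */
    State evaluate(long nowSeconds, boolean conditionTrue) {
        if (!conditionTrue) {
            state = State.INACTIVE; // any false evaluation resets the timer
            return state;
        }
        if (state == State.INACTIVE) {
            state = State.PENDING;
            activeSinceSeconds = nowSeconds;
        }
        if (state == State.PENDING && nowSeconds - activeSinceSeconds >= holdFor.getSeconds()) {
            state = State.FIRING;
        }
        return state;
    }

    public static void main(String[] args) {
        AlertState alert = new AlertState(Duration.ofSeconds(15));
        // evaluation_interval is 3s in this article; the target stays down from t=0 on
        for (long t = 0; t <= 18; t += 3) {
            System.out.println("t=" + t + "s " + alert.evaluate(t, true));
        }
    }
}
```

With a 3-second evaluation interval the alert stays pending from t=0 through t=12 and turns to firing at t=15. A short `for` keeps the demo responsive, while a longer one filters out brief scrape failures.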
﻿
5. View the rules in the Prometheus web UI
﻿
﻿
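▌Setting up Grafana and configuring the Prometheus data source

Grafana is an open-source metrics analytics and visualization suite. It is most commonly used to visualize time-series data for infrastructure and application analytics, but it is also widely used in other domains, including industrial sensors, home automation, weather, and process control.

Grafana supports many different data sources. Each data source has its own query editor, tailored to the features and capabilities that the particular data source exposes.

# Download Grafana
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2.linux-amd64.tar.gz
# Extract
tar -zxvf grafana-5.2.2.linux-amd64.tar.gz
# Run with the default configuration (from the bin directory of the extracted release)
./grafana-server -config /home/prometheus/grafana/grafana-5.2.2/conf/defaults.ini

1. Open http://127.0.0.1:3000/
Log in with the default credentials: admin / admin

2. Configure the data source
Add a data source of type Prometheus that points at the server started earlier (http://127.0.0.1:9091).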
﻿
﻿
﻿
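▌Instrumenting the Spring Boot microservice

Micrometer is a metrics instrumentation library that lets you instrument JVM-based application code without vendor lock-in.

1. Add the dependencies to the project's pom.xml

<!-- Goes inside the <properties> section of the pom -->
<micrometer.version>1.0.6</micrometer.version>

<!-- The following go inside the <dependencies> section of the pom -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Micrometer integration for Spring Boot 1.5.x -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-spring-legacy</artifactId>
    <version>${micrometer.version}</version>
</dependency>

<!-- Prometheus registry: exposes the metrics in the Prometheus text format -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>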
﻿
2. Write the configuration class

package cn.analysys.monitor.alertmanager.config;

import org.apache.log4j.Logger;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.spring.autoconfigure.MeterRegistryCustomizer;

/**
 * Micrometer configuration: tags every metric with the application name.
 *
 * @author litaiqing
 * @date 2018-08-14 19:10:55
 * @version 1.0
 * @since JDK 1.8
 */
@Configuration
public class MicrometerConfiguration {

    private static final Logger logger = Logger.getLogger(MicrometerConfiguration.class);

    /**
     * Adds an "application" tag to every meter so that series from different
     * services can be told apart in Prometheus and Grafana.
     */
    @Bean
    @ConditionalOnMissingBean
    MeterRegistryCustomizer<MeterRegistry> meterRegistryCustomizer(
            @Value("${spring.application.name}") String application) {
        logger.info(application);
        return registry -> registry.config().commonTags("application", application);
    }

}
﻿
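3. Configure application.yml

spring:
  application:
    name: alertmanager-webhook   # becomes the "application" tag on every metric
server:
  port: 9101
  tomcat:
    uri-encoding: UTF-8
  context-path: /

# HTTP basic authentication (takes effect only with Spring Security on the classpath)
security:
  basic:
    enabled: true
  user:
    name: webhook
    password: analysys_cs

endpoints:
  metrics:
    enabled: true
  health:
    enabled: true
    path: /health

4. Package and deploy the application

Open http://127.0.0.1:9101/prometheus to see the application's live metrics.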
﻿
▌Letting Prometheus scrape the microservice's metrics

1. Configure the Prometheus scrape jobs

global:
  scrape_interval:     3s
  evaluation_interval: 3s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093          # Alertmanager's default port
rule_files:
  - "/home/prometheus/rules/*.rules"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['127.0.0.1:9091']
  - job_name: 'node'            # assumes a node_exporter running on its default port 9100
    static_configs:
    - targets: ['127.0.0.1:9100']
  - job_name: 'alertmanager-webhook'   # the Spring Boot application
    metrics_path: /prometheus
    basic_auth:                 # must match security.user in the application's configuration
      username: webhook
      password: analysys_cs
    static_configs:
    - targets: ['127.0.0.1:9101']

# Reload the Prometheus configuration
curl -X POST http://127.0.0.1:9091/-/reload
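
When a scrape of the `alertmanager-webhook` job fails with HTTP 401, it helps to know what the `basic_auth` block actually sends. The sketch below builds the standard HTTP basic authentication header from the credentials in the scrape job. `BasicAuthHeader` is a hypothetical helper for illustration, not something Prometheus or Spring provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds the Authorization header used for HTTP basic authentication. */
final class BasicAuthHeader {

    /** Returns "Basic " followed by base64("username:password"). */
    static String of(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Credentials taken from the scrape job in this article
        System.out.println("Authorization: " + of("webhook", "analysys_cs"));
    }
}
```

The same request can be reproduced by hand with `curl -u webhook:analysys_cs http://127.0.0.1:9101/prometheus`, which is a quick way to check the endpoint before pointing Prometheus at it.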
﻿
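2. Import a dashboard template into Grafana

The imported dashboard then shows the JVM monitoring panels.

▌Alert notifications

When the node target goes down, the node_up alert fires and a notification is delivered to the configured receiver.

When the node target comes back up, a resolved notification is delivered, because the receivers are configured with send_resolved: true.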
﻿
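▌Summary

This example covers only the main flow of microservice monitoring. In production, the monitoring and alerting system itself must be highly reliable.

Prometheus should therefore run on several servers, each scraping the same targets and storing the data independently. Alertmanager should be deployed as a cluster, which uses a gossip protocol to coordinate deduplication and noise reduction of notifications.

Because microservices come online and go offline frequently and iterate quickly, automated service discovery (for example Consul) should replace the manual static configuration used here.

The Analysys engineering team has already solved these problems and put the solutions into production. This article shares only the overall approach and the hands-on steps; a follow-up article will cover Prometheus microservice monitoring in production.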