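Background
Milvus Standalone is deployed as a single-machine server, with all components packaged into one Docker image, which makes it very convenient to deploy. For medium-sized datasets, running Milvus Standalone on a single machine with enough memory is a good choice. In addition, Milvus Standalone supports high availability through master-slave replication.
Milvus natively supports Prometheus for monitoring metrics, and Grafana for visualizing them and creating alerts. However, the documentation only lists the steps for deploying the monitoring services on Kubernetes (https://milvus.io/docs/zh/monitor.md). In fact, Prometheus and Grafana can also be deployed alongside Milvus Standalone to monitor the Milvus service.
Deploying monitoring for Milvus Standalone with Docker Compose
First, let's look at the complete Docker Compose file: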
    services:
      etcd:
        container_name: milvus-etcd
        image: quay.io/coreos/etcd:v3.5.5
        environment:
          - ETCD_AUTO_COMPACTION_MODE=revision
          - ETCD_AUTO_COMPACTION_RETENTION=1000
          - ETCD_QUOTA_BACKEND_BYTES=4294967296
          - ETCD_SNAPSHOT_COUNT=50000
        volumes:
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
        command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
        healthcheck:
          test: ["CMD", "etcdctl", "endpoint", "health"]
          interval: 30s
          timeout: 20s
          retries: 3

      minio:
        container_name: milvus-minio
        image: minio/minio:RELEASE.2023-03-20T20-16-18Z
        environment:
          MINIO_ACCESS_KEY: minioadmin
          MINIO_SECRET_KEY: minioadmin
        ports:
          - "9001:9001"
          - "9000:9000"
        volumes:
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
        command: minio server /minio_data --console-address ":9001"
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
          interval: 30s
          timeout: 20s
          retries: 3

      standalone:
        container_name: milvus-standalone
        image: milvusdb/milvus:v2.4.11
        command: ["milvus", "run", "standalone"]
        security_opt:
          - seccomp:unconfined
        environment:
          ETCD_ENDPOINTS: etcd:2379
          MINIO_ADDRESS: minio:9000
        volumes:
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
          - ./milvus.yaml:/milvus/configs/milvus.yaml
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
          interval: 30s
          start_period: 90s
          timeout: 20s
          retries: 3
        ports:
          - "19530:19530"
          - "9091:9091"
        depends_on:
          - "etcd"
          - "minio"

      prometheus:
        image: prom/prometheus
        container_name: prometheus
        user: root
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
        ports:
          - 9090:9090
        restart: unless-stopped
        volumes:
          - ./prometheus:/etc/prometheus
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/prometheus:/prometheus

      grafana:
        image: grafana/grafana
        container_name: grafana
        user: root
        ports:
          - 3000:3000
        restart: unless-stopped
        environment:
          - GF_SECURITY_ADMIN_USER=admin
          - GF_SECURITY_ADMIN_PASSWORD=grafana
        volumes:
          - ./grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
          - ./grafana/dashboard.yml:/etc/grafana/provisioning/dashboards/main.yml
          - ./grafana/dashboards:/var/lib/grafana/dashboards
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/grafana:/var/lib/grafana

    networks:
      default:
        name: milvus
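Deploying Prometheus
Milvus exports the metrics of each Milvus component at http://<component-host>:9091/metrics for Prometheus to scrape, so we set this address in the scrape_configs section of the Prometheus configuration. Inside the compose network the Milvus container is reachable by its service name, which makes the target standalone:9091.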
    scrape_configs:
      # Scrape the metrics exported by the Milvus Standalone container
      - job_name: 'milvus-standalone'
        honor_labels: true
        metrics_path: /metrics
        static_configs:
          - targets: ['standalone:9091']
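For context, the compose file mounts ./prometheus to /etc/prometheus and starts Prometheus with --config.file=/etc/prometheus/prometheus.yml, so the scrape configuration belongs in ./prometheus/prometheus.yml. A minimal sketch of the whole file follows; the global section and its 15-second intervals are assumptions for illustration, not values taken from the original template.

```yaml
# ./prometheus/prometheus.yml (sketch; the global values below are assumed)
global:
  scrape_interval: 15s      # how often targets are scraped (assumed value)
  evaluation_interval: 15s  # how often rules are evaluated (assumed value)

scrape_configs:
  - job_name: 'milvus-standalone'
    honor_labels: true
    metrics_path: /metrics
    static_configs:
      # "standalone" is the compose service name of the Milvus container
      - targets: ['standalone:9091']
```

Once the stack is running, the Targets page of the Prometheus UI on port 9090 should list the milvus-standalone job if the scrape is working.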
 
We also add the Prometheus service to the Docker Compose file:
      prometheus:
        image: prom/prometheus
        container_name: prometheus
        user: root
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
        ports:
          - 9090:9090
        restart: unless-stopped
        volumes:
          - ./prometheus:/etc/prometheus
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/prometheus:/prometheus
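Deploying Grafana
In the Prometheus deployment above, we exposed Prometheus on port 9090. We therefore define the Prometheus data source for Grafana with that address, in the file that the compose file mounts from ./grafana/datasource.yml: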
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus:9090
        isDefault: true
        access: proxy
        editable: true
 
A ready-made Milvus Standalone monitoring dashboard is available at https://github.com/milvus-io/milvus-docs/blob/v2.5.x/assets/standalone-monitoring/grafana/dashboards/milvus-standalone-dashboard.json
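The compose file also mounts ./grafana/dashboard.yml into Grafana's dashboard provisioning directory, but its contents are not shown in this article. A minimal sketch of such a provider file is given here, assuming Grafana's file-based dashboard provisioning format; the provider name and option values are illustrative, and the actual file in the repository may differ.

```yaml
# ./grafana/dashboard.yml (sketch; name and flags are illustrative assumptions)
apiVersion: 1

providers:
  - name: 'Milvus'          # hypothetical provider name
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      # Must match the container path where ./grafana/dashboards is mounted
      path: /var/lib/grafana/dashboards
```

With a provider like this, any dashboard JSON placed in ./grafana/dashboards, such as the milvus-standalone-dashboard.json file, would be loaded when Grafana starts.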
Likewise, we need to add the Grafana service to the Docker Compose file:
      grafana:
        image: grafana/grafana
        container_name: grafana
        user: root
        ports:
          - 3000:3000
        restart: unless-stopped
        environment:
          - GF_SECURITY_ADMIN_USER=admin
          - GF_SECURITY_ADMIN_PASSWORD=grafana
        volumes:
          - ./grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
          - ./grafana/dashboard.yml:/etc/grafana/provisioning/dashboards/main.yml
          - ./grafana/dashboards:/var/lib/grafana/dashboards
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/grafana:/var/lib/grafana
 
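At this point, we can open the Grafana UI at http://<your-host>:3000 and log in with the admin account set in the compose file (user admin, password grafana).
From there, we can view the Milvus Standalone monitoring dashboard.
The complete Docker Compose file and the related files are available at https://github.com/milvus-io/milvus-docs/tree/v2.5.x/assets/standalone-monitoring . Note that this template uses Milvus 2.4.11 as an example; to run a different Milvus version, change the tag of the Docker image accordingly.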
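One way to make the version easier to change is to parameterize the image tag with Compose variable interpolation, the same mechanism the template already uses for DOCKER_VOLUME_DIRECTORY. This is only a sketch: MILVUS_VERSION is a hypothetical variable name that does not appear in the original template.

```yaml
services:
  standalone:
    # MILVUS_VERSION is a hypothetical variable; when it is unset,
    # the tag falls back to v2.4.11, the version used in the template
    image: milvusdb/milvus:${MILVUS_VERSION:-v2.4.11}
```

Setting MILVUS_VERSION in the shell environment or in a .env file next to the compose file would then select the image tag without editing the file itself.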
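Summary
This article showed how to add Prometheus and Grafana to a Milvus Standalone service deployed with Docker Compose, so that the service can be monitored with little extra setup.
About the author
Zilliz Gold Writer: Zang Wei (臧伟)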