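Background
Milvus Standalone is deployed as a single-machine server with all of its components packaged into one Docker image, which makes it easy to deploy. For medium-sized datasets, running Milvus Standalone on a single machine with enough memory is a good choice. Milvus Standalone also supports high availability through master-slave replication.
Milvus also has built-in support for Prometheus to collect metrics and for Grafana to visualize them and create alerts. However, the documentation only lists the steps for deploying the monitoring services on Kubernetes (https://milvus.io/docs/zh/monitor.md). In fact, Prometheus and Grafana can also be deployed alongside Milvus Standalone to monitor the Milvus service.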
Monitoring Milvus Standalone with Docker Compose
First, let's look at the complete Docker Compose file:
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.11
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
      - ./milvus.yaml:/milvus/configs/milvus.yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  prometheus:
    image: prom/prometheus
    container_name: prometheus
    user: root
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - 9090:9090
    restart: unless-stopped
    volumes:
      - ./prometheus:/etc/prometheus
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/prometheus:/prometheus

  grafana:
    image: grafana/grafana
    container_name: grafana
    user: root
    ports:
      - 3000:3000
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=grafana
    volumes:
      - ./grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
      - ./grafana/dashboard.yml:/etc/grafana/provisioning/dashboards/main.yml
      - ./grafana/dashboards:/var/lib/grafana/dashboards
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/grafana:/var/lib/grafana

networks:
  default:
    name: milvus
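Deploying Prometheus
Milvus exports the metrics of each Milvus component for Prometheus at http://<component-host>:9091/metrics, so we set this address in the scrape_configs section of the Prometheus configuration. Inside the Compose network, the Milvus container is reachable by its service name, standalone: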
scrape_configs:
  # Scrape the Milvus Standalone metrics endpoint via its Compose service name
  - job_name: 'milvus-standalone'
    honor_labels: true
    metrics_path: /metrics
    static_configs:
      - targets: ['standalone:9091']
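Before pointing Prometheus at the endpoint, it can help to confirm that Milvus is actually exporting metrics. The following is a minimal sketch, not part of the original template: the helper names `fetch_metrics_text` and `metric_names` are hypothetical, and it assumes the script runs on the Docker host, where port 9091 is published by the Compose file.

```python
import urllib.request


def fetch_metrics_text(url, timeout=5.0):
    """Download the raw Prometheus exposition text from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition_text):
    """Return the distinct metric names found in Prometheus exposition text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="v"} 1.0` or `name 1.0`;
        # the metric name is everything before the first "{" or space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names
```

For example, `sorted(metric_names(fetch_metrics_text("http://localhost:9091/metrics")))` should list the exported metric names once the standalone container is healthy. An empty result or a connection error suggests the container is not ready or the port is not published.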
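We also add the Prometheus service to the Docker Compose file. The scrape configuration above is stored in ./prometheus/prometheus.yml, which the volume mount makes available in the container as /etc/prometheus/prometheus.yml: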
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    user: root
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - 9090:9090
    restart: unless-stopped
    volumes:
      - ./prometheus:/etc/prometheus
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/prometheus:/prometheus
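Once the Prometheus container is running, its HTTP API can report whether the Milvus target is being scraped. The sketch below is illustrative only: the helper names `fetch_targets` and `job_health` are hypothetical. It assumes Prometheus is reachable from the Docker host on the published port 9090 and queries the `/api/v1/targets` endpoint.

```python
import json
import urllib.request


def fetch_targets(prometheus_url, timeout=5.0):
    """Call the Prometheus /api/v1/targets endpoint and return the decoded JSON."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def job_health(targets_response, job):
    """Return (scrapeUrl, health) pairs for every active target of `job`."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (target.get("scrapeUrl"), target.get("health"))
        for target in active
        if target.get("labels", {}).get("job") == job
    ]
```

Calling `job_health(fetch_targets("http://localhost:9090"), "milvus-standalone")` should return one entry whose health is `"up"` when the scrape configuration is correct. The job name matches the `job_name` used in the scrape configuration.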
Deploying Grafana
In the Prometheus deployment above, we exposed Prometheus on port 9090, so we define the Prometheus data source in Grafana accordingly:
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    access: proxy
    editable: true
A Milvus Standalone monitoring dashboard is available at https://github.com/milvus-io/milvus-docs/blob/v2.5.x/assets/standalone-monitoring/grafana/dashboards/milvus-standalone-dashboard.json
Likewise, we need to add the Grafana service to the Docker Compose file:
  grafana:
    image: grafana/grafana
    container_name: grafana
    user: root
    ports:
      - 3000:3000
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=grafana
    volumes:
      - ./grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
      - ./grafana/dashboard.yml:/etc/grafana/provisioning/dashboards/main.yml
      - ./grafana/dashboards:/var/lib/grafana/dashboards
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/grafana:/var/lib/grafana
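At this point, we can open the Grafana UI at http://<your-host>:3000 and sign in with the admin user and password set in the Compose file.
We can then view the Milvus Standalone monitoring dashboard.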
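If the page does not load, a quick check of the Grafana health endpoint can tell whether the container is up before debugging the dashboard itself. This is a minimal sketch with hypothetical helper names (`fetch_grafana_health`, `is_healthy`). It assumes Grafana is reachable on the published port 3000 and relies on the `/api/health` endpoint, which reports the database status as JSON.

```python
import json
import urllib.request


def fetch_grafana_health(grafana_url, timeout=5.0):
    """Call the Grafana /api/health endpoint and return the decoded JSON."""
    url = f"{grafana_url}/api/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(health):
    """Treat Grafana as healthy when it reports its database as "ok"."""
    return health.get("database") == "ok"
```

For example, `is_healthy(fetch_grafana_health("http://localhost:3000"))` returning `True` indicates that Grafana itself is running, so any remaining problem is more likely in the data source or dashboard provisioning files.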
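The complete Docker Compose file and related files are available at https://github.com/milvus-io/milvus-docs/tree/v2.5.x/assets/standalone-monitoring. Note that this template uses Milvus 2.4.11 as an example; to use a different Milvus version, simply change the version tag of the corresponding Docker image.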
Summary
This article described how to add Prometheus and Grafana to a Milvus Standalone service deployed with Docker Compose in order to monitor it.
About the author
Zilliz Gold Writer: 臧伟