First to Support Alibaba Cloud FunctionCompute: APM Technology Empowers Serverless Environments

BonreeAPM
Published: November 30, 2020

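Alibaba Cloud FunctionCompute recently completed its integration with APM technology, giving the Serverless runtime environments of Alibaba Cloud users full application-wide monitoring and management capabilities and supporting the growth of cloud computing workloads.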



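Released in 2017, Alibaba Cloud FunctionCompute is one of the earliest Serverless cloud products in China. Its advantages, including zero operations overhead, fast startup, automatic elastic scaling, and pay-as-you-go billing, have attracted a large number of developers and enterprises. In a 2019 CNCF survey, 46% of respondents chose Alibaba Cloud FunctionCompute to deploy Serverless applications, ranking it first in China.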



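However, while Serverless eases the operations burden on developers, it also raises many new observability challenges. FaaS has no concept of instances; the runtime environment and key metrics (such as CPU) are a black box; fine-grained, single-responsibility functions call one another; and functions access other cloud services, which calls for distributed tracing. All of this feels unfamiliar to traditional developers and makes it harder to detect and investigate problems in distributed systems.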



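With this integration, Bonree Data (stock code: 688229) becomes the first vendor to support Alibaba Cloud FunctionCompute. Bonree Data believes that APM should not be confined to traditional servers, virtual machines, or containers; Serverless runtimes need the comprehensive monitoring and management capabilities of APM even more.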



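This article focuses on how to start the Bonree APM agent inside the FunctionCompute runtime environment and collect FunctionCompute metrics to the Bonree Data server, in order to provide a complete APM experience.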



Deploying and testing functions



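- Build and deploy the function using the FunctionCompute custom container image feature. For detailed deployment steps, see the FunctionCompute documentation on creating a function from a custom image (https://help.aliyun.com/document_detail/179372.html).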



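# Base image with the Go 1.12 toolchain
FROM golang:1.12.17
# Utilities installed in the image (zip, vim)
RUN apt-get update && apt-get install -y zip vim
# Copy the Bonree Golang SDK into GOPATH and run its agent installer
ADD ./bonree-golang-sdk-v1.3.97444-x64 /go/src/bonree-golang-sdk-v1.3.97444-x64
RUN cd /go/src/bonree-golang-sdk-v1.3.97444-x64/bonree-agent-sdk && sh ./install.sh
# Fetch the Bonree Go packages plus Redis and MySQL client packages
RUN go get github.com/bonreeapm/go && go get github.com/bonreeapm/go/common && go get github.com/garyburd/redigo/redis && go get github.com/go-sql-driver/mysql
# Build the sample app as /fc-app and open up permissions on /opt/bonree-agent-sdk/
RUN cd /go/src/bonree-golang-sdk-v1.3.97444-x64/github.com/bonreeapm/go/sample && go build -o /fc-app ./main.go && chmod -R 777 /opt/bonree-agent-sdk/
# start.sh is the container entrypoint
ADD ./start.sh /start.sh
ENTRYPOINT ["/start.sh"]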



- Invoke the function



while true; do curl https://{account-id}.{region}.fc.aliyuncs.com/2016-08-15/proxy/test-apm/bonree-golang/setURL; done



Viewing function request monitoring



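FunctionCompute is event-driven and billed at request granularity, so tracking request count and latency is a basic capability of FaaS monitoring. The Bonree Data APM agent supports displaying and alerting on HTTP request count, status, and latency. The percentile latency metrics (P95, P99) provided by Bonree Data APM make it easier to detect events such as long-tail requests and cold starts in FunctionCompute.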
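To illustrate why percentile latency exposes cold starts that an average hides, here is a minimal Python sketch using the nearest-rank method. The latency values are made up for illustration, and this is not how the Bonree agent computes its metrics; it only shows the general idea.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 94 warm requests, 6 cold starts.
latencies_ms = [20] * 94 + [800, 850, 900, 950, 1000, 1200]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

In this made-up data set the median stays at the warm-request latency, while P95 and P99 land on the cold-start requests, which is the kind of long-tail signal the paragraph above refers to.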





Instance-level monitoring

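By default, FunctionCompute does not provide monitoring at the instance (function container) level. Bonree Data's basic monitoring requires no extra configuration to monitor metrics such as CPU, memory usage, connection count, and open file count for each individual container instance of a function. Instance-level monitoring makes the timing and behavior of FunctionCompute cold starts and scaling clearly observable. Per-instance metrics such as CPU, network bandwidth, load, and open file count also greatly extend the default FunctionCompute metric set, helping developers locate performance and availability problems inside the Serverless black box.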







Distributed tracing



Tracing is an essential observability requirement for Serverless FaaS services. Serverless FaaS applications typically have the following characteristics:



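· Each function has a single responsibility

· An application involves calls across multiple functions

· Functions call other cloud services or third-party services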



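In FunctionCompute, the Bonree Data APM agent enables non-intrusive tracing, with zero code changes, of HTTP services, SQL/NoSQL databases, and message queues. Developers can analyze high-latency traces and quickly locate and respond to problems in complex distributed systems.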
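The core idea behind distributed tracing can be sketched in a few lines of Python: each function creates a span, and the trace ID travels with every outgoing call so that spans from different functions can be stitched together. The header names, the `Span` class, and the in-memory `collected` list are all hypothetical; an agent such as Bonree's does this automatically, and its actual wire format is not described in this article.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

TRACE_HEADER = "x-demo-trace-id"      # hypothetical header name
PARENT_HEADER = "x-demo-parent-span"  # hypothetical header name

@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.perf_counter)
    duration_ms: float = 0.0

    def finish(self):
        self.duration_ms = (time.perf_counter() - self.start) * 1000

collected = []  # stands in for the APM backend

def start_span(name, incoming_headers):
    # Reuse the caller's trace ID if present, otherwise start a new trace.
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    return Span(name, trace_id, incoming_headers.get(PARENT_HEADER))

def outgoing_headers(span):
    # Headers to attach to any downstream call made within this span.
    return {TRACE_HEADER: span.trace_id, PARENT_HEADER: span.span_id}

def query_function(headers):
    span = start_span("query-db", headers)
    time.sleep(0.01)  # pretend to run a database query
    span.finish()
    collected.append(span)

def entry_function(headers):
    span = start_span("handle-request", headers)
    # In FunctionCompute this would be an HTTP call to another function.
    query_function(outgoing_headers(span))
    span.finish()
    collected.append(span)

entry_function({})
```

After the call, both spans in `collected` share one trace ID, and the child span's `parent_id` points at the entry span, which is what lets a backend reconstruct the call path and see where the latency was spent.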





Function call-chain profiling

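Unlike traditional environments based on long-running virtual machines or containers, FunctionCompute instances are started and destroyed very frequently. Developers are not allowed to SSH or exec into a container to view logs or run performance profiling. Fine-grained code-level call chains therefore make problems easier to investigate. However, call-chain instrumentation has a relatively high technical barrier. The non-intrusive mode of Bonree Data APM supports code-level call chains, making distributed performance profiling possible in Serverless environments.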



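While Serverless FaaS improves operations efficiency and optimizes cost, its request-level execution model raises many new observability challenges. When used with Alibaba Cloud FunctionCompute, Bonree Data APM provides the following advanced observability capabilities: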



· Percentile latency

· Rich function instance-level metrics (e.g., CPU, network traffic)

· Distributed tracing

· Call-chain profiling

· AI-powered diagnostics



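Going forward, Bonree Data will continue to work with Alibaba Cloud to explore, innovate, and optimize observability in Serverless runtime environments, improving the efficiency with which FunctionCompute developers detect and investigate problems and ultimately delivering better service and experience to cloud computing users.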



English Version



Bonree Data adds APM support to Alibaba Cloud FunctionCompute



Bonree Data (Shanghai Stock Exchange code: 688229), one of the APM market leaders in China, today announced a partnership with Alibaba Cloud FunctionCompute, becoming the first vendor to support the service.



Observability and Serverless



Alibaba Cloud FunctionCompute (FC), released in 2017, is one of the earliest Serverless cloud services in China. Function as a service (FaaS) frees developers from operations tasks such as server management and resource planning. FC supports rapid sub-second auto-scaling, and organizations pay only for actual usage (billed in 100-millisecond units). However, FaaS vendors abstract away many concepts that DevOps teams are already familiar with, which poses several new observability challenges compared with server-based technologies:



- Lack of instance/container-level metrics and insights, such as CPU and network traffic

- Single-responsibility functions are distributed and run with arbitrary concurrency, which makes individual requests difficult to trace

- Functions usually invoke other cloud services such as SQL/NoSQL databases or message queues

- Developers cannot exec/ssh into an individual instance to run performance profiling or other advanced investigation tools



At Bonree, we believe that Serverless observability is in high demand because vendors provide only grey- or black-boxed runtime environments. In this article, we introduce the integration of Bonree APM with Alibaba Cloud FunctionCompute, which helps customers quickly locate and investigate production issues in a highly dynamic and distributed environment.



Deploying and testing functions



Step 1: FC supports deploying functions with custom container images. The following Dockerfile includes a Bonree agent, a proxy and a Golang demo. For more details on creating a function, see this document:



https://help.aliyun.com/document_detail/179372.html



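# Base image with the Go 1.12 toolchain
FROM golang:1.12.17
# Utilities installed in the image (zip, vim)
RUN apt-get update && apt-get install -y zip vim
# Copy the Bonree Golang SDK into GOPATH and run its agent installer
ADD ./bonree-golang-sdk-v1.3.97444-x64 /go/src/bonree-golang-sdk-v1.3.97444-x64
RUN cd /go/src/bonree-golang-sdk-v1.3.97444-x64/bonree-agent-sdk && sh ./install.sh
# Fetch the Bonree Go packages plus Redis and MySQL client packages
RUN go get github.com/bonreeapm/go && go get github.com/bonreeapm/go/common && go get github.com/garyburd/redigo/redis && go get github.com/go-sql-driver/mysql
# Build the sample app as /fc-app and open up permissions on /opt/bonree-agent-sdk/
RUN cd /go/src/bonree-golang-sdk-v1.3.97444-x64/github.com/bonreeapm/go/sample && go build -o /fc-app ./main.go && chmod -R 777 /opt/bonree-agent-sdk/
# start.sh is the container entrypoint
ADD ./start.sh /start.sh
ENTRYPOINT ["/start.sh"]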



Step 2: Invoke the function periodically



while true; do curl https://{account-id}.{region}.fc.aliyuncs.com/2016-08-15/proxy/test-apm/bonree-golang/setURL; done
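The curl loop above runs until interrupted and discards timing information. As a rough alternative, the following Python sketch issues a bounded number of requests and records the status code and client-side latency of each one. The function names `http_get` and `invoke_repeatedly` are illustrative, not part of any Bonree or Alibaba Cloud tooling, and the `send` parameter exists only so the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.request

def http_get(url, timeout=10.0):
    """Return the HTTP status code of a GET request.
    HTTP error statuses (4xx/5xx) are returned rather than raised;
    connection failures still raise urllib.error.URLError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def invoke_repeatedly(url, count, send=http_get, pause_s=0.0):
    """Call send(url) `count` times and return a list of
    (status, latency_ms) tuples, one per call."""
    results = []
    for _ in range(count):
        started = time.perf_counter()
        status = send(url)
        latency_ms = (time.perf_counter() - started) * 1000
        results.append((status, latency_ms))
        time.sleep(pause_s)
    return results
```

To use it, substitute the real account ID and region into the endpoint shown in the curl command and call, for example, `invoke_repeatedly(url, 100)`. The recorded latencies are measured on the client, so they include network time in addition to function execution time.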



Invocation requests monitoring



FunctionCompute is event-driven and bills by the duration of each individual request. Therefore, request metrics such as count and latency are fundamental requirements for availability, performance and cost monitoring. The Bonree APM agent not only supports HTTP request count, status code, latency and alerts, it also provides percentile metrics (P95, P99) that the current FC offering does not provide. These percentile metrics help identify whether a function suffers from long-tail requests or cold starts.

Per-instance monitoring



By default, FC does not support monitoring of individual function instances. With Bonree APM agents included in FC function packages, developers can observe metrics such as CPU, network traffic, load and open file count for each function container. This also helps them understand FC autoscaling behavior and correlate events such as cold starts.
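To give a feel for what per-instance data looks like, the sketch below collects a few process-level metrics from inside a Linux container using only the Python standard library, and tags each sample with an instance ID and a cold-start flag. This is an illustration of the concept, not the Bonree agent's implementation; `INSTANCE_ID`, `instance_snapshot` and the field names are hypothetical, and the `resource` module and `/proc` are assumed to be available (Linux).

```python
import os
import resource
import uuid

# Regenerated whenever a new instance (process) starts, so samples from
# different containers can be told apart.
INSTANCE_ID = uuid.uuid4().hex[:8]
_invocations = 0

def instance_snapshot():
    """Return a dict of metrics for the current function instance."""
    global _invocations
    _invocations += 1
    usage = resource.getrusage(resource.RUSAGE_SELF)
    try:
        load_1m = os.getloadavg()[0]
    except OSError:
        load_1m = None  # load average not available in this environment
    try:
        open_files = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_files = None  # /proc not available
    return {
        "instance_id": INSTANCE_ID,
        "cold_start": _invocations == 1,  # first sample in this instance
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        "max_rss_kib": usage.ru_maxrss,   # kibibytes on Linux
        "load_1m": load_1m,
        "open_files": open_files,
    }
```

Calling `instance_snapshot()` once per invocation and shipping the result to a backend would show a new `instance_id` with `cold_start` set each time the platform scales out, which is the kind of correlation described above.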

Distributed tracing



Serverless FaaS shares some common characteristics:



- Single responsibility functions

- An application consists of multiple functions that form an invocation topology



- Function business logic tends to call other cloud or remote services



Using the Bonree APM agent, developers automatically gain distributed tracing of various types of remote calls, such as HTTP services, SQL/NoSQL databases and message queues. This speeds up issue detection, investigation and recovery.

Function performance profiling



The FC execution environment is much more dynamic and can be shorter-lived than traditional VMs or containers. Developers are not allowed to ssh or exec into the containers to view logs, nor can they enable profiling on demand. Bonree performance profiling enables developers to profile their function code in production, identify slow code paths and drive performance optimizations.
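Because nobody can attach a profiler to a function container from outside, profiling has to run inside the function process and ship its results elsewhere. The following Python sketch shows that pattern with the standard library's `cProfile`; the handler and its two steps are invented for illustration, and the Bonree profiler itself works without such code changes according to this article.

```python
import cProfile
import io
import pstats

def slow_step():
    # Deliberately expensive: stands in for a slow code path.
    return sum(i * i for i in range(200_000))

def fast_step():
    return sum(range(1_000))

def handler():
    # Hypothetical function handler made of one slow and one fast step.
    return slow_step() + fast_step()

def profile_call(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()

result, report = profile_call(handler)
```

The `report` string lists `slow_step` near the top by cumulative time; in a real function it could be written to the log or sent to a monitoring backend before the instance is destroyed.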

Summary



Serverless functions are being adopted at a rapid pace. They reduce the operations load and optimize cost for organizations. At the same time, FaaS observability is still at an early stage and offers far fewer capabilities than applications hosted on VMs or containers. Bonree's integration with Alibaba Cloud FunctionCompute offers the following advanced capabilities:



- Percentile latency

- Rich per-instance metrics, e.g. CPU and network traffic



- Distributed tracing



- Function performance profiling



- AI diagnostics



In the future, Bonree Data plans to continue exploring, innovating and optimizing Serverless observability to help DevOps teams detect production issues in real time, identify root causes and recover quickly.




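Bonree Data (北京博睿宏远数据科技股份有限公司; stock code: 688229), founded in 2008, is a leading vendor of APM (application performance management) technology in China. The company focuses on using data to empower IT operations and helping enterprises succeed in digital transformation.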
