
Integrating Arthas with DolphinScheduler to Monitor API Calls and Improve Scheduling Reliability

Author: 白鲸开源
  • 2024-11-06
    Tianjin

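This article shows how to embed Arthas in Apache DolphinScheduler to monitor API calls. Arthas is a Java diagnostic tool that lets developers inspect a running application's state, performance bottlenecks, and method invocations in real time. With Arthas attached to DolphinScheduler, you can capture the key calls made during task scheduling, find and fix performance problems early, and keep the system stable. The sections below cover how to start Arthas in a DolphinScheduler environment, monitor calls to a specific API, and analyze the collected performance data, with the goal of making task scheduling more reliable and easier to maintain.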


Manual Installation

Download arthas-packaging-3.7.2-bin.zip from:
https://arthas.aliyun.com/download/latest_version?mirror=aliyun

cp arthas-packaging-3.7.2-bin.zip /opt/arthas
cd /opt/arthas
unzip arthas-packaging-3.7.2-bin.zip
java -jar arthas-boot.jar

When prompted, select the number of the target process.
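If you would rather skip the interactive process menu, arthas-boot.jar also accepts the target PID as an argument. The sketch below looks the PID up with jps and passes it along. The main class name org.apache.dolphinscheduler.api.ApiApplicationServer, the /opt/arthas default, and the function names are assumptions made for illustration, so confirm them with jps -l on your own host.

```shell
# Assumed main class of the api-server; verify with `jps -l`.
MAIN_CLASS="${MAIN_CLASS:-org.apache.dolphinscheduler.api.ApiApplicationServer}"
# Assumed install directory, matching the cp/unzip steps above.
ARTHAS_HOME="${ARTHAS_HOME:-/opt/arthas}"

# Print the PID of the first JVM whose main class equals $1.
find_java_pid() {
  jps -l | awk -v cls="$1" '$2 == cls { print $1; exit }'
}

# Attach Arthas to the PID in $1 without the interactive menu.
attach_arthas() {
  java -jar "$ARTHAS_HOME/arthas-boot.jar" "$1"
}
```

A typical call would be: pid=$(find_java_pid "$MAIN_CLASS"); [ -n "$pid" ] && attach_arthas "$pid"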


Troubleshooting

Error 1

[ERROR] Start arthas failed, exception stack trace: 
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:161)

Solution:

Go to ${DOLPHINSCHEDULER_HOME}/api-server/bin and add the following line to jvm_args_env.sh, then restart the api-server so that the flag takes effect:
-XX:+StartAttachListener
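The edit can be scripted so that running it twice does no harm. This is only a sketch: it assumes jvm_args_env.sh holds one JVM flag per line, which you should check against your DolphinScheduler version, and add_jvm_flag is a hypothetical helper name.

```shell
# Append JVM flag $2 to args file $1 unless that exact line already exists.
add_jvm_flag() {
  # -x matches whole lines, -F disables regex, -e protects the leading dash.
  if grep -qxF -e "$2" "$1"; then
    return 0
  fi
  # If the file does not end with a newline, add one so the flag gets its own line.
  if [ -n "$(tail -c1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
  printf '%s\n' "$2" >> "$1"
}
```

For example: add_jvm_flag "${DOLPHINSCHEDULER_HOME}/api-server/bin/jvm_args_env.sh" "-XX:+StartAttachListener"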

Error 2

Picked up JAVA_TOOL_OPTIONS: 
java.io.IOException: well-known file /tmp/.java_pid731688 is not secure: file should be owned by the current user (which is 0) but is owned by 989
        at sun.tools.attach.LinuxVirtualMachine.checkPermissions(Native Method)
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:117)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:161)
[ERROR] Start arthas failed, exception stack trace: 
[ERROR] attach fail, targetPid: 731688

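Solution:

Arthas must be started by the same OS user that runs the DolphinScheduler service; otherwise the attach fails with the error above. In this log, Arthas was started as root (UID 0) while the target process belongs to UID 989.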
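One way to avoid this mismatch is to look up the owner of the target process first and start Arthas as that user. The sketch below assumes Linux with /proc and GNU stat, the /opt/arthas install directory, and a sudo setup that lets you switch to the service account; the function names are illustrative.

```shell
# Assumed install directory of Arthas.
ARTHAS_HOME="${ARTHAS_HOME:-/opt/arthas}"

# Print the numeric UID that owns process $1 (reads the owner of /proc/<pid>).
pid_uid() {
  stat -c %u "/proc/$1"
}

# Start Arthas against PID $1 as the user that owns the process.
attach_as_owner() {
  owner=$(pid_uid "$1") || return 1
  if [ "$owner" = "$(id -u)" ]; then
    # Already the right user: attach directly.
    java -jar "$ARTHAS_HOME/arthas-boot.jar" "$1"
  else
    # sudo accepts a numeric UID when it is prefixed with '#'.
    sudo -u "#$owner" java -jar "$ARTHAS_HOME/arthas-boot.jar" "$1"
  fi
}
```

With the PID from the log above, the call would be attach_as_owner 731688.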

Watch


Watch observes the execution details of a method, such as its parameters and return value.

watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj


[arthas@731688]$ watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 126 ms, listenerId: 2
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:01; [cost=4.918943ms] result=@Result[
    code=@Integer[0],
    msg=@String[成功],
    data=@PageInfo[PageInfo(totalList=[User(id=1, userName=admin, userPassword=null, email=825193156@qq.com, phone=, userType=ADMIN_USER, tenantId=1, state=1, tenantCode=hdfs, queueName=default, alertGroup=null, queue=default, timeZone=null, createTime=Fri Jul 19 04:19:31 GMT-05:00 2024, updateTime=Mon Aug 12 22:15:58 GMT-05:00 2024)], total=1, totalPage=1, pageSize=10, currentPage=1, pageNo=0)],
]
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:18; [cost=6.905345ms] result=@Result[
    code=@Integer[0],
    msg=@String[成功],
    data=@PageInfo[PageInfo(totalList=[User(id=1, userName=admin, userPassword=null, email=825193156@qq.com, phone=, userType=ADMIN_USER, tenantId=1, state=1, tenantCode=hdfs, queueName=default, alertGroup=null, queue=default, timeZone=null, createTime=Fri Jul 19 04:19:31 GMT-05:00 2024, updateTime=Mon Aug 12 22:15:58 GMT-05:00 2024)], total=1, totalPage=1, pageSize=10, currentPage=1, pageNo=0)],
]
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:27; [cost=5.803269ms] result=@Result[
    code=@Integer[0],
    msg=@String[成功],
    data=@PageInfo[PageInfo(totalList=[User(id=1, userName=admin, userPassword=null, email=825193156@qq.com, phone=, userType=ADMIN_USER, tenantId=1, state=1, tenantCode=hdfs, queueName=default, alertGroup=null, queue=default, timeZone=null, createTime=Fri Jul 19 04:19:31 GMT-05:00 2024, updateTime=Mon Aug 12 22:15:58 GMT-05:00 2024)], total=1, totalPage=1, pageSize=10, currentPage=1, pageNo=0)],
]
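watch can also print the input parameters and report only slow calls. The sketch below builds such a command and runs it once through the -c option of arthas-boot.jar, so the session ends after a few matches instead of staying open. The function names, the /opt/arthas default, and the threshold values are illustrative assumptions; the class and method are the ones used above.

```shell
# Assumed install directory of Arthas.
ARTHAS_HOME="${ARTHAS_HOME:-/opt/arthas}"

# Build the watch command: print params and return value, expanded two levels
# deep (-x 2), only for calls slower than $1 ms, and stop after $2 matches (-n).
watch_command() {
  printf "watch %s %s '{params, returnObj}' '#cost > %s' -x 2 -n %s" \
    "org.apache.dolphinscheduler.api.controller.UsersController" \
    "queryUserList" "$1" "$2"
}

# Run the command against PID $1 with threshold $2 (ms) and match limit $3.
watch_slow_calls() {
  java -jar "$ARTHAS_HOME/arthas-boot.jar" -c "$(watch_command "$2" "$3")" "$1"
}
```

For example, watch_slow_calls 731688 100 5 would report up to five calls that took longer than 100 ms.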

Trace


Trace follows the call path inside a method, showing which methods it calls and how long each one takes.

[arthas@973263]$ trace org.apache.dolphinscheduler.api.controller.UsersController queryUserList 
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 319 ms, listenerId: 1
`---ts=2024-08-27 10:33:08;thread_name=qtp1836984213-26;id=26;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@439f5b3d
    `---[13.962731ms] org.apache.dolphinscheduler.api.controller.UsersController:queryUserList()
        +---[0.18% 0.025123ms ] org.apache.dolphinscheduler.api.controller.UsersController:checkPageParams() #130
        +---[0.09% 0.012549ms ] org.apache.dolphinscheduler.plugin.task.api.utils.ParameterUtils:handleEscapes() #131
        `---[96.47% 13.469876ms ] org.apache.dolphinscheduler.api.service.UsersService:queryUserList() #132
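In the output above, UsersService:queryUserList() accounts for 96.47% of the time. When a trace is longer, a small filter can pick out the heaviest child call. The following is a rough sketch that assumes the trace output was saved to a text file in the format shown above; slowest_call is a hypothetical helper, not an Arthas feature.

```shell
# Print the trace entry with the largest percentage share.
# Reads the files given as arguments, or standard input if there are none.
slowest_call() {
  awk '
    # Child entries look like: +---[96.47% 13.469876ms ] Class:method() #132
    match($0, /\[[0-9.]+% [0-9.]+ms/) {
      entry = substr($0, RSTART + 1, RLENGTH - 1)   # "96.47% 13.469876ms"
      split(entry, parts, "% ")
      if (parts[1] + 0 > max) { max = parts[1] + 0; line = $0 }
    }
    END {
      if (line != "") {
        sub(/^[ `+|-]+/, "", line)   # drop the tree-drawing prefix
        print line
      }
    }
  ' "$@"
}
```

For example: slowest_call trace.txt, where trace.txt is a file holding the trace output.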

Dump


heapdump arthas-output/dump.hprof generates a heap dump file:

[arthas@973263]$ heapdump arthas-output/dump.hprof
Dumping heap to arthas-output/dump.hprof ...
Heap dump file created

Open the dump file in MAT (Eclipse Memory Analyzer) to analyze memory leaks.

Viewing JVM Memory


memory shows the JVM memory usage:

[arthas@973263]$ memory
Memory                    used    total   max     usage
heap                      485M    900M    900M    53.91%
ps_eden_space             277M    327M    358M    77.61%
ps_survivor_space         61M     61M     61M     99.98%
ps_old_gen                146M    512M    512M    28.54%
nonheap                   162M    188M    -1      85.96%
code_cache                11M     32M     240M    4.89%
metaspace                 135M    140M    -1      96.67%
compressed_class_space    14M     15M     1024M   1.43%
direct                    949K    949K    -       100.00%
mapped                    0K      0K      -       0.00%

Viewing CPU Usage


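dashboard shows CPU usage per thread. To find out which threads are consuming the CPU, use thread -n N to print the N busiest threads with their stacks, or thread <id> to print the stack of one specific thread.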
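For a one-off snapshot that can be saved or attached to a ticket, the same commands can be run from a batch file through the -f option of arthas-boot.jar. This is a minimal sketch: the script path, the function names, the /opt/arthas default, and the choice of commands are assumptions for illustration.

```shell
# Assumed install directory of Arthas.
ARTHAS_HOME="${ARTHAS_HOME:-/opt/arthas}"

# Write a small Arthas batch script to $1: the 3 busiest threads with
# their stacks, followed by the JVM memory table.
write_snapshot_script() {
  cat > "$1" <<'EOF'
thread -n 3
memory
EOF
}

# Run the batch script against PID $1; $2 optionally overrides the script path.
take_snapshot() {
  script="${2:-/tmp/ds-snapshot.as}"
  write_snapshot_script "$script"
  java -jar "$ARTHAS_HOME/arthas-boot.jar" -f "$script" "$1"
}
```

For example, take_snapshot 973263 > snapshot.txt keeps the result in a file for later comparison.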



Reposted from Journey

Original link: https://segmentfault.com/a/1190000045219355
