Fixing Run Command Timeout / SSH Timeout Errors When Upgrading a Cluster with TiUP

  • July 11, 2022

Author: 代晓磊 _Mars. Original source: https://tidb.net/blog/fdbe79bc


(1) Symptom: while upgrading the cluster with TiUP, stopping a TiKV node timed out with "ERROR Run Command Timeout". Logging in to 192.168.1.43 showed that TiKV had in fact already stopped.


2020-06-29T05:21:18.289+0800 INFO Stopping instance 192.168.1.43


2020-06-29T05:22:58.364+0800 INFO SSHCommand {"host": "192.168.1.43", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c "systemctl daemon-reload && systemctl stop tikv-20160.service"", "stdout": "", "stderr": "Run Command Timeout!\n"}


2020-06-29T05:22:58.364+0800 ERROR Run Command Timeout!


2020-06-29T05:22:58.364+0800 INFO Execute command finished {"code": 1, "error": "failed to upgrade: failed to stop 192.168.1.43: failed to stop: tikv 192.168.1.43:20160: executor.ssh.execute_timedout: Execute command over SSH timedout for 'tidb@192.168.1.43:22' {ssh_stderr: Run Command Timeout!\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c "systemctl daemon-reload && systemctl stop tikv-20160.service"}", "errorVerbose": "executor.ssh.execute_timedout: Execute command over SSH timedout for 'tidb@192.168.1.43:22' {ssh_stderr: Run Command Timeout!\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c "systemctl daemon-reload && systemctl stop tikv-20160.service"}\n at github.com/pingcap/tiup/pkg/cluster/executor.(*SSHExecutor).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/executor/ssh.go:172\n at github.com/pingcap/tiup/pkg/cluster/module.(*SystemdModule).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/module/systemd.go:89\n at github.com/pingcap/tiup/pkg/cluster/operation.stopInstance()\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:574\n at github.com/pingcap/tiup/pkg/cluster/operation.Upgrade()\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/upgrade.go:99\n at github.com/pingcap/tiup/pkg/cluster/task.(*ClusterOperate).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/action.go:53\n at github.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute()\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:189\n at github.com/pingcap/tiup/components/cluster/command.upgrade()\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:174\n at github.com/pingcap/tiup/components/cluster/command.newUpgradeCmd.func1()\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:50\n at github.com/spf13/cobra.(*Command).execute()\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\n at github.com/spf13/cobra.(*Command).ExecuteC()\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\n at github.com/spf13/cobra.(*Command).Execute()\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\n at github.com/pingcap/tiup/components/cluster/command.Execute()\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\n at main.main()\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\n at runtime.main()\n\truntime/proc.go:203\n at runtime.goexit()\n\truntime/asm_amd64.s:1357\nfailed to stop: tikv 192.168.1.43:20160\ngithub.com/pingcap/tiup/pkg/cluster/operation.stopInstance\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:593\ngithub.com/pingcap/tiup/pkg/cluster/operation.Upgrade\n\tgithub.com/pingcap/tiup@/pkg/cluster/operation/upgrade.go:99\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ClusterOperate).Execute\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/action.go:53\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup@/pkg/cluster/task/task.go:189\ngithub.com/pingcap/tiup/components/cluster/command.upgrade\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:174\ngithub.com/pingcap/tiup/components/cluster/command.newUpgradeCmd.func1\n\tgithub.com/pingcap/tiup@/components/cluster/command/upgrade.go:50\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup@/components/cluster/command/root.go:220\nmain.main\n\tgithub.com/pingcap/tiup@/components/cluster/main.go:19\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357\nfailed to stop 192.168.1.43\nfailed to upgrade"}
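

To confirm the observation that TiKV had already stopped even though TiUP reported a timeout, one option is to ask systemd on the target host for the unit state. The sketch below is only an illustration: the helper name unit_verdict is made up for this example, while the user (tidb), host (192.168.1.43) and unit name (tikv-20160.service) are taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: map the state printed by `systemctl is-active <unit>`
# to a simple verdict. `inactive` and `failed` both mean the process is gone.
unit_verdict() {
  case "$1" in
    inactive|failed) echo "stopped" ;;
    active|activating|deactivating|reloading) echo "not stopped" ;;
    *) echo "unknown" ;;
  esac
}

# Demo with a fixed state so the snippet runs without SSH access.
unit_verdict inactive

# Against the real host from the log (run manually on the control machine):
#   state=$(ssh tidb@192.168.1.43 'systemctl is-active tikv-20160.service')
#   unit_verdict "$state"
```

If the verdict is "stopped" while TiUP still reports "Run Command Timeout", the stop itself succeeded and only the wait on the control machine ran out of time.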


(2) Solution:


1. Upgrade TiUP to the latest version: run tiup update --self && tiup update --all to upgrade TiUP itself and its components.


Why upgrade? The goal is to use the following two parameters, which are available in the latest version of TiUP:


tiup cluster --help


Flags:

  -h, --help               help for tiup
      --ssh-timeout int    Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
  -v, --version            version for tiup
      --wait-timeout int   Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 60)
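

After updating, it may be worth checking that the installed tiup cluster component really lists both flags before relying on them. A minimal sketch follows; has_timeout_flags is a hypothetical helper name, and the sample text simply repeats the two flag names from the help output above.

```shell
#!/bin/sh
# Hypothetical helper: read `tiup cluster --help` output on stdin and
# succeed only if both timeout flags appear in it.
has_timeout_flags() {
  help_text=$(cat)
  printf '%s\n' "$help_text" | grep -q -e '--ssh-timeout' &&
    printf '%s\n' "$help_text" | grep -q -e '--wait-timeout'
}

# Demo on sample text so the snippet runs without TiUP installed.
sample_help='      --ssh-timeout int
      --wait-timeout int'
if printf '%s\n' "$sample_help" | has_timeout_flags; then
  echo "timeout flags available"
fi

# Against a real installation (run manually):
#   tiup cluster --help | has_timeout_flags && echo "timeout flags available"
```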


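If the error is related to ssh-timeout: --ssh-timeout is the timeout for the control machine to establish an SSH connection to the tikv/pd/tidb machines. If the network is poor or similar, increase this value.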


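If the error is "ERROR Run Command Timeout": --wait-timeout is the timeout for the control machine to execute a command on the tikv/pd/tidb machines. If the command runs slowly, increase this value.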
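

As an illustration of passing both timeouts to an upgrade, the sketch below only prints the command so it can be reviewed first. The cluster name, target version and the timeout values (30 and 300 seconds) are placeholder assumptions, not values from the original incident; upgrade_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with your own cluster name and version.
CLUSTER_NAME="my-cluster"
TARGET_VERSION="v4.0.2"

# Print an upgrade command with larger timeouts.
#   $1: --ssh-timeout in seconds  (time allowed to establish the SSH connection)
#   $2: --wait-timeout in seconds (time allowed for an operation such as
#       "systemctl stop" to complete)
upgrade_cmd() {
  echo "tiup cluster upgrade ${CLUSTER_NAME} ${TARGET_VERSION} --ssh-timeout ${1:-30} --wait-timeout ${2:-300}"
}

# Review the printed command, then run it by hand on the control machine.
upgrade_cmd
```

Raising --wait-timeout is the more relevant change for the log shown earlier, since the SSH connection itself was established and only the remote stop command ran past the limit.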


2. If you have adjusted the relevant timeouts and the upgrade still fails after several attempts, bring out the last resort: --force


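A rolling upgrade upgrades all components one by one. While TiKV is being upgraded, all leaders on each TiKV instance are transferred away before that instance is stopped. The default timeout is 5 minutes; once it is exceeded, the instance is stopped directly.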


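If you do not want to evict leaders and would rather upgrade immediately, specify --force in the upgrade command. This causes performance jitter (it is strongly recommended to do this during the early-morning off-peak hours to keep the impact to a minimum) but does not cause data loss.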
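

A forced upgrade could be wrapped in a small guard that follows the off-peak advice above. This is only a sketch: the cluster name and version are placeholders, the 00:00 to 05:59 window is an assumption to adapt to your own traffic pattern, and the helper names are hypothetical. The script prints the command rather than running it.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with your own cluster name and version.
CLUSTER_NAME="my-cluster"
TARGET_VERSION="v4.0.2"

# Assumed off-peak window: hours 00 through 05, local time.
is_off_peak() {
  [ "$1" -ge 0 ] && [ "$1" -lt 6 ]
}

# Print the forced upgrade command (skips leader eviction, expect jitter).
force_upgrade_cmd() {
  echo "tiup cluster upgrade ${CLUSTER_NAME} ${TARGET_VERSION} --force"
}

if is_off_peak "$(date +%H)"; then
  force_upgrade_cmd
else
  echo "Outside the assumed off-peak window; not printing the --force command." >&2
fi
```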

