DM Cluster Migration
- 2025-03-14, Beijing
Author: Hacker_小峰 | Original source: https://tidb.net/blog/6aaca77c
The rack hosting the machines of the existing DM cluster has to be decommissioned, so the DM cluster and the replication tasks running on it need to be migrated. The migration is done by scaling out first and then scaling in; the tricky part, which needs close attention, is moving the tasks that are actively replicating through dm-worker.
Current DM cluster status:
$ tiup dm display dm-001
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
10.0.0.5:9093 alertmanager 10.0.0.5 9093/9094 linux/x86_64 Up /data/dm-data/alertmanager-9093 /data/dm-deploy/alertmanager-9093
10.0.0.6:8265 dm-master 10.0.0.6 8265/8295 linux/x86_64 Healthy /data01/dm-data/dm-master-8265 /data01/dm-deploy/dm-master-8265
10.0.0.5:8261 dm-master 10.0.0.5 8261/8291 linux/x86_64 Healthy|L /data/dm-data/dm-master-8261 /data/dm-deploy/dm-master-8261
10.0.0.5:8263 dm-master 10.0.0.5 8263/8293 linux/x86_64 Healthy /data03/dm-data/dm-master-8263 /data03/dm-deploy/dm-master-8263
10.0.0.6:8268 dm-worker 10.0.0.6 8268 linux/x86_64 Free /data02/dm-data/dm-worker-8268 /data02/dm-deploy/dm-worker-8268
10.0.0.5:8262 dm-worker 10.0.0.5 8262 linux/x86_64 Bound /data/dm-data/dm-worker-8262 /data01/dm-deploy/dm-worker-8262
10.0.0.5:8264 dm-worker 10.0.0.5 8264 linux/x86_64 Free /data/dm-data/dm-worker-8264 /data02/dm-deploy/dm-worker-8264
10.0.0.5:8266 dm-worker 10.0.0.5 8266 linux/x86_64 Bound /data/dm-data/dm-worker-8266 /data04/dm-deploy/dm-worker-8266
10.0.0.5:3000 grafana 10.0.0.5 3000 linux/x86_64 Up - /data/dm-deploy/grafana-3000
10.0.0.5:9090 prometheus 10.0.0.5 9090 linux/x86_64 Up /data/dm-data/prometheus-9090 /data/dm-deploy/prometheus-9090
Total nodes: 10
Migration goal: replace all nodes on 10.0.0.5 with the new machine 10.0.0.8.
The two machines should have identical hardware and software configurations, and ideally the same disk/directory layout.
# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 63G 2.1M 63G 1% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda3 50G 14G 37G 27% /
/dev/sda2 494M 170M 324M 35% /boot
/dev/sda4 40G 431M 40G 2% /var
/dev/sda5 1.1T 223M 1.1T 1% /data
/dev/sdb1 894G 34M 894G 1% /data01
/dev/sdc1 894G 34M 894G 1% /data02
/dev/sdd1 894G 34M 894G 1% /data03
/dev/sde1 894G 34M 894G 1% /data04
tmpfs 13G 0 13G 0% /run/user/0
Scale out new dm-master and dm-worker nodes
Scale out new nodes to replace the old ones that will be taken offline.
In the scale-out configuration file, pay particular attention to disk directory allocation and ports: ordinary SAS disks are fine for dm-master, while dm-worker needs SSDs.
tiup dm scale-out dm-001 dm-scale-out.yaml
$ cat dm-scale-out.yaml
---
master_servers:
  - host: 10.0.0.8
    name: master-1
    # ssh_port: 22
    port: 8261
    peer_port: 8291
    deploy_dir: "/data01/dm-deploy/dm-master-8261"
    data_dir: "/data01/dm-data/dm-master-8261"
    log_dir: "/data01/dm-deploy/dm-master-8261/log"
  - host: 10.0.0.8
    name: master-2
    # ssh_port: 22
    port: 8263
    peer_port: 8293
    deploy_dir: "/data/dm-deploy/dm-master-8263"
    data_dir: "/data/dm-data/dm-master-8263"
    log_dir: "/data/dm-deploy/dm-master-8263/log"
worker_servers:
  - host: 10.0.0.8
    # ssh_port: 22
    name: dm-10.0.0.8-8262
    port: 8262
    deploy_dir: "/data02/dm-deploy/dm-worker-8262"
    data_dir: "/data02/dm-data/dm-worker-8262"
    log_dir: "/data02/dm-deploy/dm-worker-8262/log"
  - host: 10.0.0.8
    # ssh_port: 22
    name: dm-10.0.0.8-8264
    port: 8264
    deploy_dir: "/data03/dm-deploy/dm-worker-8264"
    data_dir: "/data03/dm-data/dm-worker-8264"
    log_dir: "/data03/dm-deploy/dm-worker-8264/log"
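After the scale-out completes, it is worth re-checking the topology with the same display command used above; the new dm-master nodes should report Healthy and the new dm-worker nodes Free until sources are transferred to them:
tiup dm display dm-001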
Migrate the dm-worker replication tasks [key step]
The main work is to change the binding between each data source and its DM-worker.
How do you change a dm-worker binding?
dmctl --master-addr <master-addr> operate-source show    # list the upstream data sources
dmctl --master-addr <master-addr> get-config source <source-id>    # view a data source's configuration directly
tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name>    # pause the task first; only then can the dm-worker binding be changed
dmctl --master-addr <master-addr> list-member --worker    # check the source-to-worker bindings
# In this example <source-id> is bound to dm-10.0.0.5-8262.
# The following command rebinds the data source to dm-10.0.0.8-8262:
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8262
1. Check the data source configuration
tiup dmctl --master-addr 10.0.0.8:8261 operate-source show
{
"result": true,
"msg": "",
"sources": [
{
"result": true,
"msg": "",
"source": "<source-id>",
"worker": "dm-10.0.0.5-8262"
},
{
"result": true,
"msg": "",
"source": "<source-id>",
"worker": "dm-10.0.0.5-8266"
}
]
}
tiup dmctl --master-addr 10.0.0.8:8261 get-config source <source-id>
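Before changing anything, it can be handy to keep a copy of each source configuration on disk. A minimal sketch using a plain shell redirect (the backup file name is illustrative; the exact output format depends on the DM version):
tiup dmctl --master-addr 10.0.0.8:8261 get-config source <source-id> > source-<source-id>-backup.yaml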
2. Change the binding between the data source and the DM-worker via transfer-source
2.1 list-member: list the DM-worker bindings
dmctl --master-addr <master-addr> list-member --worker
tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker
{
"result": true,
"msg": "",
"members": [
{
"worker": {
"msg": "",
"workers": [
{
"name": "dm-10.0.0.6-8268",
"addr": "10.0.0.6:8268",
"stage": "free",
"source": ""
},
{
"name": "dm-10.0.0.8-8262",
"addr": "10.0.0.8:8262",
"stage": "free",
"source": ""
},
{
"name": "dm-10.0.0.8-8264",
"addr": "10.0.0.8:8264",
"stage": "free",
"source": ""
},
{
"name": "dm-10.0.0.5-8262",
"addr": "10.0.0.5:8262",
"stage": "bound",
"source": "<source-id>"
},
{
"name": "dm-10.0.0.5-8266",
"addr": "10.0.0.5:8266",
"stage": "bound",
"source": "<source-id>"
}
]
}
}
]
}
2.2 pause-task
Before changing the binding, DM checks whether the worker being unbound is running a replication task. If it is, the task must be paused first and resumed after the binding has been changed.
# First pause task <task-name>, which runs on source "<source-id>".
tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name>
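Before moving on to transfer-source, confirm the task really is paused. A minimal check, assuming the usual DM status output in which each subtask reports a stage field (the exact JSON layout varies by version):
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name> | grep '"stage"'
# expect the subtask stage to show "Paused" rather than "Running"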
2.3 transfer-source
# In this example <source-id> is bound to dm-10.0.0.5-8262.
# The following command rebinds the data source to dm-10.0.0.8-8262:
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8262
2.4 resume-task
# Run dmctl --master-addr <master-addr> list-member --worker again to confirm the transfer has taken effect.
tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker
tiup dmctl --master-addr 10.0.0.8:8261 resume-task <task-name>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name>
Change the binding for the next dm-worker:
tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name2>
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8264
tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker
tiup dmctl --master-addr 10.0.0.8:8261 resume-task <task-name2>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name2>
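Once every source is bound to a worker on 10.0.0.8 and list-member shows the old workers on 10.0.0.5 as free, the old dm-master and dm-worker nodes can be scaled in, completing the scale-out-then-scale-in approach. A sketch using the node IDs from the display output at the top (the monitoring components on 10.0.0.5 are handled separately below):
tiup dm scale-in dm-001 -N 10.0.0.5:8261,10.0.0.5:8263,10.0.0.5:8262,10.0.0.5:8264,10.0.0.5:8266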
Migrate the monitoring nodes [tricky part]
The error encountered:
{"code": 1, "error": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.8:22' {ssh_stderr: FAILED: instance 0 in group 3: no address\n, ssh_stdout: Checking /data/dm-deploy/prometheus-9090/conf/prometheus.yml\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/dm-deploy/prometheus-9090/bin/prometheus/promtool check config /data/dm-deploy/prometheus-9090/conf/prometheus.yml}, cause: Process exited with status 1: check config failed", "errorVerbose": "check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.8:22' {ssh_stderr: FAILED: instance 0 in group 3: no address\n, ssh_stdout: Checking /data/dm-deploy/prometheus-9090/conf/prometheus.yml\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/dm-deploy/prometheus-9090/bin/prometheus/promtool check config ...
Passwordless SSH
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub 10.1.0.0
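A quick way to confirm the passwordless login works before retrying the scale-out (the tidb user is assumed here because it is the deploy user shown in the SSH error above):
ssh tidb@10.0.0.8 "hostname"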
Migrating grafana/prometheus/alertmanager by scaling out first and then scaling in kept producing strange SSH errors. A post on the TiDB forum pointed out that the monitoring components must be scaled in first and then scaled out; with that, the whole thing took five minutes. [facepalm]
Scale in grafana, Prometheus, and alertmanager
The original nodes must be scaled in first, then scaled out on the new machine.
tiup dm scale-in dm-001 -N 10.1.1.1:xxxx
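For this cluster the three monitoring nodes to remove are the ones on 10.0.0.5, so the scale-in would look roughly like this (node IDs taken from the display output above; adjust to your own topology):
tiup dm scale-in dm-001 -N 10.0.0.5:9090,10.0.0.5:3000,10.0.0.5:9093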
Scale out the monitoring nodes to the new machine
tiup dm scale-out dm-001 /home/tidb/dm/dm-scale-out-grafana.yaml
$ cat dm-scale-out-grafana.yaml
---
monitoring_servers:
  - host: 10.0.0.8
grafana_servers:
  - host: 10.0.0.8
alertmanager_servers:
  - host: 10.0.0.8
Other notes
After the IP change, the backend IPs in the corresponding monitoring URLs also need to be updated.
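For example, with the addresses from the display output above, Grafana moves from http://10.0.0.5:3000 to http://10.0.0.8:3000 and Prometheus from http://10.0.0.5:9090 to http://10.0.0.8:9090, so any bookmarks, reverse-proxy rules, or alert receivers pointing at the old addresses need to be updated accordingly.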