DM Cluster Migration

  • 2025-03-14
    Beijing
Author: Hacker_ 小峰
Original source: https://tidb.net/blog/6aaca77c

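The rack hosting the original DM cluster machines is being decommissioned, so the DM cluster and the replication tasks running on it have to be migrated. The migration is done by scaling out first and then scaling in. The hard part, and the one that needs the most attention, is moving the replication tasks that are currently running on dm-workers.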


Current DM cluster status:


$ tiup dm display dm-001
ID             Role          Host      Ports      OS/Arch       Status     Data Dir                         Deploy Dir
--             ----          ----      -----      -------       ------     --------                         ----------
10.0.0.5:9093  alertmanager  10.0.0.5  9093/9094  linux/x86_64  Up         /data/dm-data/alertmanager-9093  /data/dm-deploy/alertmanager-9093
10.0.0.6:8265  dm-master     10.0.0.6  8265/8295  linux/x86_64  Healthy    /data01/dm-data/dm-master-8265   /data01/dm-deploy/dm-master-8265
10.0.0.5:8261  dm-master     10.0.0.5  8261/8291  linux/x86_64  Healthy|L  /data/dm-data/dm-master-8261     /data/dm-deploy/dm-master-8261
10.0.0.5:8263  dm-master     10.0.0.5  8263/8293  linux/x86_64  Healthy    /data03/dm-data/dm-master-8263   /data03/dm-deploy/dm-master-8263
10.0.0.6:8268  dm-worker     10.0.0.6  8268       linux/x86_64  Free       /data02/dm-data/dm-worker-8268   /data02/dm-deploy/dm-worker-8268
10.0.0.5:8262  dm-worker     10.0.0.5  8262       linux/x86_64  Bound      /data/dm-data/dm-worker-8262     /data01/dm-deploy/dm-worker-8262
10.0.0.5:8264  dm-worker     10.0.0.5  8264       linux/x86_64  Free       /data/dm-data/dm-worker-8264     /data02/dm-deploy/dm-worker-8264
10.0.0.5:8266  dm-worker     10.0.0.5  8266       linux/x86_64  Bound      /data/dm-data/dm-worker-8266     /data04/dm-deploy/dm-worker-8266
10.0.0.5:3000  grafana       10.0.0.5  3000       linux/x86_64  Up         -                                /data/dm-deploy/grafana-3000
10.0.0.5:9090  prometheus    10.0.0.5  9090       linux/x86_64  Up         /data/dm-data/prometheus-9090    /data/dm-deploy/prometheus-9090
Total nodes: 10
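
To see at a glance which nodes sit on the machine being retired, the display output can be filtered by host. The following is a minimal sketch, not part of the original procedure: the function name nodes_on_host is hypothetical, and it assumes the column order shown above (ID, Role, Host, Ports, OS/Arch, Status).

```bash
# Hypothetical helper: read `tiup dm display` output on stdin and print
# "ID Role Status" for every node whose Host column equals the given host.
# Assumes the column order shown in the article (Host is column 3,
# Status is column 6).
nodes_on_host() {
  host="$1"
  awk -v h="$host" '$3 == h { print $1, $2, $6 }'
}

# Usage (on the control machine):
#   tiup dm display dm-001 | nodes_on_host 10.0.0.5
```

Rows reported as dm-worker with status Bound are the ones whose data sources have to be transferred before the node can be removed.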


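Migration goal: replace every node on 10.0.0.5 with a node on the new machine, 10.0.0.8.


The two machines should have identical hardware and software configurations, and ideally the same disk directory layout.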


# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /dev/shm
tmpfs            63G  2.1M   63G   1% /run
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/sda3        50G   14G   37G  27% /
/dev/sda2       494M  170M  324M  35% /boot
/dev/sda4        40G  431M   40G   2% /var
/dev/sda5       1.1T  223M  1.1T   1% /data
/dev/sdb1       894G   34M  894G   1% /data01
/dev/sdc1       894G   34M  894G   1% /data02
/dev/sdd1       894G   34M  894G   1% /data03
/dev/sde1       894G   34M  894G   1% /data04
tmpfs            13G     0   13G   0% /run/user/0
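
Since the directory layout on the new machine should match the old one, it can help to confirm that the expected mount points exist before scaling out. This is only a sketch under stated assumptions: the function name missing_mounts is hypothetical, and the mount list in the usage line is taken from the df output above.

```bash
# Hypothetical helper: read `df` output on stdin and print every expected
# mount point (passed as arguments) that does not appear in the last column.
missing_mounts() {
  df_output="$(cat)"
  for mount in "$@"; do
    # awk exits 0 only if some line's last field equals the mount point.
    if ! printf '%s\n' "$df_output" |
        awk -v m="$mount" '$NF == m { found = 1 } END { exit !found }'; then
      printf '%s\n' "$mount"
    fi
  done
}

# Usage (on the new machine, 10.0.0.8):
#   df -h | missing_mounts /data /data01 /data02 /data03 /data04
```

Empty output means every listed mount point is present.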

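Scaling out the new dm-master and dm-worker nodes

Scale out new nodes to replace the old ones that are going offline.


In the scale-out configuration file, pay particular attention to disk directory allocation and ports. Ordinary SAS disks are sufficient for dm-master; dm-worker needs SSDs.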


tiup dm scale-out dm-001 dm-scale-out.yaml


$ cat dm-scale-out.yaml
---
master_servers:
  - host: 10.0.0.8
    name: master-1
    # ssh_port: 22
    port: 8261
    peer_port: 8291
    deploy_dir: "/data01/dm-deploy/dm-master-8261"
    data_dir: "/data01/dm-data/dm-master-8261"
    log_dir: "/data01/dm-deploy/dm-master-8261/log"
  - host: 10.0.0.8
    name: master-2
    # ssh_port: 22
    port: 8263
    peer_port: 8293
    deploy_dir: "/data/dm-deploy/dm-master-8263"
    data_dir: "/data/dm-data/dm-master-8263"
    log_dir: "/data/dm-deploy/dm-master-8263/log"
worker_servers:
  - host: 10.0.0.8
    # ssh_port: 22
    name: dm-10.0.0.8-8262
    port: 8262
    deploy_dir: "/data02/dm-deploy/dm-worker-8262"
    data_dir: "/data02/dm-data/dm-worker-8262"
    log_dir: "/data02/dm-deploy/dm-worker-8262/log"
  - host: 10.0.0.8
    # ssh_port: 22
    name: dm-10.0.0.8-8264
    port: 8264
    deploy_dir: "/data03/dm-deploy/dm-worker-8264"
    data_dir: "/data03/dm-data/dm-worker-8264"
    log_dir: "/data03/dm-deploy/dm-worker-8264/log"

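Migrating dm-worker replication tasks [key step]

The main job is to change the binding between each data source and its DM-worker.


How do you change a dm-worker binding?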


dmctl --master-addr <master-addr> operate-source show    # list the upstream data sources
dmctl --master-addr <master-addr> get-config source <source-id>    # view a data source's configuration directly
tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name>    # pause the task first; only then can the dm-worker binding be changed
dmctl --master-addr <master-addr> list-member --worker    # view the source-to-worker bindings
# In this example <source-id> is bound to dm-10.0.0.5-8262.
# The following command rebinds that data source to dm-10.0.0.8-8262.
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8262

1. View the data source configuration

tiup dmctl --master-addr 10.0.0.8:8261 operate-source show
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "<source-id>",
            "worker": "dm-10.0.0.5-8262"
        },
        {
            "result": true,
            "msg": "",
            "source": "<source-id>",
            "worker": "dm-10.0.0.5-8266"
        }
    ]
}


tiup dmctl --master-addr 10.0.0.8:8261 get-config source <source-id>

2. Use transfer-source to change the binding between a data source and a DM-worker

2.1 list-member: list the DM-worker bindings

dmctl --master-addr <master-addr> list-member --worker


dmctl --master-addr 10.0.0.8:8261 list-member --worker
{
    "result": true,
    "msg": "",
    "members": [
        {
            "worker": {
                "msg": "",
                "workers": [
                    {
                        "name": "dm-10.0.0.6-8268",
                        "addr": "10.0.0.6:8268",
                        "stage": "free",
                        "source": ""
                    },
                    {
                        "name": "dm-10.0.0.8-8262",
                        "addr": "10.0.0.8:8262",
                        "stage": "free",
                        "source": ""
                    },
                    {
                        "name": "dm-10.0.0.8-8264",
                        "addr": "10.0.0.8:8264",
                        "stage": "free",
                        "source": ""
                    },
                    {
                        "name": "dm-10.0.0.5-8262",
                        "addr": "10.0.0.5:8262",
                        "stage": "bound",
                        "source": "<source-id>"
                    },
                    {
                        "name": "dm-10.0.0.5-8266",
                        "addr": "10.0.0.5:8266",
                        "stage": "bound",
                        "source": "<source-id>"
                    }
                ]
            }
        }
    ]
}

2.2 pause-task

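Before changing a binding, DM checks whether the worker being unbound is running a replication task. If it is, pause the task first, then resume it after the binding has been changed.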


# First pause task <task-name>, which runs on source "<source-id>".
tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name>

2.3 transfer-source

# In this example <source-id> is bound to dm-10.0.0.5-8262.
# The following command rebinds that data source to dm-10.0.0.8-8262.
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8262

2.4 resume-task

# Run dmctl --master-addr <master-addr> list-member --worker again to confirm the change has taken effect.
tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker
tiup dmctl --master-addr 10.0.0.8:8261 resume-task <task-name>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name>


Change the binding for the next dm-worker:


tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name2>
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8264
tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker
tiup dmctl --master-addr 10.0.0.8:8261 resume-task <task-name2>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name2>
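
When several sources have to be moved, the same pause / transfer / verify / resume sequence repeats for each one. A small dry-run helper can print that sequence for review before anything is executed. This is a sketch only: the function name print_transfer_plan and its argument order are hypothetical, and it uses no dmctl subcommands beyond the ones already shown in this article.

```bash
# Hypothetical helper: print (without running) the dmctl commands that move
# one data source to a new worker.
# Arguments: <master-addr> <task-name> <source-id> <new-worker-name>
print_transfer_plan() {
  master_addr="$1"
  task="$2"
  source_id="$3"
  new_worker="$4"
  dmctl="tiup dmctl --master-addr ${master_addr}"
  printf '%s\n' \
    "${dmctl} pause-task ${task}" \
    "${dmctl} query-status ${task}" \
    "${dmctl} transfer-source ${source_id} ${new_worker}" \
    "${dmctl} list-member --worker" \
    "${dmctl} resume-task ${task}" \
    "${dmctl} query-status ${task}"
}

# Usage: review the plan, then run the printed commands one at a time.
#   print_transfer_plan 10.0.0.8:8261 <task-name> <source-id> dm-10.0.0.8-8262
```

Printing rather than executing keeps a human in the loop: each query-status and list-member result should be checked before moving on to the next command.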

Migrating the monitoring nodes [difficult part]

Error:


{"code": 1, "error": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.8:22' {ssh_stderr:   FAILED: instance 0 in group 3: no address\n, ssh_stdout: Checking /data/dm-deploy/prometheus-9090/conf/prometheus.yml\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/dm-deploy/prometheus-9090/bin/prometheus/promtool check config /data/dm-deploy/prometheus-9090/conf/prometheus.yml}, cause: Process exited with status 1: check config failed", "errorVerbose": "check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.8:22' {ssh_stderr:   FAILED: instance 0 in group 3: no address\n, ssh_stdout: Checking /data/dm-deploy/prometheus-9090/conf/prometheus.yml\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/dm-deploy/prometheus-9090/bin/prometheus/promtool check config ...


Passwordless SSH login


ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub 10.1.0.0


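Migrating grafana/prometheus/alertmanager by scaling out and then scaling in kept producing odd SSH errors. I later came across a remark on the TiDB forum: for the monitoring components, scale in first and then scale out. After that it took five minutes [facepalm].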

Scaling in grafana, Prometheus, and alertmanager

# Scale in the original nodes first, then scale out.
tiup dm scale-in dm-001 -N 10.1.1.1:xxxx
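
The node IDs to pass to -N are the ID column of tiup dm display. As a rough sketch (the function name print_monitor_scale_in is hypothetical, and the column order is assumed to match the display output shown earlier), the scale-in commands for the monitoring components on the old machine can be generated like this:

```bash
# Hypothetical helper: read `tiup dm display` output on stdin and print a
# `tiup dm scale-in` command for each monitoring node on the given host.
# Arguments: <cluster-name> <old-host>
print_monitor_scale_in() {
  cluster="$1"
  host="$2"
  awk -v c="$cluster" -v h="$host" '
    $3 == h && ($2 == "grafana" || $2 == "prometheus" || $2 == "alertmanager") {
      printf "tiup dm scale-in %s -N %s\n", c, $1
    }'
}

# Usage: print the commands, review them, then run them by hand.
#   tiup dm display dm-001 | print_monitor_scale_in dm-001 10.0.0.5
```

The helper only prints commands, so nothing is removed until the output has been reviewed.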

Scaling out the monitoring nodes onto the new machine

tiup dm scale-out dm-001 /home/tidb/dm/dm-scale-out-grafana.yaml


$ cat dm-scale-out-grafana.yaml
---
monitoring_servers:
- host: 10.0.0.8
grafana_servers:
- host: 10.0.0.8
alertmanager_servers:
- host: 10.0.0.8

Other notes

After the IP change, the backend IP behind the corresponding monitoring URLs also needs to be updated.


TiDB control machine migration: see the separate article.

