Kubernetes networking issues and how containers communicate across hosts
- 2024-06-07, Fujian
[0] Resource configuration file
[root@mcwk8s03 mcwtest]# ls
mcwdeploy.yaml
[root@mcwk8s03 mcwtest]# cat mcwdeploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mcwpython
  name: mcwtest-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcwpython
  template:
    metadata:
      labels:
        app: mcwpython
    spec:
      containers:
      - command:
        - sh
        - -c
        - echo 123 >>/mcw.txt && cd / && rm -rf /etc/yum.repos.d/* && curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-vault-8.5.2111.repo && yum install -y python2 && python2 -m SimpleHTTPServer 20000
        image: centos
        imagePullPolicy: IfNotPresent
        name: mcwtest
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
        - 8.8.8.8
        - 8.8.4.4
        searches:
        #- namespace.svc.cluster.local
        - my.dns.search.suffix
        options:
        - name: ndots
          value: "5"
---
apiVersion: v1
kind: Service
metadata:
  name: mcwtest-svc
spec:
  ports:
  - name: mcwport
    port: 2024
    protocol: TCP
    targetPort: 20000
  selector:
    app: mcwpython
  type: NodePort
[root@mcwk8s03 mcwtest]#
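For reference, the manifest above can be applied and checked with the usual kubectl commands (a minimal sketch; output omitted):

# Apply the Deployment and Service defined in mcwdeploy.yaml
kubectl apply -f mcwdeploy.yaml
# Show the pod (with its node and IP), the Deployment, and the NodePort the Service was given
kubectl get deploy,pod,svc -o wide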
[1] The Service is reachable from some addresses but not others (note that the first curl below hits the cluster IP on port 80 rather than the Service port 2024)
[root@mcwk8s03 mcwtest]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.2.0.1 <none> 443/TCP 583d
mcwtest-svc NodePort 10.2.0.155 <none> 2024:41527/TCP 133m
nginx ClusterIP None <none> 80/TCP 413d
[root@mcwk8s03 mcwtest]# curl -I 10.2.0.155
curl: (7) Failed connect to 10.2.0.155:80; Connection timed out
[root@mcwk8s03 mcwtest]# curl -I 10.0.0.33:41527
curl: (7) Failed connect to 10.0.0.33:41527; Connection refused
[root@mcwk8s03 mcwtest]# curl -I 10.0.0.36:41527
curl: (7) Failed connect to 10.0.0.36:41527; Connection timed out
[root@mcwk8s03 mcwtest]# curl -I 10.0.0.35:41527
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.18
Date: Tue, 04 Jun 2024 16:38:18 GMT
Content-type: text/html; charset=ANSI_X3.4-1968
Content-Length: 816
[root@mcwk8s03 mcwtest]#
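Before blaming the host network, it is also worth confirming that the Service actually selects the pod; if the endpoints list were empty, the problem would be the label selector rather than the network. A suggested check:

# Backend pod addresses kube-proxy forwards this Service to
kubectl get endpoints mcwtest-svc
# Cross-check the selector, ports, and NodePort
kubectl describe svc mcwtest-svc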
[2] The IP that works is the host the pod runs on, so the next step is to check whether containers can communicate across hosts
[root@mcwk8s03 mcwtest]# kubectl get pod -o wide|grep mcwtest
mcwtest-deploy-6465665557-g9zjd 1/1 Running 0 37m 172.17.98.13 mcwk8s05 <none> <none>
[root@mcwk8s03 mcwtest]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
mcwk8s05 Ready <none> 580d v1.15.12 10.0.0.35 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
mcwk8s06 Ready <none> 580d v1.15.12 10.0.0.36 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://20.10.21
[root@mcwk8s03 mcwtest]#
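Curling the pod IP and container port directly separates the Service/NodePort layer from the pod network itself (a suggested check, using the pod IP shown above):

# From a node that should have a route to the pod subnet, bypass the Service entirely
curl -I 172.17.98.13:20000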
[3] Check whether the container can reach across hosts. Since the container runs on mcwk8s05, start by checking whether that host can reach the other machines' docker0 IPs
[root@mcwk8s03 mcwtest]# ifconfig docker0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.83.1 netmask 255.255.255.0 broadcast 172.17.83.255
ether 02:42:e9:a4:51:4f txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@mcwk8s03 mcwtest]#
[root@mcwk8s05 /]# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.17.98.1 netmask 255.255.255.0 broadcast 172.17.98.255
inet6 fe80::42:18ff:fee1:e8fc prefixlen 64 scopeid 0x20<link>
ether 02:42:18:e1:e8:fc txqueuelen 0 (Ethernet)
RX packets 548174 bytes 215033771 (205.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 632239 bytes 885330301 (844.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@mcwk8s05 /]#
[root@mcwk8s06 ~]# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.17.9.1 netmask 255.255.255.0 broadcast 172.17.9.255
inet6 fe80::42:f0ff:fefa:133e prefixlen 64 scopeid 0x20<link>
ether 02:42:f0:fa:13:3e txqueuelen 0 (Ethernet)
RX packets 229 bytes 31724 (30.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 212 bytes 53292 (52.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@mcwk8s06 ~]#
As the ping tests below show, host 05 cannot reach the other hosts' docker0 addresses, while 03 and 06 can reach each other:
[root@mcwk8s05 /]# ping -c 1 172.17.83.1
PING 172.17.83.1 (172.17.83.1) 56(84) bytes of data.
^C
--- 172.17.83.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
[root@mcwk8s05 /]# ping -c 1 172.17.9.1
PING 172.17.9.1 (172.17.9.1) 56(84) bytes of data.
^C
--- 172.17.9.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
[root@mcwk8s05 /]#
[root@mcwk8s06 ~]# ping -c 1 172.17.83.1
PING 172.17.83.1 (172.17.83.1) 56(84) bytes of data.
64 bytes from 172.17.83.1: icmp_seq=1 ttl=64 time=0.246 ms
--- 172.17.83.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.246/0.246/0.246/0.000 ms
[root@mcwk8s06 ~]#
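Since flannel runs in VXLAN mode here, the docker0 ping test can be complemented by inspecting the VXLAN device and the forwarding entries flannel programs on each host (suggested checks; the exact output depends on the flannel version):

# VXLAN parameters of the flannel interface (VNI, local VTEP address, UDP port)
ip -d link show flannel.1
# Which remote VTEP MACs are mapped to which host IPs
bridge fdb show dev flannel.1
# Neighbour entries flannel programmed for the remote subnets
ip neigh show dev flannel.1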
[4] In etcd there is no subnet entry for host 05's docker network segment
[root@mcwk8s03 mcwtest]# etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.17.9.0-24
/coreos.com/network/subnets/172.17.83.0-24
[root@mcwk8s03 mcwtest]# etcdctl get /coreos.com/network/subnets/172.17.9.0-24
{"PublicIP":"10.0.0.36","BackendType":"vxlan","BackendData":{"VtepMAC":"2a:2c:21:3a:58:21"}}
[root@mcwk8s03 mcwtest]# etcdctl get /coreos.com/network/subnets/172.17.83.0-24
{"PublicIP":"10.0.0.33","BackendType":"vxlan","BackendData":{"VtepMAC":"b2:83:33:7b:fd:37"}}
[root@mcwk8s03 mcwtest]#
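flanneld also keeps its overall network configuration in etcd. With the default etcd prefix used above, the address pool and backend type should be visible under the config key (a hedged example; the key differs if flanneld was started with a custom -etcd-prefix):

# Overall flannel network pool and backend (vxlan) configuration
etcdctl get /coreos.com/network/config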
[5] Restart the flannel service on host 05. If docker is not restarted afterwards, the network stays partly broken, because docker0 keeps its old address and is not assigned an IP from the newly leased subnet (see the note after the output below)
[root@mcwk8s05 ~]# systemctl restart flanneld.service
[root@mcwk8s05 ~]# ifconfig flannel.1
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.17.89.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::3470:76ff:feea:39b8 prefixlen 64 scopeid 0x20<link>
ether 36:70:76:ea:39:b8 txqueuelen 0 (Ethernet)
RX packets 1 bytes 40 (40.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9 bytes 540 (540.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
[root@mcwk8s05 ~]# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.17.98.1 netmask 255.255.255.0 broadcast 172.17.98.255
inet6 fe80::42:18ff:fee1:e8fc prefixlen 64 scopeid 0x20<link>
ether 02:42:18:e1:e8:fc txqueuelen 0 (Ethernet)
RX packets 551507 bytes 216568663 (206.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 635860 bytes 891305864 (850.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@mcwk8s05 ~]# systemctl restart docker
[root@mcwk8s05 ~]# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.17.89.1 netmask 255.255.255.0 broadcast 172.17.89.255
inet6 fe80::42:18ff:fee1:e8fc prefixlen 64 scopeid 0x20<link>
ether 02:42:18:e1:e8:fc txqueuelen 0 (Ethernet)
RX packets 552135 bytes 216658479 (206.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 636771 bytes 892057926 (850.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@mcwk8s05 ~]#
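The reason docker also has to be restarted: in this kind of binary flanneld deployment, flanneld writes its leased subnet to a subnet file, and that file is typically turned into docker's --bip/--mtu options when the docker service starts, so docker only picks up the new subnet on restart. A hedged way to see this on host 05 (file paths depend on how flanneld was installed):

# Subnet lease flanneld wrote for this host (FLANNEL_SUBNET, FLANNEL_MTU, ...)
cat /run/flannel/subnet.env
# docker0 should now carry an address from that FLANNEL_SUBNET
ip addr show docker0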
[6] Check again: etcd now contains the new subnet for node 05
[root@mcwk8s03 mcwtest]# etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.17.83.0-24
/coreos.com/network/subnets/172.17.9.0-24
/coreos.com/network/subnets/172.17.89.0-24
[root@mcwk8s03 mcwtest]# etcdctl get /coreos.com/network/subnets/172.17.89.0-24
{"PublicIP":"10.0.0.35","BackendType":"vxlan","BackendData":{"VtepMAC":"36:70:76:ea:39:b8"}}
[root@mcwk8s03 mcwtest]#
[7] Test again: the NodePort is now reachable through node 05 (and node 06)
[root@mcwk8s03 mcwtest]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.2.0.1 <none> 443/TCP 583d
mcwtest-svc NodePort 10.2.0.155 <none> 2024:33958/TCP 154m
nginx ClusterIP None <none> 80/TCP 413d
[root@mcwk8s03 mcwtest]# curl -I 10.2.0.155:2024
curl: (7) Failed connect to 10.2.0.155:2024; Connection timed out
[root@mcwk8s03 mcwtest]# curl -I 10.0.0.33:33958
curl: (7) Failed connect to 10.0.0.33:33958; Connection refused
[root@mcwk8s03 mcwtest]# curl -I 10.0.0.35:33958
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.18
Date: Tue, 04 Jun 2024 16:59:03 GMT
Content-type: text/html; charset=ANSI_X3.4-1968
Content-Length: 816
[root@mcwk8s03 mcwtest]# curl -I 10.0.0.36:33958
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.18
Date: Tue, 04 Jun 2024 16:59:12 GMT
Content-type: text/html; charset=ANSI_X3.4-1968
Content-Length: 816
[root@mcwk8s03 mcwtest]#
[8] Node 03 is still unreachable. 03 is the master and apparently is not registered as a node (it is a pure master), so it cannot serve as a node IP for NodePort access. But why is the cluster IP also unreachable from it?
[root@mcwk8s03 mcwtest]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.2.0.1 <none> 443/TCP 583d
mcwtest-svc NodePort 10.2.0.155 <none> 2024:33958/TCP 157m
nginx ClusterIP None <none> 80/TCP 413d
[root@mcwk8s03 mcwtest]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
mcwk8s05 Ready <none> 580d v1.15.12
mcwk8s06 Ready <none> 580d v1.15.12
[root@mcwk8s03 mcwtest]#
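To confirm that 03 really runs only the control-plane components, one can check whether the node components are installed as services there at all (a suggested check; unit names depend on how the cluster was deployed):

# On mcwk8s03: are kubelet / kube-proxy even present as systemd units?
systemctl list-unit-files | grep -E 'kubelet|kube-proxy'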
[9] On the routing side: after flanneld is restarted, the new subnet is written back into etcd, and the other nodes pick up a route for it; the docker daemon on that host then (presumably) needs to be restarted so that containers get IPs from the new subnet.
Before the fix, the 03 master had only one flannel route:
[root@mcwk8s03 mcwtest]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.254 0.0.0.0 UG 100 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.9.0 172.17.9.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.83.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
[root@mcwk8s03 mcwtest]#
After flanneld on node 05 was restarted, 03 gained an extra flannel route for node 05's new subnet:
[root@mcwk8s03 mcwtest]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.254 0.0.0.0 UG 100 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.9.0 172.17.9.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.83.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
172.17.89.0 172.17.89.0 255.255.255.0 UG 0 0 0 flannel.1
[root@mcwk8s03 mcwtest]#
The healthy node 06 behaves the same way.
Before:
[root@mcwk8s06 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.254 0.0.0.0 UG 100 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.9.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
172.17.83.0 172.17.83.0 255.255.255.0 UG 0 0 0 flannel.1
[root@mcwk8s06 ~]#
After the operation, it has one more route:
[root@mcwk8s06 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.254 0.0.0.0 UG 100 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.9.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
172.17.83.0 172.17.83.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.89.0 172.17.89.0 255.255.255.0 UG 0 0 0 flannel.1
[root@mcwk8s06 ~]#
Node 05 previously had two flannel routes, but both pointed at stale subnets:
[root@mcwk8s05 /]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.254 0.0.0.0 UG 100 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.59.0 172.17.59.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.61.0 172.17.61.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.98.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
[root@mcwk8s05 /]#
After the restart, node 05's routes were also updated to the correct ones:
[root@mcwk8s05 /]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.254 0.0.0.0 UG 100 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
172.17.9.0 172.17.9.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.83.0 172.17.83.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.89.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
[root@mcwk8s05 /]#
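To tie the routing tables back to the etcd data: each remote 172.17.x.0/24 route via flannel.1 should correspond to one subnet entry in etcd, and the VtepMAC stored there should show up in the VXLAN forwarding table mapped to that node's PublicIP. A hedged consistency check on mcwk8s05, using node 06's VtepMAC from the etcd output above:

# Any address in node 06's subnet should route out through flannel.1
ip route get 172.17.9.1
# Node 06's VtepMAC should be mapped to its host IP 10.0.0.36
bridge fdb show dev flannel.1 | grep -i 2a:2c:21:3a:58:21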
[10] Now look at why the cluster IP is unreachable from 03. The direct cause should be that there are no IPVS rules on this host.
[root@mcwk8s03 mcwtest]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.2.0.1 <none> 443/TCP 583d
mcwtest-svc NodePort 10.2.0.155 <none> 2024:33958/TCP 3h3m
nginx ClusterIP None <none> 80/TCP 413d
[root@mcwk8s03 mcwtest]# curl -I 10.2.0.155:2024
curl: (7) Failed connect to 10.2.0.155:2024; Connection timed out
[root@mcwk8s03 mcwtest]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
[root@mcwk8s03 mcwtest]# systemctl status kube-proxy
Unit kube-proxy.service could not be found.
[root@mcwk8s03 mcwtest]#
The other nodes all have IPVS rules. It still needs to be confirmed which service creates the ipvsadm rules; it should be kube-proxy, and kube-proxy is not deployed on our 03 master, which would explain why 03 has no IPVS rules and cannot reach the cluster IP. If that is the case, cluster-IP traffic between containers depends on IPVS and has no direct dependence on whether the apiserver is up or down; worth verifying when there is time (see the suggested check after the output below).
[root@mcwk8s05 /]# ipvsadm -Ln|grep -C 1 10.2.0.155
-> 172.17.89.4:9090 Masq 1 0 0
TCP 10.2.0.155:2024 rr
-> 172.17.89.10:20000 Masq 1 0 0
[root@mcwk8s05 /]#
[root@mcwk8s05 /]# curl -I 10.2.0.155:2024
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.18
Date: Tue, 04 Jun 2024 17:34:45 GMT
Content-type: text/html; charset=ANSI_X3.4-1968
Content-Length: 816
[root@mcwk8s05 /]#
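To settle the open question above about which service creates these rules, check the proxy component on a working node (a suggested verification; the mode may be set on the command line or in a config file depending on the deployment):

# On a working node such as mcwk8s05, kube-proxy should exist and be running
systemctl status kube-proxy
# Its command line or config should show the ipvs proxy mode
ps -ef | grep [k]ube-proxy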
[Summary]
When container networking on a single node stops working through nodeIP:nodePort and the like, first check whether the docker0 gateways are still reachable across hosts. After restarting a network service such as flannel (even when the network plugin is deployed as a pod), also consider restarting the docker service so that containers are moved onto the newly leased subnet; for example, a flannel OOM crash loop once caused a network failure on kube001.
Every machine has a route for each other node's flannel.1 subnet, sent out through its own flannel.1 interface, plus a route for its local docker0 subnet with gateway 0.0.0.0 (directly connected, via the default path for local delivery).
As shown above, the Service exposes a port on the cluster IP, and cluster-IP traffic is managed and forwarded by IPVS rules. Traffic arriving at nodeIP:nodePort or clusterIP:port is forwarded by IPVS to a backend pod IP; because every host holds routes for all nodes' flannel subnets, and the flannel.1 devices and docker0 bridges carry the matching subnets, these two interfaces are what carry cross-host container traffic.
How does a request to nodeIP:nodePort or clusterIP:port find the right pod? IPVS selects a backend pod and rewrites the destination to that pod's IP and port; the per-node subnet routes then steer the packet out through flannel.1, and etcd records which subnet belongs to which host (its PublicIP), so the packet reaches the correct host. On that host, all local containers hang off docker0 and communicate with each other directly through it; only cross-host container traffic takes the flannel.1 path, after which the destination host delivers it to the pod over docker0. Other network plugins follow similar principles, and the iptables proxy mode plays the same role as the IPVS proxy mode.
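Putting the summary together, the path of a request to 10.0.0.36:33958 can be traced hop by hop with commands like these, using the pod IP 172.17.89.10 seen in the IPVS output above (a hedged sketch, not a captured trace):

# On mcwk8s06: IPVS picks the backend pod for this NodePort and rewrites the destination
ipvsadm -Ln | grep -A1 33958
# On mcwk8s06: the pod subnet route pushes the packet into the VXLAN tunnel
ip route get 172.17.89.10
# On mcwk8s05: the same lookup now resolves to the local docker0 bridge, which delivers to the pod
ip route get 172.17.89.10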
Reposted from: 马昌伟