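This article is shared from the Huawei Cloud community post "Calico BGP RouteReflector Strategy in Practice", by 可以交个朋友.
1 Background
The container networking component Calico supports several backend modes: the overlay modes IPIP and VXLAN, and the underlay, pure-routing BGP mode.
Compared with an overlay network model, an underlay network offers higher data-plane forwarding performance. Pure-routing mode itself comes in two flavors. The first is the Calico BGP full mesh, which has limits and only suits small Kubernetes clusters: every node peers with every other node, so n nodes need n(n-1)/2 BGP sessions, each new node adds one session to every existing node, and the session count and its network overhead grow quadratically with cluster size. The second is Route Reflector mode, also called RR mode. In RR mode, one or more BGP speakers are designated as Route Reflectors and peer with the other speakers in the network; each speaker only needs a BGP session with the Route Reflector to learn the routes of the whole network.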
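To make the scaling difference concrete, here is a minimal sketch that compares the session counts. The function names and the choice of two reflectors are assumptions made for illustration, and sessions between the reflectors themselves are not counted.

```shell
#!/bin/bash
# Number of BGP sessions in a full mesh of n speakers: n * (n - 1) / 2.
fullmesh_sessions() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

# Number of sessions when n client speakers each peer with r route reflectors.
# Assumption: the reflectors are separate devices (for example ToR switches)
# and sessions between the reflectors themselves are ignored.
rr_sessions() {
    local n=$1 r=$2
    echo $(( n * r ))
}

for n in 4 100 1000; do
    echo "nodes=$n fullmesh=$(fullmesh_sessions "$n") rr=$(rr_sessions "$n" 2)"
done
```

With only 4 nodes the full mesh is actually cheaper (6 sessions against 8), which is why it remains a reasonable default for small clusters; at 100 nodes the comparison is 4950 against 200.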
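2 Calico BGP RouteReflector networking architecture
Without changing the internal network topology of the IDC, the access-layer switches establish BGP sessions with the core-layer switches and reuse the routing policy that already exists in the data center. Pod CIDRs are allocated according to each node's physical location, and every node announces its Pod CIDR to its access-layer switch over BGP, which gives network-wide reachability. The diagram below explains this in detail using a Leaf-Spine architecture.
Networking principles:
Each access-layer switch has layer-2 connectivity to the nodes it manages, and together they form one AS. Every node runs a BGP service that announces the node's own routes.
Among the core-layer and access-layer switches, each router occupies its own AS; they are physically directly connected and run BGP. The core-layer switches see the routes of the whole network, while each access-layer switch sees the routes of the nodes directly attached to it.
Pods on the same host reach each other through the host's own routing (the Linux host is treated as a router).
Pods on different nodes in the same rack communicate through the ToR (leaf) switch.
Pods in different racks communicate through the core switches.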
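The three forwarding rules above can be sketched as a small shell function. This is only an illustration of the decision, not something Calico runs: the function name forwarding_path and the node and rack names used in the calls are hypothetical.

```shell
#!/bin/bash
# forwarding_path SRC_NODE SRC_RACK DST_NODE DST_RACK
# Prints which device forwards traffic between two pods:
#   host  - both pods are on the same node
#   leaf  - different nodes in the same rack (ToR switch)
#   spine - nodes in different racks (core switch)
forwarding_path() {
    local src_node=$1 src_rack=$2 dst_node=$3 dst_rack=$4
    if [ "$src_node" = "$dst_node" ]; then
        echo "host"
    elif [ "$src_rack" = "$dst_rack" ]; then
        echo "leaf"
    else
        echo "spine"
    fi
}

forwarding_path node-a rack0 node-a rack0
forwarding_path node-a rack0 node-b rack0
forwarding_path node-a rack0 node-c rack1
```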
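3 Building a lab environment that simulates the production network
Prepare a machine running Ubuntu 22.04 (8 vCPUs and 16 GB of memory are enough). The following tools need to be installed on the virtual machine:
Docker
Go development environment
Kind (Kubernetes in Docker, a tool developed by a Kubernetes SIG for quickly building k8s test environments; installing kind requires Go to be installed on the host first; v0.20.0 is a suitable version)
ContainerLab (a virtual networking platform built on container technology; it can use the vyos image to build virtual switches and routers; installing containerlab v0.42.0 is recommended)
3.1 Setting up the Kubernetes environment
Kubernetes cluster version: 1.27.3
Cluster size: 1 master and 3 worker nodes
The cluster build script, 1-setup-env.sh: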
#!/bin/bash
date
set -v

# 1. prep noCNI env
cat <<EOF | kind create cluster --name=calico-bgp-rr --image=kindest/node:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.10
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.11
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.10
        node-labels: "rack=rack1"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.11
        node-labels: "rack=rack1"
EOF

# 2. remove taints
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide

# 3. install tools
for i in $(docker ps -a --format "table {{.Names}}" | grep calico-bgp-rr)
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker cp /usr/local/bin/calicoctl $i:/usr/local/bin/
    # docker exec -it $i bash -c "apt-get -y update > /dev/null && apt-get -y install net-tools tcpdump lrzsz > /dev/null 2>&1"
done
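Run the script to create the cluster. Because no CNI component is installed yet, some pods stay in states such as Pending and the nodes are NotReady. This is expected and is resolved once the Calico CNI component is installed later.
3.2 Creating the bridges
Create bridges on the host. Their main purpose is to connect the K8s nodes created by kind with the switches built by containerlab.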
brctl addbr br-leaf0;ifconfig br-leaf0 up;brctl addbr br-leaf1;ifconfig br-leaf1 up
3.3 Building layer-3 switches with containerlab and configuring BGP
The containerlab script that builds the switches, 2-setup-clab.sh:
#!/bin/bash
set -v

# Write the topology file first and deploy it afterwards, so that clab
# never reads a partially written clab.yaml.
cat <<EOF > clab.yaml
name: calico-bgp-rr
topology:
  nodes:
    spine0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine0-boot.cfg:/opt/vyatta/etc/config/config.boot

    spine1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine1-boot.cfg:/opt/vyatta/etc/config/config.boot

    leaf0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf0-boot.cfg:/opt/vyatta/etc/config/config.boot

    leaf1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf1-boot.cfg:/opt/vyatta/etc/config/config.boot

    br-leaf0:
      kind: bridge
    br-leaf1:
      kind: bridge

    server1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-control-plane
      exec:
        - ip addr add 10.1.5.10/24 dev net0
        - ip route replace default via 10.1.5.1

    server2:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker
      exec:
        - ip addr add 10.1.5.11/24 dev net0
        - ip route replace default via 10.1.5.1

    server3:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker2
      exec:
        - ip addr add 10.1.8.10/24 dev net0
        - ip route replace default via 10.1.8.1

    server4:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker3
      exec:
        - ip addr add 10.1.8.11/24 dev net0
        - ip route replace default via 10.1.8.1

  links:
    - endpoints: ["br-leaf0:br-leaf0-net0", "server1:net0"]
    - endpoints: ["br-leaf0:br-leaf0-net1", "server2:net0"]

    - endpoints: ["br-leaf1:br-leaf1-net0", "server3:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net1", "server4:net0"]

    - endpoints: ["leaf0:eth1", "spine0:eth1"]
    - endpoints: ["leaf0:eth2", "spine1:eth1"]
    - endpoints: ["leaf0:eth3", "br-leaf0:br-leaf0-net2"]

    - endpoints: ["leaf1:eth1", "spine0:eth2"]
    - endpoints: ["leaf1:eth2", "spine1:eth2"]
    - endpoints: ["leaf1:eth3", "br-leaf1:br-leaf1-net2"]
EOF

clab deploy -t clab.yaml
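containerlab reports that the topology was deployed successfully. The BGP configuration of the vyos switches is listed at the end of this article.
3.4 Deploying the Calico CNI plugin
The default Calico manifest installs IPIP mode, which has to be turned off manually. When traffic is encapsulated with neither IPIP nor VXLAN, Calico works in BGP mode.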
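One way to turn IPIP off before applying the manifest is sketched below. It assumes that calico.yaml sets the environment variable CALICO_IPV4POOL_IPIP with value "Always" on the following line, as the stock manifest does; the function name disable_ipip and the sample text are made up for illustration, so check the result against the manifest you actually use.

```shell
#!/bin/bash
# Flip the value on the line that follows "name: CALICO_IPV4POOL_IPIP"
# from "Always" to "Never"; every other line passes through unchanged.
disable_ipip() {
    sed '/name: CALICO_IPV4POOL_IPIP/{n;s/value: "Always"/value: "Never"/;}'
}

# Small sample that mimics the relevant part of the manifest (an assumption
# about its layout, not a copy of the real file).
sample='- name: CALICO_IPV4POOL_IPIP
  value: "Always"
- name: CALICO_IPV4POOL_VXLAN
  value: "Never"'

printf '%s\n' "$sample" | disable_ipip
```

For a real manifest, something like disable_ipip < calico.yaml > calico-noipip.yaml (the output file name is hypothetical) writes a modified copy that can be reviewed before it is applied.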
kubectl apply -f calico.yaml
#kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.23/manifests/calico.yaml
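Once the Calico components are installed, the BGP sessions between the nodes form a full mesh.
3.5 Enabling Calico BGP RR mode
A full mesh does not suit large clusters, so the BGP full mesh is turned off and BGP route reflectors are used instead.
This is done with 3-disable-bgp-full-mesh.sh: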
#!/bin/bash
set -v

# 1. disable bgp fullmesh
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    logSeverityScreen: Info
    nodeToNodeMeshEnabled: false
kind: BGPConfigurationList
metadata:
EOF
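3.6 Configuring BGP RR rules on the Calico nodes
The nodes of the Kubernetes cluster act as clients of the BGP route reflectors. Each node needs peer configuration that points at its route reflector so that routes are synchronized.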
#!/bin/bash
set -v

# 1.3. add() bgp configuration for the nodes
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
  labels:
    rack: rack0
  name: calico-bgp-rr-control-plane
spec:
  addresses:
  - address: 10.1.5.10
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-control-plane
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack0
  name: calico-bgp-rr-worker
spec:
  addresses:
  - address: 10.1.5.11
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker2
spec:
  addresses:
  - address: 10.1.8.10
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker2
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker3
spec:
  addresses:
  - address: 10.1.8.11
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker3
    orchestrator: k8s
EOF

# 1.4. peer to leaf0 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack0-to-leaf0
spec:
  peerIP: 10.1.5.1
  asNumber: 65005
  nodeSelector: rack == 'rack0'
EOF

# 1.5. peer to leaf1 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-to-leaf1
spec:
  peerIP: 10.1.8.1
  asNumber: 65008
  nodeSelector: rack == 'rack1'
EOF
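Log in to any node in the cluster and check the BGP status (for example with calicoctl node status): the sessions no longer form a BGP full mesh. The peer type node specific indicates that the node is a route reflector client, and the peer, that is the route reflector, is the address 10.1.5.1.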
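The two BGPPeer resources in the script differ only in name, peer IP, AS number and rack label. For a lab with more racks, the manifest could be generated by a small helper like the one below; bgp_peer_manifest is a hypothetical name, and the fields it prints are the ones already used in the script.

```shell
#!/bin/bash
# bgp_peer_manifest NAME PEER_IP AS_NUMBER RACK
# Prints a BGPPeer manifest that makes every node labelled rack=RACK
# peer with the switch at PEER_IP.
bgp_peer_manifest() {
    local name=$1 peer_ip=$2 as_number=$3 rack=$4
    cat <<EOF
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: ${name}
spec:
  peerIP: ${peer_ip}
  asNumber: ${as_number}
  nodeSelector: rack == '${rack}'
EOF
}

bgp_peer_manifest rack0-to-leaf0 10.1.5.1 65005 rack0
echo "---"
bgp_peer_manifest rack1-to-leaf1 10.1.8.1 65008 rack1
```

The output can be piped into calicoctl apply -f - in the same way as the heredocs in the script.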
4 Verifying BGP by accessing Pods from outside the cluster
Deploy a test workload:
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  #replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
        name: nettoolbox
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: app
  ports:
  - name: app
    port: 8080
    targetPort: 80
    nodePort: 32000
Log in to any node in the cluster and look at the routing table.
For example, 10.244.210.64/26 via 10.1.5.1 dev net0 proto bird is a route learned through BGP; bird is the BGP client used by Calico.
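To pick the BGP-learned routes out of a longer routing table, the output can be filtered on proto bird. The sketch below runs on sample text so that it works anywhere; the first sample line is the route quoted above, the second is a made-up directly connected route for contrast, and bird_routes is a hypothetical helper name.

```shell
#!/bin/bash
# Keep only routes installed by BIRD, i.e. routes learned over BGP.
bird_routes() {
    grep 'proto bird'
}

sample='10.244.210.64/26 via 10.1.5.1 dev net0 proto bird
10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10'

printf '%s\n' "$sample" | bird_routes
```

On a cluster node the same filter would be fed from the live table, for example ip route | bird_routes.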
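Log in to the leaf0 switch and look at its BGP information and routing table.
View the routing table:
The leaf0 switch holds routes to the pods in the k8s cluster, which means it can reach those pods.
View the BGP information: show ip bgp
The output shows clearly that:
The destinations 10.1.8.0/24, 10.244.192.0/26 and 10.244.210.64 each have two next hops, 10.1.12.2 and 10.1.10.2. These are EBGP routes, and ECMP is applied across the two paths.
The destinations 10.244.81.64/26 and 10.244.205.64/26 have the next hops 10.1.5.10 and 10.1.5.11 respectively. These are IBGP routes.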
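Whether a session is IBGP or EBGP follows directly from the AS numbers: the same AS on both sides means IBGP, different AS numbers mean EBGP. A minimal sketch, using leaf0's AS and neighbors as they appear in leaf0-boot.cfg at the end of this article; the function name bgp_session_type is hypothetical.

```shell
#!/bin/bash
# bgp_session_type LOCAL_AS REMOTE_AS
# Prints iBGP when both speakers are in the same AS, eBGP otherwise.
bgp_session_type() {
    local local_as=$1 remote_as=$2
    if [ "$local_as" -eq "$remote_as" ]; then
        echo "iBGP"
    else
        echo "eBGP"
    fi
}

# leaf0 runs in AS 65005; neighbor addresses and AS numbers as in leaf0-boot.cfg.
while read -r peer_ip peer_as; do
    echo "$peer_ip AS $peer_as: $(bgp_session_type 65005 "$peer_as")"
done <<EOF
10.1.5.10 65005
10.1.5.11 65005
10.1.10.2 500
10.1.12.2 800
EOF
```

This matches the observation above: routes from the nodes 10.1.5.10 and 10.1.5.11 arrive over IBGP, while routes through the spines 10.1.10.2 and 10.1.12.2 arrive over EBGP.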
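Access tests
Pod-to-pod access inside the cluster
Access to cluster pods from the core switch
If the core switches are also configured with EBGP toward the public network and exchange routes with it, public traffic can reach the Kubernetes cluster as well.
5 Configuration files for the vyos container images that simulate switches in Containerlab
spine0-boot.cfg: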
interfaces {
    ethernet eth1 {
        address 10.1.10.2/24
        duplex auto
        speed auto
    }
    ethernet eth2 {
        address 10.1.34.2/24
        duplex auto
        speed auto
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.10.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.10.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65005
        }
        neighbor 10.1.34.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65008
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
        }
        system-as 500
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name spine0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
spine1-boot.cfg
interfaces {
    ethernet eth1 {
        address "10.1.12.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.11.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.11.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.11.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65008"
        }
        neighbor 10.1.12.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65005"
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax {
                    }
                }
            }
            router-id "10.1.8.1"
        }
        system-as "800"
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    conntrack {
        modules {
            ftp {
            }
            h323 {
            }
            nfs {
            }
            pptp {
            }
            sip {
            }
            sqlnet {
            }
            tftp {
            }
        }
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "spine1"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}

// Warning: Do not remove the following line.
// // vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// // Release version: 1.4-rolling-202307070317
leaf0-boot.cfg
interfaces {
    ethernet eth1 {
        address 10.1.10.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.12.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.5.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.5.0/24 {
                }
                network 10.1.10.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.5.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.5.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.10.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        neighbor 10.1.12.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.5.1
        }
        system-as 65005
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
leaf1-boot.cfg
interfaces {
    ethernet eth1 {
        address 10.1.34.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.11.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.8.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.8.0/24 {
                }
                network 10.1.11.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.8.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.8.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.11.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        neighbor 10.1.34.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.8.1
        }
        system-as 65008
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf1
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317