
Worth Bookmarking! Calico BGP RouteReflector Policy in Practice

  • 2024-05-29


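This article is shared from the Huawei Cloud community post "Calico BGP RouteReflector Policy in Practice" (《Calico BGP RouteReflector策略实践》), by 可以交个朋友.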

1 Background


Calico, the container networking component, supports several backend modes: the overlay modes IPIP and VXLAN, and a pure-routing underlay mode based on BGP.


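Compared with the overlay model, an underlay network offers higher data-plane forwarding performance. In pure-routing mode there are two options. The first is Calico's BGP full mesh, which only suits small Kubernetes clusters: every node peers with every other node, so a cluster of n nodes needs n(n-1)/2 BGP sessions, and each added node has to open a session to every existing node, which consumes considerable network resources. The second is Route Reflector mode, also called RR mode. In RR mode one or more BGP speakers are designated as Route Reflectors and peer with the other speakers in the network; each speaker only needs a BGP session with a Route Reflector to learn the routes of the whole network.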
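To make the scaling difference concrete, the following sketch compares session counts for the two modes. The function names are hypothetical, and the RR figure assumes every node peers with each of r reflectors and ignores sessions between the reflectors themselves.

```shell
# Number of BGP sessions in a full mesh of n speakers: n*(n-1)/2.
full_mesh_sessions() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

# Number of client sessions when each of n nodes peers with r reflectors
# (assumption: reflector-to-reflector sessions are not counted).
rr_sessions() {
    local n=$1 r=${2:-1}
    echo $(( n * r ))
}

for n in 4 50 500; do
    echo "nodes=$n full-mesh=$(full_mesh_sessions "$n") rr=$(rr_sessions "$n" 2)"
done
```

With 500 nodes the full mesh needs 124750 sessions, while two reflectors need 1000 client sessions.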

2 Network architecture for Calico BGP RouteReflector mode


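Without changing the internal network topology of the IDC, the access-layer switches establish BGP sessions with the core-layer switches and reuse the routing policies already in place in the data center. Pod CIDRs are allocated according to each Node's physical location, and each node announces its Pod CIDR to its access-layer switch over BGP, which gives reachability across the whole network. The description below is based on a Leaf-Spine architecture.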



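Networking principles:


  1. Each access-layer switch has layer-2 connectivity to the Nodes it manages, and together they form one AS. Every node runs a BGP service that announces the node's own routes.

  2. Among the core-layer and access-layer switches, each router occupies its own AS; they are physically directly connected and run BGP. The core-layer switches learn the routes of the whole network, while each access-layer switch learns the routes of the Nodes directly attached to it.

  3. Pods on the same host reach each other through the host's own routing (the Linux host is treated as a router).

  4. Pods on different nodes in the same rack communicate through the ToR (leaf) switch.

  5. Pods in different racks communicate through the core switches.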
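Rules 3 to 5 can be summarised as a small decision function. This is only an illustrative sketch: the function name `pod_path` and its output labels are hypothetical, while the node names and rack labels match the lab cluster built later in this article.

```shell
# pod_path NODE_A RACK_A NODE_B RACK_B
# Prints which device forwards traffic between two pods.
pod_path() {
    local node_a=$1 rack_a=$2 node_b=$3 rack_b=$4
    if [ "$node_a" = "$node_b" ]; then
        echo "host"     # rule 3: routed by the Linux host itself
    elif [ "$rack_a" = "$rack_b" ]; then
        echo "leaf"     # rule 4: via the ToR (leaf) switch
    else
        echo "spine"    # rule 5: via the core switches
    fi
}

pod_path calico-bgp-rr-worker rack0 calico-bgp-rr-worker2 rack1
```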

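3 Building a lab environment that simulates the production network


Prepare one machine running Ubuntu 22.04 in advance (8 vCPUs and 16 GB of memory are enough). Install the following tools on it:


  1. Docker

  2. A Go development environment

  3. Kind (a tool from a Kubernetes special interest group that runs Kubernetes in Docker and can quickly bring up a k8s test environment; installing kind from source requires Go on the host first; v0.20.0 is a suitable version)

  4. ContainerLab (a virtual networking platform built on container technology; it can use vyos images to build virtual switches and routers; v0.42.0 is the recommended version)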


3.1 Setting up the Kubernetes environment


Kubernetes cluster version: 1.27.3


Cluster size: 1 master and 3 worker nodes


The cluster build script, 1-setup-env.sh, is as follows:


#!/bin/bash
date
set -v

# 1. Prepare a cluster without a CNI
cat <<EOF | kind create cluster --name=calico-bgp-rr --image=kindest/node:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.10
        node-labels: "rack=rack0"

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.11
        node-labels: "rack=rack0"

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.10
        node-labels: "rack=rack1"

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.11
        node-labels: "rack=rack1"

EOF

# 2. Remove the control-plane taint
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide

# 3. Install tools into the node containers
for i in $(docker ps -a --format "table {{.Names}}" | grep calico-bgp-rr)
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker cp /usr/local/bin/calicoctl $i:/usr/local/bin/
    # docker exec -it $i bash -c "apt-get -y update > /dev/null && apt-get -y install net-tools tcpdump lrzsz > /dev/null 2>&1"
done


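Run the script to create the cluster. Because no CNI plugin is installed yet, some Pods will stay Pending and the nodes will be NotReady. This is expected and resolves once the Calico CNI plugin is installed in a later step.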


3.2 Creating the bridges


Create bridges on the host. Their main purpose is to connect the K8s nodes created by kind to the switches built with containerlab.


brctl addbr br-leaf0;ifconfig br-leaf0 up;brctl addbr br-leaf1;ifconfig br-leaf1 up
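On hosts where the bridge-utils and net-tools packages are not installed, the same bridges can be created with iproute2. The sketch below only prints the commands so they can be reviewed first; the helper name `bridge_cmds` is hypothetical.

```shell
# Print the iproute2 commands that create each bridge and bring it up.
bridge_cmds() {
    local br
    for br in "$@"; do
        echo "ip link add name $br type bridge"
        echo "ip link set dev $br up"
    done
}

bridge_cmds br-leaf0 br-leaf1
```

To apply them, pipe the output to a root shell, for example `bridge_cmds br-leaf0 br-leaf1 | sudo sh`.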

3.3 Building layer-3 switches with containerLab and configuring BGP


The containerlab script that builds the switches, 2-setup-clab.sh, is as follows:


#!/bin/bash
set -v

# Write the topology file first, then deploy it, so that clab never reads a half-written file
cat <<EOF > clab.yaml
name: calico-bgp-rr
topology:
  nodes:
    spine0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine0-boot.cfg:/opt/vyatta/etc/config/config.boot

    spine1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine1-boot.cfg:/opt/vyatta/etc/config/config.boot

    leaf0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf0-boot.cfg:/opt/vyatta/etc/config/config.boot

    leaf1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf1-boot.cfg:/opt/vyatta/etc/config/config.boot

    br-leaf0:
      kind: bridge
    br-leaf1:
      kind: bridge

    server1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-control-plane
      exec:
        - ip addr add 10.1.5.10/24 dev net0
        - ip route replace default via 10.1.5.1

    server2:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker
      exec:
        - ip addr add 10.1.5.11/24 dev net0
        - ip route replace default via 10.1.5.1

    server3:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker2
      exec:
        - ip addr add 10.1.8.10/24 dev net0
        - ip route replace default via 10.1.8.1

    server4:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker3
      exec:
        - ip addr add 10.1.8.11/24 dev net0
        - ip route replace default via 10.1.8.1

  links:
    - endpoints: ["br-leaf0:br-leaf0-net0", "server1:net0"]
    - endpoints: ["br-leaf0:br-leaf0-net1", "server2:net0"]

    - endpoints: ["br-leaf1:br-leaf1-net0", "server3:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net1", "server4:net0"]

    - endpoints: ["leaf0:eth1", "spine0:eth1"]
    - endpoints: ["leaf0:eth2", "spine1:eth1"]
    - endpoints: ["leaf0:eth3", "br-leaf0:br-leaf0-net2"]

    - endpoints: ["leaf1:eth1", "spine0:eth2"]
    - endpoints: ["leaf1:eth2", "spine1:eth2"]
    - endpoints: ["leaf1:eth3", "br-leaf1:br-leaf1-net2"]

EOF

clab deploy -t clab.yaml



containerlab reports that the topology was deployed successfully. The BGP configuration of the vyos switches is listed at the end of this article.

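3.4 Deploying the Calico CNI plugin


Calico installs in IPIP mode by default, so IPIP has to be turned off manually. With neither IPIP nor VXLAN encapsulation enabled, Calico runs in pure BGP mode.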


kubectl apply -f calico.yaml


#kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.23/manifests/calico.yaml
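One way to switch the encapsulation off before applying the manifest is to rewrite the pool settings in calico.yaml. The sketch below assumes the manifest sets the calico-node environment variables `CALICO_IPV4POOL_IPIP` and `CALICO_IPV4POOL_VXLAN` with the `value:` on the line directly after the `name:` line, as in the upstream manifest; the helper name `disable_encap` is hypothetical, and the calico.yaml used in this article may already have been edited this way.

```shell
# Read a Calico manifest on stdin and set the IPIP and VXLAN pool modes to "Never".
disable_encap() {
    sed -E '/name: CALICO_IPV4POOL_(IPIP|VXLAN)$/{n;s/value: .*/value: "Never"/;}'
}

# Small demonstration on a two-line fragment.
printf '%s\n' '- name: CALICO_IPV4POOL_IPIP' '  value: "Always"' | disable_encap
```

A typical use would be `disable_encap < calico.yaml > calico-bgp.yaml`, followed by `kubectl apply -f calico-bgp.yaml`.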



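Once the Calico components are installed, the nodes establish BGP sessions with one another in a full mesh.


3.5 Enabling Calico BGP RR mode


A full mesh does not suit large clusters, so we disable the BGP full mesh and use BGP route reflectors instead.


The method is as follows: 3-disable-bgp-full-mesh.sh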


#!/bin/bash
set -v

# 1. Disable the BGP full mesh
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    logSeverityScreen: Info
    nodeToNodeMeshEnabled: false
kind: BGPConfigurationList
metadata:
EOF

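3.6 Configuring BGP RR rules on the Calico nodes


The nodes in the Kubernetes cluster act as clients of the BGP route reflectors, so each node needs peer configuration pointing at its route reflector in order to exchange routes.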


#!/bin/bash
set -v

# 1.3. Add BGP configuration for the nodes
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
  labels:
    rack: rack0
  name: calico-bgp-rr-control-plane
spec:
  addresses:
  - address: 10.1.5.10
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-control-plane
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack0
  name: calico-bgp-rr-worker
spec:
  addresses:
  - address: 10.1.5.11
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker2
spec:
  addresses:
  - address: 10.1.8.10
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker2
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker3
spec:
  addresses:
  - address: 10.1.8.11
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker3
    orchestrator: k8s
EOF

# 1.4. Peer the rack0 nodes with the leaf0 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack0-to-leaf0
spec:
  peerIP: 10.1.5.1
  asNumber: 65005
  nodeSelector: rack == 'rack0'
EOF

# 1.5. Peer the rack1 nodes with the leaf1 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-to-leaf1
spec:
  peerIP: 10.1.8.1
  asNumber: 65008
  nodeSelector: rack == 'rack1'
EOF


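Log in to any node in the cluster and check the BGP status: the sessions are no longer a full mesh. A peer type of "node specific" means the session comes from a BGPPeer that targets this node, which here makes the node a client of the route reflector. The peer, that is, the route reflector, is 10.1.5.1.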
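This check can also be scripted. The sketch below assumes the BGP status comes from `calicoctl node status` and that its table has the peer type in the second `|`-separated column, which is the usual layout; the helper name `mesh_disabled` is hypothetical.

```shell
# Read `calicoctl node status` output on stdin.
# Exit status 0 if no peer has type "node-to-node mesh", 1 otherwise.
mesh_disabled() {
    awk -F'|' '$3 ~ /node-to-node mesh/ { mesh = 1 } END { exit (mesh ? 1 : 0) }'
}
```

On a cluster node this could be used as `calicoctl node status | mesh_disabled && echo "full mesh is off"`.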


4 BGP verification: accessing Pods from outside the cluster


Deploy a test workload


apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  #replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
        name: nettoolbox
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: app
  ports:
  - name: app
    port: 8080
    targetPort: 80
    nodePort: 32000



Log in to any cluster node and check its routing table.



For example, 10.244.210.64/26 via 10.1.5.1 dev net0 proto bird is a route learned over BGP; bird is the BGP client that Calico uses.
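To list only the BGP-learned routes on a node, the routing table can be filtered on the `proto bird` marker shown in the example above. This is a minimal sketch; the helper name `bird_routes` is hypothetical, and it prints `-` as the next hop for entries without one, such as a blackhole route for a locally owned block.

```shell
# Read `ip route` output on stdin; print "destination next-hop" for routes installed by BIRD.
bird_routes() {
    awk '/proto bird/ {
        dest = ($1 == "blackhole") ? $2 : $1
        via = "-"
        for (i = 1; i < NF; i++) if ($i == "via") via = $(i + 1)
        print dest, via
    }'
}

echo "10.244.210.64/26 via 10.1.5.1 dev net0 proto bird" | bird_routes
```

On a node, `ip route show | bird_routes` gives the same view for the live table.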


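Log in to the leaf0 switch and check its BGP information and routing table.


Routing table:



leaf0 holds routes for the Pods in the k8s cluster, which means it can reach the Pods in the cluster.



BGP information: show ip bgp



This output shows:


The destinations 10.1.8.0/24, 10.244.192.0/26 and 10.244.210.64 each have two next hops, 10.1.12.2 and 10.1.10.2. These are EBGP routes, and the two paths are used for ECMP.


The destinations 10.244.81.64/26 and 10.244.205.64/26 have next hops 10.1.5.10 and 10.1.5.11 respectively. These are IBGP routes.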
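Whether a route is IBGP or EBGP follows from the AS numbers of the two peers: equal AS numbers give an IBGP session, different ones give EBGP. The sketch below applies that rule to the AS numbers used in this lab (leaf0 and the rack0 nodes in 65005, spine0 in 500); the function name is hypothetical.

```shell
# bgp_session_type LOCAL_AS REMOTE_AS
bgp_session_type() {
    local local_as=$1 remote_as=$2
    if [ "$local_as" -eq "$remote_as" ]; then
        echo "iBGP"
    else
        echo "eBGP"
    fi
}

bgp_session_type 65005 65005   # leaf0 <-> node 10.1.5.10
bgp_session_type 65005 500     # leaf0 <-> spine0
```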


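Access tests


Pod-to-Pod access inside the cluster



Access to cluster Pods from the core switches



If the core switches also exchange routes with the public network over eBGP, public traffic can reach the Kubernetes cluster as well.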

5 Configuration files for the vyos container images that emulate switches in Containerlab


spine0-boot.cfg:


interfaces {
    ethernet eth1 {
        address 10.1.10.2/24
        duplex auto
        speed auto
    }
    ethernet eth2 {
        address 10.1.34.2/24
        duplex auto
        speed auto
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.10.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.10.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65005
        }
        neighbor 10.1.34.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65008
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
        }
        system-as 500
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name spine0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317


spine1-boot.cfg:


interfaces {
    ethernet eth1 {
        address "10.1.12.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso { }
            sg { }
        }
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.11.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso { }
            sg { }
        }
        speed "auto"
    }
    loopback lo { }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.11.0/24 { }
                network 10.1.12.0/24 { }
            }
        }
        neighbor 10.1.11.1 {
            address-family {
                ipv4-unicast { }
            }
            remote-as "65008"
        }
        neighbor 10.1.12.1 {
            address-family {
                ipv4-unicast { }
            }
            remote-as "65005"
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax { }
                }
            }
            router-id "10.1.8.1"
        }
        system-as "800"
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    conntrack {
        modules {
            ftp { }
            h323 { }
            nfs { }
            pptp { }
            sip { }
            sqlnet { }
            tftp { }
        }
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "spine1"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}

// Warning: Do not remove the following line.
// // vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// // Release version: 1.4-rolling-202307070317


leaf0-boot.cfg:


interfaces {
    ethernet eth1 {
        address 10.1.10.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.12.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.5.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.5.0/24 {
                }
                network 10.1.10.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.5.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.5.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.10.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        neighbor 10.1.12.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.5.1
        }
        system-as 65005
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317


leaf1-boot.cfg:


interfaces {
    ethernet eth1 {
        address 10.1.34.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.11.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.8.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.8.0/24 {
                }
                network 10.1.11.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.8.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.8.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.11.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        neighbor 10.1.34.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.8.1
        }
        system-as 65008
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf1
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317

Published under Kubernetes by 华为云开发者联盟 on the InfoQ 写作社区 (InfoQ writing community).