
Setting Up and Installing Kubernetes on Ubuntu Server 20.04

Author: 玏佾
  • August 26, 2021


1. Introduction

Kubernetes, abbreviated K8s (the 8 stands in for the eight letters "ubernete"), is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and powerful, and it provides mechanisms for deploying, scheduling, updating, and maintaining applications.

This article covers installation with kubeadm. Reference documentation: https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

2. Preparation

Three virtual machines were prepared earlier with KVM: 192.168.2.31, 192.168.2.32, and 192.168.2.33.

For building the VMs, see: Installing KVM virtual machines on Ubuntu Server 20.04.


Update the environment:

sudo apt update
sudo apt -y upgrade && sudo systemctl reboot

2.1 Configure the hostname on each node

Set each node's hostname so that node names resolve without relying on DNS.

sudo hostnamectl set-hostname "k8s-master"   # 192.168.2.31
sudo hostnamectl set-hostname "k8s-node-1"   # 192.168.2.32
sudo hostnamectl set-hostname "k8s-node-2"   # 192.168.2.33

2.2 Edit the hosts file on each node

Edit the file /etc/hosts:

$ sudo vi /etc/hosts

192.168.2.31    k8s-master
192.168.2.32    k8s-node-1
192.168.2.33    k8s-node-2
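The same mappings can be rehearsed idempotently on a scratch file before touching the real /etc/hosts (the path /tmp/hosts.demo below is only a demo assumption):

```shell
# Append each mapping only if it is not already present (demo file, not the real /etc/hosts)
HOSTS=/tmp/hosts.demo
: > "$HOSTS"
for entry in "192.168.2.31    k8s-master" "192.168.2.32    k8s-node-1" "192.168.2.33    k8s-node-2"; do
  grep -qF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```

Because of the grep guard, running the loop twice leaves the file unchanged, so it is safe to rerun on a node that is already configured.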


2.3 Disable swap

This step is mandatory when installing Kubernetes. Why disable swap? In short, for performance: the kubelet refuses to run with swap enabled (kubeadm's preflight check fails with "[ERROR Swap]", see section 5.1).

Disable temporarily (until the next reboot):

$ sudo swapoff -a

Disable permanently by commenting out the swap line in /etc/fstab, either with sed:

$ sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

or by editing the file manually:

$ sudo vi /etc/fstab   # comment out the swap line

#/swap.img none swap sw 0 0
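If you want to see exactly what the sed one-liner does before pointing it at the real /etc/fstab, you can rehearse it on a scratch copy (/tmp/fstab.demo is a demo path assumed here):

```shell
# Create a scratch file containing a typical swap line, then comment it out the same way
printf '/swap.img none swap sw 0 0\n' > /tmp/fstab.demo
sed -i '/ swap / s/^\(.*\)$/#\1/g' /tmp/fstab.demo
cat /tmp/fstab.demo   # -> #/swap.img none swap sw 0 0
```

The pattern `/ swap /` only matches lines containing " swap " with surrounding spaces, so other mount entries are left untouched.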

Verify that swap is now off:

$ free -h


2.4 Disable the firewall

Disable the firewall:

$ sudo ufw disable

Check the firewall status:

$ sudo ufw status


2.5 Set kernel parameters

Make sure the br_netfilter kernel module is loaded and that net.bridge.bridge-nf-call-iptables is set to 1 in the sysctl configuration (you can check the current value with sysctl net.bridge.bridge-nf-call-iptables). Check whether the module is loaded:

$ lsmod | grep br_netfilter

If not, run the following to load the modules and set the parameters:

sudo modprobe overlay
sudo modprobe br_netfilter

sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

sudo sysctl --system

3. Installation

3.1 Install Docker

# Install the repository GPG key
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

# Add the Aliyun mirror as the package source
sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

sudo apt update
sudo apt install -y docker.io

# Create required directories
sudo mkdir -p /etc/systemd/system/docker.service.d

# Create daemon json config file
sudo tee /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
EOF

# Start and enable services
sudo systemctl daemon-reload
sudo systemctl enable docker.service --now
sudo systemctl restart docker
sudo systemctl status docker
docker --version


3.2 Install the required packages

Set up the Kubernetes package repository:

sudo apt update
sudo apt install -y apt-transport-https curl gnupg2 software-properties-common ca-certificates

# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -   # use the Aliyun mirror

# sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo tee /etc/apt/sources.list.d/kubernetes.list <<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

Install kubelet, kubeadm, and kubectl:

sudo apt update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl   # pin to prevent automatic upgrades

Check the versions:

$ kubectl version --client && kubeadm version

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}

3.3 Pull the images needed for node initialization

Enable the kubelet:

sudo systemctl enable kubelet

kubeadm pulls several images during initialization. Pull them beforehand, otherwise the step can easily fail and `kubeadm init` on the master will hang.

$ sudo kubeadm init --pod-network-cidr 10.5.0.0/16 --control-plane-endpoint=vm-ubuntu-06

Output like the following means the pull failed: the requests to the image registry timed out. In that case you may need a proxy.

W0825 14:09:24.664761    1369 version.go:103] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable-1.txt": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W0825 14:09:24.665057    1369 version.go:104] falling back to the local client version: v1.22.1
[init] Using Kubernetes version: v1.22.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.22.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.22.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.22.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.22.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.5: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.5.0-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns/coredns:v1.8.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Alternatively, switch to a mirror registry. First, list the required image versions:

$ kubeadm config images list

k8s.gcr.io/kube-apiserver:v1.22.1
k8s.gcr.io/kube-controller-manager:v1.22.1
k8s.gcr.io/kube-scheduler:v1.22.1
k8s.gcr.io/kube-proxy:v1.22.1
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4

Write a script to pull and retag the images in bulk:

$ vi k8s.sh

$ sudo chmod +x k8s.sh

$ sudo ./k8s.sh

The content of k8s.sh is as follows:

#!/bin/bash
echo "Grant regular users access to the docker socket"
sudo chmod 777 /var/run/docker.sock

echo "1. Pull images"
# These version numbers must match the output of `kubeadm config images list`
KUBE_VERSION=v1.22.1
PAUSE_VERSION=3.5
CORE_DNS_VERSION=1.8.4
ETCD_VERSION=3.5.0-0

# pull kubernetes images from the Aliyun mirror
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$CORE_DNS_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD_VERSION

# retag to the k8s.gcr.io prefix
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-amd64:$KUBE_VERSION k8s.gcr.io/kube-proxy:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:$KUBE_VERSION k8s.gcr.io/kube-controller-manager:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:$KUBE_VERSION k8s.gcr.io/kube-apiserver:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:$KUBE_VERSION k8s.gcr.io/kube-scheduler:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE_VERSION k8s.gcr.io/pause:$PAUSE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$CORE_DNS_VERSION k8s.gcr.io/coredns/coredns:v$CORE_DNS_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD_VERSION k8s.gcr.io/etcd:$ETCD_VERSION

# untag the mirror names; the images themselves are not deleted
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$CORE_DNS_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD_VERSION

echo "==================== done ======================"
echo "Required images:"
kubeadm config images list
echo "Installed images:"
sudo docker images
echo "==== If the counts don't match, run k8s.sh again ===="
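The pull/tag/rmi triples above follow one pattern per control-plane image, so the script can also be generated from a single list. A minimal sketch (a dry run: it only prints the docker commands; the image names and version match the list above, and removing the echo would actually execute them):

```shell
# Print the pull/tag/rmi command triple for each control-plane image (dry run)
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
KUBE_VERSION=v1.22.1
for img in kube-proxy kube-controller-manager kube-apiserver kube-scheduler; do
  echo "docker pull $MIRROR/${img}-amd64:$KUBE_VERSION"
  echo "docker tag $MIRROR/${img}-amd64:$KUBE_VERSION k8s.gcr.io/$img:$KUBE_VERSION"
  echo "docker rmi $MIRROR/${img}-amd64:$KUBE_VERSION"
done
```

pause, coredns, and etcd are tagged slightly differently (no -amd64 suffix, and coredns gains a v prefix), so they are simplest to keep as explicit lines as in the script above.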

Check that the required images are now present:

$ sudo docker images

REPOSITORY                           TAG       IMAGE ID       CREATED        SIZE
k8s.gcr.io/kube-apiserver            v1.22.1   f30469a2491a   5 days ago     128MB
k8s.gcr.io/kube-proxy                v1.22.1   36c4ebbc9d97   5 days ago     104MB
k8s.gcr.io/kube-controller-manager   v1.22.1   6e002eb89a88   5 days ago     122MB
k8s.gcr.io/kube-scheduler            v1.22.1   aca5ededae9c   5 days ago     52.7MB
k8s.gcr.io/etcd                      3.5.0-0   004811815584   2 months ago   295MB
k8s.gcr.io/coredns                   1.8.4     8d147537fb7d   2 months ago   47.6MB
k8s.gcr.io/pause                     3.5       ed210e3e4a5b   5 months ago   683kB

Run sudo kubeadm init again to initialize the node.

[init] Using Kubernetes version: v1.22.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local vm-ubuntu-06] and IPs [10.96.0.1 192.168.2.36]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost vm-ubuntu-06] and IPs [192.168.2.36 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost vm-ubuntu-06] and IPs [192.168.2.36 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 9.505747 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node vm-ubuntu-06 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node vm-ubuntu-06 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 0cmq3a.yfcloahv2a9f3p3k
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.2.36:6443 --token 0cmq3a.yfcloahv2a9f3p3k \
        --discovery-token-ca-cert-hash sha256:02b3014490559fa323006f6b2dc59b372da3b226256b683770da1d2eaf35fa0b

Then follow the steps from the output:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Check the cluster status:

$ kubectl cluster-info

Kubernetes control plane is running at https://192.168.2.36:6443
CoreDNS is running at https://192.168.2.36:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

List the nodes; the status is NotReady (no Pod network has been deployed yet):

$ kubectl get node

NAME           STATUS     ROLES                  AGE     VERSION
vm-ubuntu-06   NotReady   control-plane,master   5m50s   v1.22.1

View the node details:

$ kubectl describe node vm-ubuntu-06

Check Pod status:

$ kubectl get pod -n kube-system -o wide

3.4 Deploy a network plugin

For the Kubernetes cluster to work, a Pod network must be installed, otherwise Pods cannot communicate with each other. Kubernetes supports several network solutions; flannel and calico are common choices.

3.4.1 Deploy the flannel network plugin

Here we use flannel. Run the following on the master node:

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

3.4.2 Deploy the calico network plugin

Install calico (choose either flannel or calico, not both):

$ kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml

Download the custom resources manifest:

$ wget https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml

Edit the CIDR in the manifest so it matches the --pod-network-cidr used when initializing the master:

$ sudo vi custom-resources.yaml

spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 10.5.0.0/16   # change this CIDR
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

Apply the configuration:

$ kubectl create -f custom-resources.yaml

Watch until all Pods reach the Running state:

$ watch kubectl get pods -n calico-system

Remove the control-plane scheduling taint (so Pods can be scheduled on the master):

$ kubectl taint nodes --all node-role.kubernetes.io/master-

The result looks like:

node/<your-hostname> untainted

Confirm the nodes:

$ kubectl get nodes -o wide


Check that everything is healthy:

$ kubectl get pods --all-namespaces

NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcd69978-gxdqg               1/1     Running   0          122m
kube-system   coredns-78fcd69978-qhgq5               1/1     Running   0          122m
kube-system   etcd-vm-ubuntu-06                      1/1     Running   0          122m
kube-system   kube-apiserver-vm-ubuntu-06            1/1     Running   0          122m
kube-system   kube-controller-manager-vm-ubuntu-06   1/1     Running   0          122m
kube-system   kube-flannel-ds-ltqjs                  1/1     Running   0          108m
kube-system   kube-proxy-zcj72                       1/1     Running   0          122m
kube-system   kube-scheduler-vm-ubuntu-06            1/1     Running   0          122m

Restart command:

$ systemctl restart kubelet docker

4. Node initialization

4.1 Initialize the master node

Enable the kubelet:

$ sudo systemctl enable kubelet

Initialize the master node:

$ sudo kubeadm init --pod-network-cidr 10.5.0.0/16 --control-plane-endpoint=vm-ubuntu-06

4.2 Retrieve the join token and certificate hash

After kubeadm initializes the cluster successfully, it prints a join command containing the token, the discovery-token-ca-cert-hash, and other parameters. Write these down.

Note: the token expires after 24 hours; the certificate-key expires after 2 hours.

If you did not record them, retrieve them as follows:

  • On the master node, run kubeadm token list to get the token (check whether it has expired):

$ kubeadm token list


  • Besides the token, joining the cluster also requires the sha256 hash of the master's CA certificate (the --discovery-token-ca-cert-hash value), which you can compute with:

$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
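The openssl pipeline prints a 64-character hex digest of the CA public key. You can rehearse it on a throwaway self-signed certificate before running it against /etc/kubernetes/pki/ca.crt (the /tmp paths and the CN below are demo assumptions):

```shell
# Generate a throwaway CA key and self-signed certificate, then hash its public key
openssl genrsa -out /tmp/demo-ca.key 2048 2>/dev/null
openssl req -x509 -new -key /tmp/demo-ca.key -subj "/CN=demo-ca" -days 1 -out /tmp/demo-ca.crt
openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```

On the master node, the only change is pointing `openssl x509 -in` at the real CA certificate; the resulting digest is what goes after `sha256:` in `--discovery-token-ca-cert-hash`.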


1. If the token has expired, generate a new basic join command:

$ kubeadm token create --print-join-command

2. To add another master node, also regenerate the certificate-key:

$ kubeadm init phase upload-certs --upload-certs

To add a master node: combine the join command from step 1 with the --certificate-key value from step 2.


References:

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-join/#token-based-discovery-with-ca-pinning

https://kubernetes.io/docs/setup/independent/high-availability/#steps-for-the-first-control-plane-node

4.3 Join worker nodes

Add a node to the cluster:

$ kubeadm join vm-ubuntu-06:6443 --token oabl1q.kb01sa3zzendcbhi --discovery-token-ca-cert-hash sha256:ce4c3ff7440587207345bd15045a37d8fe394c39aa9ebe2c37ec22b9d26ab5a1


4.4 Restart Kubernetes

$ sudo systemctl restart kubelet && sudo systemctl restart docker


4.5 Reset Kubernetes

After resetting all nodes, re-initialize the master and re-join the workers. Note: resetting destroys all Pod data.

$ sudo kubeadm reset

5. Troubleshooting

5.1 Errors when swap is not disabled

If you disabled swap permanently via /etc/fstab, reboot first.

sudo kubeadm init
[init] Using Kubernetes version: v1.22.1
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

At startup you may see "The connection to the server localhost:8080 was refused". Without a kubeconfig file, kubectl defaults to localhost. To fix it, run the following as a regular user on the master node:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Or, as root, export the environment variable:

$ export KUBECONFIG=/etc/kubernetes/admin.conf


5.2 kubeadm join times out

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

Checking the kubelet logs with journalctl -u kubelet shows the following error:

error: failed to run Kubelet: failed to create kubelet:  misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

Docker's cgroup driver does not match the kubelet's. Add the following configuration to /etc/docker/daemon.json, then restart Docker with sudo systemctl restart docker.


{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  }
}
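Because a malformed daemon.json prevents Docker from starting at all, it is worth checking that the file parses as JSON before restarting the daemon. A minimal sketch against a scratch path (/tmp/daemon.json is a demo assumption; the real file is /etc/docker/daemon.json):

```shell
# Write the complete config and verify it parses as JSON before restarting Docker
cat > /tmp/daemon.json <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" }
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json is valid JSON"
```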


5.3 Pods cannot ping the external gateway / no Internet access

Run on the worker node:

$ cat /var/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.5.1.1/16
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Try pinging between Pods; if that fails, it is most likely a firewall issue.

List the iptables rules:

$ sudo iptables -L --line-numbers

Flush all iptables rules:

$ sudo iptables -F

Allow the Pod subnet in iptables:

$ sudo iptables -t nat -I POSTROUTING -s 10.5.0.0/16 -j MASQUERADE

Fix DNS resolution. /etc/resolv.conf is a symlink by default and edits are reverted after a reboot, so delete it and create a regular file:

$ sudo rm /etc/resolv.conf && sudo touch /etc/resolv.conf

$ sudo vim /etc/resolv.conf

nameserver 192.168.2.1   # change to your gateway
nameserver 223.5.5.5
nameserver 223.6.6.6

6. Scripts

Kubernetes installation script: k8s_install.sh

#!/bin/bash
echo "1. Update the environment"
sudo apt update
sudo apt -y upgrade

echo "2. Disable swap"
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
sudo swapoff -a

echo "3. Disable the firewall"
sudo ufw disable

echo "4. Set kernel parameters"
sudo modprobe overlay
sudo modprobe br_netfilter
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

echo "5. Install docker"
# Add the Aliyun mirror as the package source
sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install -y docker.io
# Create required directories
sudo mkdir -p /etc/systemd/system/docker.service.d
# Create daemon json config file
sudo tee /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
EOF
# Start and enable services
sudo systemctl daemon-reload
sudo systemctl enable docker.service --now
sudo systemctl restart docker
sudo systemctl status docker
docker --version

echo "6. Set up the Kubernetes repository"
sudo apt update
sudo apt install -y apt-transport-https curl gnupg2 software-properties-common ca-certificates
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -   # use the Aliyun mirror
# sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo tee /etc/apt/sources.list.d/kubernetes.list <<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

echo "7. Install Kubernetes"
sudo apt update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl   # pin to prevent automatic upgrades
kubectl version --client && kubeadm version

Master node initialization script: k8s_init_master.sh

#!/bin/bash
echo "Grant regular users access to the docker socket"
sudo chmod 777 /var/run/docker.sock

echo "1. Pull images"
# These version numbers must match the output of `kubeadm config images list`
KUBE_VERSION=v1.22.1
PAUSE_VERSION=3.5
CORE_DNS_VERSION=1.8.4
ETCD_VERSION=3.5.0-0

# pull kubernetes images from the Aliyun mirror
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:$KUBE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$CORE_DNS_VERSION
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD_VERSION

# retag to the k8s.gcr.io prefix
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-amd64:$KUBE_VERSION k8s.gcr.io/kube-proxy:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:$KUBE_VERSION k8s.gcr.io/kube-controller-manager:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:$KUBE_VERSION k8s.gcr.io/kube-apiserver:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:$KUBE_VERSION k8s.gcr.io/kube-scheduler:$KUBE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE_VERSION k8s.gcr.io/pause:$PAUSE_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$CORE_DNS_VERSION k8s.gcr.io/coredns/coredns:v$CORE_DNS_VERSION
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD_VERSION k8s.gcr.io/etcd:$ETCD_VERSION

# untag the mirror names; the images themselves are not deleted
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:$KUBE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$CORE_DNS_VERSION
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD_VERSION

echo "==================== done ======================"
echo "Required images:"
kubeadm config images list
echo "Installed images:"
sudo docker images
echo "==== If the counts don't match, run this script again ===="

echo "2. Enable the kubelet"
sudo systemctl enable kubelet
lsmod | grep br_netfilter

echo "3. Initialize the master node"
sudo kubeadm init --pod-network-cidr 10.5.0.0/16 --control-plane-endpoint=vm-ubuntu-06
#sudo kubeadm init --pod-network-cidr 10.5.0.0/16

echo "4. Configure the master node"
#mkdir -p $HOME/.kube
#sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
#sudo chown $(id -u):$(id -g) $HOME/.kube/config
#sudo echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile
export KUBECONFIG=/etc/kubernetes/admin.conf

echo "5. Initialize the network"
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

kubectl cluster-info
kubectl get pods --all-namespaces

Recommended project: https://github.com/easzlab/kubeasz


7. Kuboard management UI

Kuboard v3.x supports multi-cluster Kubernetes management. If you are upgrading from Kuboard v1.0.x or v2.0.x, note:

  • You can run Kuboard v3.x and Kuboard v2.0.x at the same time;

  • Kuboard v3.x supports both amd64 (x86) and arm64 (armv8) CPU architectures;

An online demo is available; account: demo, password: demo123.

Installation guide: https://kuboard.cn/install/v3/install.html
