
Kubernetes Cluster Installation (kubeadm)

Author: 小小文
Published: July 18, 2020

I. Environment Preparation

1. Server Preparation

Five machines in total: three master nodes and two worker nodes.

OS: CentOS Linux release 7.5; Memory: 8 GB; Disk: 50 GB

Minimal installation

10.103.22.231 master01 haproxy keepalived

10.103.22.232 master02 haproxy keepalived

10.103.22.233 master03 haproxy keepalived

10.103.22.234 node04

10.103.22.235 node05

2. System Settings

Set the hostname:

hostnamectl set-hostname <your_hostname>

Edit the hosts file:

cat >> /etc/hosts <<EOF
10.103.22.231 master01
10.103.22.232 master02
10.103.22.233 master03
10.103.22.234 node04
10.103.22.235 node05
EOF

Install dependency packages:

yum install -y conntrack ipvsadm ipset jq iptables curl sysstat libseccomp wget vim yum-utils device-mapper-persistent-data lvm2 net-tools ntpdate telnet

Switch the firewall to iptables and flush the rules:

systemctl stop firewalld && systemctl disable firewalld
yum -y install iptables-services && systemctl start iptables && systemctl enable iptables && iptables -F && service iptables save

Disable swap:

swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

Disable SELinux:

setenforce 0 && sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

Tune kernel parameters for Kubernetes:

cat > /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
# Avoid swapping; only swap when the system is about to OOM
vm.swappiness=0
# Do not check whether enough physical memory is available
vm.overcommit_memory=1
# Do not panic on OOM; let the OOM killer handle it
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF
sysctl -p /etc/sysctl.d/kubernetes.conf
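The net.bridge.* keys only exist after the br_netfilter module is loaded, so sysctl -p may complain about unknown keys on a fresh minimal install. A minimal sketch to load the module now and on every boot (assuming it is not already loaded on this host):

# Load br_netfilter and make it persistent across reboots
modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf
# Re-apply the kernel parameters once the module is present
sysctl -p /etc/sysctl.d/kubernetes.conf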

Load the IPVS kernel modules:

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_sh
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- nf_conntrack_ipv4
EOF
chmod +x /etc/sysconfig/modules/ipvs.modules && sh /etc/sysconfig/modules/ipvs.modules
# Check that the modules loaded; output like the following means the IPVS modules are now in the kernel.
lsmod | grep ip_vs
ip_vs_wrr 12697 0
ip_vs_rr 12600 0
ip_vs_sh 12688 0
ip_vs 141432 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack 133053 3 ip_vs,xt_conntrack,nf_conntrack_ipv4
libcrc32c 12644 3 xfs,ip_vs,nf_conntrack

Make sure ipset is installed as well; if it is not, install it with yum install -y ipset.



Adjust the system timezone:

# Set the system timezone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai
# Keep the hardware clock in UTC
timedatectl set-local-rtc 0
# Restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond

Disable services that are not needed:

systemctl stop postfix && systemctl disable postfix

Add the executable path /opt/kubernetes/bin to the PATH variable:

echo 'PATH=/opt/kubernetes/bin:$PATH' >> /etc/profile.d/kubernetes.sh
source /etc/profile.d/kubernetes.sh
mkdir -p /opt/kubernetes/{bin,cert,script}

3. Configure Passwordless SSH Login

Generate a new key pair:

ssh-keygen -t rsa

Distribute the public key to every node:

# Copy the contents of id_rsa.pub into the authorized_keys file on the other machines
cat ~/.ssh/id_rsa.pub
# Run the following on every other node (worker nodes included)
mkdir -p ~/.ssh/
echo "<file_content>" >> ~/.ssh/authorized_keys

4. Prepare the Docker Environment

Remove any existing versions:

yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine

Add the Docker yum repository:

yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

Install Docker

Per the Kubernetes container runtime documentation, install the following versions:

yum install -y \
containerd.io-1.2.10 \
docker-ce-19.03.4 \
docker-ce-cli-19.03.4

docker-ce: the Docker daemon (server)

docker-ce-cli: the Docker client

containerd.io: manages the complete container lifecycle on the host, from image transfer and storage to container execution and monitoring, down to low-level storage, network attachments, and more



Configure Docker's daemon.json:

mkdir /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://zvakwn5b.mirror.aliyuncs.com"],
  "data-root": "/data/docker",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

Create the Docker data directory on every node:

NODE_IPS=("master01" "master02" "master03" "node04" "node05")
for node_ip in ${NODE_IPS[@]};do
ssh root@${node_ip} "mkdir -p /data/docker/"
done

Start Docker:

systemctl enable docker
systemctl daemon-reload
systemctl start docker
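A quick way to confirm the daemon picked up daemon.json (systemd cgroup driver, overlay2 storage, /data/docker data root) is to query docker info; a minimal check:

# Verify the settings configured in daemon.json took effect
docker info | grep -E 'Server Version|Storage Driver|Cgroup Driver|Docker Root Dir'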

II. Installing the Kubernetes Cluster

1. Download the etcd Binary Package

Download the etcd release:

URL: https://github.com/etcd-io/etcd/releases/tag/v3.4.9

mkdir -p /root/software/{master,worker}
cd /root/software
wget https://github.com/etcd-io/etcd/releases/download/v3.4.9/etcd-v3.4.9-linux-amd64.tar.gz

Package name:

etcd-v3.4.9-linux-amd64.tar.gz

Extract the archive:

cd /root/software
tar zxvf etcd-v3.4.9-linux-amd64.tar.gz
cp etcd-v3.4.9-linux-amd64/{etcd,etcdctl} master/

2. Create the CA Certificate and Key

Install the cfssl toolset:

mkdir -p /opt/kubernetes/{bin,cert} &&cd /opt/kubernetes
mkdir -p /etc/kubernetes/pki/etcd
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
mv cfssl_linux-amd64 /opt/kubernetes/bin/cfssl
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
mv cfssljson_linux-amd64 /opt/kubernetes/bin/cfssljson
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 /opt/kubernetes/bin/cfssl-certinfo
chmod +x /opt/kubernetes/bin/*

Create the CA configuration file:

cat > /etc/kubernetes/pki/etcd/ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}
EOF

Create the certificate signing request (CSR) file:

cat > /etc/kubernetes/pki/etcd/ca-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "ShangHai",
      "L": "ShangHai",
      "O": "ops",
      "OU": "zhonggang"
    }
  ]
}
EOF

Generate the CA certificate and private key:

cd /etc/kubernetes/pki/etcd
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
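Before distributing the CA it can be worth a quick sanity check of what was generated; a minimal sketch using the cfssl-certinfo tool installed above:

# The CA files: ca.pem (certificate), ca-key.pem (private key), ca.csr
ls ca*
# Dump the certificate fields (subject, issuer, not_after) to confirm the CN and expiry
cfssl-certinfo -cert ca.pem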

Distribute the root certificate files:

NODE_IPS=("master01" "master02" "master03")
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/etcd/ca*.pem /etc/kubernetes/pki/etcd/ca-config.json root@${node_ip}:/etc/kubernetes/pki/etcd
done

3. Deploy the etcd Cluster

Create the etcd certificate and private key.

Create the certificate signing request file:

cat > /etc/kubernetes/pki/etcd/etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.103.22.231",
    "10.103.22.232",
    "10.103.22.233"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "ShangHai",
      "L": "ShangHai",
      "O": "ops",
      "OU": "zhonggang"
    }
  ]
}
EOF

Generate the certificate and private key:

cd /etc/kubernetes/pki/etcd
cfssl gencert -ca=/etc/kubernetes/pki/etcd/ca.pem \
-ca-key=/etc/kubernetes/pki/etcd/ca-key.pem \
-config=/etc/kubernetes/pki/etcd/ca-config.json \
-profile=kubernetes etcd-csr.json | cfssljson -bare etcd

Distribute the etcd binaries and the generated certificate and private key to each etcd node:

NODE_IPS=("master01" "master02" "master03")
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
scp /root/software/master/etcd* root@${node_ip}:/opt/kubernetes/bin
ssh root@${node_ip} "chmod +x /opt/kubernetes/bin/*"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/etcd/etcd*.pem root@${node_ip}:/etc/kubernetes/pki/etcd
done

Create the etcd systemd unit template:

cat > /etc/kubernetes/pki/etcd/etcd.service.template <<EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=etcd start
[Service]
User=root
Type=notify
WorkingDirectory=/opt/lib/etcd/
ExecStart=/opt/kubernetes/bin/etcd \
--data-dir=/opt/lib/etcd \
--name ##NODE_NAME## \
--cert-file=/etc/kubernetes/pki/etcd/etcd.pem \
--key-file=/etc/kubernetes/pki/etcd/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
--peer-cert-file=/etc/kubernetes/pki/etcd/etcd.pem \
--peer-key-file=/etc/kubernetes/pki/etcd/etcd-key.pem \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
--peer-client-cert-auth \
--client-cert-auth \
--listen-peer-urls=https://##NODE_IP##:2380 \
--initial-advertise-peer-urls=https://##NODE_IP##:2380 \
--listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \
--advertise-client-urls=https://##NODE_IP##:2379 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=etcd0=https://10.103.22.231:2380,etcd1=https://10.103.22.232:2380,etcd2=https://10.103.22.233:2380 \
--initial-cluster-state=new
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF

Notes:

User: the account the service runs as (root in this unit);

WorkingDirectory, --data-dir: the working directory and data directory are /opt/lib/etcd, which must exist before the service starts;

--name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;

--cert-file, --key-file: the certificate and private key etcd uses for server-to-client communication;

--trusted-ca-file: the CA certificate that signed the client certificates, used to verify them;

--peer-cert-file, --peer-key-file: the certificate and private key etcd uses for peer communication;

--peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify them;



Create and distribute the etcd systemd unit file and data directory for each node:

#Substitute the variables in the template to generate a systemd unit file for each node
NODE_NAMES=("etcd0" "etcd1" "etcd2")
NODE_IPS=("10.103.22.231" "10.103.22.232" "10.103.22.233")
for (( i=0; i < 3; i++ ));do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/g" -e "s/##NODE_IP##/${NODE_IPS[i]}/g" /etc/kubernetes/pki/etcd/etcd.service.template > /etc/kubernetes/pki/etcd/etcd-${NODE_IPS[i]}.service
done
#Distribute the generated systemd unit files:
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /opt/lib/etcd"
scp /etc/kubernetes/pki/etcd/etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
done

Start the etcd service:

vim /opt/kubernetes/script/etcd.sh

NODE_IPS=("10.103.22.231" "10.103.22.232" "10.103.22.233")
#Start the etcd service on every node
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd"
done
#Check the result and make sure the status is active (running)
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status etcd|grep Active"
done
#Verify the service; the cluster is working when every endpoint reports healthy
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ETCDCTL_API=3 /opt/kubernetes/bin/etcdctl \
--endpoints=https://${node_ip}:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem endpoint health
done

Expected output:

>>> 10.103.22.231

https://10.103.22.231:2379 is healthy: successfully committed proposal: took = 14.831695ms

>>> 10.103.22.232

https://10.103.22.232:2379 is healthy: successfully committed proposal: took = 21.961696ms

>>> 10.103.22.233

https://10.103.22.233:2379 is healthy: successfully committed proposal: took = 20.714393ms
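Beyond per-endpoint health, etcdctl can also show which member currently leads the cluster; a minimal sketch using the same certificates:

# Show member status (leader flag, DB size, raft term) for all three endpoints
ETCDCTL_API=3 /opt/kubernetes/bin/etcdctl \
--endpoints=https://10.103.22.231:2379,https://10.103.22.232:2379,https://10.103.22.233:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem \
endpoint status --write-out=table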

4. Install the Load Balancer

Install keepalived and haproxy on the three master nodes:

yum install -y keepalived haproxy

Edit the haproxy configuration file:

vim /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /var/run/haproxy-admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    nbproc 1

defaults
    log global
    timeout connect 5000
    timeout client 10m
    timeout server 10m

listen admin_stats
    bind 0.0.0.0:10080
    mode http
    log 127.0.0.1 local0 err
    stats refresh 30s
    stats uri /status
    stats realm welcome login\ Haproxy
    stats auth along:along123
    stats hide-version
    stats admin if TRUE

listen kube-master
    bind 0.0.0.0:8443
    mode tcp
    option tcplog
    balance source
    server master01 10.103.22.231:6443 check inter 2000 fall 2 rise 2 weight 1
    server master02 10.103.22.232:6443 check inter 2000 fall 2 rise 2 weight 1
    server master03 10.103.22.233:6443 check inter 2000 fall 2 rise 2 weight 1

Notes:

haproxy serves its status page on port 10080;

haproxy listens on port 8443 on all interfaces; this port must match the port used in ${KUBE_APISERVER} (the controlPlaneEndpoint configured later);

the server lines list the IPs and ports that the kube-apiservers listen on;



Distribute the haproxy configuration file, then start and check the haproxy service:

vim /opt/kubernetes/script/haproxy.sh

NODE_IPS=("master01" "master02" "master03")
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
#Distribute the configuration file
scp /etc/haproxy/haproxy.cfg root@${node_ip}:/etc/haproxy
#Restart, enable, and check the haproxy service
ssh root@${node_ip} "systemctl restart haproxy"
ssh root@${node_ip} "systemctl enable haproxy.service"
ssh root@${node_ip} "systemctl status haproxy|grep Active"
#Check that haproxy is listening on port 8443
ssh root@${node_ip} "netstat -lnpt|grep haproxy"
done

The output should look like:

tcp 0 0 0.0.0.0:10080 0.0.0.0:* LISTEN 20027/haproxy

tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN 20027/haproxy



Configure and start the keepalived service

keepalived runs with all nodes in BACKUP state and preemption disabled.

This prevents a recovered master from immediately taking the VIP back (the apiserver needs time before it can serve requests, so this avoids routing traffic to it while it is still starting).

backup: 10.103.22.231, 10.103.22.232, 10.103.22.233

Create the keepalived configuration file:

vim /etc/keepalived/keepalived.conf

global_defs {
    router_id keepalived_hap
}
vrrp_script check-haproxy {
    script "killall -0 haproxy"
    interval 5
    weight -5
}
vrrp_instance VI-kube-master {
    state BACKUP
    nopreempt
    priority 200
    dont_track_primary
    interface ens160
    virtual_router_id 68
    advert_int 3
    track_script {
        check-haproxy
    }
    virtual_ipaddress {
        10.103.22.236
    }
}

Notes:

The interface carrying the VIP here is ens160; change interface to match your environment.

killall -0 haproxy checks whether the haproxy process on the node is still alive; if it is not, the node's priority is reduced by the weight value (5 here), which triggers a new master election;

router_id and virtual_router_id identify the keepalived instances belonging to this HA group; if you run multiple keepalived HA setups, each must use different values;

Key points:

1. state must be BACKUP on all three nodes.

2. nopreempt must be set on all three nodes.

3. One node's priority must be higher than the other two nodes' priorities.

4. In the master02 and master03 configurations, simply lower priority 200 to 150 and 100 respectively (a distribution sketch follows this list).
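A minimal sketch of pushing the per-node configuration from master01, assuming the file above was written on master01 with priority 200 and that 150/100 are the values chosen for master02/master03:

# Generate and push keepalived.conf to master02 and master03 with a lowered priority
for pair in "master02:150" "master03:100"; do
  node=${pair%%:*}; prio=${pair##*:}
  sed "s/priority 200/priority ${prio}/" /etc/keepalived/keepalived.conf | \
    ssh root@${node} "cat > /etc/keepalived/keepalived.conf"
done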



Start the keepalived service:

NODE_IPS=("master01" "master02" "master03")
VIP="10.103.22.236"
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl restart keepalived && systemctl enable keepalived"
ssh root@${node_ip} "systemctl status keepalived|grep Active"
ssh ${node_ip} "ping -c 3 ${VIP}"
done

Check the interface addresses: ip a show ens160

2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000

link/ether 00:50:56:a0:ed:1f brd ff:ff:ff:ff:ff:ff

inet 10.103.22.231/24 brd 10.103.22.255 scope global noprefixroute ens160

valid_lft forever preferred_lft forever

inet 10.103.22.236/32 scope global ens160

valid_lft forever preferred_lft forever



5. Install kubeadm and kubelet

Note: perform the following steps on all master and worker nodes.

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes Repository
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF



yum install -y kubelet-1.17.8 kubeadm-1.17.8 kubectl-1.17.8
# Enable kubelet to start at boot
systemctl enable kubelet

kubelet communicates with the rest of the cluster and manages the lifecycle of the Pods and containers on its own node. kubeadm is Kubernetes' automated deployment tool; it lowers deployment complexity and improves efficiency. kubectl is the command-line client for managing a Kubernetes cluster.

kubectl only needs to be installed on the machines where you will actually use it.
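A quick sanity check that all three components landed at the expected 1.17.8 version; a minimal sketch:

# Confirm the installed versions
kubeadm version -o short
kubelet --version
kubectl version --client --short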

6. Deploy the Kubernetes Masters

The following was generated with kubeadm config print init-defaults and then edited by hand to fit this environment.

kubeadm config print init-defaults > kubeadm-init.yaml
cat kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.17.8
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
controlPlaneEndpoint: 10.103.22.236:8443
apiServer:
  certSANs:
  - 10.103.22.231
  - 10.103.22.232
  - 10.103.22.233
  - 10.103.22.236
  - 127.0.0.1
etcd:
  external:
    endpoints:
    - https://10.103.22.231:2379
    - https://10.103.22.232:2379
    - https://10.103.22.233:2379
    caFile: /etc/kubernetes/pki/etcd/ca.pem
    certFile: /etc/kubernetes/pki/etcd/etcd.pem
    keyFile: /etc/kubernetes/pki/etcd/etcd-key.pem
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs

apiServer.certSANs lists every address that will be used to reach the apiserver, including the VIP. etcd.external.endpoints points at the external etcd cluster and also specifies the etcd certificate paths, which is why the etcd certificates were copied to every Kubernetes master node.

If your network requires it, configure an HTTP proxy for Docker:

mkdir -p /etc/systemd/system/docker.service.d
vim /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://IP:port" "HTTPS_PROXY=IP:port" "NO_PROXY=localhost,127.0.0.1,registry.aliyuncs.com/google_containers"



#You can pre-pull the required images first, then run the initialization
kubeadm config images pull --config kubeadm-init.yaml
# Initialize the first master
kubeadm init --config=kubeadm-init.yaml --upload-certs | tee kubeadm-init.log
#If you hit the following errors, it is because etcd was deployed separately from binaries and kubeadm's preflight checks see its ports already in use
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
#Initialize again with those preflight errors ignored
kubeadm init --config=kubeadm-init.yaml --upload-certs --ignore-preflight-errors=Port-2379 --ignore-preflight-errors=Port-2380 | tee kubeadm-init.log
#Normal output looks like this:
[init] Using Kubernetes version: v1.17.8
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.103.22.231 10.103.22.236 10.103.22.231 10.103.22.232 10.103.22.233 10.103.22.236 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate generation
[certs] External etcd mode: Skipping etcd/peer certificate generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0713 10:00:44.006478 22455 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0713 10:00:44.008302 22455 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 39.022851 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.17" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[kubelet-check] Initial timeout of 40s passed.
[upload-certs] Using certificate key:
d5f08ec8a45b09cc4d8c7122064503f57ae1b76fa179499a22111ce667c466cd
[mark-control-plane] Marking the node master01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: ixbybk.ybid1swqmqjvs9w4
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key d5f08ec8a45b09cc4d8c7122064503f57ae1b76fa179499a22111ce667c466cd
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf

If you cannot reach k8s.gcr.io directly, you can pull the images manually from another registry instead. The required images are listed below, and a retagging sketch follows the list:

#List the required images
kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.17.8
k8s.gcr.io/kube-controller-manager:v1.17.8
k8s.gcr.io/kube-scheduler:v1.17.8
k8s.gcr.io/kube-proxy:v1.17.8
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5
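A minimal sketch of pulling these images from the Aliyun mirror already used in kubeadm-init.yaml and retagging them as k8s.gcr.io, assuming the mirror hosts all of the listed tags:

# Pull each image from the mirror, retag it as k8s.gcr.io, and drop the mirror tag
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in kube-apiserver:v1.17.8 kube-controller-manager:v1.17.8 kube-scheduler:v1.17.8 \
           kube-proxy:v1.17.8 pause:3.1 etcd:3.4.3-0 coredns:1.6.5; do
  docker pull ${MIRROR}/${img}
  docker tag ${MIRROR}/${img} k8s.gcr.io/${img}
  docker rmi ${MIRROR}/${img}
done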

Configure the user that runs kubectl

#Kubernetes recommends accessing the cluster with kubectl as a non-root user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Install the Calico network plugin

Install Calico with the etcd datastore.

1. Download the Calico manifest that uses etcd.

curl https://docs.projectcalico.org/manifests/calico-etcd.yaml -o calico.yaml

2. Adjust the configuration for your environment.

vim calico.yaml
# Secret object: fill in the values produced by these commands
# etcd-key: (cat /etc/kubernetes/pki/etcd/etcd-key.pem | base64 | tr -d '\n')
# etcd-cert: (cat /etc/kubernetes/pki/etcd/etcd.pem | base64 | tr -d '\n')
# etcd-ca: (cat /etc/kubernetes/pki/etcd/ca.pem | base64 | tr -d '\n')
# ConfigMap object: point etcd_endpoints at the etcd cluster
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://10.103.22.231:2379,https://10.103.22.232:2379,https://10.103.22.233:2379"
  # Uncomment these entries so calico mounts the certificates from the Secret
  etcd_ca: "/calico-secrets/etcd-ca"
  etcd_cert: "/calico-secrets/etcd-cert"
  etcd_key: "/calico-secrets/etcd-key"
# DaemonSet object: set the pod CIDR
- name: CALICO_IPV4POOL_CIDR
  value: "10.98.0.0/16"
# By default the calico pods reach the apiserver through the kubernetes service cluster IP;
# you can also point them at the apiserver VIP (the keepalived VIP proxied by haproxy):
#- name: KUBERNETES_SERVICE_HOST
#  value: "10.103.22.236"
#- name: KUBERNETES_SERVICE_PORT
#  value: "8443"
#- name: KUBERNETES_SERVICE_PORT_HTTPS
#  value: "8443"
# Install calico
kubectl apply -f calico.yaml
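After applying the manifest it is worth waiting for the Calico pods to come up and for the nodes to turn Ready; a minimal check:

# Watch calico-node and calico-kube-controllers start
kubectl -n kube-system get pods -o wide | grep calico
# Nodes should move from NotReady to Ready once the CNI is working
kubectl get nodes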

7. Join master02 and master03 to the Cluster

Distribute the Kubernetes certificates

Copy the certificates generated on master01 to master02 and master03; Kubernetes ties the cluster together with these certificates, so every control-plane node must use the same set.

# Run these on master01
# The etcd certificates were already copied over when etcd was set up, so they do not need to be copied again.
cd /etc/kubernetes/pki/
scp ca.* sa.* front-proxy-ca.* master02:/etc/kubernetes/pki/
scp ca.* sa.* front-proxy-ca.* master03:/etc/kubernetes/pki/
scp /etc/kubernetes/admin.conf master02:/etc/kubernetes/
scp /etc/kubernetes/admin.conf master03:/etc/kubernetes/

Manually load the component images

To save installation time, manually load the component images from master01 onto master02 and master03. Skipping this also works; the join simply takes longer while the images are pulled. If the images come from k8s.gcr.io, remember that you need a way around the network restrictions to pull them.

docker save k8s.gcr.io/kube-proxy:v1.17.8 k8s.gcr.io/kube-controller-manager:v1.17.8 k8s.gcr.io/kube-apiserver:v1.17.8 k8s.gcr.io/kube-scheduler:v1.17.8 k8s.gcr.io/coredns:1.6.5 k8s.gcr.io/pause:3.1 calico/node:v3.15.1 calico/pod2daemon-flexvol:v3.15.1 calico/cni:v3.15.1 calico/kube-controllers:v3.15.1 -o k8s-masterimages-v1.17.8.tar
scp k8s-masterimages-v1.17.8.tar master02:/root
scp k8s-masterimages-v1.17.8.tar master03:/root



# Load the images on master02 and master03
docker load -i k8s-masterimages-v1.17.8.tar
docker images

Join the cluster

With a kubeadm-built cluster, joining a new node is straightforward:

kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key d5f08ec8a45b09cc4d8c7122064503f57ae1b76fa179499a22111ce667c466cd
#This fails with: error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
#Re-upload the certificates
kubeadm init phase upload-certs --upload-certs
210c2cbf748afcbc17d72c0f2d0a57fd407f21e1f453b14f6f6e29ed9099eb71
#Join again
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key 210c2cbf748afcbc17d72c0f2d0a57fd407f21e1f453b14f6f6e29ed9099eb71
#It still fails:
error execution phase control-plane-prepare/download-certs: error downloading certs: the Secret does not include the required certificate or key - name: external-etcd-ca.crt, path: /etc/kubernetes/pki/etcd/ca.pem
#Because etcd is external, the certificates have to be re-uploaded with the kubeadm-init.yaml config file
kubeadm init phase upload-certs --upload-certs --config kubeadm-init.yaml
b206fb1b860569770acb33f0eb717d9cc0b95304d398023399b4e2e4ec588350
#Join once more; this time it succeeds
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key b206fb1b860569770acb33f0eb717d9cc0b95304d398023399b4e2e4ec588350
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.

At this point master02 and master03 have successfully joined the cluster as control-plane nodes; the --control-plane flag (together with --certificate-key) is what makes this a control-plane join.



Configure the user that runs kubectl

#Kubernetes recommends accessing the cluster with kubectl as a non-root user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

8. Join the Worker Nodes to the Cluster

kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf

Check the nodes and their status:

kubectl get node
NAME STATUS ROLES AGE VERSION
master01 Ready master 25h v1.17.8
master02 Ready master 17h v1.17.8
master03 Ready master 17h v1.17.8
node04 Ready <none> 101m v1.17.8
node05 Ready <none> 105m v1.17.8

Check the cluster component status:

kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}

9. Verify the Cluster

With the steps above the cluster is installed and configured. Now verify that it is actually highly available. The checks are:

Stop the node that currently holds the component leader leases and verify that new leaders are elected.

Stop one etcd member and verify that the etcd cluster stays available.

Stop the host holding the VIP and verify that the VIP fails over to another HA node and the cluster stays usable.



Verify control-plane leader election

To verify that the master components fail over, first check which master node currently holds each component's leader lease.

Check the kube-controller-manager leader:

kubectl get endpoints kube-controller-manager -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master02_e4a4ca3d-a5f3-4531-b01c-37ab4758e5dc","leaseDurationSeconds":15,"acquireTime":"2020-07-14T01:25:37Z","renewTime":"2020-07-14T03:11:53Z","leaderTransitions":5}'
creationTimestamp: "2020-07-13T02:01:22Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "223470"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: 5d3c1653-919d-4765-ab2f-82e47b623416
#The leader is currently on master02

Check the kube-scheduler leader:

kubectl get endpoints kube-scheduler -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master02_d0fd3a81-3b13-421a-81a8-3eb0220ea67c","leaseDurationSeconds":15,"acquireTime":"2020-07-14T01:25:36Z","renewTime":"2020-07-14T03:17:12Z","leaderTransitions":5}'
creationTimestamp: "2020-07-13T02:01:21Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "224353"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: f345b2a4-970e-46cb-93b8-790d678d0201
#The leader is currently on master02

Both kube-controller-manager and kube-scheduler currently have their leader on master02, so power off master02 and check again.

After the shutdown, check kube-controller-manager:

kubectl get endpoints kube-controller-manager -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master03_369df996-06d1-4bf1-a6a4-38e5dd4a404f","leaseDurationSeconds":15,"acquireTime":"2020-07-14T03:21:44Z","renewTime":"2020-07-14T03:22:04Z","leaderTransitions":6}'
creationTimestamp: "2020-07-13T02:01:22Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "225130"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: 5d3c1653-919d-4765-ab2f-82e47b623416
#The leader is now on master03

After the shutdown, check kube-scheduler:

kubectl get endpoints kube-scheduler -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master03_94056bae-7383-4104-ab54-d80ea4971a92","leaseDurationSeconds":15,"acquireTime":"2020-07-14T03:21:41Z","renewTime":"2020-07-14T03:21:54Z","leaderTransitions":6}'
creationTimestamp: "2020-07-13T02:01:21Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "225100"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: f345b2a4-970e-46cb-93b8-790d678d0201
#The leader is now on master03

As shown above, both kube-controller-manager and kube-scheduler moved their leader to master03, so Kubernetes control-plane failover works as expected.



Verify etcd cluster availability

As we know, Kubernetes stores all of its configuration and component state in etcd; when etcd is unavailable, the whole Kubernetes cluster stops working properly. So let's take down one etcd member to test. Since master02 was already powered off in the previous step, we use its etcd member (etcd1) for this check.

Check the current cluster component status:

kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}

With master02 powered off, its etcd member stops as well. Check the etcd status:

/opt/kubernetes/bin/etcdctl \
--endpoints=https://10.103.22.232:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem endpoint health
{"level":"warn","ts":"2020-07-14T13:20:52.166+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-38b567a9-b35b-4069-9d54-ba130f5c1cf1/10.103.22.232:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.103.22.232:2379: connect: no route to host\""}
https://10.103.22.232:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
#The endpoint is unreachable

It is clear that the etcd member on master02 (etcd1) is no longer reachable.

Back on master01, check the component health again:

kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-1 Unhealthy Get https://10.103.22.232:2379/health: dial tcp 10.103.22.232:2379: connect: no route to host

Kubernetes also reports that the connection to etcd1 has failed.



Now check whether the Kubernetes cluster itself is affected:

kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready master 27h v1.17.8
master02 NotReady master 19h v1.17.8
master03 Ready master 19h v1.17.8
node04 Ready <none> 4h2m v1.17.8
node05 Ready <none> 4h6m v1.17.8

Even with the etcd1 member down, Kubernetes keeps working, so the etcd cluster is highly available as well.



Verify VIP failover

The cluster's apiserver is reached through the haproxy VIP (reverse proxy). You can confirm this with:

kubectl cluster-info
Kubernetes master is running at https://10.103.22.236:8443
KubeDNS is running at https://10.103.22.236:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
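You can also probe the apiserver through the VIP directly; on a default kubeadm cluster the /healthz endpoint is readable anonymously, so a minimal check (assuming the VIP and port above) is:

# -k skips TLS verification because the apiserver certificate is not in the system trust store
curl -k https://10.103.22.236:8443/healthz
# Expected output: ok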

Check the VIP address:

ip addr
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a0:ed:1f brd ff:ff:ff:ff:ff:ff
inet 10.103.22.231/24 brd 10.103.22.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 10.103.22.236/32 scope global ens160
valid_lft forever preferred_lft forever

The VIP (10.103.22.236) currently sits on master01 (10.103.22.231). Power off master01 directly.

Then verify that the VIP (the keepalived/haproxy VIP) fails over and that the Kubernetes cluster remains usable.



The VIP has now moved to master02 (10.103.22.232):

ip add
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a0:e3:b1 brd ff:ff:ff:ff:ff:ff
inet 10.103.22.232/24 brd 10.103.22.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 10.103.22.236/32 scope global ens160
valid_lft forever preferred_lft forever

Verify the Kubernetes cluster is still usable:

kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 NotReady master 27h v1.17.8
master02 Ready master 20h v1.17.8
master03 Ready master 20h v1.17.8
node04 Ready <none> 4h10m v1.17.8
node05 Ready <none> 4h14m v1.17.8

kubectl can still reach the cluster through the VIP, so the HA setup is verified.


