Container Cloud Project: Kubernetes Cluster Deployment

I. Cluster Node Planning

The cluster consists of 1 master and 4 worker nodes; by default the master does not schedule pods.

Hostname   IP          Role                  OS         Spec   Components
node19     10.0.0.19   master,control-plane  centos7.9  4c8g   etcd kube-apiserver kube-controller-manager kube-proxy kube-scheduler calico docker kubelet kubeadm
node20     10.0.0.20   node                  centos7.9  4c8g   calico docker kubelet kube-proxy kubeadm
node21     10.0.0.21   node                  centos7.9  4c8g   calico docker kubelet kube-proxy kubeadm
node22     10.0.0.22   node                  centos7.9  4c8g   calico docker kubelet kube-proxy kubeadm
node23     10.0.0.23   node                  centos7.9  4c8g   calico docker kubelet kube-proxy kubeadm

CoreDNS pods are scheduled onto the worker nodes at random; the default is 2 pod replicas.

II. Base Environment

Run the following on node19:

1. Generate a key pair
Run ssh-keygen and press Enter at every prompt; the generated keys are placed under /root/.ssh/.

2. Set up passwordless SSH to the worker nodes

ssh-copy-id -i /root/.ssh/id_rsa.pub node20  
ssh-copy-id -i /root/.ssh/id_rsa.pub node21
ssh-copy-id -i /root/.ssh/id_rsa.pub node22
ssh-copy-id -i /root/.ssh/id_rsa.pub node23
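
The four commands above can also be collapsed into one loop (a minimal sketch, assuming the node names already resolve via /etc/hosts):

#!/bin/bash

# Generate the key pair once if it does not exist, then push the
# public key to every worker node in a single loop.
[ -f /root/.ssh/id_rsa.pub ] || ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
for host in node20 node21 node22 node23; do
    ssh-copy-id -i /root/.ssh/id_rsa.pub "$host"
done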

3. Deploy the NTP service

#!/bin/bash

# Install the ntp service and perform an initial one-shot sync
yum -y install ntp ntpdate
ntpdate ntp1.aliyun.com
ntpdate ntp2.aliyun.com

# Back up the existing ntp configuration file
[ -f "/etc/ntp.conf" ] && mv /etc/ntp.conf /etc/ntp.conf.bak

# Write ntp.conf
cat <<EOF > /etc/ntp.conf
restrict default nomodify notrap noquery
 
restrict 127.0.0.1
restrict 10.0.0.0 mask 255.255.255.0 nomodify
# Only allow clients in the 10.0.0.0/24 subnet to synchronize time. To allow
# any client IP to synchronize, change this to "restrict default nomodify".
 
server ntp1.aliyun.com
server ntp2.aliyun.com
server time1.aliyun.com
server time2.aliyun.com
server time-a.nist.gov
server time-b.nist.gov
 
server  127.127.1.0     
# local clock
fudge   127.127.1.0 stratum 10
 
driftfile /var/lib/ntp/drift
broadcastdelay  0.008
keys            /etc/ntp/keys
EOF
# Start the service
systemctl daemon-reload
systemctl restart ntpd
systemctl enable ntpd
# Add a cron job: resync four times a day and write the time back to the hardware clock
cat <<EOF >> /etc/crontab
0 0,6,12,18 * * * /usr/sbin/ntpdate ntp1.aliyun.com; /sbin/hwclock -w
EOF
systemctl restart crond
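
Once ntpd is running, a quick sanity check confirms it has selected an upstream source (an asterisk in the first column marks the chosen peer):

# Show the peers ntpd is tracking on node19
ntpq -p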

Run the following on node19 through node23:

1. Disable swap
Run swapoff -a, and comment out the swap entry in /etc/fstab so it stays off after a reboot.

2. Disable SELinux
sed -i 's/enforcing/disabled/g' /etc/selinux/config   # takes effect after reboot; run "setenforce 0" for the current session

3. Disable the firewall
systemctl stop firewalld && systemctl disable firewalld

4. Host name resolution

#!/bin/bash

cat <<EOF>> /etc/hosts
10.0.0.19 node19 
10.0.0.20 node20 
10.0.0.21 node21 
10.0.0.22 node22 
10.0.0.23 node23 
EOF

5. Configure the yum repository
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.163.com/.help/CentOS7-Base-163.repo && yum makecache

6. Configure the time zone
echo "Asia/Shanghai" > /etc/timezone
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime   # -f overwrites the existing symlink

7. Time synchronization
yum -y install ntpdate && ntpdate 10.0.0.19
Add a cron job so the nodes keep syncing against the master's NTP server:
echo "0 0,6,12,18 * * * /usr/sbin/ntpdate 10.0.0.19; /sbin/hwclock -w" >> /etc/crontab

III. Base Software Deployment

Run the following on node19 through node23:

1. Deploy docker
Download the static binary package: https://download.docker.com/linux/static/stable/x86_64/docker-19.03.6.tgz

Extract the files:

tar xzvf docker-19.03.6.tgz
cp -ar docker/* /usr/bin

Create the systemd unit file:

#!/bin/bash

# Quote the heredoc delimiter so $MAINPID is written literally into the unit file
cat <<'EOF' > /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
EOF   

Configure the registry mirror and the cgroup driver:

#!/bin/bash

# /etc/docker does not exist yet after a binary install
mkdir -p /etc/docker
cat <<EOF > /etc/docker/daemon.json
{
    "registry-mirrors":["https://nr630v1c.mirror.aliyuncs.com"],
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
    "max-size": "100m"
    },
    "storage-driver": "overlay2",
    "storage-opts": [
    "overlay2.override_kernel_check=true"
    ]
}
EOF   

# Start docker (daemon-reload first so systemd picks up the new unit file)
systemctl daemon-reload
systemctl start docker && systemctl enable docker
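
Before moving on, it is worth confirming that docker is up and reporting the systemd cgroup driver, since this must match the kubelet setting configured later:

# Should print "Cgroup Driver: systemd"
docker info 2>/dev/null | grep -i 'cgroup driver'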

2. Deploy kubelet, kubeadm and kubectl
Configure the Kubernetes yum repository:

#!/bin/bash

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes Repository
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF   

# Refresh the yum cache
yum makecache   

Install kubelet, kubeadm and kubectl
Run yum list kubeadm --showduplicates | sort -r to list the installable versions and install the latest; here that is 1.21.1-0.
yum install kubeadm-1.21.1-0 kubectl-1.21.1-0 kubelet-1.21.1-0 --disableexcludes=kubernetes -y
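
A quick check that all three tools report the expected version (note that kubectl's --short flag was still available in 1.21):

kubeadm version -o short
kubelet --version
kubectl version --client --short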

3. Configure bridge networking

#!/bin/bash

# The sysctl keys below only exist once the br_netfilter module is loaded
modprobe br_netfilter

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

# Apply the configuration
sysctl --system
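
modprobe only loads br_netfilter for the current boot. To make it persist across reboots, a small sketch using systemd's modules-load.d mechanism:

#!/bin/bash

# systemd loads every module listed under /etc/modules-load.d/ at boot
cat <<EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF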

4. Configure the kubelet cgroup driver
Enable kubelet as a system service:
systemctl enable kubelet (do not start kubelet; cluster initialization starts it automatically)

The cgroup driver must match docker's:
echo "KUBELET_EXTRA_ARGS=--cgroup-driver=systemd" > /etc/default/kubelet
Then add the following line to the [Service] section of /etc/systemd/system/multi-user.target.wants/kubelet.service and run systemctl daemon-reload:
EnvironmentFile=-/etc/default/kubelet

IV. Cluster Deployment

Run the following on node19 through node23:
1. Create the cluster initialization file

#!/bin/bash

cat <<EOF > init-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
kubernetesVersion: v1.21.1
dns: 
  type: CoreDNS
networking: 
  serviceSubnet: "10.96.0.0/16"   # virtual IP range for Services
  podSubnet: "10.244.0.0/16"      # pod network range
  dnsDomain: "cluster.local"      # in-cluster domain name
EOF

2. Pull the images
kubeadm config images pull --config=init-config.yaml
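
To see exactly which images will be pulled before pulling them:

kubeadm config images list --config=init-config.yaml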

Run the following on node19:
1. Run kubeadm init --config=init-config.yaml. When initialization finishes, record the join command printed at the end; it is needed later when adding nodes.

Continue with the following, as the init output instructs:

#!/bin/bash

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config 
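
If the join command printed by kubeadm init gets lost (the default bootstrap token also expires after 24 hours), a fresh one can be generated on node19 at any time:

# Create a new bootstrap token and print the complete join command
kubeadm token create --print-join-command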

Run the following on node20 through node23:
1. Create the join configuration file

#!/bin/bash

cat <<EOF > join-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: 10.0.0.19:6443
    token: 3py8wm.2q9g0wnij2wayiv2
    unsafeSkipCAVerification: true
  tlsBootstrapToken: 3py8wm.2q9g0wnij2wayiv2
EOF

2. Join the node
kubeadm join --config=join-config.yaml

Run the following on node19:

1. Deploy calico
kubectl create -f https://docs.projectcalico.org/manifests/calico.yaml
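
The nodes only turn Ready once the calico pods are up; their progress can be watched with:

# All calico-node pods should reach Running (one per node)
kubectl -n kube-system get pods -l k8s-app=calico-node -w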

V. Verifying the Cluster

Run the following on node19:
1. Verify cluster component status
kubectl get cs,nodes returns:

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                                 STATUS    MESSAGE             ERROR
componentstatus/controller-manager   Healthy   ok                  
componentstatus/scheduler            Healthy   ok                  
componentstatus/etcd-0               Healthy   {"health":"true"}   

NAME          STATUS   ROLES                  AGE   VERSION
node/node19   Ready    <none>                 10d   v1.21.1
node/node20   Ready    <none>                 10d   v1.21.1
node/node21   Ready    <none>                 10d   v1.21.1
node/node22   Ready    <none>                 10d   v1.21.1
node/node23   Ready    <none>                 10d   v1.21.1

To make the node roles easy to distinguish, label the nodes. The ROLES column is derived from the node-role.kubernetes.io/<role> label key (the value is ignored), and node19 already carries control-plane and master from kubeadm, so only the workers need a label:
kubectl label node node20 node-role.kubernetes.io/node=''
kubectl label node node21 node-role.kubernetes.io/node=''
kubectl label node node22 node-role.kubernetes.io/node=''
kubectl label node node23 node-role.kubernetes.io/node=''

Check the cluster again:

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                                 STATUS    MESSAGE             ERROR
componentstatus/controller-manager   Healthy   ok                  
componentstatus/scheduler            Healthy   ok                  
componentstatus/etcd-0               Healthy   {"health":"true"}   

NAME          STATUS   ROLES                  AGE   VERSION
node/node19   Ready    control-plane,master   10d   v1.21.1
node/node20   Ready    node                   10d   v1.21.1
node/node21   Ready    node                   10d   v1.21.1
node/node22   Ready    node                   10d   v1.21.1
node/node23   Ready    node                   10d   v1.21.1

kubectl get po -n kube-system shows the status of all cluster component pods:

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-78d6f96c7b-w522q   1/1     Running   0          9d
calico-node-6dxpp                          1/1     Running   1          10d
calico-node-j87wb                          1/1     Running   1          10d
calico-node-l5pkl                          1/1     Running   1          10d
calico-node-tkgnw                          1/1     Running   1          10d
calico-node-xd7qp                          1/1     Running   1          10d
coredns-545d6fc579-2l5ft                   1/1     Running   0          9d
coredns-545d6fc579-ncm85                   1/1     Running   1          10d
etcd-node19                                1/1     Running   1          10d
kube-apiserver-node19                      1/1     Running   0          3d18h
kube-controller-manager-node19             1/1     Running   2          10d
kube-proxy-9nlsn                           1/1     Running   1          10d
kube-proxy-cf24c                           1/1     Running   1          10d
kube-proxy-g5pqw                           1/1     Running   1          10d
kube-proxy-s8xh5                           1/1     Running   1          10d
kube-proxy-tplgz                           1/1     Running   1          10d
kube-scheduler-node19                      1/1     Running   2          10d

2. Verify in-cluster networking
Create a test workload:
cat example.yaml

apiVersion: v1
kind: Service
metadata:
  name: dnsutils-ds
  labels:
    app: dnsutils-ds
spec:
  type: NodePort
  selector:
    app: dnsutils-ds
  ports:
  - name: http
    port: 80
    targetPort: 80
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dnsutils-ds
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      app: dnsutils-ds
  template:
    metadata:
      labels:
        app: dnsutils-ds
    spec:
      containers:
      - name: my-dnsutils
        image: tutum/dnsutils:latest
        command:
          - sleep
          - "3600"
        ports:
        - containerPort: 80

Run kubectl create -f example.yaml to create the test workload.

kubectl get svc,po -o wide | grep dnsutils-ds shows the Service and the pods:

service/dnsutils-ds            NodePort    10.96.48.172   <none>        80:30707/TCP   3m18s   app=dnsutils-ds
pod/dnsutils-ds-288j8                      1/1     Running   0          3m18s   10.244.98.111    node23   <none>           <none>
pod/dnsutils-ds-4j8b4                      1/1     Running   0          3m18s   10.244.160.93    node20   <none>           <none>
pod/dnsutils-ds-85w5b                      1/1     Running   0          3m18s   10.244.173.242   node21   <none>           <none>
pod/dnsutils-ds-npd5c                      1/1     Running   0          3m18s   10.244.35.188    node22   <none>           <none>
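
The host-to-pod ping checks below can also be scripted across all four pods at once (a sketch that pulls each pod IP via jsonpath):

#!/bin/bash

# Ping every dnsutils-ds pod IP once from the current host
for ip in $(kubectl get po -l app=dnsutils-ds -o jsonpath='{.items[*].status.podIP}'); do
    ping -c 1 -W 2 "$ip" >/dev/null && echo "$ip reachable" || echo "$ip UNREACHABLE"
done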

Cross-host pod communication checks
Ping the Service VIP from a host node:
ping 10.96.48.172 succeeds

Ping a pod IP from a host node:
ping 10.244.98.111 succeeds

Ping the VIP from inside a pod:
kubectl exec -it dnsutils-ds-npd5c -- bash to enter the pod, then
ping 10.96.48.172 succeeds

Ping a host node from inside the pod:
ping 10.0.0.19 succeeds

In-cluster DNS resolution checks
kubectl exec -it dnsutils-ds-npd5c -- bash to enter the pod
nslookup www.baidu.com resolves an external name and returns:

Server:		10.96.0.10  # the IP of the kube-dns Service, i.e. the cluster's virtual DNS IP
Address:	10.96.0.10#53

Non-authoritative answer:
www.baidu.com	canonical name = ps_other.a.shifen.com.
Name:	ps_other.a.shifen.com
Address: 39.156.66.10

nslookup kubernetes.default resolves the kubernetes Service in the same namespace:

Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	kubernetes.default.svc.cluster.local
Address: 10.96.0.1

nslookup kube-dns.kube-system resolves the kube-dns Service in a different namespace:

Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	kube-dns.kube-system.svc.cluster.local
Address: 10.96.0.10

At this point the cluster deployment is complete.

VI. Troubleshooting

Problem 1: Pods cannot reach the clusterIP or Services
kubectl edit cm kube-proxy -n kube-system and change kube-proxy's mode to ipvs
kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}' to restart the kube-proxy pods
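
Note that ipvs mode also depends on the ipvs kernel modules; if they are missing, kube-proxy falls back to iptables. A sketch to load them on CentOS 7 (newer kernels name the conntrack module nf_conntrack instead of nf_conntrack_ipv4):

#!/bin/bash

# Load the modules kube-proxy's ipvs mode requires, then verify
for mod in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do
    modprobe "$mod"
done
lsmod | grep -e ip_vs -e nf_conntrack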

Problem 2: Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
Edit /etc/kubernetes/manifests/kube-scheduler.yaml and kube-controller-manager.yaml, comment out the "- --port=0" line, then restart kubelet.
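
The edit can be scripted with sed (a sketch; kubelet also picks up static pod manifest changes on its own, but restarting it forces the reload):

#!/bin/bash

# Comment out the "- --port=0" line in both static pod manifests
sed -i 's/^\(\s*\)- --port=0/\1# - --port=0/' \
    /etc/kubernetes/manifests/kube-scheduler.yaml \
    /etc/kubernetes/manifests/kube-controller-manager.yaml
systemctl restart kubelet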

Problem 3: Pulling the coredns image fails
Pull the image manually from the image address shown in the error message.
