Kubernetes troubleshooting notes

Fault 1
kubectl get nodes

NAME     STATUS     ROLES    AGE   VERSION
node-1   Ready      master   32d   v1.14.1
node-2   Ready      <none>   32d   v1.14.1
node-3   NotReady   <none>   32d   v1.14.1

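On the failing node, docker ps showed that none of the containers had come back up with the cluster, and kubelet itself would not start.
Check the kubelet log:
journalctl -f -u kubelet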

Mar 30 09:42:33 node-3 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 30 09:42:33 node-3 systemd[1]: Unit kubelet.service entered failed state.
Mar 30 09:42:33 node-3 systemd[1]: kubelet.service failed.
Mar 30 09:42:44 node-3 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 30 09:42:44 node-3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 30 09:42:44 node-3 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Mar 30 09:42:44 node-3 kubelet[21157]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 30 09:42:44 node-3 kubelet[21157]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.152946 21157 server.go:417] Version: v1.14.1
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.153184 21157 plugins.go:103] No cloud provider specified.
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.153207 21157 server.go:754] Client rotation is on, will bootstrap in background
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.165326 21157 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.206290 21157 server.go:625] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Mar 30 09:42:44 node-3 kubelet[21157]: F0330 09:42:44.206680 21157 server.go:265] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename Type Size Used Priority /data/swap file 8330984 14848 -1]
Mar 30 09:42:44 node-3 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 30 09:42:44 node-3 systemd[1]: Unit kubelet.service entered failed state.
Mar 30 09:42:44 node-3 systemd[1]: kubelet.service failed.

The key error is:

failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename Type Size Used Priority /data/swap file 8330984 14848 -1]

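Then I remembered that a few days earlier I had enabled swap because a test was running short of memory. By default kubelet refuses to run with swap enabled (unless --fail-swap-on=false is set), so it could not start.
Fix:
Turn swap off, then restart the services:
swapoff -a
systemctl restart kubelet
systemctl restart docker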
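swapoff only lasts until the next reboot. The sketch below also comments out swap entries in /etc/fstab so the node comes back up without swap. It assumes the swap area (such as /data/swap above) is declared in /etc/fstab; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: disable swap now and keep it disabled after a reboot.

# Read fstab on stdin and print it with every active swap entry commented out.
# Field 3 of an fstab line is the filesystem type; existing comments are left alone.
comment_swap_entries() {
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}

disable_swap() {
  swapoff -a                                         # turn off every active swap area
  cp /etc/fstab /etc/fstab.bak                       # keep a backup before editing
  comment_swap_entries < /etc/fstab.bak > /etc/fstab
  systemctl restart docker                           # bring the container runtime back first
  systemctl restart kubelet
}

# Run as root on the affected node:
# disable_swap
```

The other option named in the error message, starting kubelet with --fail-swap-on=false, keeps swap on; this sketch follows the disable-swap route used in the fix above.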

Fault 2

invalid type for io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta.annotations: got "string"

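Cause:
A field in the YAML file was missing the space after its colon (key:value instead of key: value). YAML then reads the line as one plain string, so annotations ends up as a string instead of a map.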
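A quick way to find this mistake is to search for keys whose colon is followed directly by a non-space character. This is a rough grep heuristic, not a YAML parser; the function name is hypothetical, and it can also flag legitimate lines such as a bare URL on a line of its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines where a key is followed by ':' with no space,
# e.g.  prometheus.io/scrape:"true"  instead of  prometheus.io/scrape: "true".
# YAML reads such a line as a single plain string, so an annotations block made
# of these lines becomes a string instead of a map.
find_missing_colon_space() {
  grep -nE '^[[:space:]]*[A-Za-z0-9_./-]+:[^[:space:]]' "$1"
}

# Usage: find_missing_colon_space my-manifest.yaml
```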

Fault 3

no matches for kind "DaemonSet" in version "extensions/v1beta1"

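Fix:
Change extensions/v1beta1 to apps/v1. DaemonSet is no longer served from extensions/v1beta1 as of Kubernetes 1.16, and apps/v1 also requires spec.selector, which must match the Pod template labels.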
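The change can be applied with sed. A sketch, assuming one object per manifest file with an unindented kind: line and GNU sed (for -i); the function name is hypothetical, and the selector check assumes two-space indentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch a single-object DaemonSet manifest from
# extensions/v1beta1 to apps/v1, editing the file in place (GNU sed).
migrate_daemonset_apiversion() {
  local f="$1"
  # Leave files that do not declare a DaemonSet untouched.
  grep -q '^kind: DaemonSet[[:space:]]*$' "$f" || return 0
  sed -i 's#^apiVersion: extensions/v1beta1[[:space:]]*$#apiVersion: apps/v1#' "$f"
  # apps/v1 requires spec.selector; this check assumes two-space indentation.
  if ! grep -q '^  selector:' "$f"; then
    echo "warning: $f has no spec.selector, which apps/v1 requires" >&2
  fi
}

# Usage: migrate_daemonset_apiversion my-daemonset.yaml
# Confirm the cluster serves apps/v1:  kubectl api-versions | grep '^apps/'
```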

Fault 4

daemonset-controller Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account

Fix:
Set up a service account signing key:
openssl genrsa -out /etc/kubernetes/serviceaccount.key 2048
Add to kube-apiserver: --service-account-key-file=/etc/kubernetes/serviceaccount.key
Add to kube-controller-manager: --service-account-private-key-file=/etc/kubernetes/serviceaccount.key
Restart all cluster components.
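The key generation and the two flags can be wrapped in a small helper. A sketch, assuming a binary (non-kubeadm) install where component flags are edited by hand; the function name is hypothetical. kube-apiserver uses the key to verify tokens and kube-controller-manager uses it to sign them, so on a cluster with several masters every apiserver needs the same key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the service account signing key and print the
# flags that must point at it. The directory defaults to /etc/kubernetes.
setup_sa_key() {
  local key_dir="${1:-/etc/kubernetes}"
  local key="$key_dir/serviceaccount.key"
  openssl genrsa -out "$key" 2048 2>/dev/null || return 1   # 2048-bit RSA private key
  chmod 600 "$key"                                          # readable by root only
  # kube-apiserver verifies tokens with the key; the controller-manager signs them.
  echo "kube-apiserver          --service-account-key-file=$key"
  echo "kube-controller-manager --service-account-private-key-file=$key"
}

# Usage (as root): setup_sa_key
# After restarting both components, on clusters of this era
# "kubectl get serviceaccount default" should show 1 under SECRETS again.
```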

Fault 5
When a CSI storage plugin was used for Pod data, the Pod failed to start, and the logs reported:

E0607 18:19:45.454297 1686 kubelet.go:1594] Unable to attach or mount volumes for pod "my-csi-app_default(d98df92b-f7e5-45ad-a062-c77ede8949dd)": unmounted volumes=[default-token-vkfvw my-csi-volume], unattached volumes=[default-token-vkfvw my-csi-volume]: timed out waiting for the condition; skipping pod
E0607 18:19:45.454345 1686 pod_workers.go:191] Error syncing pod d98df92b-f7e5-45ad-a062-c77ede8949dd ("my-csi-app_default(d98df92b-f7e5-45ad-a062-c77ede8949dd)"), skipping: unmounted volumes=[default-token-vkfvw my-csi-volume], unattached volumes=[default-token-vkfvw my-csi-volume]: timed out waiting for the condition

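Fix:
Check each of these logs:
the kubelet log
the kube-controller-manager log
The root cause turned out to be a mistake in the StorageClass manifest. Delete the StorageClass, fix the manifest, create it again, and then have the PVC reference the newly created StorageClass.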
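A sketch of the checks and the recreate step, assuming the corrected StorageClass manifest is in a local file. A StorageClass's provisioner and parameters cannot be changed after creation, which is why the fix deletes and recreates it rather than editing it. Function names are hypothetical; KUBECTL can point at another command for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; set KUBECTL to override the kubectl binary (e.g. for a dry run).
kc() { "${KUBECTL:-kubectl}" "$@"; }

# Where to look when a CSI volume times out while mounting.
csi_diagnostics() {
  local pod="$1"
  kc describe pod "$pod"          # Events show the attach/mount timeout
  kc get storageclass,pvc,pv      # which StorageClass the PVC asked for
  # kubelet log, run on the Pod's node:
  journalctl -u kubelet --since "1 hour ago" --no-pager | grep -i csi
  # kube-controller-manager log: check it on the master, e.g.
  # journalctl -u kube-controller-manager for a systemd-managed install.
}

# StorageClass provisioner/parameters are immutable, so delete and recreate it.
recreate_storageclass() {
  local manifest="$1" name="$2"
  kc delete storageclass "$name" --ignore-not-found
  kc apply -f "$manifest"
}

# Usage: csi_diagnostics my-csi-app
#        recreate_storageclass csi-sc.yaml csi-sc
```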

Fault 6
The CoreDNS Pod was Running but never reached the expected READY 1/1. The CoreDNS log showed:

reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized

Fix (delete the CoreDNS token secret and the CoreDNS Pod):
kubectl delete secrets -n kube-system coredns-token-ldzzn
kubectl delete pod -n kube-system coredns-6b98d77f97-tfcdx
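The secret and Pod names above are specific to that cluster. A sketch that looks them up instead, assuming CoreDNS runs under the coredns service account with the usual k8s-app=kube-dns label, and that the cluster still auto-creates token secrets (as the versions in these notes do). One common cause of this Unauthorized error, assumed here because the notes do not say what triggered it, is a token signed with a service account key that has since changed; deleting the secret makes the token controller issue a new one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; set KUBECTL to override the kubectl binary (e.g. for a dry run).
kc() { "${KUBECTL:-kubectl}" "$@"; }

# Delete the CoreDNS token secret so a new one is issued, then restart CoreDNS.
# Assumes the service account is "coredns" and the Pods carry k8s-app=kube-dns.
refresh_coredns_token() {
  local ns=kube-system secret
  secret=$(kc -n "$ns" get serviceaccount coredns -o jsonpath='{.secrets[0].name}')
  if [ -z "$secret" ]; then
    echo "no token secret listed for serviceaccount coredns" >&2
    return 1
  fi
  kc -n "$ns" delete secret "$secret"          # the token controller creates a fresh token
  kc -n "$ns" delete pod -l k8s-app=kube-dns   # replacement Pods mount the new token
}

# Usage: refresh_coredns_token
```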

Fault 7
[2020-01-10 20:57:47.132690] E [glusterfsd-mgmt.c:1940:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:heketistorage)
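I run a heketi + GlusterFS cluster that provides dynamic storage for Kubernetes. After one cluster restart, the error above appeared and the heketi Pod could not be created.
Fix:
1. Describing the heketi Pod showed that node 192.168.0.162 had not mounted the volume properly.
You can compare the volumes inside each GlusterFS Pod with gluster volume list and gluster volume status volumeName.
2. The comparison confirmed that node 192.168.0.162 could not mount the volume.
3. I deleted the GlusterFS Pod on node .162; after that, the volumes looked normal.
4. Creating heketi then succeeded.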
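Steps 1 to 3 can be scripted roughly as follows. The namespace, the Pod label selector, and the volume name are parameters because they depend on how GlusterFS was deployed; none of them come from this cluster. The helper names are hypothetical, and the node name is assumed to be the node IP, as with 192.168.0.162 above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; set KUBECTL to override the kubectl binary (e.g. for a dry run).
kc() { "${KUBECTL:-kubectl}" "$@"; }

# Print the volume list and one volume's status as seen from every GlusterFS Pod,
# so a node whose view differs from the others stands out.
compare_gluster_volumes() {
  local ns="$1" selector="$2" volume="$3" pod
  for pod in $(kc -n "$ns" get pods -l "$selector" -o jsonpath='{.items[*].metadata.name}'); do
    echo "== $pod"
    kc -n "$ns" exec "$pod" -- gluster volume list
    kc -n "$ns" exec "$pod" -- gluster volume status "$volume"
  done
}

# Delete the GlusterFS Pod scheduled on one node (e.g. the faulty 192.168.0.162).
delete_gluster_pod_on_node() {
  local ns="$1" selector="$2" node="$3"
  kc -n "$ns" delete pod -l "$selector" --field-selector "spec.nodeName=$node"
}

# Usage: compare_gluster_volumes <namespace> <label-selector> <volumeName>
#        delete_gluster_pod_on_node <namespace> <label-selector> 192.168.0.162
```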

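Fault 8
A deleted namespace stays in the Terminating state indefinitely, even though it no longer contains any objects.
Fix:
1. In a separate terminal, start the API proxy (it listens on 127.0.0.1:8001 by default):
kubectl proxy
2. Write the namespace to a JSON file with its finalizers cleared:
kubectl get namespace namespace-name -o json | jq '.spec = {"finalizers":[]}' > temp01.json
3. Call the finalize API through the proxy:
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp01.json 127.0.0.1:8001/api/v1/namespaces/namespace-name/finalize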
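The three steps can be combined into one function that starts kubectl proxy in the background and stops it afterwards. A sketch, assuming jq and curl are installed and port 8001 (kubectl proxy's default) is free; the function names are hypothetical. Clearing finalizers skips whatever cleanup they were guarding, so it only fits cases like this one where the namespace really is empty.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a namespace stuck in Terminating.

# Read a Namespace object as JSON on stdin and print it with spec.finalizers emptied.
clear_finalizers() {
  jq '.spec = {"finalizers": []}'
}

force_finalize_namespace() {
  local ns="$1" port="${2:-8001}" tmp proxy_pid
  tmp=$(mktemp)
  kubectl proxy --port="$port" >/dev/null 2>&1 &   # step 1, in the background
  proxy_pid=$!
  sleep 2                                           # give the proxy time to start listening
  kubectl get namespace "$ns" -o json | clear_finalizers > "$tmp"   # step 2
  curl -s -H "Content-Type: application/json" -X PUT --data-binary @"$tmp" \
    "http://127.0.0.1:$port/api/v1/namespaces/$ns/finalize"         # step 3
  kill "$proxy_pid"
  rm -f "$tmp"
}

# Usage: force_finalize_namespace namespace-name
```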