Kubernetes Troubleshooting Summary

Issue 1
kubectl get nodes

NAME     STATUS     ROLES    AGE   VERSION
node-1   Ready      master   32d   v1.14.1
node-2   Ready      <none>   32d   v1.14.1
node-3   NotReady   <none>   32d   v1.14.1
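
On a larger cluster it helps to filter this listing down to the unhealthy nodes. A minimal sketch, assuming the default column layout shown above; the function name not_ready_nodes is illustrative and not part of kubectl:

```shell
# List nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes` output on stdin and skips the header line.
# Note: a cordoned node ("Ready,SchedulingDisabled") is reported as well.
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on a live cluster:
#   kubectl get nodes | not_ready_nodes
```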

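On the failed node, docker ps showed that none of the containers had started with the cluster, and kubelet itself failed to start.
Check the kubelet log:
journalctl -f -u kubelet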

Mar 30 09:42:33 node-3 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 30 09:42:33 node-3 systemd[1]: Unit kubelet.service entered failed state.
Mar 30 09:42:33 node-3 systemd[1]: kubelet.service failed.
Mar 30 09:42:44 node-3 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 30 09:42:44 node-3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 30 09:42:44 node-3 systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Mar 30 09:42:44 node-3 kubelet[21157]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 30 09:42:44 node-3 kubelet[21157]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.152946 21157 server.go:417] Version: v1.14.1
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.153184 21157 plugins.go:103] No cloud provider specified.
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.153207 21157 server.go:754] Client rotation is on, will bootstrap in background
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.165326 21157 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Mar 30 09:42:44 node-3 kubelet[21157]: I0330 09:42:44.206290 21157 server.go:625] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Mar 30 09:42:44 node-3 kubelet[21157]: F0330 09:42:44.206680 21157 server.go:265] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename Type Size Used Priority /data/swap file 8330984 14848 -1]
Mar 30 09:42:44 node-3 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 30 09:42:44 node-3 systemd[1]: Unit kubelet.service entered failed state.
Mar 30 09:42:44 node-3 systemd[1]: kubelet.service failed.

The key error is this line:

failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps containe

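I then remembered that a few days earlier I had enabled swap on this node because a test was running out of memory. By default kubelet refuses to run with swap enabled, so it could not start.
Solution:
Turn off swap with swapoff (for example, swapoff -a)
systemctl restart kubelet
systemctl restart docker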
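
Before restarting kubelet it is worth confirming that no swap area is still active. A small sketch that reads a /proc/swaps-style file; the function name active_swaps is an assumption made for illustration:

```shell
# Print the swap devices/files listed in a /proc/swaps-style file.
# Defaults to /proc/swaps when no path is given.
active_swaps() {
    # Line 1 is the header (Filename Type Size Used Priority).
    awk 'NR > 1 && NF > 0 { print $1 }' "${1:-/proc/swaps}"
}

# Typical sequence on the affected node (run as root):
#   active_swaps                  # e.g. prints /data/swap
#   swapoff -a                    # turn off every active swap area
#   active_swaps                  # should now print nothing
#   systemctl restart kubelet
# To keep swap off after a reboot, also comment out the swap entry in /etc/fstab.
```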

Issue 2

invalid type for io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta.annotations: got "string"

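Cause:
A key: value entry in the YAML file is missing the space after the colon, so the field is parsed as a plain string instead of a map.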
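
A quick way to hunt for such lines is a grep heuristic. This is only a sketch, not a YAML parser: it can miss cases and may flag plain strings that legitimately contain a colon. The function name find_missing_colon_space is hypothetical:

```shell
# Print (with line numbers) lines where a mapping key is immediately
# followed by a non-space character, e.g. "note:hello" instead of "note: hello".
find_missing_colon_space() {
    grep -nE '^[[:space:]]*[A-Za-z0-9_./-]+:[^[:space:]]' "$1"
}

# Usage:
#   find_missing_colon_space my-manifest.yaml
```

For example, an entry written as note:hello under annotations turns the whole annotations field into a string, which matches the error above.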

Issue 3

no matches for kind "DaemonSet" in version "extensions/v1beta1"

Solution:
Change apiVersion in the manifest from extensions/v1beta1 to apps/v1.
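
The change can be scripted with sed. A minimal sketch, assuming the manifest holds only the DaemonSet (in a multi-document file, every extensions/v1beta1 object would be rewritten); the function name migrate_daemonset_api is illustrative:

```shell
# Print the manifest with its apiVersion line rewritten to apps/v1.
# The input file is not modified; redirect the output to save it.
migrate_daemonset_api() {
    sed 's|^apiVersion:[[:space:]]*extensions/v1beta1[[:space:]]*$|apiVersion: apps/v1|' "$1"
}

# Usage:
#   migrate_daemonset_api daemonset.yaml > daemonset-apps-v1.yaml
```

Note that apps/v1 requires spec.selector on a DaemonSet, and it must match the pod template labels, so an older manifest may need that field added by hand.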

Issue 4

daemonset-controller Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account

Solution:
Configure a signing key for service account tokens:
openssl genrsa -out /etc/kubernetes/serviceaccount.key 2048
Add this flag to the apiserver: --service-account-key-file=/etc/kubernetes/serviceaccount.key
Add this flag to the controller-manager: --service-account-private-key-file=/etc/kubernetes/serviceaccount.key
Restart the cluster services.
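
After the restart, one way to confirm that both components really picked up the key is to inspect their command lines. A rough sketch; has_flag is a hypothetical helper, and the ps invocations assume the components run as processes named kube-apiserver and kube-controller-manager on this host:

```shell
# Succeed if the command line contains the flag, written either as
# "--flag=value" or as "--flag value".
has_flag() {
    # $1 = full command line, $2 = flag name including the leading dashes
    case " $1 " in
        *" $2="*|*" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a control-plane node:
#   has_flag "$(ps -o args= -C kube-apiserver)" --service-account-key-file \
#       || echo "apiserver is missing --service-account-key-file"
#   has_flag "$(ps -o args= -C kube-controller-manager)" --service-account-private-key-file \
#       || echo "controller-manager is missing --service-account-private-key-file"
```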

Issue 5
A pod that uses a CSI storage plugin for its data volume failed to start. The log showed these errors for the pod:

E0607 18:19:45.454297 1686 kubelet.go:1594] Unable to attach or mount volumes for pod "my-csi-app_default(d98df92b-f7e5-45ad-a062-c77ede8949dd)": unmounted volumes=[default-token-vkfvw my-csi-volume], unattached volumes=[default-token-vkfvw my-csi-volume]: timed out waiting for the condition; skipping pod
E0607 18:19:45.454345 1686 pod_workers.go:191] Error syncing pod d98df92b-f7e5-45ad-a062-c77ede8949dd ("my-csi-app_default(d98df92b-f7e5-45ad-a062-c77ede8949dd)"), skipping: unmounted volumes=[default-token-vkfvw my-csi-volume], unattached volumes=[default-token-vkfvw my-csi-volume]: timed out waiting for the condition

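Solution:
Check the following logs in turn:
kubelet log
kube-controller-manager log
The root cause turned out to be a mistake in the StorageClass manifest. Delete the StorageClass, fix the manifest, recreate it, and then reference the new StorageClass from the PVC.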
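
When the kubelet log is long, pulling out just the names of the stuck volumes narrows down which PVC and StorageClass to inspect. A small sketch that relies on the "unmounted volumes=[...]" wording seen in the log above; the function name stuck_volumes is illustrative:

```shell
# Read kubelet log lines on stdin and print each distinct volume name
# that appears in an "unmounted volumes=[...]" list.
stuck_volumes() {
    sed -n 's/.*unmounted volumes=\[\([^]]*\)\].*/\1/p' | tr ' ' '\n' | sort -u
}

# Usage on the node running the pod:
#   journalctl -u kubelet --no-pager | stuck_volumes
# Then inspect the related objects, for example:
#   kubectl describe pvc
#   kubectl get sc -o yaml
```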