Docker & Kubernetes Troubleshooting Notes

Issue 1
Symptom: The Docker daemon normally runs as root, and by default ordinary users have no permission to talk to it, so commands fail with the following error:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.35/images/create?fromSrc=-&message=&repo=ubuntu-16.04&tag=: dial unix /var/run/docker.sock: connect: permission denied

Solution:

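  • As root, grant the ordinary user administrator (sudo) privileges by adding the last line below at this position in /etc/sudoers (editing with visudo is safer):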
## Allow root to run any commands anywhere 
root    ALL=(ALL)       ALL
tengwang  ALL=(ALL)     ALL
  • Switch to the ordinary user and run:
sudo groupadd docker
sudo gpasswd -a ${USER} docker
sudo systemctl restart docker
newgrp - docker
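To check whether the change has taken effect, the sketch below tests group membership with `id -nG`. The function name `user_in_group` is hypothetical. Called without a user, it checks the groups of the current shell, which shows whether `newgrp` or a fresh login has picked up the docker group; called with a user, it checks the group database, which shows whether `gpasswd` worked. On the host, `ls -l /var/run/docker.sock` should show the socket owned by group docker.

```shell
# Sketch: succeed if the current shell session (no second argument) or the named
# user (second argument) belongs to GROUP. The function name is hypothetical.
user_in_group() {
    group="$1"
    if [ -n "$2" ]; then
        groups=$(id -nG "$2")   # groups recorded for that user in the group database
    else
        groups=$(id -nG)        # groups of the current process, i.e. this shell
    fi
    # Put one group name per line and look for an exact match.
    printf '%s\n' $groups | grep -qx "$group"
}

# Usage:
#   user_in_group docker || echo "this shell is not in the docker group yet"
#   user_in_group docker "$USER" && echo "gpasswd worked; log in again or run newgrp"
```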

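This issue suggested a way to run containers more safely:
first start the relevant containers with Docker as usual;
then create an ordinary user and grant it the permissions above;
finally forbid remote root login, so that only the ordinary user can log in and start or stop containers, without permission to change the container configuration.
Keep in mind, though, that membership in the docker group is effectively root-equivalent on the host (and the sudoers entry above grants full root rights anyway), so this mainly prevents direct root logins rather than truly limiting what the user can do.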
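Forbidding remote root login is normally done with the `PermitRootLogin` directive in sshd's configuration. The sketch below edits a config file passed as an argument (on CentOS 7 this is typically /etc/ssh/sshd_config); the function name is hypothetical. Try it on a copy first, make sure the ordinary sudo account can log in, then restart sshd (`systemctl restart sshd` on CentOS).

```shell
# Sketch: set "PermitRootLogin no" in an sshd_config-style file given as $1.
# Any existing (possibly commented-out) PermitRootLogin line is rewritten;
# if there is none, the directive is appended.
disable_root_ssh() {
    conf="$1"
    if grep -Eq '^[#[:space:]]*PermitRootLogin' "$conf"; then
        sed -i -E 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin no/' "$conf"
    else
        echo 'PermitRootLogin no' >> "$conf"
    fi
}

# Usage (as root):
#   disable_root_ssh /etc/ssh/sshd_config && systemctl restart sshd
```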

Issue 2
Symptom:

ERROR: for nginx_linuxwt Cannot start service nginx_linuxwt: driver failed programming external connectivity on endpoint nginx_linuxwt (3cdf1c11cf27c33a33be85ce92e9bc73e0946a3b74c3e409d1c7e3019a81eab3): (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 80 -j DNAT --to-destination 172.17.0.3:80 ! -i docker0: iptables: No chain/target/match by that name (exit status 1))

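Solution (the DOCKER chain in the nat table is usually lost when the firewall, e.g. firewalld or the iptables service, is restarted or flushed after Docker has started; restarting Docker recreates it):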
systemctl restart docker
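Before restarting, you can confirm the cause by checking whether the nat table still contains the DOCKER chain. `iptables -t nat -S` prints user-defined chains as `-N <name>` lines; the helper below (a hypothetical name) only parses that output, so it can be tried without root.

```shell
# Sketch: succeed if the rules read from stdin (as printed by "iptables -t nat -S")
# define the chain named in $1. User-defined chains appear as "-N <name>" lines.
has_chain() {
    grep -qx -- "-N $1"
}

# Usage on the Docker host, as root:
#   iptables -t nat -S | has_chain DOCKER || systemctl restart docker
```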
Issue 3
Symptom:

Installing docker-compose fails with “ImportError: 'module' object has no attribute 'check_specifier'”

Solution:

easy_install --version   # check the setuptools version, then upgrade it to 30.1.0
pip install --upgrade setuptools==30.1.0
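To upgrade only when needed, the installed setuptools version can be compared with 30.1.0 (taken here as the minimum, per the note above). The sketch below uses GNU `sort -V` for the comparison; the function name `version_lt` is hypothetical.

```shell
# Sketch: succeed if version $1 is strictly older than version $2 (GNU sort -V order).
version_lt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Usage:
#   cur=$(python -c 'import setuptools; print(setuptools.__version__)')
#   version_lt "$cur" 30.1.0 && pip install --upgrade setuptools==30.1.0
```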

Issue 4
Symptom:

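When installing Splunk from the official Docker image and mapping the container directory /opt/splunk to the host, only some of its directories and files showed up; etc and var were empty.

Solution:
This is because etc and var are persisted as volumes, so those two directories are actually mapped to Docker's own storage location; docker inspect shows the mount points. I have not found a good fix for this yet; in the end I wrote my own Dockerfile to build the image and used bind mounts for all persisted data.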
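The mount points can be listed with `docker inspect` Go templates. This sketch prints each mount's type, source and destination, plus the VOLUME paths declared by the image; entries of type `volume` live under Docker's storage directory rather than in the bind-mounted host directory. The function names and the container/image names in the usage comment are only examples.

```shell
# Sketch: show where a container's mounts really live, and which paths its
# image declares with VOLUME.
show_mounts() {
    # One line per mount: type (bind/volume), host source -> container destination.
    docker inspect -f '{{range .Mounts}}{{.Type}} {{.Source}} -> {{.Destination}}{{println}}{{end}}' "$1"
}

show_image_volumes() {
    # JSON map of the VOLUME paths declared in the image config.
    docker image inspect -f '{{json .Config.Volumes}}' "$1"
}

# Usage (example names):
#   show_mounts splunk
#   show_image_volumes splunk/splunk:latest
```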

Issue 4b
Installing docker-compose fails with:

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Downloading https://files.pythonhosted.org/packages/b3/25/e605574f24948a8a53b497744e93f061eb1dbe7c44b6465fc1c172d591aa/PyNaCl-1.3.0-cp27-cp27mu-manylinux1_x86_64.whl (762kB)
|█████████████████▋ | 419kB 2.2kB/s eta 0:02:35
ERROR: Exception:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 178, in main
status = self.run(options, args)
File "/usr/lib/python2.7/site-packages/pip/_internal/commands/install.py", line 352, in run
resolver.resolve(requirement_set)
File "/usr/lib/python2.7/site-packages/pip/_internal/resolve.py", line 131, in resolve
self._resolve_one(requirement_set, req)
File "/usr/lib/python2.7/site-packages/pip/_internal/resolve.py", line 294, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "/usr/lib/python2.7/site-packages/pip/_internal/resolve.py", line 242, in _get_abstract_dist_for
self.require_hashes
File "/usr/lib/python2.7/site-packages/pip/_internal/operations/prepare.py", line 347, in prepare_linked_requirement
progress_bar=self.progress_bar
File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 886, in unpack_url
progress_bar=progress_bar
File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 746, in unpack_http_url
progress_bar)
File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 954, in _download_http_url
_download_url(resp, link, content_file, hashes, progress_bar)
File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 683, in _download_url
hashes.check_against_chunks(downloaded_chunks)
File "/usr/lib/python2.7/site-packages/pip/_internal/utils/hashes.py", line 62, in check_against_chunks
for chunk in chunks:
File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 651, in written_chunks
for chunk in chunks:
File "/usr/lib/python2.7/site-packages/pip/_internal/utils/ui.py", line 156, in iter
for x in it:
File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 640, in resp_read
decode_content=False):
File "/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/response.py", line 494, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/response.py", line 459, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/response.py", line 374, in _error_catcher
raise ReadTimeoutError(self._pool, None, 'Read timed out.')
ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

Solution: pip install -i https://pypi.douban.com/simple docker-compose
The error is most likely caused by a slow network connection, so point pip directly at a mirror.
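For a one-off install, raising pip's network timeout (`pip install --timeout 120 ...`) can also help on a slow link. To use the mirror by default, pip can read it from a config file; the sketch below writes one. The file location ($HOME/.config/pip/pip.conf is one of the per-user paths pip reads on Linux), the function name and the 60-second timeout are assumptions, and the function overwrites any existing file at that path.

```shell
# Sketch: write a pip config file ($1) that uses the index URL in $2 by default.
write_pip_mirror() {
    conf="$1"
    url="$2"
    mkdir -p "$(dirname "$conf")"
    # [global] options apply to every pip command; timeout is in seconds.
    cat > "$conf" <<EOF
[global]
index-url = $url
timeout = 60
EOF
}

# Usage:
#   write_pip_mirror "$HOME/.config/pip/pip.conf" https://pypi.douban.com/simple
#   pip install docker-compose
```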

Issue 5
Starting a MySQL service with docker-compose fails; the relevant docker-compose.yml is:

mysql_linuxwt:
  restart: always
  image: 10.8.8.13:5000/mysql:v1
  container_name: mysql_linuxwt
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/mysql:/var/lib/mysql
    - $PWD/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
    - $PWD/mysql.log:/var/log/mysql/general.log
    - $PWD/error.log:/var/log/mysql/error.log
  ports:
    - 3306:3306
  environment:
    MYSQL_ROOT_PASSWORD: password

chown: cannot read directory '/var/lib/mysql/': Permission denied

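This is related to SELinux, which is active when Docker runs on CentOS 7; disabling SELinux resolves the error.
To disable it temporarily (permissive mode until the next reboot): setenforce 0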
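`setenforce 0` only lasts until the next reboot. To keep SELinux permissive, the `SELINUX=` line in /etc/selinux/config can be changed, as in the sketch below (the function name is hypothetical, and the file path is an argument so it can be tried on a copy). A less drastic alternative is to leave SELinux enforcing and append `:Z` to the data volume, e.g. `$PWD/mysql:/var/lib/mysql:Z`, so Docker relabels that directory; do not relabel system paths such as /etc/localtime.

```shell
# Sketch: persist an SELinux mode by rewriting the SELINUX= line of the file in $1.
# "permissive" matches what "setenforce 0" does at runtime.
set_selinux_mode() {
    conf="$1"
    mode="$2"   # enforcing | permissive | disabled
    # "^SELINUX=" does not match the separate SELINUXTYPE= line.
    sed -i -E "s/^SELINUX=.*/SELINUX=${mode}/" "$conf"
}

# Usage (as root):
#   set_selinux_mode /etc/selinux/config permissive
```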
After that, starting the container failed with:

[ERROR] Could not open file '/var/log/mysql/error.log' for error logging: Permission denied

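This is because the log files are bind-mounted from the host, so they need the matching permissions (run this in the directory that holds the log files):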
chmod 777 *.log
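It also helps to create the log files before the first `docker-compose up`: if a bind-mount source does not exist, Docker creates it as a directory, and MySQL then cannot open it as a log file. The sketch below (hypothetical function name) creates both files and makes them world-writable with mode 666, which should be enough for the container's mysql user; the execute bits that 777 adds are not needed for log files.

```shell
# Sketch: pre-create the bind-mounted MySQL log files in directory $1 (default: .)
# and make them readable and writable by any user, including the container's.
prepare_mysql_logs() {
    dir="${1:-.}"
    for f in mysql.log error.log; do
        touch "$dir/$f"
        chmod 666 "$dir/$f"
    done
}

# Usage (next to docker-compose.yml):
#   prepare_mysql_logs . && docker-compose up -d
```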

Issue 6
docker build fails while building an image with:

Message from syslogd@i1234567890 at Mar 23 22:45:52 ...
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

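This is related to the combination of the Docker Engine version and the host kernel (containers share the host's kernel; the base image does not bring its own). In this case the Docker Engine version was too new.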
Issue 7
Starting containers with docker-compose reports:

/usr/lib/python2.7/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.2.1) doesn't match a supported version!
RequestsDependencyWarning)

Solution:
pip uninstall urllib3
pip uninstall chardet
pip install requests

Issue 8
In a swarm cluster, a service deployed from the manager node failed to start; docker service ps --no-trunc servicename showed the error:

e5y8g443x1ajj0jovyrhcioqj 151_prometheus.1 prom/prometheus:latest@sha256:bfad037f95e5e34d595502aa02cac6467b7eadc4b08a601d150844003051fb1b node151 Shutdown Rejected less than a second ago "invalid mount config for type "bind": bind source path does not exist"

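Solution: create the same directory structure on the manager node as on the target node (the image does not need to be built on the manager node). More generally, a bind-mount source path has to exist on every node that may run the task.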
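Creating the bind-mount directories by hand on each node is easy to forget. The sketch below does it over SSH for a list of nodes; the function name, the node names and paths in the usage comment, and passwordless SSH access are all assumptions, and paths containing spaces are not handled.

```shell
# Sketch: create the same bind-mount source directories on several swarm nodes.
# Assumes passwordless SSH to each node as a user allowed to create the paths;
# ssh re-splits the remote command, so paths with spaces would break.
ensure_bind_dirs() {
    nodes="$1"
    shift
    for node in $nodes; do          # unquoted on purpose: split the node list
        for dir in "$@"; do
            ssh "$node" mkdir -p "$dir" || return 1
        done
    done
}

# Usage (hypothetical node names and paths):
#   ensure_bind_dirs "node150 node151" /data/prometheus /data/prometheus/rules
```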

Issue 9
A gitlab service deployed in swarm would not start; systemctl status -l docker showed the following errors in the log:

level=error msg="Not continuing with pull after error: context canceled"
Jun 26 12:14:54 node150 dockerd[974]: time="2020-06-26T12:14:54.542270318+08:00" level=warning msg="failed to deactivate service binding for container 150_gitlab.1.nhcg0qojnsqm1tq70enycgx2e" error="No such container: 150_gitlab.1.nhcg0qojnsqm1tq70enycgx2e" module=node/agent node.id=osm84hfjpi604vcxid4d8j4xo

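The log shows that something went wrong with the image pull, so the service could not be bound to the container it was about to create.
Solution:
Check the image carefully. In this environment swarm could not pull the image online during deployment, so the image had to be built (or otherwise made available) locally on the node before the service was created.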
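Before running `docker stack deploy` or `docker service create`, you can check on each node whether the image is already in its local image store: `docker image inspect` exits non-zero when the image is missing. The helper name and the gitlab image tag in the usage comment are illustrative.

```shell
# Sketch: succeed if image $1 is present in this node's local image store.
image_present() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Usage on each node (example image name):
#   image_present gitlab/gitlab-ce:latest || echo "image missing on $(hostname)"
```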

Issue 10
Creating a service with docker stack fails with:

services.nginx.port.0 is needed a number or string

Solution:
Change the version declared in docker-compose.yml (the compose file format version at the top of the file); 3.4 is preferable.
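For reference, here is a minimal stack file in format 3.4 that uses the long `ports` syntax, which the compose file format only accepts from version 3.2 on; this assumes the error came from such a version mismatch. It is generated from a shell function so it can be tested; the function name, service, image and stack name `web` are illustrative.

```shell
# Sketch: write a minimal version "3.4" stack file to $1 using the long ports syntax.
write_stack_file() {
    cat > "$1" <<'EOF'
version: "3.4"
services:
  nginx:
    image: nginx:latest
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: ingress
EOF
}

# Usage:
#   write_stack_file docker-compose.yml && docker stack deploy -c docker-compose.yml web
```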

Issue 11
docker login ....

Error response from daemon: Get https://registry.docker.xxx.com/v1/_ping: x509: certificate

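Solution:
The error means the registry's CA certificate is missing on the Docker host, so the CA certificate has to be configured for Docker there.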
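Docker looks for per-registry CA certificates in /etc/docker/certs.d/<registry-host>/ca.crt, with `:port` appended to the directory name when the registry uses a non-default port. The sketch below copies a CA file there; the function name and `registry.example.com` are placeholders, and the base directory is an argument only so it can be tried against a scratch directory.

```shell
# Sketch: install the CA certificate $2 for registry host $1 where Docker reads it.
install_registry_ca() {
    host="$1"                          # host[:port], exactly as used in "docker login"
    ca="$2"                            # path to the CA certificate (PEM)
    base="${3:-/etc/docker/certs.d}"   # override only for testing
    mkdir -p "$base/$host"
    cp "$ca" "$base/$host/ca.crt"
}

# Usage (as root, placeholder names):
#   install_registry_ca registry.example.com /path/to/ca.crt
```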

Issue 11b
When monitoring containers with cAdvisor deployed in the swarm cluster, it starts successfully but exits after a while (perhaps about 2 hours) with:

rectory
I1126 17:28:55.447737 1 manager.go:1212] Exiting thread watching subcontainers
I1126 17:28:55.447817 1 manager.go:432] Exiting global housekeeping thread
I1126 17:28:55.447857 1 cadvisor.go:212] Exiting given signal: terminated

No good solution found yet; for now the only option is to redeploy cAdvisor.

Issue 12
In Kubernetes, the worker node could not start the container when a pod was created. Kubernetes version v1.19+, Docker version 19.03.6 (note: not docker-ce).
Check the pod status:
kubectl describe pods podname

Events:
Type Reason Age From Message


Normal Scheduled 28s Successfully assigned default/dapi-test-pod-volume to 192.168.0.159
Warning Failed 27s kubelet, 192.168.0.159 Error: failed to start container "test-container-volume": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "rootfs_linux.go:58: mounting \"/var/lib/docker/containers/c65c2c7d485ea9ce324486af65e2b6c9ea487bc761bc0fb381ba615f9e988243/resolv.conf\" to rootfs \"/var/lib/docker/overlay2/a0871ac227b090fe67fdbeefe37c455071712b1b245f9b01a9681cbe065f3bbe/merged\" at \"/var/lib/docker/overlay2/a0871ac227b090fe67fdbeefe37c455071712b1b245f9b01a9681cbe065f3bbe/merged/etc/resolv.conf\" caused \"open /var/lib/docker/overlay2/a0871ac227b090fe67fdbeefe37c455071712b1b245f9b01a9681cbe065f3bbe/merged/etc/resolv.conf: read-only file system\""": unknown
Warning Failed 26s kubelet, 192.168.0.159 Error: failed to start container "test-container-volume": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "rootfs_linux.go:58: mounting \"/var/lib/docker/containers/c65c2c7d485ea9ce324486af65e2b6c9ea487bc761bc0fb381ba615f9e988243/resolv.conf\" to rootfs \"/var/lib/docker/overlay2/a4cc19c29dc5b05b3976c46326b0cedcdb4565d1d3fa82541e15e1e2115ec7d9/merged\" at \"/var/lib/docker/overlay2/a4cc19c29dc5b05b3976c46326b0cedcdb4565d1d3fa82541e15e1e2115ec7d9/merged/etc/resolv.conf\" caused \"open /var/lib/docker/overlay2/a4cc19c29dc5b05b3976c46326b0cedcdb4565d1d3fa82541e15e1e2115ec7d9/merged/etc/resolv.conf: read-only file system\""": unknown
Warning BackOff 25s kubelet, 192.168.0.159 Back-off restarting failed container
Warning MissingClusterDNS 14s (x5 over 28s) kubelet, 192.168.0.159 pod: "dapi-test-pod-volume_default(5925bd28-7b4b-4a5d-b8db-df41e9188307)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Normal Pulled 14s (x3 over 27s) kubelet, 192.168.0.159 Container image "busybox" already present on machine
Normal Created 14s (x3 over 27s) kubelet, 192.168.0.159 Created container test-container-volume
Warning Failed 14s kubelet, 192.168.0.159 Error: failed to start container "test-container-volume": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "rootfs_linux.go:58: mounting \"/var/lib/docker/containers/c65c2c7d485ea9ce324486af65e2b6c9ea487bc761bc0fb381ba615f9e988243/resolv.conf\" to rootfs \"/var/lib/docker/overlay2/e7738653dab62450be4e787d9f478fe45cdc30edb98a56ecafcf5eac1e24894e/merged\" at \"/var/lib/docker/overlay2/e7738653dab62450be4e787d9f478fe45cdc30edb98a56ecafcf5eac1e24894e/merged/etc/resolv.conf\" caused \"open /var/lib/docker/overlay2/e7738653dab62450be4e787d9f478fe45cdc30edb98a56ecafcf5eac1e24894e/merged/etc/resolv.conf: read-only file system\""": unknown

Check the Docker status:
systemctl status -l docker

Jan 18 17:11:05 node160 dockerd[66659]: time="2021-01-18T17:11:05.990989177+08:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 18 17:11:05 node160 dockerd[66659]: time="2021-01-18T17:11:05.994460343+08:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 18 17:11:06 node160 dockerd[66659]: time="2021-01-18T17:11:06.003185162+08:00" level=error msg="939a110bd8550a6968966c7d30e2d7dcd12162401e16c7bd22454c0dfbdbb9d6 cleanup: failed to delete container from containerd: no such container"
Jan 18 17:11:06 node160 dockerd[66659]: time="2021-01-18T17:11:06.003224790+08:00" level=error msg="Handler for POST /v1.40/containers/939a110bd8550a6968966c7d30e2d7dcd12162401e16c7bd22454c0dfbdbb9d6/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/docker/containers/76a197367cde1de2667cbfed41e36f37e0283145cd8f52f2d3a6130fa8b0e867/resolv.conf\\\" to rootfs \\\"/var/lib/docker/overlay2/4415725d1b5a8a6172f95d3b4a6257f38c2eeb0dd9e150a404a9d480077a5fb8/merged\\\" at \\\"/var/lib/docker/overlay2/4415725d1b5a8a6172f95d3b4a6257f38c2eeb0dd9e150a404a9d480077a5fb8/merged/etc/resolv.conf\\\" caused \\\"open /var/lib/docker/overlay2/4415725d1b5a8a6172f95d3b4a6257f38c2eeb0dd9e150a404a9d480077a5fb8/merged/etc/resolv.conf: read-only file system\\\"\"": unknown"
Jan 18 17:11:29 node160 dockerd[66659]: time="2021-01-18T17:11:29.397191486+08:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/f20c9504ddfa9dd8e5f20a06d5e7a89f0905be8f2c2c7f1f879ea6a39286802b/shim.sock" debug=false pid=67461
Jan 18 17:11:29 node160 dockerd[66659]: time="2021-01-18T17:11:29.506873735+08:00" level=info msg="shim reaped" id=f20c9504ddfa9dd8e5f20a06d5e7a89f0905be8f2c2c7f1f879ea6a39286802b
Jan 18 17:11:29 node160 dockerd[66659]: time="2021-01-18T17:11:29.519061498+08:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 18 17:11:29 node160 dockerd[66659]: time="2021-01-18T17:11:29.519226960+08:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 18 17:11:29 node160 dockerd[66659]: time="2021-01-18T17:11:29.542989942+08:00" level=error msg="f20c9504ddfa9dd8e5f20a06d5e7a89f0905be8f2c2c7f1f879ea6a39286802b cleanup: failed to delete container from containerd: no such container"
Jan 18 17:11:29 node160 dockerd[66659]: time="2021-01-18T17:11:29.543037145+08:00" level=error msg="Handler for POST /v1.40/containers/f20c9504ddfa9dd8e5f20a06d5e7a89f0905be8f2c2c7f1f879ea6a39286802b/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/docker/containers/76a197367cde1de2667cbfed41e36f37e0283145cd8f52f2d3a6130fa8b0e867/resolv.conf\\\" to rootfs \\\"/var/lib/docker/overlay2/3d90d0c28faa66409dee9a6c0c9b08966fa65a0ea38f866d57c4945389e9219e/merged\\\" at \\\"/var/lib/docker/overlay2/3d90d0c28faa66409dee9a6c0c9b08966fa65a0ea38f866d57c4945389e9219e/merged/etc/resolv.conf\\\" caused \\\"open /var/lib/docker/overlay2/3d90d0c28faa66409dee9a6c0c9b08966fa65a0ea38f866d57c4945389e9219e/merged/etc/resolv.conf: read-only file system\\\"\"": unknown

Solution:
This is reportedly a bug. Try upgrading Docker; if that still does not help, try docker-ce.
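To see which Docker build a node runs before upgrading or switching to docker-ce, the sketch below prints the server version via `docker version --format` and lists installed packages whose names start with docker (for example docker.io or docker-ce on Debian/Ubuntu, docker or docker-ce on CentOS). The function name is hypothetical.

```shell
# Sketch: report the Docker server version and the installed docker* packages.
docker_build_info() {
    echo "server: $(docker version --format '{{.Server.Version}}')"
    if command -v dpkg-query >/dev/null 2>&1; then
        # Debian/Ubuntu: package name and version, one per line.
        dpkg-query -W -f='${Package} ${Version}\n' 'docker*' 2>/dev/null
    elif command -v rpm >/dev/null 2>&1; then
        # RHEL/CentOS: full package names including version.
        rpm -qa 'docker*'
    fi
}
```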

Issue 13

Issue 14

error: unable to recognize "cron.yaml": no matches for kind "CronJob" in version "batch/v1"
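Change batch/v1 to batch/v1beta1. CronJob is only served as batch/v1 from Kubernetes 1.21, so older clusters need batch/v1beta1 (which in turn was removed in 1.25).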
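The served versions can be checked on the cluster itself with `kubectl api-versions | grep '^batch/'`. The sketch below (hypothetical function name) instead picks the apiVersion from a server version string, assuming the 1.21 cut-over for batch/v1 CronJob.

```shell
# Sketch: choose the CronJob apiVersion for a Kubernetes server version such as
# "v1.19.3" or "1.19+". Assumes batch/v1 from 1.21 on (batch/v1beta1 is gone in 1.25).
cronjob_api_version() {
    v=${1#v}                  # drop a leading "v"
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}   # drop suffixes such as the "+" in "19+"
    if [ "$major" -gt 1 ] || [ "$minor" -ge 21 ]; then
        echo batch/v1
    else
        echo batch/v1beta1
    fi
}

# Usage (the version string is whatever your cluster reports):
#   cronjob_api_version v1.19.16    # prints batch/v1beta1
```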