Kubernetes Study Notes: Pod Health Checks

1. How Health Checks Work

Applications can fail while running. Kubernetes provides a health check mechanism: when it detects that an application is unhealthy, it automatically restarts the container or removes it from the Service, keeping the application highly available. Kubernetes has three kinds of probes:

  • readiness probes - readiness checks: determine whether the container is ready to accept traffic; once ready, it is added to the endpoints
  • liveness probes - liveness checks: determine whether the application is still healthy; on failure, the container is restarted
  • startup probes - startup checks: for applications that start slowly, so that a long startup does not get the container killed by the other probes
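
The startup probe is the only one of the three that does not get a worked example later in these notes, so here is a minimal sketch; the Pod name and the threshold values are illustrative assumptions, not taken from the notes:

```yaml
# Hypothetical Pod: a startupProbe guards a slow-starting app.
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo            # illustrative name
spec:
  containers:
  - name: slow-start-demo
    image: nginx:latest
    startupProbe:                  # runs first; the other probes are held off until it succeeds
      httpGet:
        port: 80
        path: /
      failureThreshold: 30         # allow up to 30 * 10s = 300s for startup
      periodSeconds: 10
    livenessProbe:                 # takes over only after the startupProbe has succeeded
      httpGet:
        port: 80
        path: /
      periodSeconds: 10
```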

Each probe supports three check methods:

  • exec - run a shell command inside the container; exit code 0 means healthy, non-zero means unhealthy
  • httpGet - send an HTTP request to the container and judge health by the HTTP response code
  • tcpSocket - open a TCP connection to the container; if the connection can be established, the container is healthy

Each check method supports the following parameters:

  • initialDelaySeconds - delay before the first probe, to give the application time to start so the check does not fail before startup completes
  • periodSeconds - how often to run the probe, default 10s
  • timeoutSeconds - probe timeout; a probe that exceeds it counts as failed
  • successThreshold - how many consecutive successes mark the check as passing, default 1
  • failureThreshold - how many consecutive failures mark the check as failing, default 3
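
Put together in a probe stanza, the parameters look like this; this is a sketch, the path and all the values are illustrative (failureThreshold is the standard failure-side counterpart of successThreshold, default 3):

```yaml
# Illustrative probe stanza showing the timing/threshold parameters together.
livenessProbe:
  httpGet:
    port: 80
    path: /healthz             # illustrative path
  initialDelaySeconds: 5       # wait 5s after the container starts before the first probe
  periodSeconds: 10            # probe every 10s (default 10)
  timeoutSeconds: 2            # a probe taking longer than 2s counts as a failure
  successThreshold: 1          # 1 success marks the check as passing (default 1)
  failureThreshold: 3          # 3 consecutive failures trigger the action (default 3)
```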

1.1 livenessProbe

livenessProbe checks the running state inside the container and is mainly used as a liveness check.
Each of the three check methods is demonstrated below.
exec
This method runs a shell command inside the container. For example, suppose the file /tmp/liveness-probe.log must exist at all times; if it is missing, the container is unhealthy and needs a restart.
Define a Pod:
cat centos-exec-liveness-probe.yaml

apiVersion: v1
kind: Pod
metadata:
  name: exec-liveness-probe
  annotations:
    kubernetes.io/description: "exec-liveness-probe"
spec:
  containers:
  - name: exec-liveness-probe
    image: centos:latest
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/liveness-probe.log && sleep 10 && rm -f /tmp/liveness-probe.log && sleep 20
    livenessProbe:    # health check
      exec:      # check method: judge health by the exit code of ls -l /tmp/liveness-probe.log
        command:
        - ls
        - -l
        - /tmp/liveness-probe.log
      initialDelaySeconds: 1
      periodSeconds: 5
      timeoutSeconds: 1

Create the Pod:
kubectl apply -f centos-exec-liveness-probe.yaml
kubectl describe pods exec-liveness-probe | tail

 Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  26s               default-scheduler  Successfully assigned default/exec-liveness-probe to node-3
  Normal   Pulled     26s               kubelet, node-3    Container image "centos:latest" already present on machine
  Normal   Created    25s               kubelet, node-3    Created container exec-liveness-probe
  Normal   Started    25s               kubelet, node-3    Started container exec-liveness-probe
  Warning  Unhealthy  3s (x3 over 13s)  kubelet, node-3    Liveness probe failed: ls: cannot access '/tmp/liveness-probe.log': No such file or directory
  Normal   Killing    3s                kubelet, node-3    Container exec-liveness-probe failed liveness probe, will be restarted

The events show the liveness probe failing and the container being restarted.
kubectl get pods exec-liveness-probe

NAME                  READY   STATUS    RESTARTS   AGE
exec-liveness-probe   1/1     Running   2          80s

The RESTARTS column shows how many times the container has been restarted.
httpGet
This is mainly used for web workloads: an HTTP request is sent to the container and health is judged by the response code; a code in the 200-399 range means healthy.
cat nginx-httpGet-liveness-readiness.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx-httpget-liveness-readiness-probe
  annotations:
    kubernetes.io/description: "nginx-httpGet-liveness-readiness-probe"
spec:
  containers:
  - name: nginx-httpget-liveness-readiness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:  # health check
      httpGet:      # check method
        port: 80
        scheme: HTTP
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3

kubectl apply -f nginx-httpGet-liveness-readiness.yaml to create the pod
kubectl get pods nginx-httpget-liveness-readiness-probe to check the pod
The container is running normally; now let's simulate a failure.
Delete the nginx site file inside the container:
kubectl exec -it nginx-httpget-liveness-readiness-probe bash
rm -f /usr/share/nginx/html/index.html
kubectl get pods nginx-httpget-liveness-readiness-probe now shows the container has been restarted once
kubectl describe pods nginx-httpget-liveness-readiness-probe

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  8m8s                  default-scheduler  Successfully assigned default/nginx-httpget-liveness-readiness-probe to node-3
  Normal   Pulling    116s (x2 over 8m7s)   kubelet, node-3    Pulling image "nginx:latest"
  Warning  Unhealthy  116s (x3 over 2m16s)  kubelet, node-3    Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    116s                  kubelet, node-3    Container nginx-httpget-liveness-readiness-probe failed liveness probe, will be restarted
  Normal   Pulled     111s (x2 over 7m59s)  kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal   Created    111s (x2 over 7m59s)  kubelet, node-3    Created container nginx-httpget-liveness-readiness-probe
  Normal   Started    111s (x2 over 7m58s)  kubelet, node-3    Started container nginx-httpget-liveness-readiness-probe

The events show the nginx web server returned a 404, which triggered the restart.

tcpSocket
The tcpSocket check suits TCP services: a TCP connection is opened to the container; if it can be established, the check passes, otherwise it fails.
cat nginx-tcp-liveness.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-liveness-probe
  annotations:
    kubernetes.io/description: "nginx-tcp-liveness-probe"
spec:
  containers:
  - name: nginx-tcp-liveness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:      # health check
      tcpSocket:        # check method
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3   

Apply the config to create the Pod:
kubectl apply -f nginx-tcp-liveness.yaml
Get a shell inside the container:
kubectl exec -it nginx-tcp-liveness-probe bash
apt-get update && apt-get install htop to install htop
htop to find the nginx pid
kill pid
Check the pod details:
kubectl describe pods nginx-tcp-liveness-probe

 Normal   Scheduled  5m58s                  default-scheduler  Successfully assigned default/nginx-tcp-liveness-probe to node-3
  Normal   Pulled     3m11s (x2 over 5m47s)  kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal   Created    3m11s (x2 over 5m47s)  kubelet, node-3    Created container nginx-tcp-liveness-probe
  Normal   Started    3m11s (x2 over 5m46s)  kubelet, node-3    Started container nginx-tcp-liveness-probe
  Warning  BackOff    17s (x2 over 26s)      kubelet, node-3    Back-off restarting failed container

The events show the container failing and being restarted (with back-off).

1.2 readinessProbe

readinessProbe is a readiness check mechanism.
Again using nginx as the example:
cat httpget-liveness-readiness-probe.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-livenessprobe
  annotations:
    kubernetes.io/description: "nginx-tcp-livenessprobe"
  labels: 
    app: nginx   # label
spec:
  containers:
  - name: nginx-tcp-livenessprobe
    image: nginx:latest
    ports:
      - name: http-80-port
        protocol: TCP
        containerPort: 80
    livenessProbe:   # liveness check
      httpGet:
        port: 80
        path: /index.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
    readinessProbe:    # readiness check
      httpGet:
        port: 80
        path: /test.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3

Readiness checks are usually used together with a Service, so configure one as well:
cat nginx-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx    # label
  name: nginx-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
     app: nginx # label selector
  type: ClusterIP

Apply the configs:
kubectl apply -f httpget-liveness-readiness-probe.yaml
kubectl apply -f nginx-service.yaml
kubectl get pods to list the pods

NAME                              READY   STATUS    RESTARTS   AGE
nginx-app-demo-7bdfd97dcd-76l5r   1/1     Running   3          4d23h
nginx-app-demo-7bdfd97dcd-mksq5   1/1     Running   3          4d23h
nginx-app-demo-7bdfd97dcd-qn9vx   1/1     Running   3          4d23h
nginx-app-demo-7bdfd97dcd-trc48   1/1     Running   3          4d23h
nginx-tcp-livenessprobe           0/1     Running   0          2m27s

kubectl describe pods nginx-tcp-livenessprobe

Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  31s               default-scheduler  Successfully assigned default/nginx-tcp-livenessprobe to node-3
  Normal   Pulling    30s               kubelet, node-3    Pulling image "nginx:latest"
  Normal   Pulled     27s               kubelet, node-3    Successfully pulled image "nginx:latest"
  Normal   Created    27s               kubelet, node-3    Created container nginx-tcp-livenessprobe
  Normal   Started    27s               kubelet, node-3    Started container nginx-tcp-livenessprobe
  Warning  Unhealthy  3s (x3 over 23s)  kubelet, node-3    Readiness probe failed: HTTP probe failed with statuscode: 404

Comparing the two outputs above: the liveness check passes (STATUS Running, no restarts), but the readiness check fails (READY 0/1). Check the Service and its endpoints:
kubectl describe services nginx-service

Name:              nginx-service
Namespace:         default
Labels:            app=nginx
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"nginx"},"name":"nginx-service","namespace":"default"},"s...
Selector:          app=nginx
Type:              ClusterIP
IP:                10.96.84.131
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         
Session Affinity:  None
Events:            <none>

kubectl describe endpoints nginx-service

Name:         nginx-service
Namespace:    default
Labels:       app=nginx
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-03-08T03:37:58Z
Subsets:
  Addresses:          <none>
  NotReadyAddresses:  10.244.2.51
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

The Endpoints field is empty: the kubelet considers the pod not ready and has not added it to the endpoints.
To verify the mechanism, manually create the test.html file the readiness probe expects inside the container:
kubectl exec -it nginx-tcp-livenessprobe -- touch /usr/share/nginx/html/test.html
kubectl get pods nginx-tcp-livenessprobe

nginx-tcp-livenessprobe           1/1     Running   0          11h

The pod is now ready.
Check the endpoints again:
kubectl describe endpoints nginx-service

Name:         nginx-service
Namespace:    default
Labels:       app=nginx
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-03-08T14:59:46Z
Subsets:
  Addresses:          10.244.2.51
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    http  80    TCP

Events:  <none>

The kubelet has now added the pod to the endpoints.
If we now go back into the container and delete test.html, the readiness check will remove the pod from the endpoints again.

2. Summary

livenessProbe is the liveness check: it probes via exec, httpGet, or tcpSocket and judges health from the result; on failure, the container is restarted.
readinessProbe is the readiness check, normally used together with a Service and its endpoints. Its check methods are the same as livenessProbe's, but on failure the pod is removed from (or never added to) the endpoints instead of being restarted.
In general the liveness check comes first (the container is Running), followed by the readiness check (the endpoint is Ready).
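
The check sequence described in these notes (startup first, then liveness for Running, then readiness for Ready) can all be declared on a single container; this is a sketch with illustrative names and values:

```yaml
# Hypothetical Pod declaring all three probes on one container.
apiVersion: v1
kind: Pod
metadata:
  name: all-probes-demo          # illustrative name
spec:
  containers:
  - name: all-probes-demo
    image: nginx:latest
    startupProbe:                # gates the other two probes until the app has started
      httpGet:
        port: 80
        path: /
      failureThreshold: 30
      periodSeconds: 5
    livenessProbe:               # failure -> kubelet restarts the container
      httpGet:
        port: 80
        path: /
      periodSeconds: 10
    readinessProbe:              # failure -> pod removed from the Service endpoints
      httpGet:
        port: 80
        path: /
      periodSeconds: 10
```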

linuxwt
My name is Wang Teng, from Wuhan. After graduating in 2016 I worked as helpdesk in Shanghai for a year, taught myself Linux, and returned to Wuhan to do systems operations work. I have been writing this blog since 2017 to record my study and work, and am currently migrating old posts here. I now work at China Mobile Design Institute. My motto: only by escaping the comfort zone can you play at ease in your spare time.

Wuhan Optics Valley https://linuxwt.com

Subscribe to 今晚打老虎

Get the latest posts delivered right to your inbox.

or subscribe via RSS with Feedly!

Comments