1. How Health Checks Work
Applications can fail while running. Kubernetes therefore provides a health check mechanism: when it detects that an application is unhealthy, it automatically restarts the container or removes the Pod from the Service's endpoints, keeping the application highly available. Kubernetes offers three probe types:
- readiness probes: readiness checks; decide whether the container is ready to receive traffic, and only then is the Pod added to the Service's endpoints
- liveness probes: liveness checks; decide whether the application is still healthy, and restart the container on failure
- startup probes: startup checks for slow-starting applications, so that a long startup is not cut short by the other probes killing the container (see the sketch after the parameter list below)
Each probe type supports three check methods:
- exec: run a shell command inside the container; exit code 0 means healthy, non-zero means unhealthy
- httpGet: send an HTTP request to the container and judge health from the HTTP status code
- tcpSocket: open a TCP connection to the container; if the connection can be established, the container is healthy
Each check method supports the following parameters:
- initialDelaySeconds: delay before the first probe, allowing for application startup time so the health check does not fail before the application is up
- periodSeconds: how often the probe runs; defaults to 10s
- timeoutSeconds: probe timeout; the probe fails if the application does not respond within this time
- successThreshold: minimum consecutive successes for the probe to be considered passing; defaults to 1
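Startup probes are listed above but not demonstrated in the examples below, so here is a minimal sketch of how one protects a slow-starting container. The Pod name is hypothetical, and failureThreshold (a standard probe parameter not listed above) caps how many consecutive failures are tolerated:
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo          # hypothetical name, not used elsewhere in this article
spec:
  containers:
  - name: startup-probe-demo
    image: nginx:latest
    startupProbe:                   # runs first; the liveness probe stays disabled until this succeeds
      httpGet:
        port: 80
        path: /
      periodSeconds: 10
      failureThreshold: 30          # tolerate up to 30 x 10s = 300s of startup time
    livenessProbe:                  # takes over only after the startup probe has succeeded
      httpGet:
        port: 80
        path: /
      periodSeconds: 10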
1.1. livenessProbe
livenessProbe checks the running state inside the container and is mainly used as a liveness check.
Each of the three check methods is demonstrated below.
exec
The exec method runs a shell command inside the container. As an example, suppose the file /tmp/liveness-probe.log must always exist inside the container; if it is missing, the container is considered faulty and must be restarted.
Define a Pod:
cat centos-exec-liveness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-liveness-probe
  annotations:
    kubernetes.io/description: "exec-liveness-probe"
spec:
  containers:
  - name: exec-liveness-probe
    image: centos:latest
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/liveness-probe.log && sleep 10 && rm -f /tmp/liveness-probe.log && sleep 20
    livenessProbe:                  # health check definition
      exec:                         # check method: the exit code of ls -l /tmp/liveness-probe.log decides health
        command:
        - ls
        - -l
        - /tmp/liveness-probe.log
      initialDelaySeconds: 1       # wait 1s before the first probe
      periodSeconds: 5             # probe every 5s
      timeoutSeconds: 1            # fail the probe after 1s without a response
Create the Pod and inspect its events:
kubectl apply -f centos-exec-liveness-probe.yaml
kubectl describe pods exec-liveness-probe | tail
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26s default-scheduler Successfully assigned default/exec-liveness-probe to node-3
Normal Pulled 26s kubelet, node-3 Container image "centos:latest" already present on machine
Normal Created 25s kubelet, node-3 Created container exec-liveness-probe
Normal Started 25s kubelet, node-3 Started container exec-liveness-probe
Warning Unhealthy 3s (x3 over 13s) kubelet, node-3 Liveness probe failed: ls: cannot access '/tmp/liveness-probe.log': No such file or directory
Normal Killing 3s kubelet, node-3 Container exec-liveness-probe failed liveness probe, will be restarted
The events show the liveness probe detecting the missing file and killing the container for a restart.
kubectl get pods exec-liveness-probe
NAME READY STATUS RESTARTS AGE
exec-liveness-probe 1/1 Running 2 80s
The RESTARTS column shows how many times the container has been restarted.
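You can also reproduce the probe's exit-code logic by hand while the Pod is running; this is just the same ls command the probe runs, executed via kubectl (which propagates the command's exit code):
kubectl exec exec-liveness-probe -- ls -l /tmp/liveness-probe.log
echo $?    # 0 while the file exists, non-zero after it has been removed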
httpGet
httpGet is mainly used for web workloads: an HTTP request is sent to the container, and the status code decides its health. Any code greater than or equal to 200 and less than 400 counts as healthy.
cat nginx-httpGet-liveness-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-httpget-liveness-readiness-probe
  annotations:
    kubernetes.io/description: "nginx-httpGet-liveness-readiness-probe"
spec:
  containers:
  - name: nginx-httpget-liveness-readiness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:                  # health check definition
      httpGet:                      # check method: GET /index.html on port 80
        port: 80
        scheme: HTTP
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
Create the Pod:
kubectl apply -f nginx-httpGet-liveness-readiness.yaml
Check it:
kubectl get pods nginx-httpget-liveness-readiness-probe
The container is running normally. To simulate a failure, delete the nginx site file inside the container:
kubectl exec -it nginx-httpget-liveness-readiness-probe bash
rm -f /usr/share/nginx/html/index.html
kubectl get pods nginx-httpget-liveness-readiness-probe now shows the container has been restarted once. View the events:
kubectl describe pods nginx-httpget-liveness-readiness-probe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m8s default-scheduler Successfully assigned default/nginx-httpget-liveness-readiness-probe to node-3
Normal Pulling 116s (x2 over 8m7s) kubelet, node-3 Pulling image "nginx:latest"
Warning Unhealthy 116s (x3 over 2m16s) kubelet, node-3 Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 116s kubelet, node-3 Container nginx-httpget-liveness-readiness-probe failed liveness probe, will be restarted
Normal Pulled 111s (x2 over 7m59s) kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 111s (x2 over 7m59s) kubelet, node-3 Created container nginx-httpget-liveness-readiness-probe
Normal Started 111s (x2 over 7m58s) kubelet, node-3 Started container nginx-httpget-liveness-readiness-probe
The events show that the nginx web service returned a 404, failing the liveness probe and triggering a restart.
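To see what the kubelet sees, you can issue the same request by hand; <pod-ip> below is a placeholder for the Pod IP reported by kubectl get pods -o wide:
kubectl get pods nginx-httpget-liveness-readiness-probe -o wide   # note the Pod IP
curl -s -o /dev/null -w '%{http_code}\n' http://<pod-ip>/index.html   # 200 when healthy, 404 after the file is deleted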
tcpSocket
The tcpSocket check suits plain TCP services: the kubelet opens a TCP connection to the specified container port. If the connection succeeds, the check passes; otherwise it fails.
cat nginx-tcp-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-liveness-probe
  annotations:
    kubernetes.io/description: "nginx-tcp-liveness-probe"
spec:
  containers:
  - name: nginx-tcp-liveness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:                  # health check definition
      tcpSocket:                    # check method: open a TCP connection to port 80
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
Apply the config to create the Pod:
kubectl apply -f nginx-tcp-liveness.yaml
Exec into the container:
kubectl exec -it nginx-tcp-liveness-probe bash
Install htop to find the nginx master process PID, then kill it:
apt-get update && apt-get install -y htop
htop    # note the nginx master PID
kill <pid>
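If you prefer not to install htop, a shortcut with the same effect (assuming the official nginx image, where nginx -s stop shuts down the master process) is:
kubectl exec nginx-tcp-liveness-probe -- nginx -s stop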
Then view the pod details:
kubectl describe pods nginx-tcp-liveness-probe
Normal Scheduled 5m58s default-scheduler Successfully assigned default/nginx-tcp-liveness-probe to node-3
Normal Pulled 3m11s (x2 over 5m47s) kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 3m11s (x2 over 5m47s) kubelet, node-3 Created container nginx-tcp-liveness-probe
Normal Started 3m11s (x2 over 5m46s) kubelet, node-3 Started container nginx-tcp-liveness-probe
Warning BackOff 17s (x2 over 26s) kubelet, node-3 Back-off restarting failed container
The events show the container failing and being restarted.
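The tcpSocket probe can likewise be reproduced by hand from a cluster node, using bash's built-in /dev/tcp; <pod-ip> is a placeholder for the Pod IP from kubectl get pods -o wide:
timeout 1 bash -c '</dev/tcp/<pod-ip>/80' && echo healthy || echo unhealthy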
1.2. readinessProbe
readinessProbe is the readiness check: it decides whether a Pod is ready to receive traffic, and therefore whether it should appear in a Service's endpoints.
Again using nginx as the example:
cat httpget-liveness-readiness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-livenessprobe
  annotations:
    kubernetes.io/description: "nginx-tcp-livenessprobe"
  labels:
    app: nginx                      # label matched by the Service selector
spec:
  containers:
  - name: nginx-tcp-livenessprobe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:                  # liveness check
      httpGet:
        port: 80
        path: /index.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
    readinessProbe:                 # readiness check
      httpGet:
        port: 80
        path: /test.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
Readiness checks are normally used in a Service scenario, so a Service is needed as well:
cat nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx                      # label
  name: nginx-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx                      # label selector
  type: ClusterIP
Apply both configs:
kubectl apply -f httpget-liveness-readiness-probe.yaml
kubectl apply -f nginx-service.yaml
List the pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-app-demo-7bdfd97dcd-76l5r 1/1 Running 3 4d23h
nginx-app-demo-7bdfd97dcd-mksq5 1/1 Running 3 4d23h
nginx-app-demo-7bdfd97dcd-qn9vx 1/1 Running 3 4d23h
nginx-app-demo-7bdfd97dcd-trc48 1/1 Running 3 4d23h
nginx-tcp-livenessprobe 0/1 Running 0 2m27s
kubectl describe pods nginx-tcp-livenessprobe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned default/nginx-tcp-livenessprobe to node-3
Normal Pulling 30s kubelet, node-3 Pulling image "nginx:latest"
Normal Pulled 27s kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 27s kubelet, node-3 Created container nginx-tcp-livenessprobe
Normal Started 27s kubelet, node-3 Started container nginx-tcp-livenessprobe
Warning Unhealthy 3s (x3 over 23s) kubelet, node-3 Readiness probe failed: HTTP probe failed with statuscode: 404
Comparing the two checks: the liveness probe passes (STATUS is Running), but the readiness probe fails (READY is 0/1). Inspect the Service:
kubectl describe services nginx-service
Name: nginx-service
Namespace: default
Labels: app=nginx
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"nginx"},"name":"nginx-service","namespace":"default"},"s...
Selector: app=nginx
Type: ClusterIP
IP: 10.96.84.131
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints:
Session Affinity: None
Events: <none>
kubectl describe endpoints nginx-service
Name: nginx-service
Namespace: default
Labels: app=nginx
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2020-03-08T03:37:58Z
Subsets:
Addresses: <none>
NotReadyAddresses: 10.244.2.51
Ports:
Name Port Protocol
---- ---- --------
http 80 TCP
Endpoints is empty and the Pod's address sits under NotReadyAddresses: the kubelet considers the Pod not ready and has not added it to the endpoints.
To verify the mechanism, manually create the test.html file that the readiness probe expects inside the container.
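One way to create it (the file content is arbitrary; only its existence matters to the probe):
kubectl exec -it nginx-tcp-livenessprobe -- bash -c 'echo ok > /usr/share/nginx/html/test.html'
Then check the pod again: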
kubectl get pods nginx-tcp-livenessprobe
nginx-tcp-livenessprobe 1/1 Running 0 11h
The Pod is now ready (READY 1/1).
Check the endpoints again:
kubectl describe endpoints nginx-service
Name: nginx-service
Namespace: default
Labels: app=nginx
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2020-03-08T14:59:46Z
Subsets:
Addresses: 10.244.2.51
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 80 TCP
Events: <none>
The kubelet has now added the Pod's address to the endpoints.
If test.html is deleted from the container again, the readiness check fails and the address is removed from the endpoints.
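You can watch this happen in reverse:
kubectl exec -it nginx-tcp-livenessprobe -- rm -f /usr/share/nginx/html/test.html
kubectl describe endpoints nginx-service    # the address moves back to NotReadyAddresses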
2. Summary
livenessProbe is the liveness check: it probes via exec, httpGet, or tcpSocket and judges health from the result; on failure the container is restarted.
readinessProbe is the readiness check, generally used together with a Service and its endpoints. Its check methods are the same as livenessProbe's, but on failure the Pod is removed from (or never added to) the endpoints instead of being restarted.
In general the liveness check comes first (the container is Running), and the readiness check follows (the endpoint is Ready).