1. Software Environment Setup
Everything in this walkthrough runs on a single host, 172.168.1.29, which also hosts the third-party applications to be monitored (MySQL, MongoDB, Redis, and so on). The monitoring software consists of:
- Go
- Prometheus
- Grafana
Briefly, the roles of the three: Go provides the compile and runtime environment for the exporter binaries we will run later; Prometheus scrapes the information exposed by the exporters and stores it; Grafana presents that information as dashboards and graphs.
1.1. Installing Go
Unpack the binary Go distribution:
tar zvxf go1.12.1.linux-amd64.tar.gz -C /usr/local/
Set Go's environment variables:
cat /etc/profile
export GOROOT=/usr/local/go # installation path
export GOPATH=/root/go # workspace path
export PATH=$PATH:$GOROOT/bin
export GO111MODULE=off # modules disabled: Go resolves packages from GOPATH and the vendor directory
source /etc/profile
If go version prints go version go1.12.1 linux/amd64, the environment variables are set correctly.
Next, verify that the Go toolchain actually works:
cat helloworld.go
package main
import (
"fmt"
)
func main() {
fmt.Printf("hello world\n")
}
If go run helloworld.go prints hello world, Go is installed correctly.
1.2. Installing Prometheus
Prometheus is an open-source combination of systems monitoring, alerting, and a time-series database, widely used to monitor Kubernetes container clusters. Its basic model: Prometheus periodically scrapes the state of monitored components over HTTP. A component that exposes such an HTTP endpoint is called an exporter, and the project already provides many ready-to-use exporters, such as mysqld_exporter, mongodb_exporter, and redis_exporter.
We run Prometheus with Docker:
prometheus:
  restart: always
  image: prom/prometheus
  container_name: prometheus
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/prometheus.yml:/etc/prometheus/prometheus.yml
  privileged: true
  ports:
    - 9090:9090
cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
docker-compose up -d
Visit http://ip:9090/targets; if the targets page loads, Prometheus is installed successfully.
1.3. Installing Grafana
Grafana is a visualization dashboard with a full-featured dashboard and graph editor. It supports Prometheus, InfluxDB, Elasticsearch, and others as data sources, and presents the data Prometheus scrapes in an intuitive way.
1.3.1. Writing the Dockerfile
Docker containers are usually run as root, which causes some permission problems when using the official image directly, so we build an image on top of the official one that runs as the root user.
cat Dockerfile
FROM grafana/grafana
MAINTAINER <tengwanginit@gmail.com>
USER root
RUN chown -R root.root /etc/grafana && \
chmod -R a+r /etc/grafana && \
chown -R grafana:grafana /var/lib/grafana && \
chown -R grafana:grafana /usr/share/grafana
docker build -t grafana/linuxwt .
1.3.2. Writing the docker-compose.yml
cat docker-compose.yml
grafana_linuxwt:
  image: grafana/linuxwt
  restart: always
  container_name: grafana_linuxwt
  volumes:
    - ./grafana.ini:/etc/grafana/grafana.ini
    - ./grafana:/var/lib/grafana
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
  ports:
    - 3000:3000
  environment:
    GF_INSTALL_PLUGINS: grafana-clock-panel,grafana-simple-json-datasource
docker-compose up -d
Visit http://ip:3000/; if the Grafana login page appears, the installation succeeded.
Note that the GF_INSTALL_PLUGINS environment variable above can be extended freely; more plugins can be found in the plugins section of the Grafana website.
After changing the default password we land on the Grafana home page. The next step is to make Grafana aware of the Prometheus server.
1.4. Adding the Data Source
On the Grafana panel, click add data source and apply a few simple settings.
In the URL field, enter ip:9090. The IP is needed because our Prometheus runs inside Docker; if it were deployed directly on the host, localhost:9090 would do. Leave everything else at its defaults and save. The message Data source is working means the data source was added successfully, and the Grafana setup is complete.
2. Monitoring the System and Applications
A Prometheus exporter is a program, written in Go, that exposes an HTTP endpoint; Prometheus scrapes the metrics the exporter publishes through that endpoint. Here we download node_exporter and deploy it on the machine to be monitored, which in this exercise is the local host.
2.1. Monitoring the Host System
Download node_exporter, then unpack and install it:
tar zvxf node_exporter-0.17.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local
mv node_exporter-0.17.0.linux-amd64/ node_exporter
Register node_exporter as a systemd service:
cat /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9100 --log.level=info --collector.logind --collector.systemd --collector.tcpstat --collector.ntp --collector.interrupts --collector.meminfo_numa --collector.processes --no-collector.bonding --no-collector.bcache --no-collector.arp --no-collector.edac --no-collector.infiniband --no-collector.ipvs --no-collector.mdadm --no-collector.nfs --no-collector.nfsd
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=10s
SendSIGKILL=no
SyslogIdentifier=prometheus_node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
Configure prometheus.yml
Append the following job:
cat <<EOF >> prometheus.yml
  - job_name: server1
    static_configs:
      - targets: ['172.168.1.29:9100']
        labels:
          env: prod
          alias: server1
EOF
The indentation must match the existing scrape_configs entries exactly, or Prometheus will fail to parse the file.
Configure the dashboard
First confirm that Prometheus can see the target: http://ip:9090/targets should list it as UP.
Download a dashboard built for node_exporter.
Home -> Import dashboard
Select the Prometheus data source and save; barring surprises, the newly added dashboard now appears on the main panel with live data for the host.
2.2. Monitoring MySQL
Download mysqld_exporter.
First, a look at our MySQL setup; MySQL also runs in Docker:
cat docker-compose.yml
mysql_linuxwt:
  restart: always
  image: mysql:5.7
  container_name: mysql_linuxwt
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/mysql:/var/lib/mysql
    - $PWD/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
    - /var/lib/mysql-files:/var/lib/mysql-files
  privileged: true
  ports:
    - 33066:3306
  environment:
    MYSQL_ROOT_PASSWORD: 123
Monitoring MySQL takes the following steps:
- Create a dedicated user, exporter. This user must be able to reach mysql_linuxwt from the host where mysql_exporter runs (mysql_exporter will also be deployed with Docker below, so the "host" here is its container):
docker exec -it mysql_linuxwt bash
mysql -u root -p123
create user 'exporter'@'%' identified by '1234' with max_user_connections 3;
grant process, replication client on *.* to 'exporter'@'%';
grant select on performance_schema.* to 'exporter'@'%';
flush privileges;
- Deploy mysql_exporter
We deploy it with Docker as well, in the same docker-compose file as MySQL:
cat docker-compose.yml
version: '2'
services:
  mysql_gene:
    restart: always
    image: mysql:5.7
    container_name: mysql_gene
    volumes:
      - /etc/localtime:/etc/localtime
      - /etc/timezone:/etc/timezone
      - $PWD/mysql:/var/lib/mysql
      - $PWD/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
      - /var/lib/mysql-files:/var/lib/mysql-files
    privileged: true
    ports:
      - 33066:3306
    environment:
      MYSQL_ROOT_PASSWORD: gooalgene@123
  mysql_exporter:
    restart: always
    image: prom/mysqld-exporter
    container_name: mysql_exporter
    links:
      - mysql_gene
    ports:
      - 9104:9104
    environment:
      DATA_SOURCE_NAME: "exporter:1234@(mysql_gene:3306)/"
Start the containers with docker-compose up -d.
- Update prometheus.yml
cat prometheus.yml
The full file now contains:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'server1'
    static_configs:
      - targets: ['172.168.1.29:9100']
        labels:
          env: prod
          alias: server1
  - job_name: 'server2'
    static_configs:
      - targets: ['172.168.1.27:9100']
        labels:
          env: prod
          alias: server2
  - job_name: mysql
    static_configs:
      - targets: ['172.168.1.29:9104']
        labels:
          instance: mysql
Finally, restart Prometheus.
- Check that Prometheus is receiving the data
If 172.168.1.29:9090/targets shows every target as Up, Prometheus is connected to all the exporters.
Then visit the exporter's metrics endpoint, http://172.168.1.29:9104/metrics; if metric data appears, the exporter is successfully exposing MySQL data.
Now import the matching MySQL dashboard to visualize the data; the import procedure was covered earlier.
Download the MySQL dashboard.
After importing it, the monitoring panels appear.
2.3. Monitoring Redis
Download redis_exporter.
First, the Redis deployment:
cat docker-compose.yml
redis_gene:
  restart: always
  image: redis:4.0
  container_name: redis_gene
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/redis:/data
    - $PWD/redis.conf:/usr/local/etc/redis/redis.conf
  privileged: true
  ports:
    - 6389:6379
  command: redis-server /usr/local/etc/redis/redis.conf
Monitoring Redis takes the following steps:
- Deploy redis_exporter
cat redis.conf
# bind 127.0.0.1
# daemonize yes    (left disabled so redis stays in the foreground inside the container)
port 6379
pidfile /var/run/redis.pid
appendonly yes
protected-mode no
requirepass Gooal123#
Deploy it with Docker, in the same docker-compose file as Redis:
cat docker-compose.yml
redis_gene:
  restart: always
  image: redis:4.0
  container_name: redis_gene
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/redis:/data
    - $PWD/redis.conf:/usr/local/etc/redis/redis.conf
  privileged: true
  ports:
    - 6389:6379
  command: redis-server /usr/local/etc/redis/redis.conf
redis_exporter:
  restart: always
  image: oliver006/redis_exporter
  container_name: redis_exporter
  links:
    - redis_gene
  ports:
    - "9121:9121"
  environment:
    REDIS_ADDR: redis_gene:6379
    REDIS_PASSWORD: Gooal123#
- Update prometheus.yml
Append the following:
  - job_name: redis_exporter
    static_configs:
      - targets: ['172.168.1.29:9121']
Finally, restart Prometheus.
- Check that Prometheus is receiving the data
The checks mirror those for MySQL, and so does the dashboard; suitable redis_exporter dashboards can be found on the Grafana website.
2.4. Monitoring MongoDB
Download mongodb_exporter (this tool turned out to have problems; see the troubleshooting section)
Download Percona's mongodb_exporter instead
First, the MongoDB deployment:
cat docker-compose.yml
mongo_linuxwt:
  restart: always
  image: mongo:3.4
  container_name: mongo_linuxwt
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/mongo:/data/db
    - $PWD/enabled:/sys/kernel/mm/transparent_hugepage/enabled
    - $PWD/defrag:/sys/kernel/mm/transparent_hugepage/defrag
  ulimits:
    nofile:
      soft: 300000
      hard: 300000
  ports:
    - "27017:27017"
  command: --port 27017 --oplogSize 204800 --profile=1 --slowms=500 --auth
Here we assume the admin database has the user/password wangteng/wangteng (creating that user is omitted here).
Monitoring MongoDB takes the following steps:
- Create a monitoring user
docker exec -it mongo_linuxwt bash
mongo
use admin
db.auth("wangteng","wangteng")
db.getSiblingDB("admin").createUser({
  user: "mongodb_exporter",
  pwd: "s3cr3tpassw0rd",
  roles: [
    { role: "clusterMonitor", db: "admin" },
    { role: "read", db: "local" }
  ]
})
- Deploy mongodb_exporter
cat docker-compose.yml
mongo_linuxwt:
  restart: always
  image: mongo:3.4
  container_name: mongo_linuxwt
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/mongo:/data/db
    - $PWD/enabled:/sys/kernel/mm/transparent_hugepage/enabled
    - $PWD/defrag:/sys/kernel/mm/transparent_hugepage/defrag
  ulimits:
    nofile:
      soft: 300000
      hard: 300000
  ports:
    - "27017:27017"
  command: --port 27017 --oplogSize 204800 --profile=1 --slowms=500 --auth
mongodb_exporter:
  restart: always
  image: elarasu/mongodb_exporter
  container_name: mongodb_exporter
  links:
    - mongo_linuxwt
  ports:
    - "9216:9104"
  environment:
    MONGODB_URI: mongodb://mongodb_exporter:s3cr3tpassw0rd@mongo_linuxwt:27017
- Update prometheus.yml
Append the following:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['172.168.1.29:9216']
Restart Prometheus.
Download the MongoDB dashboard.
2.5. Monitoring Nginx
2.6. Monitoring Tomcat
2.7. Monitoring Apache
2.8. Monitoring with Custom Scripts
Monitoring device I/O
Besides exposing an application endpoint with an exporter for Prometheus to pull, we can also have a custom script push data to a Pushgateway, from which Prometheus then pulls.
Deploy pushgateway with Docker:
cat docker-compose.yml
pushgateway:
  restart: always
  image: prom/pushgateway
  container_name: pushgateway
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
  ports:
    - "9091:9091"
Configure prometheus.yml:
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['172.168.1.29:9091']
After restarting the Prometheus server, visit http://172.168.1.29:9090/targets; the pushgateway target should now be in the Up state.
Next we use a shell script to collect the metric we want from the monitored server. As an example, we take the %util value of device /dev/sda on the server 172.168.1.209:
cat iostat.sh
#!/bin/bash
# Each iostat -d -x 1 5 run takes about 5 seconds, so 12 iterations
# cover roughly the one-minute gap between cron runs.
for (( i=1; i<=12; i++ ))
do
    count=$(iostat -d -x 1 5 | grep sda | awk '{print $14}' | tail -1)
    label="Count_iostat"
    instance_name=$(hostname)
    echo "$label $count" | curl --data-binary @- http://172.168.1.29:9091/metrics/job/pushgateway/instance/$instance_name
done
Add the script to cron:
crontab -e
* * * * * ./iostat.sh
A note: cron fires at most once per minute, so to sample at a higher frequency the script loops, taking a reading roughly every 5 seconds; count is the sampled value. Prometheus has its own rules for how pushed data is stored; see the pushgateway article referenced here for details.
Open http://172.168.1.29:9090 and enter the label Count_iostat in the expression box to check whether the data has been received.
To present the data more effectively, define a custom graph for it.
Custom monitoring matters: it lets you track most of the metrics you actually care about.
3. Troubleshooting
1. If an application restarts without any errors in its logs but no graphs appear, the cause is usually unsynchronized clocks.
2. If Prometheus can reach the exporter but an imported Grafana template shows no graphs, the template usually does not match the exporter. Note that whoever publishes an exporter normally also provides a matching dashboard template. Occasionally an exporter listed on the Prometheus site does not work; in that case use a third-party exporter together with its own Grafana template. That is exactly what happened with MongoDB in this article: the stock exporter failed, so we switched to the exporter provided by Percona, along with Percona's matching Grafana template.