Building a Prometheus + Grafana Monitoring Platform in Practice

1. Preparing the Software Environment

Everything in this walkthrough is installed on the local host 172.168.1.29, including the third-party applications to be monitored (mysql, mongodb, redis, etc.). The monitoring software involved is:

  • Go
  • Prometheus
  • Grafana

Briefly, the roles of the three: Go provides the build environment needed to compile and run the exporter binaries we will use later when monitoring applications; Prometheus scrapes and stores the metrics exposed by the exporters; Grafana presents that data as dashboards and graphs.

1.1. Installing the Go Environment

Download the binary release of Go.
tar zvxf go1.12.1.linux-amd64.tar.gz -C /usr/local/
Set the Go environment variables:
cat /etc/profile

export GOROOT=/usr/local/go  # installation path
export GOPATH=/root/go    # workspace path
export PATH=$PATH:$GOROOT/bin
export GO111MODULE=off  # modules disabled: go resolves packages from GOPATH and the vendor directory

source /etc/profile
Running go version should print go version go1.12.1 linux/amd64, which means the environment variables are set correctly.
Next, test that the Go toolchain actually works:
cat helloworld.go

package main

import (
   "fmt"
)

func main() {
    fmt.Printf("hello world\n") 
}

Running go run helloworld.go should print hello world, which confirms that Go is installed correctly.

1.2. Installing Prometheus

Prometheus is an open-source combination of system monitoring, alerting, and a time-series database, and it is commonly used to monitor Kubernetes container clusters. Its basic principle is to periodically scrape the state of monitored components over HTTP; the programs that expose those components through an HTTP interface are called exporters. The project already provides many commonly used exporters that can be used directly, such as mysqld_exporter, mongodb_exporter, and redis_exporter.
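As a concrete picture of this pull model: each exporter simply serves plain text in the Prometheus exposition format over HTTP, and you can fetch it yourself with curl. A minimal sketch, assuming node_exporter (set up later in section 2.1) is already listening on port 9100:

curl -s http://localhost:9100/metrics | head
# prints lines roughly like:
# # HELP node_load1 1m load average.
# # TYPE node_load1 gauge
# node_load1 0.27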

Here we install Prometheus with docker:

prometheus:
    restart: always
    image: prom/prometheus
    container_name: prometheus
    volumes:
        - /etc/localtime:/etc/localtime
        - /etc/timezone:/etc/timezone
        - $PWD/prometheus.yml:/etc/prometheus/prometheus.yml
    privileged: true
    ports:
        - 9090:9090

cat prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

docker-compose up -d
Visit http://ip:9090/targets; if the targets page loads, Prometheus is installed successfully.
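If the page does not come up, two quick things to check from the shell (the /-/healthy endpoint is built into Prometheus):

docker logs prometheus
curl -s http://localhost:9090/-/healthy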

1.3. Installing Grafana

Grafana is a visualization dashboard tool with a full-featured dashboard and graph editor. It supports Prometheus, InfluxDB, Elasticsearch, and others as data sources, and can present the data scraped by Prometheus in an intuitive way.

1.3.1. Writing the Dockerfile

We usually run docker containers as root, which causes some permission problems with the official image, so we use the official image as a base to build a new image that runs as the root user.
cat Dockerfile

FROM grafana/grafana
MAINTAINER <tengwanginit@gmail.com>

USER root

RUN chown -R root.root /etc/grafana && \
    chmod -R a+r /etc/grafana && \
    chown -R grafana:grafana /var/lib/grafana && \
    chown -R grafana:grafana /usr/share/grafana

docker build -t grafana/linuxwt .

1.3.2. Writing the docker-compose.yml File

cat docker-compose.yml

grafana_linuxwt:
    image: grafana/linuxwt
    restart: always
    container_name: grafana_linuxwt
    volumes:
      - ./grafana.ini:/etc/grafana/grafana.ini
      - ./grafana:/var/lib/grafana
      - /etc/localtime:/etc/localtime
      - /etc/timezone:/etc/timezone
    ports:
      - 3000:3000
    environment:
      GF_INSTALL_PLUGINS: grafana-clock-panel,grafana-simple-json-datasource

docker-compose up -d
Visit http://ip:3000/; if the login page appears, the installation succeeded.
Note that the plugin list in the environment variable above can be extended freely; additional plugins can be found in the plugins section of the Grafana website.

After changing the default password we land on the Grafana home page. The next step is to make Grafana recognize the Prometheus server.

1.4. Adding the Data Source

After entering the Grafana panel, click add data source.
Make a few basic settings.
In the URL field enter http://ip:9090 (the host IP), because our Prometheus runs inside docker; if it were deployed directly on the host, localhost:9090 would work as well. Leave the other options at their defaults and save. The message Data source is working means the data source was added successfully, and the Grafana setup is complete.
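The same data source can also be created from the command line through Grafana's HTTP API instead of the UI. A minimal sketch, assuming the default admin/admin credentials and Prometheus reachable at 172.168.1.29:9090:

curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://172.168.1.29:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://172.168.1.29:9090","access":"proxy","isDefault":true}'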

2. Monitoring the System and Applications

Prometheus exporters are programs written in Go. Each one exposes an HTTP endpoint, and Prometheus scrapes the metrics the exporter emits through that endpoint. Here we download node_exporter and deploy it on the machine to be monitored, which in this experiment is the local host.

2.1. Monitoring the Host System

Download node_exporter.
tar zvxf node_exporter-0.17.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local
mv node_exporter-0.17.0.linux-amd64/ node_exporter
Register node_exporter as a systemd service:
cat /usr/lib/systemd/system/node_exporter.service

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple

ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9100 --log.level=info --collector.logind --collector.systemd --collector.tcpstat --collector.ntp --collector.interrupts  --collector.meminfo_numa  --collector.processes --no-collector.bonding --no-collector.bcache --no-collector.arp --no-collector.edac --no-collector.infiniband --no-collector.ipvs --no-collector.mdadm --no-collector.nfs --no-collector.nfsd

ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=10s
SendSIGKILL=no
SyslogIdentifier=prometheus_node_exporter
Restart=always

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
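To confirm the service is healthy before pointing Prometheus at it, a quick check (9100 is the listen address configured in the unit file above):

systemctl status node_exporter --no-pager
# any node_* metric in the output means the exporter is serving data
curl -s http://localhost:9100/metrics | grep -m1 '^node_load1'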
Configure prometheus.yml
Append the following:

cat <<EOF >> prometheus.yml
  - job_name: server1
    static_configs:
      - targets: ['172.168.1.29:9100']
        labels:
          env: prod
          alias: server1
EOF

The indentation above must line up with the existing scrape_configs entries, otherwise Prometheus will fail to parse the file.
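YAML indentation mistakes here are easy to make, so it helps to validate the file before restarting Prometheus; promtool ships inside the prom/prometheus image. A sketch:

docker exec prometheus promtool check config /etc/prometheus/prometheus.yml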
Configure the dashboard
First confirm that Prometheus can see the new target at http://ip:9090/targets; if it is listed as UP, everything is working.
Download the dashboard that matches node_exporter.
Home -> Import dashboard
Select the Prometheus data source and save; barring surprises, the dashboard we just imported will appear on the main panel with its graphs populated.

2.2. Monitoring MySQL

Download mysqld_exporter.
First, a look at our MySQL setup; MySQL is also installed in docker.
cat docker-compose.yml

mysql_linuxwt:
  restart: always
  image: mysql:5.7
  container_name: mysql_linuxwt
  volumes:
      - /etc/localtime:/etc/localtime
      - /etc/timezone:/etc/timezone
      - $PWD/mysql:/var/lib/mysql
      - $PWD/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
      - /var/lib/mysql-files:/var/lib/mysql-files
  privileged: true
  ports:
    - 33066:3306
  environment:
       MYSQL_ROOT_PASSWORD: 123

Monitoring MySQL takes the following steps:

  • Create a dedicated user named exporter. The user must be able to reach mysql_linuxwt from the host where mysql_exporter runs (mysql_exporter will also be deployed with docker later, so that host is its container):
    docker exec -it mysql_linuxwt bash
    mysql -u root -p123
    create user exporter@'%' identified by '1234' with max_user_connections 3;
    grant select on performance_schema.* to 'exporter'@'%';
    flush privileges;
  • Deploy mysql_exporter.
    It is also deployed with docker, in the same docker-compose file as MySQL:
    cat docker-compose.yml
version: '2'
services:
  mysql_gene:
    restart: always
    image: mysql:5.7
    container_name: mysql_gene
    volumes:
        - /etc/localtime:/etc/localtime
        - /etc/timezone:/etc/timezone
        - $PWD/mysql:/var/lib/mysql
        - $PWD/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
        - /var/lib/mysql-files:/var/lib/mysql-files
    privileged: true
    ports:
      - 33066:3306
    environment:
         MYSQL_ROOT_PASSWORD: gooalgene@123
  mysql_exporter:
      restart: always
      image: prom/mysqld-exporter
      container_name: mysql_exporter
      links:
          - mysql_gene
      ports:
          - 9104:9104
      environment:
          DATA_SOURCE_NAME: "exporter:1234@(mysql_gene:3306)/"

Start the containers with docker-compose up -d.
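Before wiring it into Prometheus it is worth confirming that the exporter can actually log in to MySQL. A sketch (mysql_up is the connectivity metric reported by mysqld-exporter; 1 means the login succeeded):

# authentication problems show up in the exporter log right away
docker logs mysql_exporter
curl -s http://localhost:9104/metrics | grep '^mysql_up'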

  • Edit prometheus.yml:
    cat prometheus.yml
The file with the new jobs added now looks like this:
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'server1'
    static_configs:
      - targets: ['172.168.1.29:9100']
        labels:
          env: prod
          alias: server1
  - job_name: 'server2'
    static_configs:
      - targets: ['172.168.1.27:9100']
        labels:
          env: prod
          alias: server2
  - job_name: mysql
    static_configs:
      - targets: ['172.168.1.29:9104']
        labels:
          instance: mysql

Finally, restart Prometheus.
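Because prometheus.yml is bind-mounted into the container, restarting the container (or sending SIGHUP, which makes Prometheus re-read its configuration) is enough to pick up the new jobs; a sketch:

docker-compose restart prometheus
# or, without a restart:
docker kill --signal=HUP prometheus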

  • Check that the data is being collected by Prometheus.
    If every job on 172.168.1.29:9090/targets shows as UP, Prometheus can reach each exporter.
    Visiting the MySQL metrics endpoint http://172.168.1.29:9104/metrics should return metric data, which means the exporter is exporting MySQL metrics successfully.

Now we can import the corresponding MySQL dashboard to visualize the data; the import procedure is the same as described earlier.
Download the MySQL dashboard.
After importing it, the monitoring view appears.

2.3. Monitoring Redis

Download redis_exporter.
First, the Redis deployment:
cat docker-compose.yml

redis_gene:
  restart: always
  image: redis:4.0
  container_name: redis_gene
  volumes:
      - /etc/localtime:/etc/localtime
      - /etc/timezone:/etc/timezone
      - $PWD/redis:/data
      - $PWD/redis.conf:/usr/local/etc/redis/redis.conf
  privileged: true
  ports:
      - 6389:6379
  command: redis-server /usr/local/etc/redis/redis.conf

Monitoring Redis takes the following steps:

  • Deploy redis_exporter. First the Redis configuration:
    cat redis.conf
# bind 127.0.0.1
#daemonize yes  # keep daemonize disabled so redis does not run in the background under docker
port 6379
pidfile /var/run/redis.pid
appendonly yes
protected-mode no
requirepass Gooal123#

It is also deployed with docker, in the same docker-compose file as Redis:
cat docker-compose.yml

redis_gene:
  restart: always
  image: redis:4.0
  container_name: redis_gene
  volumes:
      - /etc/localtime:/etc/localtime
      - /etc/timezone:/etc/timezone
      - $PWD/redis:/data
      - $PWD/redis.conf:/usr/local/etc/redis/redis.conf
  privileged: true
  ports:
      - 6389:6379
  command: redis-server /usr/local/etc/redis/redis.conf
redis_exporter:
    restart: always
    image: oliver006/redis_exporter
    container_name: redis_exporter
    links:
        - redis_gene
    ports:
        - "9121:9121"
    environment:
        REDIS_ADDR: redis_gene:6379
        REDIS_PASSWORD: Gooal123#
  • Edit prometheus.yml.
    Add the following:
  - job_name: redis_exporter
    static_configs:
      - targets: ['172.168.1.29:9121']

Finally, restart Prometheus.

  • Check that the data is being collected by Prometheus.
    The checks are the same as for MySQL, and the dashboard works the same way; suitable dashboards can be found on the Grafana website. A quick command-line check follows below.
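A minimal sketch of that check (redis_up is the connectivity metric exposed by oliver006/redis_exporter; 1 means it authenticated against redis successfully):

curl -s http://172.168.1.29:9121/metrics | grep '^redis_up'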

Download the Redis dashboard.

2.4. Monitoring MongoDB

Download mongodb_exporter (this tool turned out to be problematic).
Download the percona mongodb_exporter instead.
First, the MongoDB deployment:
cat docker-compose.yml

mongo_linuxwt:
  restart: always
  image: mongo:3.4
  container_name: mongo_linuxwt
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/mongo:/data/db
    - $PWD/enabled:/sys/kernel/mm/transparent_hugepage/enabled
    - $PWD/defrag:/sys/kernel/mm/transparent_hugepage/defrag
  ulimits:
    nofile:
      soft: 300000
      hard: 300000
  ports:
      - "27017:27017"
  command: --port 27017 --oplogSize 204800 --profile=1 --slowms=500  --auth

Here we assume the admin database already has a user wangteng with password wangteng (creating that user is omitted here).
Monitoring MongoDB takes the following steps:

  • Create a monitoring user:
    docker exec -it mongo_linuxwt bash
    mongo
    use admin
    db.auth("wangteng","wangteng")
db.getSiblingDB("admin").createUser({
    user: "mongodb_exporter",
    pwd: "s3cr3tpassw0rd",
    roles: [
        { role: "clusterMonitor", db: "admin" },
        { role: "read", db: "local" }
    ]
})
  • Deploy mongodb_exporter:
    cat docker-compose.yml
mongo_linuxwt:
  restart: always
  image: mongo:3.4
  container_name: mongo_linuxwt
  volumes:
    - /etc/localtime:/etc/localtime
    - /etc/timezone:/etc/timezone
    - $PWD/mongo:/data/db
    - $PWD/enabled:/sys/kernel/mm/transparent_hugepage/enabled
    - $PWD/defrag:/sys/kernel/mm/transparent_hugepage/defrag
  ulimits:
    nofile:
      soft: 300000
      hard: 300000
  ports:
      - "27017:27017"
  command: --port 27017 --oplogSize 204800 --profile=1 --slowms=500  --auth

mongodb_exporter:
   restart: always
   image: elarasu/mongodb_exporter
   container_name: mongodb_exporter
   links:
     - mongo_linuxwt
   ports:
     - "9216:9104"
   environment: 
       MONGODB_URI: mongodb://mongodb_exporter:s3cr3tpassw0rd@mongo_linuxwt:27017
  • Edit prometheus.yml.
    Add the following:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['172.168.1.29:9216']

Restart Prometheus.
Download the MongoDB dashboard.
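As with the other exporters, a quick sanity check of the endpoint before importing the dashboard (a sketch; mongodb_up is the connectivity metric reported by the percona-style exporter):

curl -s http://172.168.1.29:9216/metrics | grep '^mongodb_up'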

2.5. Monitoring Nginx

2.6. Monitoring Tomcat

2.7. Monitoring Apache

2.8. Monitoring with Custom Scripts

Monitoring device I/O
Besides using exporters to expose an application endpoint for Prometheus to pull from, we can also have custom scripts push data to a pushgateway, and let Prometheus pull that data from the pushgateway.
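The push itself is nothing more than an HTTP request whose body is exposition-format text; the job and instance segments of the URL path become labels on the stored metric. A minimal sketch against the pushgateway deployed below (some_metric is a hypothetical metric name):

echo "some_metric 42" | curl --data-binary @- \
  http://172.168.1.29:9091/metrics/job/demo_job/instance/demo_host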
Deploy the pushgateway with docker:
cat docker-compose.yml

pushgateway:
    restart: always
    image: prom/pushgateway
    container_name: pushgateway
    volumes:
        - /etc/localtime:/etc/localtime
        - /etc/timezone:/etc/timezone
    ports:
        - "9091:9091"

Configure prometheus.yml:

  - job_name: 'pushgateway'
    static_configs:
      - targets: ['172.168.1.29:9091']

After restarting the Prometheus server, visit http://172.168.1.29:9090/targets; the pushgateway target should now show as UP.
Next we use a shell script to collect the value we want from the server being monitored; as an example we take the %util of device /dev/sda on server 172.168.1.209.
cat iostat.sh

#!/bin/bash
# push the %util of /dev/sda to the pushgateway roughly every 5 seconds for one minute
for (( i=1; i<=12; i++ ))
do
    # iostat prints 5 extended reports at 1s intervals (~5s per loop); take %util of sda from the last report
    count=$(iostat -d -x 1 5 | grep sda | awk '{print $14}' | tail -1)
    label="Count_iostat"
    instance_name=$(hostname)
    # push the sample to the pushgateway under job "pushgateway" with this host as the instance
    echo "$label $count" | curl --data-binary @- http://172.168.1.29:9091/metrics/job/pushgateway/instance/$instance_name
done

Add the script to cron (use the script's absolute path in the crontab entry so cron can find it):

crontab -e   
* * * * * ./iostat.sh

A note: cron runs at most once per minute, so to sample at a higher frequency the script uses a loop that collects a value roughly every 5 seconds; count is the value we push. Prometheus has its own rules for how pushed data is stored; see the pushgateway documentation for details.
Visit http://172.168.1.29:9090 and enter the metric name Count_iostat in the expression box to check whether the data has been received.

To present the data more effectively, build a custom graph panel for it.
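The same data can also be queried through the Prometheus HTTP API, which is handy when debugging a panel expression; a sketch that averages the pushed samples over the last five minutes:

curl -s -G 'http://172.168.1.29:9090/api/v1/query' \
  --data-urlencode 'query=avg_over_time(Count_iostat[5m])'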
Custom monitoring matters a great deal: it lets you monitor most of the metrics you actually care about.

3. Troubleshooting

1. If the application restarts cleanly and the logs show no errors but no graphs appear, the cause is usually clock synchronization between the hosts; see the quick check below.
2. If Prometheus can reach the exporter but importing a Grafana template produces no graphs, the template usually does not match the exporter. Whoever publishes an exporter normally also provides a matching dashboard template. Sometimes an exporter listed on the Prometheus website does not work, in which case a third-party exporter and its Grafana template can be used instead. That is exactly what happened with MongoDB in this article: the official exporter had problems, so we switched to the exporter provided by percona, and the matching Grafana template from percona has to be used with it.
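A minimal sketch of the clock check mentioned in point 1, comparing NTP status on the host with the time seen inside the Prometheus container (timedatectl is available on systemd hosts):

timedatectl status | grep -i synchronized
date; docker exec prometheus date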