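I. Overview

1.1 Introduction

Prometheus collects and processes the monitoring data, and Grafana displays it on the front end. The Exporters connected to Prometheus in this setup include:
Node Exporter (core component): collects hardware and operating system data from the node it runs on, and can also load custom data files. It runs as a container on every node;
Other specialized Exporters: for example, in the HPC (high-performance computing) environment covered in the previous article, there is an Exporter dedicated to the scheduling system;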
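Alertmanager (optional component): handles alerting. In this setup it runs as a container on the deployment host, which is the only Alertmanager address that prometheus.yml points to;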
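The custom data files mentioned for Node Exporter are normally picked up through its textfile collector, which reads `*.prom` files from the directory given by `--collector.textfile.directory`. The following is a minimal sketch of producing such a file, not the setup used in this article: the metric name `hpc_queue_pending_jobs`, its labels, and the file name are hypothetical, and the directory would have to be mounted into the Node Exporter container with the flag set accordingly.

```python
import os
import tempfile


def render_metric(name, value, labels, help_text):
    """Render one gauge in the Prometheus text exposition format.

    Label values are assumed not to contain quotes or backslashes.
    """
    label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    selector = f"{{{label_str}}}" if label_str else ""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{selector} {value}\n"
    )


def write_textfile(directory, filename, body):
    """Write the file via a temp file plus rename, so a scrape never sees a partial file."""
    final_path = os.path.join(directory, filename)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    # mkstemp creates the file as 0600; make it readable for the exporter's user.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, final_path)
    return final_path
```

A cron job or a scheduler hook could call `write_textfile(...)` periodically; the rename step matters because Node Exporter may read the directory at any moment.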
1.2 UI Preview

The screenshots below give a closer look at each component:
Prometheus
Node-exporter
Grafana
II. Installing Docker and docker-compose

2.1 Installing Docker

There are plenty of tutorials for this; the common installation method is used here:

    # Install dependencies
    yum install -y yum-utils device-mapper-persistent-data lvm2
    # Add the Docker package repository
    yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    # Install Docker CE
    yum install docker-ce -y
    # Start Docker
    systemctl start docker
    # Start Docker on boot
    systemctl enable docker
    # Show Docker information
    docker info
2.2 Installing docker-compose

    curl -L https://github.com/docker/compose/releases/download/1.25.4/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
    # If the machine has a Shadowsocks (ss) proxy configured, use this command instead to speed up the download:
    curl --socks5 127.0.0.1:1080 -L https://github.com/docker/compose/releases/download/1.25.4/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

    chmod +x /usr/local/bin/docker-compose
2.3 Adding the configuration files

    mkdir -p /usr/local/src/config
    cd /usr/local/src/config
2.4 Adding the prometheus.yml configuration file

    # Create the prometheus.yml configuration file
    vim prometheus.yml

An example prometheus.yml follows. In this example, 192.168.0.106 is the IP of the deployment host, and the other IPs are other nodes on the LAN (to monitor them, you need to install Node-exporter on each node yourself; by default this can be done through Docker).

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['192.168.0.106:9093']

    rule_files:
      - "node_down.yml"

    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['192.168.0.106:9090']

      - job_name: 'cadvisor'
        static_configs:
          - targets: ['192.168.0.106:8080']

      - job_name: 'mgt'
        scrape_interval: 8s
        static_configs:
          - targets: ['192.168.0.106:9100']

      - job_name: 'io'
        scrape_interval: 8s
        static_configs:
          - targets: ['192.168.0.176:9100']

      - job_name: 'login'
        scrape_interval: 8s
        static_configs:
          - targets: ['192.168.0.186:9100']

      - job_name: 'cal'
        scrape_interval: 8s
        static_configs:
          - targets: ['192.168.0.109:9100']
          - targets: ['192.168.0.83:9100']
          - targets: ['192.168.0.93:9100']
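Because every scrape target above is a plain `host:port` pair, a quick pre-flight check can catch a mistyped IP or a node whose exporter is not listening yet. The sketch below is only an illustration: it pulls the `targets` lists out of the file with a regular expression instead of a real YAML parser (to stay within the Python standard library), and it only tests that the TCP port accepts connections, not that it serves metrics.

```python
import re
import socket

# Matches the inline list form used in the example config: targets: ['host:port']
TARGET_LIST = re.compile(r"targets:\s*\[([^\]]*)\]")


def extract_targets(config_text):
    """Return the unique host:port strings found in targets lists, in file order."""
    targets = []
    for match in TARGET_LIST.finditer(config_text):
        for item in match.group(1).split(","):
            item = item.strip().strip("'\"")
            if item and item not in targets:
                targets.append(item)
    return targets


def is_reachable(target, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    host, _, port = target.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False


def report(config_path):
    """Print one line per target found in the given prometheus.yml."""
    with open(config_path) as handle:
        for target in extract_targets(handle.read()):
            status = "ok" if is_reachable(target) else "UNREACHABLE"
            print(f"{target:24} {status}")
```

Calling `report("/usr/local/src/config/prometheus.yml")` on the deployment host lists each target with its status. Targets served by containers from the same host will show as unreachable until the stack has been started.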
2.5 Adding the email alert configuration file

Add the configuration file alertmanager.yml and set the sending and receiving mailboxes:

    global:
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxxxxx@163.com'
      smtp_auth_username: 'xxxxxx@163.com'
      smtp_auth_password: '*********'
      smtp_require_tls: false

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: live-monitoring

    receivers:
      - name: 'live-monitoring'
        email_configs:
          - to: 'xxxxxxxxxx@qq.com'
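Once Alertmanager is running, the mail route can be exercised without waiting for a real outage by posting a hand-made alert to its HTTP API (`POST /api/v2/alerts`). The sketch below is an assumption-laden illustration rather than part of the original setup: the alert name `ManualTestAlert` and the `severity` label are invented for the test, and the base URL is taken from the example IP used in this article.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance, now=None):
    """Build a one-element alert list in the shape Alertmanager's v2 API accepts."""
    now = now or datetime.now(timezone.utc)
    return [{
        # Hypothetical labels; the route above groups by 'alertname'.
        "labels": {
            "alertname": "ManualTestAlert",
            "instance": instance,
            "severity": "test",
        },
        "annotations": {"summary": f"Manual test alert for {instance}"},
        "startsAt": now.isoformat(),
        # Let the alert resolve on its own after five minutes.
        "endsAt": (now + timedelta(minutes=5)).isoformat(),
    }]


def send_alerts(base_url, alerts, timeout=5.0):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `send_alerts("http://192.168.0.106:9093", build_test_alert("192.168.0.106:9100"))` should lead to a mail after roughly the `group_wait` delay, provided the SMTP settings are correct.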
2.6 Adding the alert rules

Add a node_down.yml rule file that monitors the Prometheus targets:

    groups:
      - name: node_down
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              user: test
            annotations:
              summary: 'Instance {{ $labels.instance }} down'
              description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
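The `for: 1m` clause means the alert does not fire on the first failed scrape: it stays pending until `up == 0` has held for a full minute across consecutive rule evaluations. The toy model below illustrates that timing for a single target, using the 15s `evaluation_interval` from prometheus.yml as the default. It is a simplified sketch of the behavior, not Prometheus code.

```python
def alert_states(up_samples, eval_interval_s=15, for_s=60):
    """Return the alert state at each rule evaluation for one target.

    up_samples holds the value of `up` (0 or 1) seen at each evaluation.
    """
    states = []
    active_since = None  # time at which `up == 0` first became true
    for index, up in enumerate(up_samples):
        now = index * eval_interval_s
        if up != 0:
            # The expression no longer matches, so the timer resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


if __name__ == "__main__":
    # One healthy evaluation, five failed ones, then recovery.
    for state in alert_states([1, 0, 0, 0, 0, 0, 1]):
        print(state)
```

With these defaults the alert is pending for four evaluations and fires on the fifth failed one. A single successful scrape in between resets the timer, which is why short blips do not trigger a mail.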
IV. Writing the docker-compose file

    vim docker-compose-monitor.yml

    version: '2'

    networks:
      monitor:
        driver: bridge

    services:
      prometheus:
        image: prom/prometheus
        container_name: prometheus
        hostname: prometheus
        restart: always
        volumes:
          - /usr/local/src/config/prometheus.yml:/etc/prometheus/prometheus.yml
          - /usr/local/src/config/node_down.yml:/etc/prometheus/node_down.yml
        ports:
          - '9090:9090'
        networks:
          - monitor

      alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        hostname: alertmanager
        restart: always
        volumes:
          - /usr/local/src/config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
        ports:
          - '9093:9093'
        networks:
          - monitor

      grafana:
        image: grafana/grafana
        container_name: grafana
        hostname: grafana
        restart: always
        ports:
          - '3000:3000'
        networks:
          - monitor

      node-exporter:
        image: quay.io/prometheus/node-exporter
        container_name: node-exporter
        hostname: node-exporter
        restart: always
        ports:
          - '9100:9100'
        networks:
          - monitor

      cadvisor:
        image: google/cadvisor:latest
        container_name: cadvisor
        hostname: cadvisor
        restart: always
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:rw
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
        ports:
          - '8080:8080'
        networks:
          - monitor
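The compose file publishes five fixed host ports, so starting it fails if something else on the host already occupies one of them (8080 is a frequent candidate). As a rough pre-flight check, the sketch below tries to bind each port briefly. The port table is copied from the compose file above; the helper names are made up for this example.

```python
import socket

# Host ports published by docker-compose-monitor.yml.
PUBLISHED_PORTS = {
    "prometheus": 9090,
    "alertmanager": 9093,
    "grafana": 3000,
    "node-exporter": 9100,
    "cadvisor": 8080,
}


def port_is_free(port, host="0.0.0.0"):
    """Return True if the port can be bound on the given address right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports=PUBLISHED_PORTS, host="0.0.0.0"):
    """Return the subset of services whose host port is already taken."""
    return {
        name: port
        for name, port in ports.items()
        if not port_is_free(port, host)
    }
```

Running `busy_ports()` before the first start should return an empty dictionary. After the stack is up, all five ports are expected to show as busy.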
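V. Starting docker-compose

Start the containers:

    # Use the docker-compose command to start the containers configured in the yml file
    docker-compose -f /usr/local/src/config/docker-compose-monitor.yml up -d
    # Remove the containers:
    docker-compose -f /usr/local/src/config/docker-compose-monitor.yml down
    # Restart a container (replace <container-id> with the actual container ID):
    docker restart <container-id>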
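After `up -d` returns, the containers may still be initializing, so it helps to confirm that each service actually answers over HTTP. The sketch below polls one endpoint per service: `/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana, `/metrics` for Node Exporter, and `/healthz` for cAdvisor. The host IP is the example deployment host from this article; adjust it to your environment.

```python
import urllib.error
import urllib.request

HOST = "192.168.0.106"  # deployment host used in the examples

HEALTH_URLS = {
    "prometheus": f"http://{HOST}:9090/-/healthy",
    "alertmanager": f"http://{HOST}:9093/-/healthy",
    "grafana": f"http://{HOST}:3000/api/health",
    "node-exporter": f"http://{HOST}:9100/metrics",
    "cadvisor": f"http://{HOST}:8080/healthz",
}


def http_status(url, timeout=3.0):
    """Return the HTTP status code, or None if the service cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except OSError:
        return None


def check_stack(urls=HEALTH_URLS, fetch=http_status):
    """Map each service name to True if its endpoint answered with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in urls.items()}


def print_report():
    """Print one status line per service."""
    for name, healthy in check_stack().items():
        print(f"{name:14} {'up' if healthy else 'DOWN'}")
```

Calling `print_report()` from any machine that can reach the host gives a one-line status per service. The `fetch` parameter exists only so the check can be exercised without a network.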
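That concludes this brief introduction to quickly building a monitoring system with docker-compose. The next article covers in-house data collection based on real-world scenarios and customizing Grafana panels.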