Prometheus Monitoring Stack Deployment
Monitoring System Components
Service | Port | Purpose |
---|---|---|
node_exporter | 59100 | Collects host-level runtime metrics |
cadvisor | 59101 | Collects container runtime metrics |
kafka_exporter | 59102 | Collects Kafka topic metrics |
kube-state-metrics | 30686 | Collects Kubernetes cluster metrics |
prometheus | 9090 | Collects and stores monitoring data |
grafana | 3000 | Visualizes monitoring data |
- node_exporter must be deployed on every server.
- cadvisor is only needed on servers that run Docker, e.g. the nodes of the file cluster.
- kafka_exporter only needs to be deployed on any single node of the Kafka cluster.
- kube-state-metrics only needs to be deployed on one k8s master node, but the image must be pulled on every node in the k8s cluster.
- prometheus and grafana can be deployed on the same server.
Network port connectivity requirements (a quick check script follows this list):
- The Prometheus server must reach port 59100 on every server running node_exporter
- The Prometheus server must reach port 59101 on every server running cadvisor
- The Prometheus server must reach port 59102 on the server running kafka_exporter
- The Prometheus server must reach ports 6443 and 30686 on the k8s master server
- The Grafana server must reach port 9090 on the Prometheus server
- If a reverse proxy is configured for the Grafana address:
  - The proxy server must reach port 3000 on the Grafana server
  - If Prometheus is reverse-proxied as well, the proxy must also reach port 9090 on the Prometheus server
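Connectivity can be sanity-checked from the Prometheus server before deploying; a minimal sketch, assuming nc (netcat) is installed and using placeholder addresses from the examples below:

```bash
# Hypothetical host:port pairs; replace them with your actual exporter addresses
for target in 192.168.10.20:59100 192.168.10.16:59101 192.168.10.7:59102 192.168.10.20:30686; do
    nc -z -w 3 "${target%:*}" "${target#*:}" && echo "OK   ${target}" || echo "FAIL ${target}"
done
```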
Deploying node_exporter
- Download the node_exporter package
wget http://pdpublic.mingdao.com/private-deployment/offline/common/node_exporter-1.3.1.linux-amd64.tar.gz
- Extract node_exporter
tar xf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-1.3.1.linux-amd64 /usr/local/node_exporter
- Create the start/stop scripts
cat > /usr/local/node_exporter/start_node_exporter.sh <<EOF
nohup /usr/local/node_exporter/node_exporter --web.listen-address=:59100 &
EOF
cat > /usr/local/node_exporter/stop_node_exporter.sh <<EOF
kill \$(pgrep -f '/usr/local/node_exporter/node_exporter --web.listen-address=:59100')
EOF
chmod +x /usr/local/node_exporter/start_node_exporter.sh
chmod +x /usr/local/node_exporter/stop_node_exporter.sh
- Start node_exporter
cd /usr/local/node_exporter/
bash start_node_exporter.sh
- Enable start on boot
echo "cd /usr/local/node_exporter/ && /bin/bash start_node_exporter.sh" >> /etc/rc.local
chmod +x /etc/rc.local
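To confirm node_exporter is serving metrics, a quick local check (assuming curl is available):

```bash
# Should print the first few exposed metrics
curl -s http://127.0.0.1:59100/metrics | head -n 5
```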
Deploying cAdvisor
- Download
wget http://pdpublic.mingdao.com/private-deployment/offline/common/cadvisor-v0.47.0-linux-amd64
- Create the cadvisor directory
mkdir /usr/local/cadvisor
- Move the binary into place and make it executable
mv cadvisor-v0.47.0-linux-amd64 /usr/local/cadvisor/cadvisor
chmod +x /usr/local/cadvisor/cadvisor
- Create the start/stop scripts
cat > /usr/local/cadvisor/start_cadvisor.sh <<EOF
nohup /usr/local/cadvisor/cadvisor -port 59101 &
EOF
cat > /usr/local/cadvisor/stop_cadvisor.sh <<EOF
kill \$(pgrep -f '/usr/local/cadvisor/cadvisor')
EOF
chmod +x /usr/local/cadvisor/start_cadvisor.sh
chmod +x /usr/local/cadvisor/stop_cadvisor.sh
- Start cadvisor
cd /usr/local/cadvisor
bash start_cadvisor.sh
- Enable start on boot
echo "cd /usr/local/cadvisor && /bin/bash start_cadvisor.sh" >> /etc/rc.local
chmod +x /etc/rc.local
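To confirm cAdvisor is collecting container metrics, check the endpoint locally (assuming curl is available; cAdvisor metric names start with container_):

```bash
curl -s http://127.0.0.1:59101/metrics | grep -m 3 container_
```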
Deploying kafka_exporter
- Download the package
wget http://pdpublic.mingdao.com/private-deployment/offline/common/kafka_exporter-1.4.2.linux-amd64.tar.gz
- Extract to the installation directory
tar -zxvf kafka_exporter-1.4.2.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/kafka_exporter-1.4.2.linux-amd64 /usr/local/kafka_exporter
- Add the management scripts
# Note: replace the Kafka server address with the actual IP
cat > /usr/local/kafka_exporter/start_kafka_exporter.sh <<EOF
nohup /usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.1.2:9092 --web.listen-address=:59102 &
EOF
cat > /usr/local/kafka_exporter/stop_kafka_exporter.sh <<EOF
kill \$(pgrep -f '/usr/local/kafka_exporter/kafka_exporter')
EOF
chmod +x /usr/local/kafka_exporter/start_kafka_exporter.sh
chmod +x /usr/local/kafka_exporter/stop_kafka_exporter.sh
- Start the service
cd /usr/local/kafka_exporter/ && bash start_kafka_exporter.sh
- Enable start on boot
echo "sleep 60; cd /usr/local/kafka_exporter/ && bash start_kafka_exporter.sh" >> /etc/rc.local
chmod +x /etc/rc.local
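If the broker address is correct, kafka_exporter exposes kafka_* series (e.g. kafka_brokers, kafka_topic_partitions); a quick local check, assuming curl is available:

```bash
curl -s http://127.0.0.1:59102/metrics | grep -m 5 kafka_
```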
Deploying kube-state-metrics
- Download the image (every node in the k8s cluster needs to download the image)
  - If the server has internet access:
    crictl pull registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
  - If the server has no internet access:
    # Offline image download link; upload the archive to the deployment server after downloading
    wget http://pdpublic.mingdao.com/private-deployment/offline/common/kube-state-metrics.tar.gz
    # Decompress the image archive
    gunzip -d kube-state-metrics.tar.gz
    # Import the offline image
    ctr -n k8s.io image import kube-state-metrics.tar
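After pulling or importing, confirm the image is present on each node:

```bash
crictl images | grep kube-state-metrics
```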
- Create a directory for the manifests
mkdir -p /usr/local/kubernetes/ops-monit
cd /usr/local/kubernetes/ops-monit
- Write the deployment manifests
cat > cluster-role-binding.yaml <<\EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: ops-monit
EOF
cat > cluster-role.yaml <<\EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
    verbs:
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
      - ingresses
    verbs:
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - cronjobs
      - jobs
    verbs:
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - list
      - watch
  - apiGroups:
      - authentication.k8s.io
    resources:
      - tokenreviews
    verbs:
      - create
  - apiGroups:
      - authorization.k8s.io
    resources:
      - subjectaccessreviews
    verbs:
      - create
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - list
      - watch
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests
    verbs:
      - list
      - watch
  - apiGroups:
      - storage.k8s.io
    resources:
      - storageclasses
      - volumeattachments
    verbs:
      - list
      - watch
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    verbs:
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - list
      - watch
EOF
cat > deployment.yaml <<\EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            timeoutSeconds: 5
          name: kube-state-metrics
          ports:
            - containerPort: 8080
              name: http-metrics
            - containerPort: 8081
              name: telemetry
          readinessProbe:
            httpGet:
              path: /
              port: 8081
            initialDelaySeconds: 5
            timeoutSeconds: 5
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
EOF
cat > service-account.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
EOF
cat > service.yaml <<\EOF
apiVersion: v1
kind: Service
metadata:
  # annotations:
  #   prometheus.io/scrape: 'true'
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
spec:
  ports:
    - name: http-metrics
      port: 8080
      targetPort: http-metrics
      nodePort: 30686
    - name: telemetry
      port: 8081
      targetPort: telemetry
  type: NodePort
  selector:
    app.kubernetes.io/name: kube-state-metrics
EOF
# Create the Kubernetes service account that Prometheus uses to scrape cadvisor
cat > rbac.yaml <<\EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
# Create the Secret associated with the prometheus ServiceAccount
# (it must be associated manually as of Kubernetes 1.24)
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: kube-system
EOF
- Create the namespace
kubectl create namespace ops-monit
- Start the monitoring services
kubectl apply -f .
- Retrieve the token
# The token is stored in the "prometheus" Secret created by rbac.yaml above,
# so it can be read directly instead of being parsed out of `kubectl describe` output
kubectl -n kube-system get secret prometheus -o jsonpath='{.data.token}' | base64 -d
- Copy the token output into the /usr/local/prometheus/privatedeploy_kubernetes.token file on the Prometheus server
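To verify the token grants the expected access, it can be used to query the API server directly; a quick check, assuming 192.168.10.20 is the k8s master address used in the examples below:

```bash
TOKEN=$(cat /usr/local/prometheus/privatedeploy_kubernetes.token)
# Listing nodes should succeed, since the prometheus ClusterRole allows get/list/watch on nodes
curl -sk -H "Authorization: Bearer ${TOKEN}" https://192.168.10.20:6443/api/v1/nodes | head -n 20
```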
Deploying Prometheus
- Download the Prometheus package
wget http://pdpublic.mingdao.com/private-deployment/offline/common/prometheus-2.32.1.linux-amd64.tar.gz
- Extract
tar -zxvf prometheus-2.32.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/prometheus-2.32.1.linux-amd64 /usr/local/prometheus
- Configure the prometheus.yml file
global:
  scrape_interval: 15s
scrape_configs:
  # Host monitoring
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.10.20:59100"]
        labels:
          nodename: service01
          origin_prometheus: node
      - targets: ["192.168.10.21:59100"]
        labels:
          nodename: service02
          origin_prometheus: node
      - targets: ["192.168.10.2:59100"]
        labels:
          nodename: db01
          origin_prometheus: node
      - targets: ["192.168.10.3:59100"]
        labels:
          nodename: db02
          origin_prometheus: node
  # Container monitoring
  - job_name: "cadvisor"
    static_configs:
      - targets:
          - 192.168.10.16:59101
          - 192.168.10.17:59101
          - 192.168.10.18:59101
          - 192.168.10.19:59101
  # Kafka monitoring
  - job_name: kafka_exporter
    static_configs:
      - targets: ["192.168.10.7:59102"]
  # Kubernetes monitoring
  - job_name: privatedeploy_kubernetes_metrics
    static_configs:
      - targets: ["192.168.10.20:30686"] # Note: replace with the k8s master node address
        labels:
          origin_prometheus: kubernetes
  - job_name: 'privatedeploy_kubernetes_cadvisor'
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.10.20:6443 # Note: replace with the k8s master node address
        bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
        tls_config:
          insecure_skip_verify: true
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: 192.168.10.20:6443 # Note: replace with the k8s master node address
      - target_label: origin_prometheus
        replacement: kubernetes
      - source_labels: [__meta_kubernetes_node_name]
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    metric_relabel_configs:
      - source_labels: [instance]
        separator: ;
        regex: (.+)
        target_label: node
        replacement: $1
        action: replace
      - source_labels: [pod_name]
        separator: ;
        regex: (.+)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [container_name]
        separator: ;
        regex: (.+)
        target_label: container
        replacement: $1
        action: replace
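Before starting Prometheus, the configuration can be validated with promtool, which ships in the same release tarball:

```bash
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
```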
- Create the start/stop scripts
cat > /usr/local/prometheus/start_prometheus.sh <<EOF
nohup /usr/local/prometheus/prometheus --storage.tsdb.path=/data/prometheus/data --storage.tsdb.retention.time=30d --config.file=/usr/local/prometheus/prometheus.yml --web.enable-lifecycle &
EOF
cat > /usr/local/prometheus/stop_prometheus.sh <<EOF
kill \$(pgrep -f '/usr/local/prometheus/prometheus')
EOF
cat > /usr/local/prometheus/reload_prometheus.sh <<EOF
curl -X POST http://127.0.0.1:9090/-/reload
EOF
chmod +x /usr/local/prometheus/start_prometheus.sh
chmod +x /usr/local/prometheus/stop_prometheus.sh
chmod +x /usr/local/prometheus/reload_prometheus.sh
- Start Prometheus
cd /usr/local/prometheus/
bash start_prometheus.sh
- Enable start on boot
echo "cd /usr/local/prometheus/ && /bin/bash start_prometheus.sh" >> /etc/rc.local
chmod +x /etc/rc.local
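Once started, Prometheus readiness and target discovery can be spot-checked locally (assuming the default listen address):

```bash
curl -s http://127.0.0.1:9090/-/ready
curl -s 'http://127.0.0.1:9090/api/v1/targets?state=active' | head -c 500
```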
Deploying Grafana
- Download the Grafana package
wget http://pdpublic.mingdao.com/private-deployment/offline/common/grafana-10.1.1.linux-amd64.tar.gz
- Extract
tar -xf grafana-10.1.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/grafana-10.1.1 /usr/local/grafana
- Create the start/stop scripts
cat > /usr/local/grafana/start_grafana.sh <<EOF
cd /usr/local/grafana && nohup ./bin/grafana-server web &
EOF
cat > /usr/local/grafana/stop_grafana.sh <<EOF
kill \$(pgrep -f 'grafana server web')
EOF
chmod +x /usr/local/grafana/start_grafana.sh
chmod +x /usr/local/grafana/stop_grafana.sh
- Set the root_url value in the /usr/local/grafana/conf/defaults.ini file as follows
root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/
# One-shot edit
sed -ri 's#^root_url = .*#root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/#' /usr/local/grafana/conf/defaults.ini
grep "^root_url" /usr/local/grafana/conf/defaults.ini
- Set the serve_from_sub_path value in the /usr/local/grafana/conf/defaults.ini file as follows
serve_from_sub_path = true
# One-shot edit
sed -ri 's#^serve_from_sub_path = .*#serve_from_sub_path = true#' /usr/local/grafana/conf/defaults.ini
grep "^serve_from_sub_path" /usr/local/grafana/conf/defaults.ini
- If you access the Grafana page directly via its IP instead of through an nginx reverse proxy, also set the domain value to the actual host.
- Start Grafana
cd /usr/local/grafana/
bash start_grafana.sh
- Enable start on boot
echo "cd /usr/local/grafana/ && /bin/bash start_grafana.sh" >> /etc/rc.local
chmod +x /etc/rc.local
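To confirm Grafana is responding, its health endpoint returns a small JSON document:

```bash
curl -s http://127.0.0.1:3000/api/health
```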
Configuring a Reverse Proxy for the Grafana Address
In the nginx reverse proxy configuration file, proxy the Grafana page following the example rules below:
upstream grafana {
    server 192.168.1.10:3000;
}
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
server {
    listen 80;
    server_name mdy.domain.com;
    access_log /data/logs/weblogs/grafana.log main;
    error_log /data/logs/weblogs/grafana.mingdao.net.error.log;
    location /privatedeploy/mdy/monitor/grafana/ {
        #allow 1.1.1.1;
        #deny all;
        proxy_hide_header X-Frame-Options;
        proxy_set_header X-Frame-Options ALLOWALL;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
        proxy_redirect http://localhost:3000 http://mdy.domain.com:80/privatedeploy/mdy/monitor/grafana;
    }
    location /privatedeploy/mdy/monitor/grafana/api/live {
        rewrite ^/(.*) /$1 break;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
    }
}
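After writing the configuration, test and reload nginx (assuming nginx is invoked directly; use `systemctl reload nginx` instead if it runs under systemd):

```bash
nginx -t && nginx -s reload
```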
- Once the proxy is configured, the Grafana page can be accessed at http://mdy.domain.com/privatedeploy/mdy/monitor/grafana, and dashboards can then be set up there.
- To proxy Prometheus as well, the following rules can be added (usually unnecessary; the Prometheus page has no authentication, so pay attention to access security):
upstream prometheus {
    server 192.168.1.10:9090;
}
location /privatedeploy/mdy/monitor/prometheus {
    rewrite ^/privatedeploy/mdy/monitor/prometheus$ / break;
    rewrite ^/privatedeploy/mdy/monitor/prometheus/(.*)$ /$1 break;
    proxy_pass http://prometheus;
    proxy_redirect /graph /privatedeploy/mdy/monitor/prometheus/graph;
}