Deployment of Prometheus Monitoring System
Description of Monitoring System Components
| Service Name | Service Port | Function |
| --- | --- | --- |
| node_exporter | 59100 | Collecting runtime metrics of the server |
| cadvisor | 59101 | Collecting runtime metrics of containers |
| kafka_exporter | 59102 | Collecting metrics of Kafka topics |
| kube-state-metrics | 30686 | Collecting metrics of the k8s cluster |
| prometheus | 9090 | Collecting and storing monitoring data |
| grafana | 3000 | Visualizing monitoring data |
- The `node_exporter` service needs to be deployed on every server.
- The `cadvisor` service needs to be deployed only on servers running Docker, such as each node in a file cluster.
- The `kafka_exporter` service only needs to be deployed on any single node within the Kafka cluster.
- `kube-state-metrics` only needs to be deployed on one k8s master node, but its image must be downloaded on all nodes within the k8s cluster.
- Prometheus and Grafana can be deployed on the same server.
Network port connectivity requirements:

- The server where Prometheus is located must be able to reach port 59100 on all servers where `node_exporter` is deployed.
- The server where Prometheus is located must be able to reach port 59101 on all servers where `cadvisor` is deployed.
- The server where Prometheus is located must be able to reach port 59102 on the server where `kafka_exporter` is deployed.
- The server where Prometheus is located must be able to reach ports 6443 and 30686 on the k8s master server.
- The server where Grafana is located must be able to reach port 9090 on the Prometheus server.
- If the Grafana address is served through a reverse proxy:
  - The proxy server must be able to reach port 3000 on the Grafana server.
  - If a reverse proxy for Prometheus is also configured, the proxy server must be able to reach port 9090 on the Prometheus server.
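Before deploying, you can spot-check connectivity from the Prometheus server with a short loop. This is a minimal sketch assuming `nc` (netcat) is installed; the host:port pairs are placeholders to replace with your actual servers:

```bash
# Probe each exporter port from the Prometheus server.
for target in 192.168.10.20:59100 192.168.10.16:59101 192.168.10.7:59102 192.168.10.20:30686; do
  host=${target%:*}
  port=${target#*:}
  if nc -z -w 2 "$host" "$port"; then
    echo "OK   $target"
  else
    echo "FAIL $target"
  fi
done
```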
Deploy Node_exporter

1. Download the node_exporter installation package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/node_exporter-1.3.1.linux-amd64.tar.gz
```

2. Extract node_exporter

```bash
tar xf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-1.3.1.linux-amd64 /usr/local/node_exporter
```

3. Configure start/stop scripts

```bash
cat > /usr/local/node_exporter/start_node_exporter.sh <<EOF
nohup /usr/local/node_exporter/node_exporter --web.listen-address=:59100 &
EOF
cat > /usr/local/node_exporter/stop_node_exporter.sh <<EOF
kill \$(pgrep -f '/usr/local/node_exporter/node_exporter --web.listen-address=:59100')
EOF
chmod +x /usr/local/node_exporter/start_node_exporter.sh
chmod +x /usr/local/node_exporter/stop_node_exporter.sh
```

4. Start node_exporter

```bash
cd /usr/local/node_exporter/
bash start_node_exporter.sh
```

5. Add to startup

```bash
echo "cd /usr/local/node_exporter/ && /bin/bash start_node_exporter.sh" >> /etc/rc.local
chmod +x /etc/rc.local
```
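Once node_exporter is up, a quick sanity check from the exporter host itself (any HTTP client works; curl is assumed here):

```bash
# node_exporter should answer on port 59100 with plain-text metrics.
curl -s http://127.0.0.1:59100/metrics | head -n 5
```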
Deploy Cadvisor

1. Download

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/cadvisor-v0.47.0-linux-amd64
```

2. Create the cadvisor directory

```bash
mkdir /usr/local/cadvisor
```

3. Move the binary and make it executable

```bash
mv cadvisor-v0.47.0-linux-amd64 /usr/local/cadvisor/cadvisor
chmod +x /usr/local/cadvisor/cadvisor
```

4. Write start and stop scripts

```bash
cat > /usr/local/cadvisor/start_cadvisor.sh <<EOF
nohup /usr/local/cadvisor/cadvisor -port 59101 &
EOF
cat > /usr/local/cadvisor/stop_cadvisor.sh <<EOF
kill \$(pgrep -f '/usr/local/cadvisor/cadvisor')
EOF
chmod +x /usr/local/cadvisor/start_cadvisor.sh
chmod +x /usr/local/cadvisor/stop_cadvisor.sh
```

5. Start cadvisor

```bash
cd /usr/local/cadvisor
bash start_cadvisor.sh
```

6. Add to startup

```bash
echo "cd /usr/local/cadvisor && /bin/bash start_cadvisor.sh" >> /etc/rc.local
chmod +x /etc/rc.local
```
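As with node_exporter, you can verify that cadvisor is serving metrics (assuming curl on the same host):

```bash
# cadvisor exposes container metrics on port 59101.
curl -s http://127.0.0.1:59101/metrics | head -n 5
```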
Deploy Kafka_exporter

1. Download the installation package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/kafka_exporter-1.4.2.linux-amd64.tar.gz
```

2. Extract to the installation directory

```bash
tar -zxvf kafka_exporter-1.4.2.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/kafka_exporter-1.4.2.linux-amd64 /usr/local/kafka_exporter
```

3. Add management scripts

```bash
# Note: replace the Kafka service address with the actual IP
cat > /usr/local/kafka_exporter/start_kafka_exporter.sh <<EOF
nohup /usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.1.2:9092 --web.listen-address=:59102 &
EOF
cat > /usr/local/kafka_exporter/stop_kafka_exporter.sh <<EOF
kill \$(pgrep -f '/usr/local/kafka_exporter/kafka_exporter')
EOF
chmod +x /usr/local/kafka_exporter/start_kafka_exporter.sh
chmod +x /usr/local/kafka_exporter/stop_kafka_exporter.sh
```

4. Start the service

```bash
cd /usr/local/kafka_exporter/ && bash start_kafka_exporter.sh
```

5. Add to startup

```bash
# The "sleep 60" gives the local Kafka service time to start after boot.
echo "sleep 60; cd /usr/local/kafka_exporter/ && bash start_kafka_exporter.sh" >> /etc/rc.local
chmod +x /etc/rc.local
```
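To confirm kafka_exporter can reach the brokers, check its metrics endpoint; the `kafka_brokers` gauge should report the broker count (a quick check, assuming curl on the same host):

```bash
# A non-zero kafka_brokers value means the exporter connected to Kafka.
curl -s http://127.0.0.1:59102/metrics | grep '^kafka_brokers'
```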
Deploy Kube-state-metrics

1. Download the image (all nodes in the k8s cluster need to download the image)

- Server with internet access:

```bash
crictl pull registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
```

- Server without internet access:

```bash
# Offline image file download link; upload it to the deployment server after downloading
wget https://pdpublic.mingdao.com/private-deployment/offline/common/kube-state-metrics.tar.gz
# Decompress the image file
gunzip -d kube-state-metrics.tar.gz
# Import the offline image
ctr -n k8s.io image import kube-state-metrics.tar
```

2. Create a directory for the configuration files

```bash
mkdir -p /usr/local/kubernetes/ops-monit
cd /usr/local/kubernetes/ops-monit
```

3. Write the deployment configuration files
```bash
cat > cluster-role-binding.yaml <<\EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: ops-monit
EOF

cat > cluster-role.yaml <<\EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - list
  - watch
EOF

cat > deployment.yaml <<\EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
EOF

cat > service-account.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
EOF

cat > service.yaml <<\EOF
apiVersion: v1
kind: Service
metadata:
  # annotations:
  #   prometheus.io/scrape: 'true'
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    nodePort: 30686
  - name: telemetry
    port: 8081
    targetPort: telemetry
  type: NodePort
  selector:
    app.kubernetes.io/name: kube-state-metrics
EOF
# Create the kubernetes user required by cadvisor
cat > rbac.yaml <<\EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
# Create the Secret associated with prometheus (must be associated manually since k8s 1.24)
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
EOF
```
4. Creating Namespace

```bash
kubectl create namespace ops-monit
```

5. Starting Monitoring Service

```bash
kubectl apply -f .
```

6. Retrieving Token

```bash
kubectl describe secret $(kubectl describe sa prometheus -n kube-system | sed -n '7p' | awk '{print $2}') -n kube-system | tail -n1 | awk '{print $2}'
```

- Copy the token content into the `/usr/local/prometheus/privatedeploy_kubernetes.token` file for the prometheus service.
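The `sed -n '7p'` pipeline above depends on the exact layout of `kubectl describe` output, which can shift between kubectl versions. Since rbac.yaml already creates a Secret named prometheus in kube-system, a less fragile sketch is to read the token field directly (run the kubectl commands on the master, and copy the resulting file to the Prometheus server if they are different machines):

```bash
# Sanity check: kube-state-metrics should be Running in ops-monit.
kubectl get pods -n ops-monit

# Read the service-account token straight from the Secret created by
# rbac.yaml and write it where Prometheus expects it.
kubectl -n kube-system get secret prometheus -o jsonpath='{.data.token}' \
  | base64 -d > /usr/local/prometheus/privatedeploy_kubernetes.token
```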
Deploying Prometheus

1. Download the prometheus package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/prometheus-2.32.1.linux-amd64.tar.gz
```

2. Extract the package

```bash
tar -zxvf prometheus-2.32.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/prometheus-2.32.1.linux-amd64 /usr/local/prometheus
```

3. Configure the /usr/local/prometheus/prometheus.yml file
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  # Server Monitoring
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.10.20:59100"]
        labels:
          nodename: service01
          origin_prometheus: node
      - targets: ["192.168.10.21:59100"]
        labels:
          nodename: service02
          origin_prometheus: node
      - targets: ["192.168.10.2:59100"]
        labels:
          nodename: db01
          origin_prometheus: node
      - targets: ["192.168.10.3:59100"]
        labels:
          nodename: db02
          origin_prometheus: node

  # Container Monitoring
  - job_name: "cadvisor"
    static_configs:
      - targets:
          - 192.168.10.16:59101
          - 192.168.10.17:59101
          - 192.168.10.18:59101
          - 192.168.10.19:59101

  # Kafka Monitoring
  - job_name: kafka_exporter
    static_configs:
      - targets: ["192.168.10.7:59102"]

  # K8s Monitoring
  - job_name: privatedeploy_kubernetes_metrics
    static_configs:
      - targets: ["192.168.10.20:30686"] # Remember to replace with the k8s master node address
        labels:
          origin_prometheus: kubernetes

  - job_name: 'privatedeploy_kubernetes_cadvisor'
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.10.20:6443 # Remember to replace with the k8s master node address
        bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
        tls_config:
          insecure_skip_verify: true
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: 192.168.10.20:6443 # Remember to replace with the k8s master node address
      - target_label: origin_prometheus
        replacement: kubernetes
      - source_labels: [__meta_kubernetes_node_name]
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    metric_relabel_configs:
      - source_labels: [instance]
        separator: ;
        regex: (.+)
        target_label: node
        replacement: $1
        action: replace
      - source_labels: [pod_name]
        separator: ;
        regex: (.+)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [container_name]
        separator: ;
        regex: (.+)
        target_label: container
        replacement: $1
        action: replace
```
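Before the first start, it is worth validating the file with promtool, which ships in the same prometheus tarball:

```bash
# Syntax-check prometheus.yml; exits non-zero on errors.
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
```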
4. Configure start/stop scripts

```bash
cat > /usr/local/prometheus/start_prometheus.sh <<EOF
nohup /usr/local/prometheus/prometheus --storage.tsdb.path=/data/prometheus/data --storage.tsdb.retention.time=30d --config.file=/usr/local/prometheus/prometheus.yml --web.enable-lifecycle &
EOF
cat > /usr/local/prometheus/stop_prometheus.sh <<EOF
kill \$(pgrep -f '/usr/local/prometheus/prometheus')
EOF
cat > /usr/local/prometheus/reload_prometheus.sh <<EOF
curl -X POST http://127.0.0.1:9090/-/reload
EOF
chmod +x /usr/local/prometheus/start_prometheus.sh
chmod +x /usr/local/prometheus/stop_prometheus.sh
chmod +x /usr/local/prometheus/reload_prometheus.sh
```

5. Start prometheus

```bash
cd /usr/local/prometheus/
bash start_prometheus.sh
```

6. Add to startup

```bash
echo "cd /usr/local/prometheus/ && /bin/bash start_prometheus.sh" >> /etc/rc.local
chmod +x /etc/rc.local
```
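With Prometheus running, you can confirm that every scrape target is healthy through its HTTP API (this sketch assumes jq is available; otherwise open Status → Targets at http://&lt;prometheus-ip&gt;:9090):

```bash
# Print each active target's job, URL, and health ("up" when scraping works).
curl -s http://127.0.0.1:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | "\(.labels.job) \(.scrapeUrl) \(.health)"'
```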
Deploying Grafana

1. Download the grafana package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/grafana-10.1.1.linux-amd64.tar.gz
```

2. Extract the package

```bash
tar -xf grafana-10.1.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/grafana-10.1.1 /usr/local/grafana
```

3. Configure start/stop scripts

```bash
cat > /usr/local/grafana/start_grafana.sh <<EOF
cd /usr/local/grafana && nohup ./bin/grafana-server web &
EOF
cat > /usr/local/grafana/stop_grafana.sh <<EOF
kill \$(pgrep -f 'grafana server web')
EOF
chmod +x /usr/local/grafana/start_grafana.sh
chmod +x /usr/local/grafana/stop_grafana.sh
```

4. Modify the root_url value in the /usr/local/grafana/conf/defaults.ini file as follows:

```ini
root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/
```

```bash
# Modify with one command
sed -ri 's#^root_url = .*#root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/#' /usr/local/grafana/conf/defaults.ini
grep "^root_url" /usr/local/grafana/conf/defaults.ini
```

5. Modify the serve_from_sub_path value in the /usr/local/grafana/conf/defaults.ini file as follows:

```ini
serve_from_sub_path = true
```

```bash
# Modify with one command
sed -ri 's#^serve_from_sub_path = .*#serve_from_sub_path = true#' /usr/local/grafana/conf/defaults.ini
grep "^serve_from_sub_path" /usr/local/grafana/conf/defaults.ini
```

- If you do not need to access the grafana page through an nginx proxy and prefer to use grafana's IP directly, adjust the domain value to the actual host.

6. Start grafana

```bash
cd /usr/local/grafana/
bash start_grafana.sh
```

7. Add to startup

```bash
echo "cd /usr/local/grafana/ && /bin/bash start_grafana.sh" >> /etc/rc.local
chmod +x /etc/rc.local
```
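A quick health check after startup; Grafana exposes a basic health endpoint that needs no authentication:

```bash
# Expect a small JSON document with "database": "ok".
curl -s http://127.0.0.1:3000/api/health
```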
Configuring Reverse Proxy for Grafana Address
Refer to the following rules in the nginx reverse proxy configuration file to configure the proxy for the grafana page:

```nginx
upstream grafana {
    server 192.168.1.10:3000;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;
    server_name hap.domain.com;

    access_log /data/logs/weblogs/grafana.log main;
    error_log /data/logs/weblogs/grafana.mingdao.net.error.log;

    location /privatedeploy/mdy/monitor/grafana/ {
        #allow 1.1.1.1;
        #deny all;
        proxy_hide_header X-Frame-Options;
        proxy_set_header X-Frame-Options ALLOWALL;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
        proxy_redirect http://localhost:3000 http://hap.domain.com:80/privatedeploy/mdy/monitor/grafana;
    }

    # WebSocket support for Grafana Live
    location /privatedeploy/mdy/monitor/grafana/api/live {
        rewrite ^/(.*) /$1 break;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
    }
}
```
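After adding the rules, validate the configuration and reload nginx:

```bash
# Check syntax first, then reload without dropping connections.
nginx -t && nginx -s reload
```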
- Once the proxy is configured, you can access the grafana page via http://hap.domain.com/privatedeploy/mdy/monitor/grafana and then configure the dashboard.

- To proxy prometheus as well, add the following rules. This is usually not needed; the prometheus page has no authentication of its own, so be cautious about access security. Place the upstream at the http level and the location inside the server block above:

```nginx
upstream prometheus {
    server 192.168.1.10:9090;
}

location /privatedeploy/mdy/monitor/prometheus {
    rewrite ^/privatedeploy/mdy/monitor/prometheus$ / break;
    rewrite ^/privatedeploy/mdy/monitor/prometheus/(.*)$ /$1 break;
    proxy_pass http://prometheus;
    proxy_redirect /graph /privatedeploy/mdy/monitor/prometheus/graph;
}
```
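A quick check through the proxy (hap.domain.com is the example server_name from the configuration above):

```bash
# Expect HTTP 200 (or a redirect) from the proxied Prometheus UI.
curl -sI http://hap.domain.com/privatedeploy/mdy/monitor/prometheus/graph | head -n 1
```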