# Deployment of the Prometheus Monitoring System

## Monitoring System Components
| Service Name | Service Port | Function |
|---|---|---|
| node_exporter | 59100 | Collect server runtime metric data |
| cadvisor | 59101 | Collect container runtime metric data |
| kafka_exporter | 59102 | Collect metrics of Kafka topics |
| kube-state-metrics | 30686 | Collect metrics of the k8s cluster |
| prometheus | 9090 | Collect and store monitoring data |
| grafana | 3000 | Visualize monitoring data |
- The node_exporter service must be deployed on every server.
- The cadvisor service only needs to be deployed on servers running Docker, such as each node of the file cluster.
- The kafka_exporter service only needs to be deployed on any one node of the Kafka cluster.
- kube-state-metrics only needs to be deployed on one k8s master node, but its image must be pulled on every node of the k8s cluster.
- Prometheus and Grafana can be deployed on the same server.
Network Port Connectivity Requirements:

- The server running Prometheus must be able to reach port 59100 on every server where node_exporter is deployed.
- The server running Prometheus must be able to reach port 59101 on every server where cadvisor is deployed.
- The server running Prometheus must be able to reach port 59102 on the server where kafka_exporter is deployed.
- The server running Prometheus must be able to reach ports 6443 and 30686 on the k8s master server.
- The server running Grafana must be able to reach port 9090 on the Prometheus server.
- If a reverse proxy is configured for Grafana's address:
  - The proxy server must be able to reach port 3000 on the Grafana server.
  - If Prometheus is also reverse-proxied, the proxy server must also be able to reach port 9090 on the Prometheus server.
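
Before deploying, it can save time to confirm these ports are reachable from the Prometheus server. A minimal sketch using bash's `/dev/tcp`; the IP:port pairs below are placeholders to replace with your actual servers:

```bash
# Placeholder targets -- substitute the hosts/ports from the requirements above
for target in 192.168.10.20:59100 192.168.10.16:59101 192.168.10.7:59102 \
              192.168.10.20:6443 192.168.10.20:30686; do
  host=${target%%:*}; port=${target##*:}
  # Bash /dev/tcp probe with a two-second timeout; avoids needing nc/telnet
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "OK   $target"
  else
    echo "FAIL $target"
  fi
done
```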
## Deployment of node_exporter

- Download the node_exporter installation package

  ```bash
  wget https://pdpublic.mingdao.com/private-deployment/offline/common/node_exporter-1.9.1.linux-amd64.tar.gz
  ```

- Extract node_exporter

  ```bash
  tar xf node_exporter-1.9.1.linux-amd64.tar.gz -C /usr/local/
  mv /usr/local/node_exporter-1.9.1.linux-amd64 /usr/local/node_exporter
  ```

- Write the systemd service file for node_exporter

  ```bash
  cat > /etc/systemd/system/node_exporter.service <<'EOF'
  [Unit]
  Description=Node Exporter for Prometheus
  Documentation=https://github.com/prometheus/node_exporter
  After=network.target

  [Service]
  Type=simple
  ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=:59100
  User=root
  Group=root
  Restart=always
  RestartSec=10
  LimitNOFILE=102400

  [Install]
  WantedBy=multi-user.target
  EOF
  ```

- Start node_exporter

  ```bash
  systemctl daemon-reload
  systemctl enable node_exporter
  systemctl start node_exporter
  ```
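
Optionally verify that the service is running and serving metrics:

```bash
systemctl status node_exporter --no-pager
# Should print plain-text Prometheus metrics
curl -s http://127.0.0.1:59100/metrics | head -n 5
```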
## Deployment of cadvisor

- Download

  ```bash
  wget https://pdpublic.mingdao.com/private-deployment/offline/common/cadvisor-v0.52.1-linux-amd64
  ```

- Create the cadvisor directory

  ```bash
  mkdir /usr/local/cadvisor
  ```

- Move and add executable permission

  ```bash
  mv cadvisor-v0.52.1-linux-amd64 /usr/local/cadvisor/cadvisor
  chmod +x /usr/local/cadvisor/cadvisor
  ```

- Write the systemd service file for cadvisor

  ```bash
  cat > /etc/systemd/system/cadvisor.service <<'EOF'
  [Unit]
  Description=cAdvisor Container Monitoring
  Documentation=https://github.com/google/cadvisor
  After=network.target

  [Service]
  Type=simple
  ExecStart=/usr/local/cadvisor/cadvisor -port=59101
  User=root
  Group=root
  Restart=always
  RestartSec=10
  LimitNOFILE=102400

  [Install]
  WantedBy=multi-user.target
  EOF
  ```

- Start cadvisor

  ```bash
  systemctl daemon-reload
  systemctl enable cadvisor
  systemctl start cadvisor
  ```
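
A quick check that container metrics are being exposed:

```bash
# cAdvisor serves Prometheus metrics on the port set above
curl -s http://127.0.0.1:59101/metrics | grep -m 3 '^container_'
```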
## Deployment of kafka_exporter

- Download the installation package

  ```bash
  wget https://pdpublic.mingdao.com/private-deployment/offline/common/kafka_exporter-1.9.0.linux-amd64.tar.gz
  ```

- Extract to the installation directory

  ```bash
  tar -zxvf kafka_exporter-1.9.0.linux-amd64.tar.gz -C /usr/local/
  mv /usr/local/kafka_exporter-1.9.0.linux-amd64 /usr/local/kafka_exporter
  ```

- Write the systemd service file for kafka_exporter

  ```bash
  # Note: replace the --kafka.server parameter with your actual Kafka address
  cat > /etc/systemd/system/kafka_exporter.service <<'EOF'
  [Unit]
  Description=Kafka Exporter for Prometheus
  Documentation=https://github.com/danielqsj/kafka_exporter
  After=network.target

  [Service]
  Type=simple
  ExecStart=/usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.1.2:9092 --web.listen-address=:59102
  User=root
  Group=root
  Restart=always
  RestartSec=10
  LimitNOFILE=102400

  [Install]
  WantedBy=multi-user.target
  EOF
  ```

- Start kafka_exporter

  ```bash
  systemctl daemon-reload
  systemctl enable kafka_exporter
  systemctl start kafka_exporter
  ```
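
If the exporter can reach the broker, Kafka metrics should appear on its endpoint:

```bash
# Expect kafka_brokers, kafka_topic_* and kafka_consumergroup_* series
curl -s http://127.0.0.1:59102/metrics | grep -m 5 '^kafka_'
```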
## Deploy kube-state-metrics

1. Download the image (every node in the k8s cluster needs to download the image)

   - Server with internet access:

     ```bash
     crictl pull registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
     ```

   - Server without internet access:

     ```bash
     # Offline image file download link; upload to the deployment server after downloading
     wget https://pdpublic.mingdao.com/private-deployment/offline/common/kube-state-metrics.tar.gz
     # Extract the image file
     gunzip -d kube-state-metrics.tar.gz
     # Import the offline image
     ctr -n k8s.io image import kube-state-metrics.tar
     ```

2. Create the directory for the configuration files

   ```bash
   mkdir -p /usr/local/kubernetes/ops-monit
   cd /usr/local/kubernetes/ops-monit
   ```

3. Write the deployment configuration files

   ```bash
   cat > cluster-role-binding.yaml <<\EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     labels:
       app.kubernetes.io/name: kube-state-metrics
       app.kubernetes.io/version: v2.3.0
     name: kube-state-metrics
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: kube-state-metrics
   subjects:
   - kind: ServiceAccount
     name: kube-state-metrics
     namespace: ops-monit
   EOF
   ```

   ```bash
   cat > cluster-role.yaml <<\EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     labels:
       app.kubernetes.io/name: kube-state-metrics
       app.kubernetes.io/version: v2.3.0
     name: kube-state-metrics
   rules:
   - apiGroups:
     - ""
     resources:
     - configmaps
     - secrets
     - nodes
     - pods
     - services
     - resourcequotas
     - replicationcontrollers
     - limitranges
     - persistentvolumeclaims
     - persistentvolumes
     - namespaces
     - endpoints
     verbs:
     - list
     - watch
   - apiGroups:
     - extensions
     resources:
     - daemonsets
     - deployments
     - replicasets
     - ingresses
     verbs:
     - list
     - watch
   - apiGroups:
     - apps
     resources:
     - statefulsets
     - daemonsets
     - deployments
     - replicasets
     verbs:
     - list
     - watch
   - apiGroups:
     - batch
     resources:
     - cronjobs
     - jobs
     verbs:
     - list
     - watch
   - apiGroups:
     - autoscaling
     resources:
     - horizontalpodautoscalers
     verbs:
     - list
     - watch
   - apiGroups:
     - authentication.k8s.io
     resources:
     - tokenreviews
     verbs:
     - create
   - apiGroups:
     - authorization.k8s.io
     resources:
     - subjectaccessreviews
     verbs:
     - create
   - apiGroups:
     - policy
     resources:
     - poddisruptionbudgets
     verbs:
     - list
     - watch
   - apiGroups:
     - certificates.k8s.io
     resources:
     - certificatesigningrequests
     verbs:
     - list
     - watch
   - apiGroups:
     - storage.k8s.io
     resources:
     - storageclasses
     - volumeattachments
     verbs:
     - list
     - watch
   - apiGroups:
     - admissionregistration.k8s.io
     resources:
     - mutatingwebhookconfigurations
     - validatingwebhookconfigurations
     verbs:
     - list
     - watch
   - apiGroups:
     - networking.k8s.io
     resources:
     - networkpolicies
     verbs:
     - list
     - watch
   EOF
   ```

   ```bash
   cat > deployment.yaml <<\EOF
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     labels:
       app.kubernetes.io/name: kube-state-metrics
       app.kubernetes.io/version: v2.3.0
     name: kube-state-metrics
     namespace: ops-monit
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: kube-state-metrics
     template:
       metadata:
         labels:
           app.kubernetes.io/name: kube-state-metrics
           app.kubernetes.io/version: v2.3.0
       spec:
         containers:
         - image: registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
           livenessProbe:
             httpGet:
               path: /healthz
               port: 8080
             initialDelaySeconds: 5
             timeoutSeconds: 5
           name: kube-state-metrics
           ports:
           - containerPort: 8080
             name: http-metrics
           - containerPort: 8081
             name: telemetry
           readinessProbe:
             httpGet:
               path: /
               port: 8081
             initialDelaySeconds: 5
             timeoutSeconds: 5
         nodeSelector:
           kubernetes.io/os: linux
         serviceAccountName: kube-state-metrics
   EOF
   ```

   ```bash
   cat > service-account.yaml <<\EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     labels:
       app.kubernetes.io/name: kube-state-metrics
       app.kubernetes.io/version: v2.3.0
     name: kube-state-metrics
     namespace: ops-monit
   EOF
   ```

   ```bash
   cat > service.yaml <<\EOF
   apiVersion: v1
   kind: Service
   metadata:
     # annotations:
     #   prometheus.io/scrape: 'true'
     labels:
       app.kubernetes.io/name: kube-state-metrics
       app.kubernetes.io/version: v2.3.0
     name: kube-state-metrics
     namespace: ops-monit
   spec:
     ports:
     - name: http-metrics
       port: 8080
       targetPort: http-metrics
       nodePort: 30686
     - name: telemetry
       port: 8081
       targetPort: telemetry
     type: NodePort
     selector:
       app.kubernetes.io/name: kube-state-metrics
   EOF
   ```

   ```bash
   cat > rbac.yaml <<\EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: prometheus
     namespace: kube-system
   ---
   apiVersion: v1
   kind: Secret
   type: kubernetes.io/service-account-token
   metadata:
     name: prometheus
     namespace: kube-system
     annotations:
       kubernetes.io/service-account.name: "prometheus"
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     name: prometheus
   rules:
   - apiGroups:
     - ""
     resources:
     - nodes
     - services
     - endpoints
     - pods
     - nodes/proxy
     verbs:
     - get
     - list
     - watch
   - apiGroups:
     - "extensions"
     resources:
     - ingresses
     verbs:
     - get
     - list
     - watch
   - apiGroups:
     - ""
     resources:
     - configmaps
     - nodes/metrics
     verbs:
     - get
   - nonResourceURLs:
     - /metrics
     verbs:
     - get
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: prometheus
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: prometheus
   subjects:
   - kind: ServiceAccount
     name: prometheus
     namespace: kube-system
   EOF
   ```
4. Create the namespace

   ```bash
   kubectl create namespace ops-monit
   ```

5. Start the monitoring service

   ```bash
   kubectl apply -f .
   ```

6. Retrieve the token

   ```bash
   kubectl describe secret $(kubectl describe sa prometheus -n kube-system | sed -n '7p' | awk '{print $2}') -n kube-system | tail -n1 | awk '{print $2}'
   ```

   Copy the token content into the `/usr/local/prometheus/privatedeploy_kubernetes.token` file for the Prometheus service.
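
A few optional checks after the apply. Since `rbac.yaml` creates the token Secret with a fixed name, the token can also be read directly instead of parsing `kubectl describe` output:

```bash
# The kube-state-metrics pod should reach Running state
kubectl get pods -n ops-monit

# The NodePort should answer with metrics (placeholder IP -- use your k8s master address)
curl -s http://192.168.10.20:30686/metrics | head -n 5

# Alternative token retrieval via the Secret defined in rbac.yaml; if kubectl does not
# run on the Prometheus server, copy the output into the token file there instead
kubectl get secret prometheus -n kube-system -o jsonpath='{.data.token}' | base64 -d \
  > /usr/local/prometheus/privatedeploy_kubernetes.token
```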
## Deploying Prometheus

- Download the Prometheus installation package

  ```bash
  wget https://pdpublic.mingdao.com/private-deployment/offline/common/prometheus-3.5.0.linux-amd64.tar.gz
  ```

- Extract the package

  ```bash
  tar -zxvf prometheus-3.5.0.linux-amd64.tar.gz -C /usr/local/
  mv /usr/local/prometheus-3.5.0.linux-amd64 /usr/local/prometheus
  ```

- Configure the `prometheus.yml` file

  ```yaml
  global:
    scrape_interval: 15s

  scrape_configs:
    # Server monitoring
    - job_name: "node_exporter"
      static_configs:
        - targets: ["192.168.10.20:59100"]
          labels:
            nodename: hap-nginx-01
            origin_prometheus: node
        - targets: ["192.168.10.21:59100"]
          labels:
            nodename: hap-k8s-service-01
            origin_prometheus: node
        - targets: ["192.168.10.2:59100"]
          labels:
            nodename: hap-k8s-service-02
            origin_prometheus: node
        - targets: ["192.168.10.3:59100"]
          labels:
            nodename: hap-middleware-01
            origin_prometheus: node
        - targets: ["192.168.10.4:59100"]
          labels:
            nodename: hap-db-01
            origin_prometheus: node

    # Docker monitoring
    - job_name: "cadvisor"
      static_configs:
        - targets:
            - 192.168.10.16:59101

    # Kafka monitoring
    - job_name: kafka_exporter
      static_configs:
        - targets: ["192.168.10.7:59102"]

    # K8s monitoring
    - job_name: privatedeploy_kubernetes_metrics
      static_configs:
        - targets: ["192.168.10.20:30686"] # Remember to replace with the k8s master node address
          labels:
            origin_prometheus: kubernetes

    - job_name: 'privatedeploy_kubernetes_cadvisor'
      scheme: https
      metrics_path: /metrics/cadvisor
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
      kubernetes_sd_configs:
        - role: node
          api_server: https://192.168.10.20:6443 # Remember to replace with the k8s master node address
          bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
          tls_config:
            insecure_skip_verify: true
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: 192.168.10.20:6443 # Remember to replace with the k8s master node address
        - target_label: origin_prometheus
          replacement: kubernetes
        - source_labels: [__meta_kubernetes_node_name]
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
        - source_labels: [instance]
          separator: ;
          regex: (.+)
          target_label: node
          replacement: $1
          action: replace
        - source_labels: [pod_name]
          separator: ;
          regex: (.+)
          target_label: pod
          replacement: $1
          action: replace
        - source_labels: [container_name]
          separator: ;
          regex: (.+)
          target_label: container
          replacement: $1
          action: replace
  ```
- Create the Prometheus systemd service file

  ```bash
  cat > /etc/systemd/system/prometheus.service <<'EOF'
  [Unit]
  Description=Prometheus Monitoring System
  Documentation=https://prometheus.io/docs/introduction/overview/
  After=network.target

  [Service]
  Type=simple
  ExecStart=/usr/local/prometheus/prometheus \
    --storage.tsdb.path=/data/prometheus/data \
    --storage.tsdb.retention.time=30d \
    --config.file=/usr/local/prometheus/prometheus.yml \
    --web.enable-lifecycle
  ExecReload=/usr/bin/curl -X POST http://127.0.0.1:9090/-/reload
  User=root
  Group=root
  Restart=always
  RestartSec=10
  LimitNOFILE=102400

  [Install]
  WantedBy=multi-user.target
  EOF
  ```

- Start Prometheus

  ```bash
  systemctl daemon-reload
  systemctl enable prometheus
  systemctl start prometheus
  ```

  When the Prometheus configuration is modified, you can hot-reload it with `systemctl reload prometheus`.
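
Two optional sanity checks: validate the configuration with promtool (shipped in the Prometheus tarball) before starting, and confirm the scrape targets are healthy afterwards:

```bash
# Validate prometheus.yml syntax
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

# Count target health states once Prometheus is running (expect only "up")
curl -s http://127.0.0.1:9090/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c
```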
## Deploying Grafana

- Download the Grafana installation package

  ```bash
  wget https://pdpublic.mingdao.com/private-deployment/offline/common/grafana_12.1.2_17957162798_linux_amd64.tar.gz
  ```

- Extract the package

  ```bash
  tar -xf grafana_12.1.2_17957162798_linux_amd64.tar.gz -C /usr/local/
  mv /usr/local/grafana-12.1.2 /usr/local/grafana
  ```

- Modify the `root_url` value in the `/usr/local/grafana/conf/defaults.ini` file as follows

  ```ini
  root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/
  ```

  ```bash
  # One-click modification
  sed -ri 's#^root_url = .*#root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/#' /usr/local/grafana/conf/defaults.ini
  grep "^root_url" /usr/local/grafana/conf/defaults.ini
  ```

- Modify the `serve_from_sub_path` value in the `/usr/local/grafana/conf/defaults.ini` file as follows

  ```ini
  serve_from_sub_path = true
  ```

  ```bash
  # One-click modification
  sed -ri 's#^serve_from_sub_path = .*#serve_from_sub_path = true#' /usr/local/grafana/conf/defaults.ini
  grep "^serve_from_sub_path" /usr/local/grafana/conf/defaults.ini
  ```

  If you don't need to access the Grafana page through an Nginx proxy and instead use the Grafana IP directly, change the `domain` value to the actual host.

- Create the Grafana systemd service file

  ```bash
  cat > /etc/systemd/system/grafana.service <<'EOF'
  [Unit]
  Description=Grafana Dashboard
  Documentation=https://grafana.com/docs/
  After=network.target

  [Service]
  Type=simple
  WorkingDirectory=/usr/local/grafana
  ExecStart=/usr/local/grafana/bin/grafana-server web
  User=root
  Group=root
  Restart=always
  RestartSec=10
  LimitNOFILE=102400

  [Install]
  WantedBy=multi-user.target
  EOF
  ```

- Start Grafana

  ```bash
  systemctl daemon-reload
  systemctl enable grafana
  systemctl start grafana
  ```
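
Grafana exposes a health endpoint that can confirm the service is up:

```bash
# Expect a small JSON body reporting database status "ok"
curl -s http://127.0.0.1:3000/api/health
```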
## Configuring a Reverse Proxy for the Grafana Address

The Nginx reverse proxy configuration for the Grafana page should follow the reference rules below:

```nginx
upstream grafana {
    server 192.168.1.10:3000;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;
    server_name hap.domain.com;
    access_log /data/logs/weblogs/grafana.log main;
    error_log /data/logs/weblogs/grafana.mingdao.net.error.log;

    location /privatedeploy/mdy/monitor/grafana/ {
        #allow 1.1.1.1;
        #deny all;
        proxy_hide_header X-Frame-Options;
        proxy_set_header X-Frame-Options ALLOWALL;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
        proxy_redirect http://localhost:3000 http://hap.domain.com:80/privatedeploy/mdy/monitor/grafana;
    }

    location /privatedeploy/mdy/monitor/grafana/api/live {
        rewrite ^/(.*) /$1 break;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
    }
}
```
- Once the proxy configuration is complete, visit the Grafana page at http://hap.domain.com/privatedeploy/mdy/monitor/grafana, and then configure the dashboard.

- If you want to proxy Prometheus as well, you can add the following rules (usually not needed; note that the Prometheus page has no built-in authentication, so pay attention to access security):

  ```nginx
  upstream prometheus {
      server 192.168.1.10:9090;
  }

  location /privatedeploy/mdy/monitor/prometheus {
      rewrite ^/privatedeploy/mdy/monitor/prometheus$ / break;
      rewrite ^/privatedeploy/mdy/monitor/prometheus/(.*)$ /$1 break;
      proxy_pass http://prometheus;
      proxy_redirect /graph /privatedeploy/mdy/monitor/prometheus/graph;
  }
  ```
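
After editing the Nginx configuration, validate and reload it, then confirm Grafana answers through the proxy:

```bash
# Test the configuration before applying it
nginx -t && nginx -s reload
# Expect an HTTP 200 (or a 302 redirect to the login page)
curl -sI http://hap.domain.com/privatedeploy/mdy/monitor/grafana/login | head -n 1
```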