Deployment of Prometheus Monitoring System
Explanation of Monitoring System Components
| Service Name | Service Port | Function |
|---|---|---|
| node_exporter | 59100 | Collect server runtime metric data |
| cadvisor | 59101 | Collect container runtime metric data |
| kafka_exporter | 59102 | Collect metrics of Kafka topics |
| kube-state-metrics | 30686 | Collect metrics of the k8s cluster |
| prometheus | 9090 | Collect and store monitoring data |
| grafana | 3000 | Visualize monitoring data |
The node_exporter service needs to be deployed on every server.
The cadvisor service needs to be deployed only on servers running Docker, such as each node in the file cluster.
The kafka_exporter service only needs to be deployed on any one node within the Kafka cluster.
kube-state-metrics only needs to be deployed on one k8s master node, but its image must be downloaded on every node of the k8s cluster.
Prometheus and Grafana can be deployed on the same server.
Network Port Connectivity Requirements:
- The server running Prometheus must have connectivity to port 59100 on all servers where node_exporter is deployed.
- The server running Prometheus must have connectivity to port 59101 on all servers where cadvisor is deployed.
- The server running Prometheus must have connectivity to port 59102 on the server where kafka_exporter is deployed.
- The server running Prometheus must have connectivity to ports 6443 and 30686 on the k8s master server.
- The server running Grafana must have connectivity to port 9090 on the Prometheus server.
- If a reverse proxy for Grafana's address is configured:
  - The proxy server needs to have connectivity to Grafana's port 3000.
  - If Prometheus is also reverse-proxied, the proxy server must have connectivity to the Prometheus server's port 9090.
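Before deploying anything, the reachability requirements above can be spot-checked from the Prometheus host with a small script. This is only a sketch: the IP:port pairs below are placeholders for your environment, and it relies on bash's built-in `/dev/tcp` pseudo-device.

```shell
#!/usr/bin/env bash
# Check TCP reachability of each exporter endpoint from the Prometheus host.
# The targets below are placeholders; substitute your real server addresses.
port_open() {
  local host=$1 port=$2
  # /dev/tcp is a bash built-in; timeout bounds each connection attempt.
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

for target in 192.168.10.20:59100 192.168.10.16:59101 192.168.10.7:59102; do
  if port_open "${target%:*}" "${target#*:}"; then
    echo "OK   ${target}"
  else
    echo "FAIL ${target}"
  fi
done
```

Any `FAIL` line means a firewall or routing rule still needs to be opened before Prometheus can scrape that target.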
Deployment of node_exporter
- Download the node_exporter installation package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/node_exporter-1.9.1.linux-amd64.tar.gz
```

- Extract node_exporter

```bash
tar xf node_exporter-1.9.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-1.9.1.linux-amd64 /usr/local/node_exporter
```

- Write the systemd service file for node_exporter

```bash
cat > /etc/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=Node Exporter for Prometheus
Documentation=https://github.com/prometheus/node_exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=:59100
User=root
Group=root
Restart=always
RestartSec=10
LimitNOFILE=102400

[Install]
WantedBy=multi-user.target
EOF
```

- Start node_exporter

```bash
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
```
Deployment of cadvisor
- Download

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/cadvisor-v0.52.1-linux-amd64
```

- Create the cadvisor directory

```bash
mkdir /usr/local/cadvisor
```

- Move and add executable permission

```bash
mv cadvisor-v0.52.1-linux-amd64 /usr/local/cadvisor/cadvisor
chmod +x /usr/local/cadvisor/cadvisor
```

- Write the systemd service file for cadvisor

```bash
cat > /etc/systemd/system/cadvisor.service <<'EOF'
[Unit]
Description=cAdvisor Container Monitoring
Documentation=https://github.com/google/cadvisor
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/cadvisor/cadvisor -port=59101
User=root
Group=root
Restart=always
RestartSec=10
LimitNOFILE=102400

[Install]
WantedBy=multi-user.target
EOF
```

- Start cadvisor

```bash
systemctl daemon-reload
systemctl enable cadvisor
systemctl start cadvisor
```
Deployment of kafka_exporter
- Download the installation package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/kafka_exporter-1.9.0.linux-amd64.tar.gz
```

- Extract to the installation directory

```bash
tar -zxvf kafka_exporter-1.9.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/kafka_exporter-1.9.0.linux-amd64 /usr/local/kafka_exporter
```

- Write the systemd service file for kafka_exporter

```bash
# Note: replace the --kafka.server parameter with your actual Kafka address
cat > /etc/systemd/system/kafka_exporter.service <<'EOF'
[Unit]
Description=Kafka Exporter for Prometheus
Documentation=https://github.com/danielqsj/kafka_exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.1.2:9092 --web.listen-address=:59102
User=root
Group=root
Restart=always
RestartSec=10
LimitNOFILE=102400

[Install]
WantedBy=multi-user.target
EOF
```

- Start kafka_exporter

```bash
systemctl daemon-reload
systemctl enable kafka_exporter
systemctl start kafka_exporter
```
Deployment of kube-state-metrics
1. Download the image (all nodes in the k8s cluster need to download the image)

- Server with internet access:

```bash
crictl pull registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
```

- Server without internet access:

```bash
# Offline image file download link; upload the file to the deployment server after downloading
wget https://pdpublic.mingdao.com/private-deployment/offline/common/kube-state-metrics.tar.gz
# Extract the image file
gunzip -d kube-state-metrics.tar.gz
# Import the offline image
ctr -n k8s.io image import kube-state-metrics.tar
```
2. Create the directory for configuration files

```bash
mkdir -p /usr/local/kubernetes/ops-monit
cd /usr/local/kubernetes/ops-monit
```
3. Write the deployment configuration files

```bash
cat > cluster-role-binding.yaml <<\EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: ops-monit
EOF

cat > cluster-role.yaml <<\EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - list
  - watch
EOF

cat > deployment.yaml <<\EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
EOF

cat > service-account.yaml <<\EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
EOF

cat > service.yaml <<\EOF
apiVersion: v1
kind: Service
metadata:
  # annotations:
  #   prometheus.io/scrape: 'true'
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.3.0
  name: kube-state-metrics
  namespace: ops-monit
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    nodePort: 30686
  - name: telemetry
    port: 8081
    targetPort: telemetry
  type: NodePort
  selector:
    app.kubernetes.io/name: kube-state-metrics
EOF

cat > rbac.yaml <<\EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
EOF
```
4. Create the namespace

```bash
kubectl create namespace ops-monit
```

5. Start the monitoring service

```bash
kubectl apply -f .
```

6. Retrieve the token

```bash
kubectl describe secret $(kubectl describe sa prometheus -n kube-system | sed -n '7p' | awk '{print $2}') -n kube-system | tail -n1 | awk '{print $2}'
```

- Copy the token content into the `/usr/local/prometheus/privatedeploy_kubernetes.token` file for the prometheus service.
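Parsing `kubectl describe` output with `sed -n '7p'` is fragile across kubectl versions. As an alternative sketch, the token can be read directly from the Secret named `prometheus` created by rbac.yaml above; the `command -v` guard simply makes this a no-op on hosts without kubectl.

```shell
# Read the service-account token straight from the Secret. Secret data
# fields are base64-encoded, hence the base64 -d. No-op if kubectl is absent.
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n kube-system get secret prometheus -o jsonpath='{.data.token}' \
    | base64 -d > /usr/local/prometheus/privatedeploy_kubernetes.token
fi
```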
Deployment of Prometheus
- Download the Prometheus installation package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/prometheus-3.5.0.linux-amd64.tar.gz
```

- Extract the package

```bash
tar -zxvf prometheus-3.5.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/prometheus-3.5.0.linux-amd64 /usr/local/prometheus
```
- Configure the `prometheus.yml` file

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  # Server monitoring
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.10.20:59100"]
        labels:
          nodename: hap-nginx-01
          origin_prometheus: node
      - targets: ["192.168.10.21:59100"]
        labels:
          nodename: hap-k8s-service-01
          origin_prometheus: node
      - targets: ["192.168.10.2:59100"]
        labels:
          nodename: hap-k8s-service-02
          origin_prometheus: node
      - targets: ["192.168.10.3:59100"]
        labels:
          nodename: hap-middleware-01
          origin_prometheus: node
      - targets: ["192.168.10.3:59100"]
        labels:
          nodename: hap-db-01
          origin_prometheus: node

  # Docker monitoring
  - job_name: "cadvisor"
    static_configs:
      - targets:
          - 192.168.10.16:59101

  # Kafka monitoring
  - job_name: kafka_exporter
    static_configs:
      - targets: ["192.168.10.7:59102"]

  # K8s monitoring
  - job_name: privatedeploy_kubernetes_metrics
    static_configs:
      - targets: ["192.168.10.20:30686"] # Remember to replace with the K8s master node address
        labels:
          origin_prometheus: kubernetes

  - job_name: 'privatedeploy_kubernetes_cadvisor'
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.10.20:6443 # Remember to replace with the K8s master node address
        bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
        tls_config:
          insecure_skip_verify: true
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: 192.168.10.20:6443 # Remember to replace with the K8s master node address
      - target_label: origin_prometheus
        replacement: kubernetes
      - source_labels: [__meta_kubernetes_node_name]
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    metric_relabel_configs:
      - source_labels: [instance]
        separator: ;
        regex: (.+)
        target_label: node
        replacement: $1
        action: replace
      - source_labels: [pod_name]
        separator: ;
        regex: (.+)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [container_name]
        separator: ;
        regex: (.+)
        target_label: container
        replacement: $1
        action: replace
```
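Hand-maintaining the node_exporter target list invites copy-paste slips (two nodenames sharing one IP, for example). As a hypothetical helper, the `static_configs` entries can be generated from a simple "IP nodename" inventory file; the inventory path and hostnames below are illustrative only, and the indentation should be adjusted to match where the output is pasted into prometheus.yml.

```shell
# Hypothetical inventory: one "IP nodename" pair per line.
cat > /tmp/node_inventory.txt <<'EOF'
192.168.10.20 hap-nginx-01
192.168.10.21 hap-k8s-service-01
EOF

# Emit one static_configs entry per host for the node_exporter job.
awk '{
  printf "      - targets: [\"%s:59100\"]\n", $1
  printf "        labels:\n"
  printf "          nodename: %s\n", $2
  printf "          origin_prometheus: node\n"
}' /tmp/node_inventory.txt
```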
- Create the Prometheus systemd service file

```bash
cat > /etc/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=30d \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --web.enable-lifecycle
ExecReload=/usr/bin/curl -X POST http://127.0.0.1:9090/-/reload
User=root
Group=root
Restart=always
RestartSec=10
LimitNOFILE=102400

[Install]
WantedBy=multi-user.target
EOF
```

- Start Prometheus

```bash
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
```

- When the Prometheus configuration is modified, you can hot-reload it with `systemctl reload prometheus`.
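Before triggering a hot reload, it is worth validating the edited configuration with promtool, which ships in the same tarball as the prometheus binary. This is a sketch: the path assumes the extract step above, and the guard makes it a harmless no-op where promtool is not present.

```shell
# Validate prometheus.yml before reloading; skipped if promtool is absent.
if [ -x /usr/local/prometheus/promtool ]; then
  /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
fi
```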
Deployment of Grafana
- Download the Grafana installation package

```bash
wget https://pdpublic.mingdao.com/private-deployment/offline/common/grafana_12.1.2_17957162798_linux_amd64.tar.gz
```

- Extract the package

```bash
tar -xf grafana_12.1.2_17957162798_linux_amd64.tar.gz -C /usr/local/
mv /usr/local/grafana-12.1.2 /usr/local/grafana
```

- Modify the `root_url` value in the `/usr/local/grafana/conf/defaults.ini` file as follows:

```ini
root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/
```

```bash
# One-click modification
sed -ri 's#^root_url = .*#root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/#' /usr/local/grafana/conf/defaults.ini
grep "^root_url" /usr/local/grafana/conf/defaults.ini
```

- Modify the `serve_from_sub_path` value in the `/usr/local/grafana/conf/defaults.ini` file as follows:

```ini
serve_from_sub_path = true
```

```bash
# One-click modification
sed -ri 's#^serve_from_sub_path = .*#serve_from_sub_path = true#' /usr/local/grafana/conf/defaults.ini
grep "^serve_from_sub_path" /usr/local/grafana/conf/defaults.ini
```

  - If you don't need to access the Grafana page through an Nginx proxy and instead access Grafana by IP directly, change the `domain` value to the actual host.

- Create the Grafana systemd service file

```bash
cat > /etc/systemd/system/grafana.service <<'EOF'
[Unit]
Description=Grafana Dashboard
Documentation=https://grafana.com/docs/
After=network.target

[Service]
Type=simple
WorkingDirectory=/usr/local/grafana
ExecStart=/usr/local/grafana/bin/grafana-server web
User=root
Group=root
Restart=always
RestartSec=10
LimitNOFILE=102400

[Install]
WantedBy=multi-user.target
EOF
```

- Start Grafana

```bash
systemctl daemon-reload
systemctl enable grafana
systemctl start grafana
```
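Instead of adding the Prometheus data source by hand in the UI later, it can be pre-provisioned. The fragment below is a sketch of Grafana's data source provisioning format; the Prometheus address is a placeholder, and the file would go under `/usr/local/grafana/conf/provisioning/datasources/` before Grafana is started.

```yaml
# Hypothetical provisioning file; replace the url with your Prometheus server.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.1.10:9090
    isDefault: true
```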
Configuring Reverse Proxy for Grafana Address
The Nginx reverse proxy configuration file for proxying the Grafana page should follow the reference rules below:

```nginx
upstream grafana {
    server 192.168.1.10:3000;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;
    server_name hap.domain.com;
    access_log /data/logs/weblogs/grafana.log main;
    error_log /data/logs/weblogs/grafana.mingdao.net.error.log;

    location /privatedeploy/mdy/monitor/grafana/ {
        #allow 1.1.1.1;
        #deny all;
        proxy_hide_header X-Frame-Options;
        proxy_set_header X-Frame-Options ALLOWALL;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
        proxy_redirect http://localhost:3000 http://hap.domain.com:80/privatedeploy/mdy/monitor/grafana;
    }

    location /privatedeploy/mdy/monitor/grafana/api/live {
        rewrite ^/(.*) /$1 break;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
    }
}
```
- Once the proxy configuration is complete, visit the Grafana page at http://hap.domain.com/privatedeploy/mdy/monitor/grafana and configure the dashboard.

- If you also want to proxy Prometheus, you can add the following rules (usually not needed; note that the Prometheus page has no authentication, so pay attention to access security):

```nginx
upstream prometheus {
    server 192.168.1.10:9090;
}

location /privatedeploy/mdy/monitor/prometheus {
    rewrite ^/privatedeploy/mdy/monitor/prometheus$ / break;
    rewrite ^/privatedeploy/mdy/monitor/prometheus/(.*)$ /$1 break;
    proxy_pass http://prometheus;
    proxy_redirect /graph /privatedeploy/mdy/monitor/prometheus/graph;
}
```