Deployment of Prometheus Monitoring System

Explanation of Monitoring System Components

Service Name       | Service Port | Function
-------------------|--------------|----------------------------------------
node_exporter      | 59100        | Collect server runtime metric data
cadvisor           | 59101        | Collect container runtime metric data
kafka_exporter     | 59102        | Collect metrics of Kafka topics
kube-state-metrics | 30686        | Collect metrics of the k8s cluster
prometheus         | 9090         | Collect and store monitoring data
grafana            | 3000         | Visualize monitoring data

The node_exporter service needs to be deployed on every server.

The cadvisor service only needs to be deployed on servers running Docker, such as each node in the file cluster.

The kafka_exporter service only needs to be deployed on a single node within the Kafka cluster.

The kube-state-metrics service only needs to be deployed from one k8s master node, but its image must be pulled on every node in the k8s cluster.

Prometheus and Grafana can be deployed on the same server.

Network Port Connectivity Requirements:

  • The server running Prometheus must have connectivity to port 59100 on all servers where node_exporter is deployed.
  • The server running Prometheus must have connectivity to port 59101 on all servers where cadvisor is deployed.
  • The server running Prometheus must have connectivity to port 59102 on the server where kafka_exporter is deployed.
  • The server running Prometheus must have connectivity to ports 6443 and 30686 on the k8s master server.
  • The server running Grafana must have connectivity to port 9090 on the Prometheus server.
  • If a reverse proxy for Grafana's address is configured:
    • The proxy server needs connectivity to Grafana's port 3000.
    • If Prometheus is also reverse-proxied, the proxy server likewise needs connectivity to port 9090 on the Prometheus server.
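
A quick way to spot-check these requirements from the Prometheus server is to probe each exporter port with curl (the IPs below are placeholders taken from the examples later in this document; adjust the list to match your deployment):

    # Each reachable target should report HTTP code 200
    for target in 192.168.10.20:59100 192.168.10.16:59101 192.168.10.7:59102; do
      curl -s -o /dev/null -w "$target => %{http_code}\n" --connect-timeout 3 "http://$target/metrics"
    done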

Deployment of node_exporter

  1. Download the node_exporter installation package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/node_exporter-1.9.1.linux-amd64.tar.gz
  2. Extract node_exporter

    tar xf node_exporter-1.9.1.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/node_exporter-1.9.1.linux-amd64 /usr/local/node_exporter
  3. Write the systemd service file for node_exporter

    cat > /etc/systemd/system/node_exporter.service <<'EOF'
    [Unit]
    Description=Node Exporter for Prometheus
    Documentation=https://github.com/prometheus/node_exporter
    After=network.target

    [Service]
    Type=simple
    ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=:59100
    User=root
    Group=root
    Restart=always
    RestartSec=10
    LimitNOFILE=102400

    [Install]
    WantedBy=multi-user.target
    EOF
  4. Start node_exporter

    systemctl daemon-reload
    systemctl enable node_exporter
    systemctl start node_exporter
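
  • To verify that node_exporter is up and serving metrics, a quick local check (assuming curl is available):

    systemctl status node_exporter --no-pager
    curl -s http://127.0.0.1:59100/metrics | head -n 5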

Deployment of cadvisor

  1. Download

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/cadvisor-v0.52.1-linux-amd64
  2. Create the cadvisor directory

    mkdir /usr/local/cadvisor
  3. Move and add executable permission

    mv cadvisor-v0.52.1-linux-amd64 /usr/local/cadvisor/cadvisor
    chmod +x /usr/local/cadvisor/cadvisor
  4. Write the systemd service file for cadvisor

    cat > /etc/systemd/system/cadvisor.service <<'EOF'
    [Unit]
    Description=cAdvisor Container Monitoring
    Documentation=https://github.com/google/cadvisor
    After=network.target

    [Service]
    Type=simple
    ExecStart=/usr/local/cadvisor/cadvisor -port=59101
    User=root
    Group=root
    Restart=always
    RestartSec=10
    LimitNOFILE=102400

    [Install]
    WantedBy=multi-user.target
    EOF
  5. Start cadvisor

    systemctl daemon-reload
    systemctl enable cadvisor
    systemctl start cadvisor
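
  • To confirm cadvisor is collecting container metrics, query its endpoint (assumes curl is available and at least one container is running):

    curl -s http://127.0.0.1:59101/metrics | grep -m 5 ^container_cpu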

Deployment of kafka_exporter

  1. Download the installation package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/kafka_exporter-1.9.0.linux-amd64.tar.gz
  2. Extract to the installation directory

    tar -zxvf kafka_exporter-1.9.0.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/kafka_exporter-1.9.0.linux-amd64 /usr/local/kafka_exporter
  3. Write the systemd service file for kafka_exporter

    # Replace the --kafka.server parameter below with your actual Kafka broker address
    cat > /etc/systemd/system/kafka_exporter.service <<'EOF'
    [Unit]
    Description=Kafka Exporter for Prometheus
    Documentation=https://github.com/danielqsj/kafka_exporter
    After=network.target

    [Service]
    Type=simple
    ExecStart=/usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.1.2:9092 --web.listen-address=:59102
    User=root
    Group=root
    Restart=always
    RestartSec=10
    LimitNOFILE=102400

    [Install]
    WantedBy=multi-user.target
    EOF
  4. Start kafka_exporter

    systemctl daemon-reload
    systemctl enable kafka_exporter
    systemctl start kafka_exporter
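
  • To confirm kafka_exporter can reach the brokers, check the kafka_brokers gauge, which reports the number of brokers it discovered (assuming curl is available):

    curl -s http://127.0.0.1:59102/metrics | grep ^kafka_brokers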

Deployment of kube-state-metrics

  1. Download the image (every node in the k8s cluster needs to pull it)

    crictl pull registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
  2. Create directory for configuration files

    mkdir -p /usr/local/kubernetes/ops-monit
    cd /usr/local/kubernetes/ops-monit
  3. Write deployment configuration files

    cat > cluster-role-binding.yaml <<\EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
      - kind: ServiceAccount
        name: kube-state-metrics
        namespace: ops-monit
    EOF

    cat > cluster-role.yaml <<\EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
    rules:
      - apiGroups:
          - ""
        resources:
          - configmaps
          - secrets
          - nodes
          - pods
          - services
          - resourcequotas
          - replicationcontrollers
          - limitranges
          - persistentvolumeclaims
          - persistentvolumes
          - namespaces
          - endpoints
        verbs:
          - list
          - watch
      - apiGroups:
          - extensions
        resources:
          - daemonsets
          - deployments
          - replicasets
          - ingresses
        verbs:
          - list
          - watch
      - apiGroups:
          - apps
        resources:
          - statefulsets
          - daemonsets
          - deployments
          - replicasets
        verbs:
          - list
          - watch
      - apiGroups:
          - batch
        resources:
          - cronjobs
          - jobs
        verbs:
          - list
          - watch
      - apiGroups:
          - autoscaling
        resources:
          - horizontalpodautoscalers
        verbs:
          - list
          - watch
      - apiGroups:
          - authentication.k8s.io
        resources:
          - tokenreviews
        verbs:
          - create
      - apiGroups:
          - authorization.k8s.io
        resources:
          - subjectaccessreviews
        verbs:
          - create
      - apiGroups:
          - policy
        resources:
          - poddisruptionbudgets
        verbs:
          - list
          - watch
      - apiGroups:
          - certificates.k8s.io
        resources:
          - certificatesigningrequests
        verbs:
          - list
          - watch
      - apiGroups:
          - storage.k8s.io
        resources:
          - storageclasses
          - volumeattachments
        verbs:
          - list
          - watch
      - apiGroups:
          - admissionregistration.k8s.io
        resources:
          - mutatingwebhookconfigurations
          - validatingwebhookconfigurations
        verbs:
          - list
          - watch
      - apiGroups:
          - networking.k8s.io
        resources:
          - networkpolicies
        verbs:
          - list
          - watch
    EOF

    cat > deployment.yaml <<\EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
      namespace: ops-monit
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      template:
        metadata:
          labels:
            app.kubernetes.io/name: kube-state-metrics
            app.kubernetes.io/version: v2.3.0
        spec:
          containers:
            - image: registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: 8080
                initialDelaySeconds: 5
                timeoutSeconds: 5
              name: kube-state-metrics
              ports:
                - containerPort: 8080
                  name: http-metrics
                - containerPort: 8081
                  name: telemetry
              readinessProbe:
                httpGet:
                  path: /
                  port: 8081
                initialDelaySeconds: 5
                timeoutSeconds: 5
          nodeSelector:
            kubernetes.io/os: linux
          serviceAccountName: kube-state-metrics
    EOF

    cat > service-account.yaml <<EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
      namespace: ops-monit
    EOF

    cat > service.yaml <<\EOF
    apiVersion: v1
    kind: Service
    metadata:
      # annotations:
      #   prometheus.io/scrape: 'true'
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
      namespace: ops-monit
    spec:
      ports:
        - name: http-metrics
          port: 8080
          targetPort: http-metrics
          nodePort: 30686
        - name: telemetry
          port: 8081
          targetPort: telemetry
      type: NodePort
      selector:
        app.kubernetes.io/name: kube-state-metrics
    EOF

    cat > rbac.yaml <<\EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: kube-system
    ---
    apiVersion: v1
    kind: Secret
    type: kubernetes.io/service-account-token
    metadata:
      name: prometheus
      namespace: kube-system
      annotations:
        kubernetes.io/service-account.name: "prometheus"
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
          - services
          - endpoints
          - pods
          - nodes/proxy
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - "extensions"
        resources:
          - ingresses
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - configmaps
          - nodes/metrics
        verbs:
          - get
      - nonResourceURLs:
          - /metrics
        verbs:
          - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
      - kind: ServiceAccount
        name: prometheus
        namespace: kube-system
    EOF

  4. Create the namespace

    kubectl create namespace ops-monit

  5. Start the monitoring service

    kubectl apply -f .
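
  • To verify the deployment, check that the pod reaches the Running state and that the NodePort answers (run the curl from any node in the cluster):

    kubectl -n ops-monit get pods,svc
    curl -s http://127.0.0.1:30686/metrics | head -n 5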

  6. Retrieve the token

    kubectl describe secret $(kubectl describe sa prometheus -n kube-system | sed -n '7p' | awk '{print $2}') -n kube-system | tail -n1 | awk '{print $2}'

  • Copy the token content into the /usr/local/prometheus/privatedeploy_kubernetes.token file for the prometheus service.
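  • The sed/awk pipeline above depends on the exact layout of kubectl's output; an equivalent sketch that reads the token directly from the Secret created in rbac.yaml:

    kubectl -n kube-system get secret prometheus -o jsonpath='{.data.token}' | base64 -d > /usr/local/prometheus/privatedeploy_kubernetes.token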

Deploying Prometheus

  1. Download the Prometheus installation package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/prometheus-3.5.0.linux-amd64.tar.gz
  2. Extract the package

    tar -zxvf prometheus-3.5.0.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/prometheus-3.5.0.linux-amd64 /usr/local/prometheus
  3. Configure the prometheus.yml file

    global:
      scrape_interval: 15s

    scrape_configs:
      # Server monitoring
      - job_name: "node_exporter"
        static_configs:
          - targets: ["192.168.10.20:59100"]
            labels:
              nodename: hap-nginx-01
              origin_prometheus: node
          - targets: ["192.168.10.21:59100"]
            labels:
              nodename: hap-k8s-service-01
              origin_prometheus: node
          - targets: ["192.168.10.2:59100"]
            labels:
              nodename: hap-k8s-service-02
              origin_prometheus: node
          - targets: ["192.168.10.3:59100"]
            labels:
              nodename: hap-middleware-01
              origin_prometheus: node
          - targets: ["192.168.10.3:59100"]
            labels:
              nodename: hap-db-01
              origin_prometheus: node

      # Docker monitoring
      - job_name: "cadvisor"
        static_configs:
          - targets:
              - 192.168.10.16:59101

      # Kafka monitoring
      - job_name: kafka_exporter
        static_configs:
          - targets: ["192.168.10.7:59102"]

      # K8s monitoring
      - job_name: privatedeploy_kubernetes_metrics
        static_configs:
          - targets: ["192.168.10.20:30686"] # Remember to replace with K8s main node address
            labels:
              origin_prometheus: kubernetes

      - job_name: 'privatedeploy_kubernetes_cadvisor'
        scheme: https
        metrics_path: /metrics/cadvisor
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
        kubernetes_sd_configs:
          - role: node
            api_server: https://192.168.10.20:6443 # Remember to replace with K8s main node address
            bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
            tls_config:
              insecure_skip_verify: true
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: 192.168.10.20:6443 # Remember to replace with K8s main node address
          - target_label: origin_prometheus
            replacement: kubernetes
          - source_labels: [__meta_kubernetes_node_name]
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        metric_relabel_configs:
          - source_labels: [instance]
            separator: ;
            regex: (.+)
            target_label: node
            replacement: $1
            action: replace
          - source_labels: [pod_name]
            separator: ;
            regex: (.+)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [container_name]
            separator: ;
            regex: (.+)
            target_label: container
            replacement: $1
            action: replace
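
  • Before starting the service, you can validate the configuration with promtool, which ships in the same tarball:

    /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml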

  4. Create the Prometheus systemd service file

    cat > /etc/systemd/system/prometheus.service <<'EOF'
    [Unit]
    Description=Prometheus Monitoring System
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network.target

    [Service]
    Type=simple
    ExecStart=/usr/local/prometheus/prometheus \
    --storage.tsdb.path=/data/prometheus/data \
    --storage.tsdb.retention.time=30d \
    --config.file=/usr/local/prometheus/prometheus.yml \
    --web.enable-lifecycle
    ExecReload=/usr/bin/curl -X POST http://127.0.0.1:9090/-/reload
    User=root
    Group=root
    Restart=always
    RestartSec=10
    LimitNOFILE=102400

    [Install]
    WantedBy=multi-user.target
    EOF
  5. Start Prometheus

    systemctl daemon-reload
    systemctl enable prometheus
    systemctl start prometheus

  • When the Prometheus configuration is modified, you can hot-reload it with systemctl reload prometheus.
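  • To confirm Prometheus is up and scraping, query its HTTP API (assuming curl is available):

    curl -s http://127.0.0.1:9090/-/healthy
    curl -s 'http://127.0.0.1:9090/api/v1/targets?state=active' | head -c 300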

Deploying Grafana

  1. Download the Grafana installation package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/grafana_12.1.2_17957162798_linux_amd64.tar.gz
  2. Extract the package

    tar -xf grafana_12.1.2_17957162798_linux_amd64.tar.gz -C /usr/local/
    mv /usr/local/grafana-12.1.2 /usr/local/grafana
  3. Modify the root_url value in the /usr/local/grafana/conf/defaults.ini file as follows

    root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/

    # One-click modification
    sed -ri 's#^root_url = .*#root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/#' /usr/local/grafana/conf/defaults.ini
    grep "^root_url" /usr/local/grafana/conf/defaults.ini
  4. Modify the serve_from_sub_path value in the /usr/local/grafana/conf/defaults.ini file as follows

    serve_from_sub_path = true

    # One-click modification
    sed -ri 's#^serve_from_sub_path = .*#serve_from_sub_path = true#' /usr/local/grafana/conf/defaults.ini
    grep "^serve_from_sub_path" /usr/local/grafana/conf/defaults.ini

  • If you don't need to access the Grafana page through an Nginx proxy and instead access Grafana directly by IP, you should change the domain value to the actual host address.
  5. Create the Grafana systemd service file

    cat > /etc/systemd/system/grafana.service <<'EOF'
    [Unit]
    Description=Grafana Dashboard
    Documentation=https://grafana.com/docs/
    After=network.target

    [Service]
    Type=simple
    WorkingDirectory=/usr/local/grafana
    ExecStart=/usr/local/grafana/bin/grafana-server web
    User=root
    Group=root
    Restart=always
    RestartSec=10
    LimitNOFILE=102400

    [Install]
    WantedBy=multi-user.target
    EOF
  6. Start Grafana

    systemctl daemon-reload
    systemctl enable grafana
    systemctl start grafana
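
  • To check that Grafana is responding, probe its health endpoint (assuming curl is available):

    curl -s http://127.0.0.1:3000/api/health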

Configuring Reverse Proxy for Grafana Address

The Nginx reverse proxy configuration file for proxying the Grafana page should follow the reference rules below

upstream grafana {
    server 192.168.1.10:3000;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;
    server_name hap.domain.com;
    access_log /data/logs/weblogs/grafana.log main;
    error_log /data/logs/weblogs/grafana.mingdao.net.error.log;

    location /privatedeploy/mdy/monitor/grafana/ {
        #allow 1.1.1.1;
        #deny all;
        proxy_hide_header X-Frame-Options;
        proxy_set_header X-Frame-Options ALLOWALL;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
        proxy_redirect http://localhost:3000 http://hap.domain.com:80/privatedeploy/mdy/monitor/grafana;
    }

    location /privatedeploy/mdy/monitor/grafana/api/live {
        rewrite ^/(.*) /$1 break;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
    }
}
  • Once the proxy configuration is complete, visit the Grafana page at http://hap.domain.com/privatedeploy/mdy/monitor/grafana, and then configure the dashboard.

    • If you want to proxy Prometheus as well, you can add the rules below (this is usually unnecessary, and since the Prometheus UI has no built-in authentication, pay close attention to access control)

      upstream prometheus {
          server 192.168.1.10:9090;
      }

      location /privatedeploy/mdy/monitor/prometheus {
          rewrite ^/privatedeploy/mdy/monitor/prometheus$ / break;
          rewrite ^/privatedeploy/mdy/monitor/prometheus/(.*)$ /$1 break;
          proxy_pass http://prometheus;
          proxy_redirect /graph /privatedeploy/mdy/monitor/prometheus/graph;
      }
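
  • After writing the proxy configuration, validate and reload Nginx, then spot-check the proxied paths (the hostname is the example used above; the Prometheus check applies only if you added the optional rules):

    nginx -t && nginx -s reload
    curl -sI http://hap.domain.com/privatedeploy/mdy/monitor/grafana/api/health
    curl -sI http://hap.domain.com/privatedeploy/mdy/monitor/prometheus/-/healthy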