
Deployment of Prometheus Monitoring System

Description of Monitoring System Components

Service Name         Service Port   Function
node_exporter        59100          Collecting various runtime metrics of the server
cadvisor             59101          Collecting various runtime metrics of containers
kafka_exporter       59102          Collecting various metrics of Kafka topics
kube-state-metrics   30686          Collecting various metrics of the k8s cluster
prometheus           9090           Collecting and storing monitoring data
grafana              3000           Visualizing monitoring data

The node_exporter service needs to be deployed on each server.

The cadvisor service needs to be deployed only on servers running Docker, such as each node in a file cluster.

The kafka_exporter service only needs to be deployed on any single node within the Kafka cluster.

The kube-state-metrics service only needs to be deployed on one of the k8s master nodes, but its image must be downloaded on every node in the k8s cluster.

Prometheus and Grafana can be deployed on the same server.

Network port connectivity requirements:

  • The Prometheus server must be able to reach port 59100 on every server where node_exporter is deployed.
  • The Prometheus server must be able to reach port 59101 on every server where cadvisor is deployed.
  • The Prometheus server must be able to reach port 59102 on the server where kafka_exporter is deployed.
  • The Prometheus server must be able to reach ports 6443 and 30686 on the k8s master server.
  • The Grafana server must be able to reach port 9090 on the Prometheus server.
  • If a reverse proxy is configured for the Grafana address, then:
    • The proxy server must be able to reach port 3000 on the Grafana server.
    • If a reverse proxy is also configured for Prometheus, the proxy server must additionally be able to reach port 9090 on the Prometheus server.
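
Once the exporters below are deployed, a quick way to confirm from the Prometheus server that each port is reachable is bash's built-in /dev/tcp redirection. The target list is only an example and should be replaced with your own hosts:

    # Print OK/FAIL for each host:port pair (example targets, replace with your own)
    for target in 192.168.10.20:59100 192.168.10.16:59101 192.168.10.7:59102; do
      host=${target%%:*}; port=${target##*:}
      if timeout 3 bash -c "echo > /dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "OK    ${target}"
      else
        echo "FAIL  ${target}"
      fi
    done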

Deploy Node_exporter

  1. Download the node_exporter installation package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/node_exporter-1.3.1.linux-amd64.tar.gz
  2. Extract the node_exporter

    tar xf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/node_exporter-1.3.1.linux-amd64 /usr/local/node_exporter
  3. Configure start/stop scripts

    cat > /usr/local/node_exporter/start_node_exporter.sh <<EOF
    nohup /usr/local/node_exporter/node_exporter --web.listen-address=:59100 &
    EOF

    cat > /usr/local/node_exporter/stop_node_exporter.sh <<EOF
    kill \$(pgrep -f '/usr/local/node_exporter/node_exporter --web.listen-address=:59100')
    EOF

    chmod +x /usr/local/node_exporter/start_node_exporter.sh
    chmod +x /usr/local/node_exporter/stop_node_exporter.sh
  4. Start node_exporter

    cd /usr/local/node_exporter/
    bash start_node_exporter.sh
  5. Add to startup

    echo "cd /usr/local/node_exporter/ && /bin/bash start_node_exporter.sh" >> /etc/rc.local
    chmod +x /etc/rc.local
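
Optionally, verify that node_exporter is serving metrics. This is a quick sanity check run on the node itself, assuming curl is installed:

    # Metric names beginning with node_ indicate the exporter is working
    curl -s http://127.0.0.1:59100/metrics | grep -m 5 '^node_'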

Deploy Cadvisor

  1. Download

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/cadvisor-v0.47.0-linux-amd64
  2. Create the cadvisor directory

    mkdir /usr/local/cadvisor
  3. Move and add executable permissions

    mv cadvisor-v0.47.0-linux-amd64 /usr/local/cadvisor/cadvisor
    chmod +x /usr/local/cadvisor/cadvisor
  4. Write start and stop scripts

    cat > /usr/local/cadvisor/start_cadvisor.sh <<EOF
    nohup /usr/local/cadvisor/cadvisor -port 59101 &
    EOF

    cat > /usr/local/cadvisor/stop_cadvisor.sh <<EOF
    kill \$(pgrep -f '/usr/local/cadvisor/cadvisor')
    EOF

    chmod +x /usr/local/cadvisor/start_cadvisor.sh
    chmod +x /usr/local/cadvisor/stop_cadvisor.sh
  5. Start cadvisor

    cd /usr/local/cadvisor
    bash start_cadvisor.sh
  6. Add to startup

    echo "cd /usr/local/cadvisor && /bin/bash start_cadvisor.sh" >> /etc/rc.local
    chmod +x /etc/rc.local
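
Optionally, confirm that cadvisor is healthy and exposing container metrics, assuming curl is installed on the host:

    # /healthz returns "ok" when cadvisor is up
    curl -s http://127.0.0.1:59101/healthz

    # Container CPU series should be present once containers are running
    curl -s http://127.0.0.1:59101/metrics | grep -m 3 '^container_cpu'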

Deploy Kafka_exporter

  1. Download the installation package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/kafka_exporter-1.4.2.linux-amd64.tar.gz
  2. Extract to the installation directory

    tar -zxvf kafka_exporter-1.4.2.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/kafka_exporter-1.4.2.linux-amd64 /usr/local/kafka_exporter
  3. Add management scripts

    # Note: replace the Kafka server address below with the actual IP
    cat > /usr/local/kafka_exporter/start_kafka_exporter.sh <<EOF
    nohup /usr/local/kafka_exporter/kafka_exporter --kafka.server=192.168.1.2:9092 --web.listen-address=:59102 &
    EOF

    cat > /usr/local/kafka_exporter/stop_kafka_exporter.sh <<EOF
    kill \$(pgrep -f '/usr/local/kafka_exporter/kafka_exporter')
    EOF

    chmod +x /usr/local/kafka_exporter/start_kafka_exporter.sh
    chmod +x /usr/local/kafka_exporter/stop_kafka_exporter.sh
  4. Start the service

    cd /usr/local/kafka_exporter/ && bash start_kafka_exporter.sh
  5. Add to startup

    echo "sleep 60; cd /usr/local/kafka_exporter/ && bash start_kafka_exporter.sh" >> /etc/rc.local
    chmod +x /etc/rc.local
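
Optionally, check that kafka_exporter can reach the Kafka broker. The kafka_brokers metric reports the number of brokers it sees (assumes curl is installed):

    # Expected output is similar to: kafka_brokers 1
    curl -s http://127.0.0.1:59102/metrics | grep '^kafka_brokers'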

Deploy Kube-state-metrics

  1. Download the image (all nodes in the k8s cluster need to download the image)

    crictl pull registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
  2. Create directory for configuration files

    mkdir -p /usr/local/kubernetes/ops-monit
    cd /usr/local/kubernetes/ops-monit
  3. Write deployment configuration files

    cat > cluster-role-binding.yaml <<\EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: ops-monit
    EOF

    cat > cluster-role.yaml <<\EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
    rules:
    - apiGroups:
      - ""
      resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      verbs:
      - list
      - watch
    - apiGroups:
      - extensions
      resources:
      - daemonsets
      - deployments
      - replicasets
      - ingresses
      verbs:
      - list
      - watch
    - apiGroups:
      - apps
      resources:
      - statefulsets
      - daemonsets
      - deployments
      - replicasets
      verbs:
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - cronjobs
      - jobs
      verbs:
      - list
      - watch
    - apiGroups:
      - autoscaling
      resources:
      - horizontalpodautoscalers
      verbs:
      - list
      - watch
    - apiGroups:
      - authentication.k8s.io
      resources:
      - tokenreviews
      verbs:
      - create
    - apiGroups:
      - authorization.k8s.io
      resources:
      - subjectaccessreviews
      verbs:
      - create
    - apiGroups:
      - policy
      resources:
      - poddisruptionbudgets
      verbs:
      - list
      - watch
    - apiGroups:
      - certificates.k8s.io
      resources:
      - certificatesigningrequests
      verbs:
      - list
      - watch
    - apiGroups:
      - storage.k8s.io
      resources:
      - storageclasses
      - volumeattachments
      verbs:
      - list
      - watch
    - apiGroups:
      - admissionregistration.k8s.io
      resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
      verbs:
      - list
      - watch
    - apiGroups:
      - networking.k8s.io
      resources:
      - networkpolicies
      verbs:
      - list
      - watch
    EOF

    cat > deployment.yaml <<\EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
      namespace: ops-monit
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      template:
        metadata:
          labels:
            app.kubernetes.io/name: kube-state-metrics
            app.kubernetes.io/version: v2.3.0
        spec:
          containers:
          - image: registry.cn-hangzhou.aliyuncs.com/mdpublic/kube-state-metrics:2.3.0
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
            name: kube-state-metrics
            ports:
            - containerPort: 8080
              name: http-metrics
            - containerPort: 8081
              name: telemetry
            readinessProbe:
              httpGet:
                path: /
                port: 8081
              initialDelaySeconds: 5
              timeoutSeconds: 5
          nodeSelector:
            kubernetes.io/os: linux
          serviceAccountName: kube-state-metrics
    EOF

    cat > service-account.yaml <<EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
      namespace: ops-monit
    EOF

    cat > service.yaml <<\EOF
    apiVersion: v1
    kind: Service
    metadata:
      # annotations:
      #   prometheus.io/scrape: 'true'
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.3.0
      name: kube-state-metrics
      namespace: ops-monit
    spec:
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
        nodePort: 30686
      - name: telemetry
        port: 8081
        targetPort: telemetry
      type: NodePort
      selector:
        app.kubernetes.io/name: kube-state-metrics
    EOF

    # Create the Kubernetes service account needed to scrape cadvisor metrics
    cat > rbac.yaml <<\EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: kube-system

    ---
    # Create the Secret bound to the prometheus ServiceAccount (must be created manually since Kubernetes 1.24)
    apiVersion: v1
    kind: Secret
    type: kubernetes.io/service-account-token
    metadata:
      name: prometheus
      namespace: kube-system
      annotations:
        kubernetes.io/service-account.name: "prometheus"
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "extensions"
      resources:
      - ingresses
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - configmaps
      - nodes/metrics
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-system
    EOF

  4. Create the namespace

    kubectl create namespace ops-monit
  5. Start the monitoring service

    kubectl apply -f .
  6. Retrieve the token

    kubectl describe secret $(kubectl describe sa prometheus -n kube-system | sed -n '7p' | awk '{print $2}') -n kube-system | tail -n1 | awk '{print $2}'
  • Copy the token content into the /usr/local/prometheus/privatedeploy_kubernetes.token file for the prometheus service.
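
Optionally, confirm that kube-state-metrics is running and reachable before wiring it into Prometheus. The IP below is the placeholder master address used elsewhere in this document; replace it with your own:

    # Pod and Service status in the ops-monit namespace
    kubectl get pods,svc -n ops-monit -l app.kubernetes.io/name=kube-state-metrics

    # The NodePort (30686) should return kube_* metrics
    curl -s http://192.168.10.20:30686/metrics | head -n 5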

Deploy Prometheus

  1. Download the prometheus package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/prometheus-2.32.1.linux-amd64.tar.gz
  2. Extract the package

    tar -zxvf prometheus-2.32.1.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/prometheus-2.32.1.linux-amd64 /usr/local/prometheus
  3. Configure the prometheus.yml file

    global:
      scrape_interval: 15s

    scrape_configs:
      # Server Monitoring
      - job_name: "node_exporter"
        static_configs:
          - targets: ["192.168.10.20:59100"]
            labels:
              nodename: service01
              origin_prometheus: node
          - targets: ["192.168.10.21:59100"]
            labels:
              nodename: service02
              origin_prometheus: node
          - targets: ["192.168.10.2:59100"]
            labels:
              nodename: db01
              origin_prometheus: node
          - targets: ["192.168.10.3:59100"]
            labels:
              nodename: db02
              origin_prometheus: node

      # Container Monitoring
      - job_name: "cadvisor"
        static_configs:
          - targets:
              - 192.168.10.16:59101
              - 192.168.10.17:59101
              - 192.168.10.18:59101
              - 192.168.10.19:59101

      # Kafka Monitoring
      - job_name: kafka_exporter
        static_configs:
          - targets: ["192.168.10.7:59102"]

      # K8s Monitoring
      - job_name: privatedeploy_kubernetes_metrics
        static_configs:
          - targets: ["192.168.10.20:30686"] # Remember to replace with the k8s master node address
            labels:
              origin_prometheus: kubernetes

      - job_name: 'privatedeploy_kubernetes_cadvisor'
        scheme: https
        metrics_path: /metrics/cadvisor
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
        kubernetes_sd_configs:
          - role: node
            api_server: https://192.168.10.20:6443 # Remember to replace with the k8s master node address
            bearer_token_file: /usr/local/prometheus/privatedeploy_kubernetes.token
            tls_config:
              insecure_skip_verify: true
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: 192.168.10.20:6443 # Remember to replace with the k8s master node address
          - target_label: origin_prometheus
            replacement: kubernetes
          - source_labels: [__meta_kubernetes_node_name]
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        metric_relabel_configs:
          - source_labels: [instance]
            separator: ;
            regex: (.+)
            target_label: node
            replacement: $1
            action: replace
          - source_labels: [pod_name]
            separator: ;
            regex: (.+)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [container_name]
            separator: ;
            regex: (.+)
            target_label: container
            replacement: $1
            action: replace

  4. Configure start/stop scripts

    cat > /usr/local/prometheus/start_prometheus.sh <<EOF
    nohup /usr/local/prometheus/prometheus --storage.tsdb.path=/data/prometheus/data --storage.tsdb.retention.time=30d --config.file=/usr/local/prometheus/prometheus.yml --web.enable-lifecycle &
    EOF

    cat > /usr/local/prometheus/stop_prometheus.sh <<EOF
    kill \$(pgrep -f '/usr/local/prometheus/prometheus')
    EOF

    cat > /usr/local/prometheus/reload_prometheus.sh <<EOF
    curl -X POST http://127.0.0.1:9090/-/reload
    EOF

    chmod +x /usr/local/prometheus/start_prometheus.sh
    chmod +x /usr/local/prometheus/stop_prometheus.sh
    chmod +x /usr/local/prometheus/reload_prometheus.sh
  5. Start prometheus

    cd /usr/local/prometheus/
    bash start_prometheus.sh
  6. Add to startup

    echo "cd /usr/local/prometheus/ && /bin/bash start_prometheus.sh" >> /etc/rc.local
    chmod +x /etc/rc.local
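
Optionally, validate the configuration and confirm scraping is working. promtool ships in the same prometheus tarball, and the curl call below queries Prometheus's own HTTP API; both commands assume they are run on the Prometheus server:

    # Check prometheus.yml for syntax errors before starting or reloading
    /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

    # Summarize target health ("up"/"down") from the targets API
    curl -s http://127.0.0.1:9090/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c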

Deploy Grafana

  1. Download the grafana package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/grafana-10.1.1.linux-amd64.tar.gz
  2. Extract the package

    tar -xf grafana-10.1.1.linux-amd64.tar.gz -C /usr/local/
    mv /usr/local/grafana-10.1.1 /usr/local/grafana
  3. Configure start/stop scripts

    cat > /usr/local/grafana/start_grafana.sh <<EOF
    cd /usr/local/grafana && nohup ./bin/grafana-server web &
    EOF

    cat > /usr/local/grafana/stop_grafana.sh <<EOF
    kill \$(pgrep -f 'grafana server web')
    EOF

    chmod +x /usr/local/grafana/start_grafana.sh
    chmod +x /usr/local/grafana/stop_grafana.sh
  4. Modify the root_url value in the /usr/local/grafana/conf/defaults.ini file as follows

    root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/

    # Modify with one command
    sed -ri 's#^root_url = .*#root_url = %(protocol)s://%(domain)s:%(http_port)s/privatedeploy/mdy/monitor/grafana/#' /usr/local/grafana/conf/defaults.ini
    grep "^root_url" /usr/local/grafana/conf/defaults.ini
  5. Modify the serve_from_sub_path value in the /usr/local/grafana/conf/defaults.ini file as follows

    serve_from_sub_path = true

    # Modify with one command
    sed -ri 's#^serve_from_sub_path = .*#serve_from_sub_path = true#' /usr/local/grafana/conf/defaults.ini
    grep "^serve_from_sub_path" /usr/local/grafana/conf/defaults.ini
  • If you do not need to access the grafana page through an nginx proxy and prefer to use grafana's IP address directly, adjust the domain value to the actual host.
  6. Start grafana

    cd /usr/local/grafana/
    bash start_grafana.sh
  7. Add to startup

    echo "cd /usr/local/grafana/ && /bin/bash start_grafana.sh" >> /etc/rc.local
    chmod +x /etc/rc.local
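
After Grafana is running, add Prometheus as a data source, either in the Grafana web UI or via Grafana's HTTP API. The sketch below assumes the default admin:admin credentials, the root_url/serve_from_sub_path settings configured above, and that Grafana and Prometheus run on the same host; adjust the URL, sub path, and credentials to your environment:

    # Create a Prometheus data source through the Grafana API (example values, adjust as needed)
    curl -s -X POST 'http://admin:admin@127.0.0.1:3000/privatedeploy/mdy/monitor/grafana/api/datasources' \
      -H 'Content-Type: application/json' \
      -d '{"name":"Prometheus","type":"prometheus","url":"http://127.0.0.1:9090","access":"proxy","isDefault":true}'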

Configure Reverse Proxy for the Grafana Address

Refer to the following nginx reverse proxy rules to configure the proxy for the grafana page:

upstream grafana {
    server 192.168.1.10:3000;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;
    server_name hap.domain.com;
    access_log /data/logs/weblogs/grafana.log main;
    error_log /data/logs/weblogs/grafana.mingdao.net.error.log;

    location /privatedeploy/mdy/monitor/grafana/ {
        #allow 1.1.1.1;
        #deny all;
        proxy_hide_header X-Frame-Options;
        proxy_set_header X-Frame-Options ALLOWALL;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
        proxy_redirect http://localhost:3000 http://hap.domain.com:80/privatedeploy/mdy/monitor/grafana;
    }

    location /privatedeploy/mdy/monitor/grafana/api/live {
        rewrite ^/(.*) /$1 break;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
    }
}
  • Once the proxy is configured, you can access the grafana page via http://hap.domain.com/privatedeploy/mdy/monitor/grafana and then configure the dashboard

    • To proxy prometheus as well, add the following rules (usually not necessary; since the prometheus page has no built-in authentication, be cautious about exposing it)

      upstream prometheus {
          server 192.168.1.10:9090;
      }

      location /privatedeploy/mdy/monitor/prometheus {
          rewrite ^/privatedeploy/mdy/monitor/prometheus$ / break;
          rewrite ^/privatedeploy/mdy/monitor/prometheus/(.*)$ /$1 break;
          proxy_pass http://prometheus;
          proxy_redirect /graph /privatedeploy/mdy/monitor/prometheus/graph;
      }
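
After adding or changing these rules, validate and reload the nginx configuration, for example:

    # Test the configuration, then reload nginx only if the test passes
    nginx -t && nginx -s reload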