MySQL MGR Cluster

MGR (MySQL Group Replication) provides multi-node state synchronization, supporting automatic member management and fault detection. It is recommended as the primary HA solution for production environments.

This guide describes how to build a Single-Primary mode cluster based on native Group Replication across three nodes and configure MySQL Router for transparent access.

Cluster Node Planning

Server IP     Host Role                  Services Deployed
10.206.0.2    Primary (Bootstrap Node)   MySQL, MySQL Router, MySQL Shell
10.206.0.3    Secondary                  MySQL, MySQL Router, MySQL Shell
10.206.0.4    Secondary                  MySQL, MySQL Router, MySQL Shell

MySQL Service Deployment

Deployment Instructions

This section covers deploying MySQL from the binary package. To keep the cluster nodes consistent, execute all of the following steps on every node in the cluster.

  1. Download Installation Package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/mysql-8.0.45-linux-glibc2.17-x86_64.tar.xz
  2. Extract Package and Move to Installation Path

    tar -xvf mysql-8.0.45-linux-glibc2.17-x86_64.tar.xz
    mv mysql-8.0.45-linux-glibc2.17-x86_64 /usr/local/mysql
  3. Add MySQL to PATH Environment Variable

    echo 'export PATH=/usr/local/mysql/bin:$PATH' > /etc/profile.d/mysql.sh
    source /etc/profile.d/mysql.sh
  4. Create System User and Related Directories

    useradd -U -r -s /sbin/nologin mysql
    mkdir -p /data/mysql
    mkdir -p /data/logs/mysql
    chown -R mysql:mysql /data/mysql /data/logs/mysql /usr/local/mysql/
  5. Create /etc/my.cnf Configuration File

    Key Configuration Points

    Before writing the following content to the server, ensure you modify these parameters based on the actual environment of the current node:

    • server-id: Must be unique for each node; it is recommended to increment sequentially (e.g., 1, 2, 3).
    • report_host: Enter the actual intranet IP address of the current server.
    • group_replication_local_address: Enter the actual intranet IP of the current server (default port is 33061).
    • group_replication_group_seeds: Enter the list of IPs and ports for all three nodes in the cluster.
    cat > /etc/my.cnf <<'EOF'
    [mysqld]
    user = mysql
    basedir = /usr/local/mysql
    datadir = /data/mysql
    socket = /usr/local/mysql/mysqld.sock
    pid-file = /usr/local/mysql/mysqld.pid
    log-error = /data/logs/mysql/mysqld.log

    server-id = 1
    bind-address = 0.0.0.0
    port = 3306
    mysqlx_port = 33060
    skip-name-resolve = ON
    max_connections = 5000
    default_storage_engine = InnoDB
    innodb_buffer_pool_size = 2G
    character_set_server = utf8mb4
    collation_server = utf8mb4_0900_ai_ci
    slow_query_log = 1
    slow_query_log_file = /data/logs/mysql/mysql-slow.log
    long_query_time = 1
    log_bin = /data/mysql/mysql-bin
    binlog_format = ROW
    sync_binlog = 1
    binlog_expire_logs_seconds = 2592000
    # --- GTID & Replication ---
    gtid_mode = ON
    enforce_gtid_consistency = ON
    log_slave_updates = ON
    master_info_repository = TABLE
    relay_log_info_repository = TABLE
    transaction_write_set_extraction = XXHASH64
    binlog_transaction_dependency_tracking = WRITESET
    replica_parallel_type = LOGICAL_CLOCK
    replica_parallel_workers = 4
    replica_preserve_commit_order = ON

    # --- Group Replication (MGR) ---
    report_host = 10.206.0.2
    report_port = 3306
    plugin_load_add = 'group_replication.so'
    group_replication_group_name = "c9f6d3f2-7b21-4e5a-9c87-3a0e9f0a43d2"
    group_replication_start_on_boot = OFF
    group_replication_local_address = "10.206.0.2:33061"
    group_replication_group_seeds = "10.206.0.2:33061,10.206.0.3:33061,10.206.0.4:33061"
    group_replication_single_primary_mode = ON
    group_replication_enforce_update_everywhere_checks = OFF
    group_replication_recovery_get_public_key = ON


    [client]
    port = 3306
    socket = /usr/local/mysql/mysqld.sock

    [mysql]
    default-character-set = utf8mb4
    EOF
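On the second and third nodes, only the per-node fields need to change. A sketch of adapting a copied config with sed for node 2 (10.206.0.3), run here against a scratch copy so it can be dry-tested before pointing the same edits at /etc/my.cnf:

```shell
# Build a scratch copy containing the three per-node fields (values from node 1)
CNF=/tmp/my.cnf.node2
printf '%s\n' \
  'server-id = 1' \
  'report_host = 10.206.0.2' \
  'group_replication_local_address = "10.206.0.2:33061"' > "$CNF"

# Rewrite them for node 2; on the real node, run the same sed against /etc/my.cnf
sed -i \
  -e 's/^server-id = .*/server-id = 2/' \
  -e 's/^report_host = .*/report_host = 10.206.0.3/' \
  -e 's/^group_replication_local_address = .*/group_replication_local_address = "10.206.0.3:33061"/' \
  "$CNF"

cat "$CNF"
```

The same pattern applies to node 3 with server-id 3 and IP 10.206.0.4.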
  6. Configure Systemd Management File

    cat > /etc/systemd/system/mysql.service <<'EOF'
    [Unit]
    Description=MySQL Database Server
    Documentation=man:mysqld(8) http://dev.mysql.com/doc/
    After=network.target
    Wants=network-online.target
    After=network-online.target

    [Service]
    Environment=MYSQLD_PARENT_PID=1
    User=mysql
    Group=mysql
    Type=forking
    PIDFile=/usr/local/mysql/mysqld.pid
    ExecStart=/usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf --daemonize
    ExecStop=/bin/kill -s SIGTERM $MAINPID
    LimitNOFILE=102400
    LimitMEMLOCK=infinity
    Restart=on-failure
    OOMScoreAdjust=-500
    TimeoutSec=0

    [Install]
    WantedBy=multi-user.target
    EOF
  7. Initialize MySQL

    mysqld --no-defaults --initialize --datadir=/data/mysql/ --user=mysql --log-error=/data/logs/mysql/mysqld.log
  8. Start MySQL

    systemctl daemon-reload
    systemctl enable mysql
    systemctl start mysql
  9. Log in to MySQL and Change Initial Password

    Log in using the temporary password generated during initialization:

    mysql -uroot -p$(grep 'temporary password' /data/logs/mysql/mysqld.log | awk '{print $NF}')
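The grep/awk extraction in the command above can be exercised against a sample of the log line that mysqld writes during initialization (the log text below is illustrative):

```shell
# Illustrative copy of the initialization log line; the real one is in
# /data/logs/mysql/mysqld.log after step 7.
line="2024-01-01T00:00:00.000000Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Xy12-abcd"

# The password is the last whitespace-separated field of that line
echo "$line" | grep 'temporary password' | awk '{print $NF}'   # → Xy12-abcd
```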

    After logging in, run the following commands in MySQL:

    -- Change the current root user password
    ALTER USER USER() IDENTIFIED BY '123456';
    RESET MASTER;

    -- Create a root user that allows remote connections
    SET SQL_LOG_BIN=0;
    CREATE USER 'root'@'%' IDENTIFIED BY '123456';
    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
    SET SQL_LOG_BIN=1;
    • The root user password set in the commands above is 123456; use a strong password in actual deployment.
    • If the password contains special characters, use only - or _; characters such as @, !, #, and & must be avoided to prevent compatibility issues.
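The character restriction above can be enforced with a small pre-flight helper before running the ALTER USER statement. A sketch (the check_pass function is illustrative, not part of MySQL):

```shell
# Hypothetical helper: accept only letters, digits, '-' and '_' in a
# candidate password, per the restriction above. Prints a verdict and
# returns non-zero for a rejected password.
check_pass() {
  case "$1" in
    *[!A-Za-z0-9_-]*) echo "rejected: $1"; return 1 ;;
    *)                echo "ok: $1" ;;
  esac
}

check_pass 'Str0ng-Pass_2024'    # → ok: Str0ng-Pass_2024
check_pass 'P@ssw0rd' || true    # → rejected: P@ssw0rd
```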

Build MGR Cluster

Deployment Instructions
  • All-node preparation: creating the replication synchronization user, configuring the recovery channel credentials, and other basic setup must be executed on all nodes.
  • Bootstrap and join: cluster initialization (bootstrap) is performed only on the first node; the remaining nodes only perform the join operation.
  1. Create Replication Synchronization User

    Execute the following command on each node to create the account used for group replication synchronization:

    SET SQL_LOG_BIN=0;
    CREATE USER 'repl'@'%' IDENTIFIED BY '123456';
    GRANT REPLICATION SLAVE, CONNECTION_ADMIN, BACKUP_ADMIN, GROUP_REPLICATION_STREAM ON *.* TO 'repl'@'%';
    SET SQL_LOG_BIN=1;
    • The replication synchronization account repl uses the password 123456 in the commands above; use a strong password in actual deployment.
    • If the password contains special characters, use only - or _; characters such as @, !, #, and & must be avoided to prevent compatibility issues.
  2. Configure Group Replication Recovery Channel

    Run the following command on each node to complete the group replication recovery channel credential configuration. Replace the password with the actual password of the synchronization account repl:

    CHANGE REPLICATION SOURCE TO SOURCE_USER='repl', SOURCE_PASSWORD='123456' FOR CHANNEL 'group_replication_recovery';
  3. Start and Initialize MGR Group

    The MGR cluster must be "bootstrapped" by one node, after which other nodes can join. Please strictly follow this sequence:

    • A. First Node (Bootstrap Initialization)

      Execute the following commands to enable bootstrap mode and initialize the cluster:

      SET GLOBAL group_replication_bootstrap_group=ON;
      START GROUP_REPLICATION;
      SET GLOBAL group_replication_bootstrap_group=OFF;
    • B. Other Nodes (Join Existing Group)

      After the first node is successfully initialized, execute the following command on the remaining nodes to join the group:

      START GROUP_REPLICATION;
  4. Check Member Information

    SELECT * FROM performance_schema.replication_group_members;

    Normal status should show all nodes with MEMBER_STATE = ONLINE, with only one PRIMARY node and the rest as SECONDARY.
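For reference, on a healthy three-node cluster the query returns one row per node, along these lines (illustrative values; selected columns of performance_schema.replication_group_members only):

```
MEMBER_HOST   MEMBER_PORT   MEMBER_STATE   MEMBER_ROLE
10.206.0.2    3306          ONLINE         PRIMARY
10.206.0.3    3306          ONLINE         SECONDARY
10.206.0.4    3306          ONLINE         SECONDARY
```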

  5. Modify group_replication_start_on_boot in /etc/my.cnf to ON

    Run this on each node. This parameter was set to OFF during initialization to prevent bootstrap conflicts. After the cluster is established, enable it so nodes can automatically rejoin the existing MGR replication group after a server reboot or MySQL restart:

    sed -ri 's|group_replication_start_on_boot.*|group_replication_start_on_boot = ON|g' /etc/my.cnf
    grep group_replication_start_on_boot /etc/my.cnf
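If you prefer to verify the substitution before touching the live config, the same sed can be dry-run on a scratch copy first; a sketch:

```shell
# Dry-run the substitution on a scratch file holding the current value
scratch=/tmp/my.cnf.boot-check
echo 'group_replication_start_on_boot = OFF' > "$scratch"

sed -ri 's|group_replication_start_on_boot.*|group_replication_start_on_boot = ON|g' "$scratch"
grep group_replication_start_on_boot "$scratch"   # → group_replication_start_on_boot = ON
```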
  6. Rolling Restart and Status Verification

    Run this on each node in sequence. After restarting one node, confirm that its status has returned to ONLINE before proceeding to the next node:

    # 1. Restart service
    systemctl restart mysql

    # 2. Check node status (Ensure all currently active nodes have MEMBER_STATE = ONLINE)
    mysql -uroot -p123456 -e 'SELECT * FROM performance_schema.replication_group_members;'

Enable InnoDB Cluster

Deployment Instructions
  • All-node Operations: The MySQL Shell (mysqlsh) package must be downloaded and installed on all nodes.

  • Single-node Operations: The initialization configuration of InnoDB Cluster only needs to be executed once on the first node (Primary).

  1. Download mysqlsh Installation Package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/mysql-shell-8.0.45-linux-glibc2.17-x86-64bit.tar.gz
  2. Extract Package and Move to Installation Path

    tar -xvf mysql-shell-8.0.45-linux-glibc2.17-x86-64bit.tar.gz
    mv mysql-shell-8.0.45-linux-glibc2.17-x86-64bit /usr/local/mysql-shell
  3. Add mysqlsh to PATH Environment Variable

    echo 'export PATH=/usr/local/mysql-shell/bin:$PATH' > /etc/profile.d/mysql-shell.sh
    source /etc/profile.d/mysql-shell.sh
  4. Connect to PRIMARY Node (First Node Only)

    mysqlsh --uri root@10.206.0.2:3306 -p123456
  5. Initialize and Create InnoDB Cluster

    var cluster = dba.createCluster('hap-mysql')
  6. View Cluster Status

    cluster.status()
    • After logging back in, you need to execute var cluster = dba.getCluster() before you can use cluster.status().

MySQL Router Deployment

Deployment Instructions

To achieve HA for the access layer, the following steps must be performed on each node in the cluster separately.

  1. Download Installation Package

    wget https://pdpublic.mingdao.com/private-deployment/offline/common/mysql-router-8.0.45-linux-glibc2.17-x86_64.tar.xz
  2. Extract Package and Move to Installation Path

    tar -xvf mysql-router-8.0.45-linux-glibc2.17-x86_64.tar.xz
    mv mysql-router-8.0.45-linux-glibc2.17-x86_64 /usr/local/mysql-router
  3. Add MySQL Router to PATH Environment Variable

    echo 'export PATH=/usr/local/mysql-router/bin:$PATH' > /etc/profile.d/mysql-router.sh
    source /etc/profile.d/mysql-router.sh
  4. Initialize mysqlrouter

    mysqlrouter --bootstrap root:123456@10.206.0.2:3306 --user=mysql --report-host=10.206.0.2

    Note:

    • The IP after --bootstrap is the Primary node IP and is used to read the cluster topology. Use the same Primary address on all cluster nodes.

    • The IP after --report-host is the intranet IP of the current node. Change it separately on each node to its actual IP.
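For orientation, the bootstrap step generates /usr/local/mysql-router/mysqlrouter.conf containing routing sections along these lines (illustrative excerpt; the exact content is produced by the bootstrap run and may differ):

```
[routing:bootstrap_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://hap-mysql/?role=PRIMARY
routing_strategy=first-available

[routing:bootstrap_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://hap-mysql/?role=SECONDARY
routing_strategy=round-robin
```

Port 6446 always routes to the current PRIMARY, while 6447 load-balances reads across the SECONDARY nodes; this is why the later connection test uses port 6446.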

  5. Increase MySQL Router Maximum Connection Configuration

    sed -i '/^\[DEFAULT\]$/a max_total_connections=5000' /usr/local/mysql-router/mysqlrouter.conf
  6. Configure Permissions

    chown -R mysql:mysql /usr/local/mysql-router
  7. Configure Systemd

    cat > /etc/systemd/system/mysqlrouter.service <<'EOF'
    [Unit]
    Description=MySQL Router Service
    Documentation=man:mysqlrouter(1) https://dev.mysql.com/doc/mysql-router/en/
    Wants=network-online.target
    After=network-online.target

    [Service]
    Type=simple
    User=mysql
    Group=mysql
    WorkingDirectory=/usr/local/mysql-router
    ExecStart=/usr/local/mysql-router/bin/mysqlrouter -c /usr/local/mysql-router/mysqlrouter.conf
    Restart=on-failure
    RestartSec=2
    LimitNOFILE=102400
    OOMScoreAdjust=-500
    TimeoutSec=0

    [Install]
    WantedBy=multi-user.target
    EOF
  8. Start mysqlrouter

    systemctl daemon-reload
    systemctl enable mysqlrouter
    systemctl start mysqlrouter
  9. Connection Test

    Verify connectivity through the read-write port 6446 of MySQL Router on the current node:

    mysql -h 127.0.0.1 -P 6446 -u root -p123456

MySQL MGR Auto Recovery

Configuration Instructions

If half or more of the nodes in an MGR cluster fail simultaneously, the cluster enters a complete-outage state (members show OFFLINE/RECOVERING) to prevent split-brain. In this extreme case, even if each node's MySQL process is healthy, MGR by default cannot re-form the group without manual intervention.

The core purpose of the "auto recovery script" is to run periodic checks and, when the cluster is completely down, automatically select the node with the most advanced GTID progress to bootstrap recovery, achieving fault self-healing. Complete the following steps on each node.

  1. Create Directory and Grant Permissions

    # 1. Create log directory
    mkdir -p /data/logs/mysql

    # 2. Ensure the directory owner is the mysql user (or adjust according to your actual running user)
    chown -R mysql:mysql /data/logs/mysql

    # 3. Set directory access permissions
    chmod 755 /data/logs/mysql
  2. Create Core Detection and Recovery Script /usr/local/bin/mysql-auto-recover.sh

    Note: Please ensure you verify and modify the parameters in the Core Configuration area of the script below (such as password, node IP list, etc.) to match your actual deployment environment.

    cat > /usr/local/bin/mysql-auto-recover.sh <<'EOF'
    #!/bin/bash
    set -euo pipefail # Strict mode

    # ============================ Core Configuration ============================
    MYSQL_CLI="/usr/local/mysql/bin/mysql" # File path of the MySQL client
    MYSQL_SHELL="/usr/local/mysql-shell/bin/mysqlsh" # MySQL Shell file path
    MYSQL_USER="root" # MySQL database administrator username
    MYSQL_PASS="123456" # MySQL database administrator password
    CLUSTER_NAME="hap-mysql" # Name of the MGR (InnoDB Cluster)
    LOG_FILE="/data/logs/mysql/mysql-auto-recover.log" # Log path of the auto recovery script
    LOCK_FILE="/data/logs/mysql/mysql-auto-recover.lock" # Process exclusive lock file path
    NODES=("10.206.0.2" "10.206.0.3" "10.206.0.4") # IP address list of all MGR cluster nodes, space-separated and aligned with the cluster plan
    LOG_KEEP_DAYS=180 # Local log file retention days
    TRIES=3 # Continuous retry count for fault confirmation detection
    SLEEP_SEC=2 # Buffer time between each fault detection retry (seconds)

    # ============================ Log Functions ============================
    # Unified log format: [YYYY-MM-DD HH:MM:SS] [LEVEL] [PID] message
    # Levels: INFO (normal event) / WARN (abnormal but recoverable) / ERROR (recovery failed and requires attention)
    _log() {
        local level="$1"; shift
        printf '[%s] [%-5s] [pid=%d] %s\n' "$(date '+%F %T')" "$level" "$$" "$*" >> "$LOG_FILE"
    }
    log_info() { _log "INFO" "$@"; }
    log_warn() { _log "WARN" "$@"; }
    log_error() { _log "ERROR" "$@"; }

    # Prefix each stdout/stderr line from external processes such as mysqlsh before writing it to disk.
    # Empty lines and known low-value warning noise are filtered to keep logs clean and readable.
    log_pipe() {
        local tag="$1"
        local line
        while IFS= read -r line; do
            [ -z "$line" ] && continue
            case "$line" in
                *"Using a password on the command line"*) continue ;;
                *"Cannot set LC_ALL to locale"*) continue ;;
            esac
            _log "INFO" "[$tag] $line"
        done
    }

    # ============================ Basic Functions ============================
    # Concurrency control: ensures only one detection process runs at a time
    exec_lock() {
        exec 9>"$LOCK_FILE"
        if ! flock -n 9; then
            log_warn "Another instance is already running; exiting."
            exit 0
        fi
    }

    # Log rotation and cleanup: split files by day, and point the main log path to today's file through a symbolic link
    rotate_logs() {
        local log_file="$1"
        local keep_days="${2:-180}"
        local log_dir base_name today today_log

        log_dir=$(dirname "$log_file")
        base_name=$(basename "$log_file" .log)
        today=$(date +%F)
        today_log="${log_dir}/${base_name}_${today}.log"

        # If today's log file does not exist, create it and point the main log path to it (atomic symlink update)
        if [ ! -f "$today_log" ]; then
            : > "$today_log"
            ln -sfn "$today_log" "$log_file"
        fi

        # If the main log is not a symlink or does not point to today's file, repair the symlink
        if [ ! -L "$log_file" ] || [ "$(readlink -f "$log_file" 2>/dev/null || true)" != "$today_log" ]; then
            ln -sfn "$today_log" "$log_file"
        fi

        # Clean expired logs in the current directory only to avoid accidental deletion
        find "$log_dir" -maxdepth 1 -name "${base_name}_*.log" -type f -mtime +"$keep_days" -exec rm -f {} \; 2>/dev/null || true
    }

    check_node_alive() {
        local node="$1"
        if "$MYSQL_CLI" -h "$node" -u"$MYSQL_USER" -p"$MYSQL_PASS" \
            --connect-timeout=3 -e "SELECT 1;" >/dev/null 2>&1; then
            return 0
        else
            log_warn "Node $node is unreachable."
            return 1
        fi
    }

    check_all_nodes_alive() {
        local ok=0
        for node in "${NODES[@]}"; do
            if check_node_alive "$node"; then
                ok=$((ok+1))
            fi
        done
        echo "$ok"
    }

    # Return value: non-negative integer = ONLINE member count; -1 = query failed (different from a real 0)
    get_cluster_online_count() {
        local result
        result=$("$MYSQL_CLI" -u"$MYSQL_USER" -p"$MYSQL_PASS" \
            --connect-timeout=3 -N -e \
            "SELECT COUNT(*) FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';" 2>/dev/null) || {
            echo "-1"
            return 0
        }
        if [[ "$result" =~ ^[0-9]+$ ]]; then
            echo "$result"
        else
            echo "-1"
        fi
    }

    # Count the total committed transactions in gtid_executed, compatible with:
    # 1) Single transaction: uuid:1000 -> 1
    # 2) Range: uuid:1-1000 -> 1000
    # 3) Multiple ranges: uuid:1-100:200-500 -> 100 + 301 = 401
    # 4) Multiple sources: uuid1:1-100,uuid2:1-200 -> 100 + 200 = 300
    # Key point: split by ":" and skip the UUID segment to avoid counting digits/hyphens in UUIDs as transactions.
    # Use the "total transaction count" rather than the "maximum sequence number" as the primary selection basis.
    # This correctly handles cases where nodes have the same highest range but different lower range sizes
    # (for example 1-53986:1000005-1000010 vs 1-53988:1000005-1000010).
    count_gtid_transactions() {
        local gtid="$1"
        [ -z "$gtid" ] && { echo 0; return 0; }
        echo "$gtid" | awk '
            BEGIN { total = 0 }
            {
                # Remove all whitespace characters
                gsub(/[ \t\r\n]/, "", $0)
                # Multiple sources are separated by commas
                n = split($0, sources, ",")
                for (i = 1; i <= n; i++) {
                    # Each source is in the form uuid:range[:range...]; the first segment is the UUID and must be skipped
                    m = split(sources[i], parts, ":")
                    for (j = 2; j <= m; j++) {
                        seg = parts[j]
                        if (seg ~ /^[0-9]+$/) {
                            # Single transaction
                            total += 1
                        } else if (seg ~ /^[0-9]+-[0-9]+$/) {
                            # Closed interval [start, end], transaction count = end - start + 1
                            k = split(seg, range, "-")
                            start = range[1] + 0
                            end = range[k] + 0
                            if (end >= start) total += (end - start + 1)
                        }
                    }
                }
                print total
            }'
    }

    find_primary_node() {
        declare -A TX_MAP
        local max_tx=0
        local primary=""
        local node gtid tx

        for node in "${NODES[@]}"; do
            gtid=$("$MYSQL_CLI" -h "$node" -u"$MYSQL_USER" -p"$MYSQL_PASS" \
                --connect-timeout=3 -N -e \
                "SELECT @@GLOBAL.gtid_executed;" 2>/dev/null) || gtid=""
            tx=$(count_gtid_transactions "$gtid")
            TX_MAP[$node]=$tx
            log_info "Node $node GTID: executed=[$gtid] transactions=$tx"
            if [ "$tx" -gt "$max_tx" ]; then
                max_tx=$tx
                primary=$node
            fi
        done

        # If all node GTIDs are 0 (new empty cluster or all queries failed), default to the first node
        [ -z "$primary" ] && primary="${NODES[0]}"

        log_info "Primary candidate elected: $primary (transactions=$max_tx)"
        echo "$primary"
    }

    # Check whether the current host IP matches the target node (supports multiple NICs, exact match)
    is_local_host() {
        local target="$1"
        local ip
        for ip in $(hostname -I 2>/dev/null); do
            [ "$ip" = "$target" ] && return 0
        done
        return 1
    }

    do_recovery() {
        local my_ip rc
        my_ip=$(hostname -I | awk '{print $1}')
        log_info "This node ($my_ip) is the recovery target; invoking reboot_cluster_from_complete_outage."
        set +e
        # Set LC_ALL / LANG to C.UTF-8 to avoid mysqlsh locale warnings on systems without en_US.UTF-8.
        # Redirect all output to log_pipe so it is written with the [MYSQLSH] prefix.
        # Use PIPESTATUS[0] to capture the mysqlsh exit code itself instead of log_pipe's 0.
        timeout 60 env LC_ALL=C.UTF-8 LANG=C.UTF-8 \
            "$MYSQL_SHELL" --no-wizard --py \
            --user="$MYSQL_USER" --password="$MYSQL_PASS" \
            --host=127.0.0.1 <<PYEOF 2>&1 | log_pipe "MYSQLSH"
    import sys
    try:
        cluster = dba.reboot_cluster_from_complete_outage('$CLUSTER_NAME', {'force': True})
        print("Cluster rebooted successfully.")
    except Exception as e:
        print("Recovery failed:", e)
        sys.exit(1)
    PYEOF
        rc=${PIPESTATUS[0]}
        set -e
        if [ "$rc" -eq 0 ]; then
            log_info "Cluster reboot completed successfully."
        elif [ "$rc" -eq 124 ]; then
            log_error "Cluster reboot timed out after 60 seconds; will retry on next cycle."
        else
            log_error "Cluster reboot failed (exit_code=$rc); will retry on next cycle."
        fi
    }

    # ============================ Main Flow ============================
    # Lock first, then rotate logs to avoid concurrent instances corrupting symlinks/logs
    exec_lock
    rotate_logs "$LOG_FILE" "$LOG_KEEP_DAYS"

    MY_IP=$(hostname -I | awk '{print $1}')
    log_info "===== MGR health check started (host=$MY_IP) ====="

    # Declare a complete cluster outage only after TRIES consecutive confirmations that all nodes are reachable and ONLINE member count is 0
    is_outage=false
    consecutive=0
    for attempt in $(seq 1 "$TRIES"); do
        reachable=$(check_all_nodes_alive)
        online_count=$(get_cluster_online_count)

        if [ "$reachable" -eq "${#NODES[@]}" ] && [ "$online_count" = "0" ]; then
            consecutive=$((consecutive + 1))
            log_warn "Attempt $attempt/$TRIES: outage signal confirmed ($consecutive/$TRIES) [reachable=$reachable, online=$online_count]"
            if [ "$consecutive" -ge "$TRIES" ]; then
                is_outage=true
                break
            fi
        else
            if [ "$consecutive" -gt 0 ]; then
                log_info "Attempt $attempt/$TRIES: outage signal reset [reachable=$reachable, online=$online_count]"
            fi
            consecutive=0
        fi

        # Do not sleep after the last attempt to avoid unnecessary waiting
        [ "$attempt" -lt "$TRIES" ] && sleep "$SLEEP_SEC"
    done

    if [ "$is_outage" = true ]; then
        log_warn "Complete cluster outage detected; initiating recovery procedure."
        PRIMARY_NODE=$(find_primary_node)

        if is_local_host "$PRIMARY_NODE"; then
            do_recovery
        else
            log_info "This node ($MY_IP) is not the recovery target ($PRIMARY_NODE); standing by."
        fi
    else
        log_info "Cluster is healthy or precondition not met; no action taken."
    fi

    log_info "===== MGR health check finished ====="
    EOF

    Grant execution permissions

    chmod +x /usr/local/bin/mysql-auto-recover.sh
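The GTID-counting rule embedded in the script (split each source on ":", skip the UUID segment, then sum single IDs and closed ranges) can be sanity-checked in isolation. A standalone sketch of the same awk logic, using made-up GTID sets:

```shell
# Standalone copy of the counting rule from mysql-auto-recover.sh,
# for verifying the arithmetic against known inputs.
count_gtid() {
  echo "$1" | awk '
    { gsub(/[ \t\r\n]/, "", $0)
      n = split($0, sources, ",")          # multiple sources: comma-separated
      for (i = 1; i <= n; i++) {
        m = split(sources[i], parts, ":")  # parts[1] is the UUID: skip it
        for (j = 2; j <= m; j++) {
          seg = parts[j]
          if (seg ~ /^[0-9]+$/) total += 1           # single transaction
          else if (seg ~ /^[0-9]+-[0-9]+$/) {        # closed range [start, end]
            split(seg, r, "-")
            if (r[2] + 0 >= r[1] + 0) total += r[2] - r[1] + 1
          }
        }
      }
    }
    END { print total + 0 }'
}

count_gtid 'c9f6d3f2-7b21-4e5a-9c87-3a0e9f0a43d2:1-100:200-500'   # → 401
count_gtid 'uuid1:1-100,uuid2:1-200'                              # → 300
```

Note how the UUID segment is never scanned for digits, so the hyphens inside the UUID itself cannot be mistaken for transaction ranges.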
  3. Configure Systemd Service Unit

    Create a Service file to define how to run the script above.

    cat > /etc/systemd/system/mysql-auto-recover.service <<'EOF'
    [Unit]
    Description=MySQL MGR Auto-recovery Service
    After=network.target mysql.service

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/mysql-auto-recover.sh
    User=root
    TimeoutSec=0
    EOF
  4. Configure Systemd Timer

    Create a Timer file, setting the script to run once every minute.

    cat > /etc/systemd/system/mysql-auto-recover.timer <<'EOF'
    [Unit]
    Description=Run MySQL MGR Recovery check every minute

    [Timer]
    OnCalendar=*-*-* *:*:00
    Unit=mysql-auto-recover.service
    Persistent=true

    [Install]
    WantedBy=timers.target
    EOF
  5. Load Configuration and Start

    # Reload systemd configuration
    systemctl daemon-reload

    # Enable and immediately start the timer
    systemctl enable --now mysql-auto-recover.timer
  6. Common Commands

    The following are common commands for managing the auto recovery service and its timer:

    # View the next execution time of the timer
    systemctl list-timers --all | grep mysql-auto-recover

    # View real-time operation logs
    tail -f /data/logs/mysql/mysql-auto-recover.log

    # Manually trigger a recovery detection immediately
    systemctl start mysql-auto-recover.service

    # View the status of the timer service
    systemctl status mysql-auto-recover.timer

    # Temporarily stop auto recovery monitoring
    systemctl stop mysql-auto-recover.timer