Limit CLUSTER_CANT_FAILOVER_DATA_AGE log to 10 times period (#1189)

If a replica enters the data_age-too-old state, it cannot
trigger a failover and currently cannot recover automatically,
so we print a log every CLUSTER_CANT_FAILOVER_RELOG_PERIOD,
which is every second. If the primary does not recover and there is
no manual failover, this log floods the log file.

In this case, limit its frequency to 10 times the period, which is
10 seconds in our code. In the data_age-too-old state, the repeated
logs can still serve as an indication of the failover's progress.

See also #780 for more details about it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Binbin 2024-10-24 16:38:47 +08:00 committed by GitHub
parent c419524c05
commit a21fe718f4

@@ -4433,11 +4433,18 @@ int clusterGetReplicaRank(void) {
 void clusterLogCantFailover(int reason) {
     char *msg;
     static time_t lastlog_time = 0;
+    time_t now = time(NULL);
-    /* Don't log if we have the same reason for some time. */
-    if (reason == server.cluster->cant_failover_reason &&
-        time(NULL) - lastlog_time < CLUSTER_CANT_FAILOVER_RELOG_PERIOD)
+    /* General logging suppression if the same reason has occurred recently. */
+    if (reason == server.cluster->cant_failover_reason && now - lastlog_time < CLUSTER_CANT_FAILOVER_RELOG_PERIOD) {
         return;
+    }
+    /* Special case: If the failure reason is due to data age, log 10 times less frequently. */
+    if (reason == server.cluster->cant_failover_reason && reason == CLUSTER_CANT_FAILOVER_DATA_AGE &&
+        now - lastlog_time < 10 * CLUSTER_CANT_FAILOVER_RELOG_PERIOD) {
+        return;
+    }
     server.cluster->cant_failover_reason = reason;
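
The two suppression checks in this hunk can be read as one rule: a repeated reason is suppressed while the elapsed time is below a window, and that window is 10 times longer for the data-age reason. The following is a minimal standalone sketch of that rule, not the code in the commit. The names `should_suppress`, `RELOG_PERIOD`, `REASON_DATA_AGE`, and `REASON_WAITING_DELAY` are hypothetical stand-ins, and the one-second period is assumed from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the real constants; the 1-second value is
 * assumed from the commit message, not taken from the header file. */
#define RELOG_PERIOD 1
#define REASON_DATA_AGE 1
#define REASON_WAITING_DELAY 2

/* Returns true when a log line for `reason` should be suppressed.
 * `last_reason` and `lastlog_time` model the state kept between calls. */
static bool should_suppress(int reason, int last_reason, time_t now, time_t lastlog_time) {
    /* A reason that differs from the last logged one is never suppressed. */
    if (reason != last_reason) return false;

    /* The data-age reason uses a window 10 times longer than the others. */
    time_t window = (reason == REASON_DATA_AGE) ? 10 * RELOG_PERIOD : RELOG_PERIOD;
    return (now - lastlog_time) < window;
}
```

Under these assumptions, a repeated data-age reason is logged at most once per 10 seconds, while any other repeated reason is still logged at most once per second, and a change of reason is logged immediately.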