Binbin e1b3629186 Fix data loss when a replica does a failover with an old historical repl offset (#885)
A replica can currently initiate a failover without restriction when
it detects that its primary node is offline. This is generally not a
problem, but consider the following scenarios:

1. During slot migration, a primary loses its last slot and then becomes
a replica. Before it has fully synchronized with the new primary, the
new primary goes down.

2. Via the CLUSTER REPLICATE command, a replica starts replicating from
another primary. Before it has fully synchronized with the new primary,
the new primary goes down.

In the above scenarios, case 1 may cause the empty ex-primary to be
elected as the new primary, resulting in the loss of all of the shard's
data. Case 2 may cause a replica that still holds its old, unrelated
data set to be elected as the new primary, resulting in data loss and
confusion.

The root cause is the cached primary logic used for psync. In the above
scenarios, when clusterSetPrimary is called, the node caches
server.primary in server.cached_primary for a later partial
resynchronization. replicationGetReplicaOffset then returns
server.cached_primary->reploff as the replica's offset, which is
gossiped and used for ranking. As a result, the replica initiates the
failover with the old historical offset, gets a good rank, starts the
election first, and ends up elected as the new primary.
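
For reference, the offset lookup has roughly this shape (a paraphrase
of the replication.c logic; treat the exact body as a sketch):

    long long replicationGetReplicaOffset(void) {
        long long offset = 0;

        if (server.primary_host != NULL) {
            if (server.primary) {
                /* Live link: use the current replication offset. */
                offset = server.primary->reploff;
            } else if (server.cached_primary) {
                /* Disconnected: fall back to the primary cached for
                 * psync -- this is where the stale offset comes from. */
                offset = server.cached_primary->reploff;
            }
        }
        /* The offset may be -1 when the replica never synced. */
        if (offset < 0) offset = 0;
        return offset;
    }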

The main problem is that, as long as the replica has not completed the
full sync, replicationGetReplicaOffset may return this historical offset.
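
To see why a stale offset matters, recall how the rank feeds the
election delay (a simplified sketch of the clusterGetReplicaRank logic;
the struct field names here are illustrative, the real ones live in the
cluster code):

    /* Rank = number of sibling replicas claiming a replication
     * offset larger than ours. */
    int clusterGetReplicaRank(void) {
        long long myoffset = replicationGetReplicaOffset();
        clusterNode *primary = myself->replicaof; /* illustrative */
        int rank = 0;

        if (primary == NULL) return 0;
        for (int j = 0; j < primary->num_replicas; j++) {
            if (primary->replicas[j] != myself &&
                primary->replicas[j]->repl_offset > myoffset)
                rank++;
        }
        return rank;
    }

The election delay grows with the rank, roughly
500ms + random() % 500 + rank * 1000ms, so a large stale offset means
rank 0 and the shortest delay: the unhealthy replica asks for votes first.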

The fix is to clear cached_primary in the places where a full sync is
clearly required, and let the replica participate in the election with
offset == 0. This way the unhealthy replica gets a worse rank and is
less likely to be elected.
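
The shape of the fix, assuming the cached-primary discard helper is
named replicationDiscardCachedPrimary() (the diff has the exact call
sites):

    /* In clusterSetPrimary() and similar full-sync paths: the cached
     * primary can never be reused for a psync with the new primary,
     * and its reploff poisons the election ranking, so drop it. */
    if (server.cached_primary) replicationDiscardCachedPrimary();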

Of course, it may still be elected even with offset == 0. In the
future, we may need to prohibit replicas with offset == 0 from
initiating elections at all.
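
A hypothetical guard for that future change could sit at the start of
the failover handler (sketch only, not part of this PR):

    /* offset == 0 means no replication history at all: do not
     * request votes, leave the election to healthier siblings. */
    if (replicationGetReplicaOffset() == 0) return;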

Another point worth mentioning, in the above cases:
1. In the ROLE command, the replica state will be handshake, and the
offset will be -1.
2. Before this PR, in the CLUSTER SHARDS command, the replica health
will be online, and the offset will be the old cached value (which is wrong).
3. After this PR, in CLUSTER SHARDS, the replica health will be loading,
and the offset will be 0.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 13:11:21 +08:00