futriix

Author	SHA1	Message	Date
Binbin	ecbfb6a7ec	Fix reconfiguring sub-replica causing data loss when myself change shard_id (#944 ) When reconfiguring sub-replica, there may a case that the sub-replica will use the old offset and win the election and cause the data loss if the old primary went down. In this case, sender is myself's primary, when executing updateShardId, not only the sender's shard_id is updated, but also the shard_id of myself is updated, casuing the subsequent areInSameShard check, that is, the full_sync_required check to fail. As part of the recent fix of #885, the sub-replica needs to decide whether a full sync is required or not when switching shards. This shard membership check is supposed to be done against sub-replica's current shard_id, which however was lost in this code path. This then leads to sub-replica joining the other shard with a completely different and incorrect replication history. This is the only place where replicaof state can be updated on this path so the most natural fix would be to pull the chain replication reduction logic into this code block and before the updateShardId call. This one follow #885 and closes #942. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@outlook.com>	2024-08-29 22:39:53 +08:00
Binbin	4fe8320711	Add pause path coverage to replica migration tests (#937 ) In #885, we only add a shutdown path, there is another path is that the server might got hang by slowlog. This PR added the pause path coverage to cover it. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-28 11:08:27 +08:00
Binbin	6a84e06b05	Wait for the role change and fix the timing issue in the new test (#947 ) The test might be fast enough and then there is no change in the role causing the test to fail. Adding a wait to avoid the timing issue: ``` *** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT Expected '{127.0.0.1 23154 267}' to be equal to '' (context: type eval line 24 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test) ``` Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-28 09:51:10 +08:00
Binbin	8045994972	valkey-cli make source node ignores NOREPLICAS when doing the last CLUSTER SETSLOT (#928 ) This fixes #899. In that issue, the primary is cluster-allow-replica-migration no and its replica is cluster-allow-replica-migration yes. And during the slot migration: 1. Primary calling blockClientForReplicaAck, waiting its replica. 2. Its replica reconfiguring itself as a replica of other shards due to replica migration and disconnect from the old primary. 3. The old primary never got the chance to receive the ack, so it got a timeout and got a NOREPLICAS error. In this case, the replicas might automatically migrate to another primary, resulting in the client being unblocked with the NOREPLICAS error. In this case, since the configuration will eventually propagate itself, we can safely ignore this error on the source node. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-23 16:22:30 +08:00

4 Commits