A very commonly signaled operational problem with Redis master-replicas
sets is that, once the master becomes unavailable for some reason,
especially because of network problems, many times it wont be able to
perform a partial resynchronization with the new master, once it rejoins
the partition, for the following reason:
1. The master becomes isolated, however it keeps sending PINGs to the
replicas. Such PINGs will never be received since the link connection is
actually already severed.
2. On the other side, one of the replicas will turn into the new master,
setting its secondary replication ID offset to the one of the last
command received from the old master: this offset will not include the
PINGs sent by the master once the link was already disconnected.
3. When the master rejoins the partion and is turned into a replica, its
offset will be too advanced because of the PINGs, so a PSYNC will fail,
and a full synchronization will be required.
Related to issue #7002 and other discussion we had in the past around
this problem.
Former-commit-id: 5d6e8fe3e3e43162f0c57f580b6e8432274fca30
1. Call emptyDb even in case of diskless-load: We want modules
to get the same FLUSHDB event as disk-based replication.
2. Do not fire any module events when flushing the backups array.
3. Delete redundant call to signalFlushedDb (Called from emptyDb).
Former-commit-id: aa8a3077a2d20e66e34f72f2860d0cc3daad496e
Makse sure call() doesn't wrap replicated commands with
a redundant MULTI/EXEC
Other, unrelated changes:
1. Formatting compiler warning in INFO CLIENTS
2. Use CLIENT_ID_AOF instead of UINT64_MAX
b512cb40 introduced automatic wrapping of MULTI/EXEC for the
alsoPropagate API. However this collides with the built-in mechanism
already present in module.c. To avoid complex changes near Redis 6 GA
this commit introduces the ability to exclude call() MUTLI/EXEC wrapping
for also propagate in order to continue to use the old code paths in
module.c.
Now that this mechanism is the sole one used for blocked clients
timeouts, it is more wise to cleanup the table when the client unblocks
for any reason. We use a flag: CLIENT_IN_TO_TABLE, in order to avoid a
radix tree lookup when the client was already removed from the table
because we processed it by scanning the radix tree.
If lots of clients PSUBSCRIBE to same patterns, multiple pattens matching will take place. This commit change it into just one single pattern matching by using a `dict *` to store the unique pattern and which clients subscribe to it.
A very commonly signaled operational problem with Redis master-replicas
sets is that, once the master becomes unavailable for some reason,
especially because of network problems, many times it wont be able to
perform a partial resynchronization with the new master, once it rejoins
the partition, for the following reason:
1. The master becomes isolated, however it keeps sending PINGs to the
replicas. Such PINGs will never be received since the link connection is
actually already severed.
2. On the other side, one of the replicas will turn into the new master,
setting its secondary replication ID offset to the one of the last
command received from the old master: this offset will not include the
PINGs sent by the master once the link was already disconnected.
3. When the master rejoins the partion and is turned into a replica, its
offset will be too advanced because of the PINGs, so a PSYNC will fail,
and a full synchronization will be required.
Related to issue #7002 and other discussion we had in the past around
this problem.
Before this commit, when upgrading a replica, expired keys will not
be loaded, thus causing replica having less keys in db. To this point,
master and replica's keys is logically consistent. However, before
the keys in master and replica are physically consistent, that is,
they have the same dbsize, if master got a problem and the replica
got promoted and becomes new master of that partition, and master
updates a key which does not exist on master, but physically exists
on the old master(new replica), the old master would refuse to update
the key, thus causing master and replica data inconsistent.
How could this happen?
That's all because of the wrong judgement of roles while starting up
the server. We can not use server.masterhost to judge if the server
is master or replica, since it fails in cluster mode.
When we start the server, we load rdb and do want to load expired keys,
and do not want to have the ability to active expire keys, if it is
a replica.