A commonly reported operational problem with Redis master-replica setups
is that, once the master becomes unavailable for some reason,
especially because of network problems, it often won't be able to
perform a partial resynchronization with the new master once it rejoins
the partition, for the following reason:
1. The master becomes isolated, but it keeps sending PINGs to its
replicas. Such PINGs will never be received, since the link is
actually already severed.
2. On the other side, one of the replicas is promoted to the new master,
setting its secondary replication ID offset to that of the last
command received from the old master: this offset does not include the
PINGs sent by the old master once the link was already disconnected.
3. When the old master rejoins the partition and is turned into a replica,
its offset will be too advanced because of those PINGs, so a PSYNC will
fail and a full synchronization will be required.
Related to issue #7002 and other discussions we had in the past around
this problem.
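Below is a minimal sketch of the eligibility check a newly promoted master
performs when the old master reconnects and sends PSYNC <replid> <offset>;
the function name, parameters and the simplified condition are illustrative
assumptions, not the actual Redis source:

    #include <stdio.h>
    #include <string.h>

    /* Return 1 if a partial resync can be served, 0 if a full sync is needed. */
    static int can_serve_partial_resync(const char *req_replid, long long req_offset,
                                        const char *replid2, long long second_offset) {
        /* The reconnecting old master chains to our secondary replication ID,
         * but its offset was advanced by PINGs the replicas never received,
         * so it is greater than the offset at which the ID switch happened
         * and the partial resync has to be refused. */
        if (strcmp(req_replid, replid2) != 0) return 0;
        if (req_offset > second_offset) return 0;
        return 1;
    }

    int main(void) {
        /* The old master asks to resume at offset 1100, but the promoted
         * replica only saw it up to offset 1000 before the link broke. */
        printf("partial resync possible: %d\n",
               can_serve_partial_resync("cafebabe", 1100, "cafebabe", 1000));
        return 0;
    }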
Former-commit-id: 5d6e8fe3e3e43162f0c57f580b6e8432274fca30
"Partial Resynchronization" is a special variant of replication success
that we have to tell systemd about if it is managing redis-server via a
Type=Notify service unit.
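A minimal sketch of how a Type=notify service can report this state, assuming
libsystemd is available (link with -lsystemd); the status text and the bare
main() wrapper are illustrative, not the actual redis-server code:

    #include <systemd/sd-daemon.h>

    int main(void) {
        /* Under Type=notify, sd_notify() is how the service tells systemd it
         * is ready; STATUS= is free-form text shown by `systemctl status`.
         * When a replica finishes a partial resynchronization, it must report
         * readiness just like it does after a full sync. */
        sd_notify(0, "STATUS=Partial Resynchronization finished. Ready to accept connections.\n"
                     "READY=1");
        return 0;
    }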
Former-commit-id: dd9502f373eb7e32aee69a30dcb521bea3ccd3ad
1. Call emptyDb even in the case of a diskless load: we want modules
to get the same FLUSHDB event as with disk-based replication (see the
sketch after this list).
2. Do not fire any module events when flushing the backups array.
3. Delete the redundant call to signalFlushedDb (it is already called
from emptyDb).
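A minimal sketch of points 1 and 2, with an assumed flag and helper name that
only illustrate the intended behavior (not the real Redis API):

    #include <stdio.h>

    #define EMPTYDB_FIRE_MODULE_EVENTS (1<<0)   /* assumed flag name */

    /* Stand-in for the real empty-db helper: the only point illustrated is
     * which call sites fire module events and which stay silent. */
    static void empty_db(int flags) {
        if (flags & EMPTYDB_FIRE_MODULE_EVENTS)
            printf("FLUSHDB event delivered to modules\n");
        else
            printf("flushed silently, no module events\n");
    }

    int main(void) {
        /* Before a diskless load: same FLUSHDB event as disk-based replication. */
        empty_db(EMPTYDB_FIRE_MODULE_EVENTS);
        /* Discarding the temporary backups array: no module events. */
        empty_db(0);
        return 0;
    }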
Former-commit-id: aa8a3077a2d20e66e34f72f2860d0cc3daad496e
propagate_last_id is declared outside of the loop but used
only from within the loop. Once it's '1' it will never go
back to '0', and XSETID will be replicated even for IDs that
don't actually change the last_id.
While not a serious bug (XSETID always used group->last_id,
so there's no risk), it does cause redundant traffic
between the master and its replicas.
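A self-contained sketch of the scoping issue and its fix, using plain integers
instead of stream IDs (illustrative only, not the actual t_stream.c code):

    #include <stdio.h>

    int main(void) {
        long long last_id = 5;
        long long ids[] = {7, 3, 4};   /* only the first ID advances last_id */
        int n = (int)(sizeof(ids)/sizeof(ids[0]));

        for (int j = 0; j < n; j++) {
            int propagate_last_id = 0;      /* the fix: reset on every iteration */
            if (ids[j] > last_id) {
                last_id = ids[j];
                propagate_last_id = 1;      /* last_id really changed */
            }
            /* With the variable declared outside the loop it would stay 1
             * here for j=1 and j=2 as well, replicating a redundant XSETID. */
            if (propagate_last_id)
                printf("replicate XSETID %lld\n", last_id);
        }
        return 0;
    }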
First, we must parse the IDs, so that we abort ASAP.
The return value of this command cannot be an error once
the client has successfully acknowledged some messages,
so it should be executed in an "all or nothing" fashion.
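A minimal sketch of that "validate everything first, then act" pattern; the
parsing helper here is a deliberately simplified assumption, not the real
stream ID parser:

    #include <stdio.h>
    #include <stdlib.h>

    /* Deliberately simplified "<ms>-<seq>" parser; returns 1 on success. */
    static int parse_stream_id(const char *s, long long *ms, long long *seq) {
        char *end;
        *ms = strtoll(s, &end, 10);
        if (end == s || *end != '-') return 0;
        const char *p = end + 1;
        *seq = strtoll(p, &end, 10);
        if (end == p || *end != '\0') return 0;
        return 1;
    }

    int main(void) {
        const char *args[] = {"1-1", "2-oops", "3-1"};
        int n = 3;
        long long ms[3], seq[3];

        /* Phase 1: parse every ID first, so we abort ASAP with no side effects. */
        for (int i = 0; i < n; i++) {
            if (!parse_stream_id(args[i], &ms[i], &seq[i])) {
                printf("ERR invalid stream ID '%s' (nothing acknowledged)\n", args[i]);
                return 1;
            }
        }

        /* Phase 2: acknowledge; no error can interrupt the command anymore. */
        for (int i = 0; i < n; i++)
            printf("acked %lld-%lld\n", ms[i], seq[i]);
        return 0;
    }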
By using a "circular BRPOPLPUSH"-like scenario it was
possible to get the same client on db->blocking_keys
twice (see the comment in moduleTryServeClientBlockedOnKey).
The fix was actually already implemented in
moduleTryServeClientBlockedOnKey, but it had a bug:
the function should return 0 or 1 (not OK or ERR).
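A minimal sketch of the return-value contract, with assumed names (not the
real module.c code); the only point is that callers read the result as a
boolean, so the C_OK/C_ERR status codes (0 and -1 in Redis) do not line up
with that meaning:

    #include <stdio.h>

    /* Assumed stand-in for the real function.  Callers read the result as a
     * boolean "was the blocked client served?", so it must return 1 or 0.
     * Returning the status codes instead (C_OK is 0, C_ERR is -1 in Redis)
     * would make a successful serve look like "not served" to the caller. */
    static int try_serve_blocked_client(int key_has_data) {
        if (!key_has_data) return 0;   /* nothing to serve with */
        /* ... hand the data to the blocked client here ... */
        return 1;                      /* served: stop tracking it for this key */
    }

    int main(void) {
        if (try_serve_blocked_client(1))
            printf("client served once; not re-added to db->blocking_keys\n");
        return 0;
    }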
Other changes:
1. Added two commands to the blockonkeys.c test module (to
reproduce the case described above).
2. Simplified blockonkeys.c in order to make testing easier.
3. Cast raxSize() to avoid a format-specifier warning.
Make sure call() doesn't wrap replicated commands with
a redundant MULTI/EXEC.
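A minimal sketch of the intended behavior, with assumed names (not the actual
call() code): a MULTI/EXEC pair is emitted only when more than one command has
to be propagated atomically:

    #include <stdio.h>

    /* Emit the commands queued for replication, wrapping them in MULTI/EXEC
     * only when there is more than one and atomicity is actually needed. */
    static void propagate_batch(const char **cmds, int ncmds) {
        int wrap = ncmds > 1;
        if (wrap) printf("MULTI\n");
        for (int i = 0; i < ncmds; i++) printf("%s\n", cmds[i]);
        if (wrap) printf("EXEC\n");
    }

    int main(void) {
        const char *single[] = {"SET key val"};
        const char *pair[]   = {"SET a 1", "SET b 2"};
        propagate_batch(single, 1);   /* propagated as-is, no redundant MULTI/EXEC */
        propagate_batch(pair, 2);     /* wrapped atomically */
        return 0;
    }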
Other, unrelated changes:
1. Fix a formatting compiler warning in INFO CLIENTS.
2. Use CLIENT_ID_AOF instead of UINT64_MAX.