AOF write errors are handled with a system similar to the RDB write
error handling: when we can't write to the AOF file, writes are no
longer accepted until we are able to write again.
For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
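The policy above can be sketched as a small decision function. This is a minimal illustration, not the Redis source: every type and function name below is hypothetical, and the abort case is modeled as a return value so the logic is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
typedef enum { FSYNC_NO, FSYNC_EVERYSEC, FSYNC_ALWAYS } fsync_policy;
typedef enum { AOF_CONTINUE, AOF_REJECT_WRITES, AOF_ABORT } aof_action;

typedef struct {
    fsync_policy policy;
    bool last_write_ok;   /* false while the AOF cannot be written */
} aof_state;

/* Decide what to do after an attempt to flush the AOF buffer to disk. */
aof_action aof_after_write(aof_state *s, bool write_succeeded) {
    assert(s != NULL);
    if (write_succeeded) {
        s->last_write_ok = true;       /* recovered: accept writes again */
        return AOF_CONTINUE;
    }
    if (s->policy == FSYNC_ALWAYS)
        return AOF_ABORT;              /* cannot acknowledge unsynced data */
    s->last_write_ok = false;          /* refuse writes until we recover */
    return AOF_REJECT_WRITES;
}

/* Write commands are accepted only while the last AOF write succeeded. */
bool aof_accepts_writes(const aof_state *s) {
    return s->last_write_ok;
}
```

With any policy other than always, a failed write flips the state to rejecting, and the first successful retry flips it back.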
Avoid burning a new configEpoch for every slot migrated if this node
already has the max configEpoch across the cluster.
There is still work to do in this area, but this avoids both ending up
with a very high configEpoch for no reason and flooding the system with
fsyncs.
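A minimal sketch of the check, assuming the caller already knows the highest configEpoch in the cluster (own epoch included). The struct and function names are hypothetical and only illustrate the idea of skipping the bump, and the fsync that persisting it would require, when the node already holds the max.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two epochs involved. */
typedef struct {
    uint64_t config_epoch;   /* this node's configEpoch */
    uint64_t current_epoch;  /* currentEpoch as seen by this node */
} node_epochs;

/* Claim a new configEpoch only if this node does not already hold the
 * max one. Returns true when the epoch changed, which is the only case
 * where the configuration would need to be saved and fsynced. */
bool bump_epoch_if_needed(node_epochs *n, uint64_t max_epoch) {
    assert(n != NULL);
    if (n->config_epoch != 0 && n->config_epoch == max_epoch)
        return false;                  /* already the max: nothing to do */
    n->current_epoch++;
    n->config_epoch = n->current_epoch;
    return true;
}
```

Migrating many slots to the same node then costs at most one bump instead of one per slot.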
The actual goal of the function was to get the max configEpoch found in
the cluster, so make it general by removing the assignment of the max
epoch to currentEpoch, which is useful only at startup.
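The refactoring amounts to splitting a side-effect-free getter from the startup-only assignment. The sketch below shows that shape with hypothetical names and a plain array standing in for the node table; it is not the actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* General helper: return the highest configEpoch among the known nodes,
 * or 0 if there are none. It has no side effects. */
uint64_t max_config_epoch(const uint64_t *epochs, size_t count) {
    assert(epochs != NULL || count == 0);
    uint64_t max = 0;
    for (size_t i = 0; i < count; i++)
        if (epochs[i] > max) max = epochs[i];
    return max;
}

/* Startup-only caller: raise current_epoch to the max configEpoch seen.
 * This is the assignment that no longer lives in the helper. */
void init_current_epoch(uint64_t *current_epoch,
                        const uint64_t *epochs, size_t count) {
    assert(current_epoch != NULL);
    uint64_t max = max_config_epoch(epochs, count);
    if (max > *current_epoch) *current_epoch = max;
}
```

Other callers can now ask for the max epoch without touching currentEpoch.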
Removed a stale conditional preventing the configEpoch from incrementing
after the import in certain conditions. Since the master got a new slot
it should always claim a new configuration.
The node receiving the hash slot needs to have a version that wins over
the other versions in order to force the ownership of the slot.
However the current code is far from perfect, since a failover can
happen during the manual resharding. The fix is a work in progress, but
the bottom line is that the new version must either be voted as usual,
or be set manually by redis-trib after it makes sure it can't be used by
other nodes, or reserved configEpochs could be used for manual
operations (for example odd versions could never be used by slaves and
always be used by CLUSTER SETSLOT NODE).
During slot migration redis-trib can send a number of SETSLOT commands.
Fsyncing every time is a bit too much in production, as verified
empirically.
To make sure configs are fsynced on all nodes after a resharding,
redis-trib may send something like CLUSTER CONFSYNC.
In this case fsyncs were not providing much value, since processes can
crash in the middle of the resharding of a hash slot and redis-trib
should be able to recover from this condition anyway.
If the slot is manually assigned to another node, clear the migrating
status regardless of whether it was previously assigned to us or not,
as long as we no longer have keys for this slot.
This avoids a race during slot migration that may leave the slot in
migrating status in the source node, since it received an update message
from the destination node that is already claiming the slot.
This way we are sure that redis-trib at the end of the slot migration is
always able to close the slot correctly.
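A sketch of the relaxed condition, using hypothetical types with integer node ids in place of node pointers. The point it illustrates is that the check no longer depends on who owned the slot before the command arrived, only on the new owner and the key count.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_MIGRATING (-1)

/* Hypothetical per-slot state as seen by one node. */
typedef struct {
    int owner;          /* id of the node currently owning the slot */
    int migrating_to;   /* destination node id, or NOT_MIGRATING */
    size_t key_count;   /* keys we still hold for this slot */
} slot_state;

/* Handle a manual assignment of the slot to new_owner, as seen by the
 * node with id self. The migrating status is cleared whenever the slot
 * goes to another node and we hold no keys for it, even if an update
 * message already changed the owner before this command arrived. */
void setslot_node(slot_state *s, int self, int new_owner) {
    assert(s != NULL);
    if (new_owner != self && s->key_count == 0)
        s->migrating_to = NOT_MIGRATING;
    s->owner = new_owner;
}
```

In the race described above, the owner field already points at the destination when the command is processed, so a check that also required the slot to be ours would leave it stuck in migrating status.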
If someone asks for SYNC or PSYNC from redis-cli,
automatically enter slaveMode (as if they ran
redis-cli --slave) and continue printing the replication
stream until they press Ctrl-C or the master gets disconnected.