When the configured node timeout is very small, the data validity time
(maximum data age for a slave to try a failover) is too little (ten
times the configured node timeout) when the replication link with the
master is mostly idle. In this case we'll receive some data from the
master only every server.repl_ping_slave_period to refresh the last
interaction with the master.
This commit adds to the max data validity time the slave ping period to
avoid this problem of slaves sensing too old data without a good reason.
However this max data validity time is likely a setting that should be
configurable by the Redis Cluster user in a way completely independent
from the node timeout.
When the configured node timeout is very small, the data validity time
(maximum data age for a slave to try a failover) is too little (ten
times the configured node timeout) when the replication link with the
master is mostly idle. In this case we'll receive some data from the
master only every server.repl_ping_slave_period to refresh the last
interaction with the master.
This commit adds to the max data validity time the slave ping period to
avoid this problem of slaves sensing too old data without a good reason.
However this max data validity time is likely a setting that should be
configurable by the Redis Cluster user in a way completely independent
from the node timeout.
This commit makes it simple to start an handshake with a specific node
address, and uses this in order to detect a node IP change and start a
new handshake in order to fix the IP if possible.
This commit makes it simple to start an handshake with a specific node
address, and uses this in order to detect a node IP change and start a
new handshake in order to fix the IP if possible.
As specified in the Redis Cluster specification, when a node can reach
the majority again after a period in which it was partitioend away with
the minorty of masters, wait some time before accepting queries, to
provide a reasonable amount of time for other nodes to upgrade its
configuration.
This lowers the probabilities of both a client and a master with not
updated configuration to rejoin the cluster at the same time, with a
stale master accepting writes.
As specified in the Redis Cluster specification, when a node can reach
the majority again after a period in which it was partitioend away with
the minorty of masters, wait some time before accepting queries, to
provide a reasonable amount of time for other nodes to upgrade its
configuration.
This lowers the probabilities of both a client and a master with not
updated configuration to rejoin the cluster at the same time, with a
stale master accepting writes.
With this commit options not explicitly rewritten by CONFIG REWRITE are
not touched at all. These include new options that may not have support
for REWRITE, and other special cases like rename-command and include.
With this commit options not explicitly rewritten by CONFIG REWRITE are
not touched at all. These include new options that may not have support
for REWRITE, and other special cases like rename-command and include.
The value was otherwise undefined, so next time the node was promoted
again from slave to master, adding a slave to the list of slaves
would likely crash the server or result into undefined behavior.
The value was otherwise undefined, so next time the node was promoted
again from slave to master, adding a slave to the list of slaves
would likely crash the server or result into undefined behavior.
Later this should be configurable from the command line but at least now
we use something more appropriate for our use case compared to the
redis-rb default timeout.
Later this should be configurable from the command line but at least now
we use something more appropriate for our use case compared to the
redis-rb default timeout.
The bug could be easily triggered by:
SADD foo a b c 1 2 3 4 5 6
SDIFF foo foo
When the key was the same in two sets, an unsafe iterator was used to
check existence of elements in the same set we were iterating.
Usually this would just result into a wrong output, however with the
dict.c API misuse protection we have in place, the result was actually
an assertion failed that was triggered by the CI test, while creating
random datasets for the "MASTER and SLAVE consistency" test.
The bug could be easily triggered by:
SADD foo a b c 1 2 3 4 5 6
SDIFF foo foo
When the key was the same in two sets, an unsafe iterator was used to
check existence of elements in the same set we were iterating.
Usually this would just result into a wrong output, however with the
dict.c API misuse protection we have in place, the result was actually
an assertion failed that was triggered by the CI test, while creating
random datasets for the "MASTER and SLAVE consistency" test.