Better handling of connection errors, in order to update the table and
recover. Populate the startup nodes table after fetching the list of
nodes.
More work is needed here: it is still not as reliable as the
redis-rb-cluster implementation, which is the minimal reference
implementation for Redis Cluster clients.
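As a rough illustration of populating the startup nodes table after fetching the list of nodes, here is a minimal Python sketch. The function name and the representation of a node as a (host, port) tuple are assumptions made for this example, not taken from the client code.

```python
def merge_startup_nodes(startup_nodes, discovered_nodes):
    """Return the startup nodes extended with newly discovered nodes.

    Nodes are modeled as (host, port) tuples (an assumption of this sketch).
    Existing entries keep their order; nodes already known are not repeated.
    """
    merged = []
    seen = set()
    for node in list(startup_nodes) + list(discovered_nodes):
        if node not in seen:
            seen.add(node)
            merged.append(node)
    return merged


# Usage: the client starts with one known node and learns about two more.
startup = [("127.0.0.1", 7000)]
fetched = [("127.0.0.1", 7000), ("127.0.0.1", 7001), ("127.0.0.1", 7002)]
startup = merge_startup_nodes(startup, fetched)
```

With a larger startup table, the client has more candidates to contact when the node it was talking to becomes unreachable.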
Using CLUSTER FAILOVER FORCE it is now possible to fail over a master in
a forced way, which means:
1) No check is performed to verify that the master is up.
2) The data age of the slave is not checked. Even a slave with very old data
can manually fail over a master in this way.
3) No chat with the master is attempted to obtain its replication offset:
the master can just be down.
Automatic failovers only happen in Redis Cluster if the slave trying to
be elected was disconnected from its master for no more than 10 times
the node-timeout value. However, there should be no such check for
manual failovers, since these are initiated by the sysadmin who, in
theory, knows what she is doing when a slave is selected to be promoted.
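The rule can be sketched in a few lines of Python. This is only a toy model of the data age check described above: the function and constant names are hypothetical, and the real check lives in the Cluster C code.

```python
# "10 times the node-timeout value", as stated in the message above.
DATA_AGE_FACTOR = 10


def data_age_check_passes(disconnected_ms, node_timeout_ms, manual):
    """Return True if the slave passes the data age check.

    disconnected_ms: how long the slave has been disconnected from its master.
    node_timeout_ms: the configured node timeout.
    manual: True for a failover requested by the sysadmin, in which case
            the data age is not considered at all.
    """
    if manual:
        return True
    return disconnected_ms <= DATA_AGE_FACTOR * node_timeout_ms


# Usage: with a 15 second node timeout, the limit is 150 seconds.
auto_ok = data_age_check_passes(120_000, 15_000, manual=False)
auto_too_old = data_age_check_passes(200_000, 15_000, manual=False)
manual_old = data_age_check_passes(200_000, 15_000, manual=True)
```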
This will be configurable or adaptive at some point, but let's start with a
saner value than 1 sec, which is not a good idea for big data
structures stored in a single key.
The error when the target key is busy was a generic one, while it makes
sense to be able to distinguish between the target key busy error and
the others easily.
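A client can then branch on the error code instead of matching the whole message. The sketch below assumes that the dedicated code is BUSYKEY and that error replies have the usual "-CODE message" shape; the message above does not name the code, so treat both as assumptions. The helper names are hypothetical.

```python
def error_code(reply):
    """Return the first word of an error reply, without the leading '-'."""
    return reply.lstrip("-").split(" ", 1)[0]


def is_target_key_busy(reply):
    """True if the error reply carries the (assumed) BUSYKEY code."""
    return error_code(reply) == "BUSYKEY"


# Usage: a specific error can be told apart from a generic one.
busy = is_target_key_busy("-BUSYKEY Target key name already exists.")
generic = is_target_key_busy("-ERR syntax error")
```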
The same change was made for normal client connections. This is
important for Cluster as well, since when a node rejoins the cluster,
when a partition heals or after a restart, it gets flooded with new
connection attempts by all the other nodes trying to form a full
mesh again.
When a Sentinel performs a failover (successful or not), or when a
Sentinel votes for a different Sentinel trying to start a failover, it
sets a minimum delay before it will try to get elected for a failover.
This is not strictly needed, because if multiple Sentinels try
to fail over the same master at the same time, only one configuration
will eventually win. Still, this serialization is very useful in practice.
Normal failovers are cleaner: one Sentinel starts the failover, and the
others update their config when the Sentinel performing the failover
is able to get the selected slave to move from the role of slave to
that of master.
However, until now this delay was implicit, so users could see
Sentinels not reacting for some time after a failed failover, without
any feedback in the logs for the poor sysadmin waiting for clues.
This commit makes Sentinels more verbose about the delay: when a master
is down and a failover attempt is not performed because the delay has
not yet elapsed, something like the following will be logged:
Next failover delay: I will not start a failover
before Thu May 8 16:48:59 2014
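The check and the log line can be modeled roughly as follows. This Python sketch is an illustration only: the function names and the delay parameter are hypothetical, the actual length of the delay is not stated in this message, and the timestamp is rendered in UTC here to keep the example deterministic.

```python
import time


def can_start_failover(now_ts, last_attempt_ts, delay_seconds):
    """True once at least delay_seconds have passed since the last attempt."""
    return now_ts - last_attempt_ts >= delay_seconds


def failover_delay_message(last_attempt_ts, delay_seconds):
    """Build a log line naming the earliest time a new failover may start."""
    next_allowed = last_attempt_ts + delay_seconds
    # asctime() produces a ctime-style string; gmtime() means UTC is used.
    when = time.asctime(time.gmtime(next_allowed))
    return "Next failover delay: I will not start a failover before " + when


# Usage: a failed attempt at t=1000 with a hypothetical 360 second delay.
last_attempt, delay = 1000, 360
if not can_start_failover(1100, last_attempt, delay):
    message = failover_delay_message(last_attempt, delay)
```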
SPOP, tested in the new test, is among the commands rewriting the
client->argv argument vector (it gets rewritten as SREM) for command
replication purposes.
Because of recent optimizations to client->argv caching in the context
of the Lua internal Redis client, it is important to test that SPOP is
callable from Lua without bad effects on the other commands.
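To make the rewriting concrete, here is a small Python model of the idea: the random pop is turned into an SREM of the element that was actually removed, so that the propagated command is deterministic. This is a sketch with hypothetical names, not the server code, and it models argv as a plain list of strings.

```python
import random


def spop_with_rewrite(members, key, rng=random):
    """Pop a random member from the set and return (member, propagated_argv).

    members: a Python set standing in for the Redis set stored at `key`.
    The second value models the rewritten argument vector: SREM of the
    element that was removed, instead of the original SPOP.
    """
    if not members:
        return None, None
    member = rng.choice(sorted(members))
    members.remove(member)
    return member, ["SREM", key, member]


# Usage: whatever element is picked, the propagated command removes it.
myset = {"a", "b", "c"}
popped, propagated = spop_with_rewrite(myset, "myset")
```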
Sometimes the process is still there but no longer in a state that can
be checked (after being killed). This used to happen after a call to
SHUTDOWN NOSAVE in the scripting unit, causing a false positive.