The first address specified as a bind parameter
(server.bindaddr[0]) gets used as the source IP
for cluster communication.
If no bind address is specified by the user, the
behavior is unchanged.
This patch allows multiple Redis Cluster instances
to communicate when running on the same interface
of the same host.
The first address specified as a bind parameter
(server.bindaddr[0]) gets used as the source IP
for cluster communication.
If no bind address is specified by the user, the
behavior is unchanged.
This patch allows multiple Redis Cluster instances
to communicate when running on the same interface
of the same host.
This commit sets the failover timeout to 30 seconds instead of the 180
seconds default, and allows to reconfigure multiple slaves at the same
time.
This makes tests less sensible to timing, with the result that there are
less false positives due to normal behaviors that require time to
succeed or to be retried.
However the long term solution is probably some way in order to detect
when a test failed because of timing issues (for example split brain
during leader election) and retry it.
This commit sets the failover timeout to 30 seconds instead of the 180
seconds default, and allows to reconfigure multiple slaves at the same
time.
This makes tests less sensible to timing, with the result that there are
less false positives due to normal behaviors that require time to
succeed or to be retried.
However the long term solution is probably some way in order to detect
when a test failed because of timing issues (for example split brain
during leader election) and retry it.
Sentinel needs to avoid split brain conditions due to multiple sentinels
trying to get voted at the exact same time.
So far some desynchronization was provided by fluctuating server.hz,
that is the frequency of the timer function call. However the
desynchonization provided in this way was not enough when using many
Sentinel instances, especially when a large quorum value is used in
order to force a greater degree of agreement (more than N/2+1).
It was verified that it was likely to trigger a split brain
condition, forcing the system to try again after a timeout.
Usually the system will succeed after a few retries, but this is not
optimal.
This commit desynchronizes instances in a more effective way to make it
likely that the first attempt will be successful.
Sentinel needs to avoid split brain conditions due to multiple sentinels
trying to get voted at the exact same time.
So far some desynchronization was provided by fluctuating server.hz,
that is the frequency of the timer function call. However the
desynchonization provided in this way was not enough when using many
Sentinel instances, especially when a large quorum value is used in
order to force a greater degree of agreement (more than N/2+1).
It was verified that it was likely to trigger a split brain
condition, forcing the system to try again after a timeout.
Usually the system will succeed after a few retries, but this is not
optimal.
This commit desynchronizes instances in a more effective way to make it
likely that the first attempt will be successful.
This is still code to rework in order to use agreement to obtain a new
configEpoch when a slot is migrated, however this commit handles the
special case that happens when the nodes are just started and everybody
has a configEpoch of 0. In this special condition to have the maximum
configEpoch is not enough as the special epoch 0 is not unique (all the
others are).
This does not fixes the intrinsic race condition of a failover happening
while we are resharding, that will be addressed later.
This is still code to rework in order to use agreement to obtain a new
configEpoch when a slot is migrated, however this commit handles the
special case that happens when the nodes are just started and everybody
has a configEpoch of 0. In this special condition to have the maximum
configEpoch is not enough as the special epoch 0 is not unique (all the
others are).
This does not fixes the intrinsic race condition of a failover happening
while we are resharding, that will be addressed later.