The previous behavior of the state machine was to wait some time and
retry the slave selection, but this is not robust enough against drastic
changes in the conditions of the monitored instances.
What we do now when the slave selection fails is to abort the failover
and return back monitoring the master. If the ODOWN condition is still
present a new failover will be triggered and so forth.
This commit also refactors the code we use to abort a failover.
The previous behavior of the state machine was to wait some time and
retry the slave selection, but this is not robust enough against drastic
changes in the conditions of the monitored instances.
What we do now when the slave selection fails is to abort the failover
and return back monitoring the master. If the ODOWN condition is still
present a new failover will be triggered and so forth.
This commit also refactors the code we use to abort a failover.
When we reset the master we should start with clean timestamps for ping
replies otherwise we'll detect a spurious +sdown event, because on
+master-switch event the previous master instance was probably in +sdown
condition. Since we updated the address we should count time from
scratch again.
Also this commit makes sure to explicitly reset the count of pending
commands, now we can do this because of the new way the hiredis link
is closed.
When we reset the master we should start with clean timestamps for ping
replies otherwise we'll detect a spurious +sdown event, because on
+master-switch event the previous master instance was probably in +sdown
condition. Since we updated the address we should count time from
scratch again.
Also this commit makes sure to explicitly reset the count of pending
commands, now we can do this because of the new way the hiredis link
is closed.
We disconnect the Redis instances hiredis link in a more robust way now.
Also we change the way we perform the redirection for the +switch-master
event, that is not just an instance reset with an address change.
Using the same system we now implement the +redirect-to-master event
that is triggered by an instance that is configured to be master but
found to be a slave at the first INFO reply. In that case we monitor the
master instead, logging the incident as an event.
We disconnect the Redis instances hiredis link in a more robust way now.
Also we change the way we perform the redirection for the +switch-master
event, that is not just an instance reset with an address change.
Using the same system we now implement the +redirect-to-master event
that is triggered by an instance that is configured to be master but
found to be a slave at the first INFO reply. In that case we monitor the
master instead, logging the incident as an event.
Sentinel observers detect failover checking if a slave attached to the
monitored master turns into its replication state from slave to master.
However while this change may in theory only happen after a SLAVEOF NO
ONE command, in practie it is very easy to reboot a slave instance with
a wrong configuration that turns it into a master, especially if it was
a past master before a successfull failover.
This commit changes the detection policy so that if an instance goes
from slave to master, but at the same time the runid has changed, we
sense a reboot, and in that case we don't detect a failover at all.
This commit also introduces the "reboot" sentinel event, that is logged
at "warning" level (so this will trigger an admin notification).
The commit also fixes a problem in the disconnect handler that assumed
that the instance object always existed, that is not the case. Now we
no longer assume that redisAsyncFree() will call the disconnection
handler before returning.
Sentinel observers detect failover checking if a slave attached to the
monitored master turns into its replication state from slave to master.
However while this change may in theory only happen after a SLAVEOF NO
ONE command, in practie it is very easy to reboot a slave instance with
a wrong configuration that turns it into a master, especially if it was
a past master before a successfull failover.
This commit changes the detection policy so that if an instance goes
from slave to master, but at the same time the runid has changed, we
sense a reboot, and in that case we don't detect a failover at all.
This commit also introduces the "reboot" sentinel event, that is logged
at "warning" level (so this will trigger an admin notification).
The commit also fixes a problem in the disconnect handler that assumed
that the instance object always existed, that is not the case. Now we
no longer assume that redisAsyncFree() will call the disconnection
handler before returning.
This commit implements the first, beta quality implementation of Redis
Sentinel, a distributed monitoring system for Redis with notification
and automatic failover capabilities.
More info at http://redis.io/topics/sentinel
This commit implements the first, beta quality implementation of Redis
Sentinel, a distributed monitoring system for Redis with notification
and automatic failover capabilities.
More info at http://redis.io/topics/sentinel
Redis loading data from disk, and a Redis slave disconnected from its
master with serve-stale-data disabled, are two conditions where
commands are normally refused by Redis, returning an error.
However there is no reason to disable Pub/Sub commands as well, given
that this layer does not interact with the dataset. To allow Pub/Sub in
as many contexts as possible is especially interesting now that Redis
Sentinel uses Pub/Sub of a Redis master as a communication channel
between Sentinels.
This commit allows Pub/Sub to be used in the above two contexts where
it was previously denied.
Redis loading data from disk, and a Redis slave disconnected from its
master with serve-stale-data disabled, are two conditions where
commands are normally refused by Redis, returning an error.
However there is no reason to disable Pub/Sub commands as well, given
that this layer does not interact with the dataset. To allow Pub/Sub in
as many contexts as possible is especially interesting now that Redis
Sentinel uses Pub/Sub of a Redis master as a communication channel
between Sentinels.
This commit allows Pub/Sub to be used in the above two contexts where
it was previously denied.
For the C standard char can be either signed or unsigned, it's up to the
compiler, but Redis assumed that it was signed in a few places.
The practical effect of this patch is that now Redis 2.6 will run
correctly in every system where char is unsigned, notably the RaspBerry
PI and other ARM systems with GCC.
Thanks to Georgi Marinov (@eesn on twitter) that reported the problem
and allowed me to use his RaspBerry via SSH to trace and fix the issue!
For the C standard char can be either signed or unsigned, it's up to the
compiler, but Redis assumed that it was signed in a few places.
The practical effect of this patch is that now Redis 2.6 will run
correctly in every system where char is unsigned, notably the RaspBerry
PI and other ARM systems with GCC.
Thanks to Georgi Marinov (@eesn on twitter) that reported the problem
and allowed me to use his RaspBerry via SSH to trace and fix the issue!
If Redis only manages to write out a partial buffer, the AOF file won't
load back into Redis the next time it starts up. It is better to
discard the short write than waste time running redis-check-aof.
If Redis only manages to write out a partial buffer, the AOF file won't
load back into Redis the next time it starts up. It is better to
discard the short write than waste time running redis-check-aof.
Behaves like rdb_last_bgsave_status -- even down to reporting 'ok' when
no rewrite has been done yet. (You might want to check that
aof_last_rewrite_time_sec is not -1.)
Behaves like rdb_last_bgsave_status -- even down to reporting 'ok' when
no rewrite has been done yet. (You might want to check that
aof_last_rewrite_time_sec is not -1.)
REDIS_REPL_PING_SLAVE_PERIOD controls how often the master should
transmit a heartbeat (PING) to its slaves. This period, which defaults
to 10, is measured in seconds.
Redis 2.4 masters used to ping their slaves every ten seconds, just like
it says on the tin.
The Redis 2.6 masters I have been experimenting with, on the other hand,
ping their slaves *every second*. (master_last_io_seconds_ago never
approaches 10.) I think the ping period was inadvertently slashed to
one-tenth of its nominal value around the time REDIS_HZ was introduced.
This commit reintroduces correct ping schedule behaviour.
REDIS_REPL_PING_SLAVE_PERIOD controls how often the master should
transmit a heartbeat (PING) to its slaves. This period, which defaults
to 10, is measured in seconds.
Redis 2.4 masters used to ping their slaves every ten seconds, just like
it says on the tin.
The Redis 2.6 masters I have been experimenting with, on the other hand,
ping their slaves *every second*. (master_last_io_seconds_ago never
approaches 10.) I think the ping period was inadvertently slashed to
one-tenth of its nominal value around the time REDIS_HZ was introduced.
This commit reintroduces correct ping schedule behaviour.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.
The initial motivation for this command, and the only reason currently
it is used by the implementation, is to let the slave instance
communicate its listening port to the slave, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.
This allows clients to auto discover and query all the slaves attached
into a master.
Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:
REPLCONF listening-prot 6380
SYNC
Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.
In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.
NOTE: This commit also fixes a bug: the INFO outout already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.
The initial motivation for this command, and the only reason currently
it is used by the implementation, is to let the slave instance
communicate its listening port to the slave, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.
This allows clients to auto discover and query all the slaves attached
into a master.
Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:
REPLCONF listening-prot 6380
SYNC
Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.
In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.
NOTE: This commit also fixes a bug: the INFO outout already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.