21057 Commits

Author SHA1 Message Date
charsyam
9cd06e4406 Fix AOF bug: expire could be removed from key on AOF rewrite.
There was a race condition in the AOF rewrite code that, with bad enough
timing, could cause a volatile key just about to expire to be turned
into a non-volatile key. The bug was never reported to cause actualy
issues, but was found analytically by an user in the Redis mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/redis-db/Kvh2FAGK4Uk

This commit fixes issue #1079.
2013-05-02 15:35:59 +02:00
antirez
a444e5957f Sentinel: changes to tilt mode.
Tilt mode was too aggressive (not processing INFO output), this
resulted in a few problems:

1) Redirections were not followed when in tilt mode. This opened a
   window to misinform clients about the current master when a Sentinel
   was in tilt mode and a fail over happened during the time it was not
   able to update the state.

2) It was possible for a Sentinel exiting tilt mode to detect a false
   fail over start, if a slave rebooted with a wrong configuration
   about at the same time. This used to happen since in tilt mode we
   lose the information that the runid changed (reboot).

   Now instead the Sentinel in tilt mode will still remove the instance
   from the list of slaves if it changes state AND runid at the same
   time.

Both are edge conditions but the changes should overall improve the
reliability of Sentinel.
2013-04-30 15:08:29 +02:00
antirez
e5ef85c444 Sentinel: changes to tilt mode.
Tilt mode was too aggressive (not processing INFO output), this
resulted in a few problems:

1) Redirections were not followed when in tilt mode. This opened a
   window to misinform clients about the current master when a Sentinel
   was in tilt mode and a fail over happened during the time it was not
   able to update the state.

2) It was possible for a Sentinel exiting tilt mode to detect a false
   fail over start, if a slave rebooted with a wrong configuration
   about at the same time. This used to happen since in tilt mode we
   lose the information that the runid changed (reboot).

   Now instead the Sentinel in tilt mode will still remove the instance
   from the list of slaves if it changes state AND runid at the same
   time.

Both are edge conditions but the changes should overall improve the
reliability of Sentinel.
2013-04-30 15:08:29 +02:00
antirez
af01e6445c Sentinel: more sensible delay in master demote after tilt. 2013-04-30 15:08:22 +02:00
antirez
ef05a78e7e Sentinel: more sensible delay in master demote after tilt. 2013-04-30 15:08:22 +02:00
antirez
af07569d67 Sentinel: only demote old master into slave under certain conditions.
We used to always turn a master into a slave if the DEMOTE flag was set,
as this was a resurrecting master instance.

However the following race condition is possible for a Sentinel that
got partitioned or internal issues (tilt mode), and was not able to
refresh the state in the meantime:

1) Sentinel X is running, master is instance "A".
3) "A" fails, sentinels will promote slave "B" as master.
2) Sentinel X goes down because of a network partition.
4) "A" returns available, Sentinels will demote it as a slave.
5) "B" fails, other Sentinels will promote slave "A" as master.
6) At this point Sentinel X comes back.

When "X" comes back he thinks that:

"B" is the master.
"A" is the slave to demote.

We want to avoid that Sentinel "X" will demote "A" into a slave.
We also want that Sentinel "X" will detect that the conditions changed
and will reconfigure itself to monitor the right master.

There are two main ways for the Sentinel to reconfigure itself after
this event:

1) If "B" is reachable and already configured as a slave by other
sentinels, "X" will perform a redirection to "A".
2) If there are not the conditions to demote "A", the fact that "A"
reports to be a master will trigger a failover detection in "X", that
will end into a reconfiguraiton to monitor "A".

However if the Sentinel was not reachable, its state may not be updated,
so in case it titled, or was partiitoned from the master instance of the
slave to demote, the new implementation waits some time (enough to
guarantee we can detect the new INFO, and new DOWN conditions).

If after some time still there are not the right condiitons to demote
the instance, the DEMOTE flag is cleared.
2013-04-26 17:02:13 +02:00
antirez
48ede0d84d Sentinel: only demote old master into slave under certain conditions.
We used to always turn a master into a slave if the DEMOTE flag was set,
as this was a resurrecting master instance.

However the following race condition is possible for a Sentinel that
got partitioned or internal issues (tilt mode), and was not able to
refresh the state in the meantime:

1) Sentinel X is running, master is instance "A".
3) "A" fails, sentinels will promote slave "B" as master.
2) Sentinel X goes down because of a network partition.
4) "A" returns available, Sentinels will demote it as a slave.
5) "B" fails, other Sentinels will promote slave "A" as master.
6) At this point Sentinel X comes back.

When "X" comes back he thinks that:

"B" is the master.
"A" is the slave to demote.

We want to avoid that Sentinel "X" will demote "A" into a slave.
We also want that Sentinel "X" will detect that the conditions changed
and will reconfigure itself to monitor the right master.

There are two main ways for the Sentinel to reconfigure itself after
this event:

1) If "B" is reachable and already configured as a slave by other
sentinels, "X" will perform a redirection to "A".
2) If there are not the conditions to demote "A", the fact that "A"
reports to be a master will trigger a failover detection in "X", that
will end into a reconfiguraiton to monitor "A".

However if the Sentinel was not reachable, its state may not be updated,
so in case it titled, or was partiitoned from the master instance of the
slave to demote, the new implementation waits some time (enough to
guarantee we can detect the new INFO, and new DOWN conditions).

If after some time still there are not the right condiitons to demote
the instance, the DEMOTE flag is cleared.
2013-04-26 17:02:13 +02:00
antirez
5ed9341ed1 Sentinel: always redirect on master->slave transition.
Sentinel redirected to the master if the instance changed runid or it
was the first time we got INFO, and a role change was detected from
master to slave.

While this is a good idea in case of slave->master, since otherwise we
could detect a failover without good reasons just after a reboot with a
slave with a wrong configuration, in the case of master->slave
transition is much better to always perform the redirection for the
following reasons:

1) A Sentinel may go down for some time. When it is back online there is
no other way to understand there was a failover.
2) Pointing clients to a slave seems to be always the wrong thing to do.
3) There is no good rationale about handling things differently once an
instance is rebooted (runid change) in that case.
2013-04-24 11:30:17 +02:00
antirez
1965e22aa1 Sentinel: always redirect on master->slave transition.
Sentinel redirected to the master if the instance changed runid or it
was the first time we got INFO, and a role change was detected from
master to slave.

While this is a good idea in case of slave->master, since otherwise we
could detect a failover without good reasons just after a reboot with a
slave with a wrong configuration, in the case of master->slave
transition is much better to always perform the redirection for the
following reasons:

1) A Sentinel may go down for some time. When it is back online there is
no other way to understand there was a failover.
2) Pointing clients to a slave seems to be always the wrong thing to do.
3) There is no good rationale about handling things differently once an
instance is rebooted (runid change) in that case.
2013-04-24 11:30:17 +02:00
antirez
bd638d72ee Config option to turn AOF rewrite incremental fsync on/off. 2013-04-24 10:57:07 +02:00
antirez
d264122f6a Config option to turn AOF rewrite incremental fsync on/off. 2013-04-24 10:57:07 +02:00
antirez
ef70f8f36e AOF: sync data on disk every 32MB when rewriting.
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
2013-04-24 10:26:31 +02:00
antirez
336d722fba AOF: sync data on disk every 32MB when rewriting.
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
2013-04-24 10:26:31 +02:00
antirez
cac8706810 rio.c: added ability to fdatasync() from time to time while writing. 2013-04-24 10:26:30 +02:00
antirez
91f4213ddf rio.c: added ability to fdatasync() from time to time while writing. 2013-04-24 10:26:30 +02:00
antirez
cf6882f3af Sentinel: turn old master into a slave when it comes back. 2013-04-19 16:47:24 +02:00
antirez
8e222c888f Sentinel: turn old master into a slave when it comes back. 2013-04-19 16:47:24 +02:00
antirez
98bb3d2a40 More explicit panic message on out of memory. 2013-04-19 15:11:34 +02:00
antirez
9d823fc222 More explicit panic message on out of memory. 2013-04-19 15:11:34 +02:00
xiaost7
d284570deb Cluster: fix clusterNode.name print format on debug message.
It was %40s instead of %.40s, and since the string is not null
terminated it caused random garbage to be displayed, and possibly a
crash.
2013-04-19 09:53:43 +02:00
xiaost7
ecdbaf4695 Cluster: fix clusterNode.name print format on debug message.
It was %40s instead of %.40s, and since the string is not null
terminated it caused random garbage to be displayed, and possibly a
crash.
2013-04-19 09:53:43 +02:00
antirez
cb2d627e8d redis-cli: raise error on bad command line switch.
Previously redis-cli never tried to raise an error when an unrecognized
switch was encountered, as everything after the initial options is to be
transmitted to the server.

However this is too liberal, as there are no commands starting with "-".
So the new behavior is to produce an error if there is an unrecognized
switch starting with "-". This should not break past redis-cli usages
but should prevent broken options to be silently discarded.

As far the first token not starting with "-" is encountered, all the
rest is considered to be part of the command, so you cna still use
strings starting with "-" as values, like in:

    redis-cli --port 6380 set foo --my-value
2013-04-11 13:17:25 +02:00
antirez
f8ae70cf7c redis-cli: raise error on bad command line switch.
Previously redis-cli never tried to raise an error when an unrecognized
switch was encountered, as everything after the initial options is to be
transmitted to the server.

However this is too liberal, as there are no commands starting with "-".
So the new behavior is to produce an error if there is an unrecognized
switch starting with "-". This should not break past redis-cli usages
but should prevent broken options to be silently discarded.

As far the first token not starting with "-" is encountered, all the
rest is considered to be part of the command, so you cna still use
strings starting with "-" as values, like in:

    redis-cli --port 6380 set foo --my-value
2013-04-11 13:17:25 +02:00
antirez
80b47765af redis-cli: --latency-history mode implemented. 2013-04-11 13:11:41 +02:00
antirez
0280c2f252 redis-cli: --latency-history mode implemented. 2013-04-11 13:11:41 +02:00
antirez
352e1a86a8 Cluster: reconfigure additonal slaves on failover. 2013-04-09 12:13:26 +02:00
antirez
b84570dece Cluster: reconfigure additonal slaves on failover. 2013-04-09 12:13:26 +02:00
antirez
8f6ad5206e Cluster: CONFIG SET cluster-node-timeout. 2013-04-09 11:29:51 +02:00
antirez
d1aee359c0 Cluster: CONFIG SET cluster-node-timeout. 2013-04-09 11:29:51 +02:00
antirez
442af928f9 Cluster: use server.cluster_node_timeout directly.
We used to copy this value into the server.cluster structure, however this
was not necessary.

The reason why we don't directly use server.cluster->node_timeout is
that things that can be configured via redis.conf need to be directly
available in the server structure as server.cluster is allocated later
only if needed in order to reduce the memory footprint of non-cluster
instances.
2013-04-09 11:24:18 +02:00
antirez
68cf249f81 Cluster: use server.cluster_node_timeout directly.
We used to copy this value into the server.cluster structure, however this
was not necessary.

The reason why we don't directly use server.cluster->node_timeout is
that things that can be configured via redis.conf need to be directly
available in the server structure as server.cluster is allocated later
only if needed in order to reduce the memory footprint of non-cluster
instances.
2013-04-09 11:24:18 +02:00
antirez
a8c26a0397 Cluster: configdigest field no longer used. Removed. 2013-04-09 11:07:25 +02:00
antirez
ef4f25ff6e Cluster: configdigest field no longer used. Removed. 2013-04-09 11:07:25 +02:00
antirez
9daa232d42 Cluster: properly send ping to nodes not pinged foro too much time.
In commit de720e4 it was introduced the concept of sending a ping to
every node not receiving a ping since node_timeout/2 seconds.
However the code was located in a place that was not executed because of
a previous conditional causing the loop to re-iterate.

This caused false positives in nodes availability detection.

The current code is still not perfect as a node may be detected to be in
PFAIL state even if it does not reply for just node_timeout/2 seconds
that is not correct. There is a plan to improve this code ASAP.
2013-04-08 19:40:20 +02:00
antirez
f09b2508f4 Cluster: properly send ping to nodes not pinged foro too much time.
In commit d728ec6 it was introduced the concept of sending a ping to
every node not receiving a ping since node_timeout/2 seconds.
However the code was located in a place that was not executed because of
a previous conditional causing the loop to re-iterate.

This caused false positives in nodes availability detection.

The current code is still not perfect as a node may be detected to be in
PFAIL state even if it does not reply for just node_timeout/2 seconds
that is not correct. There is a plan to improve this code ASAP.
2013-04-08 19:40:20 +02:00
antirez
90293ad01b Cluster: move REDIS_CLUSTER_FAILOVER_DELAY near other timing defines. 2013-04-04 14:23:34 +02:00
antirez
d5b383477e Cluster: move REDIS_CLUSTER_FAILOVER_DELAY near other timing defines. 2013-04-04 14:23:34 +02:00
antirez
da03b66774 Cluster: CONFIG GET cluster-node-timeout. 2013-04-04 14:21:01 +02:00
antirez
3cc6e7d01d Cluster: CONFIG GET cluster-node-timeout. 2013-04-04 14:21:01 +02:00
antirez
0c0db1bc3d Cluster: node timeout is now configurable. 2013-04-04 12:29:10 +02:00
antirez
05fa4f4034 Cluster: node timeout is now configurable. 2013-04-04 12:29:10 +02:00
antirez
2e9c57f2aa Cluster: turn hardcoded node timeout multiplicators into defines.
Most Redis Cluster time limits are expressed in terms of the configured
node timeout. Turn them into defines.
2013-04-04 12:04:11 +02:00
antirez
00bab23c41 Cluster: turn hardcoded node timeout multiplicators into defines.
Most Redis Cluster time limits are expressed in terms of the configured
node timeout. Turn them into defines.
2013-04-04 12:04:11 +02:00
antirez
d8a59ffc18 Make rio.c comment 80-columns friendly. 2013-04-03 12:41:14 +02:00
antirez
8419397665 Make rio.c comment 80-columns friendly. 2013-04-03 12:41:14 +02:00
antirez
a9d031c771 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day that is reasonable, the sysadmin
should notice before of catastrofic events especially since by default
Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
2013-04-02 14:05:50 +02:00
antirez
b237de33d1 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day that is reasonable, the sysadmin
should notice before of catastrofic events especially since by default
Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
2013-04-02 14:05:50 +02:00
antirez
419ca24c7e Version bumped to 2.9.9. 2013-04-02 11:55:23 +02:00
antirez
b14fda7deb Version bumped to 2.9.9. 2013-04-02 11:55:23 +02:00
Salvatore Sanfilippo
51d1e00564 Merge pull request #1017 from jbergstroem/build-improvements
Build improvements
2013-04-02 02:24:52 -07:00