27610 Commits

Author SHA1 Message Date
antirez
7c531eb5ad Don't send more than 1 newline/sec while loading RDB. 2013-12-10 18:43:19 +01:00
antirez
54a526687d Slaves heartbeat while loading RDB files.
Starting with Redis 2.8 masters are able to detect timed out slaves,
while before 2.8 only slaves were able to detect a timed out master.

Now that timeout detection is bi-directional the following problem
happens as described "in the field" by issue #1449:

1) Master and slave setup with big dataset.
2) Slave performs the first synchronization, or a full sync
   after a failed partial resync.
3) Master sends the RDB payload to the slave.
4) Slave loads this payload.
5) Master detects the slave as timed out since does not receive back the
   REPLCONF ACK acknowledges.

Here the problem is that the master has no way to know how much the
slave will take to load the RDB file in memory. The obvious solution is
to use a greater replication timeout setting, but this is a shame since
for the 0.1% of operation time we are forced to use a timeout that is
not what is suited for 99.9% of operation time.

This commit tries to fix this problem with a solution that is a bit of
an hack, but that modifies little of the replication internals, in order
to be back ported to 2.8 safely.

During the RDB loading time, we send the master newlines to avoid
being sensed as timed out. This is the same that the master already does
while saving the RDB file to still signal its presence to the slave.

The single newline is used because:

1) It can't desync the protocol, as it is only transmitted all or
nothing.
2) It can be safely sent while we don't have a client structure for the
master or in similar situations just with write(2).
2013-12-09 20:26:00 +01:00
antirez
27db38d069 Slaves heartbeat while loading RDB files.
Starting with Redis 2.8 masters are able to detect timed out slaves,
while before 2.8 only slaves were able to detect a timed out master.

Now that timeout detection is bi-directional the following problem
happens as described "in the field" by issue #1449:

1) Master and slave setup with big dataset.
2) Slave performs the first synchronization, or a full sync
   after a failed partial resync.
3) Master sends the RDB payload to the slave.
4) Slave loads this payload.
5) Master detects the slave as timed out since does not receive back the
   REPLCONF ACK acknowledges.

Here the problem is that the master has no way to know how much the
slave will take to load the RDB file in memory. The obvious solution is
to use a greater replication timeout setting, but this is a shame since
for the 0.1% of operation time we are forced to use a timeout that is
not what is suited for 99.9% of operation time.

This commit tries to fix this problem with a solution that is a bit of
an hack, but that modifies little of the replication internals, in order
to be back ported to 2.8 safely.

During the RDB loading time, we send the master newlines to avoid
being sensed as timed out. This is the same that the master already does
while saving the RDB file to still signal its presence to the slave.

The single newline is used because:

1) It can't desync the protocol, as it is only transmitted all or
nothing.
2) It can be safely sent while we don't have a client structure for the
master or in similar situations just with write(2).
2013-12-09 20:26:00 +01:00
antirez
ae81525d35 Handle inline requested terminated with just \n. 2013-12-09 13:28:39 +01:00
antirez
eaf1bfb88b Handle inline requested terminated with just \n. 2013-12-09 13:28:39 +01:00
Yossi Gottlieb
834a5f530d Return proper error on requests with an unbalanced number of quotes. 2013-12-08 12:58:12 +02:00
Yossi Gottlieb
6e70c01148 Return proper error on requests with an unbalanced number of quotes. 2013-12-08 12:58:12 +02:00
antirez
b6d79f34e8 Sentinel: fix reported role info sampling.
The way the role change was recoded was not sane and too much
convoluted, causing the role information to be not always updated.

This commit fixes issue #1445.
2013-12-06 12:46:56 +01:00
antirez
c590549e40 Sentinel: fix reported role info sampling.
The way the role change was recoded was not sane and too much
convoluted, causing the role information to be not always updated.

This commit fixes issue #1445.
2013-12-06 12:46:56 +01:00
antirez
33ea913329 Sentinel: fix reported role fields when master is reset.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.

Otherwise if the role was wrong, it will be sensed as wrong even after
the address switch, and for enough time according to the role change
time, for Sentinel consider the master SDOWN.

This fixes isue #1446, that describes the effects of this bug in
practice.
2013-12-06 11:37:46 +01:00
antirez
2b414a4b5f Sentinel: fix reported role fields when master is reset.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.

Otherwise if the role was wrong, it will be sensed as wrong even after
the address switch, and for enough time according to the role change
time, for Sentinel consider the master SDOWN.

This fixes isue #1446, that describes the effects of this bug in
practice.
2013-12-06 11:37:46 +01:00
antirez
87d6939e79 Fixed typo in redis.conf. 2013-12-06 10:48:46 +01:00
antirez
8534a290d3 Fixed typo in redis.conf. 2013-12-06 10:48:46 +01:00
Salvatore Sanfilippo
7230831fd5 Merge pull request #1439 from AnuragRamdasan/patch-3
Grammar fix.
2013-12-05 09:53:45 -08:00
Salvatore Sanfilippo
2ef57f8d47 Merge pull request #1439 from AnuragRamdasan/patch-3
Grammar fix.
2013-12-05 09:53:45 -08:00
Anurag Ramdasan
f775787f2e Grammar fix. 2013-12-05 23:15:47 +05:30
Anurag Ramdasan
839ed7a60b Grammar fix. 2013-12-05 23:15:47 +05:30
Salvatore Sanfilippo
d5c369a836 Merge pull request #1438 from AnuragRamdasan/patch-2
fixed typo
2013-12-05 08:18:20 -08:00
Salvatore Sanfilippo
026e561446 Merge pull request #1438 from AnuragRamdasan/patch-2
fixed typo
2013-12-05 08:18:20 -08:00
Anurag Ramdasan
f416ddc327 fixed typo 2013-12-05 21:47:17 +05:30
Anurag Ramdasan
fb6b9b14bd fixed typo 2013-12-05 21:47:17 +05:30
Salvatore Sanfilippo
4eda9be590 Merge pull request #1437 from AnuragRamdasan/patch-1
Fixed grammar: 'usually' to 'usual'
2013-12-05 07:42:05 -08:00
Salvatore Sanfilippo
cbaad0b26f Merge pull request #1437 from AnuragRamdasan/patch-1
Fixed grammar: 'usually' to 'usual'
2013-12-05 07:42:05 -08:00
Anurag Ramdasan
6232143904 Fixed grammar: 'usually' to 'usual' 2013-12-05 21:09:31 +05:30
Anurag Ramdasan
74431b80a3 Fixed grammar: 'usually' to 'usual' 2013-12-05 21:09:31 +05:30
antirez
7a5a646df9 Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
6763faef58 Fixed typos in redis.conf file. 2013-12-05 16:28:35 +01:00
antirez
74da4a574f Fixed typos in redis.conf file. 2013-12-05 16:28:35 +01:00
antirez
faa5495eea Fix clients timeout handling.
During the refactoring of blocking operations, commit
83e363d3e67c27865d7679c27f466c5e12b3d4ee, a bug was introduced where
a milliseconds time is compared to a seconds time, so all the clients
always appear to timeout if timeout is set to non-zero value.

Thanks to Jonathan Leibiusky for finding the bug and helping verifying
the cause and fix.
2013-12-05 14:55:07 +01:00
antirez
58713c6b13 Fix clients timeout handling.
During the refactoring of blocking operations, commit
82b672f6335ac2db32a724ba5dc10398c949a4a8, a bug was introduced where
a milliseconds time is compared to a seconds time, so all the clients
always appear to timeout if timeout is set to non-zero value.

Thanks to Jonathan Leibiusky for finding the bug and helping verifying
the cause and fix.
2013-12-05 14:55:07 +01:00
antirez
a7ebb0c7bf WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
c5618e7fdd WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
5f743cc4f8 blocked.c API commented. 2013-12-03 18:03:15 +01:00
antirez
c2f305545a blocked.c API commented. 2013-12-03 18:03:15 +01:00
antirez
83e363d3e6 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez
82b672f633 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez
a6ed453b33 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:06 +01:00
antirez
2e027c48e5 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:06 +01:00
antirez
1ea9a283cb Grammar fix in freeClient(). 2013-12-03 13:40:41 +01:00
antirez
e4025ea926 Grammar fix in freeClient(). 2013-12-03 13:40:41 +01:00
antirez
6fc6c6bda9 Sentinel: don't write HZ when flushing config.
See issue #1419.
2013-12-02 15:56:10 +01:00
antirez
f80cf7363a Sentinel: don't write HZ when flushing config.
See issue #1419.
2013-12-02 15:56:10 +01:00
antirez
4df452caf6 Sentinel: better time desynchronization.
Sentinels are now desynchronized in a better way changing the time
handler frequency between 10 and 20 HZ. This way on average a
desynchronization of 25 milliesconds is produced that should be larger
enough compared to network latency, avoiding most split-brain condition
during the vote.

Now that the clocks are desynchronized, to have larger random delays when
performing operations can be easily achieved in the following way.
Take as example the function that starts the failover, that is
called with a frequency between 10 and 20 HZ and will start the
failover every time there are the conditions. By just adding as an
additional condition something like rand()%4 == 0, we can amplify the
desynchronization between Sentinel instances easily.

See issue #1419.
2013-12-02 12:29:42 +01:00
antirez
dffebbc904 Sentinel: better time desynchronization.
Sentinels are now desynchronized in a better way changing the time
handler frequency between 10 and 20 HZ. This way on average a
desynchronization of 25 milliesconds is produced that should be larger
enough compared to network latency, avoiding most split-brain condition
during the vote.

Now that the clocks are desynchronized, to have larger random delays when
performing operations can be easily achieved in the following way.
Take as example the function that starts the failover, that is
called with a frequency between 10 and 20 HZ and will start the
failover every time there are the conditions. By just adding as an
additional condition something like rand()%4 == 0, we can amplify the
desynchronization between Sentinel instances easily.

See issue #1419.
2013-12-02 12:29:42 +01:00
antirez
b7c955046d Cluster: nodes re-addition blacklist API. 2013-12-02 11:12:23 +01:00
antirez
6fa42b7507 Cluster: nodes re-addition blacklist API. 2013-12-02 11:12:23 +01:00
antirez
5502face59 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez
8f18345ef0 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez
a829c85988 Cluster: some code about clusterHandleSlaveFailover() marginally improved.
80 cols friendly, some minor change to the code to make it simpler.
2013-11-29 16:17:05 +01:00