11882 Commits

Author SHA1 Message Date
Salvatore Sanfilippo
be2a34d871 Merge pull request #3945 from badboy/dicthash-bench-compile
Reorder to make dict-benchmark compile on Linux
2017-04-18 16:31:18 +02:00
antirez
02d02a3754 Fix #3848 by closing the descriptor on error. 2017-04-18 16:24:06 +02:00
antirez
f52c65edf2 Fix #3848 by closing the descriptor on error. 2017-04-18 16:24:06 +02:00
antirez
8b7b4d6734 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-04-18 16:15:24 +02:00
antirez
e3e243744c Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-04-18 16:15:24 +02:00
antirez
da2f9cd186 Fix descriptor leak. Close #3848. 2017-04-18 16:15:16 +02:00
antirez
199b0b0c36 Fix descriptor leak. Close #3848. 2017-04-18 16:15:16 +02:00
Salvatore Sanfilippo
332a05dc33 Merge pull request #3856 from viennadd/issue-3847
fix #3847: close the socket before returning ANET_ERR.
2017-04-18 16:13:23 +02:00
Salvatore Sanfilippo
d840bc0dbd Merge pull request #3856 from viennadd/issue-3847
fix #3847: close the socket before returning ANET_ERR.
2017-04-18 16:13:23 +02:00
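Both #3847 and #3848 follow the same pattern: a function that obtains a file descriptor must release it on every error path before it returns. The sketch below shows that pattern using a hypothetical `tcp_listen_or_close()` helper written for illustration. It is not the `anet.c` code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to addr:port and
 * listen. Every error path after socket() succeeds goes through 'err',
 * which closes the descriptor, so a failed call never leaks an fd.
 * Returns the listening fd, or -1 on error. */
static int tcp_listen_or_close(const char *addr, unsigned short port) {
    struct sockaddr_in sa;
    int yes = 1;
    int fd;

    assert(addr != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;            /* nothing to close yet */

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1)
        goto err;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1) goto err;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) goto err;
    if (listen(fd, 16) == -1) goto err;
    return fd;

err:
    close(fd);   /* without this, every failed call leaks one descriptor */
    return -1;
}
```

A single `err:` label keeps one `close()` for all failure paths. That makes it harder for a future early return to skip the cleanup.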
张文康
5f88bd320e update block->free after some diff data are written to the child process 2017-04-18 20:10:08 +08:00
张文康
f34c984f2f update block->free after some diff data are written to the child process 2017-04-18 20:10:08 +08:00
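The commit above concerns a buffer block whose `used` and `free` counters must stay in sync as data is drained to the child process. The sketch below shows that bookkeeping. It assumes a hypothetical fixed-size `buf_block` type with `block_append` and `block_consume` helpers, not the actual AOF rewrite buffer code.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical block with the used/free counters named in the commit
 * message. Invariant: used + free == BLOCK_SIZE. */
typedef struct {
    size_t used;             /* bytes waiting to be written out */
    size_t free;             /* bytes still available for appending */
    char buf[BLOCK_SIZE];
} buf_block;

/* Append as much of 'data' as fits; returns the number of bytes appended. */
static size_t block_append(buf_block *b, const char *data, size_t len) {
    size_t n = len < b->free ? len : b->free;
    memcpy(b->buf + b->used, data, n);
    b->used += n;
    b->free -= n;
    return n;
}

/* Account for 'nwritten' bytes already written to the reader (for
 * example a pipe to a child process). The unwritten tail moves to the
 * front, and the vacated space must be returned to 'free'. If that last
 * step is missing, the block keeps looking full even after it drains. */
static void block_consume(buf_block *b, size_t nwritten) {
    assert(nwritten <= b->used);
    memmove(b->buf, b->buf + nwritten, b->used - nwritten);
    b->used -= nwritten;
    b->free += nwritten;
}
```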
antirez
c33493277a Clarify why we save ziplist elements in reverse order.
Also get rid of variables that are now kinda redundant, since the
dictionary iterator was removed.

This is related to PR #3949.
2017-04-18 11:01:47 +02:00
antirez
083636b801 Clarify why we save ziplist elements in reverse order.
Also get rid of variables that are now kinda redundant, since the
dictionary iterator was removed.

This is related to PR #3949.
2017-04-18 11:01:47 +02:00
Salvatore Sanfilippo
0a942f1751 Merge pull request #3949 from spinlock/unstable-rdb-encoding
rdb: saving skiplist in reversed order to accelerate the deserialisation process
2017-04-18 10:56:57 +02:00
Salvatore Sanfilippo
174c5846a5 Merge pull request #3949 from spinlock/unstable-rdb-encoding
rdb: saving skiplist in reversed order to accelerate the deserialisation process
2017-04-18 10:56:57 +02:00
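Reverse order helps because each loaded element is inserted into an ordered structure. If elements arrive from the greatest to the smallest, the search for the insertion point stops immediately at the head. The sketch below illustrates the effect with a plain sorted linked list standing in for the skiplist. This is a simplification: a skiplist has several levels, so its ascending-order cost is logarithmic rather than linear. `sorted_insert` and `load_cost` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a skiplist: a singly linked list kept in
 * ascending score order. */
typedef struct node { double score; struct node *next; } node;

/* Insert 'score' at its ordered position; returns how many nodes were
 * walked past while searching for that position. */
static size_t sorted_insert(node **head, double score) {
    size_t visited = 0;
    node **pp = head;
    while (*pp && (*pp)->score < score) {
        pp = &(*pp)->next;
        visited++;
    }
    node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->score = score;
    n->next = *pp;
    *pp = n;
    return visited;
}

static void free_list(node *head) {
    while (head) { node *next = head->next; free(head); head = next; }
}

/* "Load" scores 0..count-1 in ascending or descending order and return
 * the total number of nodes walked past during all insertions. */
static size_t load_cost(size_t count, int descending) {
    node *head = NULL;
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        double s = descending ? (double)(count - 1 - i) : (double)i;
        total += sorted_insert(&head, s);
    }
    free_list(head);
    return total;
}
```

With 1000 elements, ascending order walks past 0 + 1 + ... + 999 = 499500 nodes in total. Descending order walks past none, because every new element is smaller than the current head.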
Jan-Erik Rediger
c4ad4765b0 Reorder to make dict-benchmark compile on Linux
Fixes #3944
2017-04-17 13:37:59 +02:00
Jan-Erik Rediger
606c74baf6 Reorder to make dict-benchmark compile on Linux
Fixes #3944
2017-04-17 13:37:59 +02:00
spinlock
23ec36909e rdb: saving skiplist in reversed order to accelerate the deserialisation process 2017-04-17 13:22:34 +08:00
spinlock
ed5d5d6633 rdb: saving skiplist in reversed order to accelerate the deserialisation process 2017-04-17 13:22:34 +08:00
antirez
271733f4f8 Cluster: discard pong times in the future.
However we allow for 500 milliseconds of tolerance, in order to
avoid often discarding semantically valid info (the node is up)
because of the natural desync of a few milliseconds among servers,
even when NTP is used.

Note that we should anyway ping the node from time to time and
discover if it's actually down from our point of view, since no update
is accepted while we have an active ping on the node.

Related to #3929.
2017-04-15 10:12:08 +02:00
antirez
4d4f22843c Cluster: discard pong times in the future.
However we allow for 500 milliseconds of tolerance, in order to
avoid often discarding semantically valid info (the node is up)
because of the natural desync of a few milliseconds among servers,
even when NTP is used.

Note that we should anyway ping the node from time to time and
discover if it's actually down from our point of view, since no update
is accepted while we have an active ping on the node.

Related to #3929.
2017-04-15 10:12:08 +02:00
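The acceptance rule described above can be sketched as a single predicate. This sketch makes three assumptions. Times are milliseconds in a hypothetical `mstime_t`. `accept_reported_pong` is an illustrative name. The extra requirement that a report must also be newer than our current view is an assumption added here, since this commit message does not state it.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t mstime_t;                /* milliseconds */

#define PONG_FUTURE_TOLERANCE_MS 500     /* tolerance stated in the commit */

/* Decide whether a pong time reported by another node may replace our
 * view 'ours', given our local clock 'now'. Reports up to 500 ms in the
 * future are accepted to absorb small clock desync; anything further
 * ahead is discarded. Assumption of this sketch: a report is applied
 * only if it is newer than what we already have. */
static int accept_reported_pong(mstime_t reported, mstime_t ours, mstime_t now) {
    assert(now >= 0);
    if (reported > now + PONG_FUTURE_TOLERANCE_MS) return 0;
    return reported > ours;
}
```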
antirez
3f068b92b9 Test: fix, hopefully, false PSYNC failure like in issue #2715.
And many other related GitHub issues... all reporting the same problem.
There was probably just not enough backlog in certain unlucky runs.
I'll ask people who can reproduce it whether they now see this as
fixed as well.
2017-04-14 17:53:11 +02:00
antirez
a2e3dc0a1d Test: fix, hopefully, false PSYNC failure like in issue #2715.
And many other related GitHub issues... all reporting the same problem.
There was probably just not enough backlog in certain unlucky runs.
I'll ask people who can reproduce it whether they now see this as
fixed as well.
2017-04-14 17:53:11 +02:00
antirez
02777bb252 Cluster: always add PFAIL nodes at end of gossip section.
Relying on nodes in PFAIL state being shared around by randomly
adding them to the gossip section is a weak assumption, especially
after the changes that send fewer ping/pong packets.

We want to always include gossip entries for all the nodes that are in
PFAIL state, so that the PFAIL -> FAIL state promotion can happen much
faster and more reliably.

Related to #3929.
2017-04-14 13:39:49 +02:00
antirez
0530c7e564 Cluster: always add PFAIL nodes at end of gossip section.
Relying on nodes in PFAIL state being shared around by randomly
adding them to the gossip section is a weak assumption, especially
after the changes that send fewer ping/pong packets.

We want to always include gossip entries for all the nodes that are in
PFAIL state, so that the PFAIL -> FAIL state promotion can happen much
faster and more reliably.

Related to #3929.
2017-04-14 13:39:49 +02:00
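The gossip-section construction described above can be sketched as follows. The sketch assumes a hypothetical flat `node_info` table. It first adds a bounded number of random, non-PFAIL entries, then appends every PFAIL node at the end. The sample size and the retry bound (`wanted * 3`) are illustrative choices, not values taken from Redis.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified node table entry. */
typedef struct { int id; int pfail; } node_info;

/* Fill 'out' with up to 'wanted' random healthy node ids, then append
 * every PFAIL node so that PFAIL reports always propagate. 'out' must
 * have room for wanted + count entries. Returns the number of entries. */
static size_t build_gossip(const node_info *nodes, size_t count,
                           size_t wanted, int *out) {
    size_t n = 0;
    size_t attempts = wanted * 3;          /* bounded random sampling */

    if (count == 0) return 0;
    while (n < wanted && attempts-- > 0) {
        const node_info *cand = &nodes[(size_t)rand() % count];
        if (cand->pfail) continue;         /* added deterministically below */
        int seen = 0;
        for (size_t i = 0; i < n; i++)
            if (out[i] == cand->id) seen = 1;
        if (!seen) out[n++] = cand->id;
    }
    /* Always append all PFAIL nodes at the end of the section. */
    for (size_t i = 0; i < count; i++)
        if (nodes[i].pfail) out[n++] = nodes[i].id;
    return n;
}
```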
antirez
8c829d9e43 Cluster: fix gossip section ping/pong times encoding.
The gossip section times are 32 bits wide, so they cannot store the
time in milliseconds but just the seconds approximation, which is good
enough for our uses. At the same time however, when comparing the
gossip section times of other nodes with our node's view, we need to
convert back to milliseconds.

Related to #3929. Without this change the patch to reduce the traffic
in the cluster bus does not work.
2017-04-14 11:01:22 +02:00
antirez
06de4577d6 Cluster: fix gossip section ping/pong times encoding.
The gossip section times are 32 bits wide, so they cannot store the
time in milliseconds but just the seconds approximation, which is good
enough for our uses. At the same time however, when comparing the
gossip section times of other nodes with our node's view, we need to
convert back to milliseconds.

Related to #3929. Without this change the patch to reduce the traffic
in the cluster bus does not work.
2017-04-14 11:01:22 +02:00
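A millisecond Unix timestamp from 2017 (about 1.49e12) does not fit in 32 bits, while the same time in seconds does. The sketch below shows the round trip. It assumes hypothetical `encode_gossip_time` / `decode_gossip_time` helpers and network byte order on the wire.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

typedef int64_t mstime_t;   /* local times are kept in milliseconds */

/* Store only whole seconds in the 32-bit gossip field. */
static uint32_t encode_gossip_time(mstime_t ms) {
    assert(ms >= 0);
    return htonl((uint32_t)(ms / 1000));
}

/* Convert back to milliseconds before comparing with our local view. */
static mstime_t decode_gossip_time(uint32_t wire) {
    return (mstime_t)ntohl(wire) * 1000;
}
```

If the decoded value skipped the `* 1000` step, it would compare seconds with milliseconds. Every gossip time would then look decades in the past.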
antirez
6878a3fedd Cluster: add clean-logs command to create-cluster script. 2017-04-14 10:52:00 +02:00
antirez
81a36f41ec Cluster: add clean-logs command to create-cluster script. 2017-04-14 10:52:00 +02:00
antirez
8f7bf2841a Cluster: decrease ping/pong traffic by trusting other nodes reports.
Clusters of bigger sizes tend to have a lot of traffic in the cluster bus
just for failure detection: a node will try to get a ping reply from
another node no later than when half the node timeout has elapsed,
in order to avoid a false positive.

However this means that if we have N nodes and the node timeout is set
to, for instance, M seconds, we'll have to ping N nodes every M/2
seconds. These N*M/2 pings will receive the same number of pongs, so
a total of N*M packets per node. However given that we have a total of N
nodes doing this, the total number of messages will be N*N*M.

In a 100 nodes cluster with a timeout of 60 seconds, this translates
to a total of 100*100*30 packets per second, summing all the packets
exchanged by all the nodes.

This is, as you can guess, a lot... So this patch changes the
implementation in a very simple way in order to trust the reports of
other nodes: if a node A reports a node B as alive at least up to
a given time, we update our view accordingly.

The problem with this approach is that it could result in a subset of
nodes being able to reach a given node X, and preventing others from
detecting that it is actually not reachable from the majority of nodes.
So the above algorithm is refined by trusting other nodes only if we do
not currently have a ping pending for the node X, and if there are no
failure reports for that node.

Since each node pings 10 other nodes every second anyway (one node
every 100 milliseconds), even while trusting the other nodes' reports
we will eventually detect if a given node is down from our POV.

Now to understand the number of packets that the cluster would exchange
for failure detection with the patch, we can start by considering the
random PINGs that the cluster sends anyway as a baseline:
each node sends 10 packets per second, so the total traffic if no
additional packets were sent, including PONG packets, would be:

    Total messages per second = N*10*2

However, trusting the gossip sections of other nodes will not ALWAYS
prevent pinging nodes because of the "half timeout reached" rule. The
math involved in computing the actual rate as N and M change is quite
complex and also depends on another parameter, which is the number of
entries in the gossip section of PING and PONG packets. However it is
possible to compare what happens in clusters of different sizes
experimentally. After applying this patch a very important reduction in
the number of packets exchanged is trivial to observe, without apparent
impact on the failure detection performance.

Actual numbers with different cluster sizes should be published in the
Redis Cluster documentation in the future.

Related to #3929.
2017-04-14 10:43:53 +02:00
antirez
de2eed4838 Cluster: decrease ping/pong traffic by trusting other nodes reports.
Clusters of bigger sizes tend to have a lot of traffic in the cluster bus
just for failure detection: a node will try to get a ping reply from
another node no later than when half the node timeout has elapsed,
in order to avoid a false positive.

However this means that if we have N nodes and the node timeout is set
to, for instance, M seconds, we'll have to ping N nodes every M/2
seconds. These N*M/2 pings will receive the same number of pongs, so
a total of N*M packets per node. However given that we have a total of N
nodes doing this, the total number of messages will be N*N*M.

In a 100 nodes cluster with a timeout of 60 seconds, this translates
to a total of 100*100*30 packets per second, summing all the packets
exchanged by all the nodes.

This is, as you can guess, a lot... So this patch changes the
implementation in a very simple way in order to trust the reports of
other nodes: if a node A reports a node B as alive at least up to
a given time, we update our view accordingly.

The problem with this approach is that it could result in a subset of
nodes being able to reach a given node X, and preventing others from
detecting that it is actually not reachable from the majority of nodes.
So the above algorithm is refined by trusting other nodes only if we do
not currently have a ping pending for the node X, and if there are no
failure reports for that node.

Since each node pings 10 other nodes every second anyway (one node
every 100 milliseconds), even while trusting the other nodes' reports
we will eventually detect if a given node is down from our POV.

Now to understand the number of packets that the cluster would exchange
for failure detection with the patch, we can start by considering the
random PINGs that the cluster sends anyway as a baseline:
each node sends 10 packets per second, so the total traffic if no
additional packets were sent, including PONG packets, would be:

    Total messages per second = N*10*2

However, trusting the gossip sections of other nodes will not ALWAYS
prevent pinging nodes because of the "half timeout reached" rule. The
math involved in computing the actual rate as N and M change is quite
complex and also depends on another parameter, which is the number of
entries in the gossip section of PING and PONG packets. However it is
possible to compare what happens in clusters of different sizes
experimentally. After applying this patch a very important reduction in
the number of packets exchanged is trivial to observe, without apparent
impact on the failure detection performance.

Actual numbers with different cluster sizes should be published in the
Redis Cluster documentation in the future.

Related to #3929.
2017-04-14 10:43:53 +02:00
antirez
c5d6f577f0 Cluster: collect more specific bus messages stats.
First step toward changing Cluster to use fewer messages.
Related to issue #3929.
2017-04-13 19:22:35 +02:00
antirez
c628ae114b Cluster: collect more specific bus messages stats.
First step toward changing Cluster to use fewer messages.
Related to issue #3929.
2017-04-13 19:22:35 +02:00
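The commit above only says that more specific bus message statistics are collected. One plausible shape for them is a sent and a received counter per message type. The sketch below uses hypothetical message types and names (`bus_stats`, `stat_sent`, `stat_received`, `total_sent`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message types; the real cluster bus defines its own set. */
enum { MSG_PING, MSG_PONG, MSG_MEET, MSG_FAIL, MSG_TYPE_COUNT };

/* One sent and one received counter per message type. */
typedef struct {
    uint64_t sent[MSG_TYPE_COUNT];
    uint64_t received[MSG_TYPE_COUNT];
} bus_stats;

static void stat_sent(bus_stats *s, int type) {
    assert(type >= 0 && type < MSG_TYPE_COUNT);
    s->sent[type]++;
}

static void stat_received(bus_stats *s, int type) {
    assert(type >= 0 && type < MSG_TYPE_COUNT);
    s->received[type]++;
}

/* Sum over all types, e.g. to compare with a single global counter. */
static uint64_t total_sent(const bus_stats *s) {
    uint64_t sum = 0;
    for (int t = 0; t < MSG_TYPE_COUNT; t++) sum += s->sent[t];
    return sum;
}
```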
Itamar Haber
b8286d1fc9 Changes command stats iteration to being dict-based
With the addition of modules, looping over the redisCommandTable
misses any added commands. By moving to dictionary iteration this
is resolved.
2017-04-13 17:03:46 +03:00
Itamar Haber
cac5a8b65d Changes command stats iteration to being dict-based
With the addition of modules, looping over the redisCommandTable
misses any added commands. By moving to dictionary iteration this
is resolved.
2017-04-13 17:03:46 +03:00
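A loop over a static command table cannot see commands that modules register later. Iterating the command dictionary instead visits every entry. The sketch below shows this with a hypothetical chained hash table; `cmd_dict`, `dict_add` and `count_called` are illustrative names, not the Redis dict API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8

typedef struct cmd {
    const char *name;
    unsigned long calls;
    struct cmd *next;        /* bucket chain */
} cmd;

typedef struct { cmd *buckets[NBUCKETS]; } cmd_dict;

/* djb2-style string hash reduced to a bucket index. */
static unsigned hash_name(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void dict_add(cmd_dict *d, cmd *c) {
    assert(c != NULL && c->name != NULL);
    unsigned b = hash_name(c->name);
    c->next = d->buckets[b];
    d->buckets[b] = c;
}

/* Walk every bucket, so built-in and later-registered commands alike
 * are visited. Returns how many commands have been called at least once. */
static size_t count_called(const cmd_dict *d) {
    size_t n = 0;
    for (size_t b = 0; b < NBUCKETS; b++)
        for (const cmd *c = d->buckets[b]; c != NULL; c = c->next)
            if (c->calls > 0) n++;
    return n;
}
```

In the test below, a loop over the two-entry static array would count only one called command. The dictionary walk also counts the module command.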
antirez
104584b95e Fix typo in feedReplicationBacklog() top comment. 2017-04-12 12:28:05 +02:00
antirez
341adb516b Fix typo in feedReplicationBacklog() top comment. 2017-04-12 12:28:05 +02:00
antirez
1210af3804 Add a top comment in crucial functions inside networking.c. 2017-04-12 10:12:27 +02:00
antirez
6081a5873c Add a top comment in crucial functions inside networking.c. 2017-04-12 10:12:27 +02:00
antirez
4a850be4dc Set lua-time-limit default value at safe place.
Otherwise, as it was, it would overwrite whatever the user set.

Close #3703.
2017-04-11 16:56:00 +02:00
antirez
7c415014b0 Set lua-time-limit default value at safe place.
Otherwise, as it was, it would overwrite whatever the user set.

Close #3703.
2017-04-11 16:56:00 +02:00
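The ordering issue above generalizes: defaults must be assigned before the user's configuration is applied, never after. The sketch below assumes a hypothetical one-field `config` struct and a loader that handles a single option. The 5000 millisecond value is the `lua-time-limit` default shipped in redis.conf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_LUA_TIME_LIMIT 5000   /* milliseconds */

/* Hypothetical configuration struct, not Redis's server struct. */
typedef struct { long long lua_time_limit; } config;

static void init_defaults(config *c) {
    c->lua_time_limit = DEFAULT_LUA_TIME_LIMIT;
}

/* Apply one "name value" option from the user's configuration. */
static void apply_option(config *c, const char *name, const char *value) {
    if (strcmp(name, "lua-time-limit") == 0)
        c->lua_time_limit = strtoll(value, NULL, 10);
}

/* Safe order: defaults first, then user options, so user values win.
 * Calling init_defaults() after apply_option() would silently discard
 * the user's setting, which is the bug described above. */
static config load_config(const char *name, const char *value) {
    config c;
    assert(name != NULL && value != NULL);
    init_defaults(&c);
    apply_option(&c, name, value);
    return c;
}
```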
antirez
f47607af02 Fix preprocessor if/else chain broken while fixing #3927. 2017-04-11 16:54:27 +02:00
antirez
ffce9ebf9b Fix preprocessor if/else chain broken while fixing #3927. 2017-04-11 16:54:27 +02:00
antirez
74720ea993 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-04-11 16:45:49 +02:00
antirez
de3e670282 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-04-11 16:45:49 +02:00
antirez
aa5b4be02e Fix zmalloc_get_memory_size() ifdefs to actually use the else branch.
Close #3927.
2017-04-11 16:45:11 +02:00
antirez
7e0c3177d6 Fix zmalloc_get_memory_size() ifdefs to actually use the else branch.
Close #3927.
2017-04-11 16:45:11 +02:00
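The sketch below shows an `#if` / `#else` chain whose fallback branch is actually reachable, loosely modeled on the platform dispatch that a function like `zmalloc_get_memory_size()` performs. The `get_memory_size` name and the single `sysconf()` branch are assumptions of this sketch, not the Redis code. `sysconf(_SC_PHYS_PAGES)` is a widely available extension (glibc, macOS) rather than strict POSIX.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Return physical memory size in bytes, or 0 if it cannot be determined.
 * The branches are mutually exclusive, and the final #else is reached on
 * any platform that matches none of the earlier conditions. */
static size_t get_memory_size(void) {
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size < 0) return 0;   /* value not available */
    return (size_t)pages * (size_t)page_size;
#else
    return 0;   /* unknown platform: report "size not available" */
#endif
}
```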
Salvatore Sanfilippo
69ce5c5d10 Merge pull request #3924 from lorneli/unstable
Expire: Update comment of activeExpireCycle function
2017-04-11 16:31:55 +02:00
Salvatore Sanfilippo
3b0a442dd2 Merge pull request #3924 from lorneli/unstable
Expire: Update comment of activeExpireCycle function
2017-04-11 16:31:55 +02:00
antirez
531647bb1b Make more obvious why there was issue #3843. 2017-04-10 13:17:05 +02:00