12903 Commits

Author SHA1 Message Date
antirez
8f7bf2841a Cluster: decrease ping/pong traffic by trusting other nodes reports.
Clusters of bigger sizes tend to have a lot of traffic in the cluster bus
just for failure detection: a node will try to get a ping reply from
another node before half the node timeout has elapsed,
in order to avoid a false positive.

However this means that if we have N nodes and the node timeout is set
to, for instance, M seconds, we'll have to ping N nodes every M/2
seconds. These N*M/2 pings will receive the same number of pongs, so
a total of N*M packets per node. However given that we have a total of N
nodes doing this, the total number of messages will be N*N*M.

In a 100-node cluster with a timeout of 60 seconds, this translates
to a total of 100*100*30 packets per second, summing all the packets
exchanged by all the nodes.
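
As a rough cross-check of the arithmetic, the sketch below takes the rule literally: every node pings each of the other nodes once per half node timeout, and every ping is answered by one pong. The function name and its parameters are hypothetical and are not taken from the patch. Under this reading the result is a per-second rate that shrinks as the timeout grows, so it should be read alongside, not in place of, the products quoted in the message.

```python
def half_timeout_packets_per_second(nodes: int, node_timeout_s: float) -> float:
    """Cluster-wide packets per second, assuming every node pings each
    other node once per half node timeout and each ping gets one pong."""
    others = nodes - 1                                # a node does not ping itself
    pings_per_node = others / (node_timeout_s / 2)    # pings sent per second by one node
    packets_per_node = 2 * pings_per_node             # every PING is matched by a PONG
    return nodes * packets_per_node                   # summed over all nodes


rate = half_timeout_packets_per_second(nodes=100, node_timeout_s=60)
print(f"{rate:.0f} packets per second")
```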

This is, as you can guess, a lot... So this patch changes the
implementation in a very simple way in order to trust the reports of
other nodes: if a node A reports a node B as alive at least up to
a given time, we update our view accordingly.

The problem with this approach is that it could result in a subset of
nodes being able to reach a given node X, and preventing others from
detecting that it is actually not reachable from the majority of nodes.
So the above algorithm is refined by trusting other nodes only if we do
not currently have a ping pending for the node X, and if there are no
failure reports for that node.
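
The refined rule can be sketched as follows. This is a minimal illustration, not the code of the patch: `NodeView`, its fields, and `apply_gossip` are hypothetical names, and a `ping_sent` value of 0 is assumed to mean that no ping is pending.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """Our local view of one remote node (hypothetical structure)."""
    pong_received: float = 0.0   # last time we consider the node alive
    ping_sent: float = 0.0       # 0 means we have no ping pending
    failure_reports: int = 0     # failure reports received from other nodes


def apply_gossip(view: NodeView, reported_pong_time: float) -> bool:
    """Update our view from another node's report; return True if trusted."""
    if view.ping_sent != 0:
        return False             # we are probing the node ourselves: wait for it
    if view.failure_reports > 0:
        return False             # someone suspects the node: do not trust gossip
    if reported_pong_time <= view.pong_received:
        return False             # the report carries nothing newer than we know
    view.pong_received = reported_pong_time
    return True
```

With this shape, a node that is reachable only from a minority cannot keep refreshing our view once we have a ping of our own pending or once failure reports start to arrive.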

Since each node pings 10 other nodes every second anyway (one node
every 100 milliseconds), eventually, even when trusting the other nodes'
reports, we will detect if a given node is down from our POV.

Now to understand the number of packets that the cluster would exchange
for failure detection with the patch, we can start by considering the
random PINGs that the cluster sends anyway as a baseline:
Each node sends 10 packets per second, so the total traffic if no
additional packets were sent, including PONG packets, would be:

    Total messages per second = N*10*2

However, trusting other nodes' gossip sections will not ALWAYS prevent
pinging nodes because of the "half timeout reached" rule. The
math involved in computing the actual rate as N and M change is quite
complex and depends also on another parameter, which is the number of
entries in the gossip section of PING and PONG packets. However it is
possible to compare what happens in clusters of different sizes
experimentally. After applying this patch a substantial reduction in
the number of packets exchanged is easy to observe, without apparent
impact on failure detection performance.

Actual numbers with different cluster sizes should be published in the
Redis Cluster documentation in the future.

Related to #3929.
2017-04-14 10:43:53 +02:00
antirez
de2eed4838 Cluster: decrease ping/pong traffic by trusting other nodes reports.
Clusters of bigger sizes tend to have a lot of traffic in the cluster bus
just for failure detection: a node will try to get a ping reply from
another node before half the node timeout has elapsed,
in order to avoid a false positive.

However this means that if we have N nodes and the node timeout is set
to, for instance, M seconds, we'll have to ping N nodes every M/2
seconds. These N*M/2 pings will receive the same number of pongs, so
a total of N*M packets per node. However given that we have a total of N
nodes doing this, the total number of messages will be N*N*M.

In a 100-node cluster with a timeout of 60 seconds, this translates
to a total of 100*100*30 packets per second, summing all the packets
exchanged by all the nodes.

This is, as you can guess, a lot... So this patch changes the
implementation in a very simple way in order to trust the reports of
other nodes: if a node A reports a node B as alive at least up to
a given time, we update our view accordingly.

The problem with this approach is that it could result in a subset of
nodes being able to reach a given node X, and preventing others from
detecting that it is actually not reachable from the majority of nodes.
So the above algorithm is refined by trusting other nodes only if we do
not currently have a ping pending for the node X, and if there are no
failure reports for that node.

Since each node pings 10 other nodes every second anyway (one node
every 100 milliseconds), eventually, even when trusting the other nodes'
reports, we will detect if a given node is down from our POV.

Now to understand the number of packets that the cluster would exchange
for failure detection with the patch, we can start by considering the
random PINGs that the cluster sends anyway as a baseline:
Each node sends 10 packets per second, so the total traffic if no
additional packets were sent, including PONG packets, would be:

    Total messages per second = N*10*2

However, trusting other nodes' gossip sections will not ALWAYS prevent
pinging nodes because of the "half timeout reached" rule. The
math involved in computing the actual rate as N and M change is quite
complex and depends also on another parameter, which is the number of
entries in the gossip section of PING and PONG packets. However it is
possible to compare what happens in clusters of different sizes
experimentally. After applying this patch a substantial reduction in
the number of packets exchanged is easy to observe, without apparent
impact on failure detection performance.

Actual numbers with different cluster sizes should be published in the
Redis Cluster documentation in the future.

Related to #3929.
2017-04-14 10:43:53 +02:00
antirez
c5d6f577f0 Cluster: collect more specific bus messages stats.
First step towards changing Cluster to use fewer messages.
Related to issue #3929.
2017-04-13 19:22:35 +02:00
antirez
c628ae114b Cluster: collect more specific bus messages stats.
First step towards changing Cluster to use fewer messages.
Related to issue #3929.
2017-04-13 19:22:35 +02:00
Itamar Haber
b8286d1fc9 Changes command stats iteration to being dict-based
With the addition of modules, looping over the redisCommandTable
misses any added commands. By moving to dictionary iteration this
is resolved.
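
The difference between the two iteration strategies can be illustrated with a small sketch. It is an analogy in Python rather than the Redis C code: `STATIC_COMMAND_TABLE`, `commands`, and `register_module_command` are hypothetical stand-ins for the built-in table, the command dictionary, and module registration.

```python
# Built-in commands known at compile time (stand-in for the static table).
STATIC_COMMAND_TABLE = [
    {"name": "get", "calls": 0},
    {"name": "set", "calls": 0},
]

# Lookup dictionary populated from the table at startup.
commands = {cmd["name"]: cmd for cmd in STATIC_COMMAND_TABLE}


def register_module_command(name: str) -> None:
    """Commands added at runtime only ever land in the dictionary."""
    commands[name] = {"name": name, "calls": 0}


def stats_from_table() -> list:
    """Old approach: walks the static table, so runtime commands are missed."""
    return [cmd["name"] for cmd in STATIC_COMMAND_TABLE]


def stats_from_dict() -> list:
    """New approach: walks the dictionary, so every registered command is seen."""
    return sorted(commands)
```

After `register_module_command("mymodule.hello")`, only `stats_from_dict()` reports the new command.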
2017-04-13 17:03:46 +03:00
Itamar Haber
cac5a8b65d Changes command stats iteration to being dict-based
With the addition of modules, looping over the redisCommandTable
misses any added commands. By moving to dictionary iteration this
is resolved.
2017-04-13 17:03:46 +03:00
antirez
104584b95e Fix typo in feedReplicationBacklog() top comment. 2017-04-12 12:28:05 +02:00
antirez
341adb516b Fix typo in feedReplicationBacklog() top comment. 2017-04-12 12:28:05 +02:00
antirez
1210af3804 Add a top comment in crucial functions inside networking.c. 2017-04-12 10:12:27 +02:00
antirez
6081a5873c Add a top comment in crucial functions inside networking.c. 2017-04-12 10:12:27 +02:00
antirez
4a850be4dc Set lua-time-limit default value at safe place.
Otherwise, as it was, it will overwrite whatever the user set.

Close #3703.
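
The class of bug described here can be shown with a short sketch. It is not the Redis code: the two loader functions are hypothetical, and the value 5000 is used only as an illustrative default.

```python
DEFAULTS = {"lua-time-limit": 5000}  # illustrative default, in milliseconds


def load_config_buggy(user_settings: dict) -> dict:
    """Applies the default AFTER the user settings, overwriting them."""
    config = dict(user_settings)
    config["lua-time-limit"] = DEFAULTS["lua-time-limit"]
    return config


def load_config_fixed(user_settings: dict) -> dict:
    """Applies defaults first, then lets the user settings override them."""
    config = dict(DEFAULTS)
    config.update(user_settings)
    return config
```

With `{"lua-time-limit": 100}` as input, the buggy loader returns 5000 while the fixed one keeps 100.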
2017-04-11 16:56:00 +02:00
antirez
7c415014b0 Set lua-time-limit default value at safe place.
Otherwise, as it was, it will overwrite whatever the user set.

Close #3703.
2017-04-11 16:56:00 +02:00
antirez
f47607af02 Fix preprocessor if/else chain broken in order to fix #3927. 2017-04-11 16:54:27 +02:00
antirez
ffce9ebf9b Fix preprocessor if/else chain broken in order to fix #3927. 2017-04-11 16:54:27 +02:00
antirez
74720ea993 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-04-11 16:45:49 +02:00
antirez
de3e670282 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-04-11 16:45:49 +02:00
antirez
aa5b4be02e Fix zmalloc_get_memory_size() ifdefs to actually use the else branch.
Close #3927.
2017-04-11 16:45:11 +02:00
antirez
7e0c3177d6 Fix zmalloc_get_memory_size() ifdefs to actually use the else branch.
Close #3927.
2017-04-11 16:45:11 +02:00
Salvatore Sanfilippo
69ce5c5d10 Merge pull request #3924 from lorneli/unstable
Expire: Update comment of activeExpireCycle function
2017-04-11 16:31:55 +02:00
Salvatore Sanfilippo
3b0a442dd2 Merge pull request #3924 from lorneli/unstable
Expire: Update comment of activeExpireCycle function
2017-04-11 16:31:55 +02:00
antirez
531647bb1b Make more obvious why there was issue #3843. 2017-04-10 13:17:05 +02:00
antirez
71c350f73b Make more obvious why there was issue #3843. 2017-04-10 13:17:05 +02:00
Salvatore Sanfilippo
01b6966afc Merge pull request #3843 from dvirsky/fix_bc_free
fixed free of blocked client before referring to it
2017-04-10 13:14:52 +02:00
Salvatore Sanfilippo
cccd21d5c6 Merge pull request #3843 from dvirsky/fix_bc_free
fixed free of blocked client before referring to it
2017-04-10 13:14:52 +02:00
antirez
ffefc9f92d Fix modules blocking commands awake delay.
If a thread unblocks a client blocked in a module command, by using the
RedisModule_UnblockClient() API, the event loop may not be awakened until
the next timeout of the multiplexing API or the next unrelated I/O
operation on other clients. We actually want the client to be served
ASAP, so a mechanism is needed in order for the unblocking API to inform
Redis that there is a client to serve ASAP.

This commit fixes the issue using the old trick of the pipe: when a
client needs to be unblocked, a byte is written in a pipe. When we run
the list of clients blocked in modules, we consume all the bytes
written in the pipe. Writes and reads are performed inside the context
of the mutex, so no race is possible in which we consume the bytes that
are actually related to an awake request for a client that should still
be put into the list of clients to unblock.
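
The pipe trick described above can be sketched in a few lines. This is a Python illustration of the general technique on a Unix-like system, not the code of the commit: `UnblockQueue`, its methods, and `run_demo` are hypothetical names, and a selector stands in for the event loop's multiplexing API.

```python
import os
import selectors
import threading


class UnblockQueue:
    """Clients to unblock, plus a pipe used to wake up the event loop."""

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()
        os.set_blocking(self.read_fd, False)   # draining must never block
        self.lock = threading.Lock()
        self.pending = []

    def unblock(self, client):
        """Called from any thread: queue the client and write one wake-up byte."""
        with self.lock:
            self.pending.append(client)
            os.write(self.write_fd, b"x")

    def drain(self):
        """Called by the event loop: consume all bytes, return queued clients."""
        with self.lock:                        # same mutex as unblock(): no lost wake-ups
            try:
                while os.read(self.read_fd, 4096):
                    pass
            except BlockingIOError:
                pass                           # pipe is empty
            ready, self.pending = self.pending, []
        return ready


def run_demo():
    queue = UnblockQueue()
    sel = selectors.DefaultSelector()
    sel.register(queue.read_fd, selectors.EVENT_READ)
    worker = threading.Thread(target=queue.unblock, args=("client-1",))
    worker.start()
    events = sel.select(timeout=5)             # returns as soon as the byte arrives
    worker.join()
    served = queue.drain() if events else []
    sel.close()
    os.close(queue.read_fd)
    os.close(queue.write_fd)
    return served
```

Because the byte is written and consumed under the same lock that guards the list, a wake-up byte can never be consumed while its client is still waiting to be added to the list.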

It was verified that after the fix the server handles the blocked
clients with the expected short delay.

Thanks to @dvirsky for understanding there was such a problem and
reporting it.
2017-04-10 09:33:21 +02:00
antirez
60ffbb72b4 Fix modules blocking commands awake delay.
If a thread unblocks a client blocked in a module command, by using the
RedisModule_UnblockClient() API, the event loop may not be awakened until
the next timeout of the multiplexing API or the next unrelated I/O
operation on other clients. We actually want the client to be served
ASAP, so a mechanism is needed in order for the unblocking API to inform
Redis that there is a client to serve ASAP.

This commit fixes the issue using the old trick of the pipe: when a
client needs to be unblocked, a byte is written in a pipe. When we run
the list of clients blocked in modules, we consume all the bytes
written in the pipe. Writes and reads are performed inside the context
of the mutex, so no race is possible in which we consume the bytes that
are actually related to an awake request for a client that should still
be put into the list of clients to unblock.

It was verified that after the fix the server handles the blocked
clients with the expected short delay.

Thanks to @dvirsky for understanding there was such a problem and
reporting it.
2017-04-10 09:33:21 +02:00
antirez
91999fce40 Rax library updated.
Important bugs fixed.
2017-04-08 17:31:13 +02:00
antirez
f9db3144d6 Rax library updated.
Important bugs fixed.
2017-04-08 17:31:13 +02:00
lorneli
98db5739cc Expire: Update comment of activeExpireCycle function
The macro REDIS_EXPIRELOOKUPS_TIME_PERC has been replaced by
ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC in commit
6500fabfb881a7ffaadfbff74ab801c55d4591fc.
2017-04-08 15:15:24 +08:00
lorneli
e8b44eb33a Expire: Update comment of activeExpireCycle function
The macro REDIS_EXPIRELOOKUPS_TIME_PERC has been replaced by
ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC in commit
6500fabfb881a7ffaadfbff74ab801c55d4591fc.
2017-04-08 15:15:24 +08:00
antirez
3f9e2322ec Rax library updated. 2017-04-07 08:46:39 +02:00
antirez
fcad87788a Rax library updated. 2017-04-07 08:46:39 +02:00
antirez
1409c545da Cluster: hash slots tracking using a radix tree. 2017-03-27 16:37:22 +02:00
antirez
de52b6375b Cluster: hash slots tracking using a radix tree. 2017-03-27 16:37:22 +02:00
Salvatore Sanfilippo
94751543b0 Merge pull request #3875 from oranagra/lfu_tests
add LFU policies to the test suite, just for coverage
2017-03-15 09:18:04 +01:00
Salvatore Sanfilippo
7d036aadc2 Merge pull request #3875 from oranagra/lfu_tests
add LFU policies to the test suite, just for coverage
2017-03-15 09:18:04 +01:00
Oran Agra
4acb4da1d1 add LFU policies to the test suite, just for coverage 2017-03-15 01:05:15 -07:00
Oran Agra
499595f510 add LFU policies to the test suite, just for coverage 2017-03-15 01:05:15 -07:00
antirez
a62f786344 Use sha256 instead of sha1 to generate tarball hashes. 2017-03-09 13:49:36 +01:00
antirez
01f56d44dc Use sha256 instead of sha1 to generate tarball hashes. 2017-03-09 13:49:36 +01:00
vienna
59bdd08214 fix #3847: add close socket before return ANET_ERR. 2017-03-07 16:14:05 +00:00
vienna
bcb1240ccf fix #3847: add close socket before return ANET_ERR. 2017-03-07 16:14:05 +00:00
itamar
443f279a3a Sets up fake client to select current db in RM_Call() 2017-03-06 14:37:10 +02:00
itamar
a04ba58d9a Sets up fake client to select current db in RM_Call() 2017-03-06 14:37:10 +02:00
Guy Benoish
71a8df6a2b Merge branch 'unstable' of https://github.com/antirez/redis into unstable 2017-03-02 13:25:05 +02:00
Guy Benoish
dcbf01295b Merge branch 'unstable' of https://github.com/antirez/redis into unstable 2017-03-02 13:25:05 +02:00
Dvir Volk
4b2229e4b8 fixed free of blocked client before referring to it 2017-03-01 16:51:01 +02:00
Dvir Volk
c40322945a fixed free of blocked client before referring to it 2017-03-01 16:51:01 +02:00
Salvatore Sanfilippo
9cc83d2ad9 Makefile: fix building with Solaris C compiler, 64 bit. 2017-02-23 16:53:39 +01:00
Salvatore Sanfilippo
1f0dae3c7f Makefile: fix building with Solaris C compiler, 64 bit. 2017-02-23 16:53:39 +01:00