20927 Commits

Author SHA1 Message Date
Madelyn Olson
a75fa3aad1 Made crc64 test consistent 2020-04-28 11:20:15 +02:00
Madelyn Olson
1652f7b897 Implemented CRC64 based on slice by 4 2020-04-28 11:20:15 +02:00
Madelyn Olson
52c75e9db1 Implemented CRC64 based on slice by 4 2020-04-28 11:20:15 +02:00
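The slice-by-4 technique named in the CRC commits above reads the input four bytes at a time and folds them through four 256-entry lookup tables, so the lookups for one iteration are independent and can overlap in the CPU. A minimal self-contained sketch of the idea (not the actual Redis/crcspeed code; the constant below is the bit-reflected CRC-64/Jones polynomial, assumed here for illustration):

    #include <stdint.h>
    #include <stddef.h>

    #define CRC64_POLY 0x95ac9329ac4bc9b5ULL  /* reflected Jones polynomial (assumed) */

    static uint64_t tab[4][256];

    /* Build table 0 bit by bit, then derive tables 1..3 from it. */
    static void crc64_init(void) {
        for (int i = 0; i < 256; i++) {
            uint64_t c = (uint64_t)i;
            for (int j = 0; j < 8; j++)
                c = (c & 1) ? (c >> 1) ^ CRC64_POLY : c >> 1;
            tab[0][i] = c;
        }
        for (int k = 1; k < 4; k++)
            for (int i = 0; i < 256; i++)
                tab[k][i] = (tab[k-1][i] >> 8) ^ tab[0][tab[k-1][i] & 0xff];
    }

    static uint64_t crc64_update(uint64_t crc, const unsigned char *p, size_t len) {
        /* Fold four input bytes per iteration with four independent lookups. */
        while (len >= 4) {
            crc ^= (uint64_t)p[0]       | (uint64_t)p[1] << 8 |
                   (uint64_t)p[2] << 16 | (uint64_t)p[3] << 24;
            crc = tab[3][crc & 0xff]         ^ tab[2][(crc >> 8) & 0xff] ^
                  tab[1][(crc >> 16) & 0xff] ^ tab[0][(crc >> 24) & 0xff] ^
                  (crc >> 32);
            p += 4; len -= 4;
        }
        /* Tail: classic one-byte-at-a-time loop. */
        while (len--) crc = tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
        return crc;
    }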
Oran Agra
3a8f8fa487 hiccup, re-fix dictEncObjKeyCompare
Come to think of it, in theory (not in practice), getDecodedObject can
return the same original object with its refcount incremented, so the
pointer comparison in the previous commit was invalid.
So now, instead of checking the encoding, we explicitly check the
refcount.
2020-04-28 12:14:46 +03:00
Oran Agra
9a3dab0a2e hiccup, re-fix dictEncObjKeyCompare
Come to think of it, in theory (not in practice), getDecodedObject can
return the same original object with its refcount incremented, so the
pointer comparison in the previous commit was invalid.
So now, instead of checking the encoding, we explicitly check the
refcount.
2020-04-28 12:14:46 +03:00
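A hedged sketch of the pattern the two entries above converge on (not the literal dict.c code): getDecodedObject() may hand back the very same object with its refcount bumped, so pointer identity cannot tell whether a reference has to be released; the refcount (static vs. counted) decides both whether to decode and whether to release.

    /* Illustrative comparator; the real dictEncObjKeyCompare() also
     * short-circuits integer-encoded objects and goes through the dict
     * callbacks. Returns 1 when the keys are equal, 0 otherwise. */
    static int encObjKeyCompareSketch(robj *key1, robj *key2) {
        robj *o1 = key1, *o2 = key2;
        int equal;

        /* Decode only objects that may be reference counted; static objects
         * (refcount == OBJ_STATIC_REFCOUNT) must never be incrRefCount()'d,
         * and are always raw-encoded anyway. */
        if (o1->refcount != OBJ_STATIC_REFCOUNT) o1 = getDecodedObject(o1);
        if (o2->refcount != OBJ_STATIC_REFCOUNT) o2 = getDecodedObject(o2);
        equal = sdscmp(o1->ptr, o2->ptr) == 0;
        /* Release by the same refcount test, not by o1 != key1: getDecodedObject
         * may return the original pointer with one extra reference to drop. */
        if (o1->refcount != OBJ_STATIC_REFCOUNT) decrRefCount(o1);
        if (o2->refcount != OBJ_STATIC_REFCOUNT) decrRefCount(o2);
        return equal;
    }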
antirez
bee51ef883 Rework comment in dictEncObjKeyCompare(). 2020-04-27 22:40:15 +02:00
antirez
31781e97b6 Rework comment in dictEncObjKeyCompare(). 2020-04-27 22:40:15 +02:00
Oran Agra
9499903e56 allow dictFind using static robj
Since the recent addition of OBJ_STATIC_REFCOUNT and the assertion in
incrRefCount, it is now impossible to use dictFind with a static robj,
because dictEncObjKeyCompare will call getDecodedObject, which tries to
increment the refcount just in order to decrement it later.
2020-04-27 23:17:19 +03:00
Oran Agra
09a5c07886 allow dictFind using static robj
Since the recent addition of OBJ_STATIC_REFCOUNT and the assertion in
incrRefCount, it is now impossible to use dictFind with a static robj,
because dictEncObjKeyCompare will call getDecodedObject, which tries to
increment the refcount just in order to decrement it later.
2020-04-27 23:17:19 +03:00
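The use case this keeps working, sketched below (inside the Redis source tree; the dict 'd' and sds string 's' are assumptions for illustration): a stack-allocated robj built with initStaticStringObject() carries refcount OBJ_STATIC_REFCOUNT, so nothing on the lookup path may incrRefCount() it.

    /* Hedged sketch, assumes server.h: look up a key without allocating a
     * heap robj. */
    robj keyobj;
    initStaticStringObject(keyobj, s);      /* refcount = OBJ_STATIC_REFCOUNT */
    dictEntry *de = dictFind(d, &keyobj);
    /* No decrRefCount(&keyobj): static objects are never reference counted,
     * which is exactly why the comparator sketched earlier must not decode
     * (and thus incrRefCount) them. */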
Salvatore Sanfilippo
9112fe9ebd Merge pull request #7148 from madolson/unstable-crc
Unstable crc
2020-04-27 17:29:13 +02:00
Salvatore Sanfilippo
911a2d6cb4 Merge pull request #7148 from madolson/unstable-crc
Unstable crc
2020-04-27 17:29:13 +02:00
Oran Agra
f1cd7f5880 optimize memory usage of deferred replies
When a deferred reply is added, the previous reply node cannot be used, so
all the extra space we allocated in it is wasted. If someone uses
deferred replies in a loop, each time adding a small reply, each of
these reply nodes (the small string reply) would have consumed a 16k
block.
Now, when we add another deferred reply node, we trim the unused portion
of the previous reply block.

see #7123
2020-04-27 16:46:14 +02:00
Oran Agra
8110ba8880 optimize memory usage of deferred replies
When a deferred reply is added, the previous reply node cannot be used, so
all the extra space we allocated in it is wasted. If someone uses
deferred replies in a loop, each time adding a small reply, each of
these reply nodes (the small string reply) would have consumed a 16k
block.
Now, when we add another deferred reply node, we trim the unused portion
of the previous reply block.

see #7123
2020-04-27 16:46:14 +02:00
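The optimization above, as a generic self-contained sketch (struct and function names are illustrative, not the actual networking.c layout): when a new deferred node is about to be appended, the previous tail can no longer grow, so its allocation is shrunk to the bytes actually used and a loop of small deferred replies no longer pins a 16k block per reply.

    #include <stdlib.h>

    /* Illustrative reply buffer node: 'size' is the allocated payload size,
     * 'used' is how much of it actually holds reply bytes. */
    typedef struct replyNode {
        size_t size, used;
        char buf[];
    } replyNode;

    /* Called right before a new (e.g. deferred) node is linked after 'node'.
     * realloc() may move the block, so the caller must store the returned
     * pointer back into its reply list. */
    static replyNode *trimReplyNode(replyNode *node) {
        if (node == NULL || node->used == node->size) return node;
        replyNode *t = realloc(node, sizeof(replyNode) + node->used);
        if (t == NULL) return node;    /* shrink failed: keep the original */
        t->size = t->used;
        return t;
    }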
Salvatore Sanfilippo
81a368482a Merge pull request #7146 from oranagra/optimize_deferred_reply
optimize memory usage of deferred replies
2020-04-27 16:45:47 +02:00
Salvatore Sanfilippo
828736e7d0 Merge pull request #7146 from oranagra/optimize_deferred_reply
optimize memory usage of deferred replies
2020-04-27 16:45:47 +02:00
Oran Agra
58619c1286 Keep track of meaningful replication offset in replicas too
Now both the master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset they
try to use for psync.

The implication is that if someone missed some pings, or even has more
pings than the promoted replica has, it'll still be able to
psync (avoid a full sync).

The downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

However, consider this case:
master A has two replicas (B and C) replicating directly from it.
There's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs: when it disconnects and re-connects to A, it will ask for something
that's not in the backlog anymore (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is the master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
Now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:49 +02:00
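A compact sketch of the bookkeeping described above (field and function names are illustrative, not the actual replication.c/server.h members): alongside the normal replication offset, remember the offset of the last byte that was not a trailing PING, and use that when trimming the backlog or issuing a PSYNC request after a demotion.

    /* Hedged illustration of the "meaningful offset" idea. */
    typedef struct replState {
        long long repl_offset;          /* everything fed to the backlog      */
        long long meaningful_offset;    /* same, excluding the trailing PINGs */
    } replState;

    /* Called whenever 'len' bytes are fed to the replication backlog. */
    void feedBacklog(replState *st, long long len, int is_ping) {
        st->repl_offset += len;
        if (!is_ping) st->meaningful_offset = st->repl_offset;
    }

    /* Offset to trim the backlog to, and to announce in a PSYNC request,
     * when this instance is demoted to a replica. */
    long long psyncOffset(const replState *st) {
        return st->meaningful_offset;
    }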
Oran Agra
e4d2bb62b2 Keep track of meaningful replication offset in replicas too
Now both the master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset they
try to use for psync.

The implication is that if someone missed some pings, or even has more
pings than the promoted replica has, it'll still be able to
psync (avoid a full sync).

The downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

However, consider this case:
master A has two replicas (B and C) replicating directly from it.
There's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs: when it disconnects and re-connects to A, it will ask for something
that's not in the backlog anymore (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is the master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
Now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:49 +02:00
antirez
a0c54d5622 Fix STRALGO command flags. 2020-04-27 15:52:49 +02:00
antirez
fea9788cc4 Fix STRALGO command flags. 2020-04-27 15:52:49 +02:00
Dave-in-lafayette
1713328723 fix for unintended crash during panic response
If Redis crashes early, before Lua is set up (for example, if file descriptor 0 is closed before exec), it will crash again trying to print memory statistics.
2020-04-27 15:52:49 +02:00
Dave-in-lafayette
2144047e14 fix for unintended crash during panic response
If Redis crashes early, before Lua is set up (for example, if file descriptor 0 is closed before exec), it will crash again trying to print memory statistics.
2020-04-27 15:52:49 +02:00
Dave-in-lafayette
4c30d6d732 fix for crash during panic before all threads are up
If there's a panic before all threads have been started (say, if file descriptor 0 is closed at exec), the panic response will crash here again.
2020-04-27 15:52:49 +02:00
Dave-in-lafayette
1e17d3de79 fix for crash during panic before all threads are up
If there's a panic before all threads have been started (say, if file descriptor 0 is closed at exec), the panic response will crash here again.
2020-04-27 15:52:49 +02:00
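Both panic fixes above follow the same defensive pattern in the crash-report path: verify a subsystem was actually initialized before the report queries it. A hedged sketch (illustrative; the real checks live in the crash-report code in debug.c and may differ):

    /* Only query the Lua allocator once the scripting engine exists. */
    if (server.lua != NULL) {
        serverLog(LL_WARNING, "lua mem: %d KB",
                  lua_gc(server.lua, LUA_GCCOUNT, 0));
    }
    /* Only walk threads that were actually started (hypothetical counter). */
    for (int j = 0; j < threads_started; j++) {
        /* ... dump per-thread state ... */
    }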
Oran Agra
5633862924 Keep track of meaningful replication offset in replicas too
Now both the master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset they
try to use for psync.

The implication is that if someone missed some pings, or even has more
pings than the promoted replica has, it'll still be able to
psync (avoid a full sync).

The downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

However, consider this case:
master A has two replicas (B and C) replicating directly from it.
There's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs: when it disconnects and re-connects to A, it will ask for something
that's not in the backlog anymore (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is the master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
Now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:23 +02:00
Oran Agra
4447ddc8bb Keep track of meaningful replication offset in replicas too
Now both the master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset they
try to use for psync.

The implication is that if someone missed some pings, or even has more
pings than the promoted replica has, it'll still be able to
psync (avoid a full sync).

The downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

However, consider this case:
master A has two replicas (B and C) replicating directly from it.
There's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs: when it disconnects and re-connects to A, it will ask for something
that's not in the backlog anymore (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is the master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
Now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:23 +02:00
antirez
23a3f0bd14 Fix STRALGO command flags. 2020-04-27 13:35:17 +02:00
antirez
3497fd007f Fix STRALGO command flags. 2020-04-27 13:35:17 +02:00
John Sully
16fd1ed48e shared pointer comparisons with other pointers
Former-commit-id: d5ede50b040c82e02eb2b82982091bdd0fb7da12
2020-04-24 22:20:26 -04:00
John Sully
ad06da5655 shared pointer comparisons with other pointers
Former-commit-id: d5ede50b040c82e02eb2b82982091bdd0fb7da12
2020-04-24 22:20:26 -04:00
John Sully
1d8ecd93de EMBSTR size is lower than it needs to be
Former-commit-id: fab6132cb3a0594f6ef65163fcb6f1e0ff8d7587
2020-04-24 22:19:55 -04:00
John Sully
7f560f8e65 EMBSTR size is lower than it needs to be
Former-commit-id: fab6132cb3a0594f6ef65163fcb6f1e0ff8d7587
2020-04-24 22:19:55 -04:00
Madelyn Olson
a69f237f77 Added crcspeed library 2020-04-24 17:11:21 -07:00
Madelyn Olson
a48dfe98cf Added crcspeed library 2020-04-24 17:11:21 -07:00
Madelyn Olson
676e31201a Made crc64 test consistent 2020-04-24 17:05:52 -07:00
Madelyn Olson
2192a91d62 Made crc64 test consistent 2020-04-24 17:05:52 -07:00
Madelyn Olson
0978a60b3b Implemented CRC64 based on slice by 4 2020-04-24 17:00:03 -07:00
Madelyn Olson
486e45ffaf Implemented CRC64 based on slice by 4 2020-04-24 17:00:03 -07:00
antirez
75cf725568 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2020-04-24 16:59:56 +02:00
antirez
022f09447b Merge branch 'unstable' of github.com:/antirez/redis into unstable 2020-04-24 16:59:56 +02:00
antirez
ae3cf7911a LCS -> STRALGO LCS.
STRALGO should be a container for mostly read-only string
algorithms in Redis. The algorithms should have two main
characteristics:

1. They should be non trivial to compute, and often not part of
programming language standard libraries.
2. They should be fast enough that it is a good idea to have optimized C
implementations.

Next thing I would love to see? A small strings compression algorithm.
2020-04-24 16:54:32 +02:00
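For reference, LCS (longest common subsequence) is exactly the kind of algorithm the message describes: absent from libc and worth a fast C implementation. A minimal self-contained length-only version of the textbook O(n*m) dynamic program (not the Redis implementation, which can also report the match positions):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Length of the longest common subsequence of a and b, using one rolling
     * row of the classic DP table. */
    static size_t lcs_len(const char *a, const char *b) {
        size_t n = strlen(a), m = strlen(b);
        size_t *row = calloc(m + 1, sizeof(*row));
        if (row == NULL) return 0;
        for (size_t i = 1; i <= n; i++) {
            size_t diag = 0;                       /* dp[i-1][j-1] */
            for (size_t j = 1; j <= m; j++) {
                size_t up = row[j];                /* dp[i-1][j]   */
                row[j] = (a[i-1] == b[j-1]) ? diag + 1
                         : (up > row[j-1] ? up : row[j-1]);
                diag = up;
            }
        }
        size_t len = row[m];
        free(row);
        return len;
    }

    int main(void) {
        printf("%zu\n", lcs_len("ohmytext", "mynewtext"));  /* prints 6 ("mytext") */
        return 0;
    }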
antirez
8a7f255cd0 LCS -> STRALGO LCS.
STRALGO should be a container for mostly read-only string
algorithms in Redis. The algorithms should have two main
characteristics:

1. They should be non trivial to compute, and often not part of
programming language standard libraries.
2. They should be fast enough that it is a good idea to have optimized C
implementations.

Next thing I would love to see? A small strings compression algorithm.
2020-04-24 16:54:32 +02:00
antirez
262da0ba78 LCS -> STRALGO LCS.
STRALGO should be a container for mostly read-only string
algorithms in Redis. The algorithms should have two main
characteristics:

1. They should be non trivial to compute, and often not part of
programming language standard libraries.
2. They should be fast enough that it is a good idea to have optimized C
implementations.

Next thing I would love to see? A small strings compression algorithm.
2020-04-24 16:49:27 +02:00
antirez
3722f89f49 LCS -> STRALGO LCS.
STRALGO should be a container for mostly read-only string
algorithms in Redis. The algorithms should have two main
characteristics:

1. They should be non trivial to compute, and often not part of
programming language standard libraries.
2. They should be fast enough that it is a good idea to have optimized C
implementations.

Next thing I would love to see? A small strings compression algorithm.
2020-04-24 16:49:27 +02:00
Oran Agra
4ed5b7cb74 optimize memory usage of deferred replies
When a deferred reply is added, the previous reply node cannot be used, so
all the extra space we allocated in it is wasted. If someone uses
deferred replies in a loop, each time adding a small reply, each of
these reply nodes (the small string reply) would have consumed a 16k
block.
Now, when we add another deferred reply node, we trim the unused portion
of the previous reply block.

see #7123
2020-04-24 17:20:28 +03:00
Oran Agra
fb732f7a94 optimize memory usage of deferred replies
When a deferred reply is added, the previous reply node cannot be used, so
all the extra space we allocated in it is wasted. If someone uses
deferred replies in a loop, each time adding a small reply, each of
these reply nodes (the small string reply) would have consumed a 16k
block.
Now, when we add another deferred reply node, we trim the unused portion
of the previous reply block.

see #7123
2020-04-24 17:20:28 +03:00
antirez
e772b55a9c Also use propagate() in streamPropagateGroupID(). 2020-04-24 10:15:04 +02:00
antirez
373ae6061a Also use propagate() in streamPropagateGroupID(). 2020-04-24 10:15:04 +02:00
yanhui13
782d9f2ff9 optimize the output of cluster slots 2020-04-24 10:15:04 +02:00
yanhui13
374ffdf1c1 optimize the output of cluster slots 2020-04-24 10:15:04 +02:00
antirez
6d3bd2ed5a Minor aesthetic changes to #7135. 2020-04-24 10:15:04 +02:00