196 Commits

Author SHA1 Message Date
John Sully
f17dab1f67 Merge branch 'unstable' into keydbpro
Former-commit-id: 0dafbc254a0efd5ee302d5c58fb2ca0a85110104
2020-07-13 03:31:47 +00:00
John Sully
84bf240caa Merge tag '6.0.5' into unstable
Redis 6.0.5


Former-commit-id: b736a95b0d23e4b73daa88c676b76d1d18e8bd17
2020-07-13 00:55:23 +00:00
John Sully
f853142083 Add multi-master-no-forward command to reduce bus traffic with multi-master
Former-commit-id: d99d06b1250a51ea4bc54f678f451acbb7901e33
2020-07-12 19:25:19 +00:00
John Sully
785779ee40 Fix failure to merge databases on active replica sync, due to bad merge with Redis 6
Former-commit-id: cd9514f4c8624932df2ec60ae3c2244899844aa6
2020-07-12 01:13:22 +00:00
John Sully
4001a99481 Merge branch 'unstable' into keydbpro
Former-commit-id: 461eea07260a31cd75753d5b7be691f5793a6f1b
2020-06-07 16:41:21 -04:00
John Sully
7384abfe56 replication test race
Former-commit-id: e1f3cd6ec3bf2319484de04c3796dcfa75e0479c
2020-06-07 01:14:57 -04:00
Oran Agra
fed743b2e1 fix pingoff test race 2020-06-06 11:44:21 +02:00
John Sully
df3f1e8d8e Merge branch 'unstable' into keydbpro
Former-commit-id: 08a36155e3db9918048e87c3d691b7317787c9ab
2020-06-01 17:41:37 -04:00
John Sully
4820142896 PSYNC test shouldn't wait forever
Former-commit-id: 130613e16636923296a8d5b2c4bc623e62fef2f5
2020-06-01 16:13:58 -04:00
John Sully
92de178bfe PSYNC test reliability improvements (test only issue)
Former-commit-id: 50fd4fa7e62f3996f15f6a8c4dcd892022f111ec
2020-06-01 16:01:26 -04:00
John Sully
9e87395c34 Fix for issue #187: we need to properly handle the case where a key with a subkey expiry itself expires during load
Former-commit-id: e6a9a6b428b91b6108df24ae6285ea9b582b7b23
2020-06-01 15:33:19 -04:00
John Sully
08fca5ef31 sendfile has high latency in some scenarios, don't use it
Former-commit-id: 1eb0e3c1c604e71c54423f1d11b8c709c847a516
2020-05-31 23:22:25 -04:00
John Sully
4b317392be Don't start multimaster tests until all nodes are connected
Former-commit-id: 202b97eff76501e736a2f0969607e3297e9703a4
2020-05-31 22:50:30 -04:00
John Sully
2e0c684324 active replica tests on slow computers
Former-commit-id: c9920849dd6d6d0f6ecfe0d1002cb0edd7f7bfa9
2020-05-29 01:58:15 -04:00
John Sully
688dceb3a8 Fix test issue with TLS
Former-commit-id: 81b240f81d1c52fd331c4e0e89659913380229c4
2020-05-29 01:44:52 -04:00
John Sully
ed2e0e66f6 Merge tag '6.0.4' into unstable
Redis 6.0.4.


Former-commit-id: 9c31ac7925edba187e527f506e5e992946bd38a6
2020-05-29 00:57:07 -04:00
antirez
41bb699867 Test: take PSYNC2 test master timeout high during switch.
This will likely avoid false positives due to trailing pings.
2020-05-28 10:56:14 +02:00
Oran Agra
01039e5964 adjust revived meaningful offset tests
these tests create several edge cases that are otherwise uncovered (at
least not consistently) by the test suite, so although they're no longer
testing what they were meant to test, it's still a good idea to keep
them in hope that they'll expose some issue in the future.
2020-05-28 10:09:51 +02:00
Oran Agra
98e6f2cd5b revive meaningful offset tests 2020-05-28 10:09:51 +02:00
antirez
0163e4e495 Another meaningful offset test removed. 2020-05-28 10:09:51 +02:00
antirez
24a0f7bf55 Remove the PSYNC2 meaningful offset test. 2020-05-28 10:09:51 +02:00
antirez
2411e4e33f Test: PSYNC2 test can now show server logs. 2020-05-28 10:09:51 +02:00
John Sully
e0a0d93a07 Merge branch 'unstable' into keydbpro
Former-commit-id: a830cf85df236885558c5571c0bf23cfb23e3655
2020-05-24 14:41:53 -04:00
John Sully
fa0be83fd9 Merge tag '6.0.2' into unstable
Redis 6.0.2


Former-commit-id: a010e4a4b2cc2bcad1cb14604b7ebc596c35b05e
2020-05-22 16:45:18 -04:00
John Sully
5a7ce664d0 Merge commit '78cbd3039858407837632bc37abb36e36ec60ce5' into unstable
Former-commit-id: d74871da40dea11bd1a226fbecb0974ff5f8ec8c
2020-05-22 15:36:44 -04:00
Qu Chen
5d59bbb6d9 Disconnect chained replicas when the replica performs PSYNC with the master always to avoid replication offset mismatch between master and chained replicas. 2020-05-22 12:37:59 +02:00
Oran Agra
7d8259d151 fix valgrind test failure in replication test
in 00323f342 I added more keys to that test to make it run longer,
but under valgrind this now means the test times out, so give valgrind more
time.
2020-05-22 12:37:49 +02:00
Oran Agra
5e75739bfd add regression test for the race in #7205
with the original version of 6.0.0, this test detects an excessive full
sync.
with the fix in 146201c69, this test detects memory corruption,
especially when using libc allocator with or without valgrind.
2020-05-22 12:37:49 +02:00
antirez
3d478f2e3f Improve the PSYNC2 test reliability. 2020-05-22 12:37:49 +02:00
John Sully
193d7c76cb Fix bad merge in CI.yml
Former-commit-id: 6311d709c39b3bacaeab77b18033010f1b548f81
2020-05-21 22:09:06 -04:00
Oran Agra
a3dd04410d fix redis 6.0 not freeing closed connections during loading.
This bug was introduced by a recent change in which readQueryFromClient
is using freeClientAsync, and despite the fact that now
freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since
it's not called during loading in processEventsWhileBlocked.
furthermore, afterSleep was called in that case but beforeSleep wasn't.

This bug also caused slowness since the level-triggered mode of epoll
kept signaling these connections as readable causing us to keep doing
connRead again and again for all of these, which keep accumulating.

now both before and after sleep are called, but not all of their actions
are performed during loading, some are only reserved for the main loop.

fixes issue #7215
2020-05-14 11:29:43 +02:00
Oran Agra
5258341880 fix unstable replication test
this test which has coverage for various flows of diskless master was
failing randomly from time to time.

the failure was:
[err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '*Diskless rdb transfer, last replica dropped, killing fork child*' not found

what seemed to have happened is that the master didn't detect that all
replicas dropped by the time the replication ended, it thought that one
replica is still connected.

now the test takes a few seconds longer but it seems stable.
2020-05-14 11:29:43 +02:00
John
063672dbdb more reliability fixes for multimaster
Former-commit-id: 3543a3c763de91a4d76bca89659fec9bf6b7a1c8
2020-05-11 05:38:21 -04:00
John
b03c4ccc50 more reliability fixes for multimaster
Former-commit-id: fd5b541260908423c35227ff9e42a83f96ace6c0
2020-05-11 09:37:42 +00:00
John
0e6add2e84 Make multimaster tests more reliable
Former-commit-id: 3122912920973cb433d625a09b183c3f538e2523
2020-05-11 05:23:47 -04:00
John
680a6ac90f Make multimaster tests more reliable
Former-commit-id: 4fe59ba11b720864ea0124885b358cb72127cc2d
2020-05-11 09:22:27 +00:00
Oran Agra
eb9d28903d add daily github actions with libc malloc and valgrind
* fix memory leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal

about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808==   too small or bad protection modes

it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
also, not sure if it's even valid without a call to sigaltstack()
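
For reference, the pairing of SA_ONSTACK with sigaltstack() mentioned above can be sketched as follows. This is a minimal standalone illustration, not the watchdog code from this commit; install_handler_on_alt_stack, on_signal and handler_ran are hypothetical names introduced only for the example. As far as POSIX describes it, SA_ONSTACK only takes effect when an alternate stack has been declared with sigaltstack(); otherwise the handler runs on the current stack.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Set by the handler so callers can observe that it ran. */
static volatile sig_atomic_t handled = 0;

/* Hypothetical handler: only records that the signal was delivered. */
static void on_signal(int sig) {
    (void)sig;
    handled = 1;
}

/* Returns non-zero once the handler has run at least once. */
int handler_ran(void) {
    return handled != 0;
}

/* Declare an alternate signal stack, then install a handler that asks to
 * run on it via SA_ONSTACK. Returns 0 on success, -1 on failure.
 * The stack is intentionally never freed: it must outlive the handler. */
int install_handler_on_alt_stack(int signo) {
    stack_t ss;
    struct sigaction act;

    assert(signo > 0);

    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL) return -1;
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) {
        free(ss.ss_sp);
        return -1;
    }

    memset(&act, 0, sizeof(act));
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_ONSTACK;   /* deliver on the alternate stack */
    act.sa_handler = on_signal;
    return sigaction(signo, &act, NULL);
}
```

Calling install_handler_on_alt_stack(SIGUSR1) followed by raise(SIGUSR1) should leave handler_ran() returning non-zero.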
2020-05-08 10:37:35 +02:00
Oran Agra
a8995ce3c9 fix loading race in psync2 tests 2020-04-28 11:20:15 +02:00
Oran Agra
58619c1286 Keep track of meaningful replication offset in replicas too
Now both master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset they
use for psync.

the implication is that if someone missed some pings, or even have
excessive pings that the promoted replica has, it'll still be able to
psync (avoid full sync).

the downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

however, consider this case:
master A has two replicas (B and C) replicating directly from it.
there's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. now B gets promoted, A becomes a replica of B, and C
remains a replica of A. when A gets demoted, it trims the pings from its
backlog, and successfully replicates from B. however, C is still aware of
these PINGs, so when it disconnects and re-connects to A, it'll ask for something
that's not in the backlog anymore (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there, and it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1
now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #1, #1 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

so the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
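
The offset arithmetic in the log lines above can be sketched as follows. This is a simplified model, not the server code: repl_state, trim_trailing_pings, psync_request_offset and can_partial_resync are hypothetical names, and the acceptance check assumes the requested offset only has to stay within the upper end of the backlog (the lower bound is ignored here).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified replication state. */
typedef struct {
    long long repl_offset;       /* last byte written to the replication stream */
    long long meaningful_offset; /* last byte that is not part of trailing PINGs */
} repl_state;

/* On demotion, drop the trailing PINGs from the backlog.
 * Returns how many bytes were trimmed. */
long long trim_trailing_pings(repl_state *s) {
    assert(s->meaningful_offset <= s->repl_offset);
    long long delta = s->repl_offset - s->meaningful_offset;
    s->repl_offset = s->meaningful_offset;
    return delta;
}

/* A replica asks for the first byte it does not have yet. */
long long psync_request_offset(const repl_state *s) {
    return s->repl_offset + 1;
}

/* Assumed acceptance rule: the requested offset may be at most one past
 * the last byte the serving instance still holds. */
bool can_partial_resync(const repl_state *serving, long long requested) {
    return requested <= serving->repl_offset + 1;
}
```

With the numbers from the log, trimming a state at offset 3929977 with meaningful offset 3929963 removes 14 bytes and yields a request offset of 3929964, while a replica that did receive the last PING asks for 3929978 and is refused under this rule.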
2020-04-27 15:52:49 +02:00
John Sully
f627dd8cbe Initial merge of unstable 6
Former-commit-id: aac140de199646914cc02997a45111c9c695e55d
2020-04-16 16:36:16 -04:00
John Sully
c001ea5b41 Merge branch 'unstable' into redis_6_merge
Former-commit-id: cc9924ffa606200f331b3bf5e1e1a4aa3f2702fa
2020-04-15 23:00:13 -04:00
John Sully
2687677ba6 Multithreading reliability, force single thread for test relying on internal behavior
Former-commit-id: 033761c5f97fc1d1823a031b34467ac1df5588f3
2020-04-15 20:52:25 -04:00
John Sully
ce54857237 Merge commit '454e12cb8961f21c9dd8502dc82ae6ffd7e22fe0' into redis_6_merge
Former-commit-id: cc3ebbe5194e9744fb84ce490e90ac5fbe7f8716
2020-04-14 22:19:29 -04:00
John Sully
0725491043 Merge commit 'c609bf3f2c7f0982f632f82623ee4802868b8ef1' into redis_6_merge
Former-commit-id: 320bc3c0329ff9e5a980b79426b719addae381cf
2020-04-14 21:04:42 -04:00
John Sully
2684a266c8 Fix subkey expires not replicating correctly, and AOF issues
Former-commit-id: bd183cdee13081a02efef5df75edf2292b872a16
2020-04-04 21:52:27 -04:00
antirez
28d402d31b PSYNC2: meaningful offset test. 2020-03-25 15:55:24 +01:00
Oran Agra
cde46df309 fix for flaky psync2 test
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 6 cmd {assert {$sum == 4}} proc ::test)

issue was that sometimes the test got an unexpected full sync since it
tried to switch to the replica before it was in sync with its master.
2020-03-12 15:53:47 +01:00
John Sully
3fad87ca13 Merge branch 'unstable' into keydbpro
Former-commit-id: f3457e2a9a8464bac656b57256316bbddb65d9e9
2020-02-16 04:04:34 -05:00
John Sully
fbaa46505c Merge branch 'unstable' into redis_6_merge
Former-commit-id: 18a5f46b6138e8a975dda0ed4897d19eed756d24
2020-02-11 02:39:08 -05:00
John Sully
d346ad7734 Add missing test file
Former-commit-id: 0c101dccc825668cb7ff07c23e82db0f5642b786
2020-02-10 18:15:29 -05:00