164 Commits

Author SHA1 Message Date
John Sully
fa0be83fd9 Merge tag '6.0.2' into unstable
Redis 6.0.2


Former-commit-id: a010e4a4b2cc2bcad1cb14604b7ebc596c35b05e
2020-05-22 16:45:18 -04:00
John Sully
5a7ce664d0 Merge commit '78cbd3039858407837632bc37abb36e36ec60ce5' into unstable
Former-commit-id: d74871da40dea11bd1a226fbecb0974ff5f8ec8c
2020-05-22 15:36:44 -04:00
John Sully
193d7c76cb Fix bad merge in CI.yml
Former-commit-id: 6311d709c39b3bacaeab77b18033010f1b548f81
2020-05-21 22:09:06 -04:00
Oran Agra
a3dd04410d fix redis 6.0 not freeing closed connections during loading.
This bug was introduced by a recent change in which readQueryFromClient
is using freeClientAsync, and despite the fact that now
freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since
it's not called during loading in processEventsWhileBlocked.
furthermore, afterSleep was called in that case but beforeSleep wasn't.

This bug also caused slowness sine the level-triggered mode of epoll
kept signaling these connections as readable causing us to keep doing
connRead again and again for ll of these, which keep accumulating.

now both before and after sleep are called, but not all of their actions
are performed during loading, some are only reserved for the main loop.

fixes issue #7215
2020-05-14 11:29:43 +02:00
Oran Agra
5258341880 fix unstable replication test
this test which has coverage for varoius flows of diskless master was
failing randomly from time to time.

the failure was:
[err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '*Diskless rdb transfer, last replica dropped, killing fork child*' not found

what seemed to have happened is that the master didn't detect that all
replicas dropped by the time the replication ended, it thought that one
replica is still connected.

now the test takes a few seconds longer but it seems stable.
2020-05-14 11:29:43 +02:00
John
063672dbdb more reliability fixes for multimaster
Former-commit-id: 3543a3c763de91a4d76bca89659fec9bf6b7a1c8
2020-05-11 05:38:21 -04:00
John
0e6add2e84 Make multimaster tests more reliable
Former-commit-id: 3122912920973cb433d625a09b183c3f538e2523
2020-05-11 05:23:47 -04:00
Oran Agra
eb9d28903d add daily github actions with libc malloc and valgrind
* fix memlry leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal

about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808==   too small or bad protection modes

it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
also, not sure if it's even valid without a call to sigaltstack()
2020-05-08 10:37:35 +02:00
Oran Agra
a8995ce3c9 fix loading race in psync2 tests 2020-04-28 11:20:15 +02:00
Oran Agra
58619c1286 Keep track of meaningful replication offset in replicas too
Now both master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the tailing pings), and both
trim that tail from the replication backlog, and the offset with which
they try to use for psync.

the implication is that if someone missed some pings, or even have
excessive pings that the promoted replica has, it'll still be able to
psync (avoid full sync).

the downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings form it's
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings

Background:
The mearningful offset on the master was added recently to solve a problem were
the master is left all alone, injecting PINGs into it's backlog when no one is
listening and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

however, consider this case:
master A has two replicas (B and C) replicating directly from it.
there's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. now B gets promoted, A becomes a replica of B, and C
remains a replica of A. when A gets demoted, it trims the pings from its
backlog, and successfully replicate from B. however, C is still aware of
these PINGs, when it'll disconnect and re-connect to A, it'll ask for something
that's not in the backlog anymore (since A trimmed the tail of it's backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there, it
turns out the reason were PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1
now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

so the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:49 +02:00
John Sully
c001ea5b41 Merge branch 'unstable' into redis_6_merge
Former-commit-id: cc9924ffa606200f331b3bf5e1e1a4aa3f2702fa
2020-04-15 23:00:13 -04:00
John Sully
2687677ba6 Multithreading reliability, force single thread for test relying on internal behavior
Former-commit-id: 033761c5f97fc1d1823a031b34467ac1df5588f3
2020-04-15 20:52:25 -04:00
John Sully
ce54857237 Merge commit '454e12cb8961f21c9dd8502dc82ae6ffd7e22fe0' into redis_6_merge
Former-commit-id: cc3ebbe5194e9744fb84ce490e90ac5fbe7f8716
2020-04-14 22:19:29 -04:00
John Sully
0725491043 Merge commit 'c609bf3f2c7f0982f632f82623ee4802868b8ef1' into redis_6_merge
Former-commit-id: 320bc3c0329ff9e5a980b79426b719addae381cf
2020-04-14 21:04:42 -04:00
John Sully
2684a266c8 Fix subkey expires not replicating correctly, and AOF issues
Former-commit-id: bd183cdee13081a02efef5df75edf2292b872a16
2020-04-04 21:52:27 -04:00
antirez
28d402d31b PSYNC2: meaningful offset test. 2020-03-25 15:55:24 +01:00
Oran Agra
cde46df309 fix for flaky psync2 test
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 6 cmd {assert {$sum == 4}} proc ::test)

issue was that sometime the test got an unexpected full sync since it
tried to switch to the replica before it was in sync with it's master.
2020-03-12 15:53:47 +01:00
John Sully
fbaa46505c Merge branch 'unstable' into redis_6_merge
Former-commit-id: 18a5f46b6138e8a975dda0ed4897d19eed756d24
2020-02-11 02:39:08 -05:00
John Sully
d346ad7734 Add missing test file
Former-commit-id: 0c101dccc825668cb7ff07c23e82db0f5642b786
2020-02-10 18:15:29 -05:00
John Sully
25ef65463e Ensure multi-master works for ring topologies
Former-commit-id: a7cc3aac28ccec4dadb80aa2cc7279c53982bc28
2020-02-10 00:25:03 -05:00
John Sully
8800a516d1 Add test to detect issue #137 and #132
Former-commit-id: 49d86746edef497a568c6f3a64695d420305cca8
2020-02-06 23:31:12 -05:00
John Sully
1adc5e9832 More threading fixes from merge
Former-commit-id: 4a980f4ddbebe3f62703aa3de67c93cdffb6b4b8
2020-01-28 17:54:00 -05:00
John Sully
14188ef92d Fix most tests (still some failures)
Former-commit-id: da83e841255487efe0e4b13d42b2dcc55a369838
2020-01-27 18:16:19 -05:00
John Sully
6193e9ad4f Merge remote-tracking branch 'redis/6.0' into redis_merge
Former-commit-id: ef9a3cadcf94326bf2f163db7698aad9a3c01690
2020-01-27 02:55:48 -05:00
zhaozhao.zz
5c56f82bc3 incrbyfloat: fix issue #5256 ttl lost after propagate 2019-12-18 15:44:51 +08:00
John Sully
051bde5d3d Fix issue #107, active replicas do their own expires
Former-commit-id: 8e4f323439df29a5e8c0de9db7a848291721fd07
2019-11-20 19:44:31 -05:00
John Sully
949270801e Signals can happen on any thread, so look for the signal handler not the command on the callstack
Former-commit-id: f1d2b2945007f8811528b197480e255c6b35559c
2019-10-24 23:30:08 -04:00
Yossi Gottlieb
85d7f38136 Merge remote-tracking branch 'upstream/unstable' into tls 2019-10-16 17:08:07 +03:00
Daniel Dai
46c1377c83 update typo 2019-10-09 14:15:31 -04:00
Yossi Gottlieb
df08b624bd TLS: Configuration options.
Add configuration options for TLS protocol versions, ciphers/cipher
suites selection, etc.
2019-10-07 21:07:27 +03:00
Oran Agra
8704be6947 TLS: Implement support for write barrier. 2019-10-07 21:06:30 +03:00
Oran Agra
6fd5ff8c98 diskless replication rdb transfer uses pipe, and writes to sockets form the parent process.
misc:
- handle SSL_has_pending by iterating though these in beforeSleep, and setting timeout of 0 to aeProcessEvents
- fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed)
- add key-load-delay config for testing
- trim connShutdown which is no longer needed
- rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs
- don't detect rdb child exited (don't call wait3) until we detect the pipe is closed
- Cleanup bad optimization from rio.c, add another one
2019-10-07 21:06:30 +03:00
Yossi Gottlieb
10ffeb03e4 TLS: Connections refactoring and TLS support.
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
2019-10-07 21:06:13 +03:00
John Sully
4db6193052 RREPLAY command now takes a DB argument
Former-commit-id: 6e1e5bd08b59f8ad4653621a6c01fcf3a76f0692
2019-09-28 14:59:44 -04:00
Salvatore Sanfilippo
5c5761bcd6 Merge branch 'unstable' into modules_fork 2019-09-27 11:24:06 +02:00
Oran Agra
2add4524c5 Fix lastbgsave_status, when new child signal handler get intended kill
And add a test for that.
2019-09-26 15:16:34 +03:00
John Sully
4c49370efe Issue #64 RREPLAY isn't binary safe. Add fix and test.
Former-commit-id: afe66288fe9df6d8247d459e57858430f1ec7a25
2019-07-24 22:31:02 -04:00
John Sully
133650ac7d remove unnecessary variables
Former-commit-id: a9c4e3970fd0d00896cb5500b2cc926ec3177bc4
2019-07-18 19:37:49 -04:00
John Sully
594559b45e Fix issue #59 - Test replicaof config variable
Former-commit-id: d5ecc2347a717f6e6f938d81325476a9c3dff9a6
2019-07-18 19:22:49 -04:00
Oran Agra
73a945c73c prevent diskless replica from terminating on short read
now that replica can read rdb directly from the socket, it should avoid exiting
on short read and instead try to re-sync.

this commit tries to have minimal effects on non-diskless rdb reading.
and includes a test that tries to trigger this scenario on various read cases.
2019-07-17 16:46:22 +02:00
John Sully
0f5b59241f Give the active-rep more time to sync for test reliability
Former-commit-id: 620c2ec1412a3bcea5fecf21238caa065e73f3e9
2019-07-12 23:52:07 -04:00
John Sully
3ffe5b5dde Add Active Replication tests
Former-commit-id: 528d10091fda0d2c56674e825c4f70467587955f
2019-07-12 03:54:41 -04:00
Oran Agra
29754ebe22 diskless replication on slave side (don't store rdb to file), plus some other related fixes
The implementation of the diskless replication was currently diskless only on the master side.
The slave side was still storing the received rdb file to the disk before loading it back in and parsing it.

This commit adds two modes to load rdb directly from socket:
1) when-empty
2) using "swapdb"
the third mode of using diskless slave by flushdb is risky and currently not included.

other changes:
--------------
distinguish between aof configuration and state so that we can re-enable aof only when sync eventually
succeeds (and not when exiting from readSyncBulkPayload after a failed attempt)
also a CONFIG GET and INFO during rdb loading would have lied

When loading rdb from the network, don't kill the server on short read (that can be a network error)

Fix rdb check when performed on preamble AOF

tests:
run replication tests for diskless slave too
make replication test a bit more aggressive
Add test for diskless load swapdb
2019-07-08 15:37:48 +03:00
John Sully
397e85befb Merge branch 'unstable' of https://github.com/antirez/redis into MergeRedis
Note: some tests failing

Former-commit-id: 86d7276f24f0cf1a0eceb6cd00a6a0ae2a0fa520
2019-05-11 02:20:34 -04:00
Oran Agra
c76bb465f2 make replication tests more stable on slow machines
solving few replication related tests race conditions which fail on slow machines

bugfix in slave buffers test: since the test is executed twice, each time with
a different commands count, the threshold for the delta can't be a constant.
2019-05-05 08:25:01 +03:00
John Sully
3e591fe487 Make PSYNC2 tests more reliable on slower hardware
Former-commit-id: 7b1dd0b60d0d65baa43cb69457e06744e0d9094f
2019-03-26 18:59:31 -04:00
John Sully
219b0f7441 complete rebranding with tests passing
Former-commit-id: 3e9b8677098059964f3f7a492394da4ede9bd37d
2019-02-09 10:11:46 -05:00
antirez
11f6eb50a6 Remove debugging printf from replication.tcl test. 2018-12-12 11:55:30 +01:00
antirez
d7dd6b4618 Slave removal: remove slave from integration tests descriptions. 2018-09-11 15:32:28 +02:00
antirez
3880af7808 Test: processing of master stream in slave -BUSY state.
See #5297.
2018-08-31 16:45:02 +02:00