futriix

Author	SHA1	Message	Date
YaacovHazan	bd2785d18b	unregister AE_READABLE from the read pipe in backgroundSaveDoneHandlerSocket (#8991) In diskless replication, we create a read pipe for the RDB, between the child and the parent. When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it is still registered). Otherwise, the next time we use the same fd, the registration will fail (panic), because we will use EPOLL_CTL_MOD (the fd is still registered in the event loop) on an fd that was already removed from epoll (cherry picked from commit 501d7755831527b4237f9ed6050ec84203934e4d)	2021-06-01 17:03:36 +03:00
Oran Agra	442bd29612	Fix timing of new replication test (#8807) In github actions CI with valgrind, I saw that even the fast replica (one that wasn't paused) didn't get to complete the replication fast enough, and ended up getting disconnected by timeout. Additionally, due to a typo in uname, we didn't get to actually run the CPU efficiency part of the test.	2021-04-18 15:12:34 +03:00
guybe7	0bf6c205db	Add a timeout mechanism for replicas stuck in fullsync (#8762) Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork child so that the parent is the one sending data to the replicas. This mechanism has an issue in which a hung replica will cause the master to wait for it to read the data sent to it forever, thus preventing the fork child from terminating and preventing the creation of any other forks. This PR adds a timeout mechanism: much like the ACK-based timeout, we disconnect replicas that aren't reading the RDB file fast enough.	2021-04-15 17:18:51 +03:00
Qu Chen	1372210a03	Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage of the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666)	2021-03-24 08:41:05 -07:00
Oran Agra	10929ace19	Fix race in replication test (#8679) Since redis 6.2, redis immediately tries to connect to the master, not waiting for replication cron. In the slow freebsd CI, this test failed and master_link_status was already "up" when INFO was called.	2021-03-22 10:50:39 +02:00
Yossi Gottlieb	e4d0ba933d	Add io-thread daily CI tests. (#8232) This adds basic coverage to IO threads by running the cluster and a few selected Redis test suite tests with the IO threads enabled. Also provides some necessary additional improvements to the test suite: * Add --config to sentinel/cluster tests for arbitrary configuration. * Fix --tags whitelisting which was broken. * Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.	2021-01-17 15:48:48 +02:00
Oran Agra	54b8623eeb	Propagate GETSET and SET-GET as SET (#7957) - Generates a more backwards compatible command stream - Slightly more efficient execution in replica/AOF - Add a test for coverage	2020-11-03 14:56:57 +02:00
Wang Yuan	2916bb4be8	Fix timing dependence in replication tcl tests (#7969) Remove 'fork child $pid' log in replication.tcl	2020-10-27 09:36:42 +02:00
Yossi Gottlieb	be43c030df	Add a --no-latency tests flag. (#7939) Useful for running tests on systems which may be way slower than usual.	2020-10-22 11:10:53 +03:00
Felipe Machado	57dfccfbf9	Adds new pop-push commands (LMOVE, BLMOVE) (#6929) Adding [B]LMOVE <src> <dst> RIGHT|LEFT RIGHT|LEFT, deprecating [B]RPOPLPUSH. Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH, but on BLMOVE RIGHT LEFT we'll propagate an LMOVE. Improvements to existing tests: - Replace "after 1000" with "wait_for_condition" when waiting for clients to block/unblock. - Add a pre-existing element to the target list in basic tests so that we can check if the new element was added to the correct side of the list. - Check command stats on the replica to make sure the right command was replicated. Co-authored-by: Oran Agra <oran@redislabs.com>	2020-10-08 08:33:17 +03:00
Wang Yuan	b551c8fdb7	Kill disk-based fork child when all replicas drop and 'save' is not enabled (#7819) When all replicas waiting for a bgsave get disconnected (possibly due to output buffer limit), it may be good to kill the bgsave child. In diskless replication it already happens, but in disk-based, the child may still serve some purpose (for persistence). By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave. We do that only if rdb persistence wasn't enabled in the configuration. Btw, now, rdbRemoveTempFile in killRDBChild won't block the server, so we can killRDBChild safely.	2020-09-22 09:47:58 +03:00
Oran Agra	725616534e	if diskless repl child is killed, make sure to reap the pid (#7742) Starting redis 6.0 and the changes we made to the diskless master to be suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their rdb. I did that in order to avoid a state where the rdb_child_pid is -1 but we don't yet want to start another fork (still busy serving that data to replicas). It turns out that the solution used so far was problematic in case the fork child was being killed (e.g. by the kernel OOM killer), in that case there's a chance that we currently disabled the read event on the rdb pipe, since we're waiting for a replica to become writable again. In that scenario the master would have never realized the child exited, and the replica will remain hung too. Note that there's no mechanism to detect a hung replica while it's in rdb transfer state. The solution here is to add another pipe which is used by the parent to tell the child it is safe to exit. This means that when the child exits, for whatever reason, it is safe to reap it. Besides that, I'm re-introducing an adjustment to REPLCONF ACK which was part of #6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (6fd5ff8). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has a chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that.	2020-09-06 16:43:57 +03:00
Oran Agra	cad93ed273	Accelerate diskless master connections, and general re-connections (#6271) Diskless master has some inherent latencies. 1) fork starts with delay from cron rather than immediately 2) replica is put online only after an ACK. but the ACK was sent only once a second. 3) but even if it would arrive immediately, it will not register in case cron didn't yet detect that the fork is done. Besides that, when a replica disconnects, it doesn't immediately attempt to re-connect, it waits for replication cron (once per second). in case it was already online, it may be important to try to re-connect as soon as possible, so that the backlog at the master doesn't vanish. In case it disconnected during rdb transfer, one can argue that it's not very important to re-connect immediately, but this is needed for the "diskless loading short read" test to be able to run 100 iterations in 5 seconds, rather than 3 (waiting for replication cron re-connection) changes in this commit: 1) sync command starts a fork immediately if no sync_delay is configured 2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron) 3) when a replica unexpectedly disconnects, it immediately tries to re-connect rather than waiting 1s 4) when a child exits, if there is another replica waiting, we spawn a new one right away, instead of waiting for 1s replicationCron. 5) added a call to connectWithMaster from replicationSetMaster. which is called from the REPLICAOF command but also in 3 places in cluster.c, in all of these the connection attempt will now be immediate instead of delayed by 1 second. side note: we can add a call to rdbPipeReadHandler in replconfCommand when getting a REPLCONF ACK from the replica to solve a race where the replica got the entire rdb and EOF marker before we detected that the pipe was closed. 
in the test I did see this race happen in one out of some 300 runs, but I concluded that this race is unlikely in real life (where the replica is on another host and we're more likely to first detect the pipe was closed). the test runs 100 iterations in 3 seconds, so in some cases it'll take 4 seconds instead (waiting for another REPLCONF ACK). Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave: Now that CheckChildrenDone is calling the new replicationStartPendingFork (extracted from serverCron) there's actually no need to call startBgsaveForReplication from updateSlavesWaitingForBgsave anymore, since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is calling replicationStartPendingFork that handles that anyway. The code in updateSlavesWaitingForBgsave had a bug in which it ignored repl-diskless-sync-delay, but removing that code shows that this bug was hiding another bug, which is that the max_idle should have used >= and not >, this one second delay has a big impact on my new test.	2020-08-06 16:53:06 +03:00
Oran Agra	06aaeabaea	Fix failing tests due to issues with wait_for_log_message (#7572) - the test now waits for a specific set of log messages rather than waiting for a timeout looking for just one message. - we don't wanna sample the current length of the log after an action, due to a race, we need to start the search from the line number of the last message we were waiting for. - when attempting to trigger a full sync, use multi-exec to avoid a race where the replica manages to re-connect before we completed the set of actions that should force a full sync. - fix verify_log_message which was broken and unused	2020-07-28 11:15:29 +03:00
Oran Agra	efc4189b62	stabilize tests that look for log lines (#7367) tests were sensitive to additional log lines appearing in the log, causing the search to come up empty-handed. instead of just looking at the last n log lines, capture the log lines before performing the action, and then search from that offset.	2020-07-10 08:28:22 +03:00
Oran Agra	7c88eca1e6	fix valgrind test failure in replication test. in 00323f342 I added more keys to that test to make it run longer, but in valgrind this now means the test times out; give valgrind more time.	2020-05-18 10:26:53 +03:00
Oran Agra	ba6f40ea94	add regression test for the race in #7205. with the original version of 6.0.0, this test detects an excessive full sync. with the fix in 146201c69, this test detects memory corruption, especially when using libc allocator with or without valgrind.	2020-05-17 18:26:02 +03:00
Oran Agra	00323f342d	fix unstable replication test. this test, which has coverage for various flows of diskless master, was failing randomly from time to time. the failure was: [err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl log message of 'Diskless rdb transfer, last replica dropped, killing fork child' not found. what seemed to have happened is that the master didn't detect that all replicas dropped by the time the replication ended, it thought that one replica is still connected. now the test takes a few seconds longer but it seems stable.	2020-05-12 08:59:09 +03:00
zhaozhao.zz	5c56f82bc3	incrbyfloat: fix issue #5256 ttl lost after propagate	2019-12-18 15:44:51 +08:00
Yossi Gottlieb	85d7f38136	Merge remote-tracking branch 'upstream/unstable' into tls	2019-10-16 17:08:07 +03:00
Daniel Dai	46c1377c83	update typo	2019-10-09 14:15:31 -04:00
Yossi Gottlieb	df08b624bd	TLS: Configuration options. Add configuration options for TLS protocol versions, ciphers/cipher suites selection, etc.	2019-10-07 21:07:27 +03:00
Oran Agra	6fd5ff8c98	diskless replication rdb transfer uses pipe, and writes to sockets from the parent process. misc: - handle SSL_has_pending by iterating through these in beforeSleep, and setting timeout of 0 to aeProcessEvents - fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers. (needed to detect the rdb pipe was closed) - add key-load-delay config for testing - trim connShutdown which is no longer needed - rioFdsetWrite -> rioFdWrite - simplified since there's no longer need to write to multiple FDs - don't detect rdb child exited (don't call wait3) until we detect the pipe is closed - Cleanup bad optimization from rio.c, add another one	2019-10-07 21:06:30 +03:00
Yossi Gottlieb	10ffeb03e4	TLS: Connections refactoring and TLS support. * Introduce a connection abstraction layer for all socket operations and integrate it across the code base. * Provide an optional TLS connections implementation based on OpenSSL. * Pull a newer version of hiredis with TLS support. * Tests, redis-cli updates for TLS support.	2019-10-07 21:06:13 +03:00
Oran Agra	73a945c73c	prevent diskless replica from terminating on short read now that replica can read rdb directly from the socket, it should avoid exiting on short read and instead try to re-sync. this commit tries to have minimal effects on non-diskless rdb reading. and includes a test that tries to trigger this scenario on various read cases.	2019-07-17 16:46:22 +02:00
Oran Agra	29754ebe22	diskless replication on slave side (don't store rdb to file), plus some other related fixes The implementation of the diskless replication was currently diskless only on the master side. The slave side was still storing the received rdb file to the disk before loading it back in and parsing it. This commit adds two modes to load rdb directly from socket: 1) when-empty 2) using "swapdb" the third mode of using diskless slave by flushdb is risky and currently not included. other changes: -------------- distinguish between aof configuration and state so that we can re-enable aof only when sync eventually succeeds (and not when exiting from readSyncBulkPayload after a failed attempt) also a CONFIG GET and INFO during rdb loading would have lied When loading rdb from the network, don't kill the server on short read (that can be a network error) Fix rdb check when performed on preamble AOF tests: run replication tests for diskless slave too make replication test a bit more aggressive Add test for diskless load swapdb	2019-07-08 15:37:48 +03:00
antirez	11f6eb50a6	Remove debugging printf from replication.tcl test.	2018-12-12 11:55:30 +01:00
antirez	d7dd6b4618	Slave removal: remove slave from integration tests descriptions.	2018-09-11 15:32:28 +02:00
antirez	3880af7808	Test: processing of master stream in slave -BUSY state. See #5297.	2018-08-31 16:45:02 +02:00
antirez	70cd161879	Regression test for issue #2813.	2015-10-15 11:23:15 +02:00
antirez	7701a4cc7e	Test: regression for issue #2473.	2015-03-27 12:10:46 +01:00
antirez	013fc0ad58	Attempt to prevent false positives in replication test.	2014-11-24 11:54:56 +01:00
antirez	6982032cc0	Diskless replication tested with the multiple slaves consistency test.	2014-10-24 09:49:26 +02:00
Matt Stancliff	5fab7e5bf2	Remove trailing spaces from tests	2014-09-29 06:49:08 -04:00
antirez	383536119d	Test: AOF rewrite during write load.	2014-07-10 11:25:12 +02:00
antirez	ad2cf3ddeb	Fixed assert conditional in ROLE command test.	2014-06-26 22:13:46 +02:00
antirez	727974077a	Basic tests for the ROLE command.	2014-06-23 09:08:51 +02:00
antirez	376797e1b5	Make tests compatible with new INFO replication output.	2013-05-30 11:43:43 +02:00
Johan Bergström	5d22ef818a	Use `info nameofexecutable` to find current executable	2013-01-24 09:37:18 +11:00
antirez	8574733f48	A reimplementation of blocking operation internals. Redis provides support for blocking operations such as BLPOP or BRPOP. These operations are identical to normal LPOP and RPOP operations as long as there are elements in the target list, but if the list is empty they block waiting for new data to arrive to the list. All the clients blocked waiting for the same list are served in a FIFO way, so the first that blocked is the first to be served when there is more data pushed by another client into the list. The previous implementation of blocking operations was conceived to serve clients in the context of push operations. For instance: 1) There is a client "A" blocked on list "foo". 2) The client "B" performs `LPUSH foo somevalue`. 3) The client "A" is served in the context of the "B" LPUSH, synchronously. Processing things in a synchronous way was useful as if "A" pushes a value that is served by "B", from the point of view of the database it is a NOP (no operation) thing, that is, nothing is replicated, nothing is written in the AOF file, and so forth. However later we implemented two things: 1) Variadic LPUSH that could add multiple values to a list in the context of a single call. 2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH" side effect when receiving data. This forced us to make the synchronous implementation more complex. If client "B" is waiting for data, and "A" pushes three elements in a single call, we needed to propagate an LPUSH with a missing argument in the AOF and replication link. We also needed to make sure to replicate the LPUSH side of BRPOPLPUSH, but only if in turn it did not happen to serve another blocking client into another list ;) This was complex but with a few mutually recursive functions everything worked as expected... until one day we introduced scripting in Redis. Scripting + synchronous blocking operations = Issue #614. 
Basically you can't "rewrite" a script to have just a partial effect on the replicas and AOF file if the script happened to serve a few blocked clients. The solution to all these problems, implemented by this commit, is to change the way we serve blocked clients. Instead of serving the blocked clients synchronously, in the context of the command performing the PUSH operation, it is now an asynchronous and iterative process: 1) If a key that has clients blocked waiting for data is the subject of a list push operation, we simply mark the key as "ready" and put it into a queue. 2) Every command pushing stuff on lists, such as a variadic LPUSH, a script, or whatever it is, is replicated verbatim without any rewriting. 3) Every time a Redis command, a MULTI/EXEC block, or a script completes its execution, we run the list of keys ready to serve blocked clients (as more data arrived), and process this list serving the blocked clients. 4) As a result of "3" maybe more keys are ready again for other clients (as a result of BRPOPLPUSH we may have push operations), so we iterate back to step "3" if it's needed. The new code has much simpler semantics, and a simpler to understand implementation, with the disadvantage of not being able to "optimize out" a PUSH+BPOP as a No OP. This commit will be tested with care before the final merge, more tests will likely be added.	2012-09-17 10:26:46 +02:00
antirez	30d3fc3024	Properly wait the slave to sync with master in BRPOPLPUSH test.	2012-04-30 10:55:03 +02:00
antirez	9f97d57184	A more lightweight implementation of issue 141 regression test.	2012-04-29 17:16:44 +02:00
antirez	53e898d3f1	Redis test: More reliable BRPOPLPUSH replication test. Now it uses the new wait_for_condition testing primitive. Also wait_for_condition implementation was fixed in this commit to properly escape the expr command and its argument.	2012-04-26 11:25:13 +02:00
antirez	a95fb8ebe2	On slow computers, 10 seconds are not enough for this heavy replication test.	2012-04-04 19:54:23 +02:00
antirez	7b85752db0	Possible fix for false positives in issue 141 regression test	2012-01-12 16:24:54 +01:00
antirez	4f69e70bcd	Regression test for the main problem causing issue #141. Minor changes/fixes/additions to the test suite itself needed to write the test.	2012-01-06 17:28:40 +01:00
antirez	a9dedd0d88	Regression test for issue #142 added	2011-10-17 10:41:46 +02:00
antirez	2840d1652f	new test engine valgrind support	2011-07-11 13:41:06 +02:00
antirez	818df58b14	replication test split into three parts in order to improve test execution time. Random fixes and improvements.	2011-07-11 00:46:25 +02:00
antirez	7192cd5cb7	Fix for bug 561 and other related problems	2011-06-20 17:19:36 +02:00


58 Commits