futriix

Author	SHA1	Message	Date
John Sully	9fb7552b63	PSYNC test shouldn't wait forever Former-commit-id: 130613e16636923296a8d5b2c4bc623e62fef2f5	2020-06-01 16:13:58 -04:00
John Sully	2b08505fed	PSYNC test reliability improvements (test only issue) Former-commit-id: 50fd4fa7e62f3996f15f6a8c4dcd892022f111ec	2020-06-01 16:01:26 -04:00
John Sully	4f7102f46c	Fix for issue #187: we need to properly handle the case where a key with a subkey expiry itself expires during load Former-commit-id: e6a9a6b428b91b6108df24ae6285ea9b582b7b23	2020-06-01 15:33:19 -04:00
John Sully	df5b0f0be5	sendfile has high latency in some scenarios, don't use it Former-commit-id: 1eb0e3c1c604e71c54423f1d11b8c709c847a516	2020-05-31 23:22:25 -04:00
John Sully	eddc1ad46a	Don't start multimaster tests until all nodes are connected Former-commit-id: 202b97eff76501e736a2f0969607e3297e9703a4	2020-05-31 22:50:30 -04:00
Oran Agra	c480af9007	fix pingoff test race	2020-05-31 15:51:52 +03:00
John Sully	2aed24d0a5	active replica tests on slow computers Former-commit-id: c9920849dd6d6d0f6ecfe0d1002cb0edd7f7bfa9	2020-05-29 01:58:15 -04:00
John Sully	acde7c340e	Fix test issue with TLS Former-commit-id: 81b240f81d1c52fd331c4e0e89659913380229c4	2020-05-29 01:44:52 -04:00
John Sully	cfe9f8f3bc	Merge tag '6.0.4' into unstable Redis 6.0.4. Former-commit-id: 9c31ac7925edba187e527f506e5e992946bd38a6	2020-05-29 00:57:07 -04:00
antirez	59cd4c9f65	Test: take PSYNC2 test master timeout high during switch. This will likely avoid false positives due to trailing pings.	2020-05-28 10:56:14 +02:00
antirez	23f2b4d0a8	Test: take PSYNC2 test master timeout high during switch. This will likely avoid false positives due to trailing pings.	2020-05-28 10:47:30 +02:00
Oran Agra	ab2984b1e2	adjust revived meaningful offset tests these tests create several edge cases that are otherwise uncovered (at least not consistently) by the test suite, so although they're no longer testing what they were meant to test, it's still a good idea to keep them in hope that they'll expose some issue in the future.	2020-05-28 10:09:51 +02:00
Oran Agra	1ff5a222de	revive meaningful offset tests	2020-05-28 10:09:51 +02:00
antirez	3f8d113f1b	Another meaningful offset test removed.	2020-05-28 10:09:51 +02:00
antirez	d4541349dc	Remove the PSYNC2 meaningful offset test.	2020-05-28 10:09:51 +02:00
antirez	8f10137227	Test: PSYNC2 test can now show server logs.	2020-05-28 10:09:51 +02:00
Oran Agra	2a8af8e675	adjust revived meaningful offset tests these tests create several edge cases that are otherwise uncovered (at least not consistently) by the test suite, so although they're no longer testing what they were meant to test, it's still a good idea to keep them in hope that they'll expose some issue in the future.	2020-05-28 09:10:51 +03:00
Oran Agra	90f3856fd5	revive meaningful offset tests	2020-05-28 08:21:24 +03:00
antirez	484cfc3d76	Another meaningful offset test removed.	2020-05-27 12:50:02 +02:00
antirez	32d0df0c1f	Remove the PSYNC2 meaningful offset test.	2020-05-27 12:47:34 +02:00
antirez	091fb64681	Test: PSYNC2 test can now show server logs.	2020-05-25 20:26:29 +02:00
John Sully	cece963cf3	Merge branch 'unstable' into keydbpro Former-commit-id: a830cf85df236885558c5571c0bf23cfb23e3655	2020-05-24 14:41:53 -04:00
John Sully	2d783a3cbf	Merge tag '6.0.2' into unstable Redis 6.0.2 Former-commit-id: a010e4a4b2cc2bcad1cb14604b7ebc596c35b05e	2020-05-22 16:45:18 -04:00
John Sully	1eeb5de69f	Merge commit 'c57d9146f41f4b661d9d2cb48b83b3abc757ba0e' into unstable Former-commit-id: d74871da40dea11bd1a226fbecb0974ff5f8ec8c	2020-05-22 15:36:44 -04:00
Qu Chen	58fc456cbd	Disconnect chained replicas when the replica performs PSYNC with the master always to avoid replication offset mismatch between master and chained replicas.	2020-05-22 12:37:59 +02:00
Oran Agra	00d8b92b89	fix valgrind test failure in replication test in b4416280c i added more keys to that test to make it run longer but in valgrind this now means the test times out, give valgrind more time.	2020-05-22 12:37:49 +02:00
Oran Agra	5e17e6276c	add regression test for the race in #7205 with the original version of 6.0.0, this test detects an excessive full sync. with the fix in 1a7cd2c0e, this test detects memory corruption, especially when using libc allocator with or without valgrind.	2020-05-22 12:37:49 +02:00
antirez	96e7c011e2	Improve the PSYNC2 test reliability.	2020-05-22 12:37:49 +02:00
John Sully	27eb239f1a	Fix bad merge in CI.yml Former-commit-id: 6311d709c39b3bacaeab77b18033010f1b548f81	2020-05-21 22:09:06 -04:00
Qu Chen	42f5da5d2d	Disconnect chained replicas when the replica performs PSYNC with the master always to avoid replication offset mismatch between master and chained replicas.	2020-05-21 18:42:10 -07:00
Oran Agra	75c11d7fec	fix valgrind test failure in replication test in b4416280c i added more keys to that test to make it run longer but in valgrind this now means the test times out, give valgrind more time.	2020-05-18 10:26:53 +03:00
antirez	0ca2f4f824	Merge branch 'unstable' of github.com:/antirez/redis into unstable	2020-05-17 18:24:48 +02:00
antirez	96bb0c9471	Improve the PSYNC2 test reliability.	2020-05-17 18:24:34 +02:00
Oran Agra	357aace895	add regression test for the race in #7205 with the original version of 6.0.0, this test detects an excessive full sync. with the fix in 1a7cd2c0e, this test detects memory corruption, especially when using libc allocator with or without valgrind.	2020-05-17 18:26:02 +03:00
Oran Agra	9da134cd88	fix redis 6.0 not freeing closed connections during loading. This bug was introduced by a recent change in which readQueryFromClient is using freeClientAsync, and despite the fact that now freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since it's not called during loading in processEventsWhileBlocked. furthermore, afterSleep was called in that case but beforeSleep wasn't. This bug also caused slowness since the level-triggered mode of epoll kept signaling these connections as readable causing us to keep doing connRead again and again for all of these, which keep accumulating. now both before and after sleep are called, but not all of their actions are performed during loading, some are only reserved for the main loop. fixes issue #7215	2020-05-14 11:29:43 +02:00
Oran Agra	5c41802d55	fix unstable replication test this test which has coverage for various flows of diskless master was failing randomly from time to time. the failure was: [err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl log message of 'Diskless rdb transfer, last replica dropped, killing fork child' not found what seemed to have happened is that the master didn't detect that all replicas dropped by the time the replication ended, it thought that one replica is still connected. now the test takes a few seconds longer but it seems stable.	2020-05-14 11:29:43 +02:00
antirez	c38fd1f661	Merge branch 'free_clients_during_loading' into unstable	2020-05-14 11:28:08 +02:00
Oran Agra	b4416280cf	fix unstable replication test this test which has coverage for various flows of diskless master was failing randomly from time to time. the failure was: [err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl log message of 'Diskless rdb transfer, last replica dropped, killing fork child' not found what seemed to have happened is that the master didn't detect that all replicas dropped by the time the replication ended, it thought that one replica is still connected. now the test takes a few seconds longer but it seems stable.	2020-05-12 08:59:09 +03:00
John	181fadb708	more reliability fixes for multimaster Former-commit-id: 3543a3c763de91a4d76bca89659fec9bf6b7a1c8	2020-05-11 05:38:21 -04:00
John	daaf82b673	more reliability fixes for multimaster Former-commit-id: fd5b541260908423c35227ff9e42a83f96ace6c0	2020-05-11 09:37:42 +00:00
John	3d6f990104	Make multimaster tests more reliable Former-commit-id: 3122912920973cb433d625a09b183c3f538e2523	2020-05-11 05:23:47 -04:00
John	0e024808c2	Make multimaster tests more reliable Former-commit-id: 4fe59ba11b720864ea0124885b358cb72127cc2d	2020-05-11 09:22:27 +00:00
Oran Agra	905e28ee87	fix redis 6.0 not freeing closed connections during loading. This bug was introduced by a recent change in which readQueryFromClient is using freeClientAsync, and despite the fact that now freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since it's not called during loading in processEventsWhileBlocked. furthermore, afterSleep was called in that case but beforeSleep wasn't. This bug also caused slowness since the level-triggered mode of epoll kept signaling these connections as readable causing us to keep doing connRead again and again for all of these, which keep accumulating. now both before and after sleep are called, but not all of their actions are performed during loading, some are only reserved for the main loop. fixes issue #7215	2020-05-11 11:33:46 +03:00
Oran Agra	3d3861dd88	add daily github actions with libc malloc and valgrind * fix memory leaks with diskless replica short read. * fix a few timing issues with valgrind runs * fix issue with valgrind and watchdog schedule signal about the valgrind WD issue: the stack trace test in logging.tcl, has issues with valgrind: ==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1: ==28808== too small or bad protection modes it seems to be some valgrind bug with SA_ONSTACK. SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed), also, not sure if it's even valid without a call to sigaltstack()	2020-05-08 10:37:35 +02:00
Oran Agra	deee2c1ef2	add daily github actions with libc malloc and valgrind * fix memory leaks with diskless replica short read. * fix a few timing issues with valgrind runs * fix issue with valgrind and watchdog schedule signal about the valgrind WD issue: the stack trace test in logging.tcl, has issues with valgrind: ==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1: ==28808== too small or bad protection modes it seems to be some valgrind bug with SA_ONSTACK. SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed), also, not sure if it's even valid without a call to sigaltstack()	2020-05-04 09:52:20 +03:00
Oran Agra	ea63aea72d	fix loading race in psync2 tests	2020-04-28 11:20:15 +02:00
Oran Agra	d31c0c5264	fix loading race in psync2 tests	2020-04-28 09:18:01 +03:00
Oran Agra	e4d2bb62b2	Keep track of meaningful replication offset in replicas too Now both master and replicas keep track of the last replication offset that contains meaningful data (ignoring the trailing pings), and both trim that tail from the replication backlog, and from the offset they use for psync. the implication is that if someone missed some pings, or even has excessive pings that the promoted replica doesn't have, it'll still be able to psync (avoid full sync). the downside (which was already committed) is that replicas running old code may fail to psync, since the promoted replica trims pings from its backlog. This commit adds a test that reproduces several cases of promotions and demotions with stale and non-stale pings Background: The meaningful offset on the master was added recently to solve a problem where the master is left all alone, injecting PINGs into its backlog when no one is listening and then gets demoted and tries to replicate from a replica that didn't have any of the PINGs (or at least not the last ones). however, consider this case: master A has two replicas (B and C) replicating directly from it. there's no traffic at all, and also no network issues, just many pings in the tail of the backlog. now B gets promoted, A becomes a replica of B, and C remains a replica of A. when A gets demoted, it trims the pings from its backlog, and successfully replicates from B. however, C is still aware of these PINGs, when it'll disconnect and re-connect to A, it'll ask for something that's not in the backlog anymore (since A trimmed the tail of its backlog), and be forced to do a full sync (something it didn't have to do before the meaningful offset fix). Besides that, the psync2 test was always failing randomly here and there, it turns out the reason was PINGs. Investigating it shows the following scenario: cycle 1: redis #1 is master, and all the rest are direct replicas of #1 cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1 now we see that when #1 is demoted it prints: 17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference) 17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964). 17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master. and when #3 connects to the demoted #2, #2 says: 17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964 so the issue here is that the meaningful offset feature saved the day for the demoted master (since it needs to sync from a replica that didn't get the last ping), but it didn't help one of the other replicas which did get the last ping.	2020-04-27 15:52:49 +02:00
Oran Agra	4447ddc8bb	Keep track of meaningful replication offset in replicas too Now both master and replicas keep track of the last replication offset that contains meaningful data (ignoring the trailing pings), and both trim that tail from the replication backlog, and from the offset they use for psync. the implication is that if someone missed some pings, or even has excessive pings that the promoted replica doesn't have, it'll still be able to psync (avoid full sync). the downside (which was already committed) is that replicas running old code may fail to psync, since the promoted replica trims pings from its backlog. This commit adds a test that reproduces several cases of promotions and demotions with stale and non-stale pings Background: The meaningful offset on the master was added recently to solve a problem where the master is left all alone, injecting PINGs into its backlog when no one is listening and then gets demoted and tries to replicate from a replica that didn't have any of the PINGs (or at least not the last ones). however, consider this case: master A has two replicas (B and C) replicating directly from it. there's no traffic at all, and also no network issues, just many pings in the tail of the backlog. now B gets promoted, A becomes a replica of B, and C remains a replica of A. when A gets demoted, it trims the pings from its backlog, and successfully replicates from B. however, C is still aware of these PINGs, when it'll disconnect and re-connect to A, it'll ask for something that's not in the backlog anymore (since A trimmed the tail of its backlog), and be forced to do a full sync (something it didn't have to do before the meaningful offset fix). Besides that, the psync2 test was always failing randomly here and there, it turns out the reason was PINGs. Investigating it shows the following scenario: cycle 1: redis #1 is master, and all the rest are direct replicas of #1 cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1 now we see that when #1 is demoted it prints: 17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference) 17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964). 17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master. and when #3 connects to the demoted #2, #2 says: 17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964 so the issue here is that the meaningful offset feature saved the day for the demoted master (since it needs to sync from a replica that didn't get the last ping), but it didn't help one of the other replicas which did get the last ping.	2020-04-27 15:52:23 +02:00
John Sully	05cc1fd3de	Initial merge of unstable 6 Former-commit-id: aac140de199646914cc02997a45111c9c695e55d	2020-04-16 16:36:16 -04:00


358 Commits