futriix

Author	SHA1	Message	Date
naglera	48ca2c9176	Improve dual channel replication stability and fix compatibility issues (#804 ) Introduce several improvements to improve the stability of dual-channel replication and fix compatibility issues. 1. Make dual-channel-replication tests more reliable: use pause instead of forced sleep. 2. Fix race conditions when freeing RDB client. 3. Check if sync was stopped during local buffer streaming. 4. Fix $ENDOFFSET reply format to work on 32-bit machines too. --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-07-25 09:34:39 -07:00
naglera	ff6b780fe6	Dual channel replication (#60 ) In this PR we introduce the main benefit of dual channel replication by continuously steaming the COB (client output buffers) in parallel to the RDB and thus keeping the primary's side COB small AND accelerating the overall sync process. By streaming the replication data to the replica during the full sync, we reduce 1. Memory load from the primary's node. 2. CPU load from the primary's main process. [Latest performance tests](#data) ## Motivation * Reduce primary memory load. We do that by moving the COB tracking to the replica side. This also decrease the chance for COB overruns. Note that primary's input buffer limits at the replica side are less restricted then primary's COB as the replica plays less critical part in the replication group. While increasing the primary’s COB may end up with primary reaching swap and clients suffering, at replica side we’re more at ease with it. Larger COB means better chance to sync successfully. * Reduce primary main process CPU load. By opening a new, dedicated connection for the RDB transfer, child processes can have direct access to the new connection. Due to TLS connection restrictions, this was not possible using one main connection. We eliminate the need for the child process to use the primary's child-proc -> main-proc pipeline, thus freeing up the main process to process clients queries. ## Dual Channel Replication high level interface design - Dual channel replication begins when the replica sends a `REPLCONF CAPA DUALCHANNEL` to the primary during initial handshake. This is used to state that the replica is capable of dual channel sync and that this is the replica's main channel, which is not used for snapshot transfer. - When replica lacks sufficient data for PSYNC, the primary will send `-FULLSYNCNEEDED` response instead of RDB data. As a next step, the replica creates a new connection (rdb-channel) and configures it against the primary with the appropriate capabilities and requirements. The replica then requests a sync using the RDB channel. - Prior to forking, the primary sends the replica the snapshot's end repl-offset, and attaches the replica to the replication backlog to keep repl data until the replica requests psync. The replica uses the main channel to request a PSYNC starting at the snapshot end offset. - The primary main threads sends incremental changes via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. As for the replica, the incremental changes are stored on a local buffer, while the RDB is loaded into memory. - Once the replica completes loading the rdb, it drops the rdb-connection and streams the accumulated incremental changes into memory. Repl steady state continues normally. ## New replica state machine ![image](https://github.com/user-attachments/assets/38fbfff0-60b9-4066-8b13-becdb87babc3) ## Data <a name="data"></a> ![image](https://github.com/user-attachments/assets/d73631a7-0a58-4958-a494-a7f4add9108f) ![image](https://github.com/user-attachments/assets/f44936ed-c59a-4223-905d-0fe48a6d31a6) ![image](https://github.com/user-attachments/assets/bd333ee2-3c47-47e5-b244-4ea75f77c836) ## Explanation These graphs demonstrate performance improvements during full sync sessions using rdb-channel + streaming rdb directly from the background process to the replica. First graph- with at most 50 clients and light weight commands, we saw 5%-7.5% improvement in write latency during sync session. Two graphs below- full sync was tested during heavy read commands from the primary (such as sdiff, sunion on large sets). In that case, the child process writes to the replica without sharing CPU with the loaded main process. As a result, this not only improves client response time, but may also shorten sync time by about 50%. The shorter sync time results in less memory being used to store replication diffs (>60% in some of the tested cases). ## Test setup Both primary and replica in the performance tests ran on the same machine. RDB size in all tests is 3.7gb. I generated write load using valkey-benchmark ` ./valkey-benchmark -r 100000 -n 6000000 lpush my_list __rand_int__`. --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: naglera <58042354+naglera@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Ping Xie <pingxie@outlook.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-07-17 13:59:33 -07:00
Wen Hui	191be272b4	Rename redis.tcl to valkey.tcl (#283 ) Includes some more changes e.g. the README under tests and some script under utils. Signed-off-by: hwware <wen.hui.ware@gmail.com>	2024-04-24 20:54:52 +02:00
Shivshankar	34413e0862	Replace "redis" with "valkey" test code (#287 ) Occurrences of "redis" in TCL test suites and helpers, such as TCL client used in tests, are replaced with "valkey". Occurrences of uppercase "Redis" are not changed in this PR. No files are renamed in this PR. --------- Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-04-18 15:57:17 +02:00
Jacob Murphy	df5db0627f	Remove trademarked language in code comments (#223 ) This includes comments used for module API documentation. * Strategy for replacement: Regex search: `(//\|/\\| \\|#).* ("\|$)?(r\|R)edis( \|\. \|'\|\n\|,\|-\|$\|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)` * Don't edit copyright comments * Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish from newly licensed repository * Replace "Redis Object" -> "Object" * Exclude markdown for now * Don't edit Lua scripting comments referring to redis.X API * Replace "Redis Protocol" -> "RESP" * Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-" prefix * Most other places, I use best judgement to either remove "Redis", or replace with "the server" or "server" Fixes #148 --------- Signed-off-by: Jacob Murphy <jkmurphy@google.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-04-09 10:24:03 +02:00
Huang Zhw	cf61ad14cc	When redis-cli received ASK, it didn't handle it (#8930 ) When redis-cli received ASK, it used string matching wrong and didn't handle it. When we access a slot which is in migrating state, it maybe return ASK. After redirect to the new node, we need send ASKING command before retry the command. In this PR after redis-cli receives ASK, we send a ASKING command before send the origin command after reconnecting. Other changes: * Make redis-cli -u and -c (unix socket and cluster mode) incompatible with one another. * When send command fails, we avoid the 2nd reconnect retry and just print the error info. Users will decide how to do next. See #9277. * Add a test faking two redis nodes in TCL to just send ASK and OK in redis protocol to test ASK behavior. Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-02 14:59:08 +03:00
YaacovHazan	32a2584e07	stabilize tests that involved with load handlers (#8967 ) When test stop 'load handler' by killing the process that generating the load, some commands that already in the input buffer, still might be processed by the server. This may cause some instability in tests, that count on that no more commands processed after we stop the `load handler' In this commit, new proc 'wait_load_handlers_disconnected' added, to verify that no more cammands from any 'load handler' prossesed, by checking that the clients who genreate the load is disconnceted. Also, replacing check of dbsize with wait_for_ofs_sync before comparing debug digest, as it would fail in case the last key the workload wrote was an overridden key (not a new one). Affected tests Race fix: - failover command to specific replica works - Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl - AOF rewrite during write load: RDB preamble=$rdbpre Cleanup and speedup: - Test replication with blocking lists and sorted sets operations - Test replication with parallel clients writing in different DBs - Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect	2021-05-20 15:29:43 +03:00
Oran Agra	07b365b7d7	revert an accidental test code change done as part of the tls project it seems that commit b087dd1db60ed23d9e59304deb0b1599437f6e23 accidentially changed gen_write_load to not use deferred client, which causes them to be slower and not generate high load which they should, making some tests less effecitive	2019-12-01 16:10:09 +02:00
Yossi Gottlieb	b087dd1db6	TLS: Connections refactoring and TLS support. * Introduce a connection abstraction layer for all socket operations and integrate it across the code base. * Provide an optional TLS connections implementation based on OpenSSL. * Pull a newer version of hiredis with TLS support. * Tests, redis-cli updates for TLS support.	2019-10-07 21:06:13 +03:00
antirez	e344aa4a6d	Test: fix blocking lists/zsets replication test. By verifying that it was able to find a regression, and fixing it accordingly.	2018-05-15 17:43:41 +02:00
antirez	8327ccef0e	Test: replication test blocking lists/zsets ops.	2018-05-15 17:33:29 +02:00
antirez	22c9c4076b	Regression test for issue 417 (memory leak when replicating to DB with id >= 10)	2012-03-30 10:26:07 +02:00
antirez	414c3deac1	Regression test for the main problem causing issue #141 . Minor changes/fixes/additions to the test suite itself needed to write the test.	2012-01-06 17:28:40 +01:00

13 Commits