futriix

Author	SHA1	Message	Date
Viktor Söderqvist	12ec3d5932	Increase timeout for cross-version-replication test (#1644 ) Fixes #1641 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-29 13:29:35 -08:00
Binbin	4b8f3ed9ac	Do command existence and arity checks when loading AOF to avoid crash (#1614 ) Do command existence and arity checks when loading AOF to avoid crash Currently, loading commands such as `cluster` or `cluster slots xxx` from AOF will cause the server to crash. 1. `cluster` is a container command, and executing proc will cause a crash because we do not check subcommand and arity. 2. `cluster slots xxx`, arity check fail, reply with an error from the AOF client and trigger a panic. Of course, there are many other ways for a problematic AOF to cause the panic, but it is still necessary do some basic checks before executing. In this way, in these basic cases, we can print useful error messages instead of crashing directly. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-30 01:06:13 +08:00
Viktor Söderqvist	e9b8970e72	Relaxed RDB version check (#1604 ) New config `rdb-version-check` with values: * `strict`: Reject future RDB versions. * `relaxed`: Try parsing future RDB versions and fail only when an unknown RDB opcode or type is encountered. This can make it possible for Valkey 8.1 to try read a dump from for example Valkey 9.0 or later on a best-effort basis. The conditions for when this is expected to work can be defined when the future Valkey versions are released. Loading is expected to fail in the following cases: * If the data set contains any new key types or other data elements not supported by the current version. * If the RDB contains new representations or encodings of existing key types or other data elements. This change also prepares for the next RDB version bump. A range of RDB versions (12-79) is reserved, since it's expected to be used by foreign software RDB versions, so Valkey will not accept versions in this range even with the `relaxed` version check. The DUMP/RESTORE format has no magic string; only the RDB version number. This change also prepares for the magic string to change from REDIS to VALKEY next time we bump the RDB version. Related to #1108. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2025-01-27 18:44:24 +01:00
Madelyn Olson	88a68303c0	Make sure to disable pause after fork for dual channel test (#1612 ) Might close https://github.com/valkey-io/valkey/issues/1484. I noticed that we don't disable pause after fork on the last test that was getting executed, so it might getting stuck in pause loops after the test ends if it tries another psync for any reason. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2025-01-27 06:44:48 -08:00
Viktor Söderqvist	99ed308817	Add cross-version test framework (and a simple test) (#1371 ) This includes a way to run two versions of the server from the TCL test framework. It's a preparation to add more cross-version tests. The runtest script accepts a new parameter ./runtest --other-server-path path/to/valkey-server and a new tag "needs:other-server" for test cases and start_server. Tests with this tag are automatically skipped if `--other-server-path` is not provided. This PR adds it in a CI job with Valkey 7.2.7 by downloading a binary release. Fixes #76 --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-23 11:26:54 +01:00
ranshid	3032ccd48a	Change the shared format for dual channel replication logs (#1586 ) change the format of the dual channel replication logs so that it will not conflict with existing log formats like modules. Fixes: https://github.com/valkey-io/valkey/issues/1509 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-01-20 08:04:47 +02:00
Amit Nagler	6be1c77b1e	Fix valgrind test (#1555 ) Introduced at https://github.com/valkey-io/valkey/pull/1165/files Signed-off-by: naglera <anagler123@gmail.com>	2025-01-14 10:49:46 +02:00
Viktor Söderqvist	ad592f73d7	Skip CLI tests with reply schema validation (#1545 ) The commands used in valkey-cli tests are not important the reply schema validation. Skip them to avoid the problem if tests hanging. This has failed lately in the daily job: ``` [TIMEOUT]: clients state report follows. sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription Killing still running Valkey server 33357 ``` These test cases use a special valkey-cli command `:get pubsub` command, which is an internal command to valkey-cli rather than a Valkey server command. This command hangs when compiled with with logreqres enabled. Easy solution is to skip the tests in this setup. The test cases were introduced in #1432. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-12 08:02:39 +08:00
Binbin	b207b421bc	Fix new cli subscribed mode test in cluster mode (#1533 ) We need to add a hash tag in cluster mode. Fixes #1531. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-09 12:21:31 +08:00
Nikhil Manglore	9e0204941d	valkey-cli auto-exit from subscribed mode (#1432 ) Resolves issue with valkey-cli not auto exiting from subscribed mode on reaching zero pub/sub subscription (previously filed on Redis) https://github.com/redis/redis/issues/12592 --------- Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>	2025-01-08 21:03:06 +01:00
Amit Nagler	8aff235721	Fix unreliable dual channel Valgrind tests (#1500 ) Used same approach as PR #1165 to solve random failures. Resolves #1491 Signed-off-by: naglera <anagler123@gmail.com>	2025-01-02 10:00:29 +08:00
ranshid	0f273bb648	Align rejected unblocked commands to update the correct error statistic (#577 ) Currently, in case a blocked command is unblocked externally (eg. due to the relevant slot being migrated or the CLIENT UNBLOCK command was issued, the command statistics will always update the failed_calls error statistic. This leads to missalignment with `90b9f08e5d` as well as some inconsistencies. For example when a key is migrated during cluster slot migration, clients blocked on XREADGROUP will be unblocked and update the rejected_calls stat, while clients blocked on BLPOP will get unblocked updating the failed_calls stat. In this PR we add explicit indication in updateStatsOnUnblock thet indicates if the command was rejected or failed. --------- Signed-off-by: ranshid <ranshid@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-01-01 16:33:09 +02:00
uriyage	bb325bde35	Fix restore replica output bytes stat update (#1486 ) This PR fixes the missing stat update for `total_net_repl_output_bytes` that was removed during the refactoring in PR #758. The metric was not being updated when writing to replica connections. Changes: - Restored the stat update in postWriteToClient for replica connections - Added integration test to verify the metric is properly updated Signed-off-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-12-25 10:58:49 +08:00
Amit Nagler	9f4503ca50	Add scoped RDB loading context and immediate abort flag (#1173 ) This PR introduces a new mechanism for temporarily changing the server's loading_rio context during RDB loading operations. The new `RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the `server.loading_rio` value, ensuring that it's automatically restored to its original value when the scope ends. Introduces a dedicated flag to `rio` to signal immediate abort, preventing potential use-after-free scenarios during replication disconnection in dual-channel load. This ensures proper termination of `rdbLoadRioWithLoadingCtx` when replication is cancelled due to connection loss on main connection. Fixes https://github.com/valkey-io/valkey/issues/1152 --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>	2024-12-24 08:14:32 +02:00
Amit Nagler	f1b7f3072c	Reduce dual channel testing time (#1477 ) - By not waiting `repl-diskless-sync-delay` when we don't have to, we can reduce ~30% of dual channel tests execution time. - This commit also drops one test which is not required for regular sync (`Sync should continue if not all slaves dropped`). - Skip dual channel test with master diskless disabled because it will initiate the same synchronization process as the non-dual channel test, making it redundant. Before: ``` Execution time of different units: 171 seconds - integration/dual-channel-replication 305 seconds - integration/replication-psync \o/ All tests passed without errors! ``` After: ``` Execution time of different units: 120 seconds - integration/dual-channel-replication 236 seconds - integration/replication-psync \o/ All tests passed without errors! ``` Discused on https://github.com/valkey-io/valkey/pull/1173 --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-12-24 08:13:25 +02:00
Viktor Szépe	b66698b887	Discover and fix new typos (#1446 ) Upgrade `typos` and fix corresponding typos --------- Signed-off-by: Viktor Szépe <viktor@szepe.net>	2024-12-17 17:45:43 -08:00
Viktor Söderqvist	3eb8314be6	Replace dict with hashtable for keys, expires and pubsub channels Instead of a dictEntry with pointers to key and value, the hashtable has a pointer directly to the value (robj) which can hold an embedded key and acts as a key-value in the hashtable. This minimizes the number of pointers to follow and thus the number of memory accesses to lookup a key-value pair. Keys robj hashtable +-------+ +-----------------------+ \| 0 \| \| type, encoding, LRU \| \| 1 ------->\| refcount, expire \| \| 2 \| \| ptr \| \| ... \| \| optional embedded key \| +-------+ \| optional embedded val \| +-----------------------+ The expire timestamp (TTL) is also stored in the robj, if any. The expire hash table points to the same robj. Overview of changes: * Replace dict with hashtable in kvstore (kvstore.c) * Add functions for embedding key and expire in robj (object.c) * When there's unused space, reserve an expire field to avoid realloting it later if expire is added. * Always reserve space for expire for large key names to avoid realloc if it's set later. * Update db functions (db.c) * dbAdd, setKey and setExpire reallocate the object when embedding a key * setKey does not increment the reference counter, since it would require duplicating the object. This responsibility is moved to the caller. * Remove logic for shared integer objects as values in the database. The keys are now embedded in the objects, so all objects in the database need to be unique. Thus, we can't use shared objects as values. Also delete test cases for shared integers. * Adjust various commands to the changes mentioned above. * Adjust defrag code * Improvement: Don't access the expires table before defrag has actually reallocated the object. * Adjust test cases that were using hard-coded sizes for dict when realloc would happen, and some other adjustments in test cases. * Adjust memory prefetch for new hash table implementation in IO-threading, using new `hashtableIncrementalFind` API * Adjust offloading of free() to IO threads: Object free to be done in main thread while keeping obj->ptr offloading in IO-thread since the DB object is now allocated by the main-thread and not by the IO-thread as it used to be. * Let expireIfNeeded take an optional value, to avoid looking up the expires table when possible. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Uri Yagelnik <uriy@amazon.com>	2024-12-10 21:30:56 +01:00
Amit Nagler	7043ef0bbb	Split dual-channel COB overrun tests to separate servers (#1374 ) 1. The test isn't waiting long enough for the output buffer to overrun. This problem is happening because an error from the previous test is bleeding into the current test's logs. The simplest fix would be to split these tests. 2. Increased replication timeout to ensure sync fails due to output buffer overrun before a timeout occurs. Fixes #1367 Signed-off-by: naglera <anagler123@gmail.com>	2024-12-01 21:33:43 +08:00
Amit Nagler	9305b49145	Add tag for dual-channel logs (#999 ) This PR introduces a consistent tagging system for dual-channel logs. The goal is to improve log readability and filterability, making it easier for operators to manage and analyze log entries. Resolves https://github.com/valkey-io/valkey/issues/986 --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-11-26 16:51:52 +02:00
Binbin	4aacffa32d	Stabilize dual replication test to avoid getting LOADING error (#1288 ) When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-11-11 21:42:34 +08:00
ranshid	29b83f1ac8	Introduce bgsave cancel (#757 ) In some cases bgsave child process can run for a long time exhausting system resources. Although it is possible to kill the bgsave child process from the system shell, sometimes it is not possible allowing OS level access. This PR adds a new subcommand to the BGSAVE command. When user will issue `BGSAVE CANCEL`, it will do one of the 2: 1. In case a bgsave child process is currently running, the child process would be immediately killed thus terminating any save/replication full sync process. 2. In case a bgsave child process is SCHEDULED to run, the scheduled execution will be cancelled. --------- Signed-off-by: ranshid <ranshid@amazon.com> Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-10-21 11:56:44 +02:00
ranshid	36d438ba27	Deflake test ync should continue if not all slaves dropped dual-channel-replication (#1164 ) Sometimes when dual-channel is turned off the tested replica might disconnect on COB overrun. disable the replica COB limit in order to prevent such cases. Fixes: #1153 Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-10-14 15:31:59 +08:00
ranshid	597aa037cc	Deflake test Primary COB growth with inactive replica (#1165 ) in case of valgrind run, the replica might get disconnected from the primary due to repl-timeout reached. Fix is to configure larger timeout in case of valgrind test. Partially fixes: #1152 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2024-10-14 15:30:29 +08:00
kronwerk	cd8de095c4	Add flush-before-load option for repl-diskless-load (#909 ) A new option for diskless replication on the replica side. After a network failure, the replica may need to perform a full sync. The other option for diskless full sync is `swapdb`, but it uses twice as much memory, temporarily. In situations where this is not acceptable, and where losing data is acceptable, the `flush-before-load` can be useful. If the full sync fails, the old data is lost though. Therefore, the new option is marked as "dangerous". --------- Signed-off-by: kronwerk <ca11e5e22g@gmail.com> Signed-off-by: kronwerk <kronwerk@users.noreply.github.com> Co-authored-by: kronwerk <ca11e5e22g@gmail.com>	2024-10-09 13:11:53 +02:00
Binbin	80fcbd3fec	Fix module / script call CLUSTER SLOTS / SHARDS fake client check crash (#1063 ) The reason is VM_Call will use a fake client without connection, so we also need to check if c->conn is NULL. This also affects scripts. If they are called in the script, the server will crash. Injecting commands into AOF will also cause startup failure. Fixes #1054. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-25 14:50:48 +08:00
Binbin	6ce75cdea8	Fix replica online timing issue in failover test (#1044 ) Ci reported this failure: ``` [exception]: Executing test client: ERR FAILOVER target replica is not online.. ERR FAILOVER target replica is not online. while executing "$node_0 failover to $node_1_host $node_1_port" ``` We can see somehow the replica is not online in time and casuing this failure, added a verify_replica_online to make sure the replica is online for the test. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-23 17:35:02 +08:00
Binbin	17390383b5	Replica flush the old data after RDB file is ok in disk-based replication (#926 ) Call emptyData right before rdbLoad to prevent errors in the middle and we drop the replication stream and leaving an empty database. The real changes is in disk-based part, the rest is just code movement. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-14 11:49:49 +08:00
Amit Nagler	1b24168450	Dual Channel Replication - Verify Replica Local Buffer Limit Configuration (#989 ) Prior to comparing the replica buffer against the configured limit, we need to ensure that the limit configuration is enabled. If the limit is set to zero, it indicates that there is no limit, and we should skip the buffer limit check. --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-09-10 17:26:28 -07:00
Binbin	6478526597	Fix aof base suffix when modifying aof-use-rdb-preamble during rewrite (#886 ) If we modify aof-use-rdb-preamble in the middle of rewrite, we may get a wrong aof base suffix. This is because the suffix is concatenated by the main process afterwards, and it may be different from the beginning. We cache this value when we start the rewrite. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-07 23:27:59 +08:00
zhaozhao.zz	743f5ac2ae	standalone -REDIRECT handles special case of MULTI context (#895 ) In standalone mode, when a `-REDIRECT` error occurs, special handling is required if the client is in the `MULTI` context. We have adopted the same handling method as the cluster mode: 1. If a command in the transaction encounters a `REDIRECT` at the time of queuing, the execution of `EXEC` will return an `EXECABORT` error (we expect the client to redirect and discard the transaction upon receiving a `REDIRECT`). That is: ``` MULTI ==> +OK SET x y ==> -REDIRECT EXEC ==> -EXECABORT ``` 2. If all commands are successfully queued (i.e., `QUEUED` results are received) but a redirect is detected during `EXEC` execution (such as a primary-replica switch), a `REDIRECT` is returned to instruct the client to perform a redirect. That is: ``` MULTI ==> +OK SET x y ==> +QUEUED failover EXEC ==> -REDIRECT ``` --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-08-30 10:17:53 +08:00
Amit Nagler	1ff2a3b6ae	Remove `dual-channel-replication` Feature Flag's Protection (#908 ) Currently, the `dual-channel-replication` feature flag is immutable if `enable-protected-configs` is enabled, which is the default behavior. This PR proposes to make the `dual-channel-replication` flag mutable, allowing it to be changed dynamically without restarting the cluster. Motivation: The ability to change the `dual-channel-replication` flag dynamically is essential for testing and validating the feature on real clusters running in production environments. By making the flag mutable, we can enable or disable the feature without disrupting the cluster's operations, facilitating easier testing and experimentation. Additionally, this change would provide more flexibility for users to enable or disable the feature based on their specific requirements or operational needs without requiring a cluster restart. --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-08-27 10:18:48 -07:00
Madelyn Olson	b12668af7a	Revert repl backlog size back to 1mb for dual channel tests (#934 ) There is a test that assumes that the backlog will get overrun, but because of the recent changes to the default it no longer fails. It seems like it is a bit flakey now though, so resetting the value in the test back to 1mb. (This relates to the CoB of 1100k. So it should consistently work with a 1mb limit). Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-22 15:35:28 -07:00
Binbin	a1ac459ef1	Set repl-backlog-size from 1mb to 10mb by default (#911 ) The repl-backlog-size 1mb is too small in most cases, now network transmission and bandwidth performance have improved rapidly in more than ten years. The bigger the replication backlog, the longer the replica can endure the disconnect and later be able to perform a partial resynchronization. Part of #653. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-21 11:59:02 -04:00
gmbnomis	7795152fff	Fix valgrind timing issue failure in replica-redirect test (#917 ) Wait for the replica to become online before starting the actual test. Signed-off-by: Simon Baatz <gmbnomis@gmail.com>	2024-08-18 21:20:53 +08:00
zhaozhao.zz	131857e80a	To avoid bouncing -REDIRECT during FAILOVER (#871 ) Fix #821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-08-14 14:04:29 +08:00
Amit Nagler	6cb86fff51	Fix dual-channel replication test under valgrind (#904 ) Test dual-channel-replication primary gets cob overrun during replica rdb load` fails during the Valgrind run. This is due to the load handlers disconnecting before the tests complete, resulting in a low primary COB. Increasing the handlers' timeout should resolve this issue. Failure: https://github.com/valkey-io/valkey/actions/runs/10361286333/job/28681321393 Server logs reveals that the load handler clients were disconnected before the test started Also the two previus test took about 20 seconds which is the handler timeout. --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-13 10:40:19 -07:00
naglera	27fce29500	Fix dual-channel-replication related issues (#837 ) - Fix TLS bug where connection were shutdown by primary's main process while the child process was still writing- causing main process to be blocked. - TLS connection fix -file descriptors are set to blocking mode in the main thread, followed by a blocking write. This sets the file descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`) (@xbasel). - Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on log. - Ensure that `server.repl_offset` and `server.replid` are updated correctly when dual channel synchronization completes successfully. Thist led to failures in replication tests that validate replication IDs or compare replication offsets. --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: naglera <58042354+naglera@users.noreply.github.com> Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com> Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-08-12 13:03:12 -07:00
Binbin	929283fc6f	Dual channel replication should not update lastbgsave_status when transfer error (#811 ) Currently lastbgsave_status is used in bgsave or disk-replication, and the target is the disk. In #60, we update it when transfer error, i think it is mainly used in tests, so we can use log to replace it. It changes lastbgsave_status to err in this case, but it is strange that it does not set ok or err in the above if and the following else. Also noted this will affect stop-writes-on-bgsave-error. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-12 11:25:55 -07:00
naglera	9211aed72e	Improve reliability of dual-channel-replication pause resume tests (#835 ) Update the dual channel-replication tests to wait for the pause to begin before attempting to unpause. --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-07-28 11:14:56 -07:00
Binbin	e32518d655	Fix unexpected behavior when turning appendonly on and off within a transaction (#826 ) If we do `config set appendonly yes` and `config set appendonly no` in a multi, there are some unexpected behavior. When doing appendonly yes, we will schedule a AOFRW, and when we are doding appendonly no, we will call stopAppendOnly to stop it. In stopAppendOnly, the aof_fd is -1 since the aof is not start yet and the fsync and close will take the -1 and call it, so they will all fail with EBADF. And stopAppendOnly will emit a server log, the close(-1) should be no problem but it is still an undefined behavior. This PR also adds a log `Background append only file rewriting scheduled.` to bgrewriteaofCommand when it was scheduled. And adds a log in stopAppendOnly when a scheduled AOF is canceled, it will print `AOF was disabled but there is a scheduled AOF background, cancel it.` Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-07-27 18:38:23 +08:00
naglera	48ca2c9176	Improve dual channel replication stability and fix compatibility issues (#804 ) Introduce several improvements to improve the stability of dual-channel replication and fix compatibility issues. 1. Make dual-channel-replication tests more reliable: use pause instead of forced sleep. 2. Fix race conditions when freeing RDB client. 3. Check if sync was stopped during local buffer streaming. 4. Fix $ENDOFFSET reply format to work on 32-bit machines too. --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-07-25 09:34:39 -07:00
naglera	ff6b780fe6	Dual channel replication (#60 ) In this PR we introduce the main benefit of dual channel replication by continuously steaming the COB (client output buffers) in parallel to the RDB and thus keeping the primary's side COB small AND accelerating the overall sync process. By streaming the replication data to the replica during the full sync, we reduce 1. Memory load from the primary's node. 2. CPU load from the primary's main process. [Latest performance tests](#data) ## Motivation * Reduce primary memory load. We do that by moving the COB tracking to the replica side. This also decrease the chance for COB overruns. Note that primary's input buffer limits at the replica side are less restricted then primary's COB as the replica plays less critical part in the replication group. While increasing the primary’s COB may end up with primary reaching swap and clients suffering, at replica side we’re more at ease with it. Larger COB means better chance to sync successfully. * Reduce primary main process CPU load. By opening a new, dedicated connection for the RDB transfer, child processes can have direct access to the new connection. Due to TLS connection restrictions, this was not possible using one main connection. We eliminate the need for the child process to use the primary's child-proc -> main-proc pipeline, thus freeing up the main process to process clients queries. ## Dual Channel Replication high level interface design - Dual channel replication begins when the replica sends a `REPLCONF CAPA DUALCHANNEL` to the primary during initial handshake. This is used to state that the replica is capable of dual channel sync and that this is the replica's main channel, which is not used for snapshot transfer. - When replica lacks sufficient data for PSYNC, the primary will send `-FULLSYNCNEEDED` response instead of RDB data. As a next step, the replica creates a new connection (rdb-channel) and configures it against the primary with the appropriate capabilities and requirements. The replica then requests a sync using the RDB channel. - Prior to forking, the primary sends the replica the snapshot's end repl-offset, and attaches the replica to the replication backlog to keep repl data until the replica requests psync. The replica uses the main channel to request a PSYNC starting at the snapshot end offset. - The primary main threads sends incremental changes via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. As for the replica, the incremental changes are stored on a local buffer, while the RDB is loaded into memory. - Once the replica completes loading the rdb, it drops the rdb-connection and streams the accumulated incremental changes into memory. Repl steady state continues normally. ## New replica state machine ![image](https://github.com/user-attachments/assets/38fbfff0-60b9-4066-8b13-becdb87babc3) ## Data <a name="data"></a> ![image](https://github.com/user-attachments/assets/d73631a7-0a58-4958-a494-a7f4add9108f) ![image](https://github.com/user-attachments/assets/f44936ed-c59a-4223-905d-0fe48a6d31a6) ![image](https://github.com/user-attachments/assets/bd333ee2-3c47-47e5-b244-4ea75f77c836) ## Explanation These graphs demonstrate performance improvements during full sync sessions using rdb-channel + streaming rdb directly from the background process to the replica. First graph- with at most 50 clients and light weight commands, we saw 5%-7.5% improvement in write latency during sync session. Two graphs below- full sync was tested during heavy read commands from the primary (such as sdiff, sunion on large sets). In that case, the child process writes to the replica without sharing CPU with the loaded main process. As a result, this not only improves client response time, but may also shorten sync time by about 50%. The shorter sync time results in less memory being used to store replication diffs (>60% in some of the tested cases). ## Test setup Both primary and replica in the performance tests ran on the same machine. RDB size in all tests is 3.7gb. I generated write load using valkey-benchmark ` ./valkey-benchmark -r 100000 -n 6000000 lpush my_list __rand_int__`. --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: naglera <58042354+naglera@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Ping Xie <pingxie@outlook.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-07-17 13:59:33 -07:00
uriyage	bbfd041895	Async IO threads (#758 ) This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests per second, as detailed by [dan touitou](https://github.com/touitou-dan) in https://github.com/valkey-io/valkey/issues/22. This PR modifies the IO threads to be fully asynchronous, which is a first and necessary step to allow more work offloading and better utilization of the IO threads. ### Current IO threads state: Valkey IO threads were introduced in Redis 6.0 to allow better utilization of multi-core machines. Before this, Redis was single-threaded and could only use one CPU core for network and command processing. The introduction of IO threads helps in offloading the IO operations to multiple threads. Current IO Threads flow: 1. Initialization: When Redis starts, it initializes a specified number of IO threads. These threads are in addition to the main thread, each thread starts with an empty list, the main thread will populate that list in each event-loop with pending-read-clients or pending-write-clients. 2. Read Phase: The main thread accepts incoming connections and reads requests from clients. The reading of requests are offloaded to IO threads. The main thread puts the clients ready-to-read in a list and set the global io_threads_op to IO_THREADS_OP_READ, the IO threads pick the clients up, perform the read operation and parse the first incoming command. 3. Command Processing: After reading the requests, command processing is still single-threaded and handled by the main thread. 4. Write Phase: Similar to the read phase, the write phase is also be offloaded to IO threads. The main thread prepares the response in the clients’ output buffer then the main thread puts the client in the list, and sets the global io_threads_op to the IO_THREADS_OP_WRITE. The IO threads then pick the clients up and perform the write operation to send the responses back to clients. 5. Synchronization: The main-thread communicate with the threads on how many jobs left per each thread with atomic counter. The main-thread doesn’t access the clients while being handled by the IO threads. Issues with current implementation: * Underutilized Cores: The current implementation of IO-threads leads to the underutilization of CPU cores. * The main thread remains responsible for a significant portion of IO-related tasks that could be offloaded to IO-threads. * When the main-thread is processing client’s commands, the IO threads are idle for a considerable amount of time. * Notably, the main thread's performance during the IO-related tasks is constrained by the speed of the slowest IO-thread. * Limited Offloading: Currently, Since the Main-threads waits synchronously for the IO threads, the Threads perform only read-parse, and write operations, with parsing done only for the first command. If the threads can do work asynchronously we may offload more work to the threads reducing the load from the main-thread. * TLS: Currently, we don't support IO threads with TLS (where offloading IO would be more beneficial) since TLS read/write operations are not thread-safe with the current implementation. ### Suggested change Non-blocking main thread - The main thread and IO threads will operate in parallel to maximize efficiency. The main thread will not be blocked by IO operations. It will continue to process commands independently of the IO thread's activities. Implementation details Inter-thread communication. * We use a static, lock-free ring buffer of fixed size (2048 jobs) for the main thread to send jobs and for the IO to receive them. If the ring buffer fills up, the main thread will handle the task itself, acting as back pressure (in case IO operations are more expensive than command processing). A static ring buffer is a better candidate than a dynamic job queue as it eliminates the need for allocation/freeing per job. * An IO job will be in the format: ` [void* function-call-back \| void data] `where data is either a client to read/write from and the function-ptr is the function to be called with the data for example readQueryFromClient using this format we can use it later to offload other types of works to the IO threads. The Ring buffer is one way from the main-thread to the IO thread, Upon read/write event the main thread will send a read/write job then in before sleep it will iterate over the pending read/write clients to checking for each client if the IO threads has already finished handling it. The IO thread signals it has finished handling a client read/write by toggling an atomic flag read_state / write_state on the client struct. Thread Safety As suggested in this solution, the IO threads are reading from and writing to the clients' buffers while the main thread may access those clients. We must ensure no race conditions or unsafe access occurs while keeping the Valkey code simple and lock free. Minimal Action in the IO Threads The main change is to limit the IO thread operations to the bare minimum. The IO thread will access only the client's struct and only the necessary fields in this struct. The IO threads will be responsible for the following: * Read Operation: The IO thread will only read and parse a single command. It will not update the server stats, handle read errors, or parsing errors. These tasks will be taken care of by the main thread. * Write Operation: The IO thread will only write the available data. It will not free the client's replies, handle write errors, or update the server statistics. To achieve this without code duplication, the read/write code has been refactored into smaller, independent components: * Functions that perform only the read/parse/write calls. * Functions that handle the read/parse/write results. This refactor accounts for the majority of the modifications in this PR. Client Struct Safe Access As we ensure that the IO threads access memory only within the client struct, we need to ensure thread safety only for the client's struct's shared fields. * Query Buffer * Command parsing - The main thread will not try to parse a command from the query buffer when a client is offloaded to the IO thread. * Client's memory checks in client-cron - The main thread will not access the client query buffer if it is offloaded and will handle the querybuf grow/shrink when the client is back. * CLIENT LIST command - The main thread will busy-wait for the IO thread to finish handling the client, falling back to the current behavior where the main thread waits for the IO thread to finish their processing. * Output Buffer * The IO thread will not change the client's bufpos and won't free the client's reply lists. These actions will be done by the main thread on the client's return from the IO thread. * bufpos / block→used: As the main thread may change the bufpos, the reply-block→used, or add/delete blocks to the reply list while the IO thread writes, we add two fields to the client struct: io_last_bufpos and io_last_reply_block. The IO thread will write until the io_last_bufpos, which was set by the main-thread before sending the client to the IO thread. If more data has been added to the cob in between, it will be written in the next write-job. In addition, the main thread will not trim or merge reply blocks while the client is offloaded. * Parsing Fields * Client's cmd, argc, argv, reqtype, etc., are set during parsing. * The main thread will indicate to the IO thread not to parse a cmd if the client is not reset. In this case, the IO thread will only read from the network and won't attempt to parse a new command. * The main thread won't access the c→cmd/c→argv in the CLIENT LIST command as stated before it will busy wait for the IO threads. * Client Flags * c→flags, which may be changed by the main thread in multiple places, won't be accessed by the IO thread. Instead, the main thread will set the c→io_flags with the information necessary for the IO thread to know the client's state. * Client Close * On freeClient, the main thread will busy wait for the IO thread to finish processing the client's read/write before proceeding to free the client. * Client's Memory Limits * The IO thread won't handle the qb/cob limits. In case a client crosses the qb limit, the IO thread will stop reading for it, letting the main thread know that the client crossed the limit. TLS TLS is currently not supported with IO threads for the following reasons: 1. Pending reads - If SSL has pending data that has already been read from the socket, there is a risk of not calling the read handler again. To handle this, a list is used to hold the pending clients. With IO threads, multiple threads can access the list concurrently. 2. Event loop modification - Currently, the TLS code registers/unregisters the file descriptor from the event loop depending on the read/write results. With IO threads, multiple threads can modify the event loop struct simultaneously. 3. The same client can be sent to 2 different threads concurrently (https://github.com/redis/redis/issues/12540). Those issues were handled in the current PR: 1. The IO thread only performs the read operation. The main thread will check for pending reads after the client returns from the IO thread and will be the only one to access the pending list. 2. The registering/unregistering of events will be similarly postponed and handled by the main thread only. 3. Each client is being sent to the same dedicated thread (c→id % num_of_threads). Sending Replies Immediately with IO threads. Currently, after processing a command, we add the client to the pending_writes_list. Only after processing all the clients do we send all the replies. Since the IO threads are now working asynchronously, we can send the reply immediately after processing the client’s requests, reducing the command latency. However, if we are using AOF=always, we must wait for the AOF buffer to be written, in which case we revert to the current behavior. IO threads dynamic adjustment Currently, we use an all-or-nothing approach when activating the IO threads. The current logic is as follows: if the number of pending write clients is greater than twice the number of threads (including the main thread), we enable all threads; otherwise, we enable none. For example, if 8 IO threads are defined, we enable all 8 threads if there are 16 pending clients; else, we enable none. It makes more sense to enable partial activation of the IO threads. If we have 10 pending clients, we will enable 5 threads, and so on. This approach allows for a more granular and efficient allocation of resources based on the current workload. In addition, the user will now be able to change the number of I/O threads at runtime. For example, when decreasing the number of threads from 4 to 2, threads 3 and 4 will be closed after flushing their job queues. Tests Currently, we run the io-threads tests with 4 IO threads (`443d80f168/.github/workflows/daily.yml (L353)`). This means that we will not activate the IO threads unless there are 8 (threads * 2) pending write clients per single loop, which is unlikely to happened in most of tests, meaning the IO threads are not currently being tested. To enforce the main thread to always offload work to the IO threads, regardless of the number of pending events, we add an events-per-io-thread configuration with a default value of 2. When set to 0, this configuration will force the main thread to always offload work to the IO threads. When we offload every single read/write operation to the IO threads, the IO-threads are running with 100% CPU when running multiple tests concurrently some tests fail as a result of larger than expected command latencies. To address this issue, we have to add some after or wait_for calls to some of the tests to ensure they pass with IO threads as well. Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2024-07-08 20:01:39 -07:00
zhaozhao.zz	28c5a17edf	replica redirect read&write to primary in standalone mode (#325 ) To implement #319 1. replica is able to redirect read and write commands to it's primary in standalone mode * reply with "-REDIRECT primary-ip:port" 2. add a subcommand `CLIENT CAPA redirect`, a client can announce the capability to handle redirection * if a client can handle redirection, the data access commands (read and write) will be redirected 3. allow `readonly` and `readwrite` command in standalone mode, may be a breaking change * a client with redirect capability cannot process read commands on a replica by default * use READONLY command can allow read commands on a replica --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-06-27 19:00:45 +08:00
Ping Xie	4135894a5d	Update remaining `master` references to `primary` (#660 ) Signed-off-by: Ping Xie <pingxie@google.com>	2024-06-17 20:31:15 -07:00
Madelyn Olson	bce240eab7	Replace masteruser and masterauth with primaryuser and primaryauth (#598 ) Make the one backwards compatible config change we are allowed to replace for removing master from our API. `masterauth` and `masteruser` are still used as an alias, but aren't explicitly referenced. As an addendum to https://github.com/valkey-io/valkey/pull/591, it would be good to have this in 8. Given the related PR for updated other references for master, I just updated the ones around this specific change. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-06-07 00:46:52 -07:00
Viktor Söderqvist	ad5fd5b95c	More rebranding (#606 ) More rebranding of * Log messages (#252) * The DENIED error reply * Internal function names and comments, mainly Lua API --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-06-07 01:40:55 +02:00
naglera	28e055af0b	Deflake chained replicas disconnect (#574 ) Deflake chained replicas disconnect when replica re-connect with the same master. sync_partial_ok counter might get incremented if replica timed out during test. Signed-off-by: naglera <anagler123@gmail.com>	2024-06-02 20:53:39 -07:00
Shivshankar	52f9291f79	Rename redis to valkey in test suite logs and test names. (#366 ) This PR covers below cases. 1. test suite's prints(i.e., puts statement logs). 2. Links refering to redis issues. 3. test names contains redis. Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-04-25 15:13:21 +08:00
Shivshankar	8baf322742	Rename remaining test procedures (#355 ) Renamed below procedures and variables (missed in #287) as follows. redis_cluster -> valkey_cluster redis1 -> valkey1 redis2 -> valkey2 get_redis_dir -> get_valkey_dir test_redis_cli_rdb_dump -> test_valkey_cli_rdb_dump test_redis_cli_repl -> test_valkey_cli_repl redis-cli -> valkey-cli redis_reset_state -> valkey_reset_state redisHandle -> valkeyHandle redis_safe_read -> valkey_safe_read redis_safe_gets -> valkey_safe_gets redis_write -> valkey_write redis_read_reply -> valkey_read_reply redis_readable -> valkey_readable redis_readnl -> valkey_readnl redis_writenl -> valkey_writenl redis_read_map -> valkey_read_map redis_read_line -> valkey_read_line redis_read_null -> valkey_read_null redis_read_bool -> valkey_read_bool redis_read_double -> valkey_read_double redis_read_verbatim_str -> valkey_read_verbatim_str redis_call_callback -> valkey_call_callback --------- Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-04-24 18:01:33 +02:00

1 2 3 4 5 ...

416 Commits