futriix

Author	SHA1	Message	Date
Shivshankar	079f18ad97	Add io-threads-do-reads config to deprecated config table to have no effect. (#1138 ) This fixes: https://github.com/valkey-io/valkey/issues/1116 _Issue details from #1116 by @zuiderkwast_ > This config is undocumented since #758. The default was changed to "yes" and it is quite useless to set it to "no". Yet, it can happen that some user has an old config file where it is explicitly set to "no". The result will be bad performance, since I/O threads will not do all the I/O. > > It's indeed confusing. > > 1. Either remove the whole option from the code. And thus no need for documentation. _OR:_ > 2. Introduce the option back in the configuration, just as a comment is fine. And showing the default value "yes": `# io-threads-do-reads yes` with additional text. > > _Originally posted by @melroy89 in [#1019 (reply in thread)](https://github.com/orgs/valkey-io/discussions/1019#discussioncomment-10824778)_ --------- Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-10-10 17:46:09 +02:00
Roshan Khatri	9b8a06137c	Fix empty response for ACL CAT category subcommand for module defined categories (#1140 ) The module commands which were added to acl categories were getting skipped when `ACL CAT category` command was executed. This PR fixes the bug. Before: ``` 127.0.0.1:6379> ACL CAT foocategory (empty array) ``` After: ``` 127.0.0.1:6379> ACL CAT foocategory aclcheck.module.command.test.add.new.aclcategories ``` --------- Signed-off-by: Roshan Khatri <rvkhatri@amazon.com> Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>	2024-10-09 21:20:47 -07:00
kronwerk	cd8de095c4	Add flush-before-load option for repl-diskless-load (#909 ) A new option for diskless replication on the replica side. After a network failure, the replica may need to perform a full sync. The other option for diskless full sync is `swapdb`, but it uses twice as much memory, temporarily. In situations where this is not acceptable, and where losing data is acceptable, the `flush-before-load` can be useful. If the full sync fails, the old data is lost though. Therefore, the new option is marked as "dangerous". --------- Signed-off-by: kronwerk <ca11e5e22g@gmail.com> Signed-off-by: kronwerk <kronwerk@users.noreply.github.com> Co-authored-by: kronwerk <ca11e5e22g@gmail.com>	2024-10-09 13:11:53 +02:00
Binbin	1892f8a731	Add server log when module load fails with busy name (#1084 ) Currently, when module loading fails due to a busy name, we don't have a clean way to assist with troubleshooting. Case 1: when loading the same module multiple times, we cannot determine the cause of the failure without referring to the module list or the earliest module load log. The log may not exist, and sometimes it is difficult for people to make the connection with the module list. Case 2: when multiple modules use the same module name, we cannot quickly identify the busy name without referring to the module list and the earliest module load log. When different people write modules with the same module name, they don't easily make the connection with the module name. So in this PR, when doing module onload, we will try to print a busy name log if this happens. Currently we check ctx.module since if it is NULL it means the Init call failed, and Init currently only fails with busy name. It's kind of ugly. It would have been nice if we could have had a better way for onload to signal why the load failed. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-10-09 16:10:29 +08:00
Viktor Söderqvist	00c97979d9	Make ./runtest --dump-logs dump logs on crash (#1117 ) Until now, this flag only dumped logs on a failed assert in test case. It is useful that this flag dumps logs on a crash as well. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-10-06 10:40:36 -07:00
zhenwei pi	23ae21244e	RDMA: use protected mode for test (#1124 ) Since a7cbca40661 ("RDMA: Support .is_local method (#1089)"), valkey-server started to support auto-detect local connection, then we can use protected mode for local RDMA device for test. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>	2024-10-04 23:22:48 +02:00
Shivshankar	1c22680fa7	Include second solo test execution in total test count (#1071 ) This change counts both solo test executions to give an accurate total number of tests being run. --------- Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-10-04 10:19:44 -07:00
Madelyn Olson	150c197bdd	Apply CVE patches for CVE-2024-31449, CVE-2024-31227, CVE-2024-31228 (#1115 ) Applying the CVEs against mainline. (CVE-2024-31449) Lua library commands may lead to stack overflow and potential RCE. (CVE-2024-31227) Potential Denial-of-service due to malformed ACL selectors. (CVE-2024-31228) Potential Denial-of-service due to unbounded pattern matching. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-10-02 19:22:09 -04:00
Binbin	9827eef4d0	Avoid timing issue in diskless-load-swapdb test (#1077 ) Since we paused the primary node earlier, the replica may enter cluster down due to primary node pfail. Here set allow read to prevent subsequent read errors. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-10-01 13:14:30 -07:00
Viktor Söderqvist	69eddb4874	Speed up AOF rewrite test case (#1093 ) These two test cases run in a loop: * AOF rewrite during write load: RDB preamble=yes * AOF rewrite during write load: RDB preamble=no Both of the test cases build up a lot of data (3-4 million keys when I run locally) so we should empty the data before the second test case. Otherwise, the second test cases adds keys on top of the keys added in the first test case, resulting in the double number of keys and takes more time. Before this commit: [ok]: AOF rewrite during write load: RDB preamble=yes (18225 ms) [ok]: AOF rewrite during write load: RDB preamble=no (37249 ms) After: [ok]: AOF rewrite during write load: RDB preamble=yes (18777 ms) [ok]: AOF rewrite during write load: RDB preamble=no (19940 ms) Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-09-30 19:55:23 +02:00
chx9	bb57dfe630	Fix typo in test_helper.tcl (#1080 ) Fix typo in test_helper.tcl: even driven => event driven Signed-off-by: chx9 <cheng.huan@icloud.com>	2024-09-28 11:48:35 +08:00
Binbin	bf8183d065	Add --cluster option to runtest to run only cluster tests (#1052 ) Currently cluster tests in unit/cluster are run as part of ./runtest. Sometimes we change the cluster code and only want to run the cluster tests. This PR adds a --cluster option to runtest so that we can run only the cluster tests. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-26 10:31:57 +08:00
Viktor Söderqvist	99865b197c	Fix bug for CLUSTER SLOTS from EVAL over TLS (#1072 ) For fake clients like the ones used for Lua and modules, we don't determine TLS in the right way, causing CLUSTER SLOTS from EVAL over TLS to fail a debug-assert. This error was introduced when the caching of CLUSTER SLOTS was introduced, i.e. in 8.0.0. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-09-25 03:55:53 -04:00
Binbin	80fcbd3fec	Fix module / script call CLUSTER SLOTS / SHARDS fake client check crash (#1063 ) The reason is VM_Call will use a fake client without connection, so we also need to check if c->conn is NULL. This also affects scripts. If they are called in the script, the server will crash. Injecting commands into AOF will also cause startup failure. Fixes #1054. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-25 14:50:48 +08:00
Binbin	6ce75cdea8	Fix replica online timing issue in failover test (#1044 ) CI reported this failure: ``` [exception]: Executing test client: ERR FAILOVER target replica is not online.. ERR FAILOVER target replica is not online. while executing "$node_0 failover to $node_1_host $node_1_port" ``` Somehow the replica was not online in time, causing this failure. Added a verify_replica_online to make sure the replica is online for the test. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-23 17:35:02 +08:00
Binbin	56fba564b6	Print an empty primary log when primary lost its last slot (#1064 ) The one in CLUSTER SETSLOT help us keep track of state better, of course it also can make the test case happy. The one in gossip process fixes a problem that a replica can print a log saying it is an empty primary. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@outlook.com>	2024-09-23 13:14:09 +08:00
Binbin	d9c41e9ef9	Fix timing issue in the new tot-net-out replica test (#1060 ) Apparently there is a timing issue when using wait_for_ofs_sync: ``` [exception]: Executing test client: can't read "out_before": no such variable. can't read "out_before": no such variable ``` The reason is that if the connection between the primary and the replica is not established yet, the master_repl_offset of the primary and replica in wait_for_ofs_sync is 0, and the check fails, resulting in no replica client in the client list below. In this case, we need to make sure the replica is online before proceeding. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-20 14:25:05 +08:00
Shivshankar	56fd97733b	Move printver test to info-command file (#1056 ) This fixes: #219 Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-09-20 10:18:19 +08:00
Binbin	f89ff3137d	Add --moduleapi option to better use runtest-moduleapi (#1007 ) This allows us to avoid error #1002 and enables us to actually use `./runtest-moduleapi --single xxx`. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-17 19:50:38 +08:00
Binbin	17390383b5	Replica flush the old data after RDB file is ok in disk-based replication (#926 ) Call emptyData right before rdbLoad, so that an error in the middle does not make us drop the replication stream and leave an empty database. The real change is in the disk-based part; the rest is just code movement. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-14 11:49:49 +08:00
Binbin	dcc7678fc4	Fix replica unable trigger migration when it received CLUSTER SETSLOT in advance (#981 ) Fix timing issue in evaluating `cluster-allow-replica-migration` for replicas There is a timing bug where the primary and replica have different `cluster-allow-replica-migration` settings. In issue #970, we found that if the replica receives `CLUSTER SETSLOT` before the gossip update, it remains in the original shard. This happens because we only process the `cluster-allow-replica-migration` flag for primaries during `CLUSTER SETSLOT`. This commit fixes the issue by also evaluating this flag for replicas in the `CLUSTER SETSLOT` path, ensuring correct replica migration behavior. Closes #970 --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@outlook.com>	2024-09-13 15:32:20 -07:00
Ping Xie	3cc619f637	Disable flaky empty shard slot migration tests (#1027 ) Will continue my investigation offline Signed-off-by: Ping Xie <pingxie@google.com>	2024-09-13 00:02:39 -07:00
Binbin	f7c5b40183	Avoid false positive in election tests (#984 ) The node may not be able to initiate an election in time due to problems with cluster communication. If an election is initiated, make sure its offset is 0. Closes #967. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-13 14:53:39 +08:00
Ping Xie	76a59788e6	Re-enable empty-shard slot migration tests (#1024 ) Related to #734 and #858 Signed-off-by: Ping Xie <pingxie@google.com>	2024-09-11 23:19:32 -07:00
uriyage	8cca11ac54	Fix wrong count for replica's tot-net-out (#1013 ) Fix duplicate calculation of replica's `net_output_bytes` - Remove redundant calculation leftover from previous refactor - Add test to prevent regression Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-09-12 10:36:40 +08:00
Madelyn Olson	2b207ee1b3	Improve stability of hostnames test (#1016 ) Maybe partially resolves https://github.com/valkey-io/valkey/issues/952. The hostnames test relies on an assumption that node zero and node six don't communicate with each other to test a bunch of behavior in the handshake state. This was previously done by dropping all meet packets, however it seems like there was some case where node zero was sending a single pong message to node 6, which was partially initializing the state. I couldn't track down why this happened, but I adjusted the test to simply pause node zero, which also correctly emulates the state we want to be in since we're just testing state on node 6, and removes the chance of errant messages. The test was failing about 5% of the time locally, and I wasn't able to reproduce a failure with this new configuration. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-09-11 09:52:34 -07:00
Binbin	4033c99ef5	Fix module RdbLoad wrongly disable the AOF (#1001 ) In RdbLoad, we disable AOF before emptyData and rdbLoad to prevent copy-on-write issues. After rdbLoad completes, AOF should be re-enabled, but the code incorrectly checks server.aof_state, which has been reset to AOF_OFF in stopAppendOnly. This leads to AOF not being re-enabled after being disabled. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-10 21:00:08 -07:00
Amit Nagler	1b24168450	Dual Channel Replication - Verify Replica Local Buffer Limit Configuration (#989 ) Prior to comparing the replica buffer against the configured limit, we need to ensure that the limit configuration is enabled. If the limit is set to zero, it indicates that there is no limit, and we should skip the buffer limit check. --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-09-10 17:26:28 -07:00
Binbin	50c1fe59f7	Add missing moduleapi getchannels test and fix tests (#1002 ) Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-10 10:13:54 +08:00
Binbin	6478526597	Fix aof base suffix when modifying aof-use-rdb-preamble during rewrite (#886 ) If we modify aof-use-rdb-preamble in the middle of rewrite, we may get a wrong aof base suffix. This is because the suffix is concatenated by the main process afterwards, and it may be different from the beginning. We cache this value when we start the rewrite. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-07 23:27:59 +08:00
Amit Nagler	5fdb47c2e2	Add configuration hide-user-data-from-log to hide user data from server logs (#877 ) Implement data masking for user data in server logs and diagnostic output. This change prevents potential exposure of confidential information, such as PII, and enhances privacy protection. It masks all command arguments, client names, and client usernames. Added a new hide-user-data-from-log configuration item, default yes. --------- Signed-off-by: Amit Nagler <anagler123@gmail.com>	2024-09-02 09:50:36 -07:00
Binbin	5693fe4664	Fix set expire test due to the new lazyfree configs changes (#980 ) Test failed because these two PRs #865 and #913. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-02 22:43:09 +08:00
Binbin	70624ea63d	Change all the lazyfree configurations to yes by default (#913 ) ## Set replica-lazy-flush and lazyfree-lazy-user-flush to yes by default. There are many problems with running flush synchronously. Even in single CPU environments, the thread managers should balance between the freeing and serving incoming requests. ## Set lazy eviction, expire, server-del, user-del to yes by default We now have a del and a lazyfree del, and we also have these configuration items to control them: lazyfree-lazy-eviction, lazyfree-lazy-expire, lazyfree-lazy-server-del, lazyfree-lazy-user-del. In most cases lazyfree is better since it reduces the risk of blocking the main thread, and because we have lazyfreeGetFreeEffort, only those with high effort (currently 64) will use lazyfree. Part of #653. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-02 07:07:17 -07:00
Binbin	e3af1a30e4	Fast path in SET if the expiration time is expired (#865 ) If the expiration time passed in SET has already expired, for example due to the machine time (DTS) or a wrong expiration argument, we don't need to set the key and wait for the active expire scan to delete it. Changes compared with the previous behavior: 1. If the key does not exist, previously we would set the key and wait for the active expire to delete it, so it is a set + del from the perspective of propagation. Now we will not set the key and return, so it is a NOP. 2. If the key exists, previously we would set the key and wait for the active expire to delete it, so it is a set + del from the perspective of propagation. Now we will delete it and return, so it is a del. Adding a new deleteExpiredKeyFromOverwriteAndPropagate function to reduce the duplicate code. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-31 22:39:07 +08:00
Binbin	fea49bce2c	Fix timing issue in replica migration test (#968 ) The reason is that server 3 still has server 7 as its replica because the wait was too short. We should wait for the server to lose its replica. ``` *** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT Expected '{127.0.0.1 21497 267}' to be equal to '' (context: type eval line 34 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test) ``` Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-30 19:58:46 +08:00
zhaozhao.zz	743f5ac2ae	standalone -REDIRECT handles special case of MULTI context (#895 ) In standalone mode, when a `-REDIRECT` error occurs, special handling is required if the client is in the `MULTI` context. We have adopted the same handling method as the cluster mode: 1. If a command in the transaction encounters a `REDIRECT` at the time of queuing, the execution of `EXEC` will return an `EXECABORT` error (we expect the client to redirect and discard the transaction upon receiving a `REDIRECT`). That is: ``` MULTI ==> +OK SET x y ==> -REDIRECT EXEC ==> -EXECABORT ``` 2. If all commands are successfully queued (i.e., `QUEUED` results are received) but a redirect is detected during `EXEC` execution (such as a primary-replica switch), a `REDIRECT` is returned to instruct the client to perform a redirect. That is: ``` MULTI ==> +OK SET x y ==> +QUEUED failover EXEC ==> -REDIRECT ``` --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-08-30 10:17:53 +08:00
Binbin	ecbfb6a7ec	Fix reconfiguring sub-replica causing data loss when myself change shard_id (#944 ) When reconfiguring a sub-replica, there may be a case where the sub-replica uses the old offset, wins the election and causes data loss if the old primary went down. In this case, the sender is myself's primary. When executing updateShardId, not only the sender's shard_id is updated, but also the shard_id of myself, causing the subsequent areInSameShard check, that is, the full_sync_required check, to fail. As part of the recent fix of #885, the sub-replica needs to decide whether a full sync is required or not when switching shards. This shard membership check is supposed to be done against the sub-replica's current shard_id, which however was lost in this code path. This then leads to the sub-replica joining the other shard with a completely different and incorrect replication history. This is the only place where replicaof state can be updated on this path, so the most natural fix would be to pull the chain replication reduction logic into this code block and before the updateShardId call. This follows #885 and closes #942. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@outlook.com>	2024-08-29 22:39:53 +08:00
Ping Xie	ad0ede302c	Exclude '.' and ':' from `isValidAuxChar`'s banned charset (#963 ) Fix a bug in isValidAuxChar where valid characters '.' and ':' were incorrectly included in the banned charset. This issue affected the validation of auxiliary fields in the nodes.conf file used by Valkey in cluster mode, particularly when handling IPv4 and IPv6 addresses. The code now correctly allows '.' and ':' as valid characters, ensuring proper handling of these fields. Comments were added to clarify the use of the banned charset. Related to #736 --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-08-28 23:35:31 -07:00
Binbin	75b824052d	Revert make KEYS to be an exact match if there is no pattern (#964 ) In #792, the time complexity became ambiguous, fluctuating between O(1) and O(n), which is a significant difference. And we agree uncertainty can potentially bring disaster to the business, the right thing to do is to persuade users to use EXISTS instead of KEYS in this case, to do the right thing the right way, rather than accommodating this incorrect usage. This reverts commit d66a06e8183818c035bb78706f46fd62645db07e. This reverts #792. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-29 10:58:19 +08:00
Viktor Söderqvist	25dd943087	Delete TLS.md and update README.md about tests (#960 ) Most of the content of TLS.md has already been copied to README.md in #927. The description of how to run tests with TLS is moved to tests/README.md. Descriptions of the additional scripts runtest-cluster, runtest-sentinel and runtest-module are added in tests/README.md. Links to tests/README.md and src/unit/README.md are added in the top-level README.md along with a brief overview of the `make test-*` commands. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-08-28 21:17:04 +02:00
Binbin	4fe8320711	Add pause path coverage to replica migration tests (#937 ) In #885, we only added a shutdown path. There is another path, where the server might get hung by slowlog. This PR adds the pause path coverage to cover it. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-28 11:08:27 +08:00
Binbin	6a84e06b05	Wait for the role change and fix the timing issue in the new test (#947 ) The test might be fast enough and then there is no change in the role causing the test to fail. Adding a wait to avoid the timing issue: ``` *** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT Expected '{127.0.0.1 23154 267}' to be equal to '' (context: type eval line 24 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test) ``` Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-28 09:51:10 +08:00
Amit Nagler	1ff2a3b6ae	Remove `dual-channel-replication` Feature Flag's Protection (#908 ) Currently, the `dual-channel-replication` feature flag is immutable if `enable-protected-configs` is enabled, which is the default behavior. This PR proposes to make the `dual-channel-replication` flag mutable, allowing it to be changed dynamically without restarting the cluster. Motivation: The ability to change the `dual-channel-replication` flag dynamically is essential for testing and validating the feature on real clusters running in production environments. By making the flag mutable, we can enable or disable the feature without disrupting the cluster's operations, facilitating easier testing and experimentation. Additionally, this change would provide more flexibility for users to enable or disable the feature based on their specific requirements or operational needs without requiring a cluster restart. --------- Signed-off-by: naglera <anagler123@gmail.com>	2024-08-27 10:18:48 -07:00
uriyage	04d76d8b02	Improve multithreaded performance with memory prefetching (#861 ) This PR utilizes the IO threads to execute commands in batches, allowing us to prefetch the dictionary data in advance. After making the IO threads asynchronous and offloading more work to them in the first 2 PRs, the `lookupKey` function becomes a main bottleneck and it takes about 50% of the main-thread time (tested with the SET command). This is because the Valkey dictionary is a straightforward but inefficient chained hash implementation. While traversing the hash linked lists, every access to either a dictEntry structure, pointer to key, or a value object requires, with high probability, an expensive external memory access. ### Memory Access Amortization Memory Access Amortization (MAA) is a technique designed to optimize the performance of dynamic data structures by reducing the impact of memory access latency. It is applicable when multiple operations need to be executed concurrently. The principle behind it is that for certain dynamic data structures, executing operations in a batch is more efficient than executing each one separately. Rather than executing operations sequentially, this approach interleaves the execution of all operations. This is done in such a way that whenever a memory access is required during an operation, the program prefetches the necessary memory and transitions to another operation. This ensures that when one operation is blocked awaiting memory access, other memory accesses are executed in parallel, thereby reducing the average access latency. We applied this method in the development of `dictPrefetch`, which takes as parameters a vector of keys and dictionaries. It ensures that all memory addresses required to execute dictionary operations for these keys are loaded into the L1-L3 caches when executing commands. Essentially, `dictPrefetch` is an interleaved execution of dictFind for all the keys. 
### Implementation details When the main thread iterates over the `clients-pending-io-read`, for clients with ready-to-execute commands (i.e., clients for which the IO thread has parsed the commands), a batch of up to 16 commands is created. Initially, the command's argv, which was allocated by the IO thread, is prefetched to the main thread's L1 cache. Subsequently, all the dict entries and values required for the commands are prefetched from the dictionary before the command execution. Only then will the commands be executed. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2024-08-26 21:10:44 -07:00
Binbin	d66a06e818	Make KEYS to be an exact match if there is no pattern (#792 ) Although KEYS is a dangerous command and we recommend that people avoid using it, some people who are not familiar with it still use it, and even use KEYS with no pattern at all. When KEYS is used with no pattern, we can convert it to an exact match to avoid iterating over all data. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-27 12:04:27 +08:00
Ayush Sharma	b48596a914	Add support for setting the group on a unix domain socket (#901 ) Add new optional, immutable string config called `unixsocketgroup`. Change the group of the unix socket to `unixsocketgroup` after `bind()` if specified. Adds tests to validate the behavior. Fixes #873. Signed-off-by: Ayush Sharma <mrayushs933@gmail.com>	2024-08-23 11:52:08 -07:00
Binbin	8045994972	valkey-cli make source node ignores NOREPLICAS when doing the last CLUSTER SETSLOT (#928 ) This fixes #899. In that issue, the primary is cluster-allow-replica-migration no and its replica is cluster-allow-replica-migration yes. And during the slot migration: 1. Primary calling blockClientForReplicaAck, waiting its replica. 2. Its replica reconfiguring itself as a replica of other shards due to replica migration and disconnect from the old primary. 3. The old primary never got the chance to receive the ack, so it got a timeout and got a NOREPLICAS error. In this case, the replicas might automatically migrate to another primary, resulting in the client being unblocked with the NOREPLICAS error. In this case, since the configuration will eventually propagate itself, we can safely ignore this error on the source node. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-23 16:22:30 +08:00
Binbin	5d97f5133c	Fix CLUSTER SETSLOT block and unblock error when all replicas are down (#879 ) In the CLUSTER SETSLOT propagation logic, if the replicas are down, the client will get blocked during command processing and then be unblocked with `NOREPLICAS Not enough good replicas to write`. The reason is that all replicas are down (or some are down), but myself->num_replicas includes all replicas, so the client will get blocked and always time out. We should only wait for the online replicas, otherwise the wait for propagation will always time out since there are not enough replicas. The admin can easily check if there are replicas that are down for an extended period of time. If they decide to move forward anyways, we should not block it. If a replica failed right before the replication and was not included in the replication, it would also be unlikely to win the election. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@google.com>	2024-08-23 16:21:53 +08:00
Madelyn Olson	b12668af7a	Revert repl backlog size back to 1mb for dual channel tests (#934 ) There is a test that assumes that the backlog will get overrun, but because of the recent changes to the default it no longer fails. It seems like it is a bit flakey now though, so resetting the value in the test back to 1mb. (This relates to the CoB of 1100k. So it should consistently work with a 1mb limit). Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-22 15:35:28 -07:00
Wen Hui	959dd3485b	Decline unsubscribe related command in non-subscribed mode (#759 ) Currently, when clients run the unsubscribe, sunsubscribe and punsubscribe commands in non-subscribed mode, the server returns 0. This is a bug: we should not allow clients to run these kinds of commands here. This PR fixes the bug, but it is a breaking change for existing clients. --------- Signed-off-by: hwware <wen.hui.ware@gmail.com>	2024-08-22 11:21:33 -04:00
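
Note on commit 743f5ac2ae (standalone -REDIRECT in a MULTI context): the two cases in that message can be replayed with a small toy model. This is only a sketch of the reply sequence described there, not Valkey's code. The class `ToySession`, its fields, and the rule that every non-MULTI/EXEC command counts as a write are assumptions made for illustration, and the real `-REDIRECT` reply carries more detail than the bare string used here.

```python
class ToySession:
    """Hypothetical model of one client talking to a standalone node."""

    def __init__(self, is_primary=True):
        self.is_primary = is_primary  # set to False to simulate a failover
        self.in_multi = False
        self.dirty = False            # a command was rejected while queuing
        self.queued = []

    def multi(self):
        self.in_multi, self.dirty, self.queued = True, False, []
        return "+OK"

    def command(self, name):
        # Assumption: every command here is a write, so a replica redirects it.
        if not self.is_primary:
            if self.in_multi:
                self.dirty = True     # remember the failure for EXEC
            return "-REDIRECT"
        if self.in_multi:
            self.queued.append(name)
            return "+QUEUED"
        return "+OK"

    def exec(self):
        if not self.in_multi:
            return "-ERR EXEC without MULTI"
        dirty, queued = self.dirty, self.queued
        self.in_multi, self.dirty, self.queued = False, False, []
        if dirty:
            return "-EXECABORT"       # case 1: redirect seen at queuing time
        if not self.is_primary:
            return "-REDIRECT"        # case 2: role changed before EXEC
        return ["+OK" for _ in queued]


# Case 1: the node is already a replica when the command is queued.
s1 = ToySession(is_primary=False)
case1 = [s1.multi(), s1.command("SET x y"), s1.exec()]

# Case 2: the command is queued, then a failover happens before EXEC.
s2 = ToySession(is_primary=True)
case2 = [s2.multi(), s2.command("SET x y")]
s2.is_primary = False
case2.append(s2.exec())

print(case1)  # ['+OK', '-REDIRECT', '-EXECABORT']
print(case2)  # ['+OK', '+QUEUED', '-REDIRECT']
```

In both cases the transaction is discarded, so a client is expected to redirect and resend the whole `MULTI` block rather than retry `EXEC` alone.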


2494 Commits