futriix

Author	SHA1	Message	Date
Viktor Söderqvist	e9b8970e72	Relaxed RDB version check (#1604) New config `rdb-version-check` with values: * `strict`: Reject future RDB versions. * `relaxed`: Try parsing future RDB versions and fail only when an unknown RDB opcode or type is encountered. This can make it possible for Valkey 8.1 to try to read a dump from, for example, Valkey 9.0 or later on a best-effort basis. The conditions for when this is expected to work can be defined when the future Valkey versions are released. Loading is expected to fail in the following cases: * If the data set contains any new key types or other data elements not supported by the current version. * If the RDB contains new representations or encodings of existing key types or other data elements. This change also prepares for the next RDB version bump. A range of RDB versions (12-79) is reserved, since it is expected to be used by foreign software RDB versions, so Valkey will not accept versions in this range even with the `relaxed` version check. The DUMP/RESTORE format has no magic string; only the RDB version number. This change also prepares for the magic string to change from REDIS to VALKEY next time we bump the RDB version. Related to #1108. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2025-01-27 18:44:24 +01:00
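The acceptance rule described in this commit message can be sketched as a small decision function. This is an illustration only, not the server's code: the function name, the `current_version` parameter and the string values passed as `mode` are assumptions made for the example; only the `strict`/`relaxed` semantics and the reserved range 12-79 come from the message above.

```python
# Hypothetical sketch of the version check described in the commit message.
RESERVED_FOREIGN_VERSIONS = range(12, 80)  # 12-79 inclusive, never accepted


def rdb_version_acceptable(version: int, current_version: int, mode: str = "strict") -> bool:
    """Return True if a dump with this RDB version should be attempted."""
    if mode not in ("strict", "relaxed"):
        raise ValueError(f"unknown rdb-version-check mode: {mode!r}")
    if version < 1:
        return False
    if version in RESERVED_FOREIGN_VERSIONS:
        # Reserved for foreign software; rejected even in relaxed mode.
        return False
    if version <= current_version:
        return True
    # A future version: only relaxed mode tries to parse it (best effort).
    return mode == "relaxed"
```

Even when this check passes in relaxed mode, loading can still fail later if an unknown opcode, type or encoding is encountered.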
Viktor Söderqvist	7699a3a94a	Fix use-after-free in hashtableTwoPhasePopDelete (#1626) A use-after-free has been detected by the address sanitizer, such as in this test run: https://github.com/valkey-io/valkey/actions/runs/12981530413/job/36200075972?pr=1620#step:5:1339 `hashtableShrinkIfNeeded` may free one of the hash tables and invalidate the variables used by the `fillBucketHole(ht, b, pos_in_bucket, table_index)` call just after it, causing a use-after-free. Filling the bucket hole first and shrinking afterwards is assumed to solve the issue. (Not reproduced locally.) Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-27 15:45:09 +01:00
Viktor Söderqvist	a18fcdb371	Deflake hashtable random fairness test (#1618 ) Fixes the unit test for hashtable random fairness intermittent failures when running with the `--accurate` flag. https://github.com/valkey-io/valkey/actions/runs/12969591890/job/36173815884#step:10:105 The test case picks a random element out of 400, repeated 1M times, and then checks that 60% of the elements are picked within 3 standard deviations from the number of times they're expected to be picked. In this test run (with `--accurate`), the expected number is 2500 and the standard deviation is 50, which is only 2% of the expected value. This makes the check too strict and makes the test flaky. As an alternative, we allow 80% of the elements to be picked within 10% of the expected number. With this alternative condition, we can also raise the check for the non-edge case from 60% to 80% of the elements to be within 3 standard deviations. (With fewer repetitions, 3 standard deviations is greater than 10% of the expected value, so this new condition only affects the `--accurate` test run.) Additional change: Set a random seed to the hash function in the test suite. Until now, we only seeded the random number generator. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-27 10:13:46 +01:00
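The numbers quoted in this message follow from the binomial distribution. The sketch below recomputes them and shows one way to combine the two acceptance conditions; the function names are hypothetical, and reading the two conditions as an "either/or" check is an interpretation of the message, not a copy of the unit test.

```python
import math


def pick_stats(num_elements: int, num_picks: int) -> tuple[float, float]:
    """Expected pick count per element and its standard deviation (binomial)."""
    p = 1 / num_elements
    expected = num_picks / num_elements
    std_dev = math.sqrt(expected * (1 - p))
    return expected, std_dev


def fairness_ok(counts: list[int], num_picks: int) -> bool:
    """Pass if 80% of elements are within 3 std devs OR within 10% of expected."""
    expected, std_dev = pick_stats(len(counts), num_picks)
    n = len(counts)
    within_3sd = sum(abs(c - expected) <= 3 * std_dev for c in counts) / n
    within_10pct = sum(abs(c - expected) <= 0.10 * expected for c in counts) / n
    return within_3sd >= 0.8 or within_10pct >= 0.8


expected, std_dev = pick_stats(400, 1_000_000)
print(expected, std_dev)  # expected is 2500, std_dev is just under 50
```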
zhaozhao.zz	3f21705a6c	Feature COMMANDLOG to record slow execution and large request/reply (#1294) As discussed in PR #336. We have different types of resources like CPU, memory, network, etc. The `slowlog` can only record commands that consume a lot of CPU during the processing phase (network read/write time is not included); it cannot record commands that consume a lot of memory or network bandwidth. For example: 1. A "SET key value(10 megabytes)" command would not be recorded in the slowlog, since processing it only inserts the value's pointer into the db dict. But that command consumes a lot of memory in the query buffer and a lot of network bandwidth. In this case, just 1000 tps can cause 10GB/s of network traffic. 2. A "GET key" command where the key's value is 10 megabytes long can consume a lot of memory in the output buffer and a lot of outgoing network bandwidth. This PR introduces a new command `COMMANDLOG`, to log commands that consume significant network bandwidth, including both input and output. Users can retrieve the results using `COMMANDLOG get <count> large-request` and `COMMANDLOG get <count> large-reply`. All subcommands for `COMMANDLOG` are: * `COMMANDLOG HELP` * `COMMANDLOG GET <count> <slow\|large-request\|large-reply>` * `COMMANDLOG LEN <slow\|large-request\|large-reply>` * `COMMANDLOG RESET <slow\|large-request\|large-reply>` The slowlog is also incorporated into the commandlog. For each of these three types, additional configs have been added for control: * `commandlog-request-larger-than` and `commandlog-large-request-max-len` represent the threshold for large requests (the unit is bytes) and the maximum number of commands that can be recorded. * `commandlog-reply-larger-than` and `commandlog-large-reply-max-len` represent the threshold for large replies (the unit is bytes) and the maximum number of commands that can be recorded. 
* `commandlog-execution-slower-than` and `commandlog-slow-execution-max-len` represent the threshold for slow executions(the unit is microseconds) and the maximum number of commands that can be recorded. * Additionally, `slowlog-log-slower-than` and `slowlog-max-len` are now set as aliases for these two new configs. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Ping Xie <pingxie@outlook.com>	2025-01-24 11:41:40 +08:00
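To make the three log types concrete, here is a minimal sketch of how a single command could be matched against the three thresholds. It is not the server's logic: the `Thresholds` class, the function name, the strict "greater than" comparison and the example threshold values are all assumptions made for illustration, and the sketch does not model how a log type is disabled.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Field names loosely mirror the config names; values are set by the caller.
    execution_slower_than_us: int
    request_larger_than_bytes: int
    reply_larger_than_bytes: int


def commandlog_types(duration_us: int, request_bytes: int, reply_bytes: int,
                     t: Thresholds) -> list[str]:
    """Return the commandlog types a command would qualify for in this sketch."""
    types = []
    if duration_us > t.execution_slower_than_us:
        types.append("slow")
    if request_bytes > t.request_larger_than_bytes:
        types.append("large-request")
    if reply_bytes > t.reply_larger_than_bytes:
        types.append("large-reply")
    return types


# Hypothetical thresholds: 10 ms, 1 MiB, 1 MiB.
limits = Thresholds(10_000, 1 << 20, 1 << 20)
# A fast SET with a 10 MB value only shows up as a large request.
print(commandlog_types(50, 10_000_000, 5, limits))
```

The SET example from the message is the motivating case: it is cheap to execute, so a duration-only log misses it.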
Nadav Gigi	f2510783f9	Accelerate hash table iterator with value prefetching (#1568) This PR builds upon the [previous entry prefetching optimization](https://github.com/valkey-io/valkey/pull/1501) to further enhance performance by implementing value prefetching for hashtable iterators. ## Implementation Modified `hashtableInitIterator` to accept a new flags parameter, allowing control over iterator behavior. Implemented conditional value prefetching within `hashtableNext` based on the new `HASHTABLE_ITER_PREFETCH_VALUES` flag. When the flag is set, `hashtableNext` now calls `prefetchBucketValues` at the start of each new bucket, preemptively loading the values of filled entries into the CPU cache. The actual prefetching of values is performed using type-specific callback functions implemented in `server.c`: - For `robj`, the `hashtableObjectPrefetchValue` callback is used to prefetch the value if it is not embedded. This implementation is specifically focused on main database iterations at this stage. Applying it to hashtables that hold other object types should not be problematic, but its performance benefits for those cases will need to be proven through testing and benchmarking. ## Performance ### Setup: - 64-core Graviton 3 Amazon EC2 instance. - 50 mil keys with different value sizes. - Running valkey server over RAM file system. - crc checksum and compression off. ### Action - save command. ### Results The duration of the “save” command was taken from the output of the “info all” command. 
``` +--------------------+------------------+------------------+ \| Prefetching \| Value size (byte)\| Time (seconds) \| +--------------------+------------------+------------------+ \| No \| 100 \| 20.112279 \| \| Yes \| 100 \| 12.758519 \| \| No \| 40 \| 16.945366 \| \| Yes \| 40 \| 10.902022 \| \| No \| 20 \| 9.817000 \| \| Yes \| 20 \| 9.626821 \| \| No \| 10 \| 9.71510 \| \| Yes \| 10 \| 9.510565 \| +--------------------+------------------+------------------+ ``` The results largely align with our expectations, showing significant improvements for larger values (100 bytes and 40 bytes) that are stored outside the robj. For smaller values (20 bytes and 10 bytes) that are embedded within the robj, we see almost no improvement, which is as expected. However, the small improvement observed even for these embedded values is somewhat surprising. Given that we are not actively prefetching these embedded values, this minor performance gain was not anticipated. perf record on save command without value prefetching: ``` --99.98%--rdbSaveDb \| \|--91.38%--rdbSaveKeyValuePair \| \| \| \|--42.72%--rdbSaveRawString \| \| \| \| \| \|--26.69%--rdbWriteRaw \| \| \| \| \| \| \| --25.75%--rioFileWrite.lto_priv.0 \| \| \| \| \| --15.41%--rdbSaveLen \| \| \| \| \| \|--7.58%--rdbWriteRaw \| \| \| \| \| \| \| --7.08%--rioFileWrite.lto_priv.0 \| \| \| \| \| \| \| --6.54%--_IO_fwrite \| \| \| \| \| \| \| \| --7.42%--rdbWriteRaw.constprop.1 \| \| \| \| \| --7.18%--rioFileWrite.lto_priv.0 \| \| \| \| \| --6.73%--_IO_fwrite \| \| \| \| \| \|--40.44%--rdbSaveStringObject \| \| \| --7.62%--rdbSaveObjectType \| \| \| --7.39%--rdbWriteRaw.constprop.1 \| \| \| --7.04%--rioFileWrite.lto_priv.0 \| \| \| --6.59%--_IO_fwrite \| \| --7.33%--hashtableNext.constprop.1 \| --6.28%--prefetchNextBucketEntries.lto_priv.0 ``` perf record on save command with value prefetching: ``` rdbSaveRio \| --99.93%--rdbSaveDb \| \|--79.81%--rdbSaveKeyValuePair \| \| \| \|--66.79%--rdbSaveRawString \| \| \| \| \| 
\|--42.31%--rdbWriteRaw \| \| \| \| \| \| \| --40.74%--rioFileWrite.lto_priv.0 \| \| \| \| \| --23.37%--rdbSaveLen \| \| \| \| \| \|--11.78%--rdbWriteRaw \| \| \| \| \| \| \| --11.03%--rioFileWrite.lto_priv.0 \| \| \| \| \| \| \| --10.30%--_IO_fwrite \| \| \| \| \| \| \| \| \| --10.98%--rdbWriteRaw.constprop.1 \| \| \| \| \| --10.44%--rioFileWrite.lto_priv.0 \| \| \| \| \| --9.74%--_IO_fwrite \| \| \| \| \| \| \|--11.33%--rdbSaveObjectType \| \| \| \| \| --10.96%--rdbWriteRaw.constprop.1 \| \| \| \| \| --10.51%--rioFileWrite.lto_priv.0 \| \| \| \| \| --9.75%--_IO_fwrite \| \| \| \| \| \| --0.77%--rdbSaveStringObject \| --18.39%--hashtableNext \| \|--10.04%--hashtableObjectPrefetchValue \| --6.06%--prefetchNextBucketEntries ``` Conclusions: The prefetching strategy appears to be working as intended, shifting the performance bottleneck from data access to I/O operations. The significant reduction in rdbSaveStringObject time suggests that string objects(which are the values) are being accessed more efficiently. Signed-off-by: NadavGigi <nadavgigi102@gmail.com>	2025-01-23 12:17:20 +01:00
ranshid	3032ccd48a	Change the shared format for dual channel replication logs (#1586) Change the format of the dual channel replication logs so that it does not conflict with existing log formats, such as those used by modules. Fixes: https://github.com/valkey-io/valkey/issues/1509 Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-01-20 08:04:47 +02:00
Pierre	2d0b8e3608	Update comments and log message in cluster_legacy.c (#1561 ) Update comments and log message in `cluster_legacy.c`. Follow-up from #1441. Signed-off-by: Pierre Turin <pieturin@amazon.com> Co-authored-by: Ping Xie <pingxie@outlook.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2025-01-17 15:56:52 +08:00
Pierre	c9aea6d2d3	Fix memory leak in forgotten node ping ext code path (#1574 ) When processing a cluster bus PING extension, there is a memory leak when adding a new key to the `nodes_black_list` dict. We now make sure to free the key `sds` if the dict did not take ownership of it. Signed-off-by: Pierre Turin <pieturin@amazon.com>	2025-01-16 15:38:15 -08:00
Harkrishn Patro	87cc3d7a71	Fix cluster info sent stats for message with light header (#1563 ) This issue affected only two message types (CLUSTERMSG_TYPE_PUBLISH and CLUSTERMSG_TYPE_PUBLISHSHARD) because they used a light message header, which caused the CLUSTER INFO stats to miss sent/received message information for those types. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2025-01-16 11:25:37 -08:00
Ricardo Dias	af71619c45	Extract the scripting engine code from the functions unit (#1312 ) This commit creates a new compilation unit for the scripting engine code by extracting the existing code from the functions unit. We're doing this refactor to prepare the code for running the `EVAL` command using different scripting engines. This PR has a module API change: we changed the type of error messages returned by the callback `ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc` to be a `ValkeyModuleString` (aka `robj`); This PR also fixes #1470. --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>	2025-01-16 10:08:16 +01:00
Ray Cao	921ba19acb	Incr expired_keys if the unix-time is already expired for EXPIREAT and other commands(#1517 ) Some commands that use unix-time, such as `EXPIREAT` and `SET EXAT`, should include the deleted keys in the `expired_keys` statistics if the specified time has already expired, and notifications should be sent in the manner of expired. --------- Signed-off-by: Ray Cao <zisong.cw@alibaba-inc.com>	2025-01-16 16:40:34 +08:00
Sarthak Aggarwal	6a8f068e36	Adding Missing filters to CLIENT LIST and Dedup Parsing (#1401) Adds filter options to CLIENT LIST: * USER <username> Return clients authenticated by <username>. * ADDR <ip:port> Return clients connected from the specified address. * LADDR <ip:port> Return clients connected to the specified local address. * SKIPME (YES\|NO) Exclude the current client from the list (default: no). * MAXAGE <maxage> Only list connections older than the specified age. Modifies the ID filter of CLIENT KILL to allow multiple IDs: * ID <client-id> [<client-id>...] Kill connections by client ids. This makes CLIENT LIST and CLIENT KILL accept the same options. For backward compatibility, the default value for SKIPME is NO for CLIENT LIST and YES for CLIENT KILL. MAXAGE comes from CLIENT KILL, where it keeps clients with the given max age and kills the older ones. This logic is odd for CLIENT LIST, but is kept for similarity with CLIENT KILL, for the use case of first testing manually using CLIENT LIST, and then running CLIENT KILL with the same filters. The `ID client-id [client-id ...]` filter no longer needs to be the last filter. The parsing logic determines whether an argument is an ID based on whether it can be parsed as an integer. Partly addresses: #668 --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-01-15 20:44:13 +01:00
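The rule that an argument after `ID` counts as a client ID as long as it parses as an integer can be sketched as follows. This is a simplified illustration with a hypothetical function name; it accepts only unsigned decimal digits, which is an assumption rather than the server's exact integer parsing.

```python
def parse_id_filter(args: list[str], start: int) -> tuple[list[int], int]:
    """Parse `ID <client-id> [<client-id> ...]` beginning at args[start].

    Returns the collected IDs and the index of the first argument that is
    not an ID, so parsing of other filters can continue from there.
    """
    if args[start].upper() != "ID":
        raise ValueError("expected ID keyword")
    ids = []
    i = start + 1
    # Keep consuming arguments while they look like integers.
    while i < len(args) and args[i].isascii() and args[i].isdigit():
        ids.append(int(args[i]))
        i += 1
    if not ids:
        raise ValueError("ID requires at least one client id")
    return ids, i


ids, next_index = parse_id_filter(["ID", "1", "2", "3", "SKIPME", "NO"], 0)
print(ids, next_index)  # [1, 2, 3] 4
```

Because the scan stops at the first non-integer argument, the ID filter can appear anywhere in the filter list, not only at the end.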
zhaozhao.zz	c5a1585547	add paused_actions for INFO Clients (#1519 ) Add `paused_actions` and `paused_timeout_milliseconds` for INFO Clients to inform users about if clients are paused. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2025-01-14 19:01:00 +08:00
Viktor Söderqvist	2a1a65b4c7	Introduce const_sds for const-content sds (#1553) `sds` is a typedef of `char *`. `const sds` means `char * const`, i.e. a const-pointer to non-const content. More often, you would want `const char *`, i.e. a pointer to const-content. Until now, it was not possible to express that. This PR adds `const_sds`, which is a pointer to const-content sds. To get a const-pointer to const-content sds, you can use `const const_sds`. In this PR, some uses of `const sds` are replaced by `const_sds`. We can use it more later. Fixes #1542 --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-14 10:38:12 +01:00
Rain Valentine	d13aad45f4	Replace dict with new hashtable: hash datatype (#1502 ) This PR replaces dict with the new hashtable data structure in the HASH datatype. There is a new struct for hashtable items which contains a pointer to value sds string and the embedded key sds string. These values were previously stored in dictEntry. This structure is kept opaque so we can easily add small value embedding or other optimizations in the future. closes #1095 --------- Signed-off-by: Rain Valentine <rsg000@gmail.com>	2025-01-13 11:17:16 +01:00
Binbin	11cb8ee27c	Add latency stats around cluster config file operations (#1534 ) When the cluster changes, we need to persist the cluster configuration, and these file IO operations may cause latency. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 11:03:10 +08:00
Binbin	10357ceda5	Mark the node as FAIL when the node is marked as NOADDR and broadcast the FAIL (#1191) Imagine we have a cluster, for example a three-shard cluster. If shard 1 does a CLUSTER RESET HARD, its node name changes, and the other nodes then mark it as NOADDR since the node name received in PONG has changed. In the eyes of the other nodes, there is one working primary node left but with no address, and in this case, the address reported in MOVED will be invalid and will confuse the clients. At the same time, the replica will not fail over since its primary is not in the FAIL state, and the cluster looks OK to everyone. This leaves a cluster that appears OK but has no coverage for shard 1. Obviously, we should do something like CLUSTER FORGET to remove the node and fix the cluster before using it. But the point here is that we can mark the NOADDR node as FAIL to advance the cluster state. If a node is NOADDR, it does not have a valid address, so we won't reconnect to it, we won't send it PING, and we won't gossip about it, so it seems reasonable to mark it as FAIL. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 11:02:05 +08:00
Binbin	211b250aad	Do election in order based on failed primary rank to avoid voting conflicts (#1018) When multiple primary nodes fail simultaneously, the cluster cannot recover within the default effective time (data_age limit). The main reason is that voting takes place without any ranking among the replicas of different primaries, which causes too many epoch conflicts. Therefore, we introduce a ranking based on the failed primary shard-id. A new failed_primary_rank var is introduced, which is the rank of the myself instance among all failed primaries. This var is used in failover: the failover election packets are sent in order based on the rank, which effectively avoids voting conflicts. If a single primary is down, the behavior is the same as before. If multiple primaries are down, their replica election initiation time will be delayed by 500ms according to the ranking. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 10:43:18 +08:00
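A rough sketch of the ranking idea, under stated assumptions: the rank of a replica's failed primary is taken to be the number of failed primaries whose shard-id sorts before it, and each rank step adds 500 ms to the election start. The function names and the exact ordering rule are hypothetical; only the 500 ms figure and the use of the shard-id come from the message.

```python
ELECTION_RANK_DELAY_MS = 500  # delay per rank step, from the commit message


def failed_primary_rank(my_primary_shard_id: str, failed_shard_ids: list[str]) -> int:
    """Number of failed primaries whose shard-id sorts before ours (assumed rule)."""
    return sum(1 for shard_id in failed_shard_ids if shard_id < my_primary_shard_id)


def extra_election_delay_ms(my_primary_shard_id: str, failed_shard_ids: list[str]) -> int:
    """Additional delay before this replica starts its election."""
    return failed_primary_rank(my_primary_shard_id, failed_shard_ids) * ELECTION_RANK_DELAY_MS


failed = ["shard-a", "shard-b", "shard-c"]
for shard in failed:
    print(shard, extra_election_delay_ms(shard, failed))
```

With a single failed primary the rank is 0 and no extra delay is added, which matches the statement that the single-failure behavior is unchanged.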
Binbin	d6bdd9e7d7	Fix module LatencyAddSample still work when latency-monitor-threshold is 0 (#1541) When latency-monitor-threshold is set to 0, the latency monitor is disabled. In VM_LatencyAddSample, the condition was written incorrectly, causing latency to be recorded even when the latency monitor was turned off. This bug has been present since the very first version of the code, see e3b1d6d, which was merged in 2019. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 10:32:58 +08:00
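The intended condition can be written as a one-line predicate. This is a sketch with a hypothetical function name, assuming that a sample is recorded when it reaches the threshold and that a threshold of 0 disables recording entirely.

```python
def should_record_latency(sample_ms: int, threshold_ms: int) -> bool:
    """Record only when monitoring is enabled (threshold > 0) and the sample reaches it."""
    return threshold_ms > 0 and sample_ms >= threshold_ms


print(should_record_latency(150, 100))  # True
print(should_record_latency(150, 0))    # False: monitor disabled
```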
Binbin	e60990e579	Fix crash when freeing newly created node when nodeIp2String fail (#1535) In #1441, we found an assert and decided to remove it and instead just free the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed. ``` === VALKEY BUG REPORT START: Cut & paste starting from here === 17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED === 17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true ------ STACK TRACE ------ 17847 valkey-server * src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634] src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de] /__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e] src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea] src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547] /lib64/libc.so.6(+0x40c8) [0x7f083985a0c8] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b] src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5] ``` But that change introduced another assert failure, because the new node has not been added to the cluster nodes dict when it is freed. 
``` 17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED === 17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true ------ STACK TRACE ------ 17128 valkey-server * src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4] src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2] src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618] /__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278] src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09] src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5] src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e] ``` This closes #1527. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-10 10:19:04 +08:00
Madelyn Olson	d99457c09c	Free the passed in lua context instead of the global (#1536 ) The fix that Redis gave us for the CVE-2024-46981 was freeing lctx.lua, and I didn't merge it correctly. We made some changes so that we are able to async free the lua context, so we need to free the passed in context. This was applied correctly on the two released versions (8.0 and 7.2) just not on unstable. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2025-01-09 14:35:48 +08:00
Karthick Ariyaratnam	80c35402bc	Remove legacy SERVER_TEST compiler flag from cmake. (#1530 ) This PR is to cleanup the `SERVER_TEST` compiler flag from cmake compile definitions, as it is no longer required in the new unit test framework, see #428. Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>	2025-01-09 11:52:45 +08:00
Nadav Gigi	9f4815a224	Accelerate hash table iterator with prefetching (#1501) This PR introduces improvements to the hashtable iterator, implementing the prefetching technique described in the blog post [Unlock One Million RPS - Part 2](https://valkey.io/blog/unlock-one-million-rps-part2/). The changes lay the groundwork for further enhancements in use cases involving iterators. Future PRs will build upon this foundation to improve performance and functionality in various iterator-dependent operations. In the pursuit of maximizing iterator performance, I conducted a comprehensive series of experiments. My tests encompassed a wide range of approaches, including processing multiple bucket indices in parallel, prefetching the next bucket upon completion of the current one, and several other timing and quantity variations. Surprisingly, after rigorous testing and performance analysis, the simplest implementation presented in this PR consistently outperformed all other more complex strategies. ## Implementation Each time we start iterating over a bucket, we prefetch data for future iterations: - We prefetch the entries of the next bucket (if it exists). - We prefetch the structure (but not the entries) of the bucket after the next. This prefetching is done when we pick up a new bucket, increasing the chance that the data will be in cache by the time we need it. ## Performance The data below was taken by running the keys command on a 64-core Graviton 3 Amazon EC2 instance with 50 mil keys of 100 bytes each. The duration of the “keys *” command was taken from the output of the “info all” command. 
``` +--------------------+------------------+-----------------------------+ \| prefetching \| Time (seconds) \| Keys Processed per Second \| +--------------------+------------------+-----------------------------+ \| No \| 11.112279 \| 4,499,529 \| \| Yes \| 3.141916 \| 15,913,862 \| +--------------------+------------------+-----------------------------+ Improvement: Comparing the iterator without prefetching to the one with prefetching, we can see a speed improvement of 11.112279 / 3.141916 ≈ 3.54 times faster. ``` ### Save command improvement #### Setup: - 64-core Graviton 3 Amazon EC2 instance. - 50 mil keys of 100 bytes each. - Running valkey server over RAM file system. - crc checksum and compression off. #### Results ``` +--------------------+------------------+-----------------------------+ \| prefetching \| Time (seconds) \| Keys Processed per Second \| +--------------------+------------------+-----------------------------+ \| No \| 28 \| 1,785,700 \| \| Yes \| 19.6 \| 2,550,000 \| +--------------------+------------------+-----------------------------+ Improvement: - Reduced SAVE time by 30% (8.4 seconds faster) - Increased key processing rate by 42.8% (764,300 more keys/second) ``` Signed-off-by: NadavGigi <nadavgigi102@gmail.com>	2025-01-08 23:18:55 +01:00
Nikhil Manglore	9e0204941d	valkey-cli auto-exit from subscribed mode (#1432 ) Resolves issue with valkey-cli not auto exiting from subscribed mode on reaching zero pub/sub subscription (previously filed on Redis) https://github.com/redis/redis/issues/12592 --------- Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>	2025-01-08 21:03:06 +01:00
Rain Valentine	ab627d6721	Replace dict with new hashtable: sorted set datatype (#1427 ) This PR replaces dict with hashtable in the ZSET datatype. Instead of mapping key to score as dict did, the hashtable maps key to a node in the skiplist, which contains the score. This takes advantage of hashtable performance improvements and saves 15 bytes per set item - 24 bytes overhead before, 9 bytes after. Closes #1096 --------- Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-08 18:34:02 +01:00
uriyage	6c09eea2bc	client struct: lazy init components and optimize struct layout (#1405 ) # Refactor client structure to use modular data components ## Current State The client structure allocates memory for replication / pubsub / multi-keys / module / blocked data for every client, despite these features being used by only a small subset of clients. In addition the current field layout in the client struct is suboptimal, with poor alignment and unnecessary padding between fields, leading to a larger than necessary memory footprint of 896 bytes per client. Furthermore, fields that are frequently accessed together during operations are scattered throughout the struct, resulting in poor cache locality. ## This PR's Change 1. Lazy Initialization - Components are only allocated when first used: - PubSubData: Created on first SUBSCRIBE/PUBLISH operation - ReplicationData: Initialized only for replica connections - ModuleData: Allocated when module interaction begins - BlockingState: Created when first blocking command is issued - MultiState: Initialized on MULTI command 2. Memory Layout Optimization: - Grouped related fields for better locality - Moved rarely accessed fields (e.g., client->name) to struct end - Optimized field alignment to eliminate padding 3. Additional changes: - Moved watched_keys to be static allocated in the `mstate` struct - Relocated replication init logic to replication.c ### Key Benefits - Efficient Memory Usage: - 45% smaller base client structure - Basic clients now use 528 bytes (down from 896). - Better memory locality for related operations - Performance improvement in high throughput scenarios. No performance regressions in other cases. ### Performance Impact Tested with 650 clients and 512 bytes values. 
#### Single Thread Performance \| Operation \| Dataset \| New (ops/sec) \| Old (ops/sec) \| Change % \| \|------------\|---------\|---------------\|---------------\|-----------\| \| SET \| 1 key \| 261,799 \| 258,261 \| +1.37% \| \| SET \| 3M keys \| 209,134 \| ~209,000 \| ~0% \| \| GET \| 1 key \| 281,564 \| 277,965 \| +1.29% \| \| GET \| 3M keys \| 231,158 \| 228,410 \| +1.20% \| #### 8 IO Threads Performance \| Operation \| Dataset \| New (ops/sec) \| Old (ops/sec) \| Change % \| \|------------\|---------\|---------------\|---------------\|-----------\| \| SET \| 1 key \| 1,331,578 \| 1,331,626 \| -0.00% \| \| SET \| 3M keys \| 1,254,441 \| 1,152,645 \| +8.83% \| \| GET \| 1 key \| 1,293,149 \| 1,289,503 \| +0.28% \| \| GET \| 3M keys \| 1,152,898 \| 1,101,791 \| +4.64% \| #### Pipeline Performance (3M keys) \| Operation \| Pipeline Size \| New (ops/sec) \| Old (ops/sec) \| Change % \| \|-----------\|--------------\|---------------\|---------------\|-----------\| \| SET \| 10 \| 548,964 \| 538,498 \| +1.94% \| \| SET \| 20 \| 606,148 \| 594,872 \| +1.89% \| \| SET \| 30 \| 631,122 \| 616,606 \| +2.35% \| \| GET \| 10 \| 628,482 \| 624,166 \| +0.69% \| \| GET \| 20 \| 687,371 \| 681,659 \| +0.84% \| \| GET \| 30 \| 725,855 \| 721,102 \| +0.66% \| ### Observations: 1. Single-threaded operations show consistent improvements (1-1.4%) 2. Multi-threaded performance shows significant gains for large datasets: - SET with 3M keys: +8.83% improvement - GET with 3M keys: +4.64% improvement 3. Pipeline operations show consistent improvements: - SET operations: +1.89% to +2.35% - GET operations: +0.66% to +0.84% 4. No performance regressions observed in any test scenario Related issue:https://github.com/valkey-io/valkey/issues/761 --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-08 10:28:54 +02:00
Rueian	dc4628d444	Add `availability_zone` to the HELLO command history (#1524 ) This PR is a followup for #1487. Signed-off-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2025-01-08 10:04:58 +08:00
Rueian	3b52186b6a	Add `availability_zone` to the HELLO response (#1487 ) It's inconvenient for client implementations to extract the `availability_zone` information from the `INFO` response. The `INFO` response contains a lot of information that a client implementation typically doesn't need. This PR adds the availability zone to the `HELLO` response. Clients usually already use the `HELLO` command for protocol negotiation and also get the server `version` and `role` from its response. To keep the `HELLO` response small, the field is only added if availability zone is configured. --------- Signed-off-by: Rueian <rueiancsie@gmail.com>	2025-01-07 22:54:55 +01:00
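The "only added if configured" behavior can be sketched as follows. This shows a small subset of the fields a `HELLO` response contains, with a hypothetical function name and hypothetical example values; it only illustrates the conditional inclusion described above.

```python
def hello_fields(version: str, role: str, availability_zone: str = "") -> dict[str, str]:
    """Build a partial HELLO-style response map (illustrative subset of fields)."""
    response = {"version": version, "role": role}
    if availability_zone:
        # The field is omitted when no availability zone is configured,
        # which keeps the response small.
        response["availability_zone"] = availability_zone
    return response


print(hello_fields("8.1.0", "master"))
print(hello_fields("8.1.0", "master", "us-east-1a"))
```

A client can therefore treat a missing `availability_zone` key as "not configured" rather than as an error.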
Madelyn Olson	4ffd3ebdeb	Fix LUA garbage collector (CVE-2024-46981) (#1513) Reset the GC state before closing the lua VM to prevent user data from being wrongly freed while it might still be used in destructor callbacks. Created and published by Redis in their OSS branch. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>	2025-01-06 14:02:22 -08:00
Madelyn Olson	7977c55ac9	Fix Read/Write key pattern selector (CVE-2024-51741) (#1514) The explanation on the original commit was wrong. Key-based access must have a `~` in order to correctly configure which key prefixes to apply the selector to. If this is missing, a server assert will be triggered later. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>	2025-01-06 14:02:16 -08:00
Binbin	c0014ef15e	Check whether to switch to fail when setting the node to pfail in cron (#1061 ) This may speed up the transition to the fail state a bit. Previously we would only check when we received a pfail/fail report from others in gossip. If myself is the last vote, we can directly switch to fail in here without waiting for the next gossip packet. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-06 09:26:17 +08:00
Binbin	33b824137e	Explicitly check C_ERR condition to improve readability in clusterSaveConfig (#1505 ) It's not obvious to see it at first, modify it to use C_ERR. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-04 10:47:32 +08:00
eifrah-aws	b3b4bdcda4	CMake: fail on warnings (#1503 ) When building with `CMake` (especially the targets `valkey-cli`, `valkey-server` and `valkey-benchmark`) it is possible to have a successful build while having warnings. This PR fixes this - which is aligned with how the `Makefile` is working today: - Enable `-Wall` + `-Werror` for valkey targets - Fixed warning in valkey-cli:jsonStringOutput method Signed-off-by: Eran Ifrah <eifrah@amazon.com>	2025-01-03 09:44:41 +08:00
gmbnomis	26a72fa89c	Use the correct command proc for the LOOKUP_NOTOUCH exception in lookupKey (#1499 ) When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid updating the last access time in `lookupKey`. An exception must be made for the `TOUCH` command which must always update the key. When called from a script, `server.executing_client` will point to the `TOUCH` command, while `server.current_client` will point to e.g. an `EVAL` command. So, we must use the former to find out the currently executing command if defined. This fix addresses the issue where TOUCH wasn't updating key access times when called from scripts like EVAL. Fixes #1498 Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2025-01-03 09:41:15 +08:00
Ricardo Dias	8d764f27b3	Refactor: move all valkey modules related declarations to `module.h` (#1489 ) In this commit we move all structures and functions declarations related to Valkey modules from `server.h` to the recently added `module.h` file. This re-organization makes it easier for new contributors to find the valkey modules related code, as well as reducing the compilation times when changes are made to the modules code. --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>	2025-01-02 18:35:10 +01:00
uriyage	35abb68b79	Offload reading the replication stream to IO threads (#1449 ) Support Primary client IO offload. Related issue: https://github.com/valkey-io/valkey/issues/761 --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2025-01-02 10:42:39 +01:00
uriyage	ae70c5459b	replication: fix io-threads possible race by moving waitForClientIO (#1422 ) ### Fix race with pending writes in replica state transition #### The Problem In #60 (Dual channel replication) a new `connWrite` call was added before the `waitForClientIO` check. This created a race condition where the main thread may attempt to write to a client that could have pending writes in IO threads. #### The Fix Moved the `waitForClientIO()` call earlier in `syncCommand`, before any `connWrite` call. This ensures all pending IO operations are completed before attempting to write to the client. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2025-01-02 10:01:55 +02:00
ranshid	0f273bb648	Align rejected unblocked commands to update the correct error statistic (#577 ) Currently, when a blocked command is unblocked externally (e.g. because the relevant slot is being migrated or the CLIENT UNBLOCK command was issued), the command statistics will always update the failed_calls error statistic. This leads to misalignment with `90b9f08e5d` as well as some inconsistencies. For example, when a key is migrated during cluster slot migration, clients blocked on XREADGROUP will be unblocked and update the rejected_calls stat, while clients blocked on BLPOP will get unblocked updating the failed_calls stat. In this PR we add an explicit indication in updateStatsOnUnblock that indicates whether the command was rejected or failed. --------- Signed-off-by: ranshid <ranshid@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-01-01 16:33:09 +02:00
zhenwei pi	a136ad9a50	Make global configs as static (#1159 ) Don't expose static configs symbol, and make configEnumGetValue as static function. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>	2024-12-30 15:58:06 -05:00
Pierre	e4179f1f3b	Only (re-)send MEET packet once every handshake timeout period (#1441 ) Add `meet_sent` field in `clusterNode` indicating the last time we sent a MEET packet. Use this field to only (re-)send a MEET packet once every handshake timeout period when detecting a node without an inbound link. When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets. Improve some logging messages to include `human_nodename`. Add `nodeExceedsHandshakeTimeout()` function. This is a follow-up to this previous PR: https://github.com/valkey-io/valkey/pull/1307 And a partial fix to the crash described in: https://github.com/valkey-io/valkey/pull/1436 --------- Signed-off-by: Pierre Turin <pieturin@amazon.com>	2024-12-30 15:56:39 -05:00
Madelyn Olson	e470735d91	Immediately restart the defrag cycle if we still need to defrag (#1492 )	2024-12-29 08:22:49 -08:00
gmbnomis	8b40341295	Fix JSON description of SET command (#1473 ) In the `arguments` section, the `arguments` key is only used for arguments of type `block` or `oneof`. Consequently, the `arguments` given for `IFEQ` are ignored by the server. However, they lead to strange results when rendering the command's page for the web documentation. Fix this by removing `arguments` for `IFEQ`. Signed-off-by: Simon Baatz <gmbnomis@gmail.com>	2024-12-27 00:55:20 +01:00
uriyage	bb325bde35	Fix restore replica output bytes stat update (#1486 ) This PR fixes the missing stat update for `total_net_repl_output_bytes` that was removed during the refactoring in PR #758. The metric was not being updated when writing to replica connections. Changes: - Restored the stat update in postWriteToClient for replica connections - Added integration test to verify the metric is properly updated Signed-off-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-12-25 10:58:49 +08:00
Binbin	da92c1d6c8	Document all command flags near serverCommand (#1474 ) These flags are not documented here. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-12-25 10:57:42 +08:00
Amit Nagler	9f4503ca50	Add scoped RDB loading context and immediate abort flag (#1173 ) This PR introduces a new mechanism for temporarily changing the server's loading_rio context during RDB loading operations. The new `RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the `server.loading_rio` value, ensuring that it's automatically restored to its original value when the scope ends. Introduces a dedicated flag to `rio` to signal immediate abort, preventing potential use-after-free scenarios during replication disconnection in dual-channel load. This ensures proper termination of `rdbLoadRioWithLoadingCtx` when replication is cancelled due to connection loss on main connection. Fixes https://github.com/valkey-io/valkey/issues/1152 --------- Signed-off-by: naglera <anagler123@gmail.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>	2024-12-24 08:14:32 +02:00
Madelyn Olson	2ee06e7983	Remove readability refactor for failover auth to fix clang warning (#1481 ) As part of #1463, I made a small refactor between the PR and the daily test I submitted to try to improve readability by adding a function to abstract the extraction of the message types. However, that change apparently caused GCC to throw another warning, so reverting the abstraction on just one line. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-24 13:07:15 +08:00
Ricardo Dias	6adef8e2f9	Adds support for scripting engines as Valkey modules (#1277 ) This PR extends the module API to support the addition of different scripting engines to execute user defined functions. The scripting engine can be implemented as a Valkey module, and can be dynamically loaded with the `loadmodule` config directive, or with the `MODULE LOAD` command. This PR also adds an example of a dummy scripting engine module, to show how to use the new module API. The dummy module is implemented in `tests/modules/helloscripting.c`. The current module API support only allows loading scripting engines to run functions using the `FCALL` command. The additions to the module API are the following: ```c /* This struct represents a scripting engine function that results from the * compilation of a script by the engine implementation. */ struct ValkeyModuleScriptingEngineCompiledFunction typedef ValkeyModuleScriptingEngineCompiledFunction **(*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)( ValkeyModuleScriptingEngineCtx *engine_ctx, const char *code, size_t timeout, size_t *out_num_compiled_functions, char **err); typedef void (*ValkeyModuleScriptingEngineCallFunctionFunc)( ValkeyModuleCtx *module_ctx, ValkeyModuleScriptingEngineCtx *engine_ctx, ValkeyModuleScriptingEngineFunctionCtx *func_ctx, void *compiled_function, ValkeyModuleString **keys, size_t nkeys, ValkeyModuleString **args, size_t nargs); typedef size_t (*ValkeyModuleScriptingEngineGetUsedMemoryFunc)( ValkeyModuleScriptingEngineCtx *engine_ctx); typedef size_t (*ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc)( void *compiled_function); typedef size_t (*ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc)( ValkeyModuleScriptingEngineCtx *engine_ctx); typedef void (*ValkeyModuleScriptingEngineFreeFunctionFunc)( ValkeyModuleScriptingEngineCtx *engine_ctx, void *compiled_function); /* This struct stores the callback functions implemented by the scripting * engine to provide the functionality for the `FUNCTION *` commands. */ typedef struct ValkeyModuleScriptingEngineMethodsV1 { uint64_t version; /* Version of this structure for ABI compat. */ /* Library create function callback. When a new script is loaded, this * callback will be called with the script code, and returns a list of * ValkeyModuleScriptingEngineCompiledFunc objects. */ ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc create_functions_library; /* The callback function called when `FCALL` command is called on a function * registered in this engine. */ ValkeyModuleScriptingEngineCallFunctionFunc call_function; /* Function callback to get current used memory by the engine. */ ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory; /* Function callback to return memory overhead for a given function. */ ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead; /* Function callback to return memory overhead of the engine. */ ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead; /* Function callback to free the memory of a registered engine function. */ ValkeyModuleScriptingEngineFreeFunctionFunc free_function; } ValkeyModuleScriptingEngineMethodsV1; /* Registers a new scripting engine in the server. * * - `engine_name`: the name of the scripting engine. This name will match * against the engine name specified in the script header using a shebang. * * - `engine_ctx`: engine specific context pointer. * * - `engine_methods`: the struct with the scripting engine callback functions * pointers. */ int ValkeyModule_RegisterScriptingEngine(ValkeyModuleCtx *ctx, const char *engine_name, void *engine_ctx, ValkeyModuleScriptingEngineMethods *engine_methods); /* Removes the scripting engine from the server. * * `engine_name` is the name of the scripting engine. * */ int ValkeyModule_UnregisterScriptingEngine(ValkeyModuleCtx *ctx, const char *engine_name); ``` --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>	2024-12-21 23:09:35 +01:00
Madelyn Olson	1c97317518	Resolve bounds checks on cluster_legacy.c (#1463 ) We are getting a number of errors like: ``` array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’ ``` Which is basically GCC telling us that we have an object which is longer than the underlying storage of the allocation. We actually do this a lot, but GCC is generally not aware of how big the underlying allocation is, so it doesn't throw this error. We are specifically getting this error because the msgBlock can be of variable length depending on the type of message, but GCC assumes it's the longest one possible. The solution I went with here was make the message type optional, so that it wasn't included in the size. I think this also makes some sense, since it's really just a helper for us to easily cast the object around. I considered disabling this error, but it is generally pretty useful since it can catch real issues. Another solution would be to over-allocate to the largest possible object, which could hurt performance as we initialize it to zero. Results: https://github.com/madolson/valkey/actions/runs/12423414811/job/34686899884 This is a slightly cleaned up version of https://github.com/valkey-io/valkey/pull/1439. I thought I had another strategy but alas, it didn't work out. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-20 12:10:48 -08:00
Binbin	ca0b0c662a	Clear outdated failure reports more accurately (#1184 ) There are two changes here: 1. The one in clusterNodeCleanupFailureReports: only a primary with slots can submit a failure report, so if the primary became a replica its failure report should be cleared. This may lead to inaccurate node fail judgment in some network partition cases, I guess; it will also affect the CLUSTER COUNT-FAILURE-REPORTS command. 2. The one in clusterProcessGossipSection: it is not that important, but it prints a "node is back online" log that helps us troubleshoot the problem, although it may conflict with 1 at some points. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-12-20 10:14:01 +08:00
Jungwoo Song	e9a1fe0b32	Support for reading from replicas in valkey-benchmark (#1392 ) Background When conducting performance tests using `valkey-benchmark`, reading from replicas was not supported. Consequently, even in cluster mode, all reads were directed to the primary nodes. This limitation made it challenging to obtain accurate metrics during workload stress testing for performance measurement or before a version upgrade. Related issue : https://github.com/valkey-io/valkey/issues/900 Changes 1. Replaced the use of `CLUSTER NODES` with `CLUSTER SLOTS` when fetching cluster configuration. This allows for easier identification of replica slots. 2. Support for reading from replicas by executing the client in `READONLY` mode. 3. Support reading from replicas even during slot migrations. 4. Introduced two CLI options `--rfr` to enable reading from replicas only or all cluster nodes. A warning added to indicate that write requests might not be handled correctly when using this option. --------- Signed-off-by: bluayer <ijacsong98@gmail.com> Signed-off-by: bluayer <bluayer@gmail.com> Signed-off-by: Jungwoo Song <37579681+bluayer@users.noreply.github.com> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>	2024-12-19 18:32:31 +02:00
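
The MEET throttling described in commit e4179f1f3b (record when the last MEET was sent, and only re-send once per handshake timeout period while the node has no inbound link) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `sketchNode`, `has_inbound_link`, and `sketchShouldSendMeet` are hypothetical names, and only the `meet_sent` field name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the real clusterNode. */
typedef struct {
    int64_t meet_sent;     /* Time (ms) the last MEET was sent, 0 if never. */
    bool has_inbound_link; /* True once the peer has connected back to us. */
} sketchNode;

/* Returns true if a MEET should be (re-)sent now and records the send time.
 * A MEET is sent at most once per handshake timeout period, and only while
 * the node still lacks an inbound link. */
bool sketchShouldSendMeet(sketchNode *node, int64_t now_ms, int64_t handshake_timeout_ms) {
    assert(handshake_timeout_ms > 0);
    if (node->has_inbound_link) return false;
    if (node->meet_sent != 0 && now_ms - node->meet_sent < handshake_timeout_ms) {
        return false; /* Still inside the current timeout period. */
    }
    node->meet_sent = now_ms;
    return true;
}
```

Keeping the timestamp on the node means the check is a single subtraction on each cron tick, with no separate timer to manage.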

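The scoped loading context from commit 9f4503ca50 (temporarily change `server.loading_rio` and restore it automatically when the scope ends) can be illustrated with the GCC/Clang `cleanup` variable attribute. Whether `RDB_SCOPED_LOADING_RIO` is implemented this way is an assumption; every name below (`sketchRio`, `sketch_loading_rio`, `SKETCH_SCOPED_LOADING_RIO`, `sketchLoadWith`) is hypothetical and stands in for the real server state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rio and server.loading_rio. */
typedef struct {
    int id;
} sketchRio;

sketchRio *sketch_loading_rio = NULL;

/* Cleanup handler: receives a pointer to the saved variable and restores
 * the global when that variable goes out of scope. */
void sketchRestoreLoadingRio(sketchRio **saved) {
    sketch_loading_rio = *saved;
}

/* Saves the current global, installs new_rio, and arranges for the saved
 * value to be restored at scope exit (GCC/Clang extension). */
#define SKETCH_SCOPED_LOADING_RIO(new_rio)                                  \
    sketchRio *sketch_saved_rio_                                            \
        __attribute__((cleanup(sketchRestoreLoadingRio))) = sketch_loading_rio; \
    sketch_loading_rio = (new_rio)

/* Example user: the global points at rio only for the duration of the call. */
int sketchLoadWith(sketchRio *rio) {
    assert(rio != NULL);
    SKETCH_SCOPED_LOADING_RIO(rio);
    return sketch_loading_rio->id; /* Evaluated before the cleanup runs. */
}
```

Because the restore is tied to scope exit, early returns on error paths do not need their own restore call.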

9391 Commits