futriix

Author	SHA1	Message	Date
Madelyn Olson	d3aabd7f13	Hex encode the data in dump test (#1637 ) Addresses the failure here: https://github.com/valkey-io/valkey/actions/runs/13000845302/job/36259016156#step:5:7272. This change does three things: 1. For some reason TCL 8.5 (which is used on macos) is handling `\x03ba` as `\0xba`, according to https://www.tcl-lang.org/man/tcl8.5/TclCmd/Tcl.htm#M27 so we encode "bar" using hex escapes too. 2. Fix a spacing issue. 3. Make it so that if the restore fails, it immediately errors. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-29 16:20:38 -05:00
Madelyn Olson	f695c52acb	Fix timing issue in pause test (#1631 )	2025-01-28 06:35:24 -08:00
ranshid	230efa4fbf	deflake tracking-redir-broken test (#1628 ) This address 2 issues: 1. It is possible (somehow) that the inner server client (r) was not working resp 3 when entering this test. this makes sure it does. 2. in case the test failed it might leave the redirection client closed. there is a cross test assumption it should be open, so moved most of the assert checks to the end of the test. example fail: https://github.com/valkey-io/valkey/actions/runs/12979601179/job/36195523412 --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-01-28 12:35:32 +02:00
Viktor Söderqvist	e9b8970e72	Relaxed RDB version check (#1604 ) New config `rdb-version-check` with values: * `strict`: Reject future RDB versions. * `relaxed`: Try parsing future RDB versions and fail only when an unknown RDB opcode or type is encountered. This can make it possible for Valkey 8.1 to try read a dump from for example Valkey 9.0 or later on a best-effort basis. The conditions for when this is expected to work can be defined when the future Valkey versions are released. Loading is expected to fail in the following cases: * If the data set contains any new key types or other data elements not supported by the current version. * If the RDB contains new representations or encodings of existing key types or other data elements. This change also prepares for the next RDB version bump. A range of RDB versions (12-79) is reserved, since it's expected to be used by foreign software RDB versions, so Valkey will not accept versions in this range even with the `relaxed` version check. The DUMP/RESTORE format has no magic string; only the RDB version number. This change also prepares for the magic string to change from REDIS to VALKEY next time we bump the RDB version. Related to #1108. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2025-01-27 18:44:24 +01:00
Viktor Söderqvist	66577573f2	Test coverage for COMMANDLOG HELP (#1617 ) Fixes reply-schema-validator test job which needs coverage for all commands. Failing job: https://github.com/valkey-io/valkey/actions/runs/12969591890/job/36173810824 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-27 04:38:54 +01:00
zhaozhao.zz	3f21705a6c	Feature COMMANDLOG to record slow execution and large request/reply (#1294 ) As discussed in PR #336. We have different types of resources like CPU, memory, network, etc. The `slowlog` can only record commands eat lots of CPU during the processing phase (doesn't include read/write network time), but can not record commands eat too many memory and network. For example: 1. run "SET key value(10 megabytes)" command would not be recored in slowlog, since when processing it the SET command only insert the value's pointer into db dict. But that command eats huge memory in query buffer and bandwidth from network. In this case, just 1000 tps can cause 10GB/s network flow. 2. run "GET key" command and the key's value length is 10 megabytes. The get command can eat huge memory in output buffer and bandwidth to network. This PR introduces a new command `COMMANDLOG`, to log commands that consume significant network bandwidth, including both input and output. Users can retrieve the results using `COMMANDLOG get <count> large-request` and `COMMANDLOG get <count> large-reply`, all subcommands for `COMMANDLOG` are: * `COMMANDLOG HELP` * `COMMANDLOG GET <count> <slow\|large-request\|large-reply>` * `COMMANDLOG LEN <slow\|large-request\|large-reply>` * `COMMANDLOG RESET <slow\|large-request\|large-reply>` And the slowlog is also incorporated into the commandlog. For each of these three types, additional configs have been added for control: * `commandlog-request-larger-than` and `commandlog-large-request-max-len` represent the threshold for large requests(the unit is Bytes) and the maximum number of commands that can be recorded. * `commandlog-reply-larger-than` and `commandlog-large-reply-max-len` represent the threshold for large replies(the unit is Bytes) and the maximum number of commands that can be recorded. * `commandlog-execution-slower-than` and `commandlog-slow-execution-max-len` represent the threshold for slow executions(the unit is microseconds) and the maximum number of commands that can be recorded. * Additionally, `slowlog-log-slower-than` and `slowlog-max-len` are now set as aliases for these two new configs. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Ping Xie <pingxie@outlook.com>	2025-01-24 11:41:40 +08:00
ranshid	7fc958da52	fix test Protocol desync regression test with TLS (#1593 ) remove socket nonblocking and simplify the validation fixes https://github.com/valkey-io/valkey/issues/1592 Signed-off-by: ranshid <ranshid@amazon.com>	2025-01-21 08:57:01 +02:00
ranshid	dd92d079dc	Fix Protocol desync regression test (#1590 ) The desync regression test was created as a regression test for the following bug: in case we embed NULL termination inside inline/multi-bulk message we will not be able to perform strchr in order to identify the newline(\n)/carriage-return(\r) in the client query buffer. this can influence (for example) replica reading primary stream and keep filling it's query buffer endlessly consuming more and more memory. In order to handle the above risk, a check was added to verify the inline bulk and multi-bulk size are not exceeding the 64K bytes in the query-buffer. A test was placed in order to verify this. This PR introduce the following fixes to the desync regression test: 1. fix the sent payload to flush 1024 bytes block of 'A's instead of 'payload' which was sent by mistake. 2. Make sure that the connection is correctly terminated on protocol error by the server after exceeding the 64K and not over 64K. 3. add another test intrinsic which will also verify the nested bulk with embedded null termination (was not verified before) fixes https://github.com/valkey-io/valkey/issues/1583 NOTE: Although it is possible to change the use of strchr to a more "safe" utility (eg memchr) which will not pause scan at first occurrence of '\0', we still like to protect against over excessive usage of the query buffer and also preserve the current behavior(?). We will look into improving this though in a followup issue. --------- Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>	2025-01-20 20:28:45 +02:00
Viktor Söderqvist	b2e4155f54	Lower latenct-monitor-threashold in expire-cycle test case (#1584 ) The test case checks for expire-cycle in LATENCY LATEST, but with the new hash table, the expiry-cycle is too fast to be logged by latency monitor. Lower the latency monitor threshold to make it more likely to be logged. Fixes #1580 --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-19 19:23:00 +01:00
Harkrishn Patro	87cc3d7a71	Fix cluster info sent stats for message with light header (#1563 ) This issue affected only two message types (CLUSTERMSG_TYPE_PUBLISH and CLUSTERMSG_TYPE_PUBLISHSHARD) because they used a light message header, which caused the CLUSTER INFO stats to miss sent/received message information for those types. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2025-01-16 11:25:37 -08:00
Ricardo Dias	af71619c45	Extract the scripting engine code from the functions unit (#1312 ) This commit creates a new compilation unit for the scripting engine code by extracting the existing code from the functions unit. We're doing this refactor to prepare the code for running the `EVAL` command using different scripting engines. This PR has a module API change: we changed the type of error messages returned by the callback `ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc` to be a `ValkeyModuleString` (aka `robj`); This PR also fixes #1470. --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>	2025-01-16 10:08:16 +01:00
Ray Cao	921ba19acb	Incr expired_keys if the unix-time is already expired for EXPIREAT and other commands(#1517 ) Some commands that use unix-time, such as `EXPIREAT` and `SET EXAT`, should include the deleted keys in the `expired_keys` statistics if the specified time has already expired, and notifications should be sent in the manner of expired. --------- Signed-off-by: Ray Cao <zisong.cw@alibaba-inc.com>	2025-01-16 16:40:34 +08:00
Sarthak Aggarwal	6a8f068e36	Adding Missing filters to CLIENT LIST and Dedup Parsing (#1401 ) Adds filter options to CLIENT LIST: * USER <username> Return clients authenticated by <username>. * ADDR <ip:port> Return clients connected from the specified address. * LADDR <ip:port> Return clients connected to the specified local address. * SKIPME (YES\|NO) Exclude the current client from the list (default: no). * MAXAGE <maxage> Only list connections older than the specified age. Modifies the ID filter to CLIENT KILL to allow multiple IDs * ID <client-id> [<client-id>...] Kill connections by client ids. This makes CLIENT LIST and CLIENT KILL accept the same options. For backward compatibility, the default value for SKIPME is NO for CLIENT LIST and YES for CLIENT KILL. The MAXAGE comes from CLIENT KILL, where it keeps clients with the given max age and kills the older ones. This logic becomes weird for CLIENT LIST, but is kept for similary with CLIENT KILL, for the use case of first testing manually using CLIENT LIST, and then running CLIENT KILL with the same filters. The `ID client-id [client-id ...]` no longer needs to be the last filter. The parsing logic determines if an argument is an ID or not based on whether it can be parsed as an integer or not. Partly addresses: #668 --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2025-01-15 20:44:13 +01:00
zhaozhao.zz	c5a1585547	add paused_actions for INFO Clients (#1519 ) Add `paused_actions` and `paused_timeout_milliseconds` for INFO Clients to inform users about if clients are paused. --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2025-01-14 19:01:00 +08:00
secwall	fdc89c56b7	Escape unix socket group in unit tests (#1554 ) In some cases unix groups could have whitespace and/or `\` in them. One example is my workstation. It's a MacOS in an Active Directory domain. So my user has group `LD\Domain Users`. Running `make test` on `unstable` and `8.0` branches fails with: I'm not sure if we need to fix this in 8.0. But it seems that it should be fixed in unstable. Signed-off-by: secwall <secwall@yandex-team.ru>	2025-01-13 20:05:04 -08:00
Viktor Söderqvist	dc9ca1b98d	Test coverage for ECHO for reply schema validation (#1549 ) After #1545 disabled some tests for reply schema validation, we now have another issue that ECHO is not covered. ``` WARNING! The following commands were not hit at all: echo ERROR! at least one command was not hit by the tests ``` This patch adds a test case for ECHO in the unit/other test suite. I haven't checked if there are more commands that aren't covered. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2025-01-13 10:14:09 +08:00
Binbin	11cb8ee27c	Add latency stats around cluster config file operations (#1534 ) When the cluster changes, we need to persist the cluster configuration, and these file IO operations may cause latency. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 11:03:10 +08:00
Binbin	10357ceda5	Mark the node as FAIL when the node is marked as NOADDR and broadcast the FAIL (#1191 ) Imagine we have a cluster, for example a three-shard cluster, if shard 1 doing a CLUSTER RESET HARD, it will change the node name, and then other nodes will mark it as NOADR since the node name received by PONG has changed. In the eyes of other nodes, there is one working primary node left but with no address, and in this case, the address report in MOVED will be invalid and will confuse the clients. And in the same time, the replica will not failover since its primary is not in the FAIL state. And the cluster looks OK to everyone. This leaves a cluster that appears OK, but with no coverage for shard 1, obviously we should do something like CLUSTER FORGET to remove the node and fix the cluster before using it. But the point in here, we can mark the NOADDR node as FAIL to advance the cluster state. If a node is NOADDR means it does not have a valid address, so we won't reconnect it, we won't send PING, we won't gossip it, it seems reasonable to mark it as FAIL. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 11:02:05 +08:00
Binbin	211b250aad	Do election in order based on failed primary rank to avoid voting conflicts (#1018 ) When multiple primary nodes fail simultaneously, the cluster can not recover within the default effective time (data_age limit). The main reason is that the vote is without ranking among multiple replica nodes, which case too many epoch conflicts. Therefore, we introduced into ranking based on the failed primary shard-id. Introduced a new failed_primary_rank var, this var means the rank of this myself instance in the context of all failed primary list. This var will be used in failover and we will do the failover election packets in order based on the rank, this can effectively avoid the voting conflicts. If a single primary is down, the behavior is the same as before. If multiple primaries are down, their replica election initiation time will be delayed by 500ms according to the ranking. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 10:43:18 +08:00
Binbin	d6bdd9e7d7	Fix module LatencyAddSample still work when latency-monitor-threshold is 0 (#1541 ) When latency-monitor-threshold is set to 0, it means the latency monitor is disabled, and in VM_LatencyAddSample, we wrote the condition incorrectly, causing us to record latency when latency was turned off. This bug was introduced in the very first day, see e3b1d6d, it was merged in 2019. Signed-off-by: Binbin <binloveplay1314@qq.com>	2025-01-11 10:32:58 +08:00
Rueian	0a89571dcc	Skip logreqres on tests for the HELLO command (#1528 ) Skip logreqres on tests for the HELLO command Signed-off-by: Rueian <rueiancsie@gmail.com>	2025-01-08 10:05:20 -08:00
Rueian	3b52186b6a	Add `availability_zone` to the HELLO response (#1487 ) It's inconvenient for client implementations to extract the `availability_zone` information from the `INFO` response. The `INFO` response contains a lot of information that a client implementation typically doesn't need. This PR adds the availability zone to the `HELLO` response. Clients usually already use the `HELLO` command for protocol negotiation and also get the server `version` and `role` from its response. To keep the `HELLO` response small, the field is only added if availability zone is configured. --------- Signed-off-by: Rueian <rueiancsie@gmail.com>	2025-01-07 22:54:55 +01:00
Madelyn Olson	e1db553834	Add tests for acl selectors with no permissions or patterns (#1515 ) Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2025-01-06 15:46:55 -08:00
Madelyn Olson	7977c55ac9	Fix Read/Write key pattern selector (CVE-2024-51741) (#1514 ) The explanation on the original commit was wrong. Key based access must have a `~` in order to correctly configure whey key prefixes to apply the selector to. If this is missing, a server assert will be triggered later. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>	2025-01-06 14:02:16 -08:00
gmbnomis	26a72fa89c	Use the correct command proc for the LOOKUP_NOTOUCH exception in lookupKey (#1499 ) When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid updating the last access time in `lookupKey`. An exception must be made for the `TOUCH` command which must always update the key. When called from a script, `server.executing_client` will point to the `TOUCH` command, while `server.current_client` will point to e.g. an `EVAL` command. So, we must use the former to find out the currently executing command if defined. This fix addresses the issue where TOUCH wasn't updating key access times when called from scripts like EVAL. Fixes #1498 Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2025-01-03 09:41:15 +08:00
ranshid	0f273bb648	Align rejected unblocked commands to update the correct error statistic (#577 ) Currently, in case a blocked command is unblocked externally (eg. due to the relevant slot being migrated or the CLIENT UNBLOCK command was issued, the command statistics will always update the failed_calls error statistic. This leads to missalignment with `90b9f08e5d` as well as some inconsistencies. For example when a key is migrated during cluster slot migration, clients blocked on XREADGROUP will be unblocked and update the rejected_calls stat, while clients blocked on BLPOP will get unblocked updating the failed_calls stat. In this PR we add explicit indication in updateStatsOnUnblock thet indicates if the command was rejected or failed. --------- Signed-off-by: ranshid <ranshid@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2025-01-01 16:33:09 +02:00
Pierre	e4179f1f3b	Only (re-)send MEET packet once every handshake timeout period (#1441 ) Add `meet_sent` field in `clusterNode` indicating the last time we sent a MEET packet. Use this field to only (re-)send a MEET packet once every handshake timeout period when detecting a node without an inbound link. When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets. Improve some logging messages to include `human_nodename`. Add `nodeExceedsHandshakeTimeout()` function. This is a follow-up to this previous PR: https://github.com/valkey-io/valkey/pull/1307 And a partial fix to the crash described in: https://github.com/valkey-io/valkey/pull/1436 --------- Signed-off-by: Pierre Turin <pieturin@amazon.com>	2024-12-30 15:56:39 -05:00
Ricardo Dias	6adef8e2f9	Adds support for scripting engines as Valkey modules (#1277 ) This PR extends the module API to support the addition of different scripting engines to execute user defined functions. The scripting engine can be implemented as a Valkey module, and can be dynamically loaded with the `loadmodule` config directive, or with the `MODULE LOAD` command. This PR also adds an example of a dummy scripting engine module, to show how to use the new module API. The dummy module is implemented in `tests/modules/helloscripting.c`. The current module API support, only allows to load scripting engines to run functions using `FCALL` command. The additions to the module API are the following: ```c /* This struct represents a scripting engine function that results from the * compilation of a script by the engine implementation. / struct ValkeyModuleScriptingEngineCompiledFunction typedef ValkeyModuleScriptingEngineCompiledFunction (ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)( ValkeyModuleScriptingEngineCtx engine_ctx, const char code, size_t timeout, size_t out_num_compiled_functions, char err); typedef void (ValkeyModuleScriptingEngineCallFunctionFunc)( ValkeyModuleCtx module_ctx, ValkeyModuleScriptingEngineCtx engine_ctx, ValkeyModuleScriptingEngineFunctionCtx func_ctx, void compiled_function, ValkeyModuleString keys, size_t nkeys, ValkeyModuleString args, size_t nargs); typedef size_t (ValkeyModuleScriptingEngineGetUsedMemoryFunc)( ValkeyModuleScriptingEngineCtx engine_ctx); typedef size_t (ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc)( void compiled_function); typedef size_t (ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc)( ValkeyModuleScriptingEngineCtx engine_ctx); typedef void (ValkeyModuleScriptingEngineFreeFunctionFunc)( ValkeyModuleScriptingEngineCtx engine_ctx, void compiled_function); / This struct stores the callback functions implemented by the scripting * engine to provide the functionality for the `FUNCTION ` commands. / typedef struct ValkeyModuleScriptingEngineMethodsV1 { uint64_t version; /* Version of this structure for ABI compat. / / Library create function callback. When a new script is loaded, this * callback will be called with the script code, and returns a list of * ValkeyModuleScriptingEngineCompiledFunc objects. / ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc create_functions_library; / The callback function called when `FCALL` command is called on a function * registered in this engine. / ValkeyModuleScriptingEngineCallFunctionFunc call_function; / Function callback to get current used memory by the engine. / ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory; / Function callback to return memory overhead for a given function. / ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead; / Function callback to return memory overhead of the engine. / ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead; / Function callback to free the memory of a registered engine function. / ValkeyModuleScriptingEngineFreeFunctionFunc free_function; } ValkeyModuleScriptingEngineMethodsV1; / Registers a new scripting engine in the server. * * - `engine_name`: the name of the scripting engine. This name will match * against the engine name specified in the script header using a shebang. * * - `engine_ctx`: engine specific context pointer. * * - `engine_methods`: the struct with the scripting engine callback functions * pointers. / int ValkeyModule_RegisterScriptingEngine(ValkeyModuleCtx ctx, const char engine_name, void engine_ctx, ValkeyModuleScriptingEngineMethods engine_methods); /* Removes the scripting engine from the server. * * `engine_name` is the name of the scripting engine. * / int ValkeyModule_UnregisterScriptingEngine(ValkeyModuleCtx ctx, const char *engine_name); ``` --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>	2024-12-21 23:09:35 +01:00
Madelyn Olson	b56f4f70d2	Update info.tcl test to revert client output limits sooner (#1462 ) We set the client output buffer limits to 10 bytes, and then execute `info stats` which produces more than 10 bytes of output, which can cause that command to throw an error. I'm not sure why it wasn't consistently erroring before, might have been some change related to the ubuntu upgrade though. Issues related to ubuntu-tls are hopefully resolved now. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-20 10:16:46 +08:00
Madelyn Olson	60197b30e2	Attempt to read secondary error from info test (#1452 ) The test attempts to write 1MB of data in order to trigger a disconnect. Normally, the data is fully flushed and we get the error on the read (I/O error). However, it's possible we might fail the write, which leaves the client in an inconsistent state. On the next command, we finally process the I/O error on the FD. So, the simple fix is to consume any secondary errors. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-18 09:17:11 -08:00
ranshid	ba25b586d5	Introduce FORCE_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc (#1303 ) Introduce compile time option to force activedefrag to run even when jemalloc is not used as the allocator. This is in order to be able to run tests with defrag enabled while using memory instrumentation tools. fixes: https://github.com/valkey-io/valkey/issues/1241 --------- Signed-off-by: ranshid <ranshid@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-17 19:07:55 +02:00
Binbin	ad24220681	Automatic failover vote is not limited by two times the node timeout (#1356 ) This is a follow of #1305, we now decided to apply the same change to automatic failover as well, that is, move forward with removing it for both automatic and manual failovers. Quote from Ping during the review: Note that we already debounce transient primary failures with node timeout, ensuring failover is only triggered after sustained outages. Election timing is naturally staggered by replica spacing, making the likelihood of simultaneous elections from replicas of the same shard very low. The one-vote-per-epoch rule further throttles retries and ensures orderly elections. On top of that, quorum-based primary failure confirmation, cluster-state convergence, and slot ownership validation are all built into the process. Quote from Madelyn during the review: It against the specific primary. It's to prevent double failovers. If a primary just took over we don't want someone else to try to take over and give the new primary some amount of time to take over. I have not seen this issue though, it might have been over optimizing? The double failure mode, where a node fails and then another node fails within the nodetimeout also doesn't seem that common either though. So the conclusion is that we all agreed to remove it completely, it will make the code a lot simpler. And if there is other specific edge cases we are missing, we will fix it in other way. See discussion #1305 for more information. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-12-15 12:09:53 +08:00
Rain Valentine	88942c8e61	Replace dict with new hashtable for sets datatype (#1176 ) The new `hashtable` provides faster lookups and uses less memory than `dict`. A TCL test case "SRANDMEMBER with a dict containing long chain" is deleted because it's covered by a hashtable unit test "test_random_entry_with_long_chain", which is already present. This change also moves some logic from dismissMemory (object.c) to zmadvise_dontneed (zmalloc.c), so the hashtable implementation which needs the dismiss functionality doesn't need to depend on object.c and server.h. This PR follows #1186. --------- Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-12-14 20:53:48 +01:00
Madelyn Olson	0e96bb311e	Synchronously delete data during defrag tests (#1443 ) The creation of fragmentation is delayed when we use lazy-free. You can induce some of the active-defrag tests to fail by artificially adding a delay in the lazyfree process, similar to the issues seen in #1433 and issues like https://github.com/valkey-io/valkey/actions/runs/12267010712/job/34226304803#step:7:6538. The solution is to always do sync free during tests. Might close https://github.com/valkey-io/valkey/issues/1433. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-14 19:14:01 +01:00
Jim Brunner	32f2c73cb5	defrag: eliminate persistent kvstore pointer and edge case fixes (#1430 ) This update addresses several issues in defrag: 1. In the defrag redesign (https://github.com/valkey-io/valkey/pull/1242), a bug was introduced where `server.cronloops` was no longer being incremented in the `whileBlockedCron()`. This resulted in some memory statistics not being updated while blocked. 2. In the test case for AOF loading, we were seeing errors due to defrag latencies. However, running the math, the latencies are justified given the extremely high CPU target of the testcase. Adjusted the expected latency check to allow longer latencies for this case where defrag is undergoing starvation while AOF loading is in progress. 3. A "stage" is passed a "target". For the main dictionary and expires, we were passing in a `kvstore*`. However, on flushall or swapdb, the pointer may change. It's safer and more stable to use an index for the DB (a DBID). Then if the pointer changes, we can detect the change, and simply abort the stage. (If there's still fragmentation to deal with, we'll pick it up again on the next cycle.) 4. We always start a new stage on a new defrag cycle. This gives the new stage time to run, and prevents latency issues for certain stages which don't operate incrementally. However, often several stages will require almost no work, and this will leave a chunk of our CPU allotment unused. This is mainly an issue in starvation situations (like AOF loading or LUA script) - where defrag is running infrequently, with a large duty-cycle. This change allows a new stage to be initiated if we still have a standard duty-cycle remaining. (This can happen during starvation situations where the planned duty cycle is larger than the standard cycle. Most likely this isn't a concern for real scenarios, but it was observed in testing.) 5. Minor comment correction in `server.h` Signed-off-by: Jim Brunner <brunnerj@amazon.com>	2024-12-12 14:55:57 -08:00
Pierre	5f7fe9ef21	Send MEET packet to node if there is no inbound link to fix inconsistency when handshake timedout (#1307 ) In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster where the new node knows about all the nodes in the cluster, but the cluster does not know about this new node (or vice versa). To detect this inconsistency, we now check if a node has an outbound link but no inbound link, in this case it probably means this node does not know us. In this case we (re-)send a MEET packet to this node to do a new handshake with it. If we receive a MEET packet from a known node, we disconnect the outbound link to force a reconnect and sending of a PING packet so that the other node recognizes the link as belonging to us. This prevents cases where a node could send MEET packets in a loop because it thinks the other node does not have an inbound link. This fixes the bug described in #1251. --------- Signed-off-by: Pierre Turin <pieturin@amazon.com>	2024-12-11 17:26:06 -08:00
Jim Brunner	0c8ad5cd34	defrag: allow defrag to start during AOF loading (#1420 ) Addresses https://github.com/valkey-io/valkey/issues/1393 Changes: * During AOF loading or long running script, this allows defrag to be initiated. * The AOF defrag test was corrected to eliminate the wait period and rely on non-timer invocations. * Logic for "overage" time in defrag was changed. It previously accumulated underage leading to large latencies in extreme tests having very high CPU percentage. After several simple stages were completed during infrequent blocked processing, a large cycle time would be experienced. Signed-off-by: Jim Brunner <brunnerj@amazon.com>	2024-12-11 19:47:06 +02:00
Viktor Söderqvist	3eb8314be6	Replace dict with hashtable for keys, expires and pubsub channels Instead of a dictEntry with pointers to key and value, the hashtable has a pointer directly to the value (robj) which can hold an embedded key and acts as a key-value in the hashtable. This minimizes the number of pointers to follow and thus the number of memory accesses to lookup a key-value pair. Keys robj hashtable +-------+ +-----------------------+ \| 0 \| \| type, encoding, LRU \| \| 1 ------->\| refcount, expire \| \| 2 \| \| ptr \| \| ... \| \| optional embedded key \| +-------+ \| optional embedded val \| +-----------------------+ The expire timestamp (TTL) is also stored in the robj, if any. The expire hash table points to the same robj. Overview of changes: * Replace dict with hashtable in kvstore (kvstore.c) * Add functions for embedding key and expire in robj (object.c) * When there's unused space, reserve an expire field to avoid realloting it later if expire is added. * Always reserve space for expire for large key names to avoid realloc if it's set later. * Update db functions (db.c) * dbAdd, setKey and setExpire reallocate the object when embedding a key * setKey does not increment the reference counter, since it would require duplicating the object. This responsibility is moved to the caller. * Remove logic for shared integer objects as values in the database. The keys are now embedded in the objects, so all objects in the database need to be unique. Thus, we can't use shared objects as values. Also delete test cases for shared integers. * Adjust various commands to the changes mentioned above. * Adjust defrag code * Improvement: Don't access the expires table before defrag has actually reallocated the object. * Adjust test cases that were using hard-coded sizes for dict when realloc would happen, and some other adjustments in test cases. * Adjust memory prefetch for new hash table implementation in IO-threading, using new `hashtableIncrementalFind` API * Adjust offloading of free() to IO threads: Object free to be done in main thread while keeping obj->ptr offloading in IO-thread since the DB object is now allocated by the main-thread and not by the IO-thread as it used to be. * Let expireIfNeeded take an optional value, to avoid looking up the expires table when possible. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Uri Yagelnik <uriy@amazon.com>	2024-12-10 21:30:56 +01:00
Viktor Söderqvist	b4c2a1804a	Fix flaky init_test proc in maxmemory test suite (#1419 ) The following error has been seen, but not reliably reproduced: ``` *** [err]: eviction due to output buffers of pubsub, client eviction: true in tests/unit/maxmemory.tcl Expected '42' to be equal to '50' (context: type proc line 17 cmd {assert_equal [r dbsize] 50} proc ::init_test level 2) ``` The reason is probably that FLUSHDB is asynchronous and when we start populating new keys, they are evicted because the background flush is too slow. Changing this to FLUSHDB SYNC prevents this. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-12-10 20:52:06 +02:00
Viktor Söderqvist	2dfe25b408	Fix race in test "CLUSTER SLOT-STATS cpu-usec for blocking commands, unblocked on timeout" (#1416 ) This fix changes the timeout for BLPOP in this test case from 1 second to 0.5 seconds. In the test case quoted below, the procedure `wait_for_blocked_clients_count` waits for one second by default. If BLPOP has 1 second timeout and the first `wait_for_blocked_clients_count` finishes very fast, then the second `wait_for_blocked_clients_count` can time out before the BLPOP has been unblocked. ```TCL test "CLUSTER SLOT-STATS cpu-usec for blocking commands, unblocked on timeout." { # Blocking command with 1 second timeout. set rd [valkey_deferring_client] $rd BLPOP $key 1 # Confirm that the client is blocked, then unblocked after 1 second timeout. wait_for_blocked_clients_count 1 wait_for_blocked_clients_count 0 ``` As seen in the definition of `wait_for_blocked_clients_count`, the total time to wait is 1 second by default. ```TCL proc wait_for_blocked_clients_count {count {maxtries 100} {delay 10} {idx 0}} { wait_for_condition $maxtries $delay { [s $idx blocked_clients] == $count } else { fail "Timeout waiting for blocked clients" } } ``` Fixes #1121 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-12-10 14:46:21 +01:00
Yanqi Lv	f951a1ca73	Add new flag in `CLIENT LIST` for import-source client (#1398 ) - Add new flag "I" in `CLIENT LIST` for import-source client - Add `DEBUG_CONFIG` for import-mode - Allow import-source status to be turned off when import-mode is off Fixes #1350 and https://github.com/valkey-io/valkey/pull/1185#discussion_r1851049362. --------- Signed-off-by: lvyanqi.lyq <lvyanqi.lyq@alibaba-inc.com> Signed-off-by: Yanqi Lv <lvyanqi.lyq@alibaba-inc.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-12-10 13:35:07 +01:00
Sarthak Aggarwal	9cfe1b3d81	Set Command with IFEQ Support (#1324 ) This PR allows the Valkey users to perform conditional updates where the SET command is completed if the given comparison-value matches the key’s current value. Syntax: ``` SET key value IFEQ comparison-value ``` Behavior: If the values match, the SET completes as expected. If they do not match, the command returns a (nil), except if the GET argument is also given (see below). Behavior with Additional Flags: 1. ```SET key value IFEQ comparison-value GET``` returns the existing value, regardless of whether it matches comparison-value or not. The conditional set operation is performed if the given comparison value matches the existing value. To check if the SET succeeded, the caller needs to check if the returned string matches the comparison-value. 2. ```SET key value IFEQ comparison-value XX``` is a syntax error. 3. ```SET key value IFEQ comparison-value NX``` is a syntax error. Closes: #1215 --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2024-12-10 12:54:49 +01:00
Binbin	924729eb16	Fix the election was reset wrongly before failover epoch was obtained (#1339 ) After #1009, we will reset the election when we received a claim with an equal or higher epoch since a node can win an election in the past. But we need to consider the time before the node actually obtains the failover_auth_epoch. The failover_auth_epoch default is 0, so before the node actually get the failover epoch, we might wrongly reset the election. This is probably harmless, but will produce misleading log output and may delay election by a cron cycle or beforesleep. Now we will only reset the election when a node is actually obtains the failover epoch. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-12-09 16:19:02 +08:00
Roman Gershman	b09db3ef78	Fix typo in streams seen-time / active-time test (#1409 ) This variable name is wrong, it causes the wrong variable to be asserted. Signed-off-by: Roman Gershman <romange@gmail.com>	2024-12-09 16:01:43 +08:00
Guillaume Koenig	e8078b7315	Allow MEMORY MALLOC-STATS and MEMORY PURGE during loading phase (#1317 ) - Enable investigation of memory issues during loading - Previously, all memory commands were rejected with LOADING error (except memory help) - `MEMORY MALLOC-STATS` and `MEMORTY PURGE` are now allowed as they don't depend on the dataset - `MEMORY STATS` and `MEMORY USAGE KEY` remain disallowed Fixes #1299 Signed-off-by: Guillaume Koenig <knggk@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-12-08 20:30:07 +08:00
Binbin	176fafcaf7	Add a note to conf about the dangers of modifying dir at runtime (#887 ) We've had security issues in the past with it, which is why we marked it as PROTECTED. But, modifying during runtime is also a dangerous action. For example, when child processes are running, persistent temp files and log files may have unexpected effects. A scenario for modifying dir at runtime is to migrate a disk failure, such as using disk-based replication to migrate a node, writing nodes.conf to save the cluster configuration. We decided to leave it as is and add a note in the conf about the dangers of modifying dir at runtime. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-12-08 20:28:14 +08:00
Viktor Söderqvist	a2fe6af457	Fix Module Update Args test when other modules are loaded (#1403 ) Fixes #1400 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-12-07 10:25:40 +01:00
Wen Hui	71560a2a4a	Add API UpdateRuntimeArgs for updating the module arguments during runtime (#1041 ) Before Redis OSS 7, if we load a module with some arguments during runtime, and run the command "config rewrite", the module information will not be saved into the config file. Since Redis OSS 7 and Valkey 7.2, if we load a module with some arguments during runtime, the module information (path, arguments number, and arguments value) can be saved into the config file after config rewrite command is called. Thus, the module will be loaded automatically when the server startup next time. Following is one example: bind 172.25.0.58 port 7000 protected-mode no enable-module-command yes Generated by CONFIG REWRITE latency-tracking-info-percentiles 50 99 99.9 dir "/home/ubuntu/valkey" save 3600 1 300 100 60 10000 user default on nopass sanitize-payload ~* &* +https://github.com/ALL loadmodule tests/modules/datatype.so 10 20 However, there is one problem. If developers write a module, and update the running arguments by someway, the updated arguments can not be saved into the config file even "config rewrite" is called. The reason comes from the following function rewriteConfigLoadmoduleOption (src/config.c) void rewriteConfigLoadmoduleOption(struct rewriteConfigState state) { .......... struct ValkeyModule module = dictGetVal(de); line = sdsnew("loadmodule "); line = sdscatsds(line, module->loadmod->path); for (int i = 0; i < module->loadmod->argc; i++) { line = sdscatlen(line, " ", 1); line = sdscatsds(line, module->loadmod->argv[i]->ptr); } rewriteConfigRewriteLine(state, "loadmodule", line, 1); ....... } The function only save the initial arguments information (module->loadmod) into the configfile. After core members discuss, ref https://github.com/valkey-io/valkey/issues/1177 We decide add the following API to implement this feature: Original proposal: int VM_UpdateRunTimeArgs(ValkeyModuleCtx ctx, int index, char value); Updated proposal: ValkeyModuleString *values VM_GetRuntimeArgs(ValkeyModuleCtx ctx); *int VM_UpdateRuntimeArgs(ValkeyModuleCtx ctx, int argc, ValkeyModuleString **values); Why we do not recommend the following way: MODULE UNLOAD Update module args in the conf file MODULE LOAD I think there are the following disadvantages: 1. Some modules can not be unloaded. Such as the example module datatype.so, which is tests/modules/datatype.so 2. it is not atomic operation for MODULE UNLOAD + MODULE LOAD 3. sometimes, if we just run the module unload, the client business could be interrupted --------- Signed-off-by: hwware <wen.hui.ware@gmail.com>	2024-12-05 11:58:24 -05:00
uriyage	9f8b174c2e	Optimize IO thread offload for modified argv (#1360 ) ### Improve expired commands performance with IO threads #### Background In our IO threads architecture, IO threads allocate client argv's and later when we free it after processCommand we offload its free to the IO threads. With jemalloc, it's crucial that the same thread that allocates memory also frees it. For some commands we modify the client's argv in the main thread during command processing (for example in `SET EX` command we rewrite the command to use absolute time for replication propagation). #### Current issues 1. When commands are rewritten (e.g., expire commands), we store the original argv in `c->original_argv`. However, we're currently: - Freeing new argv (allocated by main thread) in IO threads - Freeing original argv (allocated by IO threads) in main thread 2. Currently, `c->original_argv` points to new array with old objects, while `c->argv` has old array with new objects, making memory free management complicated. #### Changes 1. Refactored argv modification handling code to ensure consistency - both array and objects are now either all new or all old 2. Moved original_argv cleanup to happen in resetClient after argv cleanup 3. Modified IO threads code to properly handle original argv cleanup when argv are modified. #### Performance Impact Benchmark with `SET EX` commands (650 clients, 512 byte value, 8 IO threads): - New implementation: 729,548 ops/sec - Old implementation: 633,243 ops/sec Representing a ~15% performance improvement due to more efficient memory handling. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>	2024-12-03 19:20:31 +02:00
Jim Brunner	397201c48f	Refactor of ActiveDefrag to reduce latencies (#1242 ) Refer to: https://github.com/valkey-io/valkey/issues/1141 This update refactors the defrag code to: * Make the overall code more readable and maintainable * Reduce latencies incurred during defrag processing With this update, the defrag cycle time is reduced to 500us, with more frequent cycles. This results in much more predictable latencies, with a dramatic reduction in tail latencies. (See https://github.com/valkey-io/valkey/issues/1141 for more complete details.) This update is focused mostly on the high-level processing, and does NOT address lower level functions which aren't currently timebound (e.g. `activeDefragSdsDict()`, and `moduleDefragGlobals()`). These are out of scope for this update and left for a future update. I fixed `kvstoreDictLUTDefrag` because it was using up to 7ms on a CME single shard. See original github issue for performance details. --------- Signed-off-by: Jim Brunner <brunnerj@amazon.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-03 08:42:29 -08:00

1 2 3 4 5 ...

1609 Commits