futriix

Author	SHA1	Message	Date
Rain Valentine	88942c8e61	Replace dict with new hashtable for sets datatype (#1176 ) The new `hashtable` provides faster lookups and uses less memory than `dict`. A TCL test case "SRANDMEMBER with a dict containing long chain" is deleted because it's covered by a hashtable unit test "test_random_entry_with_long_chain", which is already present. This change also moves some logic from dismissMemory (object.c) to zmadvise_dontneed (zmalloc.c), so the hashtable implementation which needs the dismiss functionality doesn't need to depend on object.c and server.h. This PR follows #1186. --------- Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-12-14 20:53:48 +01:00
Viktor Söderqvist	3eb8314be6	Replace dict with hashtable for keys, expires and pubsub channels Instead of a dictEntry with pointers to key and value, the hashtable has a pointer directly to the value (robj) which can hold an embedded key and acts as a key-value in the hashtable. This minimizes the number of pointers to follow and thus the number of memory accesses to lookup a key-value pair. Keys robj hashtable +-------+ +-----------------------+ \| 0 \| \| type, encoding, LRU \| \| 1 ------->\| refcount, expire \| \| 2 \| \| ptr \| \| ... \| \| optional embedded key \| +-------+ \| optional embedded val \| +-----------------------+ The expire timestamp (TTL) is also stored in the robj, if any. The expire hash table points to the same robj. Overview of changes: * Replace dict with hashtable in kvstore (kvstore.c) * Add functions for embedding key and expire in robj (object.c) * When there's unused space, reserve an expire field to avoid realloting it later if expire is added. * Always reserve space for expire for large key names to avoid realloc if it's set later. * Update db functions (db.c) * dbAdd, setKey and setExpire reallocate the object when embedding a key * setKey does not increment the reference counter, since it would require duplicating the object. This responsibility is moved to the caller. * Remove logic for shared integer objects as values in the database. The keys are now embedded in the objects, so all objects in the database need to be unique. Thus, we can't use shared objects as values. Also delete test cases for shared integers. * Adjust various commands to the changes mentioned above. * Adjust defrag code * Improvement: Don't access the expires table before defrag has actually reallocated the object. * Adjust test cases that were using hard-coded sizes for dict when realloc would happen, and some other adjustments in test cases. * Adjust memory prefetch for new hash table implementation in IO-threading, using new `hashtableIncrementalFind` API * Adjust offloading of free() to IO threads: Object free to be done in main thread while keeping obj->ptr offloading in IO-thread since the DB object is now allocated by the main-thread and not by the IO-thread as it used to be. * Let expireIfNeeded take an optional value, to avoid looking up the expires table when possible. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Uri Yagelnik <uriy@amazon.com>	2024-12-10 21:30:56 +01:00
Sarthak Aggarwal	9cfe1b3d81	Set Command with IFEQ Support (#1324 ) This PR allows the Valkey users to perform conditional updates where the SET command is completed if the given comparison-value matches the key’s current value. Syntax: ``` SET key value IFEQ comparison-value ``` Behavior: If the values match, the SET completes as expected. If they do not match, the command returns a (nil), except if the GET argument is also given (see below). Behavior with Additional Flags: 1. ```SET key value IFEQ comparison-value GET``` returns the existing value, regardless of whether it matches comparison-value or not. The conditional set operation is performed if the given comparison value matches the existing value. To check if the SET succeeded, the caller needs to check if the returned string matches the comparison-value. 2. ```SET key value IFEQ comparison-value XX``` is a syntax error. 3. ```SET key value IFEQ comparison-value NX``` is a syntax error. Closes: #1215 --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>	2024-12-10 12:54:49 +01:00
Roman Gershman	b09db3ef78	Fix typo in streams seen-time / active-time test (#1409 ) This variable name is wrong, it causes the wrong variable to be asserted. Signed-off-by: Roman Gershman <romange@gmail.com>	2024-12-09 16:01:43 +08:00
Binbin	5693fe4664	Fix set expire test due to the new lazyfree configs changes (#980 ) Test failed because these two PRs #865 and #913. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-02 22:43:09 +08:00
Binbin	70624ea63d	Change all the lazyfree configurations to yes by default (#913 ) ## Set replica-lazy-flush and lazyfree-lazy-user-flush to yes by default. There are many problems with running flush synchronously. Even in single CPU environments, the thread managers should balance between the freeing and serving incoming requests. ## Set lazy eviction, expire, server-del, user-del to yes by default We now have a del and a lazyfree del, we also have these configuration items to control: lazyfree-lazy-eviction, lazyfree-lazy-expire, lazyfree-lazy-server-del, lazyfree-lazy-user-del. In most cases lazyfree is better since it reduces the risk of blocking the main thread, and because we have lazyfreeGetFreeEffort, on those with high effor (currently 64) will use lazyfree. Part of #653. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-02 07:07:17 -07:00
Binbin	e3af1a30e4	Fast path in SET if the expiration time is expired (#865 ) If the expiration time passed in SET is expired, for example, it has expired due to the machine time (DTS) or the expiration time passed in (wrong arg). In this case, we don't need to set the key and wait for the active expire scan before deleting the key. Compared with previous changes: 1. If the key does not exist, previously we would set the key and wait for the active expire to delete it, so it is a set + del from the perspective of propaganda. Now we will no set the key and return, so it a NOP. 2. If the key exists, previously we woule set the key and wait for the active expire to delete it, so it is a set + del From the perspective of propaganda. Now we will delete it and return, so it is a del. Adding a new deleteExpiredKeyFromOverwriteAndPropagate function to reduce the duplicate code. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-31 22:39:07 +08:00
uriyage	bbfd041895	Async IO threads (#758 ) This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests per second, as detailed by [dan touitou](https://github.com/touitou-dan) in https://github.com/valkey-io/valkey/issues/22. This PR modifies the IO threads to be fully asynchronous, which is a first and necessary step to allow more work offloading and better utilization of the IO threads. ### Current IO threads state: Valkey IO threads were introduced in Redis 6.0 to allow better utilization of multi-core machines. Before this, Redis was single-threaded and could only use one CPU core for network and command processing. The introduction of IO threads helps in offloading the IO operations to multiple threads. Current IO Threads flow: 1. Initialization: When Redis starts, it initializes a specified number of IO threads. These threads are in addition to the main thread, each thread starts with an empty list, the main thread will populate that list in each event-loop with pending-read-clients or pending-write-clients. 2. Read Phase: The main thread accepts incoming connections and reads requests from clients. The reading of requests are offloaded to IO threads. The main thread puts the clients ready-to-read in a list and set the global io_threads_op to IO_THREADS_OP_READ, the IO threads pick the clients up, perform the read operation and parse the first incoming command. 3. Command Processing: After reading the requests, command processing is still single-threaded and handled by the main thread. 4. Write Phase: Similar to the read phase, the write phase is also be offloaded to IO threads. The main thread prepares the response in the clients’ output buffer then the main thread puts the client in the list, and sets the global io_threads_op to the IO_THREADS_OP_WRITE. The IO threads then pick the clients up and perform the write operation to send the responses back to clients. 5. Synchronization: The main-thread communicate with the threads on how many jobs left per each thread with atomic counter. The main-thread doesn’t access the clients while being handled by the IO threads. Issues with current implementation: * Underutilized Cores: The current implementation of IO-threads leads to the underutilization of CPU cores. * The main thread remains responsible for a significant portion of IO-related tasks that could be offloaded to IO-threads. * When the main-thread is processing client’s commands, the IO threads are idle for a considerable amount of time. * Notably, the main thread's performance during the IO-related tasks is constrained by the speed of the slowest IO-thread. * Limited Offloading: Currently, Since the Main-threads waits synchronously for the IO threads, the Threads perform only read-parse, and write operations, with parsing done only for the first command. If the threads can do work asynchronously we may offload more work to the threads reducing the load from the main-thread. * TLS: Currently, we don't support IO threads with TLS (where offloading IO would be more beneficial) since TLS read/write operations are not thread-safe with the current implementation. ### Suggested change Non-blocking main thread - The main thread and IO threads will operate in parallel to maximize efficiency. The main thread will not be blocked by IO operations. It will continue to process commands independently of the IO thread's activities. Implementation details Inter-thread communication. * We use a static, lock-free ring buffer of fixed size (2048 jobs) for the main thread to send jobs and for the IO to receive them. If the ring buffer fills up, the main thread will handle the task itself, acting as back pressure (in case IO operations are more expensive than command processing). A static ring buffer is a better candidate than a dynamic job queue as it eliminates the need for allocation/freeing per job. * An IO job will be in the format: ` [void* function-call-back \| void data] `where data is either a client to read/write from and the function-ptr is the function to be called with the data for example readQueryFromClient using this format we can use it later to offload other types of works to the IO threads. The Ring buffer is one way from the main-thread to the IO thread, Upon read/write event the main thread will send a read/write job then in before sleep it will iterate over the pending read/write clients to checking for each client if the IO threads has already finished handling it. The IO thread signals it has finished handling a client read/write by toggling an atomic flag read_state / write_state on the client struct. Thread Safety As suggested in this solution, the IO threads are reading from and writing to the clients' buffers while the main thread may access those clients. We must ensure no race conditions or unsafe access occurs while keeping the Valkey code simple and lock free. Minimal Action in the IO Threads The main change is to limit the IO thread operations to the bare minimum. The IO thread will access only the client's struct and only the necessary fields in this struct. The IO threads will be responsible for the following: * Read Operation: The IO thread will only read and parse a single command. It will not update the server stats, handle read errors, or parsing errors. These tasks will be taken care of by the main thread. * Write Operation: The IO thread will only write the available data. It will not free the client's replies, handle write errors, or update the server statistics. To achieve this without code duplication, the read/write code has been refactored into smaller, independent components: * Functions that perform only the read/parse/write calls. * Functions that handle the read/parse/write results. This refactor accounts for the majority of the modifications in this PR. Client Struct Safe Access As we ensure that the IO threads access memory only within the client struct, we need to ensure thread safety only for the client's struct's shared fields. * Query Buffer * Command parsing - The main thread will not try to parse a command from the query buffer when a client is offloaded to the IO thread. * Client's memory checks in client-cron - The main thread will not access the client query buffer if it is offloaded and will handle the querybuf grow/shrink when the client is back. * CLIENT LIST command - The main thread will busy-wait for the IO thread to finish handling the client, falling back to the current behavior where the main thread waits for the IO thread to finish their processing. * Output Buffer * The IO thread will not change the client's bufpos and won't free the client's reply lists. These actions will be done by the main thread on the client's return from the IO thread. * bufpos / block→used: As the main thread may change the bufpos, the reply-block→used, or add/delete blocks to the reply list while the IO thread writes, we add two fields to the client struct: io_last_bufpos and io_last_reply_block. The IO thread will write until the io_last_bufpos, which was set by the main-thread before sending the client to the IO thread. If more data has been added to the cob in between, it will be written in the next write-job. In addition, the main thread will not trim or merge reply blocks while the client is offloaded. * Parsing Fields * Client's cmd, argc, argv, reqtype, etc., are set during parsing. * The main thread will indicate to the IO thread not to parse a cmd if the client is not reset. In this case, the IO thread will only read from the network and won't attempt to parse a new command. * The main thread won't access the c→cmd/c→argv in the CLIENT LIST command as stated before it will busy wait for the IO threads. * Client Flags * c→flags, which may be changed by the main thread in multiple places, won't be accessed by the IO thread. Instead, the main thread will set the c→io_flags with the information necessary for the IO thread to know the client's state. * Client Close * On freeClient, the main thread will busy wait for the IO thread to finish processing the client's read/write before proceeding to free the client. * Client's Memory Limits * The IO thread won't handle the qb/cob limits. In case a client crosses the qb limit, the IO thread will stop reading for it, letting the main thread know that the client crossed the limit. TLS TLS is currently not supported with IO threads for the following reasons: 1. Pending reads - If SSL has pending data that has already been read from the socket, there is a risk of not calling the read handler again. To handle this, a list is used to hold the pending clients. With IO threads, multiple threads can access the list concurrently. 2. Event loop modification - Currently, the TLS code registers/unregisters the file descriptor from the event loop depending on the read/write results. With IO threads, multiple threads can modify the event loop struct simultaneously. 3. The same client can be sent to 2 different threads concurrently (https://github.com/redis/redis/issues/12540). Those issues were handled in the current PR: 1. The IO thread only performs the read operation. The main thread will check for pending reads after the client returns from the IO thread and will be the only one to access the pending list. 2. The registering/unregistering of events will be similarly postponed and handled by the main thread only. 3. Each client is being sent to the same dedicated thread (c→id % num_of_threads). Sending Replies Immediately with IO threads. Currently, after processing a command, we add the client to the pending_writes_list. Only after processing all the clients do we send all the replies. Since the IO threads are now working asynchronously, we can send the reply immediately after processing the client’s requests, reducing the command latency. However, if we are using AOF=always, we must wait for the AOF buffer to be written, in which case we revert to the current behavior. IO threads dynamic adjustment Currently, we use an all-or-nothing approach when activating the IO threads. The current logic is as follows: if the number of pending write clients is greater than twice the number of threads (including the main thread), we enable all threads; otherwise, we enable none. For example, if 8 IO threads are defined, we enable all 8 threads if there are 16 pending clients; else, we enable none. It makes more sense to enable partial activation of the IO threads. If we have 10 pending clients, we will enable 5 threads, and so on. This approach allows for a more granular and efficient allocation of resources based on the current workload. In addition, the user will now be able to change the number of I/O threads at runtime. For example, when decreasing the number of threads from 4 to 2, threads 3 and 4 will be closed after flushing their job queues. Tests Currently, we run the io-threads tests with 4 IO threads (`443d80f168/.github/workflows/daily.yml (L353)`). This means that we will not activate the IO threads unless there are 8 (threads * 2) pending write clients per single loop, which is unlikely to happened in most of tests, meaning the IO threads are not currently being tested. To enforce the main thread to always offload work to the IO threads, regardless of the number of pending events, we add an events-per-io-thread configuration with a default value of 2. When set to 0, this configuration will force the main thread to always offload work to the IO threads. When we offload every single read/write operation to the IO threads, the IO-threads are running with 100% CPU when running multiple tests concurrently some tests fail as a result of larger than expected command latencies. To address this issue, we have to add some after or wait_for calls to some of the tests to ensure they pass with IO threads as well. Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2024-07-08 20:01:39 -07:00
jonghoonpark	c090874ed4	List test files dynamically (#313 ) Motivation: Currently we have to manually update the all_tests variable when introducing new test files. Fix: I've modified it to list test files dynamically, but rather than modify it to add all test files, I've first modified it to only add test files from the following 4 paths so that it doesn't deviate too much from what we already do - unit - unit/type - unit/cluster - integration Related issue: https://github.com/valkey-io/valkey/issues/302 --------- Signed-off-by: jonghoonpark <dev@jonghoonpark.com>	2024-04-15 14:25:33 +02:00
Shivshankar	a054862b72	Rename redis_client* procedure to valkey_client* in test environment (#276 ) Renamed redis-client* procedure to valkey_client* Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-04-10 10:18:47 -04:00
Shivshankar	da831c0d22	rename procedure redis_deferring_client to valkey_deferring_client (#270 ) Updated procedure redis_deferring_client in test environent to valkey_deferring_client. Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>	2024-04-09 10:38:09 -04:00
Jacob Murphy	df5db0627f	Remove trademarked language in code comments (#223 ) This includes comments used for module API documentation. * Strategy for replacement: Regex search: `(//\|/\\| \\|#).* ("\|$)?(r\|R)edis( \|\. \|'\|\n\|,\|-\|$\|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)` * Don't edit copyright comments * Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish from newly licensed repository * Replace "Redis Object" -> "Object" * Exclude markdown for now * Don't edit Lua scripting comments referring to redis.X API * Replace "Redis Protocol" -> "RESP" * Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-" prefix * Most other places, I use best judgement to either remove "Redis", or replace with "the server" or "server" Fixes #148 --------- Signed-off-by: Jacob Murphy <jkmurphy@google.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-04-09 10:24:03 +02:00
John Vandenberg	253fe9dced	Fix typos and replace 'codespell' with 'typos' (#72 ) Uses https://github.com/taiki-e/install-action to install https://github.com/crate-ci/typos in CI This finds many more/different typos than https://github.com/codespell-project/codespell , while having very few false positives. Signed-off-by: John Vandenberg <jayvdb@gmail.com>	2024-03-31 12:38:22 -07:00
Yanqi Lv	bad33f8738	fix wrong data type conversion in zrangeResultBeginStore (#13148 ) In `beginResultEmission`, -1 means the result length is not known in advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`. Although `dictExpand` won't succeed because the size overflows, I think we'd better to avoid this wrong conversion. This bug can be triggered when the source of `zrangestore` doesn't exist or we use `zrangestore` command with `byscore` or `bylex`. The impact is that dst keys will be converted to use skiplist instead of listpack.	2024-03-19 08:52:55 +02:00
Ronen Kalish	a8e745117f	Xread last entry in stream (#7388 ) (#13117 ) Allow using `+` as a special ID for last item in stream on XREAD command. This would allow to iterate on a stream with XREAD starting with the last available message instead of the next one which `$` is used for. I.e. the caller can use `BLOCK` and `+` on the first call, and change to `$` on the next call. Closes #7388 --------- Co-authored-by: Felipe Machado <462154+felipou@users.noreply.github.com>	2024-03-13 08:23:32 +02:00
Binbin	f17381a38d	Fix propagation of entries_read by calling streamPropagateGroupID unconditionally (#12898 ) In XREADGROUP ACK, because streamPropagateXCLAIM does not propagate entries-read, entries-read will be inconsistent between master and replicas. I.e. if no entries were claimed, it would have propagated correctly, but if some were claimed, then the entries-read field would be inconsistent on the replica. The fix was suggested by guybe7, call streamPropagateGroupID unconditionally, so that we will normalize entries_read on the replicas. In the past, we would only set propagate_last_id when NOACK was specified. And in #9127, XCLAIM did not propagate entries_read in ACK, which would cause entries_read to be inconsistent between master and replicas. Another approach is add another arg to XCLAIM and let it propagate entries_read, but we decided not to use it. Because we want minimal damage in case there's an old target and new source (in the worst case scenario, the new source doesn't recognize XGROUP SETID ... ENTRIES READ and the lag is lost. If we change XCLAIM, the damage is much more severe). In this patch, now if the user uses XREADGROUP .. COUNT 1 there will be an additional overhead of MULTI, EXEC and XGROUPSETID. We assume the extra command in case of COUNT 1 (4x factor, changing from one XCLAIM to MULTI+XCLAIM+XSETID+EXEC), is probably ok since reading just one entry is in any case very inefficient (a client round trip per record), so we're hoping it's not a common case. Issue was introduced in #9127.	2024-02-29 09:48:20 +02:00
debing.sun	165afc5f2a	Determine the large limit of the quicklist node based on fill (#12659 ) Following #12568 In issue #9357, when inserting an element larger than 1GB, we currently store it in a plain node instead of a listpack. Presently, when we insert an element that exceeds the maximum size of a packed node, it cannot be accommodated in any other nodes, thus ending up isolated like a large element. I.e. it's a node with only one element, but it's listpack encoded rather than a plain buffer. This PR lowers the threshold for considering an element as 'large' from 1GB to the maximum size of a node. While this change doesn't completely resolve the bug mentioned in the previous PR, it does mitigate its potential impact. As a result of this change, we can now only use LSET to replace an element with another element that falls below the maximum size threshold. In the worst-case scenario, with a fill of -5, the largest packed node we can create is 2GB (32k * 64k): * 32k: The smallest element in a listpack is 2 bytes, which allows us to store up to 32k elements. * 64k: This is the maximum size for a single quicklist node. ## Others To fully fix #9357, we need more work, as discussed in #12568, when we insert an element into a quicklistNode, it may be created in a new node, put into another node, or merged, and we can't correctly delete the node that was supposed to be deleted. I'm not sure it's worth it, since it involves a lot of modifications.	2024-02-22 10:02:38 +02:00
Binbin	32f44da510	Increase tolerance range to block reprocess tests to avoid timing issues (#13053 ) These tests have all failed in daily CI: ``` * [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) * [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) *** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) ``` Increase the range to avoid failures, and improve the comment to be clearer. tests was introduced in #13004.	2024-02-15 10:44:49 +02:00
debing.sun	1e8dc1da0d	Fix crash due to merge of quicklist node introduced by #12955 (#13040 ) Fix two crash introducted by #12955 When a quicklist node can't be inserted and split, we eventually merge the current node with its neighboring nodes after inserting, and compress the current node and its siblings. 1. When the current node is merged with another node, the current node may become invalid and can no longer be used. Solution: let `_quicklistMergeNodes()` return the merged nodes. 3. If the current node is a LZF quicklist node, its recompress will be 1. If the split node can be merged with a sibling node to become head or tail, recompress may cause the head and tail to be compressed, which is not allowed. Solution: always recompress to 0 after merging.	2024-02-08 14:29:16 +02:00
debing.sun	1f00c951c2	Prevent LSET command from causing quicklist plain node size to exceed 4GB (#12955 ) Fix #12864 The main reason for this crash is that when replacing a element of a quicklist packed node with lpReplace() method, if the final size is larger than 4GB, lpReplace() will fail and returns NULL, causing `node->entry` to be incorrectly set to NULL. Since the inserted data is not a large element, we can't just replace it like a large element, first quicklistInsertAfter() and then quicklistDelIndex(), because the current node may be merged and invalidated in quicklistInsertAfter(). The solution of this PR: When replacing a node fails (listpack exceeds 4GB), split the current node, create a new node to put in the middle, and try to merge them. This is the same as inserting a large element. In the worst case, its size will not exceed 4GB.	2024-02-06 18:21:28 +02:00
Binbin	492021db95	Fix blocking commands timeout is reset due to re-processing command (#13004 ) In #11012, we will reprocess command when client is unblocked on keys, in some blocking commands, for example, in the XREADGROUP BLOCK scenario, because of the re-processing command, we will recalculate the block timeout, causing the blocking time to be reset. This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly let the command know that it is being re-processed, later in blockForKeys we will not reset the timeout. Affected BLOCK cases: - list / zset / stream, added test cases for each. Unaffected cases: - module (never re-process the commands). - WAIT / WAITAOF (never re-process the commands). Fixes #12998.	2024-01-30 11:32:59 +02:00
Roshan Khatri	5358bd7cdd	Reduce performance impact of dict rehashing and make it shorter. (#12899 ) #### Problem Statement: For any read/update operation during rehashing, we're doing ~10+ random DRAM lookups to do the rehashing, as we are using the `rehashidx` to rehash 10 buckets, whose dict entries most likely aren't cached in the CPU or near the bucket we are operating on. If these random bucket are empty, the rehashing process during that command execution is skipped. #### Implementation: For reducing the performance recession while dict is rehashing, we determine the index at which the key would be stored in the 0th HT, we check if that index has already been rehashed, if not we will rehash the bucket containing the key and the bucket will be moved from 0th HT to the 1st HT. If the key has already been rehashed, we perform the random access bucket rehash (using `rehashidx`) and we again verify if rehashing is still ongoing and look up the key in the respective HT. This ensures rehashing is not skipped in any command call and that we rehash a particular bucket or random bucket in each call. #### Changes in this PR: - Added a new method `dictBucketRehash` to perform rehash on a single bucket. - Helper function `moveKeysInBucketOldtoNew` for `dictRehash` and `dictBucketRehash` to move all the keys in a bucket from the old to the new hash HT. - Helper function `verifyMoreRehashRequired` for `dictRehash` and `dictBucketRehash` to check if we have already rehashed the whole table and if more rehashing is required. ### Benchmark: - This PR still shows ~13% improvement in the latency during rehashing. - Rehashing is now ~2% faster for this PR when compared to unstable. --------- Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>	2024-01-27 11:11:53 +02:00
Wen Hui	685409139b	Add INCR type command against wrong argument test cases. (#12836 ) We have test cases for incr related commands with no key exist and spaces in key and wrong type of key. However, we dont have test cases covered for INCRBY INCRBYFLOAT DECRBY INCR DECR HINCRBY HINCRBYFLOAT ZINCRBY with valid key and invalid value as argument, and float value to incrby and decrby. So added test cases for the scenarios in incr.tcl. Thank you!	2024-01-23 15:39:38 +02:00
Yanqi Lv	e2b7932b34	Shrink dict when deleting dictEntry (#12850 ) When we insert entries into dict, it may autonomously expand if needed. However, when we delete entries from dict, it doesn't shrink to the proper size. If there are few entries in a very large dict, it may cause huge waste of memory and inefficiency when iterating. The main keyspace dicts (keys and expires), are shrinked by cron (`tryResizeHashTables` calls `htNeedsResize` and `dictResize`), And some data structures such as zset and hash also do that (call `htNeedsResize`) right after a loop of calls to `dictDelete`, But many other dicts are completely missing that call (they can only expand). In this PR, we provide the ability to automatically shrink the dict when deleting. The conditions triggering the shrinking is the same as `htNeedsResize` used to have. i.e. we expand when we're over 100% utilization, and shrink when we're below 10% utilization. Additionally: * Add `dictPauseAutoResize` so that flows that do mass deletions, will only trigger shrinkage at the end. * Rename `dictResize` to `dictShrinkToFit` (same logic as it used to have, but better name describing it) * Rename `_dictExpand` to `_dictResize` (same logic as it used to have, but better name describing it) related to discussion https://github.com/redis/redis/pull/12819#discussion_r1409293878 --------- Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-01-15 08:20:53 +02:00
Wen Hui	5dc631d880	Add missing test cases for hash commands (#12851 ) We dont have test for hgetall against key doesnot exist so added the test in test suite and along with this, added wrong type cases for other missing commands.	2023-12-17 14:02:53 +02:00
Binbin	96e9dec419	Bump codespell from 2.2.4 to 2.2.5 (#12557 ) and adjustments.	2023-09-08 16:10:17 +03:00
Chen Tianjie	b26e8e3213	Optimize ZRANGE offset location from linear search to skiplist jump. (#12450 ) ZRANGE BYSCORE/BYLEX with [LIMIT offset count] option was using every level in skiplist to jump to the first/last node in range, but only use level[0] in skiplist to locate the node at offset, resulting in sub-optimal performance using LIMIT: ``` while (ln && offset--) { if (reverse) { ln = ln->backward; } else { ln = ln->level[0].forward; } } ``` It could be slow when offset is very big. We can get the total rank of the offset location and use skiplist to jump to it. It is an improvement from O(offset) to O(log rank). Below shows how this is implemented (if the offset is positve): Use the skiplist to seach for the first element in the range, record its rank `rank_0`, so we can have the rank of the target node `rank_t`. Meanwhile we record the last node we visited which has zsl->level-1 levels and its rank `rank_1`. Then we start from the zsl->level-1 node, use skiplist to go forward `rank_t-rank_1` nodes to reach the target node. It is very similiar when the offset is reversed. Note that if `rank_t` is very close to `rank_0`, we just start from the first element in range and go node by node, this for the case when zsl->level-1 node is to far away and it is quicker to reach the target node by node. Here is a test using a random generated zset including 10000 elements (with different positive scores), doing a bench mark which compares how fast the `ZRANGE` command is exucuted before and after the optimization. The start score is set to 0 and the count is set to 1 to make sure that most of the time is spent on locating the offset. ``` memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test 0 +inf byscore limit <offset> 1" ``` \| offset \| QPS(unstable) \| QPS(optimized) \| \|--------\|--------\|--------\| \| 10 \| 73386.02 \| 74819.82 \| \| 1000 \| 48084.96 \| 73177.73 \| \| 2000 \| 31156.79 \| 72805.83 \| \| 5000 \| 10954.83 \| 71218.21 \| With the result above, we can see that the original code is greatly slowed down when offset gets bigger, and with the optimization the speed is almost not affected. Similiar results are generated when testing reversed offset: ``` memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test +inf 0 byscore rev limit <offset> 1" ``` \| offset \| QPS(unstable) \| QPS(optimized) \| \|--------\|--------\|--------\| \| 10 \| 74505.14 \| 71653.67 \| \| 1000 \| 46829.25 \| 72842.75 \| \| 2000 \| 28985.48 \| 73669.01 \| \| 5000 \| 11066.22 \| 73963.45 \| And the same conclusion is drawn from the tests of ZRANGE BYLEX.	2023-08-31 14:42:08 +03:00
Harkrishn Patro	42985b00ea	Test coverage for incr/decr operation on robj encoding type optimization (#12435 ) Additional test coverage for incr/decr operation. integer number could be present in raw encoding format due to operation like append. A incr/decr operation following it optimize the string to int encoding format.	2023-07-25 16:43:31 -07:00
Harkrishn Patro	34b95f752c	Add test case for APPEND command usage on integer value (#12429 ) Add test coverage to validate object encoding update on APPEND command usage on a integer value	2023-07-24 18:25:50 -07:00
Binbin	6b57241fa8	Revert zrangeGenericCommand negative offset check (#12377 ) The negative offset check was added in #9052, we realized that this is a non-mandatory breaking change and we would like to add it only in 8.0. This reverts PR #9052, will be re-introduced later in 8.0.	2023-07-02 20:38:30 +03:00
Binbin	d9c2ef8a72	zrangeGenericCommand add check for negative offset (#9052 ) Now we will check the offset in zrangeGenericCommand. With a negative offset, we will throw an error and return. This also resolve the issue of zeroing the destination key in case of the "store" variant when we input a negative offset. ``` 127.0.0.1:6379> set key value OK 127.0.0.1:6379> zrangestore key myzset 0 10 byscore limit -1 10 (integer) 0 127.0.0.1:6379> exists key (integer) 0 ``` This change affects the following commands: - ZRANGE / ZRANGESTORE / ZRANGEBYLEX / ZRANGEBYSCORE - ZREVRANGE / ZREVRANGEBYSCORE / ZREVRANGEBYLEX	2023-06-20 14:33:17 +03:00
Binbin	d306d86146	Fix ZRANK/ZREVRANK reply_schema description (#12331 ) The parameter name is WITHSCORE instead of WITHSCORES.	2023-06-20 11:15:40 +03:00
sundb	b00a235186	Use Reservoir Sampling for random sampling of dict, and fix hang during fork (#12276 ) ## Issue: When a dict has a long chain or the length of the chain is longer than the number of samples, we will never be able to sample the elements at the end of the chain using dictGetSomeKeys(). This could mean that SRANDMEMBER can be hang in and endless loop. The most severe case, is the pathological case of when someone uses SCAN+DEL or SSCAN+SREM creating an unevenly distributed dict. This was amplified by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Before, we will stop sampling when we reach the maximum number of samples, even if there is more data after the current chain. Now when we reach the maximum we use the Reservoir Sampling algorithm to fairly sample the end of the chain that cannot be sampled 2. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Issue was introduced (or became more severe) by #11692 Co-authored-by: Oran Agra <oran@redislabs.com>	2023-06-16 19:13:07 +03:00
Binbin	439b0315c8	Fix SPOP/RESTORE propagation when doing lazy free (#12320 ) In SPOP, when COUNT is greater than or equal to set's size, we will remove the set. In dbDelete, we will do DEL or UNLINK according to the lazy flag. This is also required for propagate. In RESTORE, we won't store expired keys into the db, see #7472. When used together with REPLACE, it should emit a DEL or UNLINK according to the lazy flag. This PR also adds tests to cover the propagation. The RESTORE test will also cover #7472.	2023-06-16 08:14:11 -07:00
Binbin	9dc6f93e9d	Add command being unblocked cause another command to get unblocked execution order test (#12324 ) * Add command being unblocked cause another command to get unblocked execution order test In #12301, we observed that if the `while(listLength(server.ready_keys) != 0)` in handleClientsBlockedOnKeys is changed to `if(listLength(server.ready_keys) != 0)`, the order of command execution will change. It is wrong to change that. It means that if a command being unblocked causes another command to get unblocked (like a BLMOVE would do), then the new unblocked command will wait for later to get processed rather than right away. It'll not have any real implication if we change that since we do call handleClientsBlockedOnKeys in beforeSleep again, and redis will still behave correctly, but we don't change that. An example: 1. $rd1 blmove src{t} dst{t} left right 0 2. $rd2 blmove dst{t} src{t} right left 0 3. $rd3 set key1{t}, $rd3 lpush src{t}, $rd3 set key2{t} in a pipeline The correct order would be: 1. set key1{t} 2. lpush src{t} 3. lmove src{t} dst{t} left right 4. lmove dst{t} src{t} right left 5. set key2{t} The wrong order would be: 1. set key1{t} 2. lpush src{t} 3. lmove src{t} dst{t} left right 4. set key2{t} 5. lmove dst{t} src{t} right left This PR adds corresponding test to cover it. * Add comment near while(listLength(server.ready_keys) != 0)	2023-06-16 15:39:00 +03:00
Wen Hui	1946083410	Removing the duplicate test case (#12310 ) Looks like the Zadd test case was copied to create Zincrby test case ,but missed to change the command.	2023-06-14 10:03:33 +03:00
Binbin	e7129e43e0	Fix XREADGROUP BLOCK stuck in endless loop (#12301 ) For the XREADGROUP BLOCK > scenario, there is an endless loop. Due to #11012, it keep going, reprocess command -> blockForKeys -> reprocess command The right fix is to avoid an endless loop in handleClientsBlockedOnKey and handleClientsBlockedOnKeys, looks like there was some attempt in handleClientsBlockedOnKeys but maybe not sufficiently good, and it looks like using a similar trick in handleClientsBlockedOnKey is complicated. i.e. stashing the list on the stack and iterating on it after creating a fresh one for future use, is problematic since the code keeps accessing the global list. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-06-13 13:27:05 +03:00
Oran Agra	2764dc3768	Optimize MSETNX to avoid double lookup (#11944 ) This is a redo of #11594 which got reverted in #11940 It improves performance by avoiding double lookup of the the key.	2023-05-28 10:58:29 +03:00
Wen Hui	1a188e4ed6	[BUG] Incorrect error msg for XREAD command (#12238 ) XREAD only supports a special ID of $ and XREADGROUP only supports ^. make sure not to suggest the wrong one when rerunning an error about unbalanced ID arguments Co-authored-by: Oran Agra <oran@redislabs.com>	2023-05-28 08:37:32 +03:00
Wen Hui	d664889992	Adding test case for hvals, hkeys, hexists against wrong type (#12198 ) HVALS, HKEYS and HEXISTS commands wrong type test cases were not covered so added the test cases.	2023-05-24 09:34:13 +03:00
Binbin	48757934ff	Performance improvement to ZADD and ZRANGESTORE, convert to skiplist and expand dict in advance (#12185 ) For zsets that will eventually be stored as the skiplist encoding (has a dict), we can convert it to skiplist ahead of time. This change checks the number of arguments in the ZADD command, and converts the data-structure if the number of new entries exceeds the listpack-max-entries configuration. This can cause us to over-allocate memory if there are duplicate entries in the input, which is unexpected. For ZRANGESTORE, we know the size of the zset, so we can expand the dict in advance, to avoid the temporary dict from being rehashed while it grows. Simple benchmarks shows it provides some 4% improvement in ZADD and 20% in ZRANGESTORE	2023-05-18 15:24:46 +03:00
Binbin	fd566f4050	Fix for set max entries edge case in setTypeCreate / setTypeMaybeConvert (#12183 ) In the judgment in setTypeCreate, we should judge size_hint <= max_entries. This results in the following inconsistencies: ``` 127.0.0.1:6379> config set set-max-intset-entries 5 set-max-listpack-entries 5 OK 127.0.0.1:6379> sadd intset_set1 1 2 3 4 5 (integer) 5 127.0.0.1:6379> object encoding intset_set1 "hashtable" 127.0.0.1:6379> sadd intset_set2 1 2 3 4 (integer) 4 127.0.0.1:6379> sadd intset_set2 5 (integer) 1 127.0.0.1:6379> object encoding intset_set2 "intset" 127.0.0.1:6379> sadd listpack_set1 a 1 2 3 4 (integer) 5 127.0.0.1:6379> object encoding listpack_set1 "hashtable" 127.0.0.1:6379> sadd listpack_set2 a 1 2 3 (integer) 4 127.0.0.1:6379> sadd listpack_set2 4 (integer) 1 127.0.0.1:6379> object encoding listpack_set2 "listpack" ``` This was introduced in #12019, added corresponding tests.	2023-05-16 11:32:21 -07:00
Wen Hui	e45272884e	Adding missing test case for smembers scard commands (#12148 ) Minor missing test case addition. SMEMBERS SCARD against non set SMEMBERS SCARD against non existing key	2023-05-16 09:26:49 -07:00
JJ Lu	11cf5cbdcc	Fix bug: LPOS RANK LONG_ MIN causes overflow (#12167 ) Limit the range of RANK to -LONG_ MAX ~ LONG_ MAX. Without this limit, passing -9223372036854775808 would effectively be the same as passing -1.	2023-05-14 09:04:33 +03:00
Wen Hui	42dd98ec19	adding missing test cases GET and GETEX (#12125 ) adding test case of expired key or not exist for GET and GETEX. for better test coverage.	2023-05-07 11:46:11 +03:00
Wen Hui	f32d1817e3	Updating missing test cases for Hash commands (#12116 ) Adding missing test case against wrong type for HRANDFIELD HGET HGETALL HDEL HINCRBY HINCRBYFLOAT HSTRLEN.	2023-05-01 21:00:07 +03:00
Wen Hui	091412cf62	add test cases for decr decrby on missing key (#12070 ) Minor test case addition for DECR and DECRBY. Currently DECR and DECRBY do not have test case coverage for the scenarios where they run on a non-existing key.	2023-04-19 09:55:56 +03:00
Wen Hui	d2db4aa753	Added getrange missing testcase (#12061 ) Minor test case addition. Currently GETRANGE command does not have the test case coverage for the scenarios: An error is returned when key exists but of different type Added missing test cases for getrange command.	2023-04-18 08:32:10 +03:00
Wen Hui	4375b01cc7	Adding missing test cases for substring (#12039 ) There is are some missing test cases for SUBSTR command. These might already be covered by GETRANGE, but no harm in adding them since they're simple. Added 3 test case. * start > stop * start and stop both greater than string length * when no key is present.	2023-04-13 21:48:26 +03:00
Wen Hui	bc82309ceb	Adding missing test cases for linsert command (#12040 ) Currently LINSERT command does not have the test case coverage for following scenarios. 1. When key does not exist, it is considered an empty list and no operation is performed. 2. An error is returned when key exists but does not hold a list value. Added above two missing test cases for linsert command.	2023-04-13 19:05:41 +03:00

1 2 3 4 5 ...

382 Commits