This PR replaces dict with hashtable in the ZSET datatype. Instead of
mapping key to score as dict did, the hashtable maps key to a node in
the skiplist, which contains the score. This takes advantage of
hashtable performance improvements and saves 15 bytes per set item - 24
bytes overhead before, 9 bytes after.
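A minimal sketch of the new lookup path (hedged: `hashtableFind` and the `zs->ht` field are illustrative stand-ins for the new hashtable API, not the exact PR code):

```c
/* The table entry is the skiplist node itself, not a dictEntry pair, so
 * resolving a member's score follows fewer pointers and pays no separate
 * per-entry score allocation. */
static int zsetScoreFromHashtable(zset *zs, sds member, double *score) {
    void *entry;
    if (!hashtableFind(zs->ht, member, &entry)) return 0; /* member absent */
    zskiplistNode *node = entry; /* value stored in the table is the node */
    *score = node->score;        /* the score lives in the skiplist node */
    return 1;
}
```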
Closes #1096
---------
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The new `hashtable` provides faster lookups and uses less memory than
`dict`.
A TCL test case "SRANDMEMBER with a dict containing long chain" is
deleted because it's covered by a hashtable unit test
"test_random_entry_with_long_chain", which is already present.
This change also moves some logic from dismissMemory (object.c) to
zmadvise_dontneed (zmalloc.c), so the hashtable implementation which
needs the dismiss functionality doesn't need to depend on object.c and
server.h.
This PR follows #1186.
---------
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Instead of a dictEntry with pointers to key and value, the hashtable
has a pointer directly to the value (robj) which can hold an embedded
key and acts as a key-value in the hashtable. This minimizes the number
of pointers to follow and thus the number of memory accesses to lookup
a key-value pair.
Keys hashtable       robj
+-------+       +-----------------------+
|   0   |       | type, encoding, LRU   |
|   1   |------>| refcount, expire      |
|   2   |       | ptr                   |
|  ...  |       | optional embedded key |
+-------+       | optional embedded val |
                +-----------------------+
The expire timestamp (TTL) is also stored in the robj, if any. The expire
hash table points to the same robj.
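To make the layout concrete, here is a conceptual sketch of the object described above (field names, widths and order are illustrative, not the literal struct):

```c
/* One allocation holds the object header, the optional expire time and
 * the embedded key, so a lookup no longer hops dictEntry -> key -> robj. */
typedef struct {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : 24;  /* LRU time or LFU data */
    int refcount;
    void *ptr;          /* value, or NULL when the value is embedded */
    /* Optional trailing parts, present only when flagged in the header:
     *   long long expire;                  -- TTL, shared with the expire table
     *   unsigned char keylen; char key[];  -- embedded key bytes */
} robj_sketch;
```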
Overview of changes:
* Replace dict with hashtable in kvstore (kvstore.c)
* Add functions for embedding key and expire in robj (object.c)
* When there's unused space, reserve an expire field to avoid reallocating
it later if an expire is added.
* Always reserve space for expire for large key names to avoid realloc
if it's set later.
* Update db functions (db.c)
* dbAdd, setKey and setExpire reallocate the object when embedding a key
* setKey does not increment the reference counter, since it would require
duplicating the object. This responsibility is moved to the caller.
* Remove logic for shared integer objects as values in the database. The keys
are now embedded in the objects, so all objects in the database need to be
unique. Thus, we can't use shared objects as values. Also delete test cases
for shared integers.
* Adjust various commands to the changes mentioned above.
* Adjust defrag code
* Improvement: Don't access the expires table before defrag has actually
reallocated the object.
* Adjust test cases that were using hard-coded sizes for dict when realloc
would happen, and some other adjustments in test cases.
* Adjust memory prefetch for new hash table implementation in IO-threading,
using new `hashtableIncrementalFind` API
* Adjust offloading of free() to IO threads: the object itself is now freed in
the main thread, while freeing obj->ptr is still offloaded to the IO thread,
since the DB object is now allocated by the main thread and not by the
IO thread as it used to be.
* Let expireIfNeeded take an optional value, to avoid looking up the expires
table when possible.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
ZRANK is a widely used command for workloads using sorted sets. For
example, in leaderboards it enables querying the specific rank of a player.
The way ZRANK is currently implemented is:
1. locate the element in the SortedSet hashtable.
2. take the score of the element and use it in order to locate the
element in the SkipList (when listpack encoding is not used)
3. During the skiplist scan for the element we keep the path and use it to
sum the span at each path node in order to calculate the element's rank
One problem with this approach is that it involves multiple compare
operations in order to locate the element. String comparison in
particular can be expensive, since it requires accessing multiple memory
locations for the items the element string is compared against.
Perf analysis showed this can take up to 20% of the rank scan time. (TBD
- provide the perf results for example)
We can improve the rank search by taking advantage of the fact that the
element node in the skiplist is pointed to by the hashtable value!
Our skiplist implementation uses FatKeys, where each added node is
assigned a randomly chosen height. Say we keep a height record for every
skiplist element. In order to get an element's rank we simply:
1. locate the element in the SortedSet hashtable.
2. we go directly to the node in the skiplist.
3. we jump to the full height of the node and take the span value.
4. we continue going forward, always jumping to the highest level of
each node we reach, making sure to sum all the spans.
5. we subtract the summed spans from the skiplist length and we now have
the specific node's rank. :)
In order to test this method I created several benchmarks. All
benchmarks used the same seeds and the lists contained 1M elements.
Since a very important factor is the number of distinct scores relative
to the number of elements (a small ratio means more string compares
during searches), each benchmark used a different number of scores
(1, 10K, 100K, 1M).
some results:
**TPS**
Scores range | non-optimized | optimized | gain
-- | -- | -- | --
1 | 416042 | 605363 | 45.51%
10K | 359776 | 459200 | 27.63%
100K | 380387 | 459157 | 20.71%
1M | 416059 | 450853 | 8.36%
**Latency**
Scores range | non-optimized | optimized | gain
-- | -- | -- | --
1 | 1.191000 | 0.831000 | -30.23%
10K | 1.383000 | 1.095000 | -20.82%
100K | 1.311000 | 1.087000 | -17.09%
1M | 1.191000 | 1.119000 | -6.05%
### Memory efficiency
Adding another field to each skiplist node could degrade memory
efficiency for large sorted sets. We use the fact that the recorded span
at level 0 is either 1 or zero (for the last node) for ALL nodes, so we
use wrappers to get a node's span and repurpose the level-0 span to hold
the node height.
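A minimal sketch of the rank walk described above (illustrative, not the exact PR code; `zslNodeHeight()` and `zslNodeSpan()` stand for the wrapper accessors from the memory-efficiency note, where the stored level-0 span holds the node height and the wrapper reports the real level-0 span as 1, or 0 for the last node):

```c
/* Rank of a node found via the hashtable: sum the spans from the node to
 * the tail, always traveling at the highest available level, then
 * subtract the sum from the list length. */
static unsigned long zslRankByNode(zskiplist *zsl, zskiplistNode *x) {
    unsigned long spans_after = 0;
    while (x) {
        int i = zslNodeHeight(x) - 1;           /* jump to the node's full height */
        while (i > 0 && x->level[i].forward == NULL) i--;
        if (x->level[i].forward == NULL) break; /* x is the last node */
        spans_after += zslNodeSpan(x, i);       /* nodes skipped by this hop */
        x = x->level[i].forward;
    }
    return zsl->length - spans_after;           /* 1-based rank */
}
```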
---------
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Fast_float is a C++ header-only library to parse doubles using SIMD
instructions. The purpose is to speed up sorted sets and other commands
that use doubles. A single-file copy of fast_float is included in this
repo. This introduces an optional dependency on a C++ compiler.
The use of fast_float is enabled at compile time using the make variable
`USE_FAST_FLOAT=yes`. It is disabled by default.
Fixes #1069.
---------
Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Parth <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Roshan Swain <swainroshan001@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This commit hopefully improves the formatting of the codebase by setting
ColumnLimit to 0 and hence stopping clang-format from trying to put as
much stuff in one line as possible.
This change enabled us to remove most of the `clang-format off` directives
and fixed a bunch of lines that looked like this:
```c
#define KEY \
VALUE /* comment */
```
Additionally, one pair of `clang-format off` / `clang-format on` comments
appeared in the wrong order, with `clang-format off` second, which left
formatting disabled for the rest of the file. This commit addresses this
issue as well.
Please tell me if anything in the changes seems off. If everything is
fine, I will add this commit to `.git-blame-ignore-revs` later.
---------
Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
Update references to the copyright being assigned to Salvatore, since it
was transferred to Redis Ltd. as per
https://github.com/valkey-io/valkey/issues/544.
---------
Signed-off-by: Pieter Cailliau <pieter@redis.com>
In the past implementation of `ZUNION` and `ZUNIONSTORE` commands, we
first create a temporary dict called `accumulator`. After adding all
member-score mappings to `accumulator`, we still need to convert
`accumulator` back to the final dict `dstzset->dict`. However, we can
directly use `dstzset->dict` to avoid the additional copy operation.
This PR removes the `accumulator` dict and directly uses `dstzset->dict`
to store the member-score mappings.
- **Test**
First, I added 300 unique elements to two sorted sets called
'zunion_test1' and 'zunion_test2'. Then, I tested `zunion` and
`zunionstore` on these two sorted sets. The test results shown below
indicate that the performance of both zunion and zunionstore improved
about 31%.
### ZUNION
#### unstable
```
./valkey-benchmark -P 10 -n 100000 zunion 2 zunion_test1 zunion_test2
Summary:
throughput summary: 2713.41 requests per second
latency summary (msec):
avg min p50 p95 p99 max
146.252 3.464 153.343 182.015 184.959 192.895
```
#### This PR
```
./valkey-benchmark -P 10 -n 100000 zunion 2 zunion_test1 zunion_test2
Summary:
throughput summary: 3543.84 requests per second
latency summary (msec):
avg min p50 p95 p99 max
108.259 2.984 114.239 141.695 145.151 160.255
```
### ZUNIONSTORE
#### unstable
```
./valkey-benchmark -P 10 -n 100000 zunionstore out 2 zunion_test1 zunion_test2
Summary:
throughput summary: 3168.07 requests per second
latency summary (msec):
avg min p50 p95 p99 max
157.511 3.368 183.167 189.311 193.535 231.679
```
#### This PR
```
./valkey-benchmark -P 10 -n 100000 zunionstore out 2 zunion_test1 zunion_test2
Summary:
throughput summary: 4144.73 requests per second
latency summary (msec):
avg min p50 p95 p99 max
120.374 2.648 141.823 149.119 153.855 183.167
```
---------
Signed-off-by: RayCao <zisong.cw@alibaba-inc.com>
Signed-off-by: zisong.cw <zisong.cw@alibaba-inc.com>
During the ZADD operation, a conversion from listpack to skiplist might
be necessary for the sorted set. Currently, the function
zsetTypeMaybeConvert only examines the number of elements but does not
check the max size of the elements. It is advisable to also check
value_len_hint for a more robust conversion mechanism.
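A sketch of the suggested check (hedged: the `val_len_hint` parameter and exact form follow this description, not the final code):

```c
/* Convert listpack -> skiplist ahead of time when either the expected
 * element count or the largest expected element exceeds the limits. */
void zsetTypeMaybeConvert(robj *zobj, size_t size_hint, size_t val_len_hint) {
    if (zobj->encoding == OBJ_ENCODING_LISTPACK &&
        (size_hint > server.zset_max_listpack_entries ||
         val_len_hint > server.zset_max_listpack_value))
        zsetConvert(zobj, OBJ_ENCODING_SKIPLIST);
}
```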
---------
Signed-off-by: RayCao <zisong.cw@alibaba-inc.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Remove the unused value duplicate API from dict. It's unused in the codebase and introduces unnecessary overhead.
---------
Signed-off-by: Eran Liberty <eran.liberty@gmail.com>
I have validated that these settings closely match the existing coding
style with one major exception on `BreakBeforeBraces`, which will be
`Attach` going forward. The mixed `BreakBeforeBraces` styles in the
current codebase are hard to imitate and also very odd IMHO - see below
```
if (a == 1) { /* Attach */
}
```
```
if (a == 1 ||
b == 2)
{ /* Why? */
}
```
Please do NOT merge just yet. Will add the github action next once the
style is reviewed/approved.
---------
Signed-off-by: Ping Xie <pingxie@google.com>
This includes comments used for module API documentation.
* Strategy for replacement: Regex search: `(//|/\*| \*|#).* ("|\()?(r|R)edis( |\.|'|\n|,|-|\)|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)`
* Don't edit copyright comments
* Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish
from newly licensed repository
* Replace "Redis Object" -> "Object"
* Exclude markdown for now
* Don't edit Lua scripting comments referring to redis.X API
* Replace "Redis Protocol" -> "RESP"
* Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-"
prefix
* Most other places, I use best judgement to either remove "Redis", or
replace with "the server" or "server"
Fixes #148
---------
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, we'd
better avoid this wrong conversion.
This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
In low memory situations, sending a big number of arguments (sets)
may cause OOM panic. Use ztrycalloc, like we do on LCS and XAUTOCLAIM,
and fail gracefully.
This change affects the following commands: ZUNION, ZINTER, ZDIFF,
ZUNIONSTORE, ZINTERSTORE, ZDIFFSTORE, ZINTERCARD.
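The pattern is roughly the following (a sketch; the error message text is illustrative):

```c
/* ztrycalloc returns NULL on failure instead of panicking, so a huge
 * argument count degrades to an error reply rather than an OOM abort. */
zsetopsrc *src = ztrycalloc(sizeof(zsetopsrc) * setnum);
if (src == NULL) {
    addReplyError(c, "Insufficient memory, failed allocating transient memory");
    return;
}
```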
The function `tryResizeHashTables` only attempts to shrink the dicts
that have keys (a change from #11695). This was a serious problem until the
change in #12850, since it meant that if all keys were deleted, we would
never shrink the dict.
But still, both dictShrink and dictExpand may be blocked by a fork child
process, therefore, the cron job needs to perform both dictShrink and
dictExpand, for not just non-empty dicts, but all dicts in DBs.
What this PR does:
1. Try to resize all dicts in DBs (not just non-empty ones, as it was
since #12850)
2. handle both shrink and expand (not just shrink, as it was since
forever)
3. Refactor some APIs about dict resizing (get rid of `htNeedsShrink`
and `dictShrinkToFit`, and expose `dictShrinkIfNeeded` and
`dictExpandIfNeeded`, which already contain all the code of the
functions we get rid of, to make the APIs more neat)
4. In the `Don't rehash if redis has child process` test, now that cron
would do resizing, we no longer need to write to DB after the child
process got killed, and can wait for the cron to expand the hash table.
When we insert entries into dict, it may autonomously expand if needed.
However, when we delete entries from dict, it doesn't shrink to the
proper size. If there are few entries in a very large dict, it may cause
huge waste of memory and inefficiency when iterating.
The main keyspace dicts (keys and expires) are shrunk by cron
(`tryResizeHashTables` calls `htNeedsResize` and `dictResize`),
and some data structures such as zset and hash also do that (call
`htNeedsResize`) right after a loop of calls to `dictDelete`,
but many other dicts completely lack that call (they can only
expand).
In this PR, we provide the ability to automatically shrink the dict when
deleting. The conditions triggering the shrinking are the same as
`htNeedsResize` used to have, i.e. we expand when we're over 100%
utilization, and shrink when we're below 10% utilization.
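A rough sketch of those trigger conditions (helper and accessor names are illustrative):

```c
/* Expand at >= 100% utilization, shrink below 10% (the old htNeedsResize
 * thresholds), skipping tables already at the minimum size. */
static void dictResizeIfNeeded(dict *d) {
    unsigned long used = dictSize(d);
    unsigned long buckets = dictBuckets(d); /* total hash table slots */
    if (used >= buckets) {
        dictExpand(d, used + 1);            /* over 100% utilization */
    } else if (buckets > DICT_HT_INITIAL_SIZE && used * 100 < buckets * 10) {
        dictShrink(d, used);                /* below 10% utilization */
    }
}
```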
Additionally:
* Add `dictPauseAutoResize` so that flows that do mass deletions will
only trigger shrinkage at the end.
* Rename `dictResize` to `dictShrinkToFit` (same logic as it used to
have, but better name describing it)
* Rename `_dictExpand` to `_dictResize` (same logic as it used to have,
but better name describing it)
related to discussion
https://github.com/redis/redis/pull/12819#discussion_r1409293878
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Fix the `signalModifiedKey()` order: call it after the key modification is
completed, to ensure other systems observe the key's final, consistent state.
When a key is modified, Redis calls `signalModifiedKey` to notify other
systems, such as the watch system of transactions and the tracking
system of client side caching. However, in some commands, the
`signalModifiedKey` call happens during the key modification process
instead of after the key modification is completed. This can potentially
cause issues, as systems relying on `signalModifiedKey` may receive the
"write in flight" status of the key rather than its final state.
These commands include:
1. PFADD
2. LSET, LMOVE, LREM
3. ZPOPMIN, ZPOPMAX, BZPOPMIN, BZPOPMAX, ZMPOP, BZMPOP
Currently there is no problem with Redis, but it is better to adjust the
order of `signalModifiedKey()`, to avoid issues in future development on
Redis.
---------
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot. The main idea is splitting the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot-specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be insignificant. The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.
## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key from a random bucket, but also to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this, we introduced a binary index tree. With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating a dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using the same binary index tree that is used for random key selection; this index allows us to find the slot for a specific key index. For example, if there are 10 keys in slot 0, we can quickly find the slot that contains the 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to save not only the position within the dictionary but also the slot id. In this change we append the slot id into the LSB of the cursor so it can be passed around between client and server (a sketch of the cursor packing follows this list). This has an interesting side effect: you can now start scanning a specific slot by simply providing the slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth noting the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross-slot scripts and modules). We don't want to compute the checksum multiple times, hence we rely on the cached slot id in the client during command execution. All operations that access random keys should either pass in the known slot or recompute the slot.
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places. In order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep the DBSIZE operation O(1). The same is done to keep the expires computation O(1) as well.
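For the scan API item above, the cursor packing is conceptually the following (a sketch; helper names are hypothetical, and 16384 slots need 14 bits):

```c
#define SLOT_BITS 14
#define SLOT_MASK ((1ULL << SLOT_BITS) - 1)

/* The slot id rides in the cursor's low bits, so a cursor round-trips
 * through the client carrying both the dict position and the slot. */
static unsigned long long cursorPack(unsigned long long pos, int slot) {
    return (pos << SLOT_BITS) | ((unsigned long long)slot & SLOT_MASK);
}
static int cursorSlot(unsigned long long cursor) {
    return (int)(cursor & SLOT_MASK);
}
static unsigned long long cursorPos(unsigned long long cursor) {
    return cursor >> SLOT_BITS;
}
```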
## Performance
This change improves SET performance in cluster mode by ~5%; most of the gains come from not having to maintain linked lists for keys in a slot. Non-cluster mode has the same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead of finding keys to evict.
RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.
## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information.
---------
Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
ZRANGE BYSCORE/BYLEX with the [LIMIT offset count] option was
using every level in the skiplist to jump to the first/last node in range,
but only level[0] to locate the node at the offset, resulting
in sub-optimal performance when LIMIT is used:
```
while (ln && offset--) {
if (reverse) {
ln = ln->backward;
} else {
ln = ln->level[0].forward;
}
}
```
It could be slow when offset is very big. We can get the total rank of
the offset location and use skiplist to jump to it. It is an improvement
from O(offset) to O(log rank).
Below shows how this is implemented (if the offset is positive):
Use the skiplist to search for the first element in the range and record its
rank `rank_0`, so we can derive the rank of the target node `rank_t`.
Meanwhile we record the last node we visited which has zsl->level-1
levels, and its rank `rank_1`. Then we start from that zsl->level-1 node and
use the skiplist to go forward `rank_t-rank_1` nodes to reach the target node.
It is very similar when the offset is reversed.
Note that if `rank_t` is very close to `rank_0`, we just start from the first
element in range and go node by node; this is for the case when the zsl->level-1
node is too far away and it is quicker to reach the target by going node by node.
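Conceptually, the jump is equivalent to the following simplified sketch using the existing rank helpers (the actual patch records the ranks during the initial range search, so it doesn't need a second traversal):

```c
/* Instead of stepping level[0] `offset` times, compute the target rank
 * and let the skiplist jump there in O(log N). */
unsigned long rank_first = zslGetRank(zsl, ln->score, ln->ele);
if (offset > 0) ln = zslGetElementByRank(zsl, rank_first + offset);
```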
Here is a test using a randomly generated zset containing 10000 elements
(with different positive scores), running a benchmark which compares how
fast the `ZRANGE` command executes before and after the optimization.
The start score is set to 0 and the count is set to 1 to make sure that
most of the time is spent on locating the offset.
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test 0 +inf byscore limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 73386.02 | 74819.82 |
| 1000 | 48084.96 | 73177.73 |
| 2000 | 31156.79 | 72805.83 |
| 5000 | 10954.83 | 71218.21 |
With the results above, we can see that the original code is greatly
slowed down as the offset gets bigger, while with the optimization the
speed is almost unaffected.
Similar results are generated when testing a reversed offset:
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test +inf 0 byscore rev limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 74505.14 | 71653.67 |
| 1000 | 46829.25 | 72842.75 |
| 2000 | 28985.48 | 73669.01 |
| 5000 | 11066.22 | 73963.45 |
And the same conclusion is drawn from the tests of ZRANGE BYLEX.
The negative offset check was added in #9052. We realized
that this is a non-mandatory breaking change and we would
like to add it only in 8.0.
This reverts PR #9052; it will be re-introduced later in 8.0.
Now we will check the offset in zrangeGenericCommand.
With a negative offset, we will throw an error and return.
This also resolves the issue of zeroing the destination key
in case of the "store" variant when we input a negative offset.
```
127.0.0.1:6379> set key value
OK
127.0.0.1:6379> zrangestore key myzset 0 10 byscore limit -1 10
(integer) 0
127.0.0.1:6379> exists key
(integer) 0
```
This change affects the following commands:
- ZRANGE / ZRANGESTORE / ZRANGEBYLEX / ZRANGEBYSCORE
- ZREVRANGE / ZREVRANGEBYSCORE / ZREVRANGEBYLEX
We check lazyfree_lazy_server_del in sunionDiffGenericCommand
to see if we need to lazyfree the temp set. Now do the same in
zunionInterDiffGenericCommand to lazyfree the temp zset.
This is a minor change, follow #5903. Also improved the comments.
Additionally, avoid creating unused zset object in ZINTERCARD,
results in some 10% performance improvement.
Optimized the HRANDFIELD and ZRANDMEMBER commands as in #8444,
CASE 3 under listpack encoding, and boosted the optimization to CASE 2.5.
CASE 2.5 is listpack only: sampling unique elements, in non-random order.
Listpack encoded hashes / zsets are meant to be relatively small, so
HRANDFIELD_SUB_STRATEGY_MUL / ZRANDMEMBER_SUB_STRATEGY_MUL
isn't necessary and we'd rather not make copies of the entries. Instead, we
emit them directly to the output buffer.
Simple benchmarks show it provides some 400% improvement in HRANDFIELD
and ZRANGESTORE, both in CASE 3.
Unrelated changes: remove useless setTypeRandomElements and fix a typo.
For zsets that will eventually be stored with the skiplist encoding (which
has a dict), we can convert them to skiplist ahead of time. This change
checks the number of arguments in the ZADD command, and converts the
data structure if the number of new entries exceeds the listpack-max-entries
configuration.
This can cause us to over-allocate memory if there are duplicate entries in the
input, which is unexpected.
For ZRANGESTORE, we know the size of the zset, so we can expand
the dict in advance, to keep the temporary dict from being rehashed
while it grows.
Simple benchmarks show it provides some 4% improvement in ZADD and 20% in ZRANGESTORE
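The ZRANGESTORE part reduces to something like this sketch (variable names illustrative):

```c
/* The result length is known before the copy, so size the destination
 * dict once up front instead of rehashing it repeatedly as it grows. */
zset *dstzset = dstobj->ptr;
dictExpand(dstzset->dict, result_length);
```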
Related to the hang reported in #11671
Currently, redis can disconnect a client due to reaching output buffer limit,
it'll also avoid feeding that output buffer with more data, but it will keep
running the loop in the command (despite the client already being marked for
disconnection)
This PR is an attempt to mitigate the problem, specifically for commands that
are easy to abuse, namely: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang redis.
For KEYS the caller can use the existing keyspace in redis (if big enough).
*TL;DR*
---------------------------------------
Following the discussion over the issue [#7551](https://github.com/redis/redis/issues/7551)
We decided to refactor the client blocking code to eliminate some of the code duplications
and to rebuild the infrastructure better for future key blocking cases.
*In this PR*
---------------------------------------
1. reprocess the command once a client becomes unblocked on a key (instead of running
custom code for the unblocked path that's different from the one that would have run if
blocking wasn't needed)
2. eliminate some (now) irrelevant code for handling unblocking lists/zsets/streams etc...
3. modify some tests to intercept the error in cases of error on reprocess after unblock (see
details in the notes section below)
4. replace '$' in the client argv with the current stream id. Once we reprocess the stream
XREAD we need to read from the last msg and not wait for a new msg, in order to prevent
an endless block loop.
5. Added statistics to the info "Clients" section to report the:
* `total_blocking_keys` - number of blocking keys
* `total_blocking_keys_on_nokey` - number of blocking keys which have at least 1 client
that would like to be unblocked when the key is deleted.
6. Avoid expiring the unblocked key during unblock. Previously we used to look up the
unblocked key, which might have been expired during the lookup. Now we look up the key
using NOTOUCH and NOEXPIRE to avoid deleting it at this point, so propagating commands
in blocked.c is no longer needed.
7. deprecated command flags. We decided to remove the CMD_CALL_STATS and CMD_CALL_SLOWLOG
and make an explicit verification in the call() function in order to decide if stats update should take place.
This should simplify the logic and also mitigate existing issues: for example module calls which are
triggered as part of AOF loading might still report stats even though they are called during AOF loading.
*Behavior changes*
---------------------------------------------------
1. As this implementation prevents writing dedicated code handling unblocked streams/lists/zsets,
since we now re-process the command once the client is unblocked, some errors will be reported differently.
The old implementation used to issue
``UNBLOCKED the stream key no longer exists``
in the following cases:
- The stream key has been deleted (ie. calling DEL)
- The stream and group existed but the key type was changed by overriding it (ie. with set command)
- The key no longer exists after we swapdb with a db which does not contain this key
- After swapdb when the new db has this key but with a different type.
In the new implementation the reported errors will be the same as if the command was processed after the effect:
**NOGROUP** - in case the key no longer exists, or **WRONGTYPE** in case the key was overridden with a different type.
2. Reprocessing the command means that some checks will be reevaluated once the
client is unblocked.
For example, ACL rules might change since the command originally was executed and
will fail once the client is unblocked.
Another example is OOM condition checks which might enable the command to run and
block but fail the command reprocess once the client is unblocked.
3. One of the changes in this PR is that no command stats are being updated once the
command is blocked (all stats will be updated once the client is unblocked). This implies
that when we have many clients blocked, users will no longer be able to get that information
from the command stats. However the information can still be gathered from the client list.
**Client blocking**
---------------------------------------------------
The blocking on a key will still be triggered the same way as it is done today.
In order to block the current client on a list of keys, the call to
blockForKeys will still need to be made, and it will perform the same as it does today:
* add the client to the list of blocked clients on each key
* keep the key with a matching list node (position in the global blocking clients list for that key)
in the client private blocking key dict.
* flag the client with CLIENT_BLOCKED
* update blocking statistics
* register the client on the timeout table
**Key Unblock**
---------------------------------------------------
Unblocking a specific key will be triggered (same as today) by calling signalKeyAsReady.
the implementation in that part will stay the same as today - adding the key to the global readyList.
The reason to maintain the readyList (as opposed to iterating over all clients blocked on the specific key)
is to keep the signal operation as short as possible, since it is called during command processing.
The main change is that instead of going through a dedicated code path that operates the blocked command
we will just call processPendingCommandsAndResetClient.
**ClientUnblock (keys)**
---------------------------------------------------
1. Unblocking clients on keys will be triggered after command is
processed and during the beforeSleep
2. The general schema is:
3. For each key *k* in the readyList:
```
For each client *c* which is blocked on *k*:
in case either:
1. *k* exists AND the *k* type matches the current client blocking type
OR
2. *k* exists and *c* is blocked on module command
OR
3. *k* does not exist and *c* was blocked with the flag
unblock_on_deleted_key
do:
1. remove the client from the list of clients blocked on this key
2. remove the blocking list node from the client blocking key dict
3. remove the client from the timeout list
4. queue the client on the unblocked_clients list
5. *NEW*: call processCommandAndResetClient(c);
```
*NOTE:* for module blocked clients we will still call the moduleUnblockClientByHandle
which will queue the client for processing in moduleUnblockedClients list.
**Process Unblocked clients**
---------------------------------------------------
The processing of all unblocked clients is done in beforeSleep and no change is planned
in that part.
The general schema will be:
For each client *c* in server.unblocked_clients:
* remove client from the server.unblocked_clients
* set back the client readHandler
* continue processing the pending command and input buffer.
*Some notes regarding the new implementation*
---------------------------------------------------
1. Although it was proposed, it is currently difficult to remove the
read handler from the client while it is blocked.
The reason is that a blocked client should be unblocked when it is
disconnected, or we might consume data into the void.
2. While this PR mainly keeps the current blocking logic as-is, there
might be some future additions to the infrastructure that we would
like to have:
- allow non-preemptive blocking of a client - sometimes we can think
that a new kind of blocking can be expected not to be preempted. For
example, let's imagine we hold some keys on disk and when a command
needs to process them it will block until the keys are uploaded.
In this case we will want the client to not be disconnected or
unblocked until the process is completed (remove the client read
handler, prevent client timeout, disable unblock via debug command etc...).
- allow generic blocking based on command declared keys - we might
want to add a hook before command processing to check if any of the
declared keys require the command to block. this way it would be
easier to add new kinds of key-based blocking mechanisms.
Co-authored-by: Oran Agra <oran@redislabs.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
In #11290, we added listpack encoding for the SET object,
but forgot to support it in zuiFind, causing ZINTER, ZINTERSTORE,
ZINTERCARD, ZDIFF, ZDIFFSTORE to crash,
and forgot to support it in RM_ScanKey, causing it to hang.
This PR adds SET listpack support in zuiFind and in RM_ScanKey,
and adds tests for the related commands to cover this case.
Other changes:
- There is no reason for zuiFind to go into the internals of the SET.
It can simply use setTypeIsMember and not care about the encoding.
- Remove the `#include "intset.h"` from server.h to reduce the chance of
accidental intset API use.
- Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux
interfaces to the header.
- In scanGenericCommand, use setTypeInitIterator and setTypeNext
to handle OBJ_SET scan.
- In RM_ScanKey, improve the hash scan mode: use lpGetValue like zset,
so they can share code and get better performance.
The zuiFind part fixes #11578
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Add an option "withscore" to ZRANK and ZREVRANK.
Add the `[withscore]` option to both `zrank` and `zrevrank`, like this:
```
z[rev]rank key member [withscore]
```
Small sets with not only integer elements are listpack encoded, by default
up to 128 elements, max 64 bytes per element, controlled by the new configs
`set-max-listpack-entries` and `set-max-listpack-value`. This saves memory
for small sets compared to using a hashtable.
Sets with only integers, even very small sets, are still intset encoded (up to 1G
limit, etc.). Larger sets are hashtable encoded.
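For reference, the defaults described above would be expressed in the config like this (values taken from the description):

```
set-max-listpack-entries 128
set-max-listpack-value 64
```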
This PR increments the RDB version, and has an effect on OBJECT ENCODING
Possible conversions when elements are added:
intset -> listpack
listpack -> hashtable
intset -> hashtable
Note: No conversion happens when elements are deleted. If all elements are
deleted and then added again, the set is deleted and recreated, thus implicitly
converted to a smaller encoding.
The use case is a module that wants to implement a blocking command on a key that
necessarily exists and wants to unblock the client in case the key is deleted (much like
what we implemented for XREADGROUP in #10306)
New module API:
* RedisModule_BlockClientOnKeysWithFlags
Flags:
* REDISMODULE_BLOCK_UNBLOCK_NONE
* REDISMODULE_BLOCK_UNBLOCK_DELETED
### Detailed description of code changes
blocked.c:
1. Both module and stream functions are called whether the key exists or not, regardless of
its type. We do that in order to allow modules/stream to unblock the client in case the key
is no longer present or has changed type (the behavior for streams didn't change, just code
that moved into serveClientsBlockedOnStreamKey)
2. Make sure afterCommand is called in serveClientsBlockedOnKeyByModule, in order to propagate
actions from moduleTryServeClientBlockedOnKey.
3. handleClientsBlockedOnKeys: call propagatePendingCommands directly after lookupKeyReadWithFlags
to prevent a possible lazy-expire DEL from being mixed with any command propagated by the
preceding functions.
4. blockForKeys: The caller can specify that it wants to be awakened if the key is deleted.
Minor optimizations (use dictAddRaw).
5. signalKeyAsReady became signalKeyAsReadyLogic which can take a boolean in case the key is deleted.
It will only signal if there's at least one client that awaits key deletion (to save calls to
handleClientsBlockedOnKeys).
Minor optimizations (use dictAddRaw)
db.c:
1. scanDatabaseForDeletedStreams is now scanDatabaseForDeletedKeys and will signalKeyAsReady
for any key that was removed from the database or changed type. It is the responsibility of the code
in blocked.c to ignore or act on deleted/type-changed keys.
2. Use the new signalDeletedKeyAsReady where needed
blockedonkey.c + tcl:
1. Added test of new capabilities (FSL.BPOPGT now requires the key to exist in order to work)
Currently, we add -flto to the compile flags only, but we are supposed
to add it to the linker flags as well; the Clang build fails because of this.
Added a change to add -flto to REDIS_CFLAGS and REDIS_LDFLAGS
if the build optimization flag is -O3 (a noopt build will not use -flto).
When we know the size of the zset we're going to store in advance,
we can check if it's greater than the listpack encoding threshold,
in which case we can create a skiplist from the get-go, and avoid
converting the listpack to a skiplist later after it was already populated.
When `zrangestore` is called, the destination container object is created.
Before this PR we used to create a listpack-based object even if `zset-max-ziplist-entries`
or the equivalent `zset-max-listpack-entries` was set to 0.
This triggered an immediate conversion of the listpack into a skiplist in `zrangestore`,
which hit an assertion, resulting in an engine crash.
Added a TCL test that reproduces this issue.
This bug was introduced in #9484 (7.0.0).
It resulted in BZMPOP blocking on non-key arguments.
With `bzmpop 0 1 myzset min count 10`, the command would additionally
block on these arguments (all except the first and the last) as if they were keys, and could return their values:
- 0: timeout value
- 1: numkeys value
- min: min/max token
- count: count token
Fix some typos in "t_zset.c":
1. `zzlisinlexrange`: the function name mentioned in the comment is misspelled.
2. Fix a typo in the function name `zarndmemberReplyWithListpack` -> `zrandmemberReplyWithListpack`.
When the score doesn't have a fractional part and can be stored as an integer,
we use the integer capabilities of listpack to store it, rather than converting it to a string.
This already existed before this PR (lpInsert does that conversion implicitly),
but to do that, we would first convert the score from double to string (calling `d2string`),
then pass the string to `lpAppend`, which identified it as an integer and converted it back to an int.
Now, instead of converting it to a string, we store it directly using `lpAppendInteger`.
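The change boils down to something like this sketch (buffer size illustrative):

```c
/* Skip the double -> string -> integer round trip when the score is
 * integral and fits a long long; otherwise fall back to d2string. */
long long ll;
if (double2ll(score, &ll)) {
    zl = lpAppendInteger(zl, ll);
} else {
    char buf[128];
    int len = d2string(buf, sizeof(buf), score);
    zl = lpAppend(zl, (unsigned char *)buf, len);
}
```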
Unrelated:
---
* Fix the double2ll range check (the negative and positive ranges, and also the comparison
operands, were slightly off. But also, the range could be made much larger, see comment).
* Unify the double to string conversion code in rdb.c with the one in util.c
* Small optimization in lpStringToInt64, don't attempt to convert strings that are obviously too long.
Benchmark:
---
Up to 20% improvement in certain tight loops doing zzlInsert with large integers.
(if listpack is pre-allocated to avoid realloc, and insertion is sorted from largest to smaller)
This PR fixes the following minor errors before the Redis 7 release:
The ZRANGEBYLEX command was deprecated in 6.2.0 and can be replaced by ZRANGE with the
BYLEX argument, but in the documentation the replacement was written incorrectly as "by ZRANGE with the BYSCORE argument"
Fix an incorrect comment on the function zpopmaxCommand
The comments of the functions zmpopCommand and bzmpopCommand are not consistent with the documented description; fix them
Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com>
There are a few places that use a hard-coded constant of 128 to allocate a buffer for d2string.
Replace these with a clear macro.
Note that in theory, converting a double into a string could take nearly 400 chars,
but since d2string uses `%g` and not `%f`, it won't exceed some 40 chars.
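A sketch of the replacement (the macro name here is assumed; the bound keeps the old 128):

```c
#define MAX_D2STRING_CHARS 128

/* Callers size the buffer with the macro instead of a bare 128. */
char buf[MAX_D2STRING_CHARS];
int len = d2string(buf, sizeof(buf), value);
```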
Unrelated:
restore some changes to the auto-generated commands.c that got accidentally reverted in #10293
1. Since ZSKIPLIST_P is a float, using it directly inside the condition used to cause floating point code to be emitted (gcc/x86)
2. In some operating systems (e.g. Windows), the largest value returned from random() is 0x7FFF (15 bits), so after the bitwise AND with 0xFFFF, the probability of the less-than comparison in the while loop's condition returning true is no longer equal to ZSKIPLIST_P.
3. In case some library has random() returning an int in the range [0~ZSKIPLIST_P*65535], the while loop would be an infinite loop.
4. On Linux, where RAND_MAX is higher than 0xFFFF, this change actually improves precision (despite not matching the result against a float value)
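The fix amounts to an integer-only comparison along these lines (a sketch matching the description; the threshold is computed once, so no floating point runs per iteration and the full random() range is covered regardless of RAND_MAX):

```c
int zslRandomLevel(void) {
    static const int threshold = ZSKIPLIST_P * RAND_MAX;
    int level = 1;
    /* probability P to promote each level, capped at the maximum */
    while (random() < threshold) level += 1;
    return (level < ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
```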
Avoid a deferred array reply in genericZrangebyrankCommand() when the consumer type is client,
i.e. any ZRANGE / ZREVRANGE (when rank is used).
This was a performance regression introduced in #7844 (v6.2), mainly affecting pipelined workloads.
Co-authored-by: Oran Agra <oran@redislabs.com>
Summary of changes:
1. Rename `redisCommand->name` to `redisCommand->declared_name`, it is a
const char * for native commands and SDS for module commands.
2. Store the [sub]command fullname in `redisCommand->fullname` (sds).
3. List subcommands in `ACL CAT`
4. List subcommands in `COMMAND LIST`
5. `moduleUnregisterCommands` now will also free the module subcommands.
6. RM_GetCurrentCommandName returns full command name
Other changes:
1. Add `addReplyErrorArity` and `addReplyErrorExpireTime`
2. Remove `getFullCommandName` function that now is useless.
3. Some cleanups about `fullname` since now it is SDS.
4. Delete `populateSingleCommand` function from server.h that is useless.
5. Added tests to cover this change.
6. Add some module unload tests and fix the leaks
7. Make error messages uniform, make sure they always contain the full command
name and that it's quoted.
8. Fixes some typos
see the history in #9504, fixes #10124
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
Writable replicas now no longer use the values of expired keys. Expired keys are
deleted when lookupKeyWrite() is used, even on a writable replica. Previously,
writable replicas could use the value of an expired key in write commands such
as INCR, SUNIONSTORE, etc..
This commit also sorts out the mess around the functions lookupKeyRead() and
lookupKeyWrite() so they now indicate what we intend to do with the key and
are not affected by the command calling them.
Multi-key commands like SUNIONSTORE, ZUNIONSTORE, COPY and SORT with the
store option now use lookupKeyRead() for the keys they're reading from (which will
not allow reading from logically expired keys).
This commit also fixes a bug where PFCOUNT could return a value of an
expired key.
Test modules commands have their readonly and write flags updated to correctly
reflect their lookups for reading or writing. Modules are not required to
correctly reflect this in their command flags, but this change is made for
consistency since the tests serve as usage examples.
Fixes #6842. Fixes #7475.
Part three of implementing #8702, following #8887 and #9366.
## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists at RDB loading time, an O(n) operation.
## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.
## Internal changes
1. Add `lpMerge` to merge two listpacks (same as `ziplistMerge`)
2. Add `lpRepr` to print info about a listpack, used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ).
It represents that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate the listpack entry size using overhead overestimation in `quicklistAllowInsert`.
We prefer an overestimation, which would at worst lead to a few bytes below the lowest limit of 4k.
## Improvements
1. Calling `lpShrinkToFit` after converting a ziplist to listpack, which was missed in #9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy'ing data.
## Bugfix
1. Fix a crash in `quicklistRepr` when the ziplist is compressed, introduced in #9366.
## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.
Co-authored-by: Oran Agra <oran@redislabs.com>