813 Commits

Author SHA1 Message Date
Viktor Söderqvist
e9b8970e72
Relaxed RDB version check ()
New config `rdb-version-check` with values:

* `strict`: Reject future RDB versions.
* `relaxed`: Try parsing future RDB versions and fail only when an
unknown RDB opcode or type is encountered.

This makes it possible for Valkey 8.1 to try to read a dump from, for
example, Valkey 9.0 or later on a best-effort basis. The conditions for
when this is expected to work can be defined when the future Valkey
versions are released. Loading is expected to fail in the following
cases:

* If the data set contains any new key types or other data elements not
supported by the current version.
* If the RDB contains new representations or encodings of existing key
types or other data elements.

This change also prepares for the next RDB version bump. A range of RDB
versions (12-79) is reserved, since it's expected to be used by foreign
software RDB versions, so Valkey will not accept versions in this range
even with the `relaxed` version check. The DUMP/RESTORE format has no
magic string; only the RDB version number.
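
As a rough illustration of how such a gate could work (the constant values, enum
and function names here are assumptions, not the actual Valkey code):

```
/* Illustrative sketch only; names and the current version constant are assumptions. */
#include <stdbool.h>

enum rdb_version_check { RDB_VERSION_CHECK_STRICT, RDB_VERSION_CHECK_RELAXED };

#define SUPPORTED_RDB_VERSION 11 /* illustrative value for the current format */

static bool rdb_version_acceptable(int rdbver, enum rdb_version_check mode) {
    if (rdbver >= 12 && rdbver <= 79) return false; /* reserved for foreign RDB software */
    if (rdbver <= SUPPORTED_RDB_VERSION) return true;
    /* Future version: strict rejects it up front; relaxed keeps parsing and
     * fails later only if an unknown RDB opcode or type is encountered. */
    return mode == RDB_VERSION_CHECK_RELAXED;
}
```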

This change also prepares for the magic string to change from REDIS to
VALKEY next time we bump the RDB version.

Related to .

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-27 18:44:24 +01:00
Nadav Gigi
f2510783f9
Accelerate hash table iterator with value prefetching ()
This PR builds upon the [previous entry prefetching
optimization](https://github.com/valkey-io/valkey/pull/1501) to further
enhance performance by implementing value prefetching for hashtable
iterators.

## Implementation
Modified `hashtableInitIterator` to accept a new flags parameter,
allowing control over iterator behavior.
Implemented conditional value prefetching within `hashtableNext` based
on the new `HASHTABLE_ITER_PREFETCH_VALUES` flag.
When the flag is set, hashtableNext now calls `prefetchBucketValues` at
the start of each new bucket, preemptively loading the values of filled
entries into the CPU cache.
The actual prefetching of values is performed using type-specific
callback functions implemented in `server.c`:
- For `robj`, the `hashtableObjectPrefetchValue` callback is used to
prefetch the value if it is not embedded.

This implementation is specifically focused on main database iterations
at this stage. Applying it to hashtables that hold other object types
should not be problematic, but its performance benefits for those cases
will need to be proven through testing and benchmarking.
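
As a rough sketch of the pattern described above (the bucket layout, flag value
and callback signature are illustrative assumptions, not the actual hashtable.c
or server.c code):

```
/* Illustrative sketch of the prefetch pattern; bucket layout and names are assumptions. */
#define ITER_PREFETCH_VALUES (1 << 0) /* stands in for HASHTABLE_ITER_PREFETCH_VALUES */

typedef struct bucket {
    int numentries;
    void *entries[7]; /* filled entries of one bucket */
} bucket;

typedef void (*prefetchValueFn)(void *entry);

/* At the start of each new bucket, touch the value of every filled entry so
 * it is already in cache when the iteration callback dereferences it. */
static void prefetchBucketValues(bucket *b, prefetchValueFn prefetch) {
    for (int i = 0; i < b->numentries; i++) prefetch(b->entries[i]);
}

static void nextBucketSketch(bucket *b, int iter_flags, prefetchValueFn prefetch) {
    if (iter_flags & ITER_PREFETCH_VALUES) prefetchBucketValues(b, prefetch);
    /* ... then hand the bucket's entries to the caller as usual ... */
}

/* Example callback for robj-like entries: only values stored outside the
 * object are worth prefetching; embedded values are already adjacent. */
struct example_obj { int encoding; void *ptr; };

static void examplePrefetchValue(void *entry) {
    struct example_obj *o = entry;
    if (o->ptr) __builtin_prefetch(o->ptr); /* skip if the value is embedded */
}
```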

## Performance

### Setup:
- 64-core Graviton 3 Amazon EC2 instance.
-  50 million keys with different value sizes.
-  Running the valkey server over a RAM file system.
-  CRC checksum and compression off.

### Action
- save command.

### Results
The durations of the “save” command below were taken from the output of
the “info all” command.
```
+--------------------+------------------+------------------+ 
| Prefetching        | Value size (byte)| Time (seconds)   | 
+--------------------+------------------+------------------+ 
| No                 | 100              | 20.112279        | 
| Yes                | 100              | 12.758519        | 
| No                 | 40               | 16.945366        | 
| Yes                | 40               | 10.902022        |
| No                 | 20               | 9.817000         | 
| Yes                | 20               | 9.626821         |
| No                 | 10               | 9.71510          | 
| Yes                | 10               | 9.510565         |
+--------------------+------------------+------------------+
```
The results largely align with our expectations, showing significant
improvements for larger values (100 bytes and 40 bytes) that are stored
outside the robj. For smaller values (20 bytes and 10 bytes) that are
embedded within the robj, we see almost no improvement, which is as
expected.

However, the small improvement observed even for these embedded values
is somewhat surprising. Given that we are not actively prefetching these
embedded values, this minor performance gain was not anticipated.

perf record on save command **without** value prefetching:
```
                --99.98%--rdbSaveDb
                          |          
                          |--91.38%--rdbSaveKeyValuePair
                          |          |          
                          |          |--42.72%--rdbSaveRawString
                          |          |          |          
                          |          |          |--26.69%--rdbWriteRaw
                          |          |          |          |          
                          |          |          |           --25.75%--rioFileWrite.lto_priv.0
                          |          |          |          
                          |          |           --15.41%--rdbSaveLen
                          |          |                     |          
                          |          |                     |--7.58%--rdbWriteRaw
                          |          |                     |          |          
                          |          |                     |           --7.08%--rioFileWrite.lto_priv.0
                          |          |                     |                     |          
                          |          |                     |                      --6.54%--_IO_fwrite
                          |          |                     |                                         
                          |          |                     |          
                          |          |                      --7.42%--rdbWriteRaw.constprop.1
                          |          |                                |          
                          |          |                                 --7.18%--rioFileWrite.lto_priv.0
                          |          |                                           |          
                          |          |                                            --6.73%--_IO_fwrite
                          |          |                                                            
                          |          |          
                          |          |--40.44%--rdbSaveStringObject
                          |          |          
                          |           --7.62%--rdbSaveObjectType
                          |                     |          
                          |                      --7.39%--rdbWriteRaw.constprop.1
                          |                                |          
                          |                                 --7.04%--rioFileWrite.lto_priv.0
                          |                                           |          
                          |                                            --6.59%--_IO_fwrite
                          |                                                               
                          |          
                           --7.33%--hashtableNext.constprop.1
                                     |          
                                      --6.28%--prefetchNextBucketEntries.lto_priv.0
```
perf record on save command **with** value prefetching:
```
               rdbSaveRio
               |          
                --99.93%--rdbSaveDb
                          |          
                          |--79.81%--rdbSaveKeyValuePair
                          |          |          
                          |          |--66.79%--rdbSaveRawString
                          |          |          |          
                          |          |          |--42.31%--rdbWriteRaw
                          |          |          |          |          
                          |          |          |           --40.74%--rioFileWrite.lto_priv.0
                          |          |          |          
                          |          |           --23.37%--rdbSaveLen
                          |          |                     |          
                          |          |                     |--11.78%--rdbWriteRaw
                          |          |                     |          |          
                          |          |                     |           --11.03%--rioFileWrite.lto_priv.0
                          |          |                     |                     |          
                          |          |                     |                      --10.30%--_IO_fwrite
                          |          |                     |                                |          
                          |          |                     |          
                          |          |                      --10.98%--rdbWriteRaw.constprop.1
                          |          |                                |          
                          |          |                                 --10.44%--rioFileWrite.lto_priv.0
                          |          |                                           |          
                          |          |                                            --9.74%--_IO_fwrite
                          |          |                                                      |          
                          |          |          
                          |          |--11.33%--rdbSaveObjectType
                          |          |          |          
                          |          |           --10.96%--rdbWriteRaw.constprop.1
                          |          |                     |          
                          |          |                      --10.51%--rioFileWrite.lto_priv.0
                          |          |                                |          
                          |          |                                 --9.75%--_IO_fwrite
                          |          |                                           |          
                          |          |          
                          |           --0.77%--rdbSaveStringObject
                          |          
                           --18.39%--hashtableNext
                                     |          
                                     |--10.04%--hashtableObjectPrefetchValue
                                     |
                                      --6.06%--prefetchNextBucketEntries        

```
Conclusions:

The prefetching strategy appears to be working as intended, shifting the
performance bottleneck from data access to I/O operations.
The significant reduction in rdbSaveStringObject time suggests that
string objects (which are the values) are being accessed more
efficiently.

Signed-off-by: NadavGigi <nadavgigi102@gmail.com>
2025-01-23 12:17:20 +01:00
uriyage
6c09eea2bc
client struct: lazy init components and optimize struct layout ()
# Refactor client structure to use modular data components

## Current State
The client structure allocates memory for replication / pubsub /
multi-keys / module / blocked data for every client, despite these
features being used by only a small subset of clients. In addition, the
current field layout in the client struct is suboptimal, with poor
alignment and unnecessary padding between fields, leading to a larger
than necessary memory footprint of 896 bytes per client. Furthermore,
fields that are frequently accessed together during operations are
scattered throughout the struct, resulting in poor cache locality.

## This PR's Change

1. Lazy Initialization (see the sketch after this list)
- **Components are only allocated when first used:**
  - PubSubData: Created on first SUBSCRIBE/PUBLISH operation
  - ReplicationData: Initialized only for replica connections
  - ModuleData: Allocated when module interaction begins
  - BlockingState: Created when first blocking command is issued
  - MultiState: Initialized on MULTI command

2. Memory Layout Optimization:
   - Grouped related fields for better locality
   - Moved rarely accessed fields (e.g., client->name) to struct end
   - Optimized field alignment to eliminate padding

3. Additional changes:
   - Moved watched_keys to be statically allocated in the `mstate` struct
   - Relocated replication init logic to replication.c
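
The lazy initialization in item 1 might look roughly like the following
sketch; the component names come from the list above, but the struct layout
and helper are illustrative assumptions rather than the actual client code.

```
/* Illustrative sketch of the lazy-allocation pattern; not the real client struct. */
#include <stdlib.h>

typedef struct PubSubData { /* channels, patterns, ... */ int placeholder; } PubSubData;

typedef struct client {
    /* hot, always-present fields would go here */
    PubSubData *pubsub_data; /* NULL until the first SUBSCRIBE/PUBLISH */
    /* ReplicationData, ModuleData, BlockingState, MultiState: likewise lazy */
} client;

/* Allocate the pub/sub component only when a client first needs it. */
static PubSubData *clientGetPubSubData(client *c) {
    if (!c->pubsub_data) c->pubsub_data = calloc(1, sizeof(PubSubData));
    return c->pubsub_data;
}
```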
  

### Key Benefits
- **Efficient Memory Usage:**
- 45% smaller base client structure - Basic clients now use 528 bytes
(down from 896).
- Better memory locality for related operations
- Performance improvement in high throughput scenarios. No performance
regressions in other cases.


### Performance Impact

Tested with 650 clients and 512 bytes values.

#### Single Thread Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 261,799      | 258,261      | +1.37%    |
| SET        | 3M keys | 209,134      | ~209,000     | ~0%       |
| GET        | 1 key   | 281,564      | 277,965      | +1.29%    |
| GET        | 3M keys | 231,158      | 228,410      | +1.20%    |

#### 8 IO Threads Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 1,331,578    | 1,331,626    | -0.00%    |
| SET        | 3M keys | 1,254,441    | 1,152,645    | +8.83%    |
| GET        | 1 key   | 1,293,149    | 1,289,503    | +0.28%    |
| GET        | 3M keys | 1,152,898    | 1,101,791    | +4.64%    |

#### Pipeline Performance (3M keys)
| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|--------------|---------------|---------------|-----------|
| SET       | 10          | 548,964      | 538,498      | +1.94%    |
| SET       | 20          | 606,148      | 594,872      | +1.89%    |
| SET       | 30          | 631,122      | 616,606      | +2.35%    |
| GET       | 10          | 628,482      | 624,166      | +0.69%    |
| GET       | 20          | 687,371      | 681,659      | +0.84%    |
| GET       | 30          | 725,855      | 721,102      | +0.66%    |

### Observations:
1. Single-threaded operations show consistent improvements (1-1.4%)
2. Multi-threaded performance shows significant gains for large
datasets:
   - SET with 3M keys: +8.83% improvement
   - GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
   - SET operations: +1.89% to +2.35%
   - GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario


Related issue: https://github.com/valkey-io/valkey/issues/761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-08 10:28:54 +02:00
Ricardo Dias
8d764f27b3
Refactor: move all valkey modules related declarations to module.h ()
In this commit we move all structure and function declarations related
to Valkey modules from `server.h` to the recently added `module.h` file.

This re-organization makes it easier for new contributors to find the
valkey modules related code, as well as reducing the compilation times
when changes are made to the modules code.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-01-02 18:35:10 +01:00
Viktor Söderqvist
3eb8314be6 Replace dict with hashtable for keys, expires and pubsub channels
Instead of a dictEntry with pointers to key and value, the hashtable
has a pointer directly to the value (robj) which can hold an embedded
key and acts as a key-value in the hashtable. This minimizes the number
of pointers to follow and thus the number of memory accesses to lookup
a key-value pair.

        Keys         robj
      hashtable
      +-------+   +-----------------------+
      | 0     |   | type, encoding, LRU   |
      | 1 ------->| refcount, expire      |
      | 2     |   | ptr                   |
      | ...   |   | optional embedded key |
      +-------+   | optional embedded val |
                  +-----------------------+

The expire timestamp (TTL) is also stored in the robj, if any. The expire
hash table points to the same robj.
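
A conceptual sketch of that layout (the field packing and handling of the
optional parts are simplified assumptions; object.c holds the real
representation):

```
/* Conceptual sketch of the diagram above; the real object.c encoding differs. */
typedef struct valkey_object_sketch {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : 24;    /* LRU time or LFU data */
    int refcount;
    long long expire;     /* optional TTL stored in the same allocation */
    void *ptr;            /* value, unless embedded in the trailing bytes */
    char embedded[];      /* optional embedded key (and small value) */
} valkey_object_sketch;
```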

Overview of changes:

* Replace dict with hashtable in kvstore (kvstore.c)
* Add functions for embedding key and expire in robj (object.c)
  * When there's unused space, reserve an expire field to avoid reallocating
    it later if an expire is added.
  * Always reserve space for expire for large key names to avoid realloc
    if it's set later.
* Update db functions (db.c)
  * dbAdd, setKey and setExpire reallocate the object when embedding a key
  * setKey does not increment the reference counter, since it would require
    duplicating the object. This responsibility is moved to the caller.
* Remove logic for shared integer objects as values in the database. The keys
  are now embedded in the objects, so all objects in the database need to be
  unique. Thus, we can't use shared objects as values. Also delete test cases
  for shared integers.
* Adjust various commands to the changes mentioned above.
* Adjust defrag code
  * Improvement: Don't access the expires table before defrag has actually
    reallocated the object.
* Adjust test cases that were using hard-coded sizes for dict when realloc
  would happen, and some other adjustments in test cases.
* Adjust memory prefetch for new hash table implementation in IO-threading,
  using new `hashtableIncrementalFind` API
* Adjust offloading of free() to IO threads: Object free to be done in main
  thread while keeping obj->ptr offloading in IO-thread since the DB object is
  now allocated by the main-thread and not by the IO-thread as it used to be.
* Let expireIfNeeded take an optional value, to avoid looking up the expires
  table when possible.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
2024-12-10 21:30:56 +01:00
Viktor Söderqvist
99865b197c
Fix bug for CLUSTER SLOTS from EVAL over TLS ()
For fake clients like the ones used for Lua and modules, we don't
determine TLS in the right way, causing CLUSTER SLOTS from EVAL over TLS
to fail a debug-assert.

This error was introduced when the caching of CLUSTER SLOTS was
introduced, i.e. in 8.0.0.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-25 03:55:53 -04:00
Shivshankar
ba71c7e56e
Copy 'errno' and use copied value in the if check of retry in cluster migrate commands socket_err block. ()
errno is a global variable shared with system calls, so there is a
chance it may be overwritten during the I/O free or socket close in the
migrate command code. It is better to copy it before the free or socket
close and use the copied value to check for a retry in the socket_err
block. So a new variable was added to take the copy, and the copy
variable is used for the check.
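
A minimal sketch of the pattern, with illustrative retry conditions (the real
migrate command logic differs in detail):

```
/* Sketch of the errno-saving pattern; the retry conditions are illustrative. */
#include <errno.h>
#include <unistd.h>

static int write_failed_should_retry(int fd) {
    int saved_errno = errno; /* copy before any call that may clobber errno */
    close(fd);               /* cleanup (close/free) may overwrite errno */
    /* ... free I/O buffers ... */
    return saved_errno == EAGAIN || saved_errno == EINTR; /* illustrative retry check */
}
```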

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-18 10:34:11 +08:00
Kyle Kim (kimkyle@)
2d1eca577e
Add SLOT-STATS under CLUSTER HELP string. ()
Add help wording for cluster SLOT-STATS.

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
2024-09-03 12:59:06 -07:00
Binbin
e3af1a30e4
Fast path in SET if the expiration time is expired ()
If the expiration time passed to SET is already expired, for example
because of the machine time (DTS) or a wrong expiration argument, we
don't need to set the key and then wait for the active expire scan to
delete it.

Compared with the previous behavior:
1. If the key does not exist, previously we would set the key and wait
for the active expire to delete it, so it is a set + del from the
perspective of propagation. Now we do not set the key and just return,
so it is a NOP.

2. If the key exists, previously we would set the key and wait for the
active expire to delete it, so it is a set + del from the perspective
of propagation. Now we delete it and return, so it is a del.

Adding a new deleteExpiredKeyFromOverwriteAndPropagate function
to reduce the duplicate code.
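
A rough sketch of the fast path; apart from
deleteExpiredKeyFromOverwriteAndPropagate, which the description above names,
all names and signatures are assumptions:

```
/* Illustrative sketch; the helper's real signature and the surrounding code differ. */
#include <stdbool.h>

typedef struct { long long expire_at_ms; bool key_exists; } set_ctx;

extern long long now_ms(void);                                   /* hypothetical clock helper */
extern void deleteExpiredKeyFromOverwriteAndPropagate(void *key); /* from this PR; signature assumed */

/* Returns true if the SET was fully handled by the fast path. */
static bool setExpiredFastPath(set_ctx *ctx, void *key) {
    if (ctx->expire_at_ms > now_ms()) return false; /* not expired: do a normal SET */
    if (!ctx->key_exists) return true;              /* nothing to do: a NOP, nothing propagated */
    deleteExpiredKeyFromOverwriteAndPropagate(key); /* existing key: delete and propagate a DEL */
    return true;
}
```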

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-31 22:39:07 +08:00
Ping Xie
ad0ede302c
Exclude '.' and ':' from isValidAuxChar's banned charset ()
Fix a bug in isValidAuxChar where valid characters '.' and ':' were
incorrectly included in the banned charset. This issue affected the
validation of auxiliary fields in the nodes.conf file used by Valkey in
cluster mode, particularly when handling IPv4 and IPv6 addresses. The
code now correctly allows '.' and ':' as valid characters, ensuring
proper handling of these fields. Comments were added to clarify the use
of the banned charset.
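
Purely as a hypothetical sketch of the shape of such a check (the actual banned
character set in the cluster code is different; the point is only that '.' and
':' must not be part of it):

```
/* Hypothetical sketch only; the real banned set differs. '.' and ':' are not
 * in it, so IPv4 ("10.0.0.1") and IPv6 ("::1") values in nodes.conf aux
 * fields validate correctly. */
#include <string.h>

static int isValidAuxCharSketch(int c) {
    static const char *banned = "{}\"'`,\\ \r\n\t"; /* illustrative; excludes '.' and ':' */
    return strchr(banned, c) == NULL;
}
```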
 
Related to 

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-08-28 23:35:31 -07:00
Pieter Cailliau
4d284daefd
Copyright update to reflect IP transfer from salvatore to Redis ()
Update references of copyright being assigned to Salvatore when it was
transferred to Redis Ltd. as per
https://github.com/valkey-io/valkey/issues/544.

---------

Signed-off-by: Pieter Cailliau <pieter@redis.com>
2024-08-14 09:20:36 -07:00
Kyle Kim (kimkyle@)
5000c050b5
Add cpu-usec metric support under CLUSTER SLOT-STATS command (). ()
The metric tracks CPU time in microseconds, sharing the same value as
`INFO COMMANDSTATS`, aggregated under per-slot context.

---------

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-22 18:03:28 -07:00
Viktor Söderqvist
a323dce890
Dual stack and client-specific IPs in cluster ()
New configs:

* `cluster-announce-client-ipv4`
* `cluster-announce-client-ipv6`

New module API function:

* `ValkeyModule_GetClusterNodeInfoForClient`, takes a client id and is
otherwise just like its non-ForClient cousin.

If configured, one of these IP addresses are reported to each client in
CLUSTER SLOTS, CLUSTER SHARDS, CLUSTER NODES and redirects, replacing
the IP (`cluster-announce-ip` or the auto-detected IP) of each node.
Which one is reported to the client depends on whether the client is
connected over IPv4 or IPv6.

Benefits:

* This allows clients using IPv4 to get the IPv4 addresses of all
cluster nodes and clients using IPv6 to get the IPv6 addresses.
* This allows the IPs visible to clients to be different from the IPs used
between the cluster nodes due to NAT'ing.

The information is propagated in the cluster bus using new Ping
extensions. (Old nodes without this feature ignore unknown Ping
extensions.)

This adds another dimension to CLUSTER SLOTS reply. It now depends on
the client's use of TLS, the IP address family and RESP version.
Refactoring: The cached connection type definition is moved from
connection.h (it actually has nothing to do with the connection
abstraction) to server.h and is changed to a bitmap, with one bit for
each of TLS, IPv6 and RESP3.
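
A small sketch of what such a bitmap might look like, with macro names and bit
positions assumed from the description above:

```
/* Sketch of the cached connection type bitmap; names and values are assumptions. */
#define CACHE_CONN_TYPE_TLS   (1 << 0)
#define CACHE_CONN_TYPE_IPv6  (1 << 1)
#define CACHE_CONN_TYPE_RESP3 (1 << 2)
#define CACHE_CONN_TYPE_MAX   (1 << 3) /* 8 combinations -> 8 cached CLUSTER SLOTS variants */

static int cachedConnectionType(int is_tls, int is_ipv6, int is_resp3) {
    return (is_tls ? CACHE_CONN_TYPE_TLS : 0) |
           (is_ipv6 ? CACHE_CONN_TYPE_IPv6 : 0) |
           (is_resp3 ? CACHE_CONN_TYPE_RESP3 : 0);
}
```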

Fixes 

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-10 13:53:52 +02:00
KarthikSubbarao
fa01a29365
Allow Module authentication to succeed when cluster is down ()
Module authentication using a blocking implementation currently gets
rejected when the cluster is down, by the client timeout cron job
(`clientsCronHandleTimeout`).

This PR exempts clients blocked on Module Authentication from being
rejected here.

---------

Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>
2024-07-01 13:59:06 -07:00
skyfirelee
e4c1f6d45a
Replace client flags with a bitfield () 2024-06-30 11:33:10 -07:00
zhaozhao.zz
4fbe31ab87
Fix the TLS and RESP issues with the CLUSTER SLOTS cache ()
PR  introduced the cache of CLUSTER SLOTS response, but the cache has
some problems for different types of clients:

1. the RESP version is wrongly ignored:

    ```
    $./valkey-cli
    127.0.0.1:6379> cluster slots
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6379
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    127.0.0.1:6379> hello 3
    1# "server" => "valkey"
    2# "version" => "255.255.255"
    3# "proto" => (integer) 3
    4# "id" => (integer) 3
    5# "mode" => "cluster"
    6# "role" => "master"
    7# "modules" => (empty array)
    127.0.0.1:6379> cluster slots
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6379
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    ```

    RESP3 should get an "empty hash" but gets RESP2's "empty array"

2. we should use the original client's connection type, otherwise Lua/function and
module calls would get the wrong port:

    ```
    $./valkey-cli --tls --insecure -p 6789
    127.0.0.1:6789> config get port tls-port
    1) "tls-port"
    2) "6789"
    3) "port"
    4) "6379"
    127.0.0.1:6789> cluster slots
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6789
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    127.0.0.1:6789> eval "return redis.call('cluster','slots')" 0
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6379
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    ```

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-06-28 14:56:13 +08:00
zhaozhao.zz
28c5a17edf
replica redirect read&write to primary in standalone mode ()
To implement  

1. a replica is able to redirect read and write commands to its primary
in standalone mode
    * reply with "-REDIRECT primary-ip:port"
2. add a subcommand `CLIENT CAPA redirect`, a client can announce the
capability to handle redirection
    * if a client can handle redirection, the data access commands (read and
write) will be redirected
3. allow the `readonly` and `readwrite` commands in standalone mode, which may be a
breaking change
    * a client with the redirect capability cannot process read commands on a
replica by default
    * the READONLY command can be used to allow read commands on a replica (see
the sketch below)
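
A rough server-side sketch of the redirect decision described above, with
illustrative type and field names:

```
/* Illustrative sketch; types, fields and the exact check order are assumptions. */
#include <stdio.h>
#include <stdbool.h>

struct client_sketch { bool capa_redirect; };
struct server_sketch { bool is_replica; bool cluster_enabled;
                       const char *primary_ip; int primary_port; };

/* Returns true if the command was answered with a redirect instead of being run. */
static bool maybeRedirectToPrimary(struct server_sketch *srv, struct client_sketch *c,
                                   bool is_data_access_cmd, char *err, size_t errlen) {
    if (srv->cluster_enabled || !srv->is_replica) return false; /* standalone replica only */
    if (!c->capa_redirect || !is_data_access_cmd) return false; /* client must announce CAPA redirect */
    snprintf(err, errlen, "-REDIRECT %s:%d", srv->primary_ip, srv->primary_port);
    return true;
}
```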

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-06-27 19:00:45 +08:00
Ping Xie
4135894a5d
Update remaining master references to primary ()
Signed-off-by: Ping Xie <pingxie@google.com>
2024-06-17 20:31:15 -07:00
Madelyn Olson
6faa48a358
Don't initialize the key buffer in getKeysResult ()
getKeysResults is typically initialized with 2kb of zeros (16 * 256),
which isn't strictly necessary since the only thing we have to
initialize is some of the metadata fields. The rest of the data can
remain junk as long as we don't access it. This was a bit of a
regression in 7.0 with the keyspecs, since we doubled the size of the
zeros, but hopefully this recovers a lot of the performance drop.
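
An approximate sketch of the idea, with field names modeled on (but not
guaranteed to match) the real struct:

```
/* Approximate sketch: only the metadata fields are set; the key buffer stays junk. */
typedef struct { int pos; int flags; } keyReference;
#define MAX_KEYS_BUFFER 256

typedef struct {
    keyReference keysbuf[MAX_KEYS_BUFFER]; /* left uninitialized until keys are written */
    keyReference *keys;
    int numkeys;
    int size;
} getKeysResultSketch;

#define GETKEYS_RESULT_INIT_SKETCH(r) \
    do {                              \
        (r)->keys = NULL;             \
        (r)->numkeys = 0;             \
        (r)->size = MAX_KEYS_BUFFER;  \
    } while (0)
```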

I saw a modest performance bump for deep pipelines in cluster mode (~8%).

I think we would see some comparable improvements in the other places
where we are using it such as tracking and ACLs.

---------

Signed-off-by: Madelyn Olson <matolson@amazon.com>
2024-06-14 08:42:00 -07:00
Ping Xie
54c9747935
Remove master and slave from source code ()
External facing interfaces are not affected.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-06-07 14:21:33 -07:00
Viktor Söderqvist
d72ba06dd0
Make cluster replicas return ASK and TRYAGAIN ()
After READONLY, make a cluster replica behave as its primary regarding
returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key
doesn't exist or if it has already been migrated to another shard as
part of an ongoing slot migration. Therefore, without an ASK redirect in
this situation, offloading reads to cluster replicas wasn't reliable.

Note: The target of a redirect is always a primary. If a client wants to
continue reading from a replica after following a redirect, it needs to
figure out the replicas of that new primary using CLUSTER SHARDS or
similar.

This is related to  and has been made possible by the introduction of
Replication of Slot Migration States in .

----

Release notes:

During cluster slot migration, replicas are able to return -ASK
redirects and -TRYAGAIN.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-24 17:58:03 +02:00
Ping Xie
c41dd77a3e
Add clang-format configs ()
I have validated that these settings closely match the existing coding
style with one major exception on `BreakBeforeBraces`, which will be
`Attach` going forward. The mixed `BreakBeforeBraces` styles in the
current codebase are hard to imitate and also very odd IMHO - see below

```
if (a == 1) { /*Attach */
}
```

```
if (a == 1 ||
    b == 2)
{ /* Why? */
}
```

Please do NOT merge just yet. Will add the github action next once the
style is reviewed/approved.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-22 23:24:12 -07:00
Roshan Khatri
c4782066e7
Cache CLUSTER SLOTS response for improved throughput and reduced latency. ()
This commit adds logic to cache the `CLUSTER SLOTS` response for reduced
latency and also updates the cache when a change in the cluster is
detected.

Historically, the `CLUSTER SLOTS` command was deprecated; however, all the
server clients have been using `CLUSTER SLOTS` and have not migrated to
`CLUSTER SHARDS`. In the future this logic can be added to other
commands to improve the performance of the engine.
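
A simplified sketch of the caching idea (the real implementation keeps a
serialized reply per connection type and also frees the stale reply; names here
are illustrative):

```
/* Illustrative sketch of reply caching with invalidation on topology changes. */
#include <stdbool.h>
#include <stddef.h>

struct slots_cache {
    void *cached_reply; /* pre-built CLUSTER SLOTS payload */
    size_t len;
    bool valid;
};

/* Called whenever the cluster topology changes (slot moved, node added, ...). */
static void invalidateSlotsCache(struct slots_cache *c) { c->valid = false; }

/* On CLUSTER SLOTS: serve the cached reply if still valid, else rebuild it.
 * (Freeing the stale reply is omitted in this sketch.) */
static void *getSlotsReply(struct slots_cache *c, void *(*build)(size_t *len)) {
    if (!c->valid) {
        c->cached_reply = build(&c->len);
        c->valid = true;
    }
    return c->cached_reply;
}
```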

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-05-22 14:21:41 -07:00
Madelyn Olson
546cef6684
Initial cleanup for cluster refactoring ()
Cleaned up the minor cluster refactoring notes that were intended to be
follow ups that never happened. Basically:
1. Minor style nitpicks
2. Generalized clusterNodeIsMyself so that it wasn't implementation
dependent.
3. Removed getMyClusterId, and just made it an explicit call to myself's
name, which seems more straightforward and removes unnecessary
abstraction.
4. Removed clusterNodeGetSlaveof in favor of clusterNodeGetMaster. We
already do a check if it's a replica, and if it wasn't working it would
have been crashing.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-14 17:09:49 -07:00
Jacob Murphy
df5db0627f
Remove trademarked language in code comments ()
This includes comments used for module API documentation.

* Strategy for replacement: Regex search: `(//|/\*| \*|#).* ("|\()?(r|R)edis( |\.
  |'|\n|,|-|\)|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)`
* Don't edit copyright comments
* Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish
from newly licensed repository
* Replace "Redis Object" -> "Object"
* Exclude markdown for now
* Don't edit Lua scripting comments referring to redis.X API
* Replace "Redis Protocol" -> "RESP"
* Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-"
prefix
* Most other places, I use best judgement to either remove "Redis", or
replace with "the server" or "server"

Fixes 

---------

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-09 10:24:03 +02:00
0del
b19ebaf551
Rename redisCommand to serverCommand ()
Part of 

---------

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 18:54:33 +02:00
Harkrishn Patro
c120a45874
Sharded pubsub command execution within multi/exec ()
Allow SPUBLISH command within multi/exec on replica.
2024-03-27 21:29:44 -07:00
Binbin
81666a6510
Fix heap-use-after-free when pubsubshard_channels became NULL ()
After the fix for , address sanitizer reports this heap-use-after-free
error. When the pubsubshard_channels dict becomes empty, we delete
the dict, and dictReleaseIterator will call dictResetIterator, which
uses the dict, so we trigger the error.

This PR introduces a new struct, kvstoreDictIterator, to wrap
dictIterator, and replaces the original dict iterator with the new
kvstore dict iterator.
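
A sketch of what such a wrapper might look like (the real kvstore.c iterator
carries more state):

```
/* Sketch of the wrapper idea; not the exact kvstore.c definition. */
struct kvstore;                                     /* owning container (opaque here) */
typedef struct dict dict;                           /* opaque here */
typedef struct dictIterator { dict *d; /* ... */ } dictIterator;

typedef struct kvstoreDictIterator {
    struct kvstore *kvs; /* owner, so release can be routed through the kvstore */
    long long didx;      /* which per-slot dict the iterator is bound to */
    dictIterator di;     /* embedded dict iterator */
} kvstoreDictIterator;

/* Release goes through the kvstore, which knows whether the underlying dict
 * still exists, instead of touching a dict that may have been freed when it
 * became empty (the use-after-free described above). */
```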

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2024-02-07 14:53:50 +02:00
guybe7
8cd62f82ca
Refactor the per-slot dict-array db.c into a new kvstore data structure ()
# Description
Gather most of the scattered `redisDb`-related code from the per-slot
dict PR () and turn it into a new data structure, `kvstore`, i.e.
a class that represents an array of dictionaries.

# Motivation
The main motivation is code cleanliness, the idea of using an array of
dictionaries is very well-suited to becoming a self-contained data
structure.
This allowed cleaning some ugly code, among others: loops that run twice
on the main dict and expires dict, and duplicate code for allocating and
releasing this data structure.

# Notes
1. This PR reverts the part of https://github.com/redis/redis/pull/12848
where the `rehashing` list is global (handling rehashing `dict`s is
under the responsibility of `kvstore`, and should not be managed by the
server)
2. This PR also replaces the type of `server.pubsubshard_channels` from
`dict**` to `kvstore` (original PR:
https://github.com/redis/redis/pull/12804). After that was done,
server.pubsub_channels was also chosen to be a `kvstore` (with only one
`dict`, which seems odd) just to make the code cleaner by making it the
same type as `server.pubsubshard_channels`, see
`pubsubtype.serverPubSubChannels`
3. the keys and expires kvstores are currently configured to allocate
the individual dicts only when the first key is added (unlike before, when
they were allocated in advance), but they won't release them when
the last key is deleted (see the sketch below).
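
Item 3 above could be sketched roughly like this; struct and function names are
illustrative, not the real kvstore.c API:

```
/* Conceptual sketch of the kvstore shape; the real struct tracks more metadata
 * (per-dict key counts, a rehashing list, a binary index, etc.). */
typedef struct dict dict;              /* opaque here */
extern dict *sketch_create_dict(void); /* hypothetical constructor for this sketch */

typedef struct kvstore {
    dict **dicts;                 /* one dict per slot; NULL until the first key is added */
    int num_dicts;                /* e.g. 16384 in cluster mode, 1 otherwise */
    unsigned long long key_count; /* kept up to date for O(1) DBSIZE */
} kvstore;

static dict *kvstoreGetOrCreateDict(kvstore *kvs, int didx) {
    if (!kvs->dicts[didx]) kvs->dicts[didx] = sketch_create_dict(); /* lazy allocation */
    return kvs->dicts[didx];
}
```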

Worth mentioning that due to the recent change the reply of DEBUG
HTSTATS changed, in case no keys were ever added to the db.

before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```

after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
2024-02-05 17:21:35 +02:00
Yehoshua (Josh) Hershberg
ef1ca6c882
add some file level comments and copyright ()
A followup PR for 
Add some brief comments explaining the purpose of the file to the head
of cluster_legacy.c and cluster.c.
Add copyright notice to cluster.c

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
Co-authored-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 11:32:23 +02:00
Josh Hershberg
290f376429 Cluster refactor: fn renames + small compilation issue on ubuntu
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
c6157b3510 Cluster refactor: Make clustering functions common
Move primary functions used to implement datapath
clustering into cluster.c, making them shared. This
required adding "accessor" and other functions to
abstract access to node details and cluster state.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
4afc54ad9b Cluster refactor: break up clusterCommand
Divide clusterCommand into clusterCommand for shared
sub-commands and clusterCommandSpecial for implementation-specific
sub-commands. Likewise, the cluster command help
sub-command has been divided into two implementations,
clusterCommandHelp and clusterCommandHelpSpecial. Some
common sub-command implementations have been extracted
and their implementations made either shared or
implementation specific.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
ac1513221b Cluster refactor: Move items from cluster_legacy.c to cluster.c
Move (but do not change) some items from cluster_legacy.c
back into cluster.c. These items are shared code that all
clustering implementations will use.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
6a6ae6ffe8 Cluster refactor: Create new cluster.c and include of cluster_legacy.h
create new cluster.c

Signed-off-by: Josh Hershberg <yehoshua@redis.com>

forgot to #include cluster_legacy.h

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-21 12:49:14 +02:00
Josh Hershberg
86915775f1 Cluster refactor: rename cluster.c -> cluster_legacy.c
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-21 12:49:14 +02:00
Viktor Söderqvist
8878817d89
Optimize SCAN with MATCH when pattern implies cluster slot ()
Optimize the performance of SCAN commands when a match pattern can only contain keys from a 
single slot in cluster mode. This can happen when the pattern contains a hash tag before any 
wildcard matchers or when the pattern contains no wildcard matchers at all.
2023-11-01 00:06:49 -07:00
Viktor Söderqvist
8d675950e6
Don't crash when adding a forgotten node to blacklist twice ()
Add a defensive check to prevent double-freeing a node from the cluster blacklist.
2023-10-31 07:20:06 -07:00
Binbin
372ea21875
Update comment around propagateDeletion ()
Fix some outdated comments and add a comment for moduleNotifyKeyspaceEvent,
which we added in , since it seems a bit implicit.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-24 13:10:03 +03:00
Vitaly
0270abda82
Replace cluster metadata with slot specific dictionaries ()
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot. The main idea is splitting the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot-specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be negligible. The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time. In order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also, instead of rehashing a single dictionary, the cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this, we introduced a binary index tree (also known as a Fenwick tree). With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating a dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using the same binary index that is used for random key selection; this index allows us to find a slot for a specific key index. For example, if there are 10 keys in slot 0, then we can quickly find the slot that contains the 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to save not only the position within the dictionary but also the slot id. In this change we append the slot id into the LSB of the cursor so it can be passed around between client and server (see the cursor sketch after this list). This has an interesting side effect: you'll now be able to start scanning a specific slot by simply providing the slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth noting that the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - during command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross-slot scripts and modules). We don't want to compute the checksum multiple times, hence we are relying on the cached slot id in the client during command execution. All operations that access random keys should either pass in the known slot or recompute the slot.
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places, and in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep the DBSIZE operation O(1). The same is kept for O(1) expires computation as well.
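
The scan API point above could be sketched as follows; the bit width follows
from the 16384-slot space, while the helper names are illustrative:

```
/* Sketch of packing the slot id into the low bits of the SCAN cursor. */
#include <stdint.h>

#define SLOT_BITS 14 /* 16384 slots fit in 14 bits */
#define SLOT_MASK ((1ULL << SLOT_BITS) - 1)

static uint64_t buildScanCursor(uint64_t dict_cursor, unsigned slot) {
    return (dict_cursor << SLOT_BITS) | (slot & SLOT_MASK);
}

static unsigned cursorSlot(uint64_t cursor) { return (unsigned)(cursor & SLOT_MASK); }
static uint64_t cursorWithinDict(uint64_t cursor) { return cursor >> SLOT_BITS; }
```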

## Performance
This change improves SET performance in cluster mode by ~5%, most of the gains come from us not having to maintain linked lists for keys in slot, non-cluster mode has same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Harkrishn Patro
b784c5375e
Unsubscribe all clients from replica for shard channel if the master ownership changes ()
Unsubscribe all clients from replica for shard channel if the master ownership changes
2023-10-12 20:48:27 -07:00
Binbin
e5ef161374
Fix crash when running rebalance command in a mixed cluster of 7.0 and 7.2 ()
In , we introduced the assert. Some older server versions
(like 7.0) don't gossip shard_id, so we will not add the node to
cluster->shards, and node->shard_id is filled in randomly and may not
be found here.

As a result, if we add a 7.2 node to a 7.0 cluster and allocate slots
to the 7.2 node, the 7.2 node will crash when it hits this assert, somewhat
like .

In this PR, we remove the assert and replace it with an unconditional removal.
2023-10-11 22:15:25 -07:00
Madelyn Olson
9d31768cbb
Fix a couple of tabs that caused misindentation ()
Fixed some usages of tabs which caused weird indentation in the code. Tried to find all of the places so there was one PR. I ignored all of the usages of tabs which don't really affect readability.
2023-10-02 16:44:09 -07:00
Sankar
8cdeddc81c
Clear owner_not_claiming_slot bit for the slot in clusterDelSlot ()
Clear owner_not_claiming_slot bit for the slot in clusterDelSlot to keep it
consistent with slot ownership information.
2023-09-26 14:03:27 -07:00
Chen Tianjie
2aad03fa39
Use server.current_client to decide whether cluster commands should return TLS info. ()
Starting with a change in  (released in 7.2), CLUSTER commands use the client's
connection to decide whether to return the TLS port or the non-TLS port, but commands
called by Lua scripts and a module's RM_Call don't have a real client with a connection,
and would currently be regarded as non-TLS connections.

We can use server.current_client instead when it is available. When it is not (a module calls
commands without a real client), we may see this as undefined behavior, and return null
or the default port (currently in this PR it returns the default port, judged by server.tls_cluster).
2023-09-21 18:41:32 +03:00
Binbin
4031a18732
Fix that slot return in CLUSTER SHARDS should be integer ()
An unintentional change was introduced in : we used
to use addReplyLongLong and now it is addReplyBulkLonglong;
revert back to the previous behavior.
2023-09-09 23:33:00 -07:00
secwall
a2046c1eb1
Check shard_id pointer validity in updateShardId ()
When connecting between a 7.0 and 7.2 cluster, the 7.0 cluster will not populate the shard_id field, which is expected by the 7.2 cluster. This is not intended behavior, as the 7.2 cluster is supposed to use a temporary shard_id while the node is in the upgrading state, but it wasn't being correctly set in this case.
2023-09-02 20:14:48 -07:00
Binbin
f4549d1cf4
Fix CLUSTER REPLICAS time complexity, should be O(N) ()
We iterate over all replicas to get the result, so the time complexity
should be O(N), like CLUSTER NODES, whose complexity is O(N).
2023-08-14 20:57:55 -07:00
Chen Tianjie
91011100ba
Hide the comma after cport when there is no hostname. ()
According to the format shown in https://redis.io/commands/cluster-nodes/
```
<ip:port@cport[,hostname[,auxiliary_field=value]*]>
```
when there is no hostname, and the auxiliary fields are hidden, the cluster topology should be
```
<ip:port@cport>
```
However in the code we always print the hostname even when it is an empty string, leaving an unnecessary trailing comma after cport, which is weird and conflicts with the doc.
```
94ca2f6cf85228a49fde7b738ee1209de7bee325 127.0.0.1:6379@16379, myself,master - 0 0 0 connected 0-16383
```
2023-07-15 20:31:42 -07:00
Binbin
14f802b360
Initialize cluster owner_not_claiming_slot to avoid warning ()
valgrind reports an Uninitialised value warning:
```
==25508==  Uninitialised value was created by a heap allocation
==25508==    at 0x4848899: malloc (in
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25508==    by 0x1A35A1: ztrymalloc_usable_internal (zmalloc.c:117)
==25508==    by 0x1A368D: zmalloc (zmalloc.c:145)
==25508==    by 0x21FDEA: clusterInit (cluster.c:973)
==25508==    by 0x19DC09: main (server.c:7306)
```

Introduced in 
2023-07-07 06:40:44 -07:00