futriix

Author	SHA1	Message	Date
uriyage	bbfd041895	Async IO threads (#758 ) This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests per second, as detailed by [dan touitou](https://github.com/touitou-dan) in https://github.com/valkey-io/valkey/issues/22. This PR modifies the IO threads to be fully asynchronous, which is a first and necessary step to allow more work offloading and better utilization of the IO threads. ### Current IO threads state: Valkey IO threads were introduced in Redis 6.0 to allow better utilization of multi-core machines. Before this, Redis was single-threaded and could only use one CPU core for network and command processing. The introduction of IO threads helps in offloading the IO operations to multiple threads. Current IO Threads flow: 1. Initialization: When Redis starts, it initializes a specified number of IO threads. These threads are in addition to the main thread. Each thread starts with an empty list, which the main thread populates in each event loop with pending-read or pending-write clients. 2. Read Phase: The main thread accepts incoming connections and reads requests from clients. The reading of requests is offloaded to IO threads. The main thread puts the ready-to-read clients in a list and sets the global io_threads_op to IO_THREADS_OP_READ; the IO threads pick the clients up, perform the read operation and parse the first incoming command. 3. Command Processing: After reading the requests, command processing is still single-threaded and handled by the main thread. 4. Write Phase: Similar to the read phase, the write phase is also offloaded to IO threads. The main thread prepares the response in the clients’ output buffer, then puts the client in the list and sets the global io_threads_op to IO_THREADS_OP_WRITE. The IO threads then pick the clients up and perform the write operation to send the responses back to clients.
5. Synchronization: The main thread tells each IO thread how many jobs it has left through an atomic counter. The main thread doesn’t access the clients while they are being handled by the IO threads. Issues with current implementation: * Underutilized Cores: The current implementation of IO-threads leads to the underutilization of CPU cores. * The main thread remains responsible for a significant portion of IO-related tasks that could be offloaded to IO-threads. * When the main-thread is processing client’s commands, the IO threads are idle for a considerable amount of time. * Notably, the main thread's performance during the IO-related tasks is constrained by the speed of the slowest IO-thread. * Limited Offloading: Currently, since the main thread waits synchronously for the IO threads, the threads perform only read, parse, and write operations, with parsing done only for the first command. If the threads can do work asynchronously, we may offload more work to them, reducing the load on the main thread. * TLS: Currently, we don't support IO threads with TLS (where offloading IO would be more beneficial) since TLS read/write operations are not thread-safe with the current implementation. ### Suggested change Non-blocking main thread - The main thread and IO threads will operate in parallel to maximize efficiency. The main thread will not be blocked by IO operations. It will continue to process commands independently of the IO thread's activities. Implementation details Inter-thread communication. * We use a static, lock-free ring buffer of fixed size (2048 jobs) for the main thread to send jobs and for the IO threads to receive them. If the ring buffer fills up, the main thread will handle the task itself, acting as back pressure (in case IO operations are more expensive than command processing). A static ring buffer is a better candidate than a dynamic job queue as it eliminates the need for allocation/freeing per job.
* An IO job will be in the format `[void* function-call-back | void* data]`, where data is a client to read from or write to and function-call-back is the function to be called with the data, for example readQueryFromClient. Using this format, we can later offload other types of work to the IO threads. The ring buffer is one-way, from the main thread to the IO thread. Upon a read/write event, the main thread will send a read/write job; then, in beforeSleep, it will iterate over the pending read/write clients, checking for each client whether the IO thread has already finished handling it. The IO thread signals it has finished handling a client read/write by toggling an atomic flag read_state / write_state on the client struct. Thread Safety As suggested in this solution, the IO threads are reading from and writing to the clients' buffers while the main thread may access those clients. We must ensure no race conditions or unsafe access occurs while keeping the Valkey code simple and lock free. Minimal Action in the IO Threads The main change is to limit the IO thread operations to the bare minimum. The IO thread will access only the client's struct and only the necessary fields in this struct. The IO threads will be responsible for the following: * Read Operation: The IO thread will only read and parse a single command. It will not update the server stats, handle read errors, or handle parsing errors. These tasks will be taken care of by the main thread. * Write Operation: The IO thread will only write the available data. It will not free the client's replies, handle write errors, or update the server statistics. To achieve this without code duplication, the read/write code has been refactored into smaller, independent components: * Functions that perform only the read/parse/write calls. * Functions that handle the read/parse/write results. This refactor accounts for the majority of the modifications in this PR.
Client Struct Safe Access As we ensure that the IO threads access memory only within the client struct, we need to ensure thread safety only for the client's struct's shared fields. * Query Buffer * Command parsing - The main thread will not try to parse a command from the query buffer when a client is offloaded to the IO thread. * Client's memory checks in client-cron - The main thread will not access the client query buffer if it is offloaded and will handle the querybuf grow/shrink when the client is back. * CLIENT LIST command - The main thread will busy-wait for the IO thread to finish handling the client, falling back to the current behavior where the main thread waits for the IO thread to finish their processing. * Output Buffer * The IO thread will not change the client's bufpos and won't free the client's reply lists. These actions will be done by the main thread on the client's return from the IO thread. * bufpos / block→used: As the main thread may change the bufpos, the reply-block→used, or add/delete blocks to the reply list while the IO thread writes, we add two fields to the client struct: io_last_bufpos and io_last_reply_block. The IO thread will write until the io_last_bufpos, which was set by the main-thread before sending the client to the IO thread. If more data has been added to the cob in between, it will be written in the next write-job. In addition, the main thread will not trim or merge reply blocks while the client is offloaded. * Parsing Fields * Client's cmd, argc, argv, reqtype, etc., are set during parsing. * The main thread will indicate to the IO thread not to parse a cmd if the client is not reset. In this case, the IO thread will only read from the network and won't attempt to parse a new command. * The main thread won't access the c→cmd/c→argv in the CLIENT LIST command as stated before it will busy wait for the IO threads. 
* Client Flags * c→flags, which may be changed by the main thread in multiple places, won't be accessed by the IO thread. Instead, the main thread will set the c→io_flags with the information necessary for the IO thread to know the client's state. * Client Close * On freeClient, the main thread will busy wait for the IO thread to finish processing the client's read/write before proceeding to free the client. * Client's Memory Limits * The IO thread won't handle the qb/cob limits. In case a client crosses the qb limit, the IO thread will stop reading for it, letting the main thread know that the client crossed the limit. TLS TLS is currently not supported with IO threads for the following reasons: 1. Pending reads - If SSL has pending data that has already been read from the socket, there is a risk of not calling the read handler again. To handle this, a list is used to hold the pending clients. With IO threads, multiple threads can access the list concurrently. 2. Event loop modification - Currently, the TLS code registers/unregisters the file descriptor from the event loop depending on the read/write results. With IO threads, multiple threads can modify the event loop struct simultaneously. 3. The same client can be sent to 2 different threads concurrently (https://github.com/redis/redis/issues/12540). Those issues were handled in the current PR: 1. The IO thread only performs the read operation. The main thread will check for pending reads after the client returns from the IO thread and will be the only one to access the pending list. 2. The registering/unregistering of events will be similarly postponed and handled by the main thread only. 3. Each client is being sent to the same dedicated thread (c→id % num_of_threads). Sending Replies Immediately with IO threads. Currently, after processing a command, we add the client to the pending_writes_list. Only after processing all the clients do we send all the replies. 
Since the IO threads are now working asynchronously, we can send the reply immediately after processing the client’s requests, reducing the command latency. However, if we are using AOF=always, we must wait for the AOF buffer to be written, in which case we revert to the current behavior. IO threads dynamic adjustment Currently, we use an all-or-nothing approach when activating the IO threads. The current logic is as follows: if the number of pending write clients is greater than twice the number of threads (including the main thread), we enable all threads; otherwise, we enable none. For example, if 8 IO threads are defined, we enable all 8 threads if there are 16 pending clients; else, we enable none. It makes more sense to enable partial activation of the IO threads. If we have 10 pending clients, we will enable 5 threads, and so on. This approach allows for a more granular and efficient allocation of resources based on the current workload. In addition, the user will now be able to change the number of I/O threads at runtime. For example, when decreasing the number of threads from 4 to 2, threads 3 and 4 will be closed after flushing their job queues. Tests Currently, we run the io-threads tests with 4 IO threads (`443d80f168/.github/workflows/daily.yml (L353)`). This means that we will not activate the IO threads unless there are 8 (threads * 2) pending write clients in a single loop, which is unlikely to happen in most tests, meaning the IO threads are not currently being tested. To force the main thread to always offload work to the IO threads, regardless of the number of pending events, we add an events-per-io-thread configuration with a default value of 2. When set to 0, this configuration will force the main thread to always offload work to the IO threads.
When we offload every single read/write operation to the IO threads, the IO threads run at 100% CPU. When running multiple tests concurrently, some tests fail as a result of larger-than-expected command latencies. To address this issue, we have to add some after or wait_for calls to some of the tests to ensure they pass with IO threads as well. Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2024-07-08 20:01:39 -07:00
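The inter-thread communication described in the Async IO threads entry (#758) above can be sketched as a single-producer, single-consumer ring buffer. This is a minimal illustration under stated assumptions, not the code from the PR: the names `io_job`, `job_queue`, `job_queue_push`, `job_queue_pop` and `increment_counter` are hypothetical, and only the fixed size (2048) and the `[callback | data]` job shape come from the description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define JOB_QUEUE_SIZE 2048 /* fixed size quoted in the PR description */

typedef void (*job_handler)(void *data);

/* One job: a callback plus the data it should run on (for example a client). */
typedef struct {
    job_handler handler;
    void *data;
} io_job;

/* Hypothetical queue layout: one producer (main thread), one consumer (IO thread). */
typedef struct {
    io_job ring[JOB_QUEUE_SIZE];
    _Atomic size_t head; /* next slot to fill; written only by the producer */
    _Atomic size_t tail; /* next slot to drain; written only by the consumer */
} job_queue;

void job_queue_init(job_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false when the ring is full, so the caller can
 * run the job itself (the back-pressure behavior described above).
 * One slot is kept empty to tell "full" from "empty", so the usable
 * capacity in this sketch is JOB_QUEUE_SIZE - 1. */
bool job_queue_push(job_queue *q, job_handler handler, void *data) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t next = (head + 1) % JOB_QUEUE_SIZE;
    if (next == atomic_load_explicit(&q->tail, memory_order_acquire)) return false;
    q->ring[head].handler = handler;
    q->ring[head].data = data;
    /* Release: the job contents become visible before the new head does. */
    atomic_store_explicit(&q->head, next, memory_order_release);
    return true;
}

/* Consumer side. Returns false when there is nothing to do. */
bool job_queue_pop(job_queue *q, io_job *out) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    if (tail == atomic_load_explicit(&q->head, memory_order_acquire)) return false;
    *out = q->ring[tail];
    atomic_store_explicit(&q->tail, (tail + 1) % JOB_QUEUE_SIZE, memory_order_release);
    return true;
}

/* Hypothetical demo handler standing in for a real read/write routine. */
void increment_counter(void *data) {
    (*(int *)data)++;
}
```

Because each index has exactly one writer, no locks or compare-and-swap loops are needed, and no memory is allocated per job. When `job_queue_push` returns false, the producer simply executes the handler inline.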
Lipeng Zhu	3323e422ad	Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. (#674 ) #### Description This patch introduces a thread-local storage variable for each thread to update its own `used_memory`, and then sums them together when reading in `zmalloc_used_memory`. This reduces unnecessary `lock add` contention on the atomic variable. We also add a protection: if too many threads are created and the total number of threads exceeds 132, we fall back to atomic operations for thread indexes >= 132. #### Problem Statement `zmalloc` and `zfree` related functions update `used_memory` atomically on each operation, and they are called very frequently. From the benchmark of [memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml) , the cycles ratio of `zmalloc` and `zfree` is high. They are wrappers around the lower-level allocator library and should not take so many cycles. Most of the cycles are contributed by `lock add` and `lock sub`, which are expensive instructions. From the profiling, the metric updates mainly come from the main thread, so using TLS will reduce a lot of contention. #### Performance Boost Note: This optimization should benefit common benchmarks widely. I chose the 2 scenarios below to validate the performance boost in my local environment.
| Test Suites | Performance Boost |
|-|-|
|[memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml)|8%|
|[memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10.yml)|4%|

##### Test Env
- OS: Ubuntu 22.04.4 LTS
- Platform: Intel Xeon Platinum 8380
- Server and Client in same socket

##### Start Server
```sh
taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey_1.conf

port 9001
bind * -::*
daemonize yes
protected-mode no
save ""
```
--------- Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>	2024-07-01 21:52:43 -07:00
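The per-thread accounting idea from the entry above (#674) can be sketched as follows. This is only an illustration under assumptions: the names `update_used_memory`, `total_used_memory` and the slot array are hypothetical, cache-line padding between slots is omitted for brevity, and only the threshold of 132 threads and the fallback to a shared atomic come from the description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_TRACKED_THREADS 132 /* threshold quoted in the commit message */

/* One slot per thread. Each slot has a single writer, so a relaxed
 * load followed by a relaxed store is enough; no locked add is needed. */
static _Atomic size_t used_memory_slot[MAX_TRACKED_THREADS];
/* Shared fallback counter for threads beyond the tracked range. */
static _Atomic size_t used_memory_overflow;
static _Atomic int next_thread_index;
static _Thread_local int my_thread_index = -1;

/* Lazily hands out a stable index to the calling thread. */
static int thread_index(void) {
    if (my_thread_index < 0) my_thread_index = atomic_fetch_add(&next_thread_index, 1);
    return my_thread_index;
}

/* Called on every allocation (positive delta) or free (negative delta).
 * Unsigned wraparound keeps the global sum correct even when one thread
 * frees memory that another thread allocated. */
void update_used_memory(ptrdiff_t delta) {
    int idx = thread_index();
    if (idx < MAX_TRACKED_THREADS) {
        size_t cur = atomic_load_explicit(&used_memory_slot[idx], memory_order_relaxed);
        atomic_store_explicit(&used_memory_slot[idx], cur + (size_t)delta, memory_order_relaxed);
    } else {
        atomic_fetch_add_explicit(&used_memory_overflow, (size_t)delta, memory_order_relaxed);
    }
}

/* Reader: sums every slot in use plus the shared fallback counter. */
size_t total_used_memory(void) {
    size_t sum = atomic_load_explicit(&used_memory_overflow, memory_order_relaxed);
    int n = atomic_load(&next_thread_index);
    if (n > MAX_TRACKED_THREADS) n = MAX_TRACKED_THREADS;
    for (int i = 0; i < n; i++) {
        sum += atomic_load_explicit(&used_memory_slot[i], memory_order_relaxed);
    }
    return sum;
}
```

The trade-off is that writes become cheap while reads walk the slot array, which suits a counter that is updated on every allocation but read only occasionally.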
skyfirelee	e4c1f6d45a	Replace client flags to bitfield (#614 )	2024-06-30 11:33:10 -07:00
zhaozhao.zz	4fbe31ab87	Fix the TLS and RESP issues about CLUSTER SLOTS cache (#581 ) PR #53 introduced the cache of the CLUSTER SLOTS response, but the cache has some problems for different types of clients:
1. the RESP version is wrongly ignored:
```
$./valkey-cli
127.0.0.1:6379> cluster slots
1) 1) (integer) 0
   2) (integer) 16383
   3) 1) ""
      2) (integer) 6379
      3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
      4) (empty array)
127.0.0.1:6379> hello 3
1# "server" => "valkey"
2# "version" => "255.255.255"
3# "proto" => (integer) 3
4# "id" => (integer) 3
5# "mode" => "cluster"
6# "role" => "master"
7# "modules" => (empty array)
127.0.0.1:6379> cluster slots
1) 1) (integer) 0
   2) (integer) 16383
   3) 1) ""
      2) (integer) 6379
      3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
      4) (empty array)
```
RESP3 should get "empty hash" but gets RESP2's "empty array"
3. we should use the original client's connection type, or lua/function and module would get the wrong port:
```
$./valkey-cli --tls --insecure -p 6789
127.0.0.1:6789> config get port tls-port
1) "tls-port"
2) "6789"
3) "port"
4) "6379"
127.0.0.1:6789> cluster slots
1) 1) (integer) 0
   2) (integer) 16383
   3) 1) ""
      2) (integer) 6789
      3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
      4) (empty array)
127.0.0.1:6789> eval "return redis.call('cluster','slots')" 0
1) 1) (integer) 0
   2) (integer) 16383
   3) 1) ""
      2) (integer) 6379
      3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
      4) (empty array)
```
--------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-06-28 14:56:13 +08:00
zhaozhao.zz	28c5a17edf	replica redirect read&write to primary in standalone mode (#325 ) To implement #319 1. a replica is able to redirect read and write commands to its primary in standalone mode * reply with "-REDIRECT primary-ip:port" 2. add a subcommand `CLIENT CAPA redirect`, with which a client can announce the capability to handle redirection * if a client can handle redirection, the data access commands (read and write) will be redirected 3. allow the `readonly` and `readwrite` commands in standalone mode, which may be a breaking change * a client with redirect capability cannot process read commands on a replica by default * using the READONLY command allows read commands on a replica --------- Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>	2024-06-27 19:00:45 +08:00
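On the client side, handling the "-REDIRECT primary-ip:port" reply from the entry above (#325) comes down to splitting the address out of the error string. The following is a rough sketch only: `parse_redirect` is a hypothetical helper, not part of any Valkey client library, and the only thing taken from the commit message is the reply shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extracts host and port from "-REDIRECT host:port".
 * Returns false if the reply is not a redirect or is malformed. */
bool parse_redirect(const char *reply, char *host, size_t host_len, int *port) {
    static const char prefix[] = "-REDIRECT ";
    size_t prefix_len = sizeof(prefix) - 1;
    if (strncmp(reply, prefix, prefix_len) != 0) return false;

    const char *addr = reply + prefix_len;
    /* Split on the last colon so a host that itself contains colons stays intact. */
    const char *colon = strrchr(addr, ':');
    if (colon == NULL || colon == addr) return false;

    size_t hlen = (size_t)(colon - addr);
    if (hlen >= host_len) return false; /* caller's buffer is too small */

    char *end;
    long value = strtol(colon + 1, &end, 10);
    if (end == colon + 1) return false;                /* no digits after the colon */
    if (*end != '\0' && *end != '\r') return false;    /* trailing garbage */
    if (value < 1 || value > 65535) return false;      /* not a valid TCP port */

    memcpy(host, addr, hlen);
    host[hlen] = '\0';
    *port = (int)value;
    return true;
}
```

A client that announced `CLIENT CAPA redirect` could call such a helper on an error reply, reconnect to the returned address, and retry the command there.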
Lipeng Zhu	4d3d6c06a1	Reduce redundant call of prepareClientToWrite when call addReply* continuously. (#670 ) ## Description While exploring hotspots by profiling some benchmark workloads, we noticed the high cycles ratio of `prepareClientToWrite`, which takes about 9% of the CPU in the `smembers` and `lrange` commands. After a deep dive into the code logic, we concluded that we can gain performance by reducing the redundant calls of `prepareClientToWrite` when addReply* is called continuously. For example: In https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082, `prepareClientToWrite` is called three times in a row. --------- Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>	2024-06-24 18:33:30 -07:00
Ping Xie	4135894a5d	Update remaining `master` references to `primary` (#660 ) Signed-off-by: Ping Xie <pingxie@google.com>	2024-06-17 20:31:15 -07:00
Viktor Söderqvist	4bb7cc471a	Remove unnecessary clang-format off annotations (#628 ) We added some clang-format off comments before we had decided on the format configuration. Now, it turns out that turning formatting off is often not necessary. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-06-12 12:52:18 +02:00
skyfirelee	09b5825b26	Moving client->authenticated to a flag instead of an int (#592 ) Moving client->authenticated to a flag Fix #589 Signed-off-by: artikell <739609084@qq.com>	2024-06-09 11:49:05 -07:00
Ping Xie	54c9747935	Remove `master` and `slave` from source code (#591 ) External facing interfaces are not affected. --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-06-07 14:21:33 -07:00
Viktor Söderqvist	ad5fd5b95c	More rebranding (#606 ) More rebranding of * Log messages (#252) * The DENIED error reply * Internal function names and comments, mainly Lua API --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-06-07 01:40:55 +02:00
Ping Xie	f927565d28	Consolidate various BLOCKED_WAIT* states (#562 ) There are currently three block types: BLOCKED_WAIT, BLOCKED_WAITAOF, and BLOCKED_WAIT_PREREPL, used to block clients executing `WAIT`, `WAITAOF`, and `CLUSTER SETSLOT`, respectively. They share the same workflow: the client is blocked until replication to the expected number of replicas completes. However, they provide different responses depending on the commands involved. Using distinct block types leads to code duplication and reduced readability. This PR consolidates the three types into a single WAIT type, differentiating them using the pending command to ensure the appropriate response is returned. Fix #427 --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-30 23:45:47 -07:00
uriyage	fd58b73f0a	Introduce shared query buffer for client reads (#258 ) This PR optimizes client query buffer handling in Valkey by introducing a shared query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer. The primary changes are: * Adding a shared query buffer `shared_qb` that clients use by default * Modifying client querybuf initialization and reset logic * Copying any partial query from shared to private buffer before command execution * Freeing idle client query buffers when empty to allow reuse of shared buffer * Master client query buffers are kept private as their contents need to be preserved for replication stream In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients. The memory reduction may also help reduce the need to evict clients when reaching max memory limit, as the query buffer is the main memory consumer per client. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-05-28 11:09:37 -07:00
Ping Xie	84157890fd	Set up clang-format github action (#538 ) Setup clang-format GitHub action to ensure coding style consistency --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-28 09:27:51 -07:00
Viktor Söderqvist	045d475a94	Implement REPLCONF VERSION (#554 ) The replica sends its version when initiating replication, in pipeline with other REPLCONF commands. The primary stores it in the client struct. Other fields are made smaller to avoid making the client struct consume more memory. Fixes #414. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-05-27 23:03:34 +02:00
Samuel Adetunji	5d0f4bc9f0	Require C11 atomics (#490 ) - Replaces custom atomics logic with C11 default atomics logic. - Drops "atomicvar_api" field from server info Closes #485 --------- Signed-off-by: adetunjii <adetunjithomas1@outlook.com> Signed-off-by: Samuel Adetunji <adetunjithomas1@outlook.com> Co-authored-by: teej4y <samuel.adetunji@prunny.com>	2024-05-26 18:41:11 +02:00
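For readers unfamiliar with the standard interface that the entry above (#490) switches to, here is a small sketch of C11 `<stdatomic.h>` usage. The counter and flag names are hypothetical and are not fields from the Valkey source; only the standard library calls are real.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical statistics counter shared between threads. */
static atomic_llong stat_total_bytes;

/* Adds to the counter without taking a lock. Relaxed ordering is enough
 * for a pure statistic that does not guard other data. */
void stat_add_bytes(long long n) {
    atomic_fetch_add_explicit(&stat_total_bytes, n, memory_order_relaxed);
}

long long stat_read_bytes(void) {
    return atomic_load_explicit(&stat_total_bytes, memory_order_relaxed);
}

/* Hypothetical one-shot flag: only the first caller gets true. */
static atomic_bool shutdown_requested;

bool request_shutdown_once(void) {
    bool expected = false;
    /* Succeeds only if the flag still holds `expected` (false). */
    return atomic_compare_exchange_strong(&shutdown_requested, &expected, true);
}
```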
Ping Xie	c41dd77a3e	Add clang-format configs (#323 ) I have validated that these settings closely match the existing coding style with one major exception on `BreakBeforeBraces`, which will be `Attach` going forward. The mixed `BreakBeforeBraces` styles in the current codebase are hard to imitate and also very odd IMHO - see below
```
if (a == 1) { /* Attach */
}
```
```
if (a == 1 ||
    b == 2)
{
    /* Why? */
}
```
Please do NOT merge just yet. Will add the github action next once the style is reviewed/approved. --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-22 23:24:12 -07:00
Roshan Khatri	c4782066e7	Cache CLUSTER SLOTS response for improving throughput and reduced latency. (#53 ) This commit adds logic to cache the `CLUSTER SLOTS` response for reduced latency and updates the cache when a change in the cluster is detected. Historically, the `CLUSTER SLOTS` command was deprecated; however, all the server clients have been using `CLUSTER SLOTS` and have not migrated to `CLUSTER SHARDS`. In the future, this logic can be added to other commands to improve the performance of the engine. --------- Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>	2024-05-22 14:21:41 -07:00
Viktor Söderqvist	6af51f5092	Prevent clang-format in certain places (#468 ) This is a preparation for adding clang-format. These comments prevent automatic formatting in some places. With these exceptions, we will be able to run clang-format on the rest of the code. This is a preparation for #323. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-05-08 20:58:53 +02:00
Ping Xie	6e7af9471c	Slot migration improvement (#445 )	2024-05-06 21:40:28 -07:00
Chen Tianjie	cc703aa3bc	Input output traffic stats and command process count for each client. (#327 ) We already have global stats for input traffic, output traffic and how many commands have been executed. However, some users have difficulty locating the IP(s) that generate heavy network traffic. So some per-client stats are introduced here.
```
tot-net-in  // Total network input bytes read from the client
tot-net-out // Total network output bytes sent to the client
tot-cmds    // Total commands the client has executed
```
These three stats are shown in `CLIENT LIST` and `CLIENT INFO`. Though the metrics are handled in hot paths of the code, personally I don't think it will slow down the server. Considering all the other complex operations handled nearby, this is only a small and simple operation. However, we do need to be cautious when adding more and more metrics; as discussed in redis/redis#12640, we may need to find a way to tell whether this causes obvious performance degradation. --------- Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>	2024-05-05 21:52:59 -07:00
Lipeng Zhu	393c8fde29	Rename macros in config.h (#257 ) This patch does the following: 1. Renames the `redis_` and `REDIS_` macros defined in config.h to `valkey_` and `VALKEY_`, and updates the files that use them (`redis_fstat`, `redis_fsync`, `REDIS_THREAD_STACK_SIZE`, etc.). 2. Removes the leading double underscore from the guard macro in config.h. --------- Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>	2024-04-23 14:20:35 +02:00
Viktor Söderqvist	9e2b7838ea	Add 'extended-redis-compatibility' config (#306 ) New config 'extended-redis-compatibility' (yes/no) default no * When yes: * Use "Redis" in the following error replies: - `-LOADING Redis is loading the dataset in memory` - `-BUSY Redis is busy`... - `-MISCONF Redis is configured to`... * Use `=== REDIS BUG REPORT` in the crash log delimiters (START and END). * The HELLO command returns `"server" => "redis"` and `"version" => "7.2.4"` (our Redis OSS compatibility version). * The INFO field for mode is called `"redis_mode"`. * When no: * Use "Valkey" instead of "Redis" in the mentioned errors and crash log delimiters. * The HELLO command returns `"server" => "valkey"` and the Valkey version for `"version"`. * The INFO field for mode is called `"server_mode"`. * Documentation added in valkey.conf: > Valkey is largely compatible with Redis OSS, apart from a few cases where > Redis OSS compatibility mode makes Valkey pretend to be Redis. Enable this > only if you have problems with tools or clients. This is a temporary > configuration added in Valkey 8.0 and is scheduled to have no effect in Valkey > 9.0 and be completely removed in Valkey 10.0. * A test case for the config is added. It is designed to fail if the config is not deprecated (has no effect) in Valkey 9 and deleted in Valkey 10. * Other test cases are adjusted to work regardless of this config. Fixes #274 Fixes #61 --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-04-18 14:10:24 +02:00
Vitah Lin	1221b3951a	Fix typo in comment (#318 ) Signed-off-by: Vitah Lin <vitahlin@gmail.com>	2024-04-15 13:45:00 +02:00
Jacob Murphy	df5db0627f	Remove trademarked language in code comments (#223 ) This includes comments used for module API documentation. * Strategy for replacement: Regex search: `(//\|/\\| \\|#).* ("\|$)?(r\|R)edis( \|\. \|'\|\n\|,\|-\|$\|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)` * Don't edit copyright comments * Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish from newly licensed repository * Replace "Redis Object" -> "Object" * Exclude markdown for now * Don't edit Lua scripting comments referring to redis.X API * Replace "Redis Protocol" -> "RESP" * Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-" prefix * Most other places, I use best judgement to either remove "Redis", or replace with "the server" or "server" Fixes #148 --------- Signed-off-by: Jacob Murphy <jkmurphy@google.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-04-09 10:24:03 +02:00
Madelyn Olson	bc28fb4ac0	Update Server version to valkey version (#232 ) This commit updates the following fields: 1. server_version -> valkey_version in server info. Since we would like to advertise specific compatibility, we are making the version specific to valkey. servername will remain as an optional indicator, and other valkey compatible stores might choose to advertise something else. 1. We dropped redis-ver from the API. This isn't related to API compatibility, but we didn't want to "fake" that valkey was creating an rdb from a Redis version. 1. Renamed server-ver -> valkey_version in rdb info. Same as point one, we want to explicitly indicate this was created by a valkey server. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-04-05 21:15:57 -07:00
Ping Xie	aaec321213	Remove REDISMODULE_ prefixes and introduce compatibility header (#194 ) Fix #146 Removed REDISMODULE_ prefixes from the core source code to align with the new SERVERMODULE_ naming convention. Added a new 'redismodule.h' header file to ensure full backward compatibility with existing modules. This compatibility layer maps all legacy REDISMODULE_ prefixed identifiers to their new SERVERMODULE_ equivalents, allowing existing Redis modules to function without modification. --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-04-05 16:59:55 -07:00
Madelyn Olson	39d0f457a2	Update versioning fields for compatibility (#47 ) New info information to be used to determine the valkey versioning info. Internally, introduce new define values for "SERVER_VERSION" which is different from the Redis compatibility version, "REDIS_VERSION". Add two new info fields: `server_version`: The Valkey server version `server_name`: Indicates that the server is valkey. Add one new RDB field: `server_ver`, which indicates the valkey version that produced the server. Add 3 new LUA globals: `SERVER_VERSION_NUM`, `SERVER_VERSION`, and `SERVER_NAME`. Which reflect the valkey version instead of the Redis compatibility version. Also clean up various places where Redis and configuration was being used that is no longer necessary. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-04-03 14:52:36 -07:00
0del	8b3ab8f74f	Rename redisAtomic to serverAtomic (#180 ) https://github.com/valkey-io/valkey/issues/144 Signed-off-by: 0del <bany.y0599@gmail.com>	2024-04-03 20:29:33 +02:00
0del	f753db5141	rename redis functions in server.h (#179 ) redisPopcount -> serverPopcount redisSetProcTitle -> serverSetProcTitle redisCommunicateSystemd -> serverCommunicateSystemd redisSetCpuAffinity -> serverSetCpuAffinity redisFork -> serverFork #144 Signed-off-by: 0del <bany.y0599@gmail.com>	2024-04-03 20:26:33 +02:00
0del	b19ebaf551	Rename redisCommand to serverCommand (#174 ) Part of #144 --------- Signed-off-by: 0del <bany.y0599@gmail.com>	2024-04-03 18:54:33 +02:00
0del	a236fc8ef0	Rename redisCommandProc, redisGetKeysProc to server prefix (#173 ) Part of #144 Signed-off-by: 0del <bany.y0599@gmail.com>	2024-04-03 18:33:33 +02:00
Binbin	3c2ea1ea95	Fix watched client test timing issue caused by late close (#13062 ) There is a timing issue in the test: close may arrive late, or in freeClientAsync we will free the client asynchronously, which leads to errors in the watching_clients statistics, since we only unwatch all keys when we truly freeClient. Add a wait here to avoid this problem. Also fixed some outdated comments I saw. The test was introduced in #12966.	2024-02-20 11:12:19 +02:00
zhaozhao.zz	50d6fe8c4b	Add metrics for WATCH (#12966 ) Redis has some special commands that mark the client's state, such as `subscribe` and `blpop`, which mark the client as `CLIENT_PUBSUB` or `CLIENT_BLOCKED`, and we have metrics for these special use cases. However, there are also other special commands, like `WATCH`, which, although they do not have specific flags, should also be considered stateful client types. For stateful clients, in many scenarios, the connections cannot be shared in a "connection pool", meaning a connection pool cannot be used. For example, whenever the `WATCH` command is executed, a new connection is required to put the client into the "watch state" because the watched keys are stored in the client. If different business logic requires watching different keys, separate connections must be used; otherwise, there will be contamination. This also means that if a user's business heavily relies on the `WATCH` command, a large number of connections will be required. Recently we have encountered this situation in our platform, where some users consume a significant number of connections when using Redis because of `WATCH`. I hope we can have a way to observe these special use cases and special client connections. Here I add a few monitoring metrics: 1. `watching_clients` in `INFO` reply: The number of clients currently in the "watching" state. 2. `total_watched_keys` in `INFO` reply: The total number of keys being watched. 3. `watch` in `CLIENT LIST` reply: The number of keys each client is currently watching.	2024-02-18 10:36:41 +02:00
Slava Koyfman	24f6d08b3f	Implement `CLIENT KILL MAXAGE <maxage>` (#12299 ) Adds an ability to kill clients older than a specified age. Also, fixed the age calculation in `catClientInfoString` to use `commandTimeSnapshot` instead of the old `server.unixtime`, and added missing documentation for `CLIENT KILL ID` to output of `CLIENT help`. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2024-01-30 20:24:36 +02:00
Binbin	4cb5ad85a5	Fix unauthenticated client query buffer 1MB limit (#12989 ) Code incorrectly set the limit value to 1024MB. Introduced in #12961.	2024-01-25 14:56:21 +02:00
zhaozhao.zz	85a834bfa2	Revert multi OOM limit and add multi buffer limit (#12961 ) Fix #9926 , and introduce an alternative method to prevent abuse of transactions: 1. revert #5454 (which was blocking read-only transactions in OOM state), and break the tie between MULTI state memory usage and the server OOM state. Meaning that we'll limit the total memory a single client can queue, and do that unconditionally regardless of the server being OOM or not. 2. to prevent abuse of transactions, we use the `client-query-buffer-limit` to restrict the size of the transaction. Because the commands cached in the MULTI/EXEC queue have not been executed yet, they are also considered a part of the "query buffer" in a broader sense. In other words, the commands in the MULTI queue and the `querybuf` of the client together constitute the "query buffer". When they exceed the limit, the connection will be disconnected. The reasoning is that it's sensible to send a single command with a huge (1GB) argument, and it's sensible to send a transaction with many small commands, but it's probably not common to send a long transaction with many huge arguments (which will consume a lot of memory before even being executed). If anyone runs into that, they can simply increase the `client-query-buffer-limit` config. P.S. To prevent DDoS attacks, unauthenticated clients have a separate hard limit. Their query buffer should not exceed a maximum of 1MB. In other words, if the query buffer of an unauthenticated client exceeds 1MB or the `client-query-buffer-limit` (if it is set to a value smaller than 1MB), the connection will be disconnected.	2024-01-25 11:17:39 +02:00
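The limit described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `toy_` function names, the parameters, and the `TOY_UNAUTH_QBUF_HARD_LIMIT` macro are hypothetical, and the sketch assumes the MULTI queue bytes and the `querybuf` bytes are simply summed before comparing against the limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: the 1MB hard limit for unauthenticated clients. */
#define TOY_UNAUTH_QBUF_HARD_LIMIT ((size_t)1024 * 1024)

/* Effective limit: authenticated clients use the configured
 * client-query-buffer-limit; unauthenticated clients use the smaller of
 * that value and the 1MB hard limit. */
static size_t toy_effective_limit(int authenticated, size_t configured_limit) {
    if (authenticated) return configured_limit;
    return configured_limit < TOY_UNAUTH_QBUF_HARD_LIMIT
               ? configured_limit
               : TOY_UNAUTH_QBUF_HARD_LIMIT;
}

/* The queued MULTI commands and the pending query buffer together count as
 * the "query buffer" in the broader sense. Returns 1 when the client should
 * be disconnected, 0 otherwise. */
static int toy_exceeds_limit(size_t querybuf_bytes, size_t multi_queue_bytes,
                             int authenticated, size_t configured_limit) {
    size_t total = querybuf_bytes + multi_queue_bytes;
    return total > toy_effective_limit(authenticated, configured_limit);
}
```

With a configured limit of 1GB, an unauthenticated client holding 1.2MB in total would be disconnected under this sketch, while an authenticated client with the same usage would not.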
debing.sun	d0640029dc	Fix race condition issues between the main thread and module threads (#12817 ) Fix #12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: https://github.com/redis/redis/pull/12817#issuecomment-1831181220 * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: #7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: #8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. 
Made RM_Reply() family API thread-safe Related discussion: https://github.com/redis/redis/pull/12817#discussion_r1408707239 Call chain: `RM_Reply()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: #12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix #12785 - Introduced: Version: 4.0 Commit: `3fcf959e60` - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't cause any race conditions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, and may cause data corruption. 1. Erroneously resets `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to the old value before entering freeClient. 2. Corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer count memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer reset `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the middle of the process. * Assert `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. 
Fixed free client args without GIL Related discussion: https://github.com/redis/redis/pull/12817#discussion_r1408706695 When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: #8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: https://github.com/redis/redis/pull/12817#issuecomment-1851899163 - Introduced: Version: 4.0 Commit: `9b01b64430` - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: #7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via #12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. 
Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backport to 7.2 1. The implementation of (1) is not fully satisfying; more eyes on it would be welcome. 2. (2) and (3) can be safely backported. 3. (4) and (6) only modify the module tests and update the documentation, so no backport is needed. 4. (5) is harmless, so no backport is needed. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2024-01-19 15:12:49 +02:00
Guillaume Koenig	967fb3c6e8	Extend rax usage by allowing any long long value (#12837 ) The raxFind implementation uses a special pointer value (the address of a static string) as the "not found" value. It works as long as actual pointers are used. However, we've seen usages where long long, non-pointer values have been used. This creates a risk that one of the long long values is precisely the address of the special "not found" value. This commit changes raxFind to return 1 or 0 to indicate whether the element exists, and to take a new void **value argument to optionally return the associated value. By extension, this also allows the RedisModule_DictSet/Replace operations to safely insert integers instead of just pointers.	2023-12-14 14:50:18 -08:00
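The difference between a sentinel return value and a found flag with an out-parameter, as described in the commit above, can be illustrated with a toy lookup. This is only a sketch under stated assumptions: `toy_entry`, `toy_find_sentinel`, `toy_find`, and `TOY_NOT_FOUND` are hypothetical names over a flat array, not the rax implementation or its real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy map entry: a key and an opaque value that may hold either
 * a real pointer or an integer cast to a pointer. */
typedef struct {
    const char *key;
    void *value;
} toy_entry;

/* Stand-in for the special "not found" address used by the old style. */
static char toy_not_found_marker;
#define TOY_NOT_FOUND ((void *)&toy_not_found_marker)

/* Old style: a sentinel pointer signals "not found". A stored value that
 * happens to equal the sentinel cannot be told apart from a miss. */
static void *toy_find_sentinel(const toy_entry *e, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(e[i].key, key) == 0) return e[i].value;
    return TOY_NOT_FOUND;
}

/* New style: return 1 or 0 for presence and hand the value back through an
 * optional out-parameter, so every value is representable. */
static int toy_find(const toy_entry *e, size_t n, const char *key, void **value) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(e[i].key, key) == 0) {
            if (value) *value = e[i].value;
            return 1;
        }
    }
    return 0;
}
```

In this sketch, an entry whose stored value equals `TOY_NOT_FOUND` looks like a miss to `toy_find_sentinel`, while `toy_find` still reports it as present.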
Chen Tianjie	f9cc25c1dd	Add metric to INFO CLIENTS: pubsub_clients. (#12849 ) In the INFO CLIENTS section, we already have blocked_clients and tracking_clients. We should add a new metric showing the number of pubsub connections, which helps performance monitoring and troubleshooting.	2023-12-13 13:44:13 +08:00
Josh Hershberg	d9a0478599	Cluster refactor: Make clusterNode private Move clusterNode into cluster_legacy.h. In order to achieve this some accessor methods were added and also a refactor of how debugCommand handles cluster related subcommands. Signed-off-by: Josh Hershberg <yehoshua@redis.com>	2023-11-22 05:50:46 +02:00
Chen Tianjie	e9f312e087	Change stat_client_qbuf_limit_disconnections to atomic. (#12711 ) In #12476 server.stat_client_qbuf_limit_disconnections was added. It is written in readQueryFromClient, which may be called by multiple threads when io-threads and io-threads-do-reads are turned on. Somehow we missed making it an atomic variable.	2023-11-01 10:57:24 +08:00
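A counter that several threads may increment needs an atomic read-modify-write, which is the point of the commit above. The sketch below uses C11 `<stdatomic.h>` as an assumption for illustration; the project may use its own atomic helpers instead, and the variable and function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a server-wide statistic that IO threads and the
 * main thread may all update. */
static _Atomic unsigned long long toy_qbuf_limit_disconnections = 0;

/* Atomic increment: safe to call from any thread. Relaxed ordering is enough
 * for a pure statistics counter that guards no other data. */
static void toy_count_qbuf_limit_disconnection(void) {
    atomic_fetch_add_explicit(&toy_qbuf_limit_disconnections, 1,
                              memory_order_relaxed);
}

/* Atomic read, for example when building a statistics reply. */
static unsigned long long toy_read_qbuf_limit_disconnections(void) {
    return atomic_load_explicit(&toy_qbuf_limit_disconnections,
                                memory_order_relaxed);
}
```

A plain `counter++` on a non-atomic variable is a data race when two threads execute it concurrently, and increments can be lost.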
Viktor Söderqvist	f924bebd83	Rewrite huge printf calls to smaller ones for readability (#12257 ) In a long printf call with many placeholders, it's hard to see which argument belongs to which placeholder. The long printf-like calls in the INFO and CLIENT commands are rewritten into pairs of (format, argument). These pairs are then rewritten to a single call with a long format string and a long list of arguments, using a macro called FMTARGS. The file `fmtargs.h` is added to the repo. Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>	2023-09-28 09:21:23 +03:00
Chen Tianjie	e3d4b30d09	Add two stats to count client input and output buffer oom. (#12476 ) Add these INFO metrics: * client_query_buffer_limit_disconnections * client_output_buffer_limit_disconnections Sometimes it is useful to monitor whether clients reach the size limits of the query buffer and output buffer, to decide whether we need to adjust the buffer size limits or reduce the client query payload.	2023-08-30 21:51:14 +03:00
zhaozhao.zz	01eb939a06	update monitor client's memory and evict correctly (#12420 ) A bug introduced in #11657 (7.2 RC1) causes client-eviction (#8687) and INFO to have inaccurate memory usage metrics of MONITOR clients. Because the type in `c->type` and the type in `getClientType()` are confusing (in the latter, `CLIENT_TYPE_NORMAL` not `CLIENT_TYPE_SLAVE`), the comment we wrote in `updateClientMemUsageAndBucket` was wrong, and in fact that function didn't skip monitor clients. And since it doesn't skip monitor clients, it was wrong to delete the call to it from `replicationFeedMonitors` (it wasn't a NOP). That deletion could mean that the monitor client memory usage is not always up to date (updated less frequently, but still a candidate for client eviction).	2023-07-25 16:10:38 +03:00
Oran Agra	8ad8f0f9d8	Fix broken protocol when PUBLISH emits local push inside MULTI (#12326 ) When a connection that's subscribed to a channel emits PUBLISH inside MULTI-EXEC, the push notification messes up the EXEC response. e.g. MULTI, PING, PUSH foo bar, PING, EXEC the EXEC's response will contain: PONG, {message foo bar}, 1. and the second PONG will be delivered outside the EXEC's response. Additionally, this PR changes the order of responses in case of a plain PUBLISH (when the current client is also subscribed to it), by delivering the push after the command's response instead of before it. This also affects modules calling RM_PublishMessage in a similar way, so that we don't run the risk of getting that push mixed together with the module command's response.	2023-06-20 20:41:41 +03:00
Binbin	b510624978	Optimize PSUBSCRIBE and PUNSUBSCRIBE from O(NM) to O(N) (#12298 ) In the original implementation, the time complexity of the commands is actually O(NM), where N is the number of patterns the client is already subscribed to and M is the number of patterns to subscribe to. The docs are all wrong about this. Specifically, because the original client->pubsub_patterns is a list, we need to do listSearchKey, which is O(N). In this PR, we change it to a dict, so the search becomes O(1). At the same time, both pubsub_channels and pubsubshard_channels are dicts. Changing pubsub_patterns to a dictionary improves the readability and maintainability of the code.	2023-06-19 16:31:18 +03:00
Oran Agra	f228ec1ea5	flushSlavesOutputBuffers should not write to replicas scheduled to drop (#12242 ) This will increase the size of an already large COB (one that has already passed the threshold for disconnection). This could also mean that we'll attempt to write that data to the socket and the replica will manage to read it, which will result in an undesired partial sync (undesired for the test).	2023-06-12 14:05:34 +03:00
zhenwei pi	cb78acb865	Support maxiov per connection type (#12234 ) Rather than a fixed iovcnt for connWritev, support maxiov per connection type instead. A minor change to reduce memory for struct connection. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>	2023-05-28 08:35:27 +03:00
Madelyn Olson	5e3be1be09	Remove prototypes with empty declarations (#12020 ) Technically declaring a prototype with an empty declaration has been deprecated since the early days of C, but we never got a warning for it. C2x will apparently be introducing a breaking change if you are using this type of declarator, so Clang 15 has started issuing a warning with -pedantic. Although not apparently a problem for any of the compilers we build on, it feels like the right thing is to properly adhere to the C standard and use (void).	2023-05-02 17:31:32 -07:00

728 Commits