futriix

Author	SHA1	Message	Date
ranshid	ba25b586d5	Introduce FORCE_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc (#1303 ) Introduce compile time option to force activedefrag to run even when jemalloc is not used as the allocator. This is in order to be able to run tests with defrag enabled while using memory instrumentation tools. fixes: https://github.com/valkey-io/valkey/issues/1241 --------- Signed-off-by: ranshid <ranshid@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-12-17 19:07:55 +02:00
ranshid	66ae8b7135	change the container image to ubuntu:plucky (#1359 ) Our fortify workflow runs on an ubuntu lunar container, which has been EOL since [January 25, 2024](https://lists.ubuntu.com/archives/ubuntu-announce/2024-January/000298.html). This causes the workflow to fail during update actions like: ``` apt-get update && apt-get install -y make gcc-13 update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 100 make all-with-unit-tests CC=gcc OPT=-O3 SERVER_CFLAGS='-Werror -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3' shell: sh -e {0} Ign:1 http://security.ubuntu.com/ubuntu lunar-security InRelease Err:2 http://security.ubuntu.com/ubuntu lunar-security Release 404 Not Found [IP: 91.189.91.82 80] Ign:3 http://archive.ubuntu.com/ubuntu lunar InRelease Ign:4 http://archive.ubuntu.com/ubuntu lunar-updates InRelease Ign:5 http://archive.ubuntu.com/ubuntu lunar-backports InRelease Err:6 http://archive.ubuntu.com/ubuntu lunar Release 404 Not Found [IP: 185.125.190.81 80] Err:7 http://archive.ubuntu.com/ubuntu lunar-updates Release 404 Not Found [IP: 185.125.190.81 80] Err:8 http://archive.ubuntu.com/ubuntu lunar-backports Release 404 Not Found [IP: 185.125.190.81 80] Reading package lists... E: The repository 'http://security.ubuntu.com/ubuntu lunar-security Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu lunar Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu lunar-updates Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu lunar-backports Release' does not have a Release file. 
update-alternatives: error: alternative path /usr/bin/gcc-13 doesn't exist Error: Process completed with exit code 2. ``` example: https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209 This PR uses the latest stable ubuntu image release [plucky](https://hub.docker.com/layers/library/ubuntu/plucky/images/sha256-dc4565c7636f006c26d54c988faae576465e825ea349fef6fd3af6bf5100e8b6?context=explore) Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2024-11-27 07:34:02 +02:00
Parth	c4920bca4a	Integrating fast_float to optionally replace strtod (#1260 ) Fast_float is a C++ header-only library to parse doubles using SIMD instructions. The purpose is to speed up sorted sets and other commands that use doubles. A single-file copy of fast_float is included in this repo. This introduces an optional dependency on a C++ compiler. The use of fast_float is enabled at compile time using the make variable `USE_FAST_FLOAT=yes`. It is disabled by default. Fixes #1069. --------- Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com> Signed-off-by: Parth <661497+parthpatel@users.noreply.github.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Roshan Swain <swainroshan001@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-11-25 10:01:43 +01:00
Seungmin Lee	f9d0b87622	Upgrade macos-12 to macos-13 in workflows (#1318 ) ### Problem GitHub Actions is starting the deprecation process for macOS 12. Deprecation will begin on 10/7/24 and the image will be fully unsupported by 12/3/24. For more details, see https://github.com/actions/runner-images/issues/10721 Signed-off-by: Seungmin Lee <sungming@amazon.com> Co-authored-by: Seungmin Lee <sungming@amazon.com>	2024-11-18 18:00:30 -08:00
Binbin	d3f3b9cc3a	Fix daily valgrind build with unit tests (#1309 ) This was introduced in #515. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-11-15 14:27:28 +08:00
skyfirelee	4a9864206f	Migrate quicklist unit test to new framework (#515 ) Migrate quicklist unit test to new unit test framework, and cleanup remaining references of SERVER_TEST, parent ticket #428. Closes #428. Signed-off-by: artikell <739609084@qq.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>	2024-11-14 10:37:44 +08:00
Binbin	9c20c84251	Set fail-fast to false in daily CI (#1162 ) Currently in our daily, if a job fails, it will cancel the other jobs in the same matrix, we want to avoid this so that all jobs in a matrix can eventually run to completion. Docs: jobs.<job_id>.strategy.fail-fast applies to the entire matrix. If jobs.<job_id>.strategy.fail-fast is set to true or its expression evaluates to true, GitHub will cancel all in-progress and queued jobs in the matrix if any job in the matrix fails. This property defaults to true. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-10-15 10:29:34 +08:00
Madelyn Olson	829aa7fe3c	Remove accurate from extra test tag (#935 ) Today if we attached the "run-extra-tests" tag it adds at least 20 minutes because the dump-fuzzer test runs with full accuracy. This fuzzer is useful, but probably only really needed for the daily, so removing it from the PRs. We still run the fuzzers, just not for as long. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-23 11:05:41 -07:00
uriyage	39f8bcb91b	Skip tracking clients OOM test when I/O threads are enabled (#764 ) Fix feedback loop in key eviction with tracking clients when using I/O threads. Current issue: Evicting keys while tracking clients or keyspace notifications exist creates a feedback loop when using I/O threads: while evicting keys, we send tracking async writes to I/O threads, which prevents the immediate release of the tracking clients' COB memory. Before the I/O thread finishes its write, we recheck used_memory, which now includes the tracking clients' COB, and thus continue to evict more keys. Fix: We will skip the test for now while I/O threads are active. We may consider avoiding sending writes in `processPendingWrites` to I/O threads for tracking clients when we are out of memory. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-08-21 17:02:57 -07:00
Binbin	76ad8f7a76	Skip IPv6 tests when TCLSH version is < 8.6 (#910 ) In #786, we skipped them in the daily, but not for the others. When running ./runtest on MacOS, we get this failure: ``` couldn't open socket: host is unreachable (nodename nor servname provided, or not known) ``` The reason is that TCL 8.5 doesn't support IPv6, so we skip tests tagged with ipv6. This also reverts #786. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-08-15 15:11:38 +08:00
Yury-Fridlyand	bfdab65791	Fix CI concurrency (#849 ) A few CI improvements which will reduce CI queue occupancy and eliminate stale runs. 1. Kill CI jobs on PRs once the PR branch gets a new push. This will prevent the situation that happened today: a huge job was triggered twice in less than an hour and occupied the runner queue of the whole org (for all repositories) for the rest of the day (see pic). This completely blocked the valkey-glide team. 2. Spread the nightly cron jobs over time to prevent them from running together. Keep in mind that cron's TZ is UTC, so midnight tasks affect developers located in other timezones. This must be backported to all release branches (`valkey-x.y` and `x.y`) ![image](https://github.com/user-attachments/assets/923d8237-3cb7-42f5-80c8-5322b3f5187d) --------- Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>	2024-08-05 22:05:29 -07:00
Madelyn Olson	ccafbb750b	Added a flag to run additional tests for additional tests (#815 ) This PR allows running a subset of the daily tests with a PR by attaching the `run-extra-tests` flag. This is done by conditionally running the daily tests when the label is attached. (I will do that for this PR to demonstrate). One downside of this PR is that a lot of tests will forever show-up as "skipped" for most PRs, as long as that doesn't bother us it should be OK. Skipped tests don't take up any of our runner compute. Another note, if the label isn't attached on the first commit, the submitter will need to push something to get the tests to run again. There is a way to make it kick off tests during a label, but that added a bunch more complexity so just wanted to start with this. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-07-22 17:44:18 -07:00
Viktor Söderqvist	c1bbdc796d	Skip IPv6 tests on MacOS (daily) (#786 ) Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-07-15 13:38:15 +02:00
uriyage	bbfd041895	Async IO threads (#758 ) This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests per second, as detailed by [dan touitou](https://github.com/touitou-dan) in https://github.com/valkey-io/valkey/issues/22. This PR modifies the IO threads to be fully asynchronous, which is a first and necessary step to allow more work offloading and better utilization of the IO threads. ### Current IO threads state: Valkey IO threads were introduced in Redis 6.0 to allow better utilization of multi-core machines. Before this, Redis was single-threaded and could only use one CPU core for network and command processing. The introduction of IO threads helps in offloading the IO operations to multiple threads. Current IO Threads flow: 1. Initialization: When Redis starts, it initializes a specified number of IO threads. These threads are in addition to the main thread. Each thread starts with an empty list, which the main thread populates in each event loop with pending-read-clients or pending-write-clients. 2. Read Phase: The main thread accepts incoming connections and reads requests from clients. The reading of requests is offloaded to IO threads. The main thread puts the ready-to-read clients in a list and sets the global io_threads_op to IO_THREADS_OP_READ; the IO threads pick the clients up, perform the read operation and parse the first incoming command. 3. Command Processing: After reading the requests, command processing is still single-threaded and handled by the main thread. 4. Write Phase: Similar to the read phase, the write phase is also offloaded to IO threads. The main thread prepares the response in the clients’ output buffer, then puts the client in the list and sets the global io_threads_op to IO_THREADS_OP_WRITE. The IO threads then pick the clients up and perform the write operation to send the responses back to clients. 5. 
Synchronization: The main thread communicates to the threads how many jobs are left for each thread using an atomic counter. The main thread doesn’t access the clients while they are being handled by the IO threads. Issues with current implementation: * Underutilized Cores: The current implementation of IO threads leads to the underutilization of CPU cores. * The main thread remains responsible for a significant portion of IO-related tasks that could be offloaded to IO threads. * When the main thread is processing clients’ commands, the IO threads are idle for a considerable amount of time. * Notably, the main thread's performance during the IO-related tasks is constrained by the speed of the slowest IO thread. * Limited Offloading: Currently, since the main thread waits synchronously for the IO threads, the threads perform only read-parse and write operations, with parsing done only for the first command. If the threads can do work asynchronously, we may offload more work to the threads, reducing the load on the main thread. * TLS: Currently, we don't support IO threads with TLS (where offloading IO would be more beneficial) since TLS read/write operations are not thread-safe with the current implementation. ### Suggested change Non-blocking main thread - The main thread and IO threads will operate in parallel to maximize efficiency. The main thread will not be blocked by IO operations. It will continue to process commands independently of the IO threads' activities. Implementation details Inter-thread communication. * We use a static, lock-free ring buffer of fixed size (2048 jobs) for the main thread to send jobs and for the IO thread to receive them. If the ring buffer fills up, the main thread will handle the task itself, acting as back pressure (in case IO operations are more expensive than command processing). A static ring buffer is a better candidate than a dynamic job queue as it eliminates the need for allocation/freeing per job. 
* An IO job will be in the format `[void* function-call-back | void *data]`, where data is a client to read from or write to, and the function pointer is the function to be called with the data, for example readQueryFromClient. With this format we can later offload other types of work to the IO threads. The ring buffer is one-way, from the main thread to the IO thread. Upon a read/write event the main thread will send a read/write job; then, in before sleep, it will iterate over the pending read/write clients, checking for each client whether the IO thread has already finished handling it. The IO thread signals it has finished handling a client read/write by toggling an atomic flag read_state / write_state on the client struct. Thread Safety As suggested in this solution, the IO threads are reading from and writing to the clients' buffers while the main thread may access those clients. We must ensure no race conditions or unsafe access occurs while keeping the Valkey code simple and lock free. Minimal Action in the IO Threads The main change is to limit the IO thread operations to the bare minimum. The IO thread will access only the client's struct and only the necessary fields in this struct. The IO threads will be responsible for the following: * Read Operation: The IO thread will only read and parse a single command. It will not update the server stats, handle read errors, or handle parsing errors. These tasks will be taken care of by the main thread. * Write Operation: The IO thread will only write the available data. It will not free the client's replies, handle write errors, or update the server statistics. To achieve this without code duplication, the read/write code has been refactored into smaller, independent components: * Functions that perform only the read/parse/write calls. * Functions that handle the read/parse/write results. This refactor accounts for the majority of the modifications in this PR. 
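The job format and one-way ring buffer described above could be sketched roughly as follows. This is a minimal single-producer, single-consumer illustration, not the code from the PR: the names `io_job`, `job_queue`, `job_queue_push`, `job_queue_pop` and `count_job` are hypothetical, and only the fixed size of 2048 and the callback-plus-data job layout come from the text. One slot is kept empty to tell a full ring from an empty one, so this sketch holds at most 2047 jobs at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define JOB_QUEUE_SIZE 2048 /* fixed size quoted in the text */

/* A job is a callback plus the data it is called with (hypothetical names). */
typedef void (*job_handler)(void *data);

typedef struct {
    job_handler handler;
    void *data;
} io_job;

/* One ring per IO thread: the main thread is the only producer and the
 * IO thread is the only consumer, so two atomic indices are enough. */
typedef struct {
    io_job jobs[JOB_QUEUE_SIZE];
    _Atomic size_t head; /* next slot to fill; written by the producer only */
    _Atomic size_t tail; /* next slot to drain; written by the consumer only */
} job_queue;

void job_queue_init(job_queue *q) {
    assert(q != NULL);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false when the ring is full, in which case the
 * caller is expected to run the job itself (the back pressure in the text). */
bool job_queue_push(job_queue *q, job_handler handler, void *data) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t next = (head + 1) % JOB_QUEUE_SIZE;
    if (next == atomic_load_explicit(&q->tail, memory_order_acquire)) return false;
    q->jobs[head].handler = handler;
    q->jobs[head].data = data;
    /* Publish the slot only after it has been fully written. */
    atomic_store_explicit(&q->head, next, memory_order_release);
    return true;
}

/* Consumer side. Returns false when there is no pending job. */
bool job_queue_pop(job_queue *q, io_job *out) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    if (tail == atomic_load_explicit(&q->head, memory_order_acquire)) return false;
    *out = q->jobs[tail];
    /* Release the slot only after the job has been copied out. */
    atomic_store_explicit(&q->tail, (tail + 1) % JOB_QUEUE_SIZE, memory_order_release);
    return true;
}

/* Example handler used for illustration: increments an int counter. */
void count_job(void *data) {
    (*(int *)data)++;
}
```

Because no job is allocated or freed, pushing and popping only copy two pointers and move an index, which is the property the text gives as the reason for preferring a static ring over a dynamic queue.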
Client Struct Safe Access As we ensure that the IO threads access memory only within the client struct, we need to ensure thread safety only for the client's struct's shared fields. * Query Buffer * Command parsing - The main thread will not try to parse a command from the query buffer when a client is offloaded to the IO thread. * Client's memory checks in client-cron - The main thread will not access the client query buffer if it is offloaded and will handle the querybuf grow/shrink when the client is back. * CLIENT LIST command - The main thread will busy-wait for the IO thread to finish handling the client, falling back to the current behavior where the main thread waits for the IO thread to finish their processing. * Output Buffer * The IO thread will not change the client's bufpos and won't free the client's reply lists. These actions will be done by the main thread on the client's return from the IO thread. * bufpos / block→used: As the main thread may change the bufpos, the reply-block→used, or add/delete blocks to the reply list while the IO thread writes, we add two fields to the client struct: io_last_bufpos and io_last_reply_block. The IO thread will write until the io_last_bufpos, which was set by the main-thread before sending the client to the IO thread. If more data has been added to the cob in between, it will be written in the next write-job. In addition, the main thread will not trim or merge reply blocks while the client is offloaded. * Parsing Fields * Client's cmd, argc, argv, reqtype, etc., are set during parsing. * The main thread will indicate to the IO thread not to parse a cmd if the client is not reset. In this case, the IO thread will only read from the network and won't attempt to parse a new command. * The main thread won't access the c→cmd/c→argv in the CLIENT LIST command as stated before it will busy wait for the IO threads. 
* Client Flags * c→flags, which may be changed by the main thread in multiple places, won't be accessed by the IO thread. Instead, the main thread will set the c→io_flags with the information necessary for the IO thread to know the client's state. * Client Close * On freeClient, the main thread will busy wait for the IO thread to finish processing the client's read/write before proceeding to free the client. * Client's Memory Limits * The IO thread won't handle the qb/cob limits. In case a client crosses the qb limit, the IO thread will stop reading for it, letting the main thread know that the client crossed the limit. TLS TLS is currently not supported with IO threads for the following reasons: 1. Pending reads - If SSL has pending data that has already been read from the socket, there is a risk of not calling the read handler again. To handle this, a list is used to hold the pending clients. With IO threads, multiple threads can access the list concurrently. 2. Event loop modification - Currently, the TLS code registers/unregisters the file descriptor from the event loop depending on the read/write results. With IO threads, multiple threads can modify the event loop struct simultaneously. 3. The same client can be sent to 2 different threads concurrently (https://github.com/redis/redis/issues/12540). Those issues were handled in the current PR: 1. The IO thread only performs the read operation. The main thread will check for pending reads after the client returns from the IO thread and will be the only one to access the pending list. 2. The registering/unregistering of events will be similarly postponed and handled by the main thread only. 3. Each client is being sent to the same dedicated thread (c→id % num_of_threads). Sending Replies Immediately with IO threads. Currently, after processing a command, we add the client to the pending_writes_list. Only after processing all the clients do we send all the replies. 
Since the IO threads are now working asynchronously, we can send the reply immediately after processing the client’s requests, reducing the command latency. However, if we are using AOF=always, we must wait for the AOF buffer to be written, in which case we revert to the current behavior. IO threads dynamic adjustment Currently, we use an all-or-nothing approach when activating the IO threads. The current logic is as follows: if the number of pending write clients is greater than twice the number of threads (including the main thread), we enable all threads; otherwise, we enable none. For example, if 8 IO threads are defined, we enable all 8 threads if there are 16 pending clients; else, we enable none. It makes more sense to enable partial activation of the IO threads. If we have 10 pending clients, we will enable 5 threads, and so on. This approach allows for a more granular and efficient allocation of resources based on the current workload. In addition, the user will now be able to change the number of I/O threads at runtime. For example, when decreasing the number of threads from 4 to 2, threads 3 and 4 will be closed after flushing their job queues. Tests Currently, we run the io-threads tests with 4 IO threads (`443d80f168/.github/workflows/daily.yml (L353)`). This means that we will not activate the IO threads unless there are 8 (threads * 2) pending write clients in a single loop, which is unlikely to happen in most tests, meaning the IO threads are not currently being tested. To force the main thread to always offload work to the IO threads, regardless of the number of pending events, we add an events-per-io-thread configuration with a default value of 2. When set to 0, this configuration will force the main thread to always offload work to the IO threads. 
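The partial-activation arithmetic, together with the dedicated-thread rule (c→id % num_of_threads) mentioned earlier in this message, can be illustrated with a small sketch. Both function names are hypothetical, and the exact rounding, and whether the main thread counts toward the total, are assumptions rather than details confirmed by the text; the sketch only reproduces the quoted examples (10 pending clients give 5 threads, 16 pending clients with 8 threads give all 8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many IO threads to activate for this loop.
 * Assumes one thread per `events_per_io_thread` pending events, capped at
 * the configured thread count; 0 is treated as "always offload". */
size_t io_threads_to_activate(size_t pending_events,
                              size_t events_per_io_thread,
                              size_t configured_threads) {
    if (events_per_io_thread == 0) return configured_threads;
    size_t wanted = pending_events / events_per_io_thread;
    return wanted < configured_threads ? wanted : configured_threads;
}

/* Hypothetical helper: always map a client to the same thread, so the same
 * client is never handed to two different threads concurrently. */
size_t io_thread_for_client(uint64_t client_id, size_t num_threads) {
    assert(num_threads > 0); /* the modulo below requires at least one thread */
    return (size_t)(client_id % num_threads);
}
```

With the default of 2 events per thread, a single pending event activates no IO thread in this sketch, which matches the observation that lightly loaded tests would not exercise the IO threads unless the setting is 0.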
When we offload every single read/write operation to the IO threads, the IO threads run at 100% CPU; when running multiple tests concurrently, some tests fail as a result of larger-than-expected command latencies. To address this issue, we had to add some `after` or `wait_for` calls to some of the tests to ensure they pass with IO threads as well. Signed-off-by: Uri Yagelnik <uriy@amazon.com>	2024-07-08 20:01:39 -07:00
Samuel Adetunji	93123f97a0	Format yaml files (#615 ) Closes #533 --------- Signed-off-by: adetunjii <adetunjithomas1@outlook.com>	2024-06-14 13:40:06 -07:00
Ping Xie	2b97aa6171	Introduce `enable-debug-assert` to enable/disable debug asserts at runtime (#584 ) Introduce a new hidden server configuration, `enable-debug-assert`, which allows selectively enabling or disabling, at runtime, expensive or risky assertions used primarily for debugging and testing. Fix #569 --------- Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-31 22:50:08 -07:00
Jonathan Wright	1c55f3ca5a	Replace centos 7 with alternative versions (#543 ) replace centos 7 with almalinux 8, add almalinux 9, centos stream 9, fedora stable, rawhide Fixes #527 --------- Signed-off-by: Jonathan Wright <jonathan@almalinux.org> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2024-05-24 16:08:51 -07:00
Siddhartha Sankar Mondal	005a018db6	Deprecate MacOS 11 build target (#524 ) Deprecate MacOS 11 build target. End of life June 2024. Fixes #523 --------- Signed-off-by: Siddhartha Mondal <siddharthmondal@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Roshan Khatri <117414976+roshkhatri@users.noreply.github.com>	2024-05-21 12:21:28 -07:00
Binbin	e7e5a104ec	Revert mmap_rnd bits back to default value in CI (#520 ) In 3f725b8, we introduced a change back in March to reduce the entropy of ASLR, because ASAN didn't support it. Now that vm.mmap_rnd_bits was reverted in actions/runner-images#9491, we can remove this change. Closes #519. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-05-20 12:23:25 -07:00
Madelyn Olson	9b6232b501	Automatically notify the slack channel when tests fail (#509 ) Adds a job that will automatically run at the end of the daily, which will collect all the failed tests and send them to the developer slack. It will include a link to the job as well. Example job that ran on my private repo: https://github.com/madolson/valkey/actions/runs/9123245899/job/25085418567 Example notification: ![image](https://github.com/valkey-io/valkey/assets/34459052/69127db4-e416-4321-bc06-eefcecab1130) (Note: I removed the sassy text at the bottom from the PR) Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-05-16 23:51:33 -07:00
Madelyn Olson	1b3199e070	Fix unit test issues on address sanitizer and fortify (#437 ) This commit does four things: 1. On various images, the linker was not able to correctly load the flto optimizations from the archive generated for unit tests, and was throwing errors. I was able to solve this by updating the plugin for the fortify test, but was unable to reproduce it on the ASAN tests or find a solution. So I decided to go with a single solution for now, which was to just disable the linker optimizations for those tests. This shouldn't weaken the protections provided by ASAN. 2. The change to remove flto for some reason caused some odd inlining behavior in the intset test that I wasn't really able to understand. The error was basically that we were doing a 4 byte write, starting at byte offset 8, for the first addition to a listpack that was of size 10. Practically this has no effect, since I'm not aware of any allocator that would give us a 10 byte block as opposed to 12 (or more likely 16) bytes. This isn't the correct behavior, since an uninitialized listpack defaults to 16bit encoding, which should only be writing 2 bytes. I rabbit holed like 2 hours into this, and gave up and just ignored the warning on the file. 3. Now that address sanitizer was correctly running, it picked up two issues: a memory leak and an uninitialized value, so those were easy to fix. 4. There is also a small change to the fortify to build the test up front instead of later; this is just to be consistent with other tests and has no functional change. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-05-05 22:00:08 -07:00
Björn Svensson	39d4b43d4b	Pin versions of Github Actions in CI (#221 ) Pin the Github Action dependencies to the hash according to secure software development best practices recommended by the Open Source Security Foundation (OpenSSF). When developing a CI workflow, it's common to version-pin dependencies (i.e. actions/checkout@v4). However, version tags are mutable, so a malicious attacker could overwrite a version tag to point to a malicious or vulnerable commit instead. Pinning workflow dependencies by hash ensures the dependency is immutable and its behavior is guaranteed. See https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies `dependabot` supports updating a hash and the version comment, so its updates will continue to work as before. Links to used actions and their tag/hash for review/validation: https://github.com/actions/checkout/tags (v4.1.2 was rolled back) https://github.com/github/codeql-action/tags https://github.com/maxim-lobanov/setup-xcode/tags https://github.com/cross-platform-actions/action/releases/tag/v0.22.0 https://github.com/py-actions/py-dependency-install/tags https://github.com/actions/upload-artifact/tags https://github.com/actions/setup-node/tags https://github.com/taiki-e/install-action/releases/tag/v2.32.2 This PR is part of #211. Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>	2024-05-04 01:54:14 +02:00
Madelyn Olson	5b1fd222ed	An initial simple unit test framework (#344 ) The core idea was to take a lot of the stuff from the C unity framework and adapt it a bit here. Each file in the `unit` directory that starts with `test_` is automatically assumed to be a test suite. Within each file, all functions that start with `test_` are assumed to be a test. See unit/README.md for details about the implementation. Instead of compiling basically a net new binary, the way the tests are compiled is that the main valkey server is compiled as a static archive, which we then compile the individual test files against to create a new test executable. This is not all that important now, other than it makes the compilation simpler, but what it will allow us to do is overwrite functions in the archive to enable mocking for cross compilation unit functions. There are also ways to enable mocking from within the same compilation unit, but I don't know how important this is. Tests are also written in one of two styles: 1. Including the header file and directly calling functions from the archive. 2. Importing the original file, and then calling the functions. This second approach is cool because we can call static functions. It won't mess up the archive either. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-05-02 20:00:04 -07:00
Björn Svensson	1c282a9306	Set permissions for Github Actions in CI (#312 ) This sets the default permission for current CI workflows to only be able to read from the repository (scope: "contents"). When a used Github Action require additional permissions (like CodeQL) we grant that permission on job-level instead. This means that a compromised action will not be able to modify the repo or even steal secrets since all other permission-scopes are implicit set to "none", i.e. not permitted. This is recommended by [OpenSSF](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions). This PR includes a small fix for the possibility of missing server logs artifacts, found while verifying the permission. The `upload-artifact@v3` action will replace artifacts which already exists. Since both CI-jobs `test-external-standalone` and `test-external-nodebug` uses the same artifact name, when both jobs fail, we only get logs from the last finished job. This can be avoided by using unique artifact names. This PR is part of #211 More about permissions and scope can be found here: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions --------- Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>	2024-04-12 17:24:22 +02:00
Wen Hui	c0a83c0058	Fix CI centos issue (#150 ) Because CentOS does not support actions/checkout@v4, we need to roll back to actions/checkout@v3. Please check the run result: https://github.com/hwware/placeholderkv/actions/runs/8526052560/job/23354458574 It looks like our CI is happy now. Signed-off-by: hwware <wen.hui.ware@gmail.com>	2024-04-02 14:48:12 -04:00
ICHINOSE Shogo	0a51ceca88	bump actions/checkout v4 (#87 ) Node.js 16 actions are deprecated. To resolve this we are updating to actions/checkout@v4. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/. e.g. failure https://github.com/valkey-io/valkey/actions/runs/8482578610 --------- Signed-off-by: ICHINOSE Shogo <shogo82148@gmail.com>	2024-04-01 18:44:21 -07:00
Roshan Khatri	3630dd08a6	Restore all tests state prior to fork (#117 ) Related to https://github.com/valkey-io/valkey/pull/11#issuecomment-2028930612 Restore all tests state prior to fork and re-enables Daily tests on PRs on release branches. Reverts `2aa820f945` --------- Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>	2024-04-01 12:55:01 -04:00
Madelyn Olson	57789d4d08	Update naming to to Valkey (#62 ) Documentation references should use `Valkey` while server and cli references are all under `valkey`. --------- Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-03-28 09:58:28 -07:00
Roshan Khatri	340ab6d62d	Fixes external server tests and change other references (#14 )	2024-03-25 18:49:52 +01:00
Roshan Khatri	2aa820f945	Avoid daily test to run on PRs (#11 )	2024-03-22 12:36:08 -07:00
Madelyn Olson	9da3166e5c	Fix daily.yml change	2024-03-21 22:41:23 -07:00
Madelyn Olson	084ab10e17	Disabled some workflows for now	2024-03-21 19:50:02 -07:00
Madelyn Olson	2e9855fbd0	Disable some workflows since I don't want to burn through the free tier	2024-03-21 19:39:21 -07:00
Madelyn Olson	3f725b8619	Change mmap rand bits as a temporary mitigation to resolve asan bug (#13150 ) There is a new change in linux kernel 6.6.6 that uses randomization of address space to harden security, but it interferes with the way ASAN works. Folks are working on a fix, but this is a temporary mitigation for us to get our CI to be green again. See https://github.com/google/sanitizers/issues/1716 for more information See https://github.com/redis/redis/actions/runs/8305126288/job/22731614306?pr=13149 for a recent failure	2024-03-17 09:06:51 +02:00
dependabot[bot]	38f0234946	Bump cross-platform-actions/action from 0.21.1 to 0.22.0 (#12904 ) Bumps [cross-platform-actions/action](https://github.com/cross-platform-actions/action) from 0.21.1 to 0.22.0. Release notes (sourced from [cross-platform-actions/action's releases](https://github.com/cross-platform-actions/action/releases)): Cross Platform Action 0.22.0 Added: * Added support for using the action in multiple steps in the same job ([#26](https://redirect.github.com/cross-platform-actions/action/issues/26)). All the inputs need to be the same for all steps, except for the following inputs: `sync_files`, `shutdown_vm` and `run`. * Added support for specifying that the VM should not shutdown after the action has run. This adds a new input parameter: `shutdown_vm`. When set to `false`, this will hopefully mitigate very frequent freezing of VM during teardown ([#61](https://redirect.github.com/cross-platform-actions/action/issues/61), [#72](https://redirect.github.com/cross-platform-actions/action/issues/72)). Changed: * Always terminate VM instead of shutting down. This is more efficient and this will hopefully mitigate very frequent freezing of VM during teardown ([#61](https://redirect.github.com/cross-platform-actions/action/issues/61), [#72](https://redirect.github.com/cross-platform-actions/action/issues/72)). * Use `unsafe` as the cache mode for QEMU disks. This should improve performance ([#67](https://redirect.github.com/cross-platform-actions/action/issues/67)). 
Changelog (sourced from [cross-platform-actions/action's changelog](https://github.com/cross-platform-actions/action/blob/master/changelog.md)): [0.22.0] - 2023-12-27, with the same Added and Changed entries as the release notes. 
Commits: * `5800fa0` Release 0.22.0 * `20ad4b2` Fix [#67](https://redirect.github.com/cross-platform-actions/action/issues/67): Use `unsafe` as the cache mode disks * `d918493` Always terminate VM instead of shutting down. * `626f1d6` Fix error when terminating the VM * `d59f08d` Print stack trace for uncaught exceptions * `7f2fab9` Revert "Run SSH in verbose mode when debug mode is enabled" * `0f566c3` [no ci] Update the changelog * `b7f7744` [no ci] Fix spelling * `9894a9b` Wrap `host` module in namespace * `87fdd34` Fix broken test-vm-shutdown tests * Additional commits viewable in [compare view](https://github.com/cross-platform-actions/action/compare/v0.21.1...v0.22.0) [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cross-platform-actions/action&package-manager=github_actions&previous-version=0.21.1&new-version=0.22.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-01-04 22:38:33 +02:00
sundb	91309f7981	Fix compilation warning in KeySpace_ServerEventCallback and add CFLAGS=-Werror flag for module CI (#12786 ) Warning: ``` postnotifications.c:216:77: warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat] RedisModule_Log(ctx, "warning", "Got an unexpected subevent '%ld'", subevent); ~~~ ^~~~~~~~ %llu ``` CI: https://github.com/redis/redis/actions/runs/6937308713/job/18871124342#step:6:115 ## Other Add `CFLAGS=-Werror` flag for module CI. --------- Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2023-11-30 17:41:00 +02:00
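The warning quoted above comes from passing a `uint64_t` to a `%ld` conversion. A minimal sketch of two portable ways to format such a value follows. The helper names `format_subevent` and `format_subevent_cast` are hypothetical and are not taken from the module's source; the commit's actual fix may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the module's code: formats a uint64_t subevent.
 * PRIu64 expands to the matching conversion specifier for uint64_t. */
int format_subevent(char *buf, size_t len, uint64_t subevent) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Got an unexpected subevent '%" PRIu64 "'", subevent);
}

/* Alternative: cast explicitly to unsigned long long and use %llu,
 * which is what the compiler note in the warning suggests. */
int format_subevent_cast(char *buf, size_t len, uint64_t subevent) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Got an unexpected subevent '%llu'",
                    (unsigned long long)subevent);
}
```

Both variants compile cleanly under `-Wformat -Werror` regardless of whether `uint64_t` is `unsigned long` or `unsigned long long` on the target.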
Yossi Gottlieb	a9e73c00bc	Reduce FreeBSD daily scope. (#12758 ) The full test is very flaky running on a VM inside GitHub worker, so we have to settle for only building and running a small smoke test.	2023-11-13 17:22:09 +02:00
Roshan Khatri	88e83e517b	Add DEBUG_ASSERTIONS option to custom assert (#12667 ) This PR introduces a new macro, serverAssertWithInfoDebug, to do complex assertions only for debugging. The main intention is to allow running complex operations during tests without impacting runtime performance. This assertion is enabled when setting DEBUG_ASSERTIONS. The DEBUG_ASSERTIONS flag is set for the daily and CI variants of `test-sanitizer-address`.	2023-11-11 20:31:34 -08:00
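The idea of an assertion that is compiled in only when `DEBUG_ASSERTIONS` is set can be sketched as below. This is an illustration under assumptions, not the project's `serverAssertWithInfoDebug` macro: `debug_assert`, `is_sorted`, and `sorted_max` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expensive invariant check: O(n) scan of the array. */
int is_sorted(const int *v, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] > v[i]) return 0;
    return 1;
}

/* The costly check is only evaluated when DEBUG_ASSERTIONS is defined,
 * e.g. by building with -DDEBUG_ASSERTIONS. Otherwise it compiles to nothing. */
#ifdef DEBUG_ASSERTIONS
#define debug_assert(cond) assert(cond)
#else
#define debug_assert(cond) ((void)0)
#endif

/* Returns the largest element of an array that is expected to be sorted. */
int sorted_max(const int *v, size_t n) {
    assert(v != NULL && n > 0);    /* cheap: always checked */
    debug_assert(is_sorted(v, n)); /* costly: debug builds only */
    return v[n - 1];
}
```

In a normal build `sorted_max` stays O(1); the O(n) scan is paid only in builds that opt in.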
Yossi Gottlieb	6223355cf3	Use cross-platform-actions for FreeBSD support. (#12732 ) This change overcomes many stability issues experienced with the vmactions action. We need to limit VMs to 8GB for better stability, as the 13GB default seems to hang them occasionally. Shell code has been simplified since this action seem to use `bash -e` which will abort on non-zero exit codes anyway.	2023-11-06 18:07:14 +02:00
sundb	3c734b8e9d	Add new compilation CI for macos-11 and macos-13 (#12666 ) As discussed in #12611. Add a build CI for macOS 11 and 13 to avoid compatibility breakage introduced by future macOS SDK versions.	2023-10-18 13:25:52 +03:00
dependabot[bot]	1634a0f271	Bump vmactions/freebsd-vm from 0.3.0 to 0.3.1 (#12352 ) Bumps [vmactions/freebsd-vm](https://github.com/vmactions/freebsd-vm) from 0.3.0 to 0.3.1. - [Release notes](https://github.com/vmactions/freebsd-vm/releases) - [Commits](https://github.com/vmactions/freebsd-vm/compare/v0.3.0...v0.3.1) --- updated-dependencies: - dependency-name: vmactions/freebsd-vm dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-27 09:17:34 +03:00
sundb	42c8c61813	Fix some compile warnings and errors when building with gcc-12 or clang (#12035 ) This PR fixes the compilation warnings and errors generated by the latest compiler toolchain, and adds a new runner with the latest toolchain to the daily CI. ## Fix various compilation warnings and errors 1) jemalloc.c COMPILER: clang-14 with FORTIFY_SOURCE WARNING: ``` src/jemalloc.c:1028:7: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation] "/etc/malloc.conf", ^ src/jemalloc.c:1027:3: note: place parentheses around the string literal to silence warning "\"name\" of the file referenced by the symbolic link named " ^ ``` REASON: the compiler alerts developers to string concatenation that may be a missing comma, just like #9534, which did miss a comma. SOLUTION: use `()` to tell the compiler that these two string lines are intentionally one element. 2) config.h COMPILER: clang-14 with FORTIFY_SOURCE WARNING: ``` In file included from quicklist.c:36: ./config.h:319:76: warning: attribute declaration must precede definition [-Wignored-attributes] char *strcat(char *restrict dest, const char *restrict src) __attribute__((deprecated("please avoid use of unsafe C functions. prefer use of redis_strlcat instead"))); ``` REASON: Enabling _FORTIFY_SOURCE causes the compiler to use `strcpy()` with check, which results in the deprecated attribute being declared after <features.h> has been included. SOLUTION: move the deprecated attribute declaration from config.h to fmacro.h before "#include <features.h>". 
3) networking.c COMPILER: GCC-12 WARNING: ``` networking.c: In function ‘addReplyDouble.part.0’: networking.c:876:21: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 876 \| dbuf[start] = '$'; \| ^ networking.c:868:14: note: at offset -5 into destination object ‘dbuf’ of size 5152 868 \| char dbuf[MAX_LONG_DOUBLE_CHARS+32]; \| ^ networking.c:876:21: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 876 \| dbuf[start] = '$'; \| ^ networking.c:868:14: note: at offset -6 into destination object ‘dbuf’ of size 5152 868 \| char dbuf[MAX_LONG_DOUBLE_CHARS+32]; ``` REASON: GCC-12 predicts that digits10() may return 9 or 10 through `return 9 + (v >= 1000000000UL)`. SOLUTION: add an assert to let the compiler know the possible length; 4) redis-cli.c & redis-benchmark.c COMPILER: clang-14 with FORTIFY_SOURCE WARNING: ``` redis-benchmark.c:1621:2: warning: embedding a directive within macro arguments has undefined behavior [-Wembedded-directive] #ifdef USE_OPENSSL redis-cli.c:3015:2: warning: embedding a directive within macro arguments has undefined behavior [-Wembedded-directive] #ifdef USE_OPENSSL ``` REASON: when _FORTIFY_SOURCE is enabled, the compiler will use the print() with check, which is a macro. this may result in the use of directives within the macro, which is undefined behavior. SOLUTION: move the directives-related code out of `print()`. 5) server.c COMPILER: gcc-13 with FORTIFY_SOURCE WARNING: ``` In function 'lookupCommandLogic', inlined from 'lookupCommandBySdsLogic' at server.c:3139:32: server.c:3102:66: error: '(robj *)argv' may be used uninitialized [-Werror=maybe-uninitialized] 3102 \| struct redisCommand base_cmd = dictFetchValue(commands, argv[0]->ptr); \| ~~~~^~~ ``` REASON: The compiler thinks that the `argc` returned by `sdssplitlen()` could be 0, resulting in an empty array of size 0 being passed to lookupCommandLogic. this should be a false positive, `argc` can't be 0 when strings are not NULL. 
SOLUTION: add an assert to let the compiler know that `argc` is positive. 6) sha1.c COMPILER: gcc-12 WARNING: ``` In function ‘SHA1Update’, inlined from ‘SHA1Final’ at sha1.c:195:5: sha1.c:152:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread] 152 \| SHA1Transform(context->state, &data[i]); \| ^ sha1.c:152:13: note: referencing argument 2 of type ‘const unsigned char[64]’ sha1.c: In function ‘SHA1Final’: sha1.c:56:6: note: in a call to function ‘SHA1Transform’ 56 \| void SHA1Transform(uint32_t state[5], const unsigned char buffer[64]) \| ^ In function ‘SHA1Update’, inlined from ‘SHA1Final’ at sha1.c:198:9: sha1.c:152:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread] 152 \| SHA1Transform(context->state, &data[i]); \| ^ sha1.c:152:13: note: referencing argument 2 of type ‘const unsigned char[64]’ sha1.c: In function ‘SHA1Final’: sha1.c:56:6: note: in a call to function ‘SHA1Transform’ 56 \| void SHA1Transform(uint32_t state[5], const unsigned char buffer[64]) ``` REASON: due to the bug[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922], when enable LTO, gcc-12 will not see `diagnostic ignored "-Wstringop-overread"`, resulting in a warning. SOLUTION: temporarily set SHA1Update to noinline to avoid compiler warnings due to LTO being enabled until the above gcc bug is fixed. 
7) zmalloc.h COMPILER: GCC-12 WARNING: ``` In function ‘memset’, inlined from ‘moduleCreateContext’ at module.c:877:5, inlined from ‘RM_GetDetachedThreadSafeContext’ at module.c:8410:5: /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:10: warning: ‘__builtin_memset’ writing 104 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=] 59 \| return __builtin___memset_chk (__dest, __ch, __len, ``` REASON: due to the GCC-12 bug [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503], GCC-12 cannot see alloc_size, which causes GCC to think that the actual size of memory is 0 when checking with __glibc_objsize0(). SOLUTION: temporarily set malloc-related interfaces to `noinline` to avoid compiler warnings due to LTO being enabled until the above gcc bug is fixed. ## Other changes 1) Fixed `ps -p [pid]` doesn't output `<defunct>` when using procps 4.x causing `replication child dies when parent is killed - diskless` test to fail. 2) Add a new fortify CI with GCC-13 and ubuntu-lunar docker image.	2023-04-18 09:53:51 +03:00
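The `-Wstring-concatenation` fix from item 1 can be illustrated in isolation. This is a sketch and not jemalloc's actual array: only the middle element's text comes from the warning quoted in the commit, while the other two strings and the names `conf_sources`, `conf_source_count`, and `conf_source` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are merged by the compiler. Inside an array
 * initializer that looks like a forgotten comma, so clang warns. */
const char *const conf_sources[] = {
    "string specified via a build option",
    /* Parentheses mark the two literals as one intentional element. */
    ("\"name\" of the file referenced by the symbolic link named "
     "/etc/malloc.conf"),
    "value of an environment variable",
};

/* Number of elements in the table. */
size_t conf_source_count(void) {
    return sizeof(conf_sources) / sizeof(conf_sources[0]);
}

/* Bounds-checked accessor. */
const char *conf_source(size_t i) {
    assert(i < conf_source_count());
    return conf_sources[i];
}
```

The array still has three elements; the parentheses change nothing about the generated data and only document the intent to the compiler.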
Binbin	810ea67b5b	Don't pass --fail-commands-not-all-hit to validator if we don't run the full testsuite (#12023 ) In daily.yml, if the input suggests we don't run the full testsuite, do not pass --fail-commands-not-all-hit to the validator. This fixes the first point in #11954. Credit goes to the comment on the open issue for GH actions: actions/runner#409 Also improve prints to show the dispatch arguments in every job.	2023-04-12 12:23:50 +03:00
Oran Agra	997fa41e99	Attempt to solve MacOS CI issues in GH Actions (#12013 ) The MacOS CI in github actions often hangs without any logs. GH argues that it's due to resource utilization, either running out of disk space, memory, or CPU starvation, and thus the runner is terminated. This PR contains multiple attempts to resolve this: 1. introducing pause_process instead of SIGSTOP, which waits for the process to stop before resuming the test, possibly resolving race conditions in some tests, this was a suspect since there was one test that could result in an infinite loop in that case, in practice this didn't help, but still a good idea to keep. 2. disable the `save` config in many tests that don't need it, specifically ones that use heavy writes and could create large files. 3. change the `populate` proc to use short pipeline rather than an infinite one. 4. use `--clients 1` in the macos CI so that we don't risk running multiple resource demanding tests in parallel. 5. enable `--verbose` to be repeated to elevate verbosity and print more info to stdout when a test or a server starts.	2023-04-12 09:19:21 +03:00
Oran Agra	f263b6daf3	Increase threshold for flaky cache reclaim test (#12004 ) This test produces 1GB of data and moves it around, and was expecting less than 500kb to be present in the system page cache. It sometimes fails with up to some 6mb in the page cache (0 in the actual RDB files), increasing the threshold. It looks like some background tasks in the container are occupying the page cache. It is safe to ignore the above since we also explicitly check the pages of our dump.rdb are not cached (matching `vmtouch -v` to `0%`). An additional fix is to match ` 0%` (add space), so that we don't successfully match `10%`. details in https://github.com/redis/redis/pull/11818	2023-04-05 14:45:42 +03:00
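The point about matching ` 0%` with a leading space rather than `0%` can be shown with a plain substring search. The test in the repository is written in Tcl; the C below is only an illustration, and the function names and sample strings are hypothetical rather than real `vmtouch` output.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the text contains " 0%" (space, zero, percent).
 * A value such as "10%" does not match, because its '0' follows a digit. */
int reports_zero_percent(const char *text) {
    assert(text != NULL);
    return strstr(text, " 0%") != NULL;
}

/* The looser pattern: "0%" also matches inside "10%" or "100%". */
int reports_zero_percent_loose(const char *text) {
    assert(text != NULL);
    return strstr(text, "0%") != NULL;
}
```

For a hypothetical line such as `"cached: 10%"`, the loose variant reports a match while the strict variant does not.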
Oran Agra	9e15b42fda	ignore latency errors in the schema validation CI (#11958 ) these latency threshold errors prevent the schema validation from running.	2023-03-23 10:49:09 +02:00
guybe7	4ba47d2d21	Add reply_schema to command json files (internal for now) (#10273 ) Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845 Since ironing the details of the reply schema of each and every command can take a long time, we would like to merge this PR when the infrastructure is ready, and let this mature in the unstable branch. Meanwhile the changes of this PR are internal, they are part of the repo, but do not affect the produced build. ### Background In #9656 we add a lot of information about Redis commands, but we are missing information about the replies ### Motivation 1. Documentation. This is the primary goal. 2. It should be possible, based on the output of COMMAND, to be able to generate client code in typed languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like. 3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing testsuite, see the "Testing" section) ### Schema The idea is to supply some sort of schema for the various replies of each command. The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3. Note that the reply structure itself may change, depending on the arguments (e.g. `XINFO STREAM`, with and without the `FULL` modifier) We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema. Example for `BZPOPMIN`: ``` "reply_schema": { "oneOf": [ { "description": "Timeout reached and no elements were popped.", "type": "null" }, { "description": "The keyname, popped member, and its score.", "type": "array", "minItems": 3, "maxItems": 3, "items": [ { "description": "Keyname", "type": "string" }, { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } ] } ``` #### Notes 1. 
It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI, where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one. 2. The reply schemas will describe RESP3 replies only. even though RESP3 is structured, we want to use reply schema for documentation (and possibly to create a fuzzer that validates the replies) 3. For documentation, the description field will include an explanation of the scenario in which the reply is sent, including any relation to arguments. for example, for `ZRANGE`'s two schemas we will need to state that one is with `WITHSCORES` and the other is without. 4. For documentation, there will be another optional field "notes" in which we will add a short description of the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat array, for example) Given the above: 1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/) (given that "description" and "notes" are comprehensive enough) 2. We can generate a client in a strongly typed language (but the return type could be a conceptual `union` and the caller needs to know which schema is relevant). see the section below for RESP2 support. 3. We can create a fuzzer for RESP3. ### Limitations (because we are using the standard json-schema) The problem is that Redis' replies are more diverse than what the json format allows. This means that, when we convert the reply to a json (in order to validate the schema against it), we lose information (see the "Testing" section below). The other option would have been to extend the standard json-schema (and json format) to include stuff like sets, bulk-strings, error-string, etc. 
but that would mean also extending the schema-validator - and that seemed like too much work, so we decided to compromise. Examples: 1. We cannot tell the difference between an "array" and a "set" 2. We cannot tell the difference between simple-string and bulk-string 3. We cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems` compares (member,score) tuples and not just the member name. ### Testing This commit includes some changes inside Redis in order to verify the schemas (existing and future ones) are indeed correct (i.e. describe the actual response of Redis). To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands it executed and their replies. For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with `--req-res-logfile <file> --client-default-resp 3` (the testsuite already does that if you run it with `--log-req-res --force-resp3`). You should run the testsuite with the above args (and `--dont-clean`) in order to make Redis generate `.reqres` files (same dir as the `stdout` files) which contain request-response pairs. These files are later on processed by `./utils/req-res-log-validator.py` which does: 1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c) 2. For each request-response pair, it validates the response against the request's reply_schema (obtained from the extended COMMAND DOCS) 3. In order to get good coverage of the Redis commands, and all their different replies, we chose to use the existing redis test suite, rather than attempt to write a fuzzer. #### Notes about RESP2 1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that, it's time to accept RESP3 as the future RESP) 2. 
Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3 so that we can validate it, we will need to know how to convert the actual reply to the one expected. - number and boolean are always strings in RESP2 so the conversion is easy - objects (maps) are always a flat array in RESP2 - others (nested array in RESP3's `ZRANGE` and others) will need some special per-command handling (so the client will not be totally auto-generated) Example for ZRANGE: ``` "reply_schema": { "anyOf": [ { "description": "A list of member elements", "type": "array", "uniqueItems": true, "items": { "type": "string" } }, { "description": "Members and their scores. Returned in case `WITHSCORES` was used.", "notes": "In RESP2 this is returned as a flat array", "type": "array", "uniqueItems": true, "items": { "type": "array", "minItems": 2, "maxItems": 2, "items": [ { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } } ] } ``` ### Other changes 1. Some tests that behave differently depending on the RESP are now being tested for both RESP, regardless of the special log-req-res mode ("Pub/Sub PING" for example) 2. Update the history field of CLIENT LIST 3. Added basic tests for commands that were not covered at all by the testsuite ### TODO - [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g. when `SET` return NULL, the condition is `arguments.get\|\|arguments.condition`, for `OK` the condition is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896 - [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode - [x] add the new tests to GH actions (i.e. 
compile with `-DLOG_REQ_RES`, run the tests, and run the validator) - [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output of the tests - https://github.com/redis/redis/issues/11897 - [x] (probably a separate PR) add all missing schemas - [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res - [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to fight with the tcl including mechanism a bit) - [x] issue: module API - https://github.com/redis/redis/issues/11898 - [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899 Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Shaya Potter <shaya@redislabs.com>	2023-03-11 10:14:16 +02:00
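The `uniqueItems` limitation described under "Limitations" can be made concrete with a small sketch. It compares two notions of uniqueness on the commit's own example `[["m1",6],["m1",7]]`: uniqueness of whole (member, score) tuples, which is what the commit says `uniqueItems` compares, and uniqueness of member names alone. The struct and function names are hypothetical, and this is not how the validator is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One element of a ZRANGE ... WITHSCORES style reply. */
struct scored_member {
    const char *member;
    double score;
};

/* Tuple-level uniqueness: two items clash only if both the member
 * and the score are equal. */
int tuples_unique(const struct scored_member *items, size_t n) {
    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (strcmp(items[i].member, items[j].member) == 0 &&
                items[i].score == items[j].score)
                return 0;
    return 1;
}

/* Member-level uniqueness: two items clash if the member names are
 * equal, whatever their scores. */
int members_unique(const struct scored_member *items, size_t n) {
    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (strcmp(items[i].member, items[j].member) == 0)
                return 0;
    return 1;
}
```

For the items `{"m1", 6}` and `{"m1", 7}`, the tuple-level check passes while the member-level check fails, which is the gap the commit message describes.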
Oran Agra	3ac835777c	Stabilize page reclaim CI test (#11818 ) Stabilize the test introduced in #11248: * remove the random aspect of the test by using DEBUG POPULATE instead of redis-benchmark * disable rdbcompression, so that the RDB file is always about 1GB. With fadvise disabled, I get about 1GB in the page cache; with it enabled, I get less than 200KB, so for now I'll keep the 500kb threshold.	2023-02-19 18:38:07 +02:00
Oran Agra	5b61b0dc6d	skip new page cache reclaim unit test when running in valgrind (#11808 ) The new test is incompatible with valgrind. Added a new `--valgrind` argument to `redis-server tests` mode, which causes that test to be skipped.	2023-02-16 10:50:58 +02:00
Tian	7dae142a2e	Reclaim page cache of RDB file (#11248 ) # Background The RDB file is usually generated and used once and seldom used again, but its content resides in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim some memory from the page cache or swap anonymous pages out, which may cause jitter in the Redis service. Consider a concrete scenario: a high-capacity machine hosts many Redis instances, and we are upgrading them together. The page cache on the host machine grows as RDBs are generated. Once free memory drops to the low watermark (which is more likely to happen in older Linux kernels like 3.10; before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, the `low watermark` was linear to the `min watermark`, and there was not much buffer space for `kswapd` to be woken up to reclaim memory), a `direct reclaim` happens, which means the process stalls waiting for memory allocation. # What the PR does The PR introduces the ability to reclaim the cache when the RDB is read or written. For reads, incremental reclaim is a little messy to implement, so the reclaim is done in one go in the background after the load is finished, to avoid blocking the work thread. For writes, incremental reclaim amortizes the work of reclaiming, so there is no need to move it to the background, and the peak watermark of the cache is reduced this way. Two cases are handled specially, replication and restart: in both, the cache is leveraged to speed up processing, so the reclaim is postponed to a suitable time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with a default value of false. # Things worth noting 1. Although `posix_fadvise` is part of the POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0. 2. 
On Linux, `posix_fadvise` only takes effect on pages that have already been written back, so a `sync` (or `fsync`, `fdatasync`) is needed to flush dirty pages before `posix_fadvise` when reclaiming the write cache. # About test A unit test is added to verify the effect of `posix_fadvise`. In the integration test, the overall cache increase is checked, as well as the cache backed by the RDB file; this is a specific TCL test executed in an isolated GitHub Actions job.	2023-02-12 09:23:29 +02:00
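The flush-then-advise ordering noted in the commit above can be sketched as follows. This assumes a Linux-like platform that provides `posix_fadvise` and `fdatasync`; the function name `reclaim_file_cache` is hypothetical and the code is not taken from the Redis source, which reclaims incrementally while writing.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_fadvise and fdatasync */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Asks the kernel to drop cached pages of an open file.
 * Dirty pages are flushed first, since the advice only affects pages
 * that have already been written back.
 * Returns 0 on success or an errno-style error number on failure. */
int reclaim_file_cache(int fd) {
    assert(fd >= 0);
    if (fdatasync(fd) == -1) return errno;
    /* offset 0 with length 0 means "from the start to the end of the file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Note that `posix_fadvise` returns the error number directly instead of setting `errno`, which is why its result is passed through unchanged. The call is advisory, so a return value of 0 does not prove that every page was evicted.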


111 Commits