111 Commits

Author SHA1 Message Date
ranshid
ba25b586d5
Introduce FORCE_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc (#1303)
Introduce compile time option to force activedefrag to run even when
jemalloc is not used as the allocator.
This is in order to be able to run tests with defrag enabled
while using memory instrumentation tools.

fixes: https://github.com/valkey-io/valkey/issues/1241

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-17 19:07:55 +02:00
ranshid
66ae8b7135
change the container image to ubuntu:plucky (#1359)
Our fortify workflow is running on ubuntu lunar container that is EOL
since [January 25, 2024(January 25,
2024](https://lists.ubuntu.com/archives/ubuntu-announce/2024-January/000298.html).
This case cause the workflow to fail during update actions like:
```
apt-get update && apt-get install -y make gcc-13
  update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-1[3](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:3) 100
  make all-with-unit-tests CC=gcc OPT=-O3 SERVER_CFLAGS='-Werror -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3'
  shell: sh -e {0}
Ign:1 http://security.ubuntu.com/ubuntu lunar-security InRelease
Err:2 http://security.ubuntu.com/ubuntu lunar-security Release
  [4](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:4)04  Not Found [IP: 91.189.91.82 80]
Ign:3 http://archive.ubuntu.com/ubuntu lunar InRelease
Ign:4 http://archive.ubuntu.com/ubuntu lunar-updates InRelease
Ign:[5](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:5) http://archive.ubuntu.com/ubuntu lunar-backports InRelease
Err:[6](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:7) http://archive.ubuntu.com/ubuntu lunar Release
  404  Not Found [IP: 185.125.190.81 80]
Err:7 http://archive.ubuntu.com/ubuntu lunar-updates Release
  404  Not Found [IP: 185.125.190.81 80]
Err:8 http://archive.ubuntu.com/ubuntu lunar-backports Release
  404  Not Found [IP: 185.125.190.81 80]
Reading package lists...
E: The repository 'http://security.ubuntu.com/ubuntu lunar-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-backports Release' does not have a Release file.
update-alternatives: error: alternative path /usr/bin/gcc-[13](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:14) doesn't exist
Error: Process completed with exit code 2.
```

example:
https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209

This pr uses the latest stable ubuntu image release
[plucky](https://hub.docker.com/layers/library/ubuntu/plucky/images/sha256-dc4565c7636f006c26d54c988faae576465e825ea349fef6fd3af6bf5100e8b6?context=explore)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-11-27 07:34:02 +02:00
Parth
c4920bca4a
Integrating fast_float to optionally replace strtod (#1260)
Fast_float is a C++ header-only library to parse doubles using SIMD
instructions. The purpose is to speed up sorted sets and other commands
that use doubles. A single-file copy of fast_float is included in this
repo. This introduces an optional dependency on a C++ compiler.

The use of fast_float is enabled at compile time using the make variable
`USE_FAST_FLOAT=yes`. It is disabled by default.

Fixes #1069.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Parth <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Roshan Swain <swainroshan001@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-11-25 10:01:43 +01:00
Seungmin Lee
f9d0b87622
Upgrade macos-12 to macos-13 in workflows (#1318)
### Problem
GitHub Actions is starting the deprecation process for macOS 12.
Deprecation will begin on 10/7/24 and the image will be fully
unsupported by 12/3/24.
For more details, see
https://github.com/actions/runner-images/issues/10721

Signed-off-by: Seungmin Lee <sungming@amazon.com>
Co-authored-by: Seungmin Lee <sungming@amazon.com>
2024-11-18 18:00:30 -08:00
Binbin
d3f3b9cc3a
Fix daily valgrind build with unit tests (#1309)
This was introduced in #515.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 14:27:28 +08:00
skyfirelee
4a9864206f
Migrate quicklist unit test to new framework (#515)
Migrate quicklist unit test to new unit test framework, and cleanup
remaining references of SERVER_TEST, parent ticket #428.

Closes #428.

Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-14 10:37:44 +08:00
Binbin
9c20c84251
Set fail-fast to false in daily CI (#1162)
Currently in our daily, if a job fails, it will cancel the other jobs
in the same matrix, we want to avoid this so that all jobs in a matrix
can eventually run to completion.

Docs: jobs.<job_id>.strategy.fail-fast applies to the entire matrix.
If jobs.<job_id>.strategy.fail-fast is set to true or its expression
evaluates to true, GitHub will cancel all in-progress and queued jobs
in the matrix if any job in the matrix fails. This property defaults
to true.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 10:29:34 +08:00
Madelyn Olson
829aa7fe3c
Remove accurate from extra test tag (#935)
Today if we attached the "run-extra-tests" tag it adds at least 20
minutes because the dump-fuzzer test runs with full accuracy. This
fuzzer is useful, but probably only really needed for the daily, so
removing it from the PRs. We still run the fuzzers, just not for as
long.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-23 11:05:41 -07:00
uriyage
39f8bcb91b
Skip tracking clients OOM test when I/O threads are enabled (#764)
Fix feedback loop in key eviction with tracking clients when using I/O
threads.

Current issue:
Evicting keys while tracking clients or key space-notification exist
creates a feedback loop when using I/O threads:

While evicting keys we send tracking async writes to I/O threads,
preventing immediate release of tracking clients' COB memory
consumption.

Before the I/O thread finishes its write, we recheck used_memory, which
now includes the tracking clients' COB and thus continue to evict more
keys.

**Fix:**
We will skip the test for now while IO threads are active. We may
consider avoiding sending writes in `processPendingWrites` to I/O
threads for tracking clients when we are out of memory.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-21 17:02:57 -07:00
Binbin
76ad8f7a76
Skip IPv6 tests when TCLSH version is < 8.6 (#910)
In #786, we did skip it in the daily, but not for the others.
When running ./runtest on MacOS, we will get the failure.
```
couldn't open socket: host is unreachable (nodename nor servname provided, or not known)
```

The reason is that TCL 8.5 doesn't support ipv6, so we skip tests
tagged with ipv6. This also revert #786.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-15 15:11:38 +08:00
Yury-Fridlyand
bfdab65791
Fix CI concurrency (#849)
Few CI improvements witch will reduce occupation CI queue and eliminate
stale runs.

1. Kill CI jobs on PRs once PR branch gets a new push. This will prevent
situation happened today - a huge job triggered twice in less than an
hour and occupied all **org** (for all repositories) runners queue for
the rest of the day (see pic). This completely blocked valkey-glide
team.
2. Distribute nightly croned jobs on time to prevent them running
together. Keep in mind, cron's TZ is UTC, so midnight tasks incur
developers located in other timezones.

This must be backported to all release branches (`valkey-x.y` and `x.y`)

![image](https://github.com/user-attachments/assets/923d8237-3cb7-42f5-80c8-5322b3f5187d)

---------

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
2024-08-05 22:05:29 -07:00
Madelyn Olson
ccafbb750b
Added a flag to run additional tests for additional tests (#815)
This PR allows running a subset of the daily tests with a PR by
attaching the `run-extra-tests` flag. This is done by conditionally
running the daily tests when the label is attached. (I will do that for
this PR to demonstrate).

One downside of this PR is that a lot of tests will forever show-up as
"skipped" for most PRs, as long as that doesn't bother us it should be
OK. Skipped tests don't take up any of our runner compute.

Another note, if the label isn't attached on the first commit, the
submitter will need to push something to get the tests to run again.
There is a way to make it kick off tests during a label, but that added
a bunch more complexity so just wanted to start with this.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-22 17:44:18 -07:00
Viktor Söderqvist
c1bbdc796d
Skip IPv6 tests on MacOS (daily) (#786)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-15 13:38:15 +02:00
uriyage
bbfd041895
Async IO threads (#758)
This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests
per second, as detailed by [dan touitou](https://github.com/touitou-dan)
in https://github.com/valkey-io/valkey/issues/22. This PR modifies the
IO threads to be fully asynchronous, which is a first and necessary step
to allow more work offloading and better utilization of the IO threads.

### Current IO threads state:

Valkey IO threads were introduced in Redis 6.0 to allow better
utilization of multi-core machines. Before this, Redis was
single-threaded and could only use one CPU core for network and command
processing. The introduction of IO threads helps in offloading the IO
operations to multiple threads.

**Current IO Threads flow:**

1. Initialization: When Redis starts, it initializes a specified number
of IO threads. These threads are in addition to the main thread, each
thread starts with an empty list, the main thread will populate that
list in each event-loop with pending-read-clients or
pending-write-clients.
2. Read Phase: The main thread accepts incoming connections and reads
requests from clients. The reading of requests are offloaded to IO
threads. The main thread puts the clients ready-to-read in a list and
set the global io_threads_op to IO_THREADS_OP_READ, the IO threads pick
the clients up, perform the read operation and parse the first incoming
command.
3. Command Processing: After reading the requests, command processing is
still single-threaded and handled by the main thread.
4. Write Phase: Similar to the read phase, the write phase is also be
offloaded to IO threads. The main thread prepares the response in the
clients’ output buffer then the main thread puts the client in the list,
and sets the global io_threads_op to the IO_THREADS_OP_WRITE. The IO
threads then pick the clients up and perform the write operation to send
the responses back to clients.
5. Synchronization: The main-thread communicate with the threads on how
many jobs left per each thread with atomic counter. The main-thread
doesn’t access the clients while being handled by the IO threads.

**Issues with current implementation:**

* Underutilized Cores: The current implementation of IO-threads leads to
the underutilization of CPU cores.
* The main thread remains responsible for a significant portion of
IO-related tasks that could be offloaded to IO-threads.
* When the main-thread is processing client’s commands, the IO threads
are idle for a considerable amount of time.
* Notably, the main thread's performance during the IO-related tasks is
constrained by the speed of the slowest IO-thread.
* Limited Offloading: Currently, Since the Main-threads waits
synchronously for the IO threads, the Threads perform only read-parse,
and write operations, with parsing done only for the first command. If
the threads can do work asynchronously we may offload more work to the
threads reducing the load from the main-thread.
* TLS: Currently, we don't support IO threads with TLS (where offloading
IO would be more beneficial) since TLS read/write operations are not
thread-safe with the current implementation.

### Suggested change

Non-blocking main thread - The main thread and IO threads will operate
in parallel to maximize efficiency. The main thread will not be blocked
by IO operations. It will continue to process commands independently of
the IO thread's activities.

**Implementation details**

**Inter-thread communication.**

* We use a static, lock-free ring buffer of fixed size (2048 jobs) for
the main thread to send jobs and for the IO to receive them. If the ring
buffer fills up, the main thread will handle the task itself, acting as
back pressure (in case IO operations are more expensive than command
processing). A static ring buffer is a better candidate than a dynamic
job queue as it eliminates the need for allocation/freeing per job.
* An IO job will be in the format: ` [void* function-call-back | void
*data] `where data is either a client to read/write from and the
function-ptr is the function to be called with the data for example
readQueryFromClient using this format we can use it later to offload
other types of works to the IO threads.
* The Ring buffer is one way from the main-thread to the IO thread, Upon
read/write event the main thread will send a read/write job then in
before sleep it will iterate over the pending read/write clients to
checking for each client if the IO threads has already finished handling
it. The IO thread signals it has finished handling a client read/write
by toggling an atomic flag read_state / write_state on the client
struct.

**Thread Safety**

As suggested in this solution, the IO threads are reading from and
writing to the clients' buffers while the main thread may access those
clients.
We must ensure no race conditions or unsafe access occurs while keeping
the Valkey code simple and lock free.

Minimal Action in the IO Threads
The main change is to limit the IO thread operations to the bare
minimum. The IO thread will access only the client's struct and only the
necessary fields in this struct.
The IO threads will be responsible for the following:

* Read Operation: The IO thread will only read and parse a single
command. It will not update the server stats, handle read errors, or
parsing errors. These tasks will be taken care of by the main thread.
* Write Operation: The IO thread will only write the available data. It
will not free the client's replies, handle write errors, or update the
server statistics.


To achieve this without code duplication, the read/write code has been
refactored into smaller, independent components:

* Functions that perform only the read/parse/write calls.
* Functions that handle the read/parse/write results.

This refactor accounts for the majority of the modifications in this PR.

**Client Struct Safe Access**

As we ensure that the IO threads access memory only within the client
struct, we need to ensure thread safety only for the client's struct's
shared fields.

* Query Buffer 
* Command parsing - The main thread will not try to parse a command from
the query buffer when a client is offloaded to the IO thread.
* Client's memory checks in client-cron - The main thread will not
access the client query buffer if it is offloaded and will handle the
querybuf grow/shrink when the client is back.
* CLIENT LIST command - The main thread will busy-wait for the IO thread
to finish handling the client, falling back to the current behavior
where the main thread waits for the IO thread to finish their
processing.
* Output Buffer 
* The IO thread will not change the client's bufpos and won't free the
client's reply lists. These actions will be done by the main thread on
the client's return from the IO thread.
* bufpos / block→used: As the main thread may change the bufpos, the
reply-block→used, or add/delete blocks to the reply list while the IO
thread writes, we add two fields to the client struct: io_last_bufpos
and io_last_reply_block. The IO thread will write until the
io_last_bufpos, which was set by the main-thread before sending the
client to the IO thread. If more data has been added to the cob in
between, it will be written in the next write-job. In addition, the main
thread will not trim or merge reply blocks while the client is
offloaded.
* Parsing Fields 
    * Client's cmd, argc, argv, reqtype, etc., are set during parsing.
* The main thread will indicate to the IO thread not to parse a cmd if
the client is not reset. In this case, the IO thread will only read from
the network and won't attempt to parse a new command.
* The main thread won't access the c→cmd/c→argv in the CLIENT LIST
command as stated before it will busy wait for the IO threads.
* Client Flags 
* c→flags, which may be changed by the main thread in multiple places,
won't be accessed by the IO thread. Instead, the main thread will set
the c→io_flags with the information necessary for the IO thread to know
the client's state.
* Client Close 
* On freeClient, the main thread will busy wait for the IO thread to
finish processing the client's read/write before proceeding to free the
client.
* Client's Memory Limits 
* The IO thread won't handle the qb/cob limits. In case a client crosses
the qb limit, the IO thread will stop reading for it, letting the main
thread know that the client crossed the limit.

**TLS**

TLS is currently not supported with IO threads for the following
reasons:

1. Pending reads - If SSL has pending data that has already been read
from the socket, there is a risk of not calling the read handler again.
To handle this, a list is used to hold the pending clients. With IO
threads, multiple threads can access the list concurrently.
2. Event loop modification - Currently, the TLS code
registers/unregisters the file descriptor from the event loop depending
on the read/write results. With IO threads, multiple threads can modify
the event loop struct simultaneously.
3. The same client can be sent to 2 different threads concurrently
(https://github.com/redis/redis/issues/12540).

Those issues were handled in the current PR:

1. The IO thread only performs the read operation. The main thread will
check for pending reads after the client returns from the IO thread and
will be the only one to access the pending list.
2. The registering/unregistering of events will be similarly postponed
and handled by the main thread only.
3. Each client is being sent to the same dedicated thread (c→id %
num_of_threads).


**Sending Replies Immediately with IO threads.**

Currently, after processing a command, we add the client to the
pending_writes_list. Only after processing all the clients do we send
all the replies. Since the IO threads are now working asynchronously, we
can send the reply immediately after processing the client’s requests,
reducing the command latency. However, if we are using AOF=always, we
must wait for the AOF buffer to be written, in which case we revert to
the current behavior.

**IO threads dynamic adjustment**

Currently, we use an all-or-nothing approach when activating the IO
threads. The current logic is as follows: if the number of pending write
clients is greater than twice the number of threads (including the main
thread), we enable all threads; otherwise, we enable none. For example,
if 8 IO threads are defined, we enable all 8 threads if there are 16
pending clients; else, we enable none.
It makes more sense to enable partial activation of the IO threads. If
we have 10 pending clients, we will enable 5 threads, and so on. This
approach allows for a more granular and efficient allocation of
resources based on the current workload.

In addition, the user will now be able to change the number of I/O
threads at runtime. For example, when decreasing the number of threads
from 4 to 2, threads 3 and 4 will be closed after flushing their job
queues.

**Tests**

Currently, we run the io-threads tests with 4 IO threads
(443d80f168/.github/workflows/daily.yml (L353)).
This means that we will not activate the IO threads unless there are 8
(threads * 2) pending write clients per single loop, which is unlikely
to happened in most of tests, meaning the IO threads are not currently
being tested.

To enforce the main thread to always offload work to the IO threads,
regardless of the number of pending events, we add an
events-per-io-thread configuration with a default value of 2. When set
to 0, this configuration will force the main thread to always offload
work to the IO threads.

When we offload every single read/write operation to the IO threads, the
IO-threads are running with 100% CPU when running multiple tests
concurrently some tests fail as a result of larger than expected command
latencies. To address this issue, we have to add some after or wait_for
calls to some of the tests to ensure they pass with IO threads as well.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-07-08 20:01:39 -07:00
Samuel Adetunji
93123f97a0
Format yaml files (#615)
Closes #533

---------

Signed-off-by: adetunjii <adetunjithomas1@outlook.com>
2024-06-14 13:40:06 -07:00
Ping Xie
2b97aa6171
Introduce enable-debug-assert to enable/disable debug asserts at runtime (#584)
Introduce a new hidden server configuration, `enable-debug-assert`, which
allows selectively enabling or disabling, at runtime, expensive or risky
assertions used primarily for debugging and testing.

Fix #569

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-31 22:50:08 -07:00
Jonathan Wright
1c55f3ca5a
Replace centos 7 with alternative versions (#543)
replace centos 7 with almalinux 8, add almalinux 9, centos stream 9, fedora stable, rawhide

Fixes #527

---------

Signed-off-by: Jonathan Wright <jonathan@almalinux.org>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-24 16:08:51 -07:00
Siddhartha Sankar Mondal
005a018db6
Deprecate MacOS 11 build target (#524)
Deprecate MacOS 11 build target. End of life June 2024.  Fixes #523

---------

Signed-off-by: Siddhartha Mondal <siddharthmondal@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Roshan Khatri <117414976+roshkhatri@users.noreply.github.com>
2024-05-21 12:21:28 -07:00
Binbin
e7e5a104ec
Revert mmap_rnd bits back to default value in CI (#520)
In 3f725b8, we introduced a change back in march to reduce the
entropy of ASLR, because ASAN didn't support it. Now the
vm.mmap_rnd_bits
was reverted in actions/runner-images#9491 so can remove this changes.

Closes #519.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-05-20 12:23:25 -07:00
Madelyn Olson
9b6232b501
Automatically notify the slack channel when tests fail (#509)
Adds a job that will automatically run at the end of the daily, which
will collect all the failed tests and send them to the developer slack.
It will include a link to the job as well.

Example job that ran on my private repo:
https://github.com/madolson/valkey/actions/runs/9123245899/job/25085418567

Example notification: 
<img width="662" alt="image"
src="https://github.com/valkey-io/valkey/assets/34459052/69127db4-e416-4321-bc06-eefcecab1130">
(Note: I removed the sassy text at the bottom from the PR)

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-16 23:51:33 -07:00
Madelyn Olson
1b3199e070
Fix unit test issues on address sanitizer and fortify (#437)
This commit does four things:

1. On various images, the linker was not able to correctly load the flto
optimizations from the archive generated for unit tests, and was
throwing errors. I was able to solve this by updating the plugin for the
fortify test, but was unable to reproduce it on the ASAN tests or find a
solution. So I decided to go with a single solution for now, which was
to just disable the linker optimizations for those tests. This shouldn't
weaken the protections provided by ASAN.
2. The change to remove flto for some reason caused some odd inlining
behavior in the intset test, that I wasn't really able to understand.
The error was basically that we were doing a 4 byte write, starting at
byte offset 8, for the first addition to listpack that was of size 10.
Practically this has no effect, since I'm not aware of any allocator
that would give us a 10 byte block as opposed to 12 (or more likely 16)
bytes. The isn't the correct behavior, since an uninitialized listpack
defaults to 16bit encoding, which should only be writing 2 bytes. I
rabbit holed like 2 hours into this, and gave up and just ignored the
warning on the file.
3. Now that address sanitizer was correctly running, it picked up two
issues. A memory leak and uninitialized value, so those were easy to
fix.
4. There is also a small change to the fortify to build the test up
front instead of later, this is just to be consistent with other tests
and has no functional change.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-05 22:00:08 -07:00
Björn Svensson
39d4b43d4b
Pin versions of Github Actions in CI (#221)
Pin the Github Action dependencies to the hash according to secure
software development best practices
recommended by the Open Source Security Foundation (OpenSSF).

When developing a CI workflow, it's common to version-pin dependencies
(i.e. actions/checkout@v4). However, version tags are mutable, so a
malicious attacker could overwrite a version tag to point to a malicious
or vulnerable commit instead.
Pinning workflow dependencies by hash ensures the dependency is
immutable and its behavior is guaranteed.
See
https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies

The `dependabot` supports updating a hash and the version comment so its
update will continue to work as before.

Links to used actions and theit tag/hash for review/validation:
https://github.com/actions/checkout/tags    (v4.1.2 was rolled back)
https://github.com/github/codeql-action/tags
https://github.com/maxim-lobanov/setup-xcode/tags
https://github.com/cross-platform-actions/action/releases/tag/v0.22.0
https://github.com/py-actions/py-dependency-install/tags
https://github.com/actions/upload-artifact/tags
https://github.com/actions/setup-node/tags
https://github.com/taiki-e/install-action/releases/tag/v2.32.2

This PR is part of #211.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2024-05-04 01:54:14 +02:00
Madelyn Olson
5b1fd222ed
An initial simple unit test framework (#344)
The core idea was to take a lot of the stuff from the C unity framework
and adapt it a bit here. Each file in the `unit` directory that starts
with `test_` is automatically assumed to be a test suite. Within each
file, all functions that start with `test_` are assumed to be a test.

See unit/README.md for details about the implementation.

Instead of compiling basically a net new binary, the way the tests are
compiled is that the main valkey server is compiled as a static archive,
which we then compile the individual test files against to create a new
test executable. This is not all that important now, other than it makes
the compilation simpler, but what it will allow us to do is overwrite
functions in the archive to enable mocking for cross compilation unit
functions. There are also ways to enable mocking from within the same
compilation unit, but I don't know how important this is.

Tests are also written in one of two styles:
1. Including the header file and directly calling functions from the
archive.
2. Importing the original file, and then calling the functions. This
second approach is cool because we can call static functions. It won't
mess up the archive either.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-02 20:00:04 -07:00
Björn Svensson
1c282a9306
Set permissions for Github Actions in CI (#312)
This sets the default permission for current CI workflows to only be
able to read from the repository (scope: "contents").
When a used Github Action require additional permissions (like CodeQL)
we grant that permission on job-level instead.

This means that a compromised action will not be able to modify the repo
or even steal secrets since all other permission-scopes are implicit set
to "none", i.e. not permitted. This is recommended by
[OpenSSF](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions).

This PR includes a small fix for the possibility of missing server logs
artifacts, found while verifying the permission.
The `upload-artifact@v3` action will replace artifacts which already
exists. Since both CI-jobs `test-external-standalone` and
`test-external-nodebug` uses the same artifact name, when both jobs
fail, we only get logs from the last finished job. This can be avoided
by using unique artifact names.

This PR is part of #211

More about permissions and scope can be found here:

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

---------

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2024-04-12 17:24:22 +02:00
Wen Hui
c0a83c0058
Fix CI centos issue (#150)
Because centos do not support actions/checkout@v4, we need roll back to
actions/checkout@v3

Please check the run result
https://github.com/hwware/placeholderkv/actions/runs/8526052560/job/23354458574

It looks our CI make happy now

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-02 14:48:12 -04:00
ICHINOSE Shogo
0a51ceca88
bump actions/checkout v4 (#87)
Node.js 16 actions are deprecated. To result them we are updating to actions/checkout@v4.
For more information see:
https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

e.g. failure https://github.com/valkey-io/valkey/actions/runs/8482578610

---------

Signed-off-by: ICHINOSE Shogo <shogo82148@gmail.com>
2024-04-01 18:44:21 -07:00
Roshan Khatri
3630dd08a6
Restore all tests state prior to fork (#117)
Related to
https://github.com/valkey-io/valkey/pull/11#issuecomment-2028930612
Restore all tests state prior to fork and re-enables Daily tests on PRs
on release branches.
Reverts
2aa820f945

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-04-01 12:55:01 -04:00
Madelyn Olson
57789d4d08
Update naming to to Valkey (#62)
Documentation references should use `Valkey` while server and cli
references are all under `valkey`.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-03-28 09:58:28 -07:00
Roshan Khatri
340ab6d62d
Fixes external server tests and change other references (#14) 2024-03-25 18:49:52 +01:00
Roshan Khatri
2aa820f945
Avoid daily test to run on PRs (#11) 2024-03-22 12:36:08 -07:00
Madelyn Olson
9da3166e5c Fix daily.yml change 2024-03-21 22:41:23 -07:00
Madelyn Olson
084ab10e17 Disabled some workflows for now 2024-03-21 19:50:02 -07:00
Madelyn Olson
2e9855fbd0 Disable some workflows since I don't want to burn through the free tier 2024-03-21 19:39:21 -07:00
Madelyn Olson
3f725b8619
Change mmap rand bits as a temporary mitigation to resolve asan bug (#13150)
There is a new change in linux kernel 6.6.6 that uses randomization of
address space to harden security, but it interferes with the way ASAN
works. Folks are working on a fix, but this is a temporary mitigation
for us to get our CI to be green again.

See https://github.com/google/sanitizers/issues/1716 for more
information

See
https://github.com/redis/redis/actions/runs/8305126288/job/22731614306?pr=13149
for a recent failure
2024-03-17 09:06:51 +02:00
dependabot[bot]
38f0234946
Bump cross-platform-actions/action from 0.21.1 to 0.22.0 (#12904)
Bumps
[cross-platform-actions/action](https://github.com/cross-platform-actions/action)
from 0.21.1 to 0.22.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/cross-platform-actions/action/releases">cross-platform-actions/action's
releases</a>.</em></p>
<blockquote>
<h2>Cross Platform Action 0.22.0</h2>
<h3>Added</h3>
<ul>
<li>
<p>Added support for using the action in multiple steps in the same job
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/26">#26</a>).
All the inputs need to be the same for all steps, except for the
following
inputs: <code>sync_files</code>, <code>shutdown_vm</code> and
<code>run</code>.</p>
</li>
<li>
<p>Added support for specifying that the VM should not shutdown after
the action
has run. This adds a new input parameter: <code>shutdown_vm</code>. When
set to <code>false</code>,
this will hopefully mitigate very frequent freezing of VM during
teardown (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Always terminate VM instead of shutting down. This is more efficient
and this
will hopefully mitigate very frequent freezing of VM during teardown
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
<li>
<p>Use <code>unsafe</code> as the cache mode for QEMU disks. This should
improve performance (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/67">#67</a>).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/cross-platform-actions/action/blob/master/changelog.md">cross-platform-actions/action's
changelog</a>.</em></p>
<blockquote>
<h2>[0.22.0] - 2023-12-27</h2>
<h3>Added</h3>
<ul>
<li>
<p>Added support for using the action in multiple steps in the same job
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/26">#26</a>).
All the inputs need to be the same for all steps, except for the
following
inputs: <code>sync_files</code>, <code>shutdown_vm</code> and
<code>run</code>.</p>
</li>
<li>
<p>Added support for specifying that the VM should not shutdown after
the action
has run. This adds a new input parameter: <code>shutdown_vm</code>. When
set to <code>false</code>,
this will hopefully mitigate very frequent freezing of VM during
teardown (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Always terminate VM instead of shutting down. This is more efficient
and this
will hopefully mitigate very frequent freezing of VM during teardown
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
<li>
<p>Use <code>unsafe</code> as the cache mode for QEMU disks. This should
improve performance (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/67">#67</a>).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5800fa0060"><code>5800fa0</code></a>
Release 0.22.0</li>
<li><a
href="20ad4b2ceb"><code>20ad4b2</code></a>
Fix <a
href="https://redirect.github.com/cross-platform-actions/action/issues/67">#67</a>:
Use <code>unsafe</code> as the cache mode disks</li>
<li><a
href="d9184930c3"><code>d918493</code></a>
Always terminate VM instead of shutting down.</li>
<li><a
href="626f1d6c95"><code>626f1d6</code></a>
Fix error when terminating the VM</li>
<li><a
href="d59f08dc5c"><code>d59f08d</code></a>
Print stack trace for uncaught exceptions</li>
<li><a
href="7f2fab9c56"><code>7f2fab9</code></a>
Revert &quot;Run SSH in verbose mode when debug mode is
enabled&quot;</li>
<li><a
href="0f566c356e"><code>0f566c3</code></a>
[no ci] Update the changelog</li>
<li><a
href="b7f77446bb"><code>b7f7744</code></a>
[no ci] Fix spelling</li>
<li><a
href="9894a9b118"><code>9894a9b</code></a>
Wrap <code>host</code> module in namespace</li>
<li><a
href="87fdd346a2"><code>87fdd34</code></a>
Fix broken test-vm-shutdown tests</li>
<li>Additional commits viewable in <a
href="https://github.com/cross-platform-actions/action/compare/v0.21.1...v0.22.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cross-platform-actions/action&package-manager=github_actions&previous-version=0.21.1&new-version=0.22.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-04 22:38:33 +02:00
sundb
91309f7981
Fix compilation warning in KeySpace_ServerEventCallback and add CFLAGS=-Werror flag for module CI (#12786)
Warning:
```
postnotifications.c:216:77: warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
        RedisModule_Log(ctx, "warning", "Got an unexpected subevent '%ld'", subevent);
                                                                     ~~~    ^~~~~~~~
                                                                     %llu
```

CI:
https://github.com/redis/redis/actions/runs/6937308713/job/18871124342#step:6:115

## Other
Add `CFLAGS=-Werror` flag for module CI.

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-11-30 17:41:00 +02:00
Yossi Gottlieb
a9e73c00bc
Reduce FreeBSD daily scope. (#12758)
The full test is very flaky running on a VM inside GitHub worker, so we
have to settle for only building and running a small smoke test.
2023-11-13 17:22:09 +02:00
Roshan Khatri
88e83e517b
Add DEBUG_ASSERTIONS option to custom assert (#12667)
This PR introduces a new macro, serverAssertWithInfoDebug, to do complex assertions only for debugging. The main intention is to allow running complex operations during tests without impacting runtime performance. This assertion is enabled when setting DEBUG_ASSERTIONS.

The DEBUG_ASSERTIONS flag is set for the daily and CI variants of `test-sanitizer-address`.
2023-11-11 20:31:34 -08:00
Yossi Gottlieb
6223355cf3
Use cross-platform-actions for FreeBSD support. (#12732)
This change overcomes many stability issues experienced with the
vmactions action.

We need to limit VMs to 8GB for better stability, as the 13GB default
seems to hang them occasionally.

Shell code has been simplified since this action seem to use `bash -e`
which will abort on non-zero exit codes anyway.
2023-11-06 18:07:14 +02:00
sundb
3c734b8e9d
Add new compilation CI for macos-11 and macos-13 (#12666)
As discussed in #12611
Add a build CI for macox 11 and 13 to avoid compatibility breakage introduced by future macos sdk versions.
2023-10-18 13:25:52 +03:00
dependabot[bot]
1634a0f271
Bump vmactions/freebsd-vm from 0.3.0 to 0.3.1 (#12352)
Bumps [vmactions/freebsd-vm](https://github.com/vmactions/freebsd-vm) from 0.3.0 to 0.3.1.
- [Release notes](https://github.com/vmactions/freebsd-vm/releases)
- [Commits](https://github.com/vmactions/freebsd-vm/compare/v0.3.0...v0.3.1)

---
updated-dependencies:
- dependency-name: vmactions/freebsd-vm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 09:17:34 +03:00
sundb
42c8c61813
Fix some compile warnings and errors when building with gcc-12 or clang (#12035)
This PR is to fix the compilation warnings and errors generated by the latest
complier toolchain, and to add a new runner of the latest toolchain for daily CI.

## Fix various compilation warnings and errors

1) jemalloc.c

COMPILER: clang-14 with FORTIFY_SOURCE

WARNING:
```
src/jemalloc.c:1028:7: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
                    "/etc/malloc.conf",
                    ^
src/jemalloc.c:1027:3: note: place parentheses around the string literal to silence warning
                "\"name\" of the file referenced by the symbolic link named "
                ^
```

REASON:  the compiler to alert developers to potential issues with string concatenation
that may miss a comma,
just like #9534 which misses a comma.

SOLUTION: use `()` to tell the compiler that these two line strings are continuous.

2) config.h

COMPILER: clang-14 with FORTIFY_SOURCE

WARNING:
```
In file included from quicklist.c:36:
./config.h:319:76: warning: attribute declaration must precede definition [-Wignored-attributes]
char *strcat(char *restrict dest, const char *restrict src) __attribute__((deprecated("please avoid use of unsafe C functions. prefer use of redis_strlcat instead")));
```

REASON: Enabling _FORTIFY_SOURCE will cause the compiler to use `strcpy()` with check,
it results in a deprecated attribute declaration after including <features.h>.

SOLUTION: move the deprecated attribute declaration from config.h to fmacro.h before "#include <features.h>".

3) networking.c

COMPILER: GCC-12

WARNING: 
```
networking.c: In function ‘addReplyDouble.part.0’:
networking.c:876:21: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
  876 |         dbuf[start] = '$';
      |                     ^
networking.c:868:14: note: at offset -5 into destination object ‘dbuf’ of size 5152
  868 |         char dbuf[MAX_LONG_DOUBLE_CHARS+32];
      |              ^
networking.c:876:21: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
  876 |         dbuf[start] = '$';
      |                     ^
networking.c:868:14: note: at offset -6 into destination object ‘dbuf’ of size 5152
  868 |         char dbuf[MAX_LONG_DOUBLE_CHARS+32];
```

REASON: GCC-12 predicts that digits10() may return 9 or 10 through `return 9 + (v >= 1000000000UL)`.

SOLUTION: add an assert to let the compiler know the possible length;

4) redis-cli.c & redis-benchmark.c

COMPILER: clang-14 with FORTIFY_SOURCE

WARNING:
```
redis-benchmark.c:1621:2: warning: embedding a directive within macro arguments has undefined behavior [-Wembedded-directive] #ifdef USE_OPENSSL
redis-cli.c:3015:2: warning: embedding a directive within macro arguments has undefined behavior [-Wembedded-directive] #ifdef USE_OPENSSL
```

REASON: when _FORTIFY_SOURCE is enabled, the compiler will use the print() with
check, which is a macro. this may result in the use of directives within the macro, which
is undefined behavior.

SOLUTION: move the directives-related code out of `print()`.

5) server.c

COMPILER: gcc-13 with FORTIFY_SOURCE

WARNING:
```
In function 'lookupCommandLogic',
    inlined from 'lookupCommandBySdsLogic' at server.c:3139:32:
server.c:3102:66: error: '*(robj **)argv' may be used uninitialized [-Werror=maybe-uninitialized]
 3102 |     struct redisCommand *base_cmd = dictFetchValue(commands, argv[0]->ptr);
      |                                                              ~~~~^~~
```

REASON: The compiler thinks that the `argc` returned by `sdssplitlen()` could be 0,
resulting in an empty array of size 0 being passed to lookupCommandLogic.
this should be a false positive, `argc` can't be 0 when strings are not NULL.

SOLUTION: add an assert to let the compiler know that `argc` is positive.

6) sha1.c

COMPILER: gcc-12

WARNING:
```
In function ‘SHA1Update’,
    inlined from ‘SHA1Final’ at sha1.c:195:5:
sha1.c:152:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
  152 |             SHA1Transform(context->state, &data[i]);
      |             ^
sha1.c:152:13: note: referencing argument 2 of type ‘const unsigned char[64]’
sha1.c: In function ‘SHA1Final’:
sha1.c:56:6: note: in a call to function ‘SHA1Transform’
   56 | void SHA1Transform(uint32_t state[5], const unsigned char buffer[64])
      |      ^
In function ‘SHA1Update’,
    inlined from ‘SHA1Final’ at sha1.c:198:9:
sha1.c:152:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
  152 |             SHA1Transform(context->state, &data[i]);
      |             ^
sha1.c:152:13: note: referencing argument 2 of type ‘const unsigned char[64]’
sha1.c: In function ‘SHA1Final’:
sha1.c:56:6: note: in a call to function ‘SHA1Transform’
   56 | void SHA1Transform(uint32_t state[5], const unsigned char buffer[64])
```

REASON: due to the bug[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922], when
enable LTO, gcc-12 will not see `diagnostic ignored "-Wstringop-overread"`, resulting in a warning.

SOLUTION: temporarily set SHA1Update to noinline to avoid compiler warnings due
to LTO being enabled until the above gcc bug is fixed.

7) zmalloc.h

COMPILER: GCC-12

WARNING: 
```
In function ‘memset’,
    inlined from ‘moduleCreateContext’ at module.c:877:5,
    inlined from ‘RM_GetDetachedThreadSafeContext’ at module.c:8410:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:10: warning: ‘__builtin_memset’ writing 104 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
```

REASON: due to the GCC-12 bug [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503],
GCC-12 cannot see alloc_size, which causes GCC to think that the actual size of memory
is 0 when checking with __glibc_objsize0().

SOLUTION: temporarily set malloc-related interfaces to `noinline` to avoid compiler warnings
due to LTO being enabled until the above gcc bug is fixed.

## Other changes
1) Fixed `ps -p [pid]`  doesn't output `<defunct>` when using procps 4.x causing `replication
  child dies when parent is killed - diskless` test to fail.
2) Add a new fortify CI with GCC-13 and ubuntu-lunar docker image.
2023-04-18 09:53:51 +03:00
Binbin
810ea67b5b
Don't pass --fail-commands-not-all-hit to validator if we don't run the full testsuite (#12023)
In daily.yml, if the input suggests we don't run the full testsuite,
do not pass --fail-commands-not-all-hit to the validator.

This fixes the first point in #11954. Credit goes to the comment
on the open issue for GH actions: actions/runner#409

Also improve prints to show the dispatch arguments in every job.
2023-04-12 12:23:50 +03:00
Oran Agra
997fa41e99
Attempt to solve MacOS CI issues in GH Actions (#12013)
The MacOS CI in github actions often hangs without any logs. GH argues that
it's due to resource utilization, either running out of disk space, memory, or CPU
starvation, and thus the runner is terminated.

This PR contains multiple attempts to resolve this:
1. introducing pause_process instead of SIGSTOP, which waits for the process
  to stop before resuming the test, possibly resolving race conditions in some tests,
  this was a suspect since there was one test that could result in an infinite loop in that
 case, in practice this didn't help, but still a good idea to keep.
2. disable the `save` config in many tests that don't need it, specifically ones that use
  heavy writes and could create large files.
3. change the `populate` proc to use short pipeline rather than an infinite one.
4. use `--clients 1` in the macos CI so that we don't risk running multiple resource
  demanding tests in parallel.
5. enable `--verbose` to be repeated to elevate verbosity and print more info to stdout
  when a test or a server starts.
2023-04-12 09:19:21 +03:00
Oran Agra
f263b6daf3
Increase threshold for flaky cache reclaim test (#12004)
This test produces 1GB of data and moves it around, and was expecting less
than 500kb to be present in the system page cache.
It sometimes fails with up to some 6mb in the page cache (0 in the actual RDB files),
increasing the threshold. It looks like some background tasks in the container are
occupying the page cache.

It is safe to ignore the above since we also explicitly check the pages of our dump.rdb
are not cached (matching `vmtouch -v` to `0%`).
An additional fix is to match ` 0%` (add space), so that we don't successfully match `10%`.

details in https://github.com/redis/redis/pull/11818
2023-04-05 14:45:42 +03:00
Oran Agra
9e15b42fda
ignore latency errors in the schema validation CI (#11958)
these latency threshold errors prevent the schema validation from running.
2023-03-23 10:49:09 +02:00
guybe7
4ba47d2d21
Add reply_schema to command json files (internal for now) (#10273)
Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845
Since ironing the details of the reply schema of each and every command can take a long time, we
would like to merge this PR when the infrastructure is ready, and let this mature in the unstable branch.
Meanwhile the changes of this PR are internal, they are part of the repo, but do not affect the produced build.

### Background
In #9656 we add a lot of information about Redis commands, but we are missing information about the replies

### Motivation
1. Documentation. This is the primary goal.
2. It should be possible, based on the output of COMMAND, to be able to generate client code in typed
  languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like.
3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing
  testsuite, see the "Testing" section)

### Schema
The idea is to supply some sort of schema for the various replies of each command.
The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3.
Note that the reply structure itself may change, depending on the arguments (e.g. `XINFO STREAM`, with
and without the `FULL` modifier)
We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema.

Example for `BZPOPMIN`:
```
"reply_schema": {
    "oneOf": [
        {
            "description": "Timeout reached and no elements were popped.",
            "type": "null"
        },
        {
            "description": "The keyname, popped member, and its score.",
            "type": "array",
            "minItems": 3,
            "maxItems": 3,
            "items": [
                {
                    "description": "Keyname",
                    "type": "string"
                },
                {
                    "description": "Member",
                    "type": "string"
                },
                {
                    "description": "Score",
                    "type": "number"
                }
            ]
        }
    ]
}
```

#### Notes
1.  It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility
  to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI,
  where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one.
2. The reply schemas will describe RESP3 replies only. even though RESP3 is structured, we want to use reply
  schema for documentation (and possibly to create a fuzzer that validates the replies)
3. For documentation, the description field will include an explanation of the scenario in which the reply is sent,
  including any relation to arguments. for example, for `ZRANGE`'s two schemas we will need to state that one
  is with `WITHSCORES` and the other is without.
4. For documentation, there will be another optional field "notes" in which we will add a short description of
  the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat
  array, for example)

Given the above:
1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/)
  (given that "description" and "notes" are comprehensive enough)
2. We can generate a client in a strongly typed language (but the return type could be a conceptual
  `union` and the caller needs to know which schema is relevant). see the section below for RESP2 support.
3. We can create a fuzzer for RESP3.

### Limitations (because we are using the standard json-schema)
The problem is that Redis' replies are more diverse than what the json format allows. This means that,
when we convert the reply to a json (in order to validate the schema against it), we lose information (see
the "Testing" section below).
The other option would have been to extend the standard json-schema (and json format) to include stuff
like sets, bulk-strings, error-string, etc. but that would mean also extending the schema-validator - and that
seemed like too much work, so we decided to compromise.

Examples:
1. We cannot tell the difference between an "array" and a "set"
2. We cannot tell the difference between simple-string and bulk-string
3. we cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the
  case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems`
  compares (member,score) tuples and not just the member name. 

### Testing
This commit includes some changes inside Redis in order to verify the schemas (existing and future ones)
are indeed correct (i.e. describe the actual response of Redis).
To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands
it executed and their replies.
For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with
`--reg-res-logfile <file> --client-default-resp 3` (the testsuite already does that if you run it with
`--log-req-res --force-resp3`)
You should run the testsuite with the above args (and `--dont-clean`) in order to make Redis generate
`.reqres` files (same dir as the `stdout` files) which contain request-response pairs.
These files are later on processed by `./utils/req-res-log-validator.py` which does:
1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c)
2. For each request-response pair, it validates the response against the request's reply_schema
  (obtained from the extended COMMAND DOCS)
5. In order to get good coverage of the Redis commands, and all their different replies, we chose to use
  the existing redis test suite, rather than attempt to write a fuzzer.

#### Notes about RESP2
1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that, it's time to
  accept RESP3 as the future RESP)
2. Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3
  so that we can validate it, we will need to know how to convert the actual reply to the one expected.
   - number and boolean are always strings in RESP2 so the conversion is easy
   - objects (maps) are always a flat array in RESP2
   - others (nested array in RESP3's `ZRANGE` and others) will need some special per-command
     handling (so the client will not be totally auto-generated)

Example for ZRANGE:
```
"reply_schema": {
    "anyOf": [
        {
            "description": "A list of member elements",
            "type": "array",
            "uniqueItems": true,
            "items": {
                "type": "string"
            }
        },
        {
            "description": "Members and their scores. Returned in case `WITHSCORES` was used.",
            "notes": "In RESP2 this is returned as a flat array",
            "type": "array",
            "uniqueItems": true,
            "items": {
                "type": "array",
                "minItems": 2,
                "maxItems": 2,
                "items": [
                    {
                        "description": "Member",
                        "type": "string"
                    },
                    {
                        "description": "Score",
                        "type": "number"
                    }
                ]
            }
        }
    ]
}
```

### Other changes
1. Some tests that behave differently depending on the RESP are now being tested for both RESP,
  regardless of the special log-req-res mode ("Pub/Sub PING" for example)
2. Update the history field of CLIENT LIST
3. Added basic tests for commands that were not covered at all by the testsuite

### TODO

- [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g.
  when `SET` return NULL, the condition is `arguments.get||arguments.condition`, for `OK` the condition
  is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896
- [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode
- [x] add the new tests to GH actions (i.e. compile with `-DLOG_REQ_RES`, run the tests, and run the validator)
- [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output
  of the tests - https://github.com/redis/redis/issues/11897
- [x] (probably a separate PR) add all missing schemas
- [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res
- [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to
  fight with the tcl including mechanism a bit)
- [x] issue: module API - https://github.com/redis/redis/issues/11898
- [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Shaya Potter <shaya@redislabs.com>
2023-03-11 10:14:16 +02:00
Oran Agra
3ac835777c
Stablize page reclaim CI test (#11818)
stabilize the test introduced in #11248
* remove random aspect of the test by using DEBUG POPULATE instead of redis-benchmark
* disable rdbcompression, so that the rdb file is always about 1GB.

when fadvise was disabled, i get about 1GB in the page cace
when enabled i get less than 200KB
so for now, i'll keep the 500kb threshold.
2023-02-19 18:38:07 +02:00
Oran Agra
5b61b0dc6d
skip new page cache reclame unit test when running in valgrind (#11808)
the new test is incompatible with valgrind.
added a new `--valgrind` argument to `redis-server tests` mode,
which will cause that test to be skipped..
2023-02-16 10:50:58 +02:00
Tian
7dae142a2e
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service.

Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation.

# What the PR does
The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way.

Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false.

# Something deserve noting
1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0.
2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache.

# About test
A unit test is added to verify the effect of `posix_fadvise`.
In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
2023-02-12 09:23:29 +02:00