53 Commits

Author SHA1 Message Date
Madelyn Olson
0e96bb311e
Synchronously delete data during defrag tests (#1443)
The creation of fragmentation is delayed when we use lazy-free. You can
induce some of the active-defrag tests to fail by artificially adding a
delay in the lazyfree process, similar to the issues seen in #1433 and
issues like
https://github.com/valkey-io/valkey/actions/runs/12267010712/job/34226304803#step:7:6538.
The solution is to always do sync free during tests.

Might close https://github.com/valkey-io/valkey/issues/1433.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-14 19:14:01 +01:00
Jim Brunner
32f2c73cb5
defrag: eliminate persistent kvstore pointer and edge case fixes (#1430)
This update addresses several issues in defrag:
1. In the defrag redesign
(https://github.com/valkey-io/valkey/pull/1242), a bug was introduced
where `server.cronloops` was no longer being incremented in the
`whileBlockedCron()`. This resulted in some memory statistics not being
updated while blocked.
2. In the test case for AOF loading, we were seeing errors due to defrag
latencies. However, running the math, the latencies are justified given
the extremely high CPU target of the testcase. Adjusted the expected
latency check to allow longer latencies for this case where defrag is
undergoing starvation while AOF loading is in progress.
3. A "stage" is passed a "target". For the main dictionary and expires,
we were passing in a `kvstore*`. However, on flushall or swapdb, the
pointer may change. It's safer and more stable to use an index for the
DB (a DBID). Then if the pointer changes, we can detect the change, and
simply abort the stage. (If there's still fragmentation to deal with,
we'll pick it up again on the next cycle.)
4. We always start a new stage on a new defrag cycle. This gives the new
stage time to run, and prevents latency issues for certain stages which
don't operate incrementally. However, often several stages will require
almost no work, and this will leave a chunk of our CPU allotment unused.
This is mainly an issue in starvation situations (like AOF loading or
LUA script) - where defrag is running infrequently, with a large
duty-cycle. This change allows a new stage to be initiated if we still
have a standard duty-cycle remaining. (This can happen during starvation
situations where the planned duty cycle is larger than the standard
cycle. Most likely this isn't a concern for real scenarios, but it was
observed in testing.)
5. Minor comment correction in `server.h`

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-12-12 14:55:57 -08:00
Jim Brunner
0c8ad5cd34
defrag: allow defrag to start during AOF loading (#1420)
Addresses https://github.com/valkey-io/valkey/issues/1393

Changes:
* During AOF loading or long running script, this allows defrag to be
initiated.
* The AOF defrag test was corrected to eliminate the wait period and
rely on non-timer invocations.
* Logic for "overage" time in defrag was changed. It previously
accumulated underage leading to large latencies in extreme tests having
very high CPU percentage. After several simple stages were completed
during infrequent blocked processing, a large cycle time would be
experienced.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-12-11 19:47:06 +02:00
Jim Brunner
397201c48f
Refactor of ActiveDefrag to reduce latencies (#1242)
Refer to:  https://github.com/valkey-io/valkey/issues/1141

This update refactors the defrag code to:
* Make the overall code more readable and maintainable
* Reduce latencies incurred during defrag processing

With this update, the defrag cycle time is reduced to 500us, with more
frequent cycles. This results in much more predictable latencies, with a
dramatic reduction in tail latencies.

(See https://github.com/valkey-io/valkey/issues/1141 for more complete
details.)

This update is focused mostly on the high-level processing, and does NOT
address lower level functions which aren't currently timebound (e.g.
`activeDefragSdsDict()`, and `moduleDefragGlobals()`). These are out of
scope for this update and left for a future update.

I fixed `kvstoreDictLUTDefrag` because it was using up to 7ms on a CME
single shard. See original github issue for performance details.

---------

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-03 08:42:29 -08:00
zvi-code
fd58f8d058
Disable lazy free in defrag test to fix 32bit daily failure (#1370)
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
2024-11-28 16:27:00 +01:00
Binbin
70624ea63d
Change all the lazyfree configurations to yes by default (#913)
## Set replica-lazy-flush and lazyfree-lazy-user-flush to yes by
default.
There are many problems with running flush synchronously. Even in
single CPU environments, the thread managers should balance between
the freeing and serving incoming requests.

## Set lazy eviction, expire, server-del, user-del to yes by default
We now have a del and a lazyfree del, we also have these configuration
items to control: lazyfree-lazy-eviction, lazyfree-lazy-expire,
lazyfree-lazy-server-del, lazyfree-lazy-user-del. In most cases lazyfree
is better since it reduces the risk of blocking the main thread, and
because we have lazyfreeGetFreeEffort, on those with high effor
(currently
64) will use lazyfree.

Part of #653.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-02 07:07:17 -07:00
uriyage
39f8bcb91b
Skip tracking clients OOM test when I/O threads are enabled (#764)
Fix feedback loop in key eviction with tracking clients when using I/O
threads.

Current issue:
Evicting keys while tracking clients or key space-notification exist
creates a feedback loop when using I/O threads:

While evicting keys we send tracking async writes to I/O threads,
preventing immediate release of tracking clients' COB memory
consumption.

Before the I/O thread finishes its write, we recheck used_memory, which
now includes the tracking clients' COB and thus continue to evict more
keys.

**Fix:**
We will skip the test for now while IO threads are active. We may
consider avoiding sending writes in `processPendingWrites` to I/O
threads for tracking clients when we are out of memory.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-21 17:02:57 -07:00
uriyage
bbfd041895
Async IO threads (#758)
This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests
per second, as detailed by [dan touitou](https://github.com/touitou-dan)
in https://github.com/valkey-io/valkey/issues/22. This PR modifies the
IO threads to be fully asynchronous, which is a first and necessary step
to allow more work offloading and better utilization of the IO threads.

### Current IO threads state:

Valkey IO threads were introduced in Redis 6.0 to allow better
utilization of multi-core machines. Before this, Redis was
single-threaded and could only use one CPU core for network and command
processing. The introduction of IO threads helps in offloading the IO
operations to multiple threads.

**Current IO Threads flow:**

1. Initialization: When Redis starts, it initializes a specified number
of IO threads. These threads are in addition to the main thread, each
thread starts with an empty list, the main thread will populate that
list in each event-loop with pending-read-clients or
pending-write-clients.
2. Read Phase: The main thread accepts incoming connections and reads
requests from clients. The reading of requests are offloaded to IO
threads. The main thread puts the clients ready-to-read in a list and
set the global io_threads_op to IO_THREADS_OP_READ, the IO threads pick
the clients up, perform the read operation and parse the first incoming
command.
3. Command Processing: After reading the requests, command processing is
still single-threaded and handled by the main thread.
4. Write Phase: Similar to the read phase, the write phase is also be
offloaded to IO threads. The main thread prepares the response in the
clients’ output buffer then the main thread puts the client in the list,
and sets the global io_threads_op to the IO_THREADS_OP_WRITE. The IO
threads then pick the clients up and perform the write operation to send
the responses back to clients.
5. Synchronization: The main-thread communicate with the threads on how
many jobs left per each thread with atomic counter. The main-thread
doesn’t access the clients while being handled by the IO threads.

**Issues with current implementation:**

* Underutilized Cores: The current implementation of IO-threads leads to
the underutilization of CPU cores.
* The main thread remains responsible for a significant portion of
IO-related tasks that could be offloaded to IO-threads.
* When the main-thread is processing client’s commands, the IO threads
are idle for a considerable amount of time.
* Notably, the main thread's performance during the IO-related tasks is
constrained by the speed of the slowest IO-thread.
* Limited Offloading: Currently, Since the Main-threads waits
synchronously for the IO threads, the Threads perform only read-parse,
and write operations, with parsing done only for the first command. If
the threads can do work asynchronously we may offload more work to the
threads reducing the load from the main-thread.
* TLS: Currently, we don't support IO threads with TLS (where offloading
IO would be more beneficial) since TLS read/write operations are not
thread-safe with the current implementation.

### Suggested change

Non-blocking main thread - The main thread and IO threads will operate
in parallel to maximize efficiency. The main thread will not be blocked
by IO operations. It will continue to process commands independently of
the IO thread's activities.

**Implementation details**

**Inter-thread communication.**

* We use a static, lock-free ring buffer of fixed size (2048 jobs) for
the main thread to send jobs and for the IO to receive them. If the ring
buffer fills up, the main thread will handle the task itself, acting as
back pressure (in case IO operations are more expensive than command
processing). A static ring buffer is a better candidate than a dynamic
job queue as it eliminates the need for allocation/freeing per job.
* An IO job will be in the format: ` [void* function-call-back | void
*data] `where data is either a client to read/write from and the
function-ptr is the function to be called with the data for example
readQueryFromClient using this format we can use it later to offload
other types of works to the IO threads.
* The Ring buffer is one way from the main-thread to the IO thread, Upon
read/write event the main thread will send a read/write job then in
before sleep it will iterate over the pending read/write clients to
checking for each client if the IO threads has already finished handling
it. The IO thread signals it has finished handling a client read/write
by toggling an atomic flag read_state / write_state on the client
struct.

**Thread Safety**

As suggested in this solution, the IO threads are reading from and
writing to the clients' buffers while the main thread may access those
clients.
We must ensure no race conditions or unsafe access occurs while keeping
the Valkey code simple and lock free.

Minimal Action in the IO Threads
The main change is to limit the IO thread operations to the bare
minimum. The IO thread will access only the client's struct and only the
necessary fields in this struct.
The IO threads will be responsible for the following:

* Read Operation: The IO thread will only read and parse a single
command. It will not update the server stats, handle read errors, or
parsing errors. These tasks will be taken care of by the main thread.
* Write Operation: The IO thread will only write the available data. It
will not free the client's replies, handle write errors, or update the
server statistics.


To achieve this without code duplication, the read/write code has been
refactored into smaller, independent components:

* Functions that perform only the read/parse/write calls.
* Functions that handle the read/parse/write results.

This refactor accounts for the majority of the modifications in this PR.

**Client Struct Safe Access**

As we ensure that the IO threads access memory only within the client
struct, we need to ensure thread safety only for the client's struct's
shared fields.

* Query Buffer 
* Command parsing - The main thread will not try to parse a command from
the query buffer when a client is offloaded to the IO thread.
* Client's memory checks in client-cron - The main thread will not
access the client query buffer if it is offloaded and will handle the
querybuf grow/shrink when the client is back.
* CLIENT LIST command - The main thread will busy-wait for the IO thread
to finish handling the client, falling back to the current behavior
where the main thread waits for the IO thread to finish their
processing.
* Output Buffer 
* The IO thread will not change the client's bufpos and won't free the
client's reply lists. These actions will be done by the main thread on
the client's return from the IO thread.
* bufpos / block→used: As the main thread may change the bufpos, the
reply-block→used, or add/delete blocks to the reply list while the IO
thread writes, we add two fields to the client struct: io_last_bufpos
and io_last_reply_block. The IO thread will write until the
io_last_bufpos, which was set by the main-thread before sending the
client to the IO thread. If more data has been added to the cob in
between, it will be written in the next write-job. In addition, the main
thread will not trim or merge reply blocks while the client is
offloaded.
* Parsing Fields 
    * Client's cmd, argc, argv, reqtype, etc., are set during parsing.
* The main thread will indicate to the IO thread not to parse a cmd if
the client is not reset. In this case, the IO thread will only read from
the network and won't attempt to parse a new command.
* The main thread won't access the c→cmd/c→argv in the CLIENT LIST
command as stated before it will busy wait for the IO threads.
* Client Flags 
* c→flags, which may be changed by the main thread in multiple places,
won't be accessed by the IO thread. Instead, the main thread will set
the c→io_flags with the information necessary for the IO thread to know
the client's state.
* Client Close 
* On freeClient, the main thread will busy wait for the IO thread to
finish processing the client's read/write before proceeding to free the
client.
* Client's Memory Limits 
* The IO thread won't handle the qb/cob limits. In case a client crosses
the qb limit, the IO thread will stop reading for it, letting the main
thread know that the client crossed the limit.

**TLS**

TLS is currently not supported with IO threads for the following
reasons:

1. Pending reads - If SSL has pending data that has already been read
from the socket, there is a risk of not calling the read handler again.
To handle this, a list is used to hold the pending clients. With IO
threads, multiple threads can access the list concurrently.
2. Event loop modification - Currently, the TLS code
registers/unregisters the file descriptor from the event loop depending
on the read/write results. With IO threads, multiple threads can modify
the event loop struct simultaneously.
3. The same client can be sent to 2 different threads concurrently
(https://github.com/redis/redis/issues/12540).

Those issues were handled in the current PR:

1. The IO thread only performs the read operation. The main thread will
check for pending reads after the client returns from the IO thread and
will be the only one to access the pending list.
2. The registering/unregistering of events will be similarly postponed
and handled by the main thread only.
3. Each client is being sent to the same dedicated thread (c→id %
num_of_threads).


**Sending Replies Immediately with IO threads.**

Currently, after processing a command, we add the client to the
pending_writes_list. Only after processing all the clients do we send
all the replies. Since the IO threads are now working asynchronously, we
can send the reply immediately after processing the client’s requests,
reducing the command latency. However, if we are using AOF=always, we
must wait for the AOF buffer to be written, in which case we revert to
the current behavior.

**IO threads dynamic adjustment**

Currently, we use an all-or-nothing approach when activating the IO
threads. The current logic is as follows: if the number of pending write
clients is greater than twice the number of threads (including the main
thread), we enable all threads; otherwise, we enable none. For example,
if 8 IO threads are defined, we enable all 8 threads if there are 16
pending clients; else, we enable none.
It makes more sense to enable partial activation of the IO threads. If
we have 10 pending clients, we will enable 5 threads, and so on. This
approach allows for a more granular and efficient allocation of
resources based on the current workload.

In addition, the user will now be able to change the number of I/O
threads at runtime. For example, when decreasing the number of threads
from 4 to 2, threads 3 and 4 will be closed after flushing their job
queues.

**Tests**

Currently, we run the io-threads tests with 4 IO threads
(443d80f168/.github/workflows/daily.yml (L353)).
This means that we will not activate the IO threads unless there are 8
(threads * 2) pending write clients per single loop, which is unlikely
to happened in most of tests, meaning the IO threads are not currently
being tested.

To enforce the main thread to always offload work to the IO threads,
regardless of the number of pending events, we add an
events-per-io-thread configuration with a default value of 2. When set
to 0, this configuration will force the main thread to always offload
work to the IO threads.

When we offload every single read/write operation to the IO threads, the
IO-threads are running with 100% CPU when running multiple tests
concurrently some tests fail as a result of larger than expected command
latencies. To address this issue, we have to add some after or wait_for
calls to some of the tests to ensure they pass with IO threads as well.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-07-08 20:01:39 -07:00
Shivshankar
da831c0d22
rename procedure redis_deferring_client to valkey_deferring_client (#270)
Updated procedure redis_deferring_client in test environent to
valkey_deferring_client.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-09 10:38:09 -04:00
debing.sun
ad12730333
Implement defragmentation for pubsub kvstore (#13058)
After #13013

### This PR make effort to defrag the pubsub kvstore in the following
ways:

1. Till now server.pubsub(shard)_channels only share channel name obj
with the first subscribed client, now change it so that the clients and
the pubsub kvstore share the channel name robj.
This would save a lot of memory when there are many subscribers to the
same channel.
It also means that we only need to defrag the channel name robj in the
pubsub kvstore, and then update
all client references for the current channel, avoiding the need to
iterate through all the clients to do the same things.
    
2. Refactor the code to defragment pubsub(shard) in the same way as
defragment of keys and EXPIRES, with the exception that we only
defragment pubsub(without shard) when slot is zero.


### Other
Fix an overlook in #11695, if defragment doesn't reach the end time, we
should wait for the current
db's keys and expires, pubsub and pubsubshard to finish before leaving,
now it's possible to exit
early when the keys are defragmented.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-03-04 16:56:50 +02:00
Binbin
13bd3643c2
Re-compute active_defrag_running after adjusting defrag configurations (#13020)
Currently, once active defrag starts, we can not adjust
active_defrag_running
downwards. This is because active_defrag_running will be dynamically
compute
based on the fragmentation, we think we should not lower the effort when
the
fragmentation drops.

However, we need to note that active_defrag_running will also be
dynamically
computed based on configurations. In this case, we are not respecting
cycle-min
or cycle-max. Some people may realize halfway through that defrag
consumes a
lot and want to adjust it.

Previously we could only turn off activedefrag and then turn it on again
to
adjust active_defrag_running downwards. So in this PR, when a active
defrag
configuration change is made, we will re-compute it.

These configuration items are:
- active-defrag-cycle-min
- active-defrag-cycle-max
- active-defrag-threshold-upper
2024-02-06 13:39:07 +02:00
Roshan Khatri
15a048d4f0
re-enable defrag tests in cluster mode. (#12710)
Reverts the skipping defrag tests in cluster mode (done in #12672.
instead it skips only some defrag tests that are relevant for cluster modes.
The test now run well after investigating and making the changes in #12674 and #12694.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-02 13:55:48 +02:00
Harkrishn Patro
26eb4ce397
Fix defrag test (#12674)
Fixing issues started after #11695 when the defrag tests are being executed in cluster mode too.
For some reason, it looks like the defragmentation is over too quickly, before the test is able to
detect that it's running.
so now instead of waiting to see that it's active, we wait to see that it did some work
```
[err]: Active defrag big list: cluster in tests/unit/memefficiency.tcl
defrag not started.
[err]: Active defrag big keys: cluster in tests/unit/memefficiency.tcl
defrag didn't stop.
```
2023-10-22 11:56:45 +03:00
Harkrishn Patro
becd50d0da
Disable flaky defrag tests affecting daily run (#12672)
Temporarily disabling few of the defrag tests in cluster mode to make the daily run stable:

Active defrag eval scripts
Active defrag big keys
Active defrag big list
Active defrag edge case
2023-10-19 21:12:58 +03:00
Vitaly
0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, that are currently used to create a linked list between entries in the same slot.  Main idea is splitting main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key, from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this search we introduced binary index tree). With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using same binary index that is used for random key selection, this index allows us to find a slot for a specific key index. For example if there are 10 keys in the slot 0, then we can quickly find a slot that contains 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to not only save position within the dictionary but also the slot id. In this change we append slot id into LSB of the cursor so it can be passed around between client and the server. This has interesting side effect, now you'll be able to start scanning specific slot by simply providing slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth nothing the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross slot scripts and modules). We don't want to compute the checksum multiple multiple times, hence we are relying on cached slot id in the client during the command executions. All operations that access random keys, either should pass in the known slot or recompute the slot. 
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides `DBSIZE` API, we need to know size of the DB in many places want, in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep DBSIZE operation O(1). This is also kept for O(1) expires computation as well.

## Performance
This change improves SET performance in cluster mode by ~5%, most of the gains come from us not having to maintain linked lists for keys in slot, non-cluster mode has same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` to `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Binbin
96e9dec419
Bump codespell from 2.2.4 to 2.2.5 (#12557)
and adjustments.
2023-09-08 16:10:17 +03:00
Binbin
bfe50a30ed
Increase the threshold of the AOF loading defrag test (#11871)
This test is very sensitive and fragile. It often fails in Daily,
in most cases, it failed in test-ubuntu-32bit (the AOF loading one),
with the range in (31, 40):
```
[err]: Active defrag in tests/unit/memefficiency.tcl
Expected 38 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test)
```

The AOF loading part isn't tightly fixed to the cron hz. It calls
processEventsWhileBlocked once in every 1024 command calls.
```
        /* Serve the clients from time to time */
        if (!(loops++ % 1024)) {
            off_t progress_delta = ftello(fp) - last_progress_report_size;
            loadingIncrProgress(progress_delta);
            last_progress_report_size += progress_delta;
            processEventsWhileBlocked();
            processModuleLoadingProgressEvent(1);
        }
```

In this case, we can either decrease the 1024 or increase the
threshold of just the AOF part of that test. Considering the test
machines are sometimes slow, and all sort of quirks could happen
(which do not indicate a bug), and we've already set to 30, we suppose
we can set it a little bit higher, set it to 40. We can have this instead of
adding another testing config (we can add it when we really need it).

Fixes #11868
2023-03-04 12:54:36 +02:00
Oran Agra
528bb11d7a
Solve issues with active defrag test failing on fast machines (#11598)
We do defrag during AOF loading, but aim to detect fragmentation only
once a second, so this test aims to slow down the AOF loading and mimic
loading of a large file.
On fast machines the sleep, plus the actual work we did was insufficient
making it sleep longer so the test won't fail.

The error we used to get is this one:
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
2022-12-09 13:33:38 +02:00
蔡相跃
24da71e507
Fix typo "the the" (#10399) 2022-03-09 13:55:17 +02:00
yoav-steinberg
b59bb9b476
Fix script active defrag test (#10318)
This includes two fixes:
* We forgot to count non-key reallocs in defragmentation stats.
* Fix the script defrag tests so to make dict entries less signigicant in fragmentation by making the scripts larger.
This assures active defrage will complete and reach desired results.
Some inherent fragmentation might exists in dict entries which we need to ignore.
This lead to occasional CI failures.
2022-02-21 09:37:25 +02:00
yoav-steinberg
2eb9b19612
Fix Eval scripts defrag (broken 7.0 in RC1) (#10271)
Remove scripts defragger since it was broken since #10126 (released in 7.0 RC1).
would crash the server if defragger starts in a server that contains eval scripts.

In #10126 the global `lua_script` dict became a dict to a custom `luaScript` struct with an internal `robj`
in it instead of a generic `sds` -> `robj` dict. This means we need custom code to defrag it and since scripts
should never really cause much fragmentation it makes more sense to simply remove the defrag code for scripts.
2022-02-11 21:58:05 +02:00
Oran Agra
6add1b7217
Add external test that runs without debug command (#9964)
- add needs:debug flag for some tests
- disable "save" in external tests (speedup?)
- use debug_digest proc instead of debug command directly so it can be skipped
- use OBJECT ENCODING instead of DEBUG OBJECT to get encoding
- add a proc for OBJECT REFCOUNT so it can be skipped
- move a bunch of tests in latency_monitor tests to happen later so that latency monitor has some values in it
- add missing close_replication_stream calls
- make sure to close the temp client if DEBUG LOG fails
2021-12-19 17:41:51 +02:00
Oran Agra
d4e7ffb38c
Improve active defrag in jemalloc 5.2 (#9778)
Background:
Following the upgrade to jemalloc 5.2, there was a test that used to be flaky and
started failing consistently (on 32bit), so we disabled it ​(see #9645).

This is a test that i introduced in #7289 when i attempted to solve a rare stagnation
problem, and it later turned out i failed to solve it, ans what's more i added a test that
caused it to be not so rare, and as i mentioned, now in jemalloc 5.2 it became consistent on 32bit.

Stagnation can happen when all the slabs of the bin are equally utilized, so the decision
to move an allocation from a relatively empty slab to a relatively full one, will never
happen, and in that test all the slabs are at 50% utilization, so the defragger could just
keep scanning the keyspace and not move anything.

What this PR changes:
* First, finally in jemalloc 5.2 we have the count of non-full slabs, so when we compare
  the utilization of the current slab, we can compare it to the average utilization of the non-full
  slabs in our bin, instead of the total average of our bin. this takes the full slabs out of the game,
  since they're not candidates for migration (neither source nor target).
* Secondly, We add some 12% (100/8) to the decision to defrag an allocation, this is the part
  that aims to avoid stagnation, and it's especially important since the above mentioned change
  can get us closer to stagnation.
* Thirdly, since jemalloc 5.2 adds sharded bins, we take into account all shards (something
  that's missing from the original PR that merged it), this isn't expected to make any difference
  since anyway there should be just one shard.

How this was benchmarked.
What i did was run the memefficiency test unit with `--verbose` and compare the defragger hits
and misses the tests reported.
At first, when i took into consideration only the non-full slabs, it got a lot worse (i got into
stagnation, or just got a lot of misses and a lot of hits), but when i added the 10% i got back
to results that were slightly better than the ones of the jemalloc 5.1 branch. i.e. full defragmentation
was achieved with fewer hits (relocations), and fewer misses (keyspace scans).
2021-11-21 13:35:39 +02:00
menwen
d5ca72e38b
fix defrag test looking at the wrong latency metric (#9723)
the latency event was renamed in #7726, and the outcome was that the test was
ineffective (unable to measure the max latency, always seeing 0)
2021-11-02 15:52:56 +02:00
yoav-steinberg
81095b1bd9
Skip Active-defrag edge case test until we fix it. (#9645)
Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (#9623).
2021-10-18 13:28:52 +03:00
Oran Agra
1e7ad894d2
Tune timeout of active defrag test (#9426)
Failed on Raspberry Pi 3b where that single test took about 170 seconds
2021-08-30 12:39:09 +03:00
Binbin
0bfccc55e2
Fixed some typos, add a spell check ci and others minor fix (#8890)
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.

Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.

Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
    - In README: about the rdb, we used to said create a `thread`, change to `process`
    - dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
    - notifyKeyspaceEvent fucntion comment (add type arg)
    - Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
2021-06-10 15:39:33 +03:00
Yossi Gottlieb
8a86bca5ed
Improve test suite to handle external servers better. (#9033)
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.

This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:

* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).

This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.

The tests directory now contains a `README.md` file that describes how this
works.

This commit also includes additional cleanups and fixes:

* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
  global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
  (a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
  future we should have dedicated teardown code that executes even when tests
  fail).
* Fixed some tests that were flaky running on external servers.
2021-06-09 15:13:24 +03:00
Oran Agra
5843a45d01
Skip defrag tests on systems with bigger page sizes (#8294)
The defragger works well on these systems, but the tests and their
thresholds are not adjusted for these big pages, so the defragger isn't
able to get down the fragmentation to the levels the test expects and it
fails on "defrag didn't stop".

Randomly choosing 8k as the threshold for the skipping

Fixes #8265 (which had 65k pages)
2021-01-08 10:03:21 +02:00
Oran Agra
7d9b09adaa
Tests: fix new defrag test to be skipped when not supported (#8185)
Additionally the older defrag tests are using an obsolete way to check
if the defragger is suuported (the error no longer contains "DISABLED").
this doesn't usually makes a difference since these tests are completely
skipped if the allocator is not jemalloc, but that would fail if the
allocator is a jemalloc that doesn't support defrag.
2020-12-14 11:13:46 +02:00
Yossi Gottlieb
2faa0f19eb Fix test failure on slower systems.
Not disabling save, slower systems begun background save that did not
complete in time, resulting with SAVE failing with "ERR Background save
already in progress".
2020-11-04 21:43:55 +02:00
Yossi Gottlieb
843a13e88f
Add a --no-latency tests flag. (#7939)
Useful for running tests on systems which may be way slower than usual.
2020-10-22 11:10:53 +03:00
Oran Agra
9ef8d2f671
Run active defrag while blocked / loading (#7726)
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.

Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
  of a microsecond sleep
2020-09-03 08:47:29 +03:00
Oran Agra
88d71f4793 fix a rare active defrag edge case bug leading to stagnation
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation), this happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made have a lower utilization.

this commit fixes it by removing the current slab from the overall
average utilization of the bin, and also eliminate any precision loss in
the utilization calculation and move the decision about the defrag to
reside inside jemalloc.

and also add a test that consistently reproduce this issue.
2020-05-20 16:04:42 +03:00
Oran Agra
b9fa42a197 testsuite run the defrag latency test solo
this test is time sensitive and it sometimes fail to pass below the
latency threshold, even on strong machines.

this test was the reson we're running just 2 parallel tests in the
github actions CI, revering this.
2020-04-16 18:09:22 +03:00
Oran Agra
2f1a1c3835 fix github actions failing latency test for active defrag - part 2
it seems that running two clients at a time is ok too, resuces action
time from 20 minutes to 10. we'll use this for now, and if one day it
won't be enough we'll have to run just the sensitive tests one by one
separately from the others.

this commit also fixes an issue with the defrag test that appears to be
very rare.
2020-02-27 08:34:53 +02:00
Oran Agra
537893420b fix github actions failing latency test for active defrag
seems that github actions are slow, using just one client to reduce
false positives.

also adding verbose, testing only on latest ubuntu, and building on
older one.

when doing that, i can reduce the test threshold back to something saner
2020-02-25 17:53:23 +02:00
Oran Agra
62adabd0e0 Fix latency sensitivity of new defrag test
I saw that the new defag test for list was failing in CI recently, so i
reduce it's threshold from 12 to 60.

besides that, i add / improve the latency test for that other two defrag
tests (add a sensitive latency and digest / save checks)

and fix bad usage of debug populate (can't overrides existing keys).
this was the original intention, which creates higher fragmentation.
2020-02-23 13:05:52 +02:00
Oran Agra
485425cec7 Defrag big lists in portions to avoid latency and freeze
When active defrag kicks in and finds a big list, it will create a bookmark to
a node so that it is able to resume iteration from that node later.

The quicklist manages that bookmark, and updates it in case that node is deleted.

This will increase memory usage only on lists of over 1000 (see
active-defrag-max-scan-fields) quicklist nodes (1000 ziplists, not 1000 items)
by 16 bytes.

In 32 bit build, this change reduces the maximum effective config of
list-compress-depth and list-max-ziplist-size (from 32767 to 8191)
2020-02-18 17:22:32 +02:00
Oran Agra
d0850369c4 fix small test suite race conditions 2018-11-12 10:26:10 +02:00
Oran Agra
c8452ab005 Fix unstable tests on slow machines.
Few tests had borderline thresholds that were adjusted.

The slave buffers test had two issues, preventing the slave buffer from growing:
1) the slave didn't necessarily go to sleep on time, or woke up too early,
   now using SIGSTOP to make sure it goes to sleep exactly when we want.
2) the master disconnected the slave on timeout
2018-08-21 11:46:07 +03:00
Oran Agra
f89c93c8ad make active defrag test more stable
on slower machines, the active defrag test tended to fail.
although the fragmentation ratio was below the treshold, the defragger was
still in the middle of a scan cycle.

this commit changes:
- the defragger uses the current fragmentation state, rather than the cache one
  that is updated by server cron every 100ms. this actually fixes a bug of
  starting one excess scan cycle
- the test lets the defragger use more CPU cycles, in hope that the defrag
  will be faster, but also give it more time before we give up.
2018-07-18 10:16:33 +03:00
Oran Agra
de495ee7ab minor fix in creating a stream NACK for rdb and defrag tests 2018-06-27 15:34:17 +03:00
Oran Agra
5616d4c603 add active defrag support for streams 2018-06-27 15:00:41 +03:00
antirez
98d5d3f118 Make active defragmentation tests optional.
They failed when active defrag could not be activated because the
Jemalloc version does not include the additional APIs.
2018-05-24 18:04:21 +02:00
Oran Agra
ad133e1023 Active defrag fixes for 32bit builds
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
2018-05-17 09:52:00 +03:00
Oran Agra
806736cdf9 Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
  so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
2018-03-12 15:08:52 +02:00
antirez
c861e1e1ee Defrag: test currently disabled, too many false positives.
Related to #3786.
2017-04-22 15:59:57 +02:00
antirez
a17390853d Defrag: fix test false positive.
Apparently 1.4 is too low compared to what you get in certain setups
(including mine). I raised it to 1.55 that hopefully is still enough to
test that the fragmentation went down from 1.7 but without incurring in
issues, however the test setup may be still fragile so certain times this
may lead to false positives again, it's hard to test for these things
in a determinsitic way.

Related to #3786.
2017-04-22 13:21:41 +02:00
oranagra
0fb5c4ebd8 add test for active defrag 2017-04-22 13:17:09 +02:00