12565 Commits

Author SHA1 Message Date
Madelyn Olson
85a58478df Added release notes for RC2
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
8.0.0-rc2
2024-09-03 09:00:45 -07:00
Madelyn Olson
509f026326 Initialize all the fields for the test kvstore (#982)
Follow up to https://github.com/valkey-io/valkey/pull/966, which didn't
update the kvstore tests. I'm not actually entirely clear why it fixes
it, but the consistency prevents the crash very reliably so will merge
it now and maybe see if Zhao has a better explanation.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Amit Nagler
2c46f2047b Add configuration hide-user-data-from-log to hide user data from server logs (#877)
Implement data masking for user data in server logs and diagnostic output. This change prevents potential exposure of confidential information, such as PII, and enhances privacy protection. It masks all command arguments, client names, and client usernames.

Added a new hide-user-data-from-log configuration item, default yes.

---------

Signed-off-by: Amit Nagler <anagler123@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
def9ddf1e5 Fix set expire test due to the new lazyfree configs changes (#980)
Test failed because these two PRs #865 and #913.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
zhaozhao.zz
e7d46558fe Use metadata to handle the reference relationship between kvstore and dict (#966)
Feature `one-dict-per-slot` refactors the database, and part of it
involved splitting the rehashing list from the global level back to the
database level, or more specifically, the kvstore level. This change is
fine, and it also simplifies the process of swapping databases, which is
good. And it should not have a major impact on the efficiency of
incremental rehashing.

To implement the kvstore-level rehashing list, each `dict` under the
`kvstore` needs to know which `kvstore` it belongs. However, kvstore did
not insert the reference relationship into the `dict` itself, instead,
it placed it in the `dictType`. In my view, this is a somewhat odd way.
Theoretically, `dictType` is just a collection of function handles, a
kind of virtual type that can be referenced globally, not an entity. But
now the `dictType` is instantiated, with each `kvstore` owning an actual
`dictType`, which in turn holds a reverse reference to the `kvstore`'s
resource pointer. This design is somewhat uncomfortable for me.

I think the `dictType` should not be instantiated. The references
between actual resources (`kvstore` and `dict`) should occur between
specific objects, rather than force materializing the `dictType`, which
is supposed to be virtual.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-09-03 09:00:45 -07:00
Binbin
1dcd3f23d8 Change all the lazyfree configurations to yes by default (#913)
## Set replica-lazy-flush and lazyfree-lazy-user-flush to yes by
default.
There are many problems with running flush synchronously. Even in
single CPU environments, the thread managers should balance between
the freeing and serving incoming requests.

## Set lazy eviction, expire, server-del, user-del to yes by default
We now have a del and a lazyfree del, we also have these configuration
items to control: lazyfree-lazy-eviction, lazyfree-lazy-expire,
lazyfree-lazy-server-del, lazyfree-lazy-user-del. In most cases lazyfree
is better since it reduces the risk of blocking the main thread, and
because we have lazyfreeGetFreeEffort, on those with high effor
(currently
64) will use lazyfree.

Part of #653.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Madelyn Olson
cb4d6f665f Fix zipmap test null pointer (#975)
The previous test does a strncmp on a NULL, which is not valid. It
should be using an empty length string instead. Addresses
https://github.com/valkey-io/valkey/actions/runs/10649272046/job/29519233939.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
56c61c6ee3 Fast path in SET if the expiration time is expired (#865)
If the expiration time passed in SET is expired, for example, it
has expired due to the machine time (DTS) or the expiration time
passed in (wrong arg). In this case, we don't need to set the key
and wait for the active expire scan before deleting the key.

Compared with previous changes:
1. If the key does not exist, previously we would set the key and wait
for the active expire to delete it, so it is a set + del from the
perspective
of propaganda. Now we will no set the key and return, so it a NOP.

2. If the key exists, previously we woule set the key and wait
for the active expire to delete it, so it is a set + del From the
perspective
of propaganda. Now we will delete it and return, so it is a del.

Adding a new deleteExpiredKeyFromOverwriteAndPropagate function
to reduce the duplicate code.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Viktor Söderqvist
1758fbf27f Delete unused parts of zipmap (#973)
Deletes zipmapSet, zipmapGet, etc. Only keep iterator and validate
integrity, what we use when loading an old RDB file.

Adjust unit tests to not use zipmapSet, etc.

Solves a build failure where when compiling with fortify source.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-03 09:00:45 -07:00
Binbin
21c2ad7b4a Fix timing issue in replica migration test (#968)
The reason is the server 3 still have the server 7 as its replica
due to a short wait, the wait is not enough, we should wait for
server loss its replica.
```
*** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT
Expected '{127.0.0.1 21497 267}' to be equal to '' (context: type eval line 34 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test)
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
zhaozhao.zz
1864c2f4b4 standalone -REDIRECT handles special case of MULTI context (#895)
In standalone mode, when a `-REDIRECT` error occurs, special handling is
required if the client is in the `MULTI` context.

We have adopted the same handling method as the cluster mode:

1. If a command in the transaction encounters a `REDIRECT` at the time
of queuing, the execution of `EXEC` will return an `EXECABORT` error (we
expect the client to redirect and discard the transaction upon receiving
a `REDIRECT`). That is:

    ```
    MULTI    ==>  +OK
    SET x y  ==>  -REDIRECT
    EXEC     ==>  -EXECABORT
    ```
2. If all commands are successfully queued (i.e., `QUEUED` results are
received) but a redirect is detected during `EXEC` execution (such as a
primary-replica switch), a `REDIRECT` is returned to instruct the client
to perform a redirect. That is:

    ```
    MULTI    ==>  +OK
    SET x y  ==>  +QUEUED
    failover
    EXEC     ==>  -REDIRECT
    ```

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-09-03 09:00:45 -07:00
Shivshankar
5450d41fde Migrate zipmap unit test to new framework (#474)
Migrate zipmap unit test to new unit test framework, parent ticket #428
.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
115fc0f912 Fix reconfiguring sub-replica causing data loss when myself change shard_id (#944)
When reconfiguring sub-replica, there may a case that the sub-replica will
use the old offset and win the election and cause the data loss if the old
primary went down.

In this case, sender is myself's primary, when executing updateShardId,
not only the sender's shard_id is updated, but also the shard_id of
myself is updated, casuing the subsequent areInSameShard check, that is,
the full_sync_required check to fail.

As part of the recent fix of #885, the sub-replica needs to decide whether
a full sync is required or not when switching shards. This shard membership
check is supposed to be done against sub-replica's current shard_id, which
however was lost in this code path. This then leads to sub-replica joining
the other shard with a completely different and incorrect replication history.

This is the only place where replicaof state can be updated on this path
so the most natural fix would be to pull the chain replication reduction
logic into this code block and before the updateShardId call.

This one follow #885 and closes #942.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-09-03 09:00:45 -07:00
zhaozhao.zz
bc1cc05a25 free client's multi state when it becomes dirty (#961)
Release the client's MULTI state when the transaction becomes dirty to
save memory.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-09-03 09:00:45 -07:00
Ping Xie
0def936a88 Exclude '.' and ':' from isValidAuxChar's banned charset (#963)
Fix a bug in isValidAuxChar where valid characters '.' and ':' were
incorrectly included in the banned charset. This issue affected the
validation of auxiliary fields in the nodes.conf file used by Valkey in
cluster mode, particularly when handling IPv4 and IPv6 addresses. The
code now correctly allows '.' and ':' as valid characters, ensuring
proper handling of these fields. Comments were added to clarify the use
of the banned charset.
 
Related to #736

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-09-03 09:00:45 -07:00
Binbin
d2d93ea17c Revert make KEYS to be an exact match if there is no pattern (#964)
In #792, the time complexity became ambiguous, fluctuating between
O(1) and O(n), which is a significant difference. And we agree uncertainty
can potentially bring disaster to the business, the right thing to do is
to persuade users to use EXISTS instead of KEYS in this case, to do the
right thing the right way, rather than accommodating this incorrect usage.

This reverts commit d66a06e8183818c035bb78706f46fd62645db07e.
This reverts #792.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Viktor Söderqvist
be7f15c7b7 Delete TLS.md and update README.md about tests (#960)
Most of the content of TLS.md has already been copied to README.md in
#927.

The description of how to run tests with TLS is moved to
tests/README.md.

Descriptions of the additional scripts runtest-cluster, runtest-sentinel
and runtest-module are added in tests/README.md.

Links to tests/README.md and src/unit/README.md are added in the
top-level README.md along with a brief overview of the `make test-*`
commands.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-03 09:00:45 -07:00
Viktor Söderqvist
314ee9b30f Delete files MANIFESTO, BUGS and INSTALL (#958)
The MANIFESTO is not Valkey's manifesto and it doesn't even mention open
source. Let's write another one later, or some other document about our
project principles.

The other two files are one-line files with no relevant info. They're
polluting the file listing at root level. It's the first thing you see
when you start exploring the repo for the first time.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-03 09:00:45 -07:00
I-Hsin Cheng
cb2fa0394d Migrate the contents of TLS.md into README.md (#927)
Migrate the contents in TLS.md into TLS sections including building,
running and detail supports. TODO list in the TLS.md is almost done
except the implementation of benchmark support is still not the best
approach which should migrate to hiredis async mode.

Closes #888

---------

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-03 09:00:45 -07:00
Ping Xie
dea55a53a8 Add comment explaining log file reopening for rotation support (#956) 2024-09-03 09:00:45 -07:00
mwish
856d5d1b06 Using intrinsics to optimize counting HyperLogLog trailing bits (#846)
Godbolt link: https://godbolt.org/z/3YPvxsr5s

__builtin_ctz would generate shorter code than hand-written loop.

---------

Signed-off-by: mwish <maplewish117@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
ecb6846c8e Add pause path coverage to replica migration tests (#937)
In #885, we only add a shutdown path, there is another path
is that the server might got hang by slowlog. This PR added
the pause path coverage to cover it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Lipeng Zhu
5d06dc87e9 Move prepareClientToWrite out of loop for lrange command to reduce the redundant call. (#860)
## Description 
When I explore the cycles distributions for `lrange` test (
`valkey-benchmark -p 9001 -t lrange -d 100 -r 1000000 -n 1000000 -c 50
--threads 4`). I found the `prepareClientToWrite` and
`clientHasPendingReplies` could be reduced to single call outside
instead of called in a loop, ideally we can gain 3% performance. The
corresponding `LRANG_100`, `LRANG_300`, `LRANGE_500`, `LRANGE_600` have
~2% - 3% performance boost, the benchmark test prove it helps.

This patch try to move the `prepareClientToWrite` and its child
`clientHasPendingReplies` out of the loop to reduce the function
overhead.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-09-03 09:00:45 -07:00
Binbin
daf09fb538 Wait for the role change and fix the timing issue in the new test (#947)
The test might be fast enough and then there is no change in the role
causing the test to fail. Adding a wait to avoid the timing issue:
```
*** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT
Expected '{127.0.0.1 23154 267}' to be equal to '' (context: type eval line 24 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test)
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Vadym Khoptynets
8df10af306 Use sdsAllocSize instead of sdsZmallocSize (#923)
sdsAllocSize returns the correct size without consulting the
allocator. Which is much faster than consulting the allocator.
The only exception is SDS_TYPE_5, for which it has to
consult the allocator.

This PR also sets alloc field correctly for embedded string objects.
It assumes that no allocator would allocate a buffer larger
than `259 + sizeof(robj)` for embedded string. We use embedded strings
for strings up to 44 bytes. If this assumption is wrong, the whole
function would require a rewrite. In general case sds type adjustment
might be needed. Such logic should go to sds.c.

---------

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
2024-09-03 09:00:45 -07:00
Amit Nagler
795fa1edba Remove dual-channel-replication Feature Flag's Protection (#908)
Currently, the `dual-channel-replication` feature flag is immutable if
`enable-protected-configs` is enabled, which is the default behavior.
This PR proposes to make the `dual-channel-replication` flag mutable,
allowing it to be changed dynamically without restarting the cluster.

**Motivation:**
The ability to change the `dual-channel-replication` flag dynamically is
essential for testing and validating the feature on real clusters
running in production environments. By making the flag mutable, we can
enable or disable the feature without disrupting the cluster's
operations, facilitating easier testing and experimentation.
Additionally, this change would provide more flexibility for users to
enable or disable the feature based on their specific requirements or
operational needs without requiring a cluster restart.

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-09-03 09:00:45 -07:00
Viktor Söderqvist
6d52fffef5 Connection minor fixes (#953)
1. Remove redundant connIncrRefs/connDecrRefs

    In socket.c, the reference counter is incremented before calling
callHandler, but the same reference counter is also incremented inside
callHandler before calling the actual callback.

        static inline int callHandler(connection *conn, ConnectionCallbackFunc handler) {
            connIncrRefs(conn);
            if (handler) handler(conn);
            connDecrRefs(conn);
            ...
        }

    This commit removes the redundant incr/decr calls in socket.c

2. Correct return value of connRead for TLS when peer closed

    According to comments in connection.h, connRead returns 0 when the peer
has closed the connection. This patch corrects the return value for TLS
connections. (Without this patch, it returns -1 which means error.)

    There is an observable difference in what is logged in the verbose
level: "Client closed connection" vs "Reading from client: (null)".

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-03 09:00:45 -07:00
uriyage
8d899d7464 Improve multithreaded performance with memory prefetching (#861)
This PR utilizes the IO threads to execute commands in batches, allowing
us to prefetch the dictionary data in advance.

After making the IO threads asynchronous and offloading more work to
them in the first 2 PRs, the `lookupKey` function becomes a main
bottle-neck and it takes about 50% of the main-thread time (Tested with
SET command). This is because the Valkey dictionary is a straightforward
but inefficient chained hash implementation. While traversing the hash
linked lists, every access to either a dictEntry structure, pointer to
key, or a value object requires, with high probability, an expensive
external memory access.

### Memory Access Amortization

Memory Access Amortization (MAA) is a technique designed to optimize the
performance of dynamic data structures by reducing the impact of memory
access latency. It is applicable when multiple operations need to be
executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.

Rather than executing operations sequentially, this approach interleaves
the execution of all operations. This is done in such a way that
whenever a memory access is required during an operation, the program
prefetches the necessary memory and transitions to another operation.
This ensures that when one operation is blocked awaiting memory access,
other memory accesses are executed in parallel, thereby reducing the
average access latency.

We applied this method in the development of `dictPrefetch`, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, `dictPrefetch` is an interleaved execution of dictFind for
all the keys.


**Implementation details**

When the main thread iterates over the `clients-pending-io-read`, for
clients with ready-to-execute commands (i.e., clients for which the IO
thread has parsed the commands), a batch of up to 16 commands is
created. Initially, the command's argv, which were allocated by the IO
thread, is prefetched to the main thread's L1 cache. Subsequently, all
the dict entries and values required for the commands are prefetched
from the dictionary before the command execution. Only then will the
commands be executed.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-09-03 09:00:45 -07:00
Binbin
6694fa90ec Drop the outdated script replication example comments (#951)
This example was for script replication which we have
completely removed in 7.0, so this example is outdated now.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Binbin
13527b056e Make KEYS to be an exact match if there is no pattern (#792)
Although KEYS is a dangerous command and we recommend people
to avoid using it, some people who are not familiar with it
still using it, and even use KEYS with no pattern at all.

Once KEYS is using with no pattern, we can convert it to an
exact match to avoid iterating over all data.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
xu0o0
8a33a072d2 Fix invalid escape sequence in utils, minor cleanup in python script (#948)
According to the Python document[1], any invalid escape sequences in
string literals now generate a DeprecationWarning (SyntaxWarning as of
3.12) and in the future this will become a SyntaxError.

This Change uses Python’s raw string notation for regular expression
patterns to avoid it.

[1]: https://docs.python.org/3.10/library/re.html

Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
c667fd96e3 Add explicit assert to ensure thread_shared_qb won't expand (#938)
Although this won't happen now, adding this statement explicitly.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Binbin
7d27a7cffd Add epoch information to failover auth denied logs (#816)
When failover deny to vote, sometimes due to network or
some blocking operations, the time of FAILOVER_AUTH_REQUEST
packet arrival is very uncertain. Since there is no epoch
information in these logs, it is hard to associate the log
with other logs.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
NAM UK KIM
b7b2c595c1 Update "Total" message and used_memory_human log information in serverCron() function (#594)
At the VERBOSE/DEBUG log level, which is output once every 5 seconds,
added to show the "Total" message of all clients and to show memory
usage (used_memory) with used_memory_human.
Also, it seems clearer to show "total" number of keys and the number of
volatile in entire keys.

---------

Signed-off-by: NAM UK KIM <namuk2004@naver.com>
2024-09-03 09:00:45 -07:00
Ayush Sharma
0a12ab22a6 Add support for setting the group on a unix domain socket (#901)
Add new optional, immutable string config called `unixsocketgroup`. 
Change the group of the unix socket to `unixsocketgroup` after `bind()`
if specified.

Adds tests to validate the behavior.

Fixes #873.

Signed-off-by: Ayush Sharma <mrayushs933@gmail.com>
2024-09-03 09:00:45 -07:00
Madelyn Olson
b487e18f55 Remove accurate from extra test tag (#935)
Today if we attached the "run-extra-tests" tag it adds at least 20
minutes because the dump-fuzzer test runs with full accuracy. This
fuzzer is useful, but probably only really needed for the daily, so
removing it from the PRs. We still run the fuzzers, just not for as
long.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
0bd121fd38 valkey-cli make source node ignores NOREPLICAS when doing the last CLUSTER SETSLOT (#928)
This fixes #899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Binbin
be0b55ba07 Fix CLUSTER SETSLOT block and unblock error when all replicas are down (#879)
In CLUSTER SETSLOT propagation logic, if the replicas are down, the
client will get block during command processing and then unblock
with `NOREPLICAS Not enough good replicas to write`.

The reason is that all replicas are down (or some are down), but
myself->num_replicas is including all replicas, so the client will
get block and always get timeout.

We should only wait for those online replicas, otherwise the waiting
propagation will always timeout since there are not enough replicas.
The admin can easily check if there are replicas that are down for an
extended period of time. If they decide to move forward anyways, we
should not block it. If a replica  failed right before the replication and
was not included in the replication, it would also unlikely win the election.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@google.com>
2024-09-03 09:00:45 -07:00
Yunxiao Du
dc7b7e8061 Delete redundant declaration clusterNodeCoversSlot and countKeysInSlot (#930)
Delete redundant declaration, clusterNodeCoversSlot and countKeysInSlot
has been declared in cluster.h

Signed-off-by: Yunxiao Du <me@jackdu.cn>
2024-09-03 09:00:45 -07:00
Madelyn Olson
b6862c5c6d Revert repl backlog size back to 1mb for dual channel tests (#934)
There is a test that assumes that the backlog will get overrun, but
because of the recent changes to the default it no longer fails. It
seems like it is a bit flakey now though, so resetting the value in the
test back to 1mb. (This relates to the CoB of 1100k. So it should
consistently work with a 1mb limit).

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Wen Hui
1aa2ee7b33 Decline unsubscribe related command in non-subscribed mode (#759)
Now, when clients run the unsubscribe, sunsubscribe and punsubscribe
commands in the non-subscribed mode, it returns 0.
Indeed this is a bug, we should not allow client run these kind of
commands here.

Thus, this PR fixes this bug, but it is a break change for existing
clients

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-09-03 09:00:45 -07:00
Binbin
fd994e2be5 Make runtest-cluster support --io-threads option (#933)
In #764, we add a --io-threads mode in test, but forgot
to handle runtest-cluster, they are different framework.

Currently runtest-cluster does not support tags, and we
don't have plan to support it. And currently cluster tests
does not have any io-threads tests, so this PR just align
--io-threads option with #764.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Binbin
032c7544f4 Avoid to re-establish replication if node is already myself primary in CLUSTER REPLICATE (#884)
If n is already myself primary, there is no need to re-establish the
replication connection.

In the past we allow a replica node to reconnect with its primary via
this CLUSTER REPLICATE command, it will use psync. But since #885, we
will assume that a full sync is needed in this case, so if we don't do
this, the replica will always use full sync.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@google.com>
2024-09-03 09:00:45 -07:00
uriyage
2e937773ec Skip tracking clients OOM test when I/O threads are enabled (#764)
Fix feedback loop in key eviction with tracking clients when using I/O
threads.

Current issue:
Evicting keys while tracking clients or key space-notification exist
creates a feedback loop when using I/O threads:

While evicting keys we send tracking async writes to I/O threads,
preventing immediate release of tracking clients' COB memory
consumption.

Before the I/O thread finishes its write, we recheck used_memory, which
now includes the tracking clients' COB and thus continue to evict more
keys.

**Fix:**
We will skip the test for now while IO threads are active. We may
consider avoiding sending writes in `processPendingWrites` to I/O
threads for tracking clients when we are out of memory.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-03 09:00:45 -07:00
Harkrishn Patro
0f551a2dca Update README.md file to reference valkey.io (#931)
Update README.md since the project is no longer under construction, and
can reference the main website.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-09-03 09:00:45 -07:00
Binbin
f75b03d861 Set repl-backlog-size from 1mb to 10mb by default (#911)
The repl-backlog-size 1mb is too small in most cases, now network
transmission and bandwidth performance have improved rapidly in more
than ten years.

The bigger the replication backlog, the longer the replica can endure
the disconnect and later be able to perform a partial resynchronization.

Part of #653.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Wen Hui
606cfbb05d Fix Error in Daily CI -- reply-schemas-validator (#922)
Just add one more test for command "sentinel IS-PRIMARY-DOWN-BY-ADDR" to
make the reply-schemas-validator
run successfully.

Note: test result here
https://github.com/hwware/valkey/actions/runs/10457516111

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-09-03 09:00:45 -07:00
zhenwei pi
f143b66a41 RDMA: Support user keepalive command (#916)
If the client side crashes by any issue or exits normally, the kernel
will try to disconnect RDMA QPs. Then the kernel of server side receives
CM packets, valkey-server handles CM disconnected event and close
connection.

However, there is a lack of keepalive mechanism from RDMA transport
layer. Once the kernel of client side crashes, the server side will not
be notified. To avoid this issue, valkey server sents Keepaliv command
periodically to detect any dead QPs.

An example of mlx-cx5:

```
 # RDMA: CQ handle error status: transport retry counter exceeded[0xc], opcode : 0x0
 # RDMA: CQ handle error status: transport retry counter exceeded[0xc], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
```

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-09-03 09:00:45 -07:00
Binbin
f3b93e2190 Fix data loss when replica do a failover with a old history repl offset (#885)
Our current replica can initiate a failover without restriction when
it detects that the primary node is offline. This is generally not a
problem. However, consider the following scenarios:

1. In slot migration, a primary loses its last slot and then becomes
a replica. When it is fully synchronized with the new primary, the new
primary downs.

2. In CLUSTER REPLICATE command, a replica becomes a replica of another
primary. When it is fully synchronized with the new primary, the new
primary downs.

In the above scenario, case 1 may cause the empty primary to be elected
as the new primary, resulting in primary data loss. Case 2 may cause the
non-empty replica to be elected as the new primary, resulting in data
loss and confusion.

The reason is that we have cached primary logic, which is used for psync.
In the above scenario, when clusterSetPrimary is called, myself will cache
server.primary in server.cached_primary for psync. In replicationGetReplicaOffset,
we get server.cached_primary->reploff for offset, gossip it and rank it,
which causes the replica to use the old historical offset to initiate
failover, and it get a good rank, initiates election first, and then is
elected as the new primary.

The main problem here is that when the replica has not completed full
sync, it may get the historical offset in replicationGetReplicaOffset.

The fix is to clear cached_primary in these places where full sync is
obviously needed, and let the replica use offset == 0 to participate
in the election. In this way, this unhealthy replica has a worse rank
and is not easy to be elected.

Of course, it is possible that it will be elected with offset == 0.
In the future, we may need to prohibit the replica with offset == 0
from having the right to initiate elections.

Another point worth mentioning, in above cases:
1. In the ROLE command, the replica status will be handshake, and the
offset will be -1.
2. Before this PR, in the CLUSTER SHARD command, the replica status will
be online, and the offset will be the old cached value (which is wrong).
3. After this PR, in the CLUSTER SHARD, the replica status will be loading,
and the offset will be 0.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00
Binbin
1fb7fc659c Correct RDB_EOF_MARK_SIZE usage where EOF mark is relevant (#925)
In these places we should use RDB_EOF_MARK_SIZE, but we mixed
it with CONFIG_RUN_ID_SIZE. This is not an issue since they are
all 40, just a cleanup.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-03 09:00:45 -07:00