12547 Commits

Author SHA1 Message Date
Viktor Söderqvist
927c2a8cd1
Delete files MANIFESTO, BUGS and INSTALL (#958)
The MANIFESTO is not Valkey's manifesto and it doesn't even mention open
source. Let's write another one later, or some other document about our
project principles.

The other two files are one-line files with no relevant info. They're
polluting the file listing at root level. It's the first thing you see
when you start exploring the repo for the first time.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-28 20:04:23 +02:00
I-Hsin Cheng
6172907094
Migrate the contents of TLS.md into README.md (#927)
Migrate the contents in TLS.md into TLS sections including building,
running and detail supports. TODO list in the TLS.md is almost done
except the implementation of benchmark support is still not the best
approach which should migrate to hiredis async mode.

Closes #888

---------

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-28 12:43:29 +02:00
Ping Xie
2b71a78241
Add comment explaining log file reopening for rotation support (#956) 2024-08-27 21:00:17 -07:00
mwish
744b13e302
Using intrinsics to optimize counting HyperLogLog trailing bits (#846)
Godbolt link: https://godbolt.org/z/3YPvxsr5s

__builtin_ctz would generate shorter code than hand-written loop.

---------

Signed-off-by: mwish <maplewish117@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-27 20:44:32 -07:00
Binbin
4fe8320711
Add pause path coverage to replica migration tests (#937)
In #885, we only add a shutdown path, there is another path
is that the server might got hang by slowlog. This PR added
the pause path coverage to cover it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-28 11:08:27 +08:00
Lipeng Zhu
076bf6605f
Move prepareClientToWrite out of loop for lrange command to reduce the redundant call. (#860)
## Description 
When I explore the cycles distributions for `lrange` test (
`valkey-benchmark -p 9001 -t lrange -d 100 -r 1000000 -n 1000000 -c 50
--threads 4`). I found the `prepareClientToWrite` and
`clientHasPendingReplies` could be reduced to single call outside
instead of called in a loop, ideally we can gain 3% performance. The
corresponding `LRANG_100`, `LRANG_300`, `LRANGE_500`, `LRANGE_600` have
~2% - 3% performance boost, the benchmark test prove it helps.

This patch try to move the `prepareClientToWrite` and its child
`clientHasPendingReplies` out of the loop to reduce the function
overhead.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-08-27 19:11:09 -07:00
Binbin
6a84e06b05
Wait for the role change and fix the timing issue in the new test (#947)
The test might be fast enough and then there is no change in the role
causing the test to fail. Adding a wait to avoid the timing issue:
```
*** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT
Expected '{127.0.0.1 23154 267}' to be equal to '' (context: type eval line 24 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test)
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-28 09:51:10 +08:00
Vadym Khoptynets
4f29ad4583
Use sdsAllocSize instead of sdsZmallocSize (#923)
sdsAllocSize returns the correct size without consulting the
allocator. Which is much faster than consulting the allocator.
The only exception is SDS_TYPE_5, for which it has to
consult the allocator.

This PR also sets alloc field correctly for embedded string objects.
It assumes that no allocator would allocate a buffer larger
than `259 + sizeof(robj)` for embedded string. We use embedded strings
for strings up to 44 bytes. If this assumption is wrong, the whole
function would require a rewrite. In general case sds type adjustment
might be needed. Such logic should go to sds.c.

---------

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
2024-08-27 14:43:01 -07:00
Amit Nagler
1ff2a3b6ae
Remove dual-channel-replication Feature Flag's Protection (#908)
Currently, the `dual-channel-replication` feature flag is immutable if
`enable-protected-configs` is enabled, which is the default behavior.
This PR proposes to make the `dual-channel-replication` flag mutable,
allowing it to be changed dynamically without restarting the cluster.

**Motivation:**
The ability to change the `dual-channel-replication` flag dynamically is
essential for testing and validating the feature on real clusters
running in production environments. By making the flag mutable, we can
enable or disable the feature without disrupting the cluster's
operations, facilitating easier testing and experimentation.
Additionally, this change would provide more flexibility for users to
enable or disable the feature based on their specific requirements or
operational needs without requiring a cluster restart.

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-08-27 10:18:48 -07:00
Viktor Söderqvist
54c0f743dd
Connection minor fixes (#953)
1. Remove redundant connIncrRefs/connDecrRefs

    In socket.c, the reference counter is incremented before calling
callHandler, but the same reference counter is also incremented inside
callHandler before calling the actual callback.

        static inline int callHandler(connection *conn, ConnectionCallbackFunc handler) {
            connIncrRefs(conn);
            if (handler) handler(conn);
            connDecrRefs(conn);
            ...
        }

    This commit removes the redundant incr/decr calls in socket.c

2. Correct return value of connRead for TLS when peer closed

    According to comments in connection.h, connRead returns 0 when the peer
has closed the connection. This patch corrects the return value for TLS
connections. (Without this patch, it returns -1 which means error.)

    There is an observable difference in what is logged in the verbose
level: "Client closed connection" vs "Reading from client: (null)".

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-27 16:11:33 +02:00
uriyage
04d76d8b02
Improve multithreaded performance with memory prefetching (#861)
This PR utilizes the IO threads to execute commands in batches, allowing
us to prefetch the dictionary data in advance.

After making the IO threads asynchronous and offloading more work to
them in the first 2 PRs, the `lookupKey` function becomes a main
bottle-neck and it takes about 50% of the main-thread time (Tested with
SET command). This is because the Valkey dictionary is a straightforward
but inefficient chained hash implementation. While traversing the hash
linked lists, every access to either a dictEntry structure, pointer to
key, or a value object requires, with high probability, an expensive
external memory access.

### Memory Access Amortization

Memory Access Amortization (MAA) is a technique designed to optimize the
performance of dynamic data structures by reducing the impact of memory
access latency. It is applicable when multiple operations need to be
executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.

Rather than executing operations sequentially, this approach interleaves
the execution of all operations. This is done in such a way that
whenever a memory access is required during an operation, the program
prefetches the necessary memory and transitions to another operation.
This ensures that when one operation is blocked awaiting memory access,
other memory accesses are executed in parallel, thereby reducing the
average access latency.

We applied this method in the development of `dictPrefetch`, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, `dictPrefetch` is an interleaved execution of dictFind for
all the keys.


**Implementation details**

When the main thread iterates over the `clients-pending-io-read`, for
clients with ready-to-execute commands (i.e., clients for which the IO
thread has parsed the commands), a batch of up to 16 commands is
created. Initially, the command's argv, which were allocated by the IO
thread, is prefetched to the main thread's L1 cache. Subsequently, all
the dict entries and values required for the commands are prefetched
from the dictionary before the command execution. Only then will the
commands be executed.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-08-26 21:10:44 -07:00
Binbin
694246cfab
Drop the outdated script replication example comments (#951)
This example was for script replication which we have
completely removed in 7.0, so this example is outdated now.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-27 12:04:47 +08:00
Binbin
d66a06e818
Make KEYS to be an exact match if there is no pattern (#792)
Although KEYS is a dangerous command and we recommend people
to avoid using it, some people who are not familiar with it
still using it, and even use KEYS with no pattern at all.

Once KEYS is using with no pattern, we can convert it to an
exact match to avoid iterating over all data.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-27 12:04:27 +08:00
xu0o0
73698fa028
Fix invalid escape sequence in utils, minor cleanup in python script (#948)
According to the Python document[1], any invalid escape sequences in
string literals now generate a DeprecationWarning (SyntaxWarning as of
3.12) and in the future this will become a SyntaxError.

This Change uses Python’s raw string notation for regular expression
patterns to avoid it.

[1]: https://docs.python.org/3.10/library/re.html

Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
2024-08-26 22:53:35 +08:00
Binbin
9f4b1adbea
Add explicit assert to ensure thread_shared_qb won't expand (#938)
Although this won't happen now, adding this statement explicitly.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-25 12:03:34 +08:00
Binbin
c7d1daea05
Add epoch information to failover auth denied logs (#816)
When failover deny to vote, sometimes due to network or
some blocking operations, the time of FAILOVER_AUTH_REQUEST
packet arrival is very uncertain. Since there is no epoch
information in these logs, it is hard to associate the log
with other logs.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-24 18:03:24 +08:00
NAM UK KIM
0053429a02
Update "Total" message and used_memory_human log information in serverCron() function (#594)
At the VERBOSE/DEBUG log level, which is output once every 5 seconds,
added to show the "Total" message of all clients and to show memory
usage (used_memory) with used_memory_human.
Also, it seems clearer to show "total" number of keys and the number of
volatile in entire keys.

---------

Signed-off-by: NAM UK KIM <namuk2004@naver.com>
2024-08-23 18:02:18 -07:00
Ayush Sharma
b48596a914
Add support for setting the group on a unix domain socket (#901)
Add new optional, immutable string config called `unixsocketgroup`. 
Change the group of the unix socket to `unixsocketgroup` after `bind()`
if specified.

Adds tests to validate the behavior.

Fixes #873.

Signed-off-by: Ayush Sharma <mrayushs933@gmail.com>
2024-08-23 11:52:08 -07:00
Madelyn Olson
829aa7fe3c
Remove accurate from extra test tag (#935)
Today if we attached the "run-extra-tests" tag it adds at least 20
minutes because the dump-fuzzer test runs with full accuracy. This
fuzzer is useful, but probably only really needed for the daily, so
removing it from the PRs. We still run the fuzzers, just not for as
long.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-23 11:05:41 -07:00
Binbin
8045994972
valkey-cli make source node ignores NOREPLICAS when doing the last CLUSTER SETSLOT (#928)
This fixes #899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-23 16:22:30 +08:00
Binbin
5d97f5133c
Fix CLUSTER SETSLOT block and unblock error when all replicas are down (#879)
In CLUSTER SETSLOT propagation logic, if the replicas are down, the
client will get block during command processing and then unblock
with `NOREPLICAS Not enough good replicas to write`.

The reason is that all replicas are down (or some are down), but
myself->num_replicas is including all replicas, so the client will
get block and always get timeout.

We should only wait for those online replicas, otherwise the waiting
propagation will always timeout since there are not enough replicas.
The admin can easily check if there are replicas that are down for an
extended period of time. If they decide to move forward anyways, we
should not block it. If a replica  failed right before the replication and
was not included in the replication, it would also unlikely win the election.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@google.com>
2024-08-23 16:21:53 +08:00
Yunxiao Du
0a11c4a140
Delete redundant declaration clusterNodeCoversSlot and countKeysInSlot (#930)
Delete redundant declaration, clusterNodeCoversSlot and countKeysInSlot
has been declared in cluster.h

Signed-off-by: Yunxiao Du <me@jackdu.cn>
2024-08-23 12:17:27 +08:00
Madelyn Olson
b12668af7a
Revert repl backlog size back to 1mb for dual channel tests (#934)
There is a test that assumes that the backlog will get overrun, but
because of the recent changes to the default it no longer fails. It
seems like it is a bit flakey now though, so resetting the value in the
test back to 1mb. (This relates to the CoB of 1100k. So it should
consistently work with a 1mb limit).

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-22 15:35:28 -07:00
Wen Hui
959dd3485b
Decline unsubscribe related command in non-subscribed mode (#759)
Now, when clients run the unsubscribe, sunsubscribe and punsubscribe
commands in the non-subscribed mode, it returns 0.
Indeed this is a bug, we should not allow client run these kind of
commands here.

Thus, this PR fixes this bug, but it is a break change for existing
clients

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-22 11:21:33 -04:00
Binbin
8d9b8c9d3d
Make runtest-cluster support --io-threads option (#933)
In #764, we add a --io-threads mode in test, but forgot
to handle runtest-cluster, they are different framework.

Currently runtest-cluster does not support tags, and we
don't have plan to support it. And currently cluster tests
does not have any io-threads tests, so this PR just align
--io-threads option with #764.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-22 11:21:06 -04:00
Binbin
08aaeea4b7
Avoid to re-establish replication if node is already myself primary in CLUSTER REPLICATE (#884)
If n is already myself primary, there is no need to re-establish the
replication connection.

In the past we allow a replica node to reconnect with its primary via
this CLUSTER REPLICATE command, it will use psync. But since #885, we
will assume that a full sync is needed in this case, so if we don't do
this, the replica will always use full sync.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@google.com>
2024-08-22 11:00:18 +08:00
uriyage
39f8bcb91b
Skip tracking clients OOM test when I/O threads are enabled (#764)
Fix feedback loop in key eviction with tracking clients when using I/O
threads.

Current issue:
Evicting keys while tracking clients or key space-notification exist
creates a feedback loop when using I/O threads:

While evicting keys we send tracking async writes to I/O threads,
preventing immediate release of tracking clients' COB memory
consumption.

Before the I/O thread finishes its write, we recheck used_memory, which
now includes the tracking clients' COB and thus continue to evict more
keys.

**Fix:**
We will skip the test for now while IO threads are active. We may
consider avoiding sending writes in `processPendingWrites` to I/O
threads for tracking clients when we are out of memory.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-21 17:02:57 -07:00
Harkrishn Patro
002d052eef
Update README.md file to reference valkey.io (#931)
Update README.md since the project is no longer under construction, and
can reference the main website.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-08-21 14:19:50 -07:00
Binbin
a1ac459ef1
Set repl-backlog-size from 1mb to 10mb by default (#911)
The repl-backlog-size 1mb is too small in most cases, now network
transmission and bandwidth performance have improved rapidly in more
than ten years.

The bigger the replication backlog, the longer the replica can endure
the disconnect and later be able to perform a partial resynchronization.

Part of #653.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 11:59:02 -04:00
Wen Hui
b8dd4fbbf7
Fix Error in Daily CI -- reply-schemas-validator (#922)
Just add one more test for command "sentinel IS-PRIMARY-DOWN-BY-ADDR" to
make the reply-schemas-validator
run successfully.

Note: test result here
https://github.com/hwware/valkey/actions/runs/10457516111

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-21 09:36:02 -04:00
zhenwei pi
2673320b66
RDMA: Support user keepalive command (#916)
If the client side crashes by any issue or exits normally, the kernel
will try to disconnect RDMA QPs. Then the kernel of server side receives
CM packets, valkey-server handles CM disconnected event and close
connection.

However, there is a lack of keepalive mechanism from RDMA transport
layer. Once the kernel of client side crashes, the server side will not
be notified. To avoid this issue, valkey server sents Keepaliv command
periodically to detect any dead QPs.

An example of mlx-cx5:

```
 # RDMA: CQ handle error status: transport retry counter exceeded[0xc], opcode : 0x0
 # RDMA: CQ handle error status: transport retry counter exceeded[0xc], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
```

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-08-21 10:38:34 +02:00
Binbin
e1b3629186 Fix data loss when replica do a failover with a old history repl offset (#885)
Our current replica can initiate a failover without restriction when
it detects that the primary node is offline. This is generally not a
problem. However, consider the following scenarios:

1. In slot migration, a primary loses its last slot and then becomes
a replica. When it is fully synchronized with the new primary, the new
primary downs.

2. In CLUSTER REPLICATE command, a replica becomes a replica of another
primary. When it is fully synchronized with the new primary, the new
primary downs.

In the above scenario, case 1 may cause the empty primary to be elected
as the new primary, resulting in primary data loss. Case 2 may cause the
non-empty replica to be elected as the new primary, resulting in data
loss and confusion.

The reason is that we have cached primary logic, which is used for psync.
In the above scenario, when clusterSetPrimary is called, myself will cache
server.primary in server.cached_primary for psync. In replicationGetReplicaOffset,
we get server.cached_primary->reploff for offset, gossip it and rank it,
which causes the replica to use the old historical offset to initiate
failover, and it get a good rank, initiates election first, and then is
elected as the new primary.

The main problem here is that when the replica has not completed full
sync, it may get the historical offset in replicationGetReplicaOffset.

The fix is to clear cached_primary in these places where full sync is
obviously needed, and let the replica use offset == 0 to participate
in the election. In this way, this unhealthy replica has a worse rank
and is not easy to be elected.

Of course, it is possible that it will be elected with offset == 0.
In the future, we may need to prohibit the replica with offset == 0
from having the right to initiate elections.

Another point worth mentioning, in above cases:
1. In the ROLE command, the replica status will be handshake, and the
offset will be -1.
2. Before this PR, in the CLUSTER SHARD command, the replica status will
be online, and the offset will be the old cached value (which is wrong).
3. After this PR, in the CLUSTER SHARD, the replica status will be loading,
and the offset will be 0.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 13:11:21 +08:00
Binbin
829243e76b
Correct RDB_EOF_MARK_SIZE usage where EOF mark is relevant (#925)
In these places we should use RDB_EOF_MARK_SIZE, but we mixed
it with CONFIG_RUN_ID_SIZE. This is not an issue since they are
all 40, just a cleanup.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 00:00:29 +08:00
ranshid
e2ab7ffd89
Make use of a single listNode pointer for blocking utility lists (#919)
Saves some memory (one pointer) in the client struct.

Since a client cannot be blocked multiple times, we can assume
it will be held in only one extra utility list, so it is ok to maintain
a union of these listNode references. 

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-08-20 18:54:53 +08:00
gmbnomis
7795152fff
Fix valgrind timing issue failure in replica-redirect test (#917)
Wait for the replica to become online before starting the actual test.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
2024-08-18 21:20:53 +08:00
Binbin
70b9285802
Optimize linear search of WAIT and WAITAOF when unblocking the client (#787)
Currently, if the client enters a blocked state, it will be
added to the server.clients_waiting_acks list. When the client
is unblocked, that is, when unblockClient is called, we will
need to linearly traverse server.clients_waiting_acks to delete
the client, and this search is O(N).

When WAIT (or WAITAOF) is used extensively in some cases, this
O(N) search may be time-consuming. We can remember the list node
and store it in the blockingState struct and it can avoid the
linear search in unblockClientWaitingReplicas.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-18 21:20:35 +08:00
Wen Hui
33c7ca41be
Add 4 commands for sentinel and update most test cases and json files (#789)
Add 4 new commands for Sentinel (reference
https://github.com/valkey-io/valkey/issues/36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here
https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note: 

1. runtest-sentinel pass all test cases
2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3
sentinel
   there are 4 steps in this test scenario: 
step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and
then new primary can be voted successfully.
step 2: replace sentinel 1 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 3: replace sentinel 2 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 4: replace sentinel 3 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
   
We can see, even mixed version sentinel running, whole system still
works.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-16 09:46:36 -04:00
DarrenJiang13
adf53c212b
Add lfu support for DEBUG OBJECT command, added lfu_freq and lfu_access_time_minutes fields (#479)
For `debug object` command, we use `val->lru` but ignore the `lfu` mode.  
So in `lfu` mode, `debug object` would return meaningless `lru` descriptions. 

Added two new fields lfu_freq and lfu_access_time_minutes.

Signed-off-by: jiangyujie.jyj <yjjiang1996@163.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-08-16 17:49:46 +08:00
Binbin
fc9f291033
Make a light weight version of DEBUG OBJECT, add FAST option (#881)
Adding FAST option to DEBUG OBJECT command.

The light version only shows the light weight infomation,
which mostly O(1). The pre-existing version that show more
stats such as serializedlength sometimes is time consuming.

This should allow looking into debug stats (the key expired
but not deleted), even on huge object, on which we're afraid
to run the command for fear of causing a server freeze.

Somehow like 3ca451c46fed894bf49e7561fa0282d2583f1c06.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-16 10:18:36 +08:00
Binbin
76ad8f7a76
Skip IPv6 tests when TCLSH version is < 8.6 (#910)
In #786, we did skip it in the daily, but not for the others.
When running ./runtest on MacOS, we will get the failure.
```
couldn't open socket: host is unreachable (nodename nor servname provided, or not known)
```

The reason is that TCL 8.5 doesn't support ipv6, so we skip tests
tagged with ipv6. This also revert #786.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-15 15:11:38 +08:00
secwall
103365fe0e
Log unexpected $ENDOFF responses in dual channel replication (#839)
I've tried to test a dual channel replication but forgot to add +sync
for my replication user. As a result replica entered silent cycle like
this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And primary got endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to understand that replication user is missing +sync
acl on notice log level. With this one-line change we get a warning
message in our replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
2024-08-14 22:00:57 -07:00
Pieter Cailliau
4d284daefd
Copyright update to reflect IP transfer from salvatore to Redis (#740)
Update references of copyright being assigned to Salvatore when it was
transferred to Redis Ltd. as per
https://github.com/valkey-io/valkey/issues/544.

---------

Signed-off-by: Pieter Cailliau <pieter@redis.com>
2024-08-14 09:20:36 -07:00
Salvatore Mesoraca
68b2270947
Prevent later accesses to unallocated memory (#907)
A pointer to dtype is stored in the dict forever.
dtype is stack-allocated while the dict created is global.
The dict (and the pointer to dtype in it) will live past the lifetime of
dtype.
clusterManagerLinkDictType is a global object that has the same values
as dtype.

Signed-off-by: Salvatore Mesoraca <salvatore.mesoraca@aiven.io>
2024-08-14 09:03:27 -07:00
zhaozhao.zz
131857e80a
To avoid bouncing -REDIRECT during FAILOVER (#871)
Fix #821

During the `FAILOVER` process, when conditions are met (such as when the
force time is reached or the primary and replica offsets are
consistent), the primary actively becomes the replica and transitions to
the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica,
and after handshaking and other operations, it will eventually send the
`PSYNC FAILOVER` command to the replica, after which the replica will
become the primary. This means that the upgrade of the replica to the
primary is an asynchronous operation, which implies that during the
`FAILOVER_IN_PROGRESS` state, there may be a period of time where both
nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the
request will be redirected to the replica and then redirected back,
causing back and forth redirection. To avoid this situation, during the
`FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that
need to be redirected until the replica truly becomes the primary, and
then resume the execution.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-08-14 14:04:29 +08:00
Binbin
370bdb3e46
Change server.daylight_active to an atomic variable (#876)
We are updating this variable in the main thread, and the
child threads can printing the logs at the same time. This
generating a warning in SANITIZER=thread:
```
WARNING: ThreadSanitizer: data race (pid=74208)
  Read of size 4 at 0x000102875c10 by thread T3:
    #0 serverLogRaw <null>:52173615 (valkey-server:x86_64+0x10003c556)
    #1 _serverLog <null>:52173615 (valkey-server:x86_64+0x10003ca89)
    #2 bioProcessBackgroundJobs <null>:52173615 (valkey-server:x86_64+0x1001402c9)

  Previous write of size 4 at 0x000102875c10 by main thread (mutexes: write M0):
    #0 afterSleep <null>:52173615 (valkey-server:x86_64+0x10004989b)
    #1 aeProcessEvents <null>:52173615 (valkey-server:x86_64+0x100031e52)
    #2 main <null>:52173615 (valkey-server:x86_64+0x100064a3c)
    #3 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365)
    #4 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365)
```

The refresh of daylight_active is not real time, we update
it in aftersleep, so we don't need a strong synchronization,
so using memory_order_relaxed. But also noted we are doing
load/store operations only for daylight_active, which is an
aligned 32-bit integer, so using memory_order_relaxed will
not provide more consistency than what we have today.

So this is just a cleanup that to clear the warning.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-14 13:08:20 +08:00
Amit Nagler
6cb86fff51
Fix dual-channel replication test under valgrind (#904)
Test dual-channel-replication primary gets cob overrun during replica
rdb load` fails during the Valgrind run. This is due to the load
handlers disconnecting before the tests complete, resulting in a low
primary COB. Increasing the handlers' timeout should resolve this issue.

Failure:
https://github.com/valkey-io/valkey/actions/runs/10361286333/job/28681321393

Server logs reveals that the load handler clients were disconnected
before the test started

Also the two previus test took about 20 seconds which is the handler
timeout.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-13 10:40:19 -07:00
Binbin
f622e375a0
Better messages when valkey-cli cluster --fix meet value check failed (#867)
The clusterManagerCompareKeysValues is introduced in
143bfa1e6e65cf8be1eaad0b8169e2d95ca62f9a, which calls
DEBUG DIGEST-VALUE to check whether the value of the
source node and the target node are consistent.

However, the DEBUG DIGEST-VALUE command is not supported
in older version, such as version 4, and the node will return
unknown subcommand. Or the DEBUG command can be disabled
in version 7, and the node will return DEBUG not allowed.

In these cases, we need to output friendly message to
allow users to proceed to the next step, instead of just
outputing `Value check failed!`.

Unknown subcommand example:
```
*** Target key exists
*** Checking key values on both nodes...
Node 127.0.0.1:30001 replied with error:
ERR unknown subcommand or wrong number of arguments for 'DIGEST-VALUE'. Try DEBUG HELP.
Node 127.0.0.1:30003 replied with error:
ERR unknown subcommand or wrong number of arguments for 'DIGEST-VALUE'. Try DEBUG HELP.
*** Value check failed!
DEBUG DIGEST-VALUE command is not supported.
You can relaunch the command with --cluster-replace option to force key overriding.
```

DEBUG not allowed example:
```
*** Target key exists
*** Checking key values on both nodes...
Node 127.0.0.1:30001 replied with error:
ERR DEBUG command not allowed. If the enable-debug-command option is ...
Node 127.0.0.1:30003 replied with error:
ERR DEBUG command not allowed. If the enable-debug-command option is ...
*** Value check failed!
DEBUG command is not allowed.
You can turn on the enable-debug-command option.
Or you can relaunch the command with --cluster-replace option to force key overriding.
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-13 19:25:14 +08:00
Rayacoo
76f809bc19
Optimize ZUNION[STORE] command by removing unnecessary accumulator dict (#829)
In the past implementation of `ZUNION` and `ZUNIONSTORE` commands, we
first create a temporary dict called `accumulator`. After adding all
member-score mappings to `accumulator`, we still need to convert
`accumulator` back to the final dict `dstzset->dict`. However, we can
directly use `dstzset->dict` to avoid the additional copy operation.

This PR removes the `accumulator` dict and directly uses` dstzset->dict
`to store the member-score mappings.

- **Test**
First, I added 300 unique elements to two sorted sets called
'zunion_test1' and 'zunion_test2'. Then, I tested `zunion` and
`zunionstore` on these two sorted sets. The test results shown below
indicate that the performance of both zunion and zunionstore improved
about 31%.

### ZUNION
#### unstable
```
./valkey-benchmark -P 10 -n 100000 zunion 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 2713.41 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      146.252     3.464   153.343   182.015   184.959   192.895
```
#### This PR
```
./valkey-benchmark -P 10 -n 100000 zunion 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 3543.84 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      108.259     2.984   114.239   141.695   145.151   160.255
```
### ZUNIONSTORE
#### unstable
```
./valkey-benchmark -P 10 -n 100000 zunionstore out 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 3168.07 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      157.511     3.368   183.167   189.311   193.535   231.679
```
#### This PR
```
./valkey-benchmark -P 10 -n 100000 zunionstore out 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 4144.73 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      120.374     2.648   141.823   149.119   153.855   183.167
```

---------

Signed-off-by: RayCao <zisong.cw@alibaba-inc.com>
Signed-off-by: zisong.cw <zisong.cw@alibaba-inc.com>
2024-08-13 16:50:57 +08:00
Eran Liberty
6dfb8203cc
Add debug-context config (#874)
A configuration option with zero impact on server operation but is
printed out on server crash and can be accessed by gdb for debugging. It
can be used by the user/operator to store any free-form string. This
string will persist as long as the server is running and will be
accessible in the following ways:

And printed in crash reports:
```
------ CONFIG DEBUG OUTPUT ------
lazyfree-lazy-eviction no
...
io-threads-do-reads yes
debug-context "test2"
proto-max-bulk-len 512mb
```

---------

Signed-off-by: Eran Liberty <eranl@amazon.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
2024-08-12 16:33:23 -07:00
naglera
27fce29500
Fix dual-channel-replication related issues (#837)
- Fix TLS bug where connection were shutdown by primary's main process
while the child process was still writing- causing main process to be
blocked.
- TLS connection fix -file descriptors are set to blocking mode in the
main thread, followed by a blocking write. This sets the file
descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`)
(@xbasel).
- Improve the reliability of dual-channel tests. Modify the pause
mechanism to verify process status directly, rather than relying on log.
- Ensure that `server.repl_offset` and `server.replid` are updated
correctly when dual channel synchronization completes successfully.
Thist led to failures in replication tests that validate replication IDs
or compare replication offsets.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-08-12 13:03:12 -07:00