12919 Commits

Author SHA1 Message Date
Pierre
c9aea6d2d3
Fix memory leak in forgotten node ping ext code path (#1574)
When processing a cluster bus PING extension, there is a memory leak
when adding a new key to the `nodes_black_list` dict. We now make sure
to free the key `sds` if the dict did not take ownership of it.
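
The ownership rule behind this fix can be sketched as follows. This is a minimal illustration and not the Valkey code: `tiny_set`, `tiny_set_add`, and `blacklist_add` are hypothetical names, and a plain `malloc`'d string stands in for the `sds` key.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dict keyed by heap-allocated strings. */
#define TINY_SET_CAP 16
typedef struct {
    char *keys[TINY_SET_CAP];
    size_t count;
} tiny_set;

/* Returns 1 and takes ownership of `key` if it was inserted.
 * Returns 0 and leaves ownership with the caller otherwise. */
int tiny_set_add(tiny_set *s, char *key) {
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->keys[i], key) == 0) return 0; /* already present */
    }
    if (s->count == TINY_SET_CAP) return 0; /* no room left */
    s->keys[s->count++] = key;
    return 1;
}

/* Copies `name`, tries to insert the copy, and frees the copy when the
 * set did not take ownership of it. Returns 1 if the key was inserted. */
int blacklist_add(tiny_set *s, const char *name) {
    size_t len = strlen(name) + 1;
    char *key = malloc(len);
    if (key == NULL) return 0;
    memcpy(key, name, len);
    if (!tiny_set_add(s, key)) {
        free(key); /* without this free, the rejected key would leak */
        return 0;
    }
    return 1;
}
```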

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2025-01-16 15:38:15 -08:00
Harkrishn Patro
87cc3d7a71
Fix cluster info sent stats for message with light header (#1563)
This issue affected only two message types (CLUSTERMSG_TYPE_PUBLISH and CLUSTERMSG_TYPE_PUBLISHSHARD) because they used a light message header, which caused the CLUSTER INFO stats to miss sent/received message information for those types.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-16 11:25:37 -08:00
Ricardo Dias
af71619c45
Extract the scripting engine code from the functions unit (#1312)
This commit creates a new compilation unit for the scripting engine code
by extracting the existing code from the functions unit.
We're doing this refactor to prepare the code for running the `EVAL`
command using different scripting engines.

This PR has a module API change: we changed the type of error messages
returned by the callback
`ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc` to be a
`ValkeyModuleString` (aka `robj`).

This PR also fixes #1470.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-01-16 10:08:16 +01:00
Ray Cao
921ba19acb
Incr expired_keys if the unix-time is already expired for EXPIREAT and other commands(#1517)
Some commands that take a unix-time, such as `EXPIREAT` and `SET EXAT`, should count the deleted keys in the `expired_keys` statistic if the specified time has already passed, and notifications should be sent in the same way as for expired keys.

---------

Signed-off-by: Ray Cao <zisong.cw@alibaba-inc.com>
2025-01-16 16:40:34 +08:00
Binbin
cda9eee8c9
Allow clang-format to be triggered in push events (#1565)
Just like the spell-check workflow, we should allow it to be triggered
by push events, so that forked repos can notice formatting
issues well before submitting a PR.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-16 10:23:03 +08:00
Sarthak Aggarwal
6a8f068e36
Adding Missing filters to CLIENT LIST and Dedup Parsing (#1401)
Adds filter options to CLIENT LIST:

    * USER <username>
      Return clients authenticated by <username>.
    * ADDR <ip:port>
      Return clients connected from the specified address.
    * LADDR <ip:port>
      Return clients connected to the specified local address.
    * SKIPME (YES|NO)
      Exclude the current client from the list (default: no).
    * MAXAGE <maxage>
      Only list connections older than the specified age.

Modifies the ID filter to CLIENT KILL to allow multiple IDs

    * ID <client-id> [<client-id>...]
      Kill connections by client ids.


This makes CLIENT LIST and CLIENT KILL accept the same options.

For backward compatibility, the default value for SKIPME is NO for
CLIENT LIST and YES for CLIENT KILL.

The MAXAGE comes from CLIENT KILL, where it *keeps* clients with the
given max age and kills the older ones. This logic becomes weird for
CLIENT LIST, but is kept for similarity with CLIENT KILL, for the use case
of first testing manually using CLIENT LIST, and then running CLIENT
KILL with the same filters.

The `ID client-id [client-id ...]` no longer needs to be the last
filter. The parsing logic determines if an argument is an ID or not
based on whether it can be parsed as an integer or not.
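
The "is this argument an ID?" decision can be sketched as below. This is an illustration under assumptions, not the actual parser: `parse_client_id` and `count_ids` are hypothetical helpers, and `strtoll` also accepts leading whitespace and a sign, which a real parser might reject.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the value in *id if `arg` is entirely a base-10
 * integer; returns 0 otherwise (for example for a filter keyword). */
int parse_client_id(const char *arg, long long *id) {
    char *end = NULL;
    if (arg == NULL || *arg == '\0') return 0;
    errno = 0;
    long long v = strtoll(arg, &end, 10);
    if (errno == ERANGE || *end != '\0') return 0;
    *id = v;
    return 1;
}

/* Counts how many consecutive arguments, starting at argv[start],
 * parse as IDs. The first non-integer argument ends the ID list. */
int count_ids(int argc, const char **argv, int start) {
    int n = 0;
    long long id;
    while (start + n < argc && parse_client_id(argv[start + n], &id)) n++;
    return n;
}
```

With the arguments `ID 10 11 USER alice`, the ID list ends at `USER` because it does not parse as an integer, so further filters can follow.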

Partly addresses: #668

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2025-01-15 20:44:13 +01:00
zhaozhao.zz
c5a1585547
add paused_actions for INFO Clients (#1519)
Add `paused_actions` and `paused_timeout_milliseconds` to INFO Clients
to inform users whether clients are paused.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2025-01-14 19:01:00 +08:00
Viktor Söderqvist
2a1a65b4c7
Introduce const_sds for const-content sds (#1553)
`sds` is a typedef of `char *`.

`const sds` means `char * const`, i.e. a const-pointer to non-const
content.

More often, you would want `const char *`, i.e. a pointer to
const-content. Until now, it was not possible to express that. This PR
adds `const_sds` which is a pointer to const-content sds.

To get a const-pointer to const-content sds, you can use `const
const_sds`.

In this PR, some uses of `const sds` are replaced by `const_sds`. We can
use it more later.
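
A small sketch of the difference, using only the two typedefs described above. The functions `upcase_first` and `count_char` are hypothetical, and plain C strings stand in for real sds strings.

```c
#include <stddef.h>

typedef char *sds;             /* as stated in this commit message */
typedef const char *const_sds; /* pointer to const content */

/* `const sds s` is `char *const s`: the pointer is const but the
 * content is not, so writing through it compiles. */
void upcase_first(const sds s) {
    if (s[0] >= 'a' && s[0] <= 'z') s[0] = (char)(s[0] - 'a' + 'A');
}

/* `const_sds s` is `const char *s`: writing through `s` would be a
 * compile error, while the pointer itself may still be advanced. */
size_t count_char(const_sds s, char c) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) n++;
    }
    return n;
}
```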

Fixes #1542

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-14 10:38:12 +01:00
Amit Nagler
6be1c77b1e
Fix valgrind test (#1555)
Introduced at https://github.com/valkey-io/valkey/pull/1165/files

Signed-off-by: naglera <anagler123@gmail.com>
2025-01-14 10:49:46 +02:00
secwall
fdc89c56b7
Escape unix socket group in unit tests (#1554)
In some cases unix groups could have whitespace and/or `\` in them.
One example is my workstation. It's a MacOS in an Active Directory
domain. So my user has group `LD\Domain Users`.
Running `make test` on `unstable` and `8.0` branches fails with:

I'm not sure if we need to fix this in 8.0. But it seems that it should
be fixed in unstable.

Signed-off-by: secwall <secwall@yandex-team.ru>
2025-01-13 20:05:04 -08:00
Rain Valentine
d13aad45f4
Replace dict with new hashtable: hash datatype (#1502)
This PR replaces dict with the new hashtable data structure in the HASH
datatype. There is a new struct for hashtable items which contains a
pointer to value sds string and the embedded key sds string. These
values were previously stored in dictEntry. This structure is kept
opaque so we can easily add small value embedding or other optimizations
in the future.

closes #1095

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
2025-01-13 11:17:16 +01:00
Viktor Söderqvist
dc9ca1b98d
Test coverage for ECHO for reply schema validation (#1549)
After #1545 disabled some tests for reply schema validation, we now have
another issue that ECHO is not covered.

```
WARNING! The following commands were not hit at all:
  echo
ERROR! at least one command was not hit by the tests
```

This patch adds a test case for ECHO in the unit/other test suite. I
haven't checked if there are more commands that aren't covered.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-13 10:14:09 +08:00
Viktor Söderqvist
ad592f73d7
Skip CLI tests with reply schema validation (#1545)
The commands used in valkey-cli tests are not important for the reply schema
validation. Skip them to avoid the problem of tests hanging. This has
failed lately in the daily job:

```
[TIMEOUT]: clients state report follows.
sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription
Killing still running Valkey server 33357
```

These test cases use a special valkey-cli command, `:get pubsub`,
which is an internal command to valkey-cli rather than a Valkey server
command. This command hangs when compiled with logreqres enabled.
An easy solution is to skip the tests in this setup.

The test cases were introduced in #1432.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-12 08:02:39 +08:00
Binbin
11cb8ee27c
Add latency stats around cluster config file operations (#1534)
When the cluster changes, we need to persist the cluster configuration,
and these file IO operations may cause latency.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 11:03:10 +08:00
Binbin
10357ceda5
Mark the node as FAIL when the node is marked as NOADDR and broadcast the FAIL (#1191)
Imagine we have a cluster, for example a three-shard cluster. If
shard 1 does a CLUSTER RESET HARD, its node name changes, and the
other nodes then mark it as NOADDR since the node name received
in PONG has changed.

In the eyes of the other nodes, there is one working primary node
left but with no address. In this case, the address reported
in MOVED will be invalid and will confuse the clients. At the
same time, the replica will not fail over since its primary
is not in the FAIL state. And the cluster looks OK to everyone.

This leaves a cluster that appears OK, but with no coverage for
shard 1. Obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.

But the point here is that we can mark the NOADDR node as FAIL to
advance the cluster state. If a node is NOADDR, it does
not have a valid address, so we won't reconnect to it, we won't
send it PING, and we won't gossip about it. It seems reasonable
to mark it as FAIL.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 11:02:05 +08:00
Binbin
211b250aad
Do election in order based on failed primary rank to avoid voting conflicts (#1018)
When multiple primary nodes fail simultaneously, the cluster cannot recover
within the default effective time (data_age limit). The main reason is that
votes are cast without any ranking among the replica nodes, which causes too
many epoch conflicts.

Therefore, we introduce a ranking based on the failed primary shard-id.
A new failed_primary_rank var holds the rank of this instance (myself) in
the context of the list of all failed primaries. This var is used in failover:
the failover election packets are sent in order based on the rank, which
can effectively avoid the voting conflicts.

If a single primary is down, the behavior is the same as before. If multiple
primaries are down, their replica election initiation time will be delayed
by 500ms according to the ranking.
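
As a rough sketch of the delay rule, assuming the delay grows linearly with the rank and that rank 0 means no extra delay (both are assumptions, as is the helper name `election_extra_delay_ms`; only the 500 ms step comes from this message):

```c
/* Assumed step per rank, taken from the 500ms figure in this message. */
#define FAILED_PRIMARY_RANK_DELAY_MS 500

/* Hypothetical helper: extra delay before a replica starts its election,
 * given the rank of its failed primary among all failed primaries.
 * Rank 0 (e.g. a single failed primary) adds no delay. */
long long election_extra_delay_ms(int failed_primary_rank) {
    if (failed_primary_rank < 0) failed_primary_rank = 0;
    return (long long)failed_primary_rank * FAILED_PRIMARY_RANK_DELAY_MS;
}
```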

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 10:43:18 +08:00
Binbin
d6bdd9e7d7
Fix module LatencyAddSample still work when latency-monitor-threshold is 0 (#1541)
When latency-monitor-threshold is set to 0, it means the latency monitor
is disabled, and in VM_LatencyAddSample, we wrote the condition
incorrectly, causing us to record latency when latency was turned off.
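
The intended condition can be sketched as below. `should_record_latency` is a hypothetical helper, not the module API itself. It only illustrates that a threshold of 0 must disable recording.

```c
/* Hypothetical helper: returns 1 if a latency sample should be recorded.
 * A threshold of 0 means the latency monitor is disabled. */
int should_record_latency(long long threshold_ms, long long latency_ms) {
    return threshold_ms > 0 && latency_ms >= threshold_ms;
}
```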

This bug was introduced on the very first day, see e3b1d6d, which was
merged in 2019.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 10:32:58 +08:00
Binbin
e60990e579
Fix crash when freeing newly created node when nodeIp2String fail (#1535)
In #1441, we found an assert, and decided to remove this assert and instead
just free the newly created node and close the link, since if we cannot
get the IP from the link it probably means the connection was closed.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED ===
17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true

------ STACK TRACE ------

17847 valkey-server *
src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634]
src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de]
/__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e]
src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea]
src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547]
/lib64/libc.so.6(+0x40c8) [0x7f083985a0c8]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b]
src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5]
```

But it also introduces another assert. The reason is that this new node
is not added to the cluster nodes dict.
```
17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED ===
17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true

------ STACK TRACE ------

17128 valkey-server *
src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4]
src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2]
src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618]
/__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278]
src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09]
src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5]
src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e]
```

This closes #1527.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-10 10:19:04 +08:00
Harkrishn Patro
c338de3d46
Update upload artifacts to v4 (#1539)
Fixes #1538

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2025-01-09 17:19:36 -08:00
Madelyn Olson
d99457c09c
Free the passed in lua context instead of the global (#1536)
The fix that Redis gave us for the CVE-2024-46981 was freeing lctx.lua,
and I didn't merge it correctly. We made some changes so that we are
able to async free the lua context, so we need to free the passed in
context. This was applied correctly on the two released versions (8.0
and 7.2) just not on unstable.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-09 14:35:48 +08:00
Binbin
b207b421bc
Fix new cli subscribed mode test in cluster mode (#1533)
We need to add a hash tag in cluster mode.
Fixes #1531.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-09 12:21:31 +08:00
Karthick Ariyaratnam
80c35402bc
Remove legacy SERVER_TEST compiler flag from cmake. (#1530)
This PR is to cleanup the `SERVER_TEST` compiler flag from cmake compile
definitions, as it is no longer required in the new unit test framework, see #428.

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2025-01-09 11:52:45 +08:00
Nadav Gigi
9f4815a224
Accelerate hash table iterator with prefetching (#1501)
This PR introduces improvements to the hashtable iterator, implementing
the prefetching technique described in the blog post [Unlock One Million RPS
- Part 2](https://valkey.io/blog/unlock-one-million-rps-part2/). The
changes lay the groundwork for further enhancements in use cases
involving iterators. Future PRs will build upon this foundation to
improve performance and functionality in various iterator-dependent
operations.

In the pursuit of maximizing iterator performance, I conducted a
comprehensive series of experiments. My tests encompassed a wide range
of approaches, including processing multiple bucket indices in parallel,
prefetching the next bucket upon completion of the current one, and
several other timing and quantity variations. Surprisingly, after
rigorous testing and performance analysis, the simplest implementation
presented in this PR consistently outperformed all other more complex
strategies.

## Implementation

Each time we start iterating over a bucket, we prefetch data for future
iterations:

- We prefetch the entries of the next bucket (if it exists).
- We prefetch the structure (but not the entries) of the bucket after
  the next.

This prefetching is done when we pick up a new bucket, increasing the
chance that the data will be in cache by the time we need it.
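
The scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not the hashtable implementation: the `bucket` layout and `sum_with_prefetch` are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <stddef.h>

#define ENTRIES_PER_BUCKET 4

/* Hypothetical bucket layout: entry pointers, NULL when a slot is empty. */
typedef struct {
    const long *entries[ENTRIES_PER_BUCKET];
} bucket;

/* Sums all entry values while prefetching data for future iterations. */
long sum_with_prefetch(const bucket *buckets, size_t nbuckets) {
    long total = 0;
    for (size_t i = 0; i < nbuckets; i++) {
        /* On picking up bucket i, prefetch the entries of bucket i+1. */
        if (i + 1 < nbuckets) {
            for (int j = 0; j < ENTRIES_PER_BUCKET; j++) {
                if (buckets[i + 1].entries[j]) {
                    __builtin_prefetch(buckets[i + 1].entries[j]);
                }
            }
        }
        /* Prefetch only the structure of bucket i+2, not its entries. */
        if (i + 2 < nbuckets) __builtin_prefetch(&buckets[i + 2]);

        /* Process the current bucket. */
        for (int j = 0; j < ENTRIES_PER_BUCKET; j++) {
            if (buckets[i].entries[j]) total += *buckets[i].entries[j];
        }
    }
    return total;
}
```

Prefetching is only a hint, so the result is the same with or without it. Only the time spent waiting on memory changes.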

## Performance

The data below was collected by running the KEYS command on a 64-core
Graviton 3 Amazon EC2 instance with 50 million keys of 100 bytes each.
The duration of the “keys *” command was taken from the “info all”
command.

```
+--------------------+------------------+-----------------------------+
| prefetching        | Time (seconds)   | Keys Processed per Second   |
+--------------------+------------------+-----------------------------+
| No                 | 11.112279        | 4,499,529                   |
| Yes                | 3.141916         | 15,913,862                  |
+--------------------+------------------+-----------------------------+
Improvement:
Comparing the iterator without prefetching to the one with prefetching, 
we can see a speed improvement of 11.112279 / 3.141916 ≈ 3.54 times faster.
```


### Save command improvement

#### Setup:
- 64-core Graviton 3 Amazon EC2 instance.
- 50 million keys of 100 bytes each.
- Running valkey server over RAM file system.
- CRC checksum and compression off.

#### Results

```
+--------------------+------------------+-----------------------------+
| prefetching        | Time (seconds)   | Keys Processed per Second   |
+--------------------+------------------+-----------------------------+
| No                 | 28               | 1,785,700                   |
| Yes                | 19.6             | 2,550,000                   |
+--------------------+------------------+-----------------------------+
Improvement:
- Reduced SAVE time by 30% (8.4 seconds faster)
- Increased key processing rate by 42.8% (764,300 more keys/second)
```

Signed-off-by: NadavGigi <nadavgigi102@gmail.com>
2025-01-08 23:18:55 +01:00
Viktor Szépe
418f1d059f
Improve Typos configuration (#1456)
- remove old ignores
- fix a "new" typo 🎁

Signed-off-by: Viktor Szépe <viktor@szepe.net>
2025-01-08 22:39:45 +01:00
Nikhil Manglore
9e0204941d
valkey-cli auto-exit from subscribed mode (#1432)
Resolves issue with valkey-cli not auto exiting from subscribed mode on
reaching zero pub/sub subscription (previously filed on Redis)
https://github.com/redis/redis/issues/12592

---------

Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
2025-01-08 21:03:06 +01:00
Rueian
0a89571dcc
Skip logreqres on tests for the HELLO command (#1528)
Skip logreqres on tests for the HELLO command

Signed-off-by: Rueian <rueiancsie@gmail.com>
2025-01-08 10:05:20 -08:00
Rain Valentine
ab627d6721
Replace dict with new hashtable: sorted set datatype (#1427)
This PR replaces dict with hashtable in the ZSET datatype. Instead of
mapping key to score as dict did, the hashtable maps key to a node in
the skiplist, which contains the score. This takes advantage of
hashtable performance improvements and saves 15 bytes per set item - 24
bytes overhead before, 9 bytes after.

Closes #1096

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-08 18:34:02 +01:00
Lipeng Zhu
8af35a1712
Add build folder to gitignore. (#1488)
Default cmake build folder in vscode is `"cmake.buildDirectory": "${workspaceFolder}/build"`.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2025-01-08 19:33:02 +08:00
uriyage
6c09eea2bc
client struct: lazy init components and optimize struct layout (#1405)
# Refactor client structure to use modular data components

## Current State
The client structure allocates memory for replication / pubsub /
multi-keys / module / blocked data for every client, despite these
features being used by only a small subset of clients. In addition the
current field layout in the client struct is suboptimal, with poor
alignment and unnecessary padding between fields, leading to a larger
than necessary memory footprint of 896 bytes per client. Furthermore,
fields that are frequently accessed together during operations are
scattered throughout the struct, resulting in poor cache locality.

## This PR's Change

1.  Lazy Initialization 
- **Components are only allocated when first used:**
  - PubSubData: Created on first SUBSCRIBE/PUBLISH operation
  - ReplicationData: Initialized only for replica connections
  - ModuleData: Allocated when module interaction begins
  - BlockingState: Created when first blocking command is issued
  - MultiState: Initialized on MULTI command

2. Memory Layout Optimization:
   - Grouped related fields for better locality
   - Moved rarely accessed fields (e.g., client->name) to struct end
   - Optimized field alignment to eliminate padding

3. Additional changes:
   - Moved watched_keys to be statically allocated in the `mstate` struct
   - Relocated replication init logic to replication.c
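
The lazy initialization in item 1 can be sketched as below. The types are hypothetical, much-reduced stand-ins (not the real client struct), and `client_pubsub` and `client_release` are illustrative names.

```c
#include <stdlib.h>

/* Hypothetical reduced component: allocated only for pub/sub clients. */
typedef struct {
    int channel_count;
} pubsub_data;

typedef struct {
    int id;
    pubsub_data *pubsub; /* NULL until the client first uses pub/sub */
} client;

/* Returns the pub/sub component, allocating it zero-initialized on
 * first use. Returns NULL if the allocation fails. */
pubsub_data *client_pubsub(client *c) {
    if (c->pubsub == NULL) c->pubsub = calloc(1, sizeof(*c->pubsub));
    return c->pubsub;
}

/* Frees any lazily created components. free(NULL) is a no-op, so
 * clients that never used pub/sub cost nothing here. */
void client_release(client *c) {
    free(c->pubsub);
    c->pubsub = NULL;
}
```

A client that never subscribes only pays for one pointer instead of the whole component.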
  

### Key Benefits
- **Efficient Memory Usage:**
- 45% smaller base client structure - Basic clients now use 528 bytes
(down from 896).
- Better memory locality for related operations
- Performance improvement in high throughput scenarios. No performance
regressions in other cases.


### Performance Impact

Tested with 650 clients and 512 bytes values.

#### Single Thread Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 261,799      | 258,261      | +1.37%    |
| SET        | 3M keys | 209,134      | ~209,000     | ~0%       |
| GET        | 1 key   | 281,564      | 277,965      | +1.29%    |
| GET        | 3M keys | 231,158      | 228,410      | +1.20%    |

#### 8 IO Threads Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 1,331,578    | 1,331,626    | -0.00%    |
| SET        | 3M keys | 1,254,441    | 1,152,645    | +8.83%    |
| GET        | 1 key   | 1,293,149    | 1,289,503    | +0.28%    |
| GET        | 3M keys | 1,152,898    | 1,101,791    | +4.64%    |

#### Pipeline Performance (3M keys)
| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|--------------|---------------|---------------|-----------|
| SET       | 10          | 548,964      | 538,498      | +1.94%    |
| SET       | 20          | 606,148      | 594,872      | +1.89%    |
| SET       | 30          | 631,122      | 616,606      | +2.35%    |
| GET       | 10          | 628,482      | 624,166      | +0.69%    |
| GET       | 20          | 687,371      | 681,659      | +0.84%    |
| GET       | 30          | 725,855      | 721,102      | +0.66%    |

### Observations:
1. Single-threaded operations show consistent improvements (1-1.4%)
2. Multi-threaded performance shows significant gains for large
datasets:
   - SET with 3M keys: +8.83% improvement
   - GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
   - SET operations: +1.89% to +2.35%
   - GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario


Related issue: https://github.com/valkey-io/valkey/issues/761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-08 10:28:54 +02:00
Rueian
dc4628d444
Add availability_zone to the HELLO command history (#1524)
This PR is a followup for #1487.

Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-08 10:04:58 +08:00
Madelyn Olson
d3acd90320
Actually run code coverage on ubuntu 22 (#1522)
This commit, https://github.com/valkey-io/valkey/pull/1504, moved the
wrong worker to ubuntu 22. We wanted to move codecov and not coverity.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-07 15:43:46 -08:00
Rueian
3b52186b6a
Add availability_zone to the HELLO response (#1487)
It's inconvenient for client implementations to extract the
`availability_zone` information from the `INFO` response. The `INFO`
response contains a lot of information that a client implementation
typically doesn't need.

This PR adds the availability zone to the `HELLO` response. Clients
usually already use the `HELLO` command for protocol negotiation and
also get the server `version` and `role` from its response. To keep the
`HELLO` response small, the field is only added if availability zone is
configured.

---------

Signed-off-by: Rueian <rueiancsie@gmail.com>
2025-01-07 22:54:55 +01:00
Madelyn Olson
e1db553834
Add tests for acl selectors with no permissions or patterns (#1515)
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-06 15:46:55 -08:00
Madelyn Olson
4ffd3ebdeb
Fix LUA garbage collector (CVE-2024-46981) (#1513)
Reset GC state before closing the lua VM to prevent user data from being
wrongly freed while it might still be used in destructor callbacks.

Created and published by Redis in their OSS branch.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-06 14:02:22 -08:00
Madelyn Olson
7977c55ac9
Fix Read/Write key pattern selector (CVE-2024-51741) (#1514)
The explanation on the original commit was wrong. Key based access must
have a `~` in order to correctly configure which key prefixes to apply
the selector to. If this is missing, a server assert will be triggered
later.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-06 14:02:16 -08:00
Binbin
c0014ef15e
Check whether to switch to fail when setting the node to pfail in cron (#1061)
This may speed up the transition to the fail state a bit.
Previously we would only check when we received a pfail/fail
report from others in gossip. If myself is the last vote,
we can directly switch to fail in here without waiting for
the next gossip packet.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-06 09:26:17 +08:00
Binbin
33b824137e
Explicitly check C_ERR condition to improve readability in clusterSaveConfig (#1505)
It's not obvious to see it at first, modify it to use C_ERR.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-04 10:47:32 +08:00
eifrah-aws
b3b4bdcda4
CMake: fail on warnings (#1503)
When building with `CMake` (especially the targets `valkey-cli`,
`valkey-server` and `valkey-benchmark`) it is possible to have a
successful build while having warnings.

This PR fixes this - which is aligned with how the `Makefile` is working
today:
- Enable `-Wall` + `-Werror` for valkey targets
- Fixed warning in valkey-cli:jsonStringOutput method

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
2025-01-03 09:44:41 +08:00
Madelyn Olson
fe72c784b7
Move coverity back to ubuntu 22 until test failures are fixed (#1504)
The issues in #1453 seem to
have only shown up since we moved to ubuntu 24, as part of the rolling
`ubuntu-latest` migration from 22->24.

Closes #1453.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-03 09:43:16 +08:00
gmbnomis
26a72fa89c
Use the correct command proc for the LOOKUP_NOTOUCH exception in lookupKey (#1499)
When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.

When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.
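
The selection rule can be sketched as below. The structs are hypothetical reductions that keep only the two fields named above, and `current_command_name` and `notouch_exception_applies` are illustrative helpers, not functions from the source tree.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *cmd_name; /* name of the command this client is running */
} client;

/* Hypothetical reduced server state. */
typedef struct {
    client *current_client;   /* e.g. the client that issued EVAL */
    client *executing_client; /* e.g. the script's TOUCH call; may be NULL */
} server_state;

/* Prefer executing_client when it is defined, else use current_client. */
const char *current_command_name(const server_state *s) {
    const client *c = s->executing_client ? s->executing_client : s->current_client;
    return c ? c->cmd_name : NULL;
}

/* Returns 1 if the access time must be updated despite the no-touch
 * flag, i.e. when the currently executing command is TOUCH. */
int notouch_exception_applies(const server_state *s) {
    const char *name = current_command_name(s);
    return name != NULL && strcmp(name, "touch") == 0;
}
```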

This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.

Fixes #1498

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-03 09:41:15 +08:00
Wen Hui
93b701d8d4
Update Redis legacy keyword and link in utils/whatisdoing.sh (#1495)
Signed-off-by: hwware <wen.hui.ware@gmail.com>
2025-01-03 09:37:55 +08:00
Ricardo Dias
8d764f27b3
Refactor: move all valkey modules related declarations to module.h (#1489)
In this commit we move all structures and functions declarations related
to Valkey modules from `server.h` to the recently added `module.h` file.

This re-organization makes it easier for new contributors to find the
valkey modules related code, as well as reducing the compilation times
when changes are made to the modules code.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-01-02 18:35:10 +01:00
Wen Hui
ede4adde7a
Remove releasetools folder (#1496)
The release tool in utils/releasetools/ no longer works in Valkey, so
this PR removes it.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2025-01-02 10:12:09 -05:00
uriyage
35abb68b79
Offload reading the replication stream to IO threads (#1449)
Support Primary client IO offload.

Related issue: https://github.com/valkey-io/valkey/issues/761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2025-01-02 10:42:39 +01:00
uriyage
ae70c5459b
replication: fix io-threads possible race by moving waitForClientIO (#1422)
### Fix race with pending writes in replica state transition

#### The Problem
In #60 (Dual channel replication) a new `connWrite` call was added
before the `waitForClientIO` check. This created a race condition where
the main thread may attempt to write to a client that could have pending
writes in IO threads.

#### The Fix
Moved the `waitForClientIO()` call earlier in `syncCommand`, before any
`connWrite` call. This ensures all pending IO operations are completed
before attempting to write to the client.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2025-01-02 10:01:55 +02:00
Amit Nagler
8aff235721
Fix unreliable dual channel Valgrind tests (#1500)
Used same approach as PR #1165 to solve random failures.

Resolves #1491

Signed-off-by: naglera <anagler123@gmail.com>
2025-01-02 10:00:29 +08:00
ranshid
0f273bb648
Align rejected unblocked commands to update the correct error statistic (#577)
Currently, in case a blocked command is unblocked externally (e.g. due to
the relevant slot being migrated or the CLIENT UNBLOCK command being
issued), the command statistics will always update the failed_calls error
statistic. This leads to misalignment with
90b9f08e5d
as well as some inconsistencies. For example when a key is migrated
during cluster slot migration, clients blocked on XREADGROUP will be
unblocked and update the rejected_calls stat, while clients blocked on
BLPOP will get unblocked updating the failed_calls stat.

In this PR we add an explicit indication in updateStatsOnUnblock that
indicates whether the command was rejected or failed.

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2025-01-01 16:33:09 +02:00
zhenwei pi
a136ad9a50
Make global configs as static (#1159)
Don't expose the static configs symbol, and make configEnumGetValue a
static function.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-12-30 15:58:06 -05:00
Pierre
e4179f1f3b
Only (re-)send MEET packet once every handshake timeout period (#1441)
Add `meet_sent` field in `clusterNode` indicating the last time we sent
a MEET packet. Use this field to only (re-)send a MEET packet once every
handshake timeout period when detecting a node without an inbound link.
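
The throttling rule can be sketched as below. `cluster_node` is a hypothetical reduction that keeps only the `meet_sent` field, `maybe_send_meet` is an illustrative helper, and treating 0 as "never sent" is an assumption.

```c
/* Hypothetical reduced node: only the field added by this commit. */
typedef struct {
    long long meet_sent; /* time in ms of the last MEET sent, 0 if never */
} cluster_node;

/* Returns 1 and records `now_ms` if a MEET may be (re-)sent, i.e. none
 * was sent within the last handshake timeout period. Returns 0 otherwise. */
int maybe_send_meet(cluster_node *n, long long now_ms, long long handshake_timeout_ms) {
    if (n->meet_sent != 0 && now_ms - n->meet_sent <= handshake_timeout_ms) {
        return 0; /* a MEET was sent recently; wait for the timeout */
    }
    n->meet_sent = now_ms;
    return 1;
}
```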

When receiving multiple MEET packets on the same link while the node is
in handshake state, instead of dropping the packet, we now simply
prevent the creation of a new node. This way we still process the MEET
packet's gossip and reply with a PONG as any other packets.

Improve some logging messages to include `human_nodename`. Add
`nodeExceedsHandshakeTimeout()` function.

This is a follow-up to this previous PR:
https://github.com/valkey-io/valkey/pull/1307
And a partial fix to the crash described in:
https://github.com/valkey-io/valkey/pull/1436

---------

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2024-12-30 15:56:39 -05:00
Madelyn Olson
e470735d91
Immediately restart the defrag cycle if we still need to defrag (#1492)
2024-12-29 08:22:49 -08:00