11934 Commits

Author SHA1 Message Date
Vitah Lin
ae6c6495bf Add Codecov for Automated Code Coverage (#316)
This PR introduces Codecov to automate code coverage tracking for our
project's tests.

For more information about the Codecov platform, please refer to
https://docs.codecov.com/docs/quick-start

---------

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-08 11:35:54 -08:00
Björn Svensson
4b2edc68ca Set permissions for Github Actions in CI (#312)
This sets the default permission for current CI workflows to only be
able to read from the repository (scope: "contents").
When a used Github Action require additional permissions (like CodeQL)
we grant that permission on job-level instead.

This means that a compromised action will not be able to modify the repo
or even steal secrets since all other permission-scopes are implicit set
to "none", i.e. not permitted. This is recommended by
[OpenSSF](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions).

This PR includes a small fix for the possibility of missing server logs
artifacts, found while verifying the permission.
The `upload-artifact@v3` action will replace artifacts which already
exists. Since both CI-jobs `test-external-standalone` and
`test-external-nodebug` uses the same artifact name, when both jobs
fail, we only get logs from the last finished job. This can be avoided
by using unique artifact names.

This PR is part of #211

More about permissions and scope can be found here:

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

---------

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2025-01-08 11:35:54 -08:00
Harkrishn Patro
62b42707ea Add release notes for 7.2.8 2025-01-08 11:35:54 -08:00
Madelyn Olson
e04acb377e Fix LUA garbage collector (CVE-2024-46981) (#1513)
Reset GC state before closing the lua VM to prevent user data to be
wrongly freed while still might be used on destructor callbacks.

Created and publish by Redis in their OSS branch.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-08 11:35:54 -08:00
Madelyn Olson
bc1680d7e6 Fix Read/Write key pattern selector (CVE-2024-51741) (#1514)
The explanation on the original commit was wrong. Key based access must
have a `~` in order to correctly configure whey key prefixes to apply
the selector to. If this is missing, a server assert will be triggered
later.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-08 11:35:54 -08:00
gmbnomis
6101248fb0 Use the correct command proc for the LOOKUP_NOTOUCH exception in lookupKey (#1499)
When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.

When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.

This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.

Fixes #1498

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
muelstefamzn
51fcd6e4ca Trim free space from inline command argument strings to avoid excess memory usage (#1213)
The command argument strings created while parsing inline commands (see
`processInlineBuffer()`) can contain free capacity. Since some commands
,such as `SET`, store these strings in the database, that free capacity
increases the memory usage. In the worst case, it could double the
memory usage.

This only occurs if the inline command format is used. The argument
strings are built by appending character by character in
`sdssplitargs()`. Regular RESP commands are not affected.

This change trims the strings within `processInlineBuffer()`.

this?

When the command argument string is packed into an object,
`trimStringObjectIfNeeded()` is called.

This does only trim the string if it is larger than
`PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would
ever need trimming if the command it sent using the bulk string format.

We could modify this condition, but that would potentially have a
performance impact on commands using the bulk format. Since those make
up for the vast majority of executed commands, limiting this change to
inline commands seems prudent.

* 1 million `SET [key] [value]` commands
* Random keys (16 bytes)
* 600 bytes values

Memory usage without this change:

```
used_memory:1089327888
used_memory_human:1.01G
used_memory_rss:1131696128
used_memory_rss_human:1.05G
used_memory_peak:1089348264
used_memory_peak_human:1.01G
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:1040025088
used_memory_dataset_perc:95.55%
```

Memory usage with this change:
```
used_memory:705327888
used_memory_human:672.65M
used_memory_rss:718802944
used_memory_rss_human:685.50M
used_memory_peak:705348256
used_memory_peak_human:672.67M
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:656025088
used_memory_dataset_perc:93.13%
```

If the same experiment is repeated using the normal RESP array of bulk
string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB
with and without of this change.

If a replica is attached, its memory usage is 672MB with and without
this change, since the replication link never uses inline commands.

Signed-off-by: Stefan Mueller <muelstef@amazon.com>
2025-01-08 11:35:54 -08:00
Binbin
ccc8acec9d Fix FUNCTION KILL error message being displayed as SCRIPT KILL (#1171)
The client that was killed by FUNCTION KILL received a reply of
SCRIPT KILL and the server log also showed SCRIPT KILL.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
Madelyn Olson
357191acb9
Apply security fixes for CVEs (#1113)
Apply the security fixes for the release.

(CVE-2024-31449) Lua library commands may lead to stack overflow and
potential RCE.
(CVE-2024-31227) Potential Denial-of-service due to malformed ACL
selectors.
(CVE-2024-31228) Potential Denial-of-service due to unbounded pattern
matching.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
7.2.7
2024-10-02 13:11:08 -07:00
Melroy van den Berg
44a7c35d31 Build binary releases with systemd support (#1107)
- Add systemd support to the build artifact tarballs, so people can use
it under systemd compatible distros. As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10815549.
Adding `libsystemd-dev` to install and add `USE_SYSTEMD=yes` to the
build.
- Cleanup & bring the arm & x86 workflow files in-sync. It was a bit of
a mess ;) (removing `jq wget awscli` from the 'Tarball' step)

Signed-off-by: Melroy van den Berg <melroy@melroy.org>
2024-10-02 20:11:12 +02:00
Melroy van den Berg
c789c3f1d1 Avoid .c, .d and .o files from being copied to the binary tar.gz releases (#1106)
As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10814006

`cp` can't be used anymore, `rsync` is more powerful and allow to
exclude files.

Alternatively:

1. Remove the c, d and o files. Which isn't ideal either.
2. Improve the build. Eg. by building inside a `build` directory instead
of in the src folder.

Ps. I know these workflows aren't trigger in this PR. Only via "Build
Release Packages" workflow action:
https://github.com/valkey-io/valkey/actions/workflows/build-release-packages.yml..
So I can't fully test in this PR. But it should work ^^

Ps. ps. I did test `rsync -av --exclude='*.c' --exclude='*.d'
--exclude='*.o' src/valkey-*` command in isolation and that works as
expected!

---------

Signed-off-by: Melroy van den Berg <melroy@melroy.org>
2024-10-02 20:11:12 +02:00
Madelyn Olson
ef56d713d5 Prepare for 7.2.7 release without security fixes
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-30 15:25:39 -07:00
Binbin
6d03409e01 Fix module RdbLoad wrongly disable the AOF (#1001)
In RdbLoad, we disable AOF before emptyData and rdbLoad to prevent copy-on-write issues. After rdbLoad completes, AOF should be re-enabled, but the code incorrectly checks server.aof_state, which has been reset to AOF_OFF in stopAppendOnly. This leads to AOF not being re-enabled after being disabled.
---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-30 15:17:13 -07:00
Binbin
45ae39af04 Fix missing replication link re-connection when primary's IP/port is updated in clusterProcessGossipSection (#965)
`clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-30 15:07:33 -07:00
Ted Lyngmo
2be94e68e8 Log the real reason for why posix_fadvise failed (#430)
`reclaimFilePageCache` did not set `errno` but `rdbSaveInternal` which
is logging the error assumed it did. This makes sure `errno` is set.

Signed-off-by: Ted Lyngmo <ted@lyncon.se>
2024-09-30 14:58:31 -07:00
Roshan Khatri
50eefd647d
[Cherry-Pick]Adds workflows to build release binaries and push to S3 (#315) (#857)
[related to](https://github.com/valkey-io/valkey/issues/230)

Adds workflows to build Valkey binaries and push to S3 to make it
available to download from the website

The Workflows can be triggered by pushing a release to the repo and the
other option is manually by one of the Maintainers.

Once the workflow triggers, it will generate a matrix of Jobs for the
platforms we need to build from `utils/releasetools/build-config.json`
and then the respective Jobs are triggered. These jobs make Valkey with
respect to the platform binaries we want to release and would push to a
private S3 bucket.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-09-30 14:21:33 -07:00
Pierre
68d887636e
Fix crash in active defrag (#883)
This crash would happen if we disable active defrag while it is running,
then re-enable active defrag. In this case the `expires_cursor` variable
in `activeDefragCycle()` would not be reset to 0 when disabling active
defrag, so when re-enabling it, this variable can still be non-zero,
which causes the `db` and other variables to not be initialized before
starting the defrag process.

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2024-08-12 13:38:20 -07:00
Ping Xie
579cca5f00
Valkey 7.2.6 Patch Release (#842)
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
7.2.6
2024-07-30 16:40:02 -07:00
Harkrishn Patro
3aaa129ce4 Generate correct slot information in cluster shards command on primary failure (#790)
Fix #784

Prior to the change, `CLUSTER SHARDS` command processing might pick a
failed primary node which won't have the slot coverage information and
the slots `output` in turn would be empty. This change finds an
appropriate node which has the slot coverage information served by a
given shard and correctly displays it as part of `CLUSTER SHARDS`
output.

 Before:

 ```
 1) 1) "slots"
   2)  (empty array)
   3) "nodes"
   4) 1)  1) "id"
          2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
          ...
          9) "role"
         10) "master"
         ...
         13) "health"
         14) "fail"
 ```

 After:

 ```
 1) 1) "slots"
   2) 1) 0
       2) 5461
   3) "nodes"
   4) 1)  1) "id"
          2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
          ...
          9) "role"
         10) "master"
         ...
         13) "health"
         14) "fail"
 ```

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-07-22 23:48:00 -07:00
Yossi Gottlieb
6df023fb98 Reduce FreeBSD daily scope. (#12758)
The full test is very flaky running on a VM inside GitHub worker, so we
have to settle for only building and running a small smoke test.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-15 14:29:28 -07:00
Jonathan Wright
7cb3426a4b Replace centos 7 with alternative versions (#543)
replace centos 7 with almalinux 8, add almalinux 9, centos stream 9, fedora stable, rawhide

Fixes #527

---------

Signed-off-by: Jonathan Wright <jonathan@almalinux.org>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-12 15:01:57 -07:00
Madelyn Olson
5ab3d1b981 Skip tls for xgroup read regression since it doesn't matter
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-11 20:02:30 -07:00
Ping Xie
ad0a24c742 Add missing test helper function
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Binbin
010609e8a0 Make valkey compatible with redis-sentinel to start sentinel (#731)
We already have similar changes to check-rdb / check-aof, apply
this change to sentinel.

Fixes #719.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
KarthikSubbarao
0210b642ce Allow Module authentication to succeed when cluster is down (#693)
Module Authentication using a blocking implementation currently gets
rejected when the "cluster is down" from the client timeout cron job
(`clientsCronHandleTimeout`).

This PR exempts clients blocked on Module Authentication from being
rejected here.

---------

Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Binbin
d746556e4f Only primary with slots has the right to mark a node as failed (#634)
In markNodeAsFailingIfNeeded we will count needed_quorum and failures,
needed_quorum is the half the cluster->size and plus one, and
cluster-size
is the size of primary node which contain slots, but when counting
failures, we dit not check if primary has slots.

Only the primary has slots that has the rights to vote, adding a new
clusterNodeIsVotingPrimary to formalize this concept.

Release notes:

bugfix where nodes not in the quorum group might spuriously mark nodes
as failed

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Sankar
c4244b502e Make cluster meet reliable under link failures (#461)
When there is a link failure while an ongoing MEET request is sent the
sending node stops sending anymore MEET and starts sending PINGs. Since
every node responds to PINGs from unknown nodes with a PONG, the
receiving node never adds the sending node. But the sending node adds
the receiving node when it sees a PONG. This can lead to asymmetry in
cluster membership. This changes makes the sender keep sending MEET
until it sees a PONG, avoiding the asymmetry.

---------

Signed-off-by: Sankar <1890648+srgsanky@users.noreply.github.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
nitaicaro
17b7b8a733 Fix crash where command duration is not reset when client is blocked … (#526)
In #11012, we changed the way command durations were computed to handle
the same command being executed multiple times. In #11970, we added an
assert if the duration is not properly reset, potentially indicating
that a call to report statistics was missed.

I found an edge case where this happens - easily reproduced by blocking
a client on `XGROUPREAD` and migrating the stream's slot. This causes
the engine to process the `XGROUPREAD` command twice:

1. First time, we are blocked on the stream, so we wait for unblock to
come back to it a second time. In most cases, when we come back to
process the command second time after unblock, we process the command
normally, which includes recording the duration and then resetting it.
2. After unblocking we come back to process the command, and this is
where we hit the edge case - at this point, we had already migrated the
slot to another node, so we return a `MOVED` response. But when we do
that, we don’t reset the duration field.

Fix: also reset the duration when returning a `MOVED` response. I think
this is right, because the client should redirect the command to the
right node, which in turn will calculate the execution duration.

Also wrote a test which reproduces this, it fails without the fix and
passes with it.

---------

Signed-off-by: Nitai Caro <caronita@amazon.com>
Co-authored-by: Nitai Caro <caronita@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Viktor Söderqvist
ee6b93c899 Don't let install flags affect build (#382)
Don't let the Make valiable `USE_REDIS_SYMLINKS` affect the build.
If it does, it causes the second line in the example below (`make
install`) to recompile what was already compiled on the line above, and
this time it's built without BUILD_TLS=yes USE_SYSTEMD=yes.

    make BUILD_TLS=yes USE_SYSTEMD=yes
    make PREFIX=custom/usr USE_REDIS_SYMLINKS=no install

Fixes #377

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Yanqi Lv
2342db3927 fix wrong data type conversion in zrangeResultBeginStore (#13148)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
306ce81c10 Fix redis-check-aof incorrectly considering data in manifest format as MP-AOF (#12958)
The check in fileIsManifest misjudged the manifest file. For example,
if resp aof contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```

In #12951, if the preamble aof also contains it, it will also fail.
Fixes #12951.

the bug was happening if the the word "file" is mentioned
in the first 1024 lines of the AOF. and now as soon as it finds
a non-comment line it'll break (if it contains "file" or doesn't)

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Matthew Douglass
d9e20f2964 Fix conversion of numbers in lua args to redis args (#13115)
Since lua_Number is not explicitly an integer or a double, we need to
make an effort
to convert it as an integer when that's possible, since the string could
later be used
in a context that doesn't support scientific notation (e.g. 1e9 instead
of 100000000).

Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
which ever is shorter,
this would break if we try to pass a long integer number to a command
that takes integer.
we'll get an implicit conversion to string in Lua, and then the parsing
in getLongLongFromObjectOrReply will fail.

```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```

Switch to using ll2string if the number can be safely represented as a
long long.

The problem was introduced in #10587 (Redis 7.2).
closes #13113.

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
LiiNen
f91dd1d498 Fix redis-cli --count (for --scan, --bigkeys, etc) was ignored unless --pattern was also used (#13092)
The --count option for redis-cli has been released in redis 7.2.
https://github.com/redis/redis/pull/12042
But I have found in code, that some logic was missing for using this
'count' option.

```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        reply = redisCommand(context,"SCAN %llu",*it);
```

The intention was being able to using scan count.
But in this case, the --count will be only applied when 'pattern' is
declared.
So, I had fix it simply, to be worked properly - even if --pattern
option is not being used.

I tested it simply with time() command several times, and I could see it
works as intended with this commit.
The examples of test results are below:
```
# unstable build

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.287s
user    0m0.011s
sys     0m0.022s

# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m1.117s
user    0m0.011s
sys     0m0.020s

# count is applied with --pattern

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.045s
user    0m0.002s
sys     0m0.002s
```

```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.084s
user    0m0.008s
sys     0m0.024s

# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m0.043s
user    0m0.000s
sys     0m0.004s

# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.031s
user    0m0.002s
sys     0m0.002s
```

Thanks a lot.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
fa1bba8619 Increase tolerance range to block reprocess tests to avoid timing issues (#13053)
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in #13004.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
debing.sun
01473fbd5c Fix crash due to merge of quicklist node introduced by #12955 (#13040)
Fix two crash introducted by #12955

When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring
nodes after inserting, and compress the current node and its siblings.

1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.

   Solution: let `_quicklistMergeNodes()` return the merged nodes.

3. If the current node is a LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.

    Solution: always recompress to 0 after merging.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
debing.sun
b685fadb77 Prevent LSET command from causing quicklist plain node size to exceed 4GB (#12955)
Fix #12864

The main reason for this crash is that when replacing a element of a
quicklist packed node with lpReplace() method,
if the final size is larger than 4GB, lpReplace() will fail and returns
NULL, causing `node->entry` to be incorrectly set to NULL.

Since the inserted data is not a large element, we can't just replace it
like a large element, first quicklistInsertAfter()
and then quicklistDelIndex(), because the current node may be merged and
invalidated in quicklistInsertAfter().

The solution of this PR:
When replacing a node fails (listpack exceeds 4GB), split the current
node, create a new node to put in the middle, and try to merge them.
This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Oran Agra
e4b88bb10f update redis-check-rdb types (#12969)
seems that we forgot to update the array in redis-check rdb.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
900ae7aed6 Fix timeout not being set in module blockClient case (#13011)
This was introduced in #13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
0b7f032673 Fix blocking commands timeout is reset due to re-processing command (#13004)
In #11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.

Affected BLOCK cases:
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes #12998.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
bentotten
5cec6b6b93 When one shard, sole primary node marks potentially failed replica as FAIL instead of PFAIL (#12824)
Fixes issue where a single primary cannot mark a replica as failed in a
single-shard cluster.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
d09dbd5d73 Add announced-endpoints test to all_tests and fix tls related tests (#12927)
The test was introduced in #10745, but we forgot to add it to the
test_helper.tcl, so our CI did not actually run it. This PR adds it
and ensures it passes CI tests.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Mikhail Koviazin
8c2a76f584
module: fix typo in REGISTER_API (#608)
`REGISTER_API` is supposed to register two functions: one starts with
`ValkeyModule_` and the other one with `RedisModule_`. It does so in
`unstable` branch. However there was a copy-paste mistake during
backporting it to 7.2. This caused modules built with `valkeymodule-rs`
to fail since they called `RedisModule_SetModuleAttribs` which caused
valkey to segfault.

This commit fixes the typo.

Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
2024-06-07 00:24:41 -07:00
Madelyn Olson
26388270f1
Release notes for 7.2.5 (#322)
Add GA release notes for 7.2.5. 

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
7.2.5
2024-04-15 21:18:47 -07:00
Ping Xie
9e61851d2a Fixed url links in valkey.conf (#320)
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-15 12:50:27 -07:00
Madelyn Olson
763c373ced
Bumped version to 7.2.5 for valkey and wrote release notes (#305)
Release notes for RC2 (really 7.2.5-rc1). Based on conversation, decided
to have first release on 7.2.5 to make it clear this was a patch release
over Redis. This could be considered a minor release, but we wanted to
clearly signal compatibility with Redis OSS.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
7.2.5-rc1
2024-04-12 12:06:53 -07:00
Madelyn Olson
b5574b4d2e Add links for security issues (#299)
Add an initial security release page. In the fullness of time I would
like to also include our version support here, but until that has been
decided I would like to keep this simple and just include links.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-11 15:58:12 -07:00
Parth
10a634ce35 Fixing a lua debugger bug that prevented use of 'server' for server.call invocations. (#303)
* Tested it on local instance. This was originally part of
https://github.com/valkey-io/valkey/pull/288 but I am pushing this
separately, so that we can easily merge it into the upcoming release.

```
lua debugger> server ping
<redis> ping
<reply> "+PONG"
lua debugger> redis ping
<redis> ping
<reply> "+PONG"
```

* I also searched for lua debugger related unit tests to add coverage
for this but did not find any relevant test to modify. Leaving it at
that for now.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-04-11 15:55:32 -07:00
Madelyn Olson
fd20a9b87f Overwrite 7.2 README file with unstable's version
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-11 13:17:06 -07:00
Roshan Khatri
fc3203557f Revert update of RedisModuleEvent_MasterLinkChange (#289)
ValkeyModuleEvent_MasterLinkChange was updated to use more inclusive
language, but it was done in the compatibility layer as well
(RedisModuleEvent_).

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-04-10 17:35:38 -07:00
Björn Svensson
5449089f9e
Correcting the installed redis symlinks in 7.2.4-rc1 (#282)
This is a PR directly to the 7.2 branch.

The make variable `ENGINE_NAME` (from
38632278fd06fe186f7707e4fa099f666d805547) was lost during branching of
`7.2` and tag `7.2.4-rc1`
resulting in the creation of faulty symlinks:
```
      INSTALL SYMLINK valkey-serverredis -> valkey-server
      INSTALL SYMLINK valkey-cliredis -> valkey-cli
      INSTALL SYMLINK valkey-benchmarkredis -> valkey-benchmark
      INSTALL SYMLINK valkey-check-rdbredis -> valkey-check-rdb
      INSTALL SYMLINK valkey-check-aofredis -> valkey-check-aof
      INSTALL SYMLINK valkey-sentinelredis -> valkey-sentinel
```

By just adding the variable we get a minimal diff compared to unstable.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2024-04-10 16:55:27 -07:00