11946 Commits

Author SHA1 Message Date
Roshan Khatri
fe2ef2616c Workflow changes to fix old release binaries (#1461)
- Moves `build-config.json` to the workflow dir to build old versions with
new configs.
- Enables contributors to test the release workflow on a private repo by
adding `github.event_name == 'workflow_dispatch' ||`

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2025-01-08 11:35:54 -08:00
Binbin
14fb6d3487 Fix wrong file name in build-release-packages.yml (#1437)
Introduced in #1363, the file name does not match.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
Roshan Khatri
8b17e6a3d9 Fix the secret for the test bucket. (#1447)
We have set the secret as `AWS_S3_TEST_BUCKET` for the test bucket, and I
missed it in the initial review.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2025-01-08 11:35:54 -08:00
Vu Diep
8786fcb762 Use configure-aws-credentials workflow instead of passing secret_access_key (#1363)
This PR fixes #1346: we get rid of long-term credentials by using
OpenID Connect. OpenID Connect (OIDC) allows your GitHub Actions
workflows to access resources in Amazon Web Services (AWS) without
needing to store the AWS credentials as long-lived GitHub secrets.

---------

Signed-off-by: vudiep411 <vdiep@amazon.com>
2025-01-08 11:35:54 -08:00
Binbin
d6cd90bc8e Skip build-release-packages CI job in forks (#1438)
The CI job was introduced in #1363, we should skip it in forks.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
Binbin
e59525f037 Skip IPv6 tests when TCLSH version is < 8.6 (#910)
In #786 we skipped it in the daily job, but not for the others.
When running ./runtest on MacOS, we get the following failure:
```
couldn't open socket: host is unreachable (nodename nor servname provided, or not known)
```

The reason is that TCL 8.5 doesn't support IPv6, so we skip tests
tagged with ipv6. This also reverts #786.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
Yury-Fridlyand
9fbd8ea344 Fix CI concurrency (#849)
A few CI improvements which will reduce CI queue occupation and eliminate
stale runs.

1. Kill CI jobs on PRs once the PR branch gets a new push. This will prevent
the situation that happened today - a huge job was triggered twice in less
than an hour and occupied the whole **org** runner queue (for all
repositories) for the rest of the day (see pic). This completely blocked the
valkey-glide team.
2. Spread the nightly cron jobs over time to prevent them from running
together. Keep in mind that cron's TZ is UTC, so midnight tasks hit
developers located in other timezones.

This must be backported to all release branches (`valkey-x.y` and `x.y`)

![image](https://github.com/user-attachments/assets/923d8237-3cb7-42f5-80c8-5322b3f5187d)

---------

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
2025-01-08 11:35:54 -08:00
Viktor Söderqvist
9308ed4ecb Skip IPv6 tests on MacOS (daily) (#786)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-08 11:35:54 -08:00
Jonathan Wright
763b6f28ca Replace centos 7 with alternative versions (#543)
Replace CentOS 7 with AlmaLinux 8; add AlmaLinux 9, CentOS Stream 9, Fedora stable, and Fedora Rawhide.

Fixes #527

---------

Signed-off-by: Jonathan Wright <jonathan@almalinux.org>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-08 11:35:54 -08:00
Siddhartha Sankar Mondal
f5d106a90d Deprecate MacOS 11 build target (#524)
Deprecate MacOS 11 build target. End of life June 2024.  Fixes #523

---------

Signed-off-by: Siddhartha Mondal <siddharthmondal@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Roshan Khatri <117414976+roshkhatri@users.noreply.github.com>
2025-01-08 11:35:54 -08:00
Madelyn Olson
3e0c587c08 Automatically notify the slack channel when tests fail (#509)
Adds a job that will automatically run at the end of the daily, which
will collect all the failed tests and send them to the developer slack.
It will include a link to the job as well.

Example job that ran on my private repo:
https://github.com/madolson/valkey/actions/runs/9123245899/job/25085418567

Example notification:
<img width="662" alt="image"
src="https://github.com/valkey-io/valkey/assets/34459052/69127db4-e416-4321-bc06-eefcecab1130">
(Note: I removed the sassy text at the bottom from the PR)

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-08 11:35:54 -08:00
Björn Svensson
d241d77f6c Pin versions of Github Actions in CI (#221)
Pin the GitHub Actions dependencies to a hash, following the secure
software development best practices recommended by the Open Source
Security Foundation (OpenSSF).

When developing a CI workflow, it's common to version-pin dependencies
(e.g. actions/checkout@v4). However, version tags are mutable, so a
malicious attacker could overwrite a version tag to point to a malicious
or vulnerable commit instead.
Pinning workflow dependencies by hash ensures the dependency is
immutable and its behavior is guaranteed.
See
https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies

Dependabot supports updating a hash and the version comment, so its
updates will continue to work as before.

Links to the used actions and their tag/hash for review/validation:
https://github.com/actions/checkout/tags    (v4.1.2 was rolled back)
https://github.com/github/codeql-action/tags
https://github.com/maxim-lobanov/setup-xcode/tags
https://github.com/cross-platform-actions/action/releases/tag/v0.22.0
https://github.com/py-actions/py-dependency-install/tags
https://github.com/actions/upload-artifact/tags
https://github.com/actions/setup-node/tags
https://github.com/taiki-e/install-action/releases/tag/v2.32.2

This PR is part of #211.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2025-01-08 11:35:54 -08:00
Vitah Lin
ae6c6495bf Add Codecov for Automated Code Coverage (#316)
This PR introduces Codecov to automate code coverage tracking for our
project's tests.

For more information about the Codecov platform, please refer to
https://docs.codecov.com/docs/quick-start

---------

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-08 11:35:54 -08:00
Björn Svensson
4b2edc68ca Set permissions for Github Actions in CI (#312)
This sets the default permission for the current CI workflows to only be
able to read from the repository (scope: "contents").
When a GitHub Action requires additional permissions (like CodeQL does),
we grant that permission at job level instead.

This means that a compromised action will not be able to modify the repo
or even steal secrets, since all other permission scopes are implicitly
set to "none", i.e. not permitted. This is recommended by
[OpenSSF](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions).

This PR includes a small fix for the possibility of missing server-log
artifacts, found while verifying the permissions.
The `upload-artifact@v3` action will replace artifacts which already
exist. Since both CI jobs `test-external-standalone` and
`test-external-nodebug` use the same artifact name, when both jobs
fail, we only get logs from the last finished job. This can be avoided
by using unique artifact names.

This PR is part of #211

More about permissions and scope can be found here:

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

---------

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2025-01-08 11:35:54 -08:00
Harkrishn Patro
62b42707ea Add release notes for 7.2.8
2025-01-08 11:35:54 -08:00
Madelyn Olson
e04acb377e Fix LUA garbage collector (CVE-2024-46981) (#1513)
Reset the GC state before closing the Lua VM to prevent user data from
being wrongly freed while it might still be used in destructor callbacks.

Created and published by Redis in their OSS branch.
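The shape of the fix, as a minimal sketch against the Lua 5.1 C API (the real change lives in the server's scripting engine, so take this as illustrative only):

```c
#include <lua.h>

/* Make sure the collector is back in a normal running state before
 * closing the VM, so __gc destructors that run during lua_close() don't
 * observe a GC state in which their data was already freed. */
static void close_lua_vm(lua_State *lua) {
    lua_gc(lua, LUA_GCRESTART, 0); /* reset/resume the collector */
    lua_close(lua);
}
```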

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-08 11:35:54 -08:00
Madelyn Olson
bc1680d7e6 Fix Read/Write key pattern selector (CVE-2024-51741) (#1514)
The explanation on the original commit was wrong. Key-based access must
have a `~` in order to correctly configure which key prefixes to apply
the selector to. If this is missing, a server assert will be triggered
later.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-08 11:35:54 -08:00
gmbnomis
6101248fb0 Use the correct command proc for the LOOKUP_NOTOUCH exception in lookupKey (#1499)
When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.

When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.
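A minimal sketch of that selection (`server.executing_client`, `server.current_client` and `LOOKUP_NOTOUCH` come from the text above; the surrounding check is illustrative):

```c
/* Prefer the innermost executing command (e.g. TOUCH inside EVAL) over
 * the outer client's command when honoring LOOKUP_NOTOUCH. */
client *c = server.executing_client ? server.executing_client
                                    : server.current_client;
if (c && c->cmd && c->cmd->proc == touchCommand) {
    /* TOUCH must always update the key's last access time. */
}
```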

This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.

Fixes #1498

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
muelstefamzn
51fcd6e4ca Trim free space from inline command argument strings to avoid excess memory usage (#1213)
The command argument strings created while parsing inline commands (see
`processInlineBuffer()`) can contain free capacity. Since some commands,
such as `SET`, store these strings in the database, that free capacity
increases the memory usage. In the worst case, it could double the
memory usage.

This only occurs if the inline command format is used. The argument
strings are built by appending character by character in
`sdssplitargs()`. Regular RESP commands are not affected.

This change trims the strings within `processInlineBuffer()`.
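A minimal sketch of the idea, assuming the one-argument form of `sdsRemoveFreeSpace()` from the bundled sds library (the actual patch may differ):

```c
/* After sdssplitargs() builds the inline argv, trim each sds string so a
 * value stored in the database doesn't drag the allocator's spare
 * capacity along with it. */
int argc;
sds *argv = sdssplitargs(line, &argc);
for (int i = 0; i < argc; i++)
    argv[i] = sdsRemoveFreeSpace(argv[i]);
```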

When the command argument string is packed into an object,
`trimStringObjectIfNeeded()` is called, but it only trims the string if
it is larger than `PROTO_MBULK_BIG_ARG` (32kB), as only strings larger
than this would ever need trimming if the command is sent using the bulk
string format.

We could modify this condition, but that would potentially have a
performance impact on commands using the bulk format. Since those make
up the vast majority of executed commands, limiting this change to
inline commands seems prudent.

To measure the impact, the following experiment was run:

* 1 million `SET [key] [value]` commands
* Random keys (16 bytes)
* 600-byte values

Memory usage without this change:

```
used_memory:1089327888
used_memory_human:1.01G
used_memory_rss:1131696128
used_memory_rss_human:1.05G
used_memory_peak:1089348264
used_memory_peak_human:1.01G
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:1040025088
used_memory_dataset_perc:95.55%
```

Memory usage with this change:
```
used_memory:705327888
used_memory_human:672.65M
used_memory_rss:718802944
used_memory_rss_human:685.50M
used_memory_peak:705348256
used_memory_peak_human:672.67M
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:656025088
used_memory_dataset_perc:93.13%
```

If the same experiment is repeated using the normal RESP array-of-bulk-strings
format (`*3\r\n$3\r\nSET\r\n...`), then the memory usage is 672MB
both with and without this change.

If a replica is attached, its memory usage is 672MB with and without
this change, since the replication link never uses inline commands.

Signed-off-by: Stefan Mueller <muelstef@amazon.com>
2025-01-08 11:35:54 -08:00
Binbin
ccc8acec9d Fix FUNCTION KILL error message being displayed as SCRIPT KILL (#1171)
The client that was killed by FUNCTION KILL received a reply of
SCRIPT KILL and the server log also showed SCRIPT KILL.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-08 11:35:54 -08:00
Madelyn Olson
357191acb9 Apply security fixes for CVEs (#1113)
Apply the security fixes for the release.

(CVE-2024-31449) Lua library commands may lead to stack overflow and
potential RCE.
(CVE-2024-31227) Potential Denial-of-service due to malformed ACL
selectors.
(CVE-2024-31228) Potential Denial-of-service due to unbounded pattern
matching.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
7.2.7
2024-10-02 13:11:08 -07:00
Melroy van den Berg
44a7c35d31 Build binary releases with systemd support (#1107)
- Add systemd support to the build artifact tarballs, so people can use
them under systemd-compatible distros. As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10815549.
Add `libsystemd-dev` to the install step and `USE_SYSTEMD=yes` to the
build.
- Cleanup & bring the arm & x86 workflow files in sync. It was a bit of
a mess ;) (removing `jq wget awscli` from the 'Tarball' step)

Signed-off-by: Melroy van den Berg <melroy@melroy.org>
2024-10-02 20:11:12 +02:00
Melroy van den Berg
c789c3f1d1 Avoid .c, .d and .o files from being copied to the binary tar.gz releases (#1106)
As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10814006

`cp` can't be used anymore; `rsync` is more powerful and allows
excluding files.

Alternatively:

1. Remove the .c, .d and .o files, which isn't ideal either.
2. Improve the build, e.g. by building inside a `build` directory instead
of in the src folder.

P.S. I know these workflows aren't triggered by this PR, only via the
"Build Release Packages" workflow action:
https://github.com/valkey-io/valkey/actions/workflows/build-release-packages.yml
So I can't fully test them in this PR. But it should work ^^

P.S. P.S. I did test the `rsync -av --exclude='*.c' --exclude='*.d'
--exclude='*.o' src/valkey-*` command in isolation and it works as
expected!

---------

Signed-off-by: Melroy van den Berg <melroy@melroy.org>
2024-10-02 20:11:12 +02:00
Madelyn Olson
ef56d713d5 Prepare for 7.2.7 release without security fixes
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-30 15:25:39 -07:00
Binbin
6d03409e01 Fix module RdbLoad wrongly disable the AOF (#1001)
In RdbLoad, we disable AOF before emptyData and rdbLoad to prevent copy-on-write issues. After rdbLoad completes, AOF should be re-enabled, but the code incorrectly checks server.aof_state, which has been reset to AOF_OFF in stopAppendOnly. This leads to AOF not being re-enabled after being disabled.
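A hedged sketch of the corrected flow (`stopAppendOnly`/`startAppendOnly` follow the server's AOF API; the structure is illustrative):

```c
/* Remember the AOF state *before* disabling it: stopAppendOnly() resets
 * server.aof_state to AOF_OFF, so checking it afterwards always reads
 * "off" and AOF would never be re-enabled. */
int aof_was_enabled = (server.aof_state != AOF_OFF);
if (aof_was_enabled) stopAppendOnly();

/* ... emptyData() and rdbLoad() run here without AOF copy-on-write cost ... */

if (aof_was_enabled) startAppendOnly(); /* re-enable from the saved flag */
```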
---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-30 15:17:13 -07:00
Binbin
45ae39af04 Fix missing replication link re-connection when primary's IP/port is updated in clusterProcessGossipSection (#965)
`clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary.
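A hedged sketch of the check (`myself` and `replicationSetPrimary` come from the message above; the field names and call shape are illustrative):

```c
/* After updating `node`'s IP/port from gossip: if that node is myself's
 * primary, re-point the replication link at the new address. */
if (nodeIsReplica(myself) && myself->replicaof == node) {
    replicationSetPrimary(node->ip, node->port);
}
```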

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-30 15:07:33 -07:00
Ted Lyngmo
2be94e68e8 Log the real reason for why posix_fadvise failed (#430)
`reclaimFilePageCache` did not set `errno`, but `rdbSaveInternal`, which
logs the error, assumed it did. This makes sure `errno` is set.
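A self-contained sketch of the pattern (the function name follows the commit; the body is illustrative): `posix_fadvise()` reports failure through its return value rather than `errno`, so the caller has to set `errno` itself.

```c
#include <errno.h>
#include <fcntl.h>

static int reclaimFilePageCache(int fd, off_t offset, off_t length) {
    int ret = posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED);
    if (ret != 0) {
        errno = ret; /* make strerror(errno) meaningful in the caller's log */
        return -1;
    }
    return 0;
}
```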

Signed-off-by: Ted Lyngmo <ted@lyncon.se>
2024-09-30 14:58:31 -07:00
Roshan Khatri
50eefd647d [Cherry-Pick] Adds workflows to build release binaries and push to S3 (#315) (#857)
Related to https://github.com/valkey-io/valkey/issues/230

Adds workflows to build Valkey binaries and push them to S3 to make them
available to download from the website.

The workflows can be triggered by pushing a release to the repo, or
manually by one of the maintainers.

Once the workflow triggers, it generates a matrix of jobs for the
platforms we need to build from `utils/releasetools/build-config.json`,
and then the respective jobs are triggered. These jobs build the Valkey
binaries for each platform we want to release and push them to a
private S3 bucket.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-09-30 14:21:33 -07:00
Pierre
68d887636e Fix crash in active defrag (#883)
This crash would happen if we disable active defrag while it is running,
then re-enable active defrag. In this case the `expires_cursor` variable
in `activeDefragCycle()` would not be reset to 0 when disabling active
defrag, so when re-enabling it, this variable can still be non-zero,
which causes the `db` and other variables to not be initialized before
starting the defrag process.
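A hedged sketch of the shape of the fix (the commit names `expires_cursor`; the other state variables here are illustrative):

```c
/* activeDefragCycle()-style scan state kept across calls. */
static int current_db = -1;
static unsigned long cursor = 0;
static unsigned long expires_cursor = 0;

if (!server.active_defrag_enabled) {
    /* Defrag disabled mid-scan: reset *all* cursors, not just `cursor`,
     * so a later re-enable starts from fully initialized state. */
    current_db = -1;
    cursor = 0;
    expires_cursor = 0;
    return;
}
```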

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2024-08-12 13:38:20 -07:00
Ping Xie
579cca5f00 Valkey 7.2.6 Patch Release (#842)
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
7.2.6
2024-07-30 16:40:02 -07:00
Harkrishn Patro
3aaa129ce4 Generate correct slot information in cluster shards command on primary failure (#790)
Fix #784

Prior to the change, `CLUSTER SHARDS` command processing might pick a
failed primary node which won't have the slot coverage information, so
the slots output in turn would be empty. This change finds an
appropriate node which has the slot coverage information for a given
shard and correctly displays it as part of the `CLUSTER SHARDS` output.

 Before:

 ```
 1) 1) "slots"
   2)  (empty array)
   3) "nodes"
   4) 1)  1) "id"
          2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
          ...
          9) "role"
         10) "master"
         ...
         13) "health"
         14) "fail"
 ```

 After:

 ```
 1) 1) "slots"
   2) 1) 0
       2) 5461
   3) "nodes"
   4) 1)  1) "id"
          2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
          ...
          9) "role"
         10) "master"
         ...
         13) "health"
         14) "fail"
 ```

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-07-22 23:48:00 -07:00
Yossi Gottlieb
6df023fb98 Reduce FreeBSD daily scope. (#12758)
The full test is very flaky running on a VM inside a GitHub worker, so we
have to settle for only building and running a small smoke test.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-15 14:29:28 -07:00
Jonathan Wright
7cb3426a4b Replace centos 7 with alternative versions (#543)
Replace CentOS 7 with AlmaLinux 8; add AlmaLinux 9, CentOS Stream 9, Fedora stable, and Fedora Rawhide.

Fixes #527

---------

Signed-off-by: Jonathan Wright <jonathan@almalinux.org>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-12 15:01:57 -07:00
Madelyn Olson
5ab3d1b981 Skip tls for xgroup read regression since it doesn't matter
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-11 20:02:30 -07:00
Ping Xie
ad0a24c742 Add missing test helper function
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Binbin
010609e8a0 Make valkey compatible with redis-sentinel to start sentinel (#731)
We already have similar changes for check-rdb / check-aof; apply
this change to sentinel.

Fixes #719.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
KarthikSubbarao
0210b642ce Allow Module authentication to succeed when cluster is down (#693)
Module authentication using a blocking implementation currently gets
rejected with "cluster is down" by the client timeout cron job
(`clientsCronHandleTimeout`).

This PR exempts clients blocked on module authentication from being
rejected there.

---------

Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Binbin
d746556e4f Only primary with slots has the right to mark a node as failed (#634)
In markNodeAsFailingIfNeeded we count needed_quorum and failures.
needed_quorum is half of cluster->size plus one, and cluster->size is
the number of primary nodes that serve slots, but when counting
failures we did not check whether the primary has slots.

Only a primary that serves slots has the right to vote; this adds a new
clusterNodeIsVotingPrimary helper to formalize the concept, as sketched
below.
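A sketch of that helper (the name comes from the commit; the exact checks are illustrative):

```c
/* Only a primary that serves at least one slot may vote in failure
 * detection, so only such nodes count toward the quorum. */
static inline int clusterNodeIsVotingPrimary(clusterNode *n) {
    return clusterNodeIsPrimary(n) && n->numslots > 0;
}
```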

Release notes:

Bugfix: nodes not in the quorum group might spuriously mark nodes
as failed.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Sankar
c4244b502e Make cluster meet reliable under link failures (#461)
When there is a link failure while an ongoing MEET request is in flight,
the sending node stops sending further MEETs and starts sending PINGs.
Since every node responds to PINGs from unknown nodes with a PONG, the
receiving node never adds the sending node, but the sending node adds
the receiving node when it sees a PONG. This can lead to asymmetry in
cluster membership. This change makes the sender keep sending MEET
until it sees a PONG, avoiding the asymmetry.
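A minimal sketch of the idea (the message-type constants follow the cluster bus naming; the surrounding logic is illustrative):

```c
/* Keep sending MEET until the peer's PONG confirms it has added us,
 * instead of clearing the MEET flag after the first send. */
int type = (node->flags & CLUSTER_NODE_MEET) ? CLUSTERMSG_TYPE_MEET
                                             : CLUSTERMSG_TYPE_PING;
clusterSendPing(node->link, type);
```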

---------

Signed-off-by: Sankar <1890648+srgsanky@users.noreply.github.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
nitaicaro
17b7b8a733 Fix crash where command duration is not reset when client is blocked … (#526)
In #11012, we changed the way command durations were computed to handle
the same command being executed multiple times. In #11970, we added an
assert if the duration is not properly reset, potentially indicating
that a call to report statistics was missed.

I found an edge case where this happens - easily reproduced by blocking
a client on `XREADGROUP` and migrating the stream's slot. This causes
the engine to process the `XREADGROUP` command twice:

1. First time, we are blocked on the stream, so we wait for the unblock
to come back to it a second time. In most cases, when we come back to
process the command a second time after unblock, we process the command
normally, which includes recording the duration and then resetting it.
2. After unblocking we come back to process the command, and this is
where we hit the edge case - at this point, we had already migrated the
slot to another node, so we return a `MOVED` response. But when we do
that, we don’t reset the duration field.

Fix: also reset the duration when returning a `MOVED` response. I think
this is right, because the client should redirect the command to the
right node, which in turn will calculate the execution duration.

Also wrote a test which reproduces this; it fails without the fix and
passes with it.
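A hedged sketch of the fix's shape (the `duration` field follows the PRs referenced above; the reply is illustrative):

```c
/* When redirecting instead of executing a previously blocked command,
 * also reset the per-command duration so reprocessing starts clean and
 * the duration-reset assert cannot fire. */
c->duration = 0;
addReplyError(c, "MOVED 1234 127.0.0.1:6379"); /* illustrative MOVED reply */
```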

---------

Signed-off-by: Nitai Caro <caronita@amazon.com>
Co-authored-by: Nitai Caro <caronita@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Viktor Söderqvist
ee6b93c899 Don't let install flags affect build (#382)
Don't let the Make variable `USE_REDIS_SYMLINKS` affect the build.
If it does, it causes the second line in the example below (`make
install`) to recompile what was already compiled on the line above, and
this time it's built without BUILD_TLS=yes USE_SYSTEMD=yes.

    make BUILD_TLS=yes USE_SYSTEMD=yes
    make PREFIX=custom/usr USE_REDIS_SYMLINKS=no install

Fixes #377

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-09 19:39:40 -07:00
Yanqi Lv
2342db3927 fix wrong data type conversion in zrangeResultBeginStore (#13148)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will be converted to SIZE_MAX in `zsetTypeCreate` and trigger a
`dictExpand`. Although `dictExpand` won't succeed because the size
overflows, we'd better avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
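A self-contained sketch of the conversion pitfall (generic names, not the server's):

```c
#include <stdio.h>

int main(void) {
    long length = -1;                /* sentinel: result length unknown */
    size_t wrong = (size_t)length;   /* wraps to SIZE_MAX: a huge dictExpand request */
    size_t safe = length > 0 ? (size_t)length : 0; /* guard before converting */
    printf("wrong=%zu safe=%zu\n", wrong, safe);
    return 0;
}
```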

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
306ce81c10 Fix redis-check-aof incorrectly considering data in manifest format as MP-AOF (#12958)
The check in fileIsManifest misjudged the manifest file. For example,
if a RESP AOF contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```

In #12951, if the preamble AOF also contains it, it will also fail.
Fixes #12951.

The bug was happening if the word "file" was mentioned in the first
1024 lines of the AOF. Now, as soon as it finds a non-comment line it
will stop scanning (whether it contains "file" or not).

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Matthew Douglass
d9e20f2964 Fix conversion of numbers in lua args to redis args (#13115)
Since lua_Number is not explicitly an integer or a double, we need to
make an effort to convert it to an integer when that's possible, since
the string could later be used in a context that doesn't support
scientific notation (e.g. 1e9 instead of 1000000000).

Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
whichever is shorter, this would break if we try to pass a long integer
to a command that takes an integer: we'll get an implicit conversion to
string in Lua, and then the parsing in getLongLongFromObjectOrReply
will fail.

```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```

Switch to using ll2string if the number can be safely represented as a
long long.

The problem was introduced in #10587 (Redis 7.2).
Closes #13113.
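A hedged sketch of the approach, using standard C in place of the server's `ll2string`/`fpconv_dtoa` helpers:

```c
#include <stdio.h>

typedef double lua_Number; /* Lua 5.1's default number type */

/* Format as an integer when the value is exactly representable as a
 * long long, otherwise fall back to double formatting. A real
 * implementation would also range-check num before the cast. */
static int number_to_string(lua_Number num, char *buf, size_t len) {
    long long ll = (long long)num;
    if ((lua_Number)ll == num) return snprintf(buf, len, "%lld", ll);
    return snprintf(buf, len, "%.17g", num);
}
```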

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
LiiNen
f91dd1d498 Fix redis-cli --count (for --scan, --bigkeys, etc) was ignored unless --pattern was also used (#13092)
The --count option for redis-cli was released in redis 7.2:
https://github.com/redis/redis/pull/12042
But I found in the code that some logic was missing for this 'count'
option:

```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        reply = redisCommand(context,"SCAN %llu",*it);
```

The intention was to be able to use the scan count, but --count was
only applied when 'pattern' was declared. So I fixed it, simply, to work
properly even if the --pattern option is not used. The fix is sketched
below.
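A hedged sketch of the fixed function (assuming the patch mirrors the MATCH branch; the actual diff may differ slightly):

```c
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        /* Apply COUNT here too, so --count works without --pattern. */
        reply = redisCommand(context, "SCAN %llu COUNT %d",
            *it, config.count);
    return reply;
}
```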

I tested it simply with the time command several times, and I could see
it works as intended with this commit. Example test results are below:
```
# unstable build

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.287s
user    0m0.011s
sys     0m0.022s

# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m1.117s
user    0m0.011s
sys     0m0.020s

# count is applied with --pattern

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.045s
user    0m0.002s
sys     0m0.002s
```

```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.084s
user    0m0.008s
sys     0m0.024s

# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m0.043s
user    0m0.000s
sys     0m0.004s

# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.031s
user    0m0.002s
sys     0m0.002s
```

Thanks a lot.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
fa1bba8619 Increase tolerance range to block reprocess tests to avoid timing issues (#13053)
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer. The tests were introduced in #13004.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
debing.sun
01473fbd5c Fix crash due to merge of quicklist node introduced by #12955 (#13040)
Fix two crashes introduced by #12955.

When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring nodes after inserting, and
compress the current node and its siblings.

1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.

   Solution: let `_quicklistMergeNodes()` return the merged nodes.

2. If the current node is an LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.

    Solution: always reset recompress to 0 after merging, as sketched below.
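A hedged sketch of the two fixes together (function and field names follow the commit message; the surrounding code is illustrative):

```c
/* 1) Merging may free the node we were working on, so continue with the
 *    node _quicklistMergeNodes() reports as the survivor.
 * 2) Clear recompress afterwards so a later recompress pass cannot
 *    compress a node that has become head or tail. */
node = _quicklistMergeNodes(quicklist, node);
node->recompress = 0;
```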

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
debing.sun
b685fadb77 Prevent LSET command from causing quicklist plain node size to exceed 4GB (#12955)
Fix #12864

The main reason for this crash is that when replacing an element of a
quicklist packed node with the lpReplace() method, if the final size is
larger than 4GB, lpReplace() fails and returns NULL, causing
`node->entry` to be incorrectly set to NULL.

Since the inserted data is not a large element, we can't just replace it
like a large element (first quicklistInsertAfter() and then
quicklistDelIndex()), because the current node may be merged and
invalidated in quicklistInsertAfter().

The solution of this PR:
When replacing a node fails (listpack exceeds 4GB), split the current
node, create a new node to put in the middle, and try to merge them.
This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Oran Agra
e4b88bb10f update redis-check-rdb types (#12969)
It seems that we forgot to update the types array in redis-check-rdb.

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00
Binbin
900ae7aed6 Fix timeout not being set in module blockClient case (#13011)
This was introduced in #13004, which missed this assignment. It causes
timeout to be a random value (possibly less than now); then in the
`Unblock by timer` test, the client is unblocked and calls
timeout_callback, and since the callback is NULL, the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
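A hedged sketch of the missing assignment (field paths are illustrative; the commit only says the timeout assignment was missed):

```c
/* Compute the absolute timeout when blocking the client on the module,
 * instead of leaving the field with a stale or random value. */
mstime_t timeout = 0;
if (timeout_ms) timeout = mstime() + timeout_ms;
c->bstate.timeout = timeout;
blockClient(c, BLOCKED_MODULE);
```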

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-02 00:24:19 -07:00