This PR introduces a new mechanism for temporarily changing the
server's loading_rio context during RDB loading operations. The new
`RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the
`server.loading_rio` value, ensuring that it's automatically restored
to its original value when the scope ends.
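A scoped override of this kind can be sketched with the GCC/Clang `cleanup` variable attribute. The macro name `SCOPED_LOADING_RIO`, the stub `rio` type, and the stub `server` struct below are illustrative assumptions; the actual `RDB_SCOPED_LOADING_RIO` definition may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Stub types for illustration only; not the real server definitions. */
typedef struct rio { int unused; } rio;
struct { rio *loading_rio; } server = {NULL};

/* Runs automatically when the guard variable leaves scope. */
void restoreLoadingRio(rio **saved) {
    server.loading_rio = *saved;
}

/* Hypothetical macro: saves the current value, installs the new one, and
 * relies on the cleanup attribute (a GCC/Clang extension) to restore it. */
#define SCOPED_LOADING_RIO(new_rio)                                     \
    rio *scoped_saved_rio __attribute__((cleanup(restoreLoadingRio))) = \
        server.loading_rio;                                             \
    server.loading_rio = (new_rio)

/* Returns the value that is visible while the override is active. */
rio *loadingRioInsideScope(rio *r) {
    SCOPED_LOADING_RIO(r);
    assert(server.loading_rio == r);
    return server.loading_rio; /* evaluated before the cleanup runs */
}
```

Because the sketch macro expands to a declaration plus an assignment, it has to be used at block scope, once per block.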
It also introduces a dedicated `rio` flag to signal an immediate abort,
preventing potential use-after-free scenarios during replication
disconnection in dual-channel load. This ensures proper termination of
`rdbLoadRioWithLoadingCtx` when replication is cancelled due to
connection loss on the main connection.
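The abort signal can be pictured as a flag that the load loop checks before touching any further state. This is a minimal sketch: `DEMO_RIO_FLAG_ABORT`, `demoRio`, and the hook type are hypothetical names, not the identifiers used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RIO_FLAG_ABORT (1u << 0) /* hypothetical flag name and value */

typedef struct demoRio {
    unsigned flags;
    size_t records_read;
} demoRio;

/* Called between records; stands in for event processing during loading,
 * where a lost connection may cancel the replication. */
typedef void demoEventHook(demoRio *r);

/* Reads up to `total` records. Returns 0 on success, -1 if aborted. */
int demoLoad(demoRio *r, size_t total, demoEventHook *hook) {
    assert(r != NULL);
    for (size_t i = 0; i < total; i++) {
        if (r->flags & DEMO_RIO_FLAG_ABORT) return -1; /* stop immediately */
        r->records_read++;
        if (hook) hook(r);
    }
    return (r->flags & DEMO_RIO_FLAG_ABORT) ? -1 : 0;
}

/* Example hook: simulates the main connection dropping after 3 records. */
void demoAbortAfterThree(demoRio *r) {
    if (r->records_read == 3) r->flags |= DEMO_RIO_FLAG_ABORT;
}
```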
Fixes https://github.com/valkey-io/valkey/issues/1152
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
We should reset repl_down_since only on a state change. In the
current code, if the rdb channel of a dual-channel sync is healthy
(that is, the RDB is loaded normally) but the psync channel is
not, we still set repl_down_since to 0 here. If the primary
goes down at this point, the replica may compute a wrong data_age
during cluster failover: since repl_state != REPL_STATE_CONNECTED,
data_age is derived from repl_down_since, and the resulting stale
data_age prevents the replica from initiating an election.
In dualChannelSyncHandleRdbLoadCompletion, if the psync channel
is not established, the function will return. We will set repl_state
to REPL_STATE_CONNECTED and set repl_down_since to 0 in
dualChannelSyncSuccess, that is, in establishPrimaryConnection.
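The intended ordering can be sketched as follows. The types and the function `demoRdbLoadCompleted` are simplified stand-ins, not the real server structures; the point is only that both fields change together, and only once the psync channel is established.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef enum {
    DEMO_REPL_TRANSFER,
    DEMO_REPL_CONNECTED
} demoReplState;

typedef struct demoReplServer {
    demoReplState repl_state;
    time_t repl_down_since; /* 0 means the link is considered up */
} demoReplServer;

/* Called when the RDB load finishes. If the psync channel is not ready,
 * return early and leave repl_down_since untouched. */
void demoRdbLoadCompleted(demoReplServer *s, int psync_established) {
    assert(s != NULL);
    if (!psync_established) return;
    s->repl_state = DEMO_REPL_CONNECTED;
    s->repl_down_since = 0; /* reset only together with the state change */
}
```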
See also 677d10b2a8ff7f13033ccfe56ffcd246dbe70fb6 for more details.
Signed-off-by: Binbin <binloveplay1314@qq.com>
This PR addresses two issues:
1. Performance Degradation Fix - Resolves a significant performance
issue during RDB load on replica nodes.
- The problem was causing replicas to rehash multiple times during the
load process. Local testing demonstrated up to 50% degradation in BGSAVE
time.
- The problem occurs when the replica tries to expand pre-created slot
dictionaries. This operation fails quietly, resulting in undetected
performance issues.
- This fix aims to optimize the RDB load process and restore expected
performance levels.
2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster
mode:
- Use the shard's master slots count when processing this opcode, as
`clusterNodeCoversSlot` is not initialized for the currently syncing
replica.
- Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB`
had no practical impact (due to 1).
These improvements will enhance overall system performance and ensure
smoother upgrades to Valkey 8.0 in the future.
Testing:
- Conducted local tests to verify the performance improvement during RDB
load.
- Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively
impact functionality in the current version.
Signed-off-by: naglera <anagler123@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
The code was fine before 2de544cfcc6d1aa7cf6d0c75a6116f7fc27b6fd6,
but now we set server.repl_transfer_fd right after dfd is
opened, so we get a double close error here since dfd and
server.repl_transfer_fd are the same fd.
Also move the declarations of dfd/maxtries into a smaller scope to
avoid confusion, since they are only used in this code.
Signed-off-by: Binbin <binloveplay1314@qq.com>
When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.
When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.
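As a rough sketch of that selection rule, assuming simplified stand-in types (`demoClient`, `demoClients`) rather than the real client and server structures:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct demoClient { const char *cmd_name; } demoClient;

typedef struct demoClients {
    demoClient *current_client;   /* e.g. the client that sent EVAL */
    demoClient *executing_client; /* e.g. the client running TOUCH */
} demoClients;

/* Returns 1 if the access time should be left untouched. */
int demoSkipAccessTimeUpdate(const demoClients *srv, int notouch_mode) {
    assert(srv != NULL);
    if (!notouch_mode) return 0;
    /* Prefer the executing client when it is defined. */
    const demoClient *c =
        srv->executing_client ? srv->executing_client : srv->current_client;
    if (c && c->cmd_name && strcmp(c->cmd_name, "TOUCH") == 0) return 0;
    return 1;
}
```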
This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.
Fixes #1498
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
This PR fixes the missing stat update for `total_net_repl_output_bytes`
that was removed during the refactoring in PR #758. The metric was not
being updated when writing to replica connections.
Changes:
- Restored the stat update in postWriteToClient for replica connections
- Added integration test to verify the metric is properly updated
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
functionsLibCtxClear should clear the provided lib_ctx parameter,
not the static variable curr_functions_lib_ctx, as this contradicts
the function's intended purpose.
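The shape of the bug can be illustrated with a reduced example. `demoLibCtx`, `demo_curr_ctx`, and both functions are hypothetical; they only mirror the parameter-versus-static mix-up described above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demoLibCtx {
    size_t cache_memory;
    int num_libs;
} demoLibCtx;

/* Stands in for the static "current" context. */
demoLibCtx demo_curr_ctx = {128, 2};

/* Buggy shape: ignores the parameter and clears the static context. */
void demoClearBuggy(demoLibCtx *ctx) {
    (void)ctx;
    demo_curr_ctx.cache_memory = 0;
    demo_curr_ctx.num_libs = 0;
}

/* Fixed shape: clears exactly the context it was given. */
void demoClearFixed(demoLibCtx *ctx) {
    assert(ctx != NULL);
    ctx->cache_memory = 0;
    ctx->num_libs = 0;
}
```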
The impact is probably minor: in some unhappy paths (diskless load
fails, function restore fails?), we will mess up the functions_caches
field, which is used in the used_memory_functions / used_memory_scripts
fields in INFO.
Signed-off-by: Binbin <binloveplay1314@qq.com>
### Problem
Valkey stores scripts in a dictionary (lua_scripts) keyed by their SHA1
hashes, but it needs a way to know which scripts are least recently
used. It uses an LRU list (lua_scripts_lru_list) to keep track of
scripts in usage order. When the list reaches a maximum length, Valkey
evicts the oldest scripts to free memory in both the list and
dictionary. The problem here is that the sds in the LRU list can point
to memory that active defrag has already freed or moved, namely the
memory that the sds in the dictionary used to point to. This results in
an assertion error at [this
line](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L519)
### Solution
If we duplicate the sds when adding it to the LRU list, we can create an
independent copy of the script identifier (sha). This duplication
ensures that the sha string in the LRU list remains stable and
unaffected by any defragmentation that could alter or free the original
sds. In addition, dictUnlink doesn't require an exact pointer
match ([ref](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L71-L78)),
so unlinking the right dictEntry with the copy of the sds works as
expected.
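The idea can be sketched with plain C strings instead of sds. The one-slot `demoDict` and the helpers below are hypothetical simplifications; they only show that a content-based lookup with an independent copy survives the key being moved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One-slot "dictionary" keyed by string content. */
typedef struct demoDict { char *key; } demoDict;

char *demoStrdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/* Simulates active defrag: moves the key and frees the old allocation. */
void demoDefragKey(demoDict *d) {
    char *moved = demoStrdup(d->key);
    free(d->key);
    d->key = moved;
}

/* Unlinks the entry whose key has the same content. Returns 1 if found. */
int demoDictUnlink(demoDict *d, const char *sha) {
    if (d->key == NULL || strcmp(d->key, sha) != 0) return 0;
    free(d->key);
    d->key = NULL;
    return 1;
}
```

In this sketch the LRU list would hold `demoStrdup(d.key)` rather than `d.key` itself, so eviction never reads memory that the simulated defrag has freed.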
### Reproduce
To reproduce it with tcl test:
1. Disable je_get_defrag_hint in defrag.c to trigger defrag often
2. Execute test script
```
start_server {tags {"auth external:skip"}} {
    test {Regression for script LRU crash} {
        r config set activedefrag yes
        r config set active-defrag-ignore-bytes 1
        r config set active-defrag-threshold-lower 0
        r config set active-defrag-threshold-upper 1
        r config set active-defrag-cycle-min 99
        r config set active-defrag-cycle-max 99
        for {set i 0} {$i < 100000} {incr i} {
            r eval "return $i" 0
        }
        after 5000;
    }
}
```
### Crash info
Crash report:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
14044:M 12 Nov 2024 14:51:27.054 # === ASSERTION FAILED ===
14044:M 12 Nov 2024 14:51:27.054 # ==> eval.c:556 'de' is not true
------ STACK TRACE ------
Backtrace:
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaDeleteFunction+0x148)[0x723708]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaCreateFunction+0x26c)[0x724450]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](evalCommand+0x2bc)[0x7254dc]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](call+0x574)[0x5b8d14]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommand+0xc84)[0x5b9b10]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommandAndResetClient+0x11c)[0x6db63c]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processInputBuffer+0x1b0)[0x6dffd4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x6bd968]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x659634]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](amzTLSEventHandler+0x194)[0x6588d8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x750c88]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](aeProcessEvents+0x228)[0x757fa8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](redisMain+0x478)[0x7786b8]
/lib64/libc.so.6(__libc_start_main+0xe4)[0xffffa7763da4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x5ad3b0]
```
Defrag info:
```
mem_fragmentation_ratio:1.18
mem_fragmentation_bytes:47229992
active_defrag_hits:20561
active_defrag_misses:5878518
active_defrag_key_hits:77
active_defrag_key_misses:212
total_active_defrag_time:29009
```
### Test:
Run the test script to push 100,000 scripts and confirm that the LRU
list stays at its maximum length of 500 without any crash.
```
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
[ok]: Regression for script LRU crash (6811 ms)
[1/1 done]: unit/test (7 seconds)
```
---------
Signed-off-by: Seungmin Lee <sungming@amazon.com>
Signed-off-by: Seungmin Lee <155032684+sungming2@users.noreply.github.com>
Co-authored-by: Seungmin Lee <sungming@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
- Moves `build-config.json` to workflow dir to build old versions with
new configs.
- Enables contributors to test the release workflow on a private repo by
adding `github.event_name == 'workflow_dispatch' ||`
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
We have set the secret as `AWS_S3_TEST_BUCKET` for the test bucket, and
I missed it in the initial review.
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
## Summary
This PR fixes #1346, where we can get rid of the long-term credentials by
using OpenID Connect. OpenID Connect (OIDC) allows your GitHub Actions
workflows to access resources in Amazon Web Services (AWS), without
needing to store the AWS credentials as long-lived GitHub secrets.
---------
Signed-off-by: vudiep411 <vdiep@amazon.com>
The command argument strings created while parsing inline commands (see
`processInlineBuffer()`) can contain free capacity. Since some commands,
such as `SET`, store these strings in the database, that free capacity
increases the memory usage. In the worst case, it could double the
memory usage.
This only occurs if the inline command format is used. The argument
strings are built by appending character by character in
`sdssplitargs()`. Regular RESP commands are not affected.
This change trims the strings within `processInlineBuffer()`.
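The effect can be reproduced with a simple doubling buffer. This is a sketch under stated assumptions: `demoStr` is a hypothetical type, and its growth policy is plain doubling, which is not necessarily the exact sds allocation policy.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demoStr {
    char *buf;
    size_t len;   /* bytes in use, excluding the terminating NUL */
    size_t alloc; /* bytes allocated */
} demoStr;

/* Appends one character, doubling the capacity when the buffer is full. */
void demoAppendChar(demoStr *s, char c) {
    if (s->len + 1 >= s->alloc) {
        size_t new_alloc = s->alloc ? s->alloc * 2 : 8;
        char *p = realloc(s->buf, new_alloc);
        assert(p != NULL);
        s->buf = p;
        s->alloc = new_alloc;
    }
    s->buf[s->len++] = c;
    s->buf[s->len] = '\0';
}

/* Shrinks the allocation to exactly what the content needs. */
void demoTrim(demoStr *s) {
    if (s->buf == NULL || s->alloc == s->len + 1) return;
    char *p = realloc(s->buf, s->len + 1);
    if (p == NULL) return; /* keep the larger buffer on failure */
    s->buf = p;
    s->alloc = s->len + 1;
}
```

With a 600-byte value built one character at a time, this buffer ends up with 1024 bytes allocated until it is trimmed to 601.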
### Why `trimStringObjectIfNeeded()` within `object.c` is not solving this?
When the command argument string is packed into an object,
`trimStringObjectIfNeeded()` is called.
This only trims the string if it is larger than
`PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would
ever need trimming if the command is sent using the bulk string format.
We could modify this condition, but that would potentially have a
performance impact on commands using the bulk format. Since those make
up the vast majority of executed commands, limiting this change to
inline commands seems prudent.
### Experiment Results
* 1 million `SET [key] [value]` commands
* Random keys (16 bytes)
* 600 bytes values
Memory usage without this change:
```
used_memory:1089327888
used_memory_human:1.01G
used_memory_rss:1131696128
used_memory_rss_human:1.05G
used_memory_peak:1089348264
used_memory_peak_human:1.01G
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:1040025088
used_memory_dataset_perc:95.55%
```
Memory usage with this change:
```
used_memory:705327888
used_memory_human:672.65M
used_memory_rss:718802944
used_memory_rss_human:685.50M
used_memory_peak:705348256
used_memory_peak_human:672.67M
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:656025088
used_memory_dataset_perc:93.13%
```
If the same experiment is repeated using the normal RESP array of bulk
string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB
with and without this change.
If a replica is attached, its memory usage is 672MB with and without
this change, since the replication link never uses inline commands.
Signed-off-by: Stefan Mueller <muelstef@amazon.com>
The special pattern '#' is used to get the element itself, so it
should not participate in the slot check.
Before this fix, passing `GET #` caused '#' to participate
in the slot check, and the command failed with a
`pattern may be in different slots` error.
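A minimal sketch of the exemption, assuming a hypothetical helper name (`demoPatternNeedsSlotCheck`) that the real code may not have:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the GET pattern should take part in the slot check.
 * The pattern "#" means "the element itself" and names no key. */
int demoPatternNeedsSlotCheck(const char *pattern) {
    assert(pattern != NULL);
    return !(pattern[0] == '#' && pattern[1] == '\0');
}
```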
Signed-off-by: Binbin <binloveplay1314@qq.com>
https://github.com/valkey-io/valkey/issues/1145
First part of a two-step effort to add `WithSlot` API for expiry. This
PR is to fix a crash that occurs when a RANDOMKEY uses a different slot
than the cached slot of a client during a multi-exec.
The next part will be to utilize the new API as an optimization to
prevent duplicate work when calculating the slot for a key.
---------
Signed-off-by: Nadav Levanoni <nadavl@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Nadav Levanoni <nadavl@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
The client that was killed by FUNCTION KILL received a reply of
SCRIPT KILL and the server log also showed SCRIPT KILL.
Signed-off-by: Binbin <binloveplay1314@qq.com>
The module commands which were added to acl categories were getting
skipped when `ACL CAT category` command was executed.
This PR fixes the bug.
Before:
```
127.0.0.1:6379> ACL CAT foocategory
(empty array)
```
After:
```
127.0.0.1:6379> ACL CAT foocategory
aclcheck.module.command.test.add.new.aclcategories
```
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
We have an assert in propagateNow. If the primary node receives a
CLUSTER UPDATE that results in dirty slots while waiting during SIGTERM,
during a manual failover pause, or during a client pause, the
delKeysInSlot call will trigger this assert and crash the primary.
To handle this case, we add a new server_del_keys_in_slot state, just
like client_pause_in_transaction, to track the state and avoid the
assert in propagateNow. The dirty slots will still be deleted in the
end, without affecting data consistency.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The kvstore performs active rehashing by traversing a list of
dictionaries that were registered with it when they started rehashing.
The problem is that active defrag may realloc some of the dictionary
structures while they are registered on the list, since the dictionary
might be relocated in dictDefragTables.
The solution is to make sure we update the rehashing list node with the
new (or unchanged) dictionary pointer after applying the defrag function.
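The pointer refresh can be sketched as below. `demoKvDict`, `demoListNode`, and the defrag callbacks are hypothetical stand-ins for the real dict, list node, and defrag function types.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct demoListNode { void *value; } demoListNode;

typedef struct demoKvDict {
    int id;
    demoListNode *rehashing_node; /* non-NULL while registered for rehashing */
} demoKvDict;

/* Defrag callback: returns the new location, or NULL if nothing moved. */
typedef demoKvDict *demoDefragFn(demoKvDict *d);

demoKvDict *demoDefragMove(demoKvDict *d) {
    demoKvDict *moved = malloc(sizeof(*moved));
    assert(moved != NULL);
    memcpy(moved, d, sizeof(*moved));
    free(d);
    return moved;
}

demoKvDict *demoDefragNoop(demoKvDict *d) {
    (void)d;
    return NULL;
}

/* Applies the defrag function, then refreshes the rehashing list node
 * with the new (or unchanged) dictionary pointer. */
demoKvDict *demoDefragDict(demoKvDict *d, demoDefragFn *fn) {
    demoKvDict *moved = fn(d);
    if (moved) d = moved;
    if (d->rehashing_node) d->rehashing_node->value = d;
    return d;
}
```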
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Apply the security fixes for the release.
(CVE-2024-31449) Lua library commands may lead to stack overflow and
potential RCE.
(CVE-2024-31227) Potential Denial-of-service due to malformed ACL
selectors.
(CVE-2024-31228) Potential Denial-of-service due to unbounded pattern
matching.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
- Add systemd support to the build artifact tarballs, so people can use
it under systemd compatible distros. As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10815549.
This adds `libsystemd-dev` to the installed packages and
`USE_SYSTEMD=yes` to the build.
- Cleanup & bring the arm & x86 workflow files in-sync. It was a bit of
a mess ;) (removing `jq wget awscli` from the 'Tarball' step)
Signed-off-by: Melroy van den Berg <melroy@melroy.org>
As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10814006
`cp` can't be used anymore; `rsync` is more powerful and allows
excluding files.
Alternatively:
1. Remove the c, d and o files. Which isn't ideal either.
2. Improve the build. Eg. by building inside a `build` directory instead
of in the src folder.
Ps. I know these workflows aren't triggered in this PR, only via the
"Build Release Packages" workflow action:
https://github.com/valkey-io/valkey/actions/workflows/build-release-packages.yml.
So I can't fully test in this PR. But it should work ^^
Ps. ps. I did test `rsync -av --exclude='*.c' --exclude='*.d'
--exclude='*.o' src/valkey-*` command in isolation and that works as
expected!
---------
Signed-off-by: Melroy van den Berg <melroy@melroy.org>
Apparently there is a timing issue when using wait_for_ofs_sync:
```
[exception]: Executing test client: can't read "out_before": no such variable.
can't read "out_before": no such variable
```
The reason is that if the connection between the primary
and the replica is not established yet, the master_repl_offset
of the primary and replica in wait_for_ofs_sync is 0, and
the check fails, resulting in no replica client in the
client list below.
In this case, we need to make sure the replica is online
before proceeding.
Signed-off-by: Binbin <binloveplay1314@qq.com>
For fake clients like the ones used for Lua and modules, we don't
determine TLS in the right way, causing CLUSTER SLOTS from EVAL over TLS
to fail a debug-assert.
This error was introduced when the caching of CLUSTER SLOTS was
introduced, i.e. in 8.0.0.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The one in CLUSTER SETSLOT helps us keep track of state better,
and of course it also makes the test case happy.
The one in gossip process fixes a problem that a replica can
print a log saying it is an empty primary.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
The reason is VM_Call will use a fake client without connection,
so we also need to check if c->conn is NULL.
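A guard of this kind might look as follows. The types and the function `demoClientPort` are hypothetical; the sketch only shows the NULL check on the connection of a fake client.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demoConnection { int port; } demoConnection;
typedef struct demoFakeableClient { demoConnection *conn; } demoFakeableClient;

/* Returns the peer port, or -1 for a client without a connection. */
int demoClientPort(const demoFakeableClient *c) {
    assert(c != NULL);
    if (c->conn == NULL) return -1; /* fake client: nothing to dereference */
    return c->conn->port;
}
```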
This also affects scripts. If they are called in the script, the
server will crash. Injecting commands into AOF will also cause
startup failure.
Fixes #1054.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Apparently this will fail to compile on some macOS versions.
And the internet claims _Thread_local is portable.
Fixes #1051.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Since in here the monitor value is mymaster, we need to make sure the
primary name is the same, otherwise the default configuration cannot start
sentinel.
```
sentinel monitor mymaster 127.0.0.1 6379 2
```
The following error occurs when the default configuration is started:
```
*** FATAL CONFIG FILE ERROR (Version 255.255.255) ***
Reading the configuration file, at line 358
>>> 'SENTINEL primary-reboot-down-after-period myprimary 0'
No such master with specified name.
```
Introduced in #647.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Replaced "Could not connect to Redis" with "Could not connect to server" in the log
output for connection errors in `getRedisContext` and `createClient`.
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Call emptyData right before rdbLoad, so that an error in between does
not make us drop the replication stream and leave an empty database.
The real change is in the disk-based part; the rest is just code movement.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ping Xie <pingxie@google.com>
This pull request improves code readability, as a follow up of #749.
- Internal Naming Conventions: Removed the use of underscores (_) for
internal static structures/functions.
- Descriptive Function Names: Updated function names to be more
descriptive, making their purpose clearer. For instance, `_dictExpand`
is renamed to `dictExpandIfAutoResizeAllowed`.
---------
Signed-off-by: Ping Xie <pingxie@google.com>
Fix timing issue in evaluating `cluster-allow-replica-migration` for replicas
There is a timing bug where the primary and replica have different
`cluster-allow-replica-migration` settings. In issue #970, we found that if
the replica receives `CLUSTER SETSLOT` before the gossip update, it remains
in the original shard. This happens because we only process the
`cluster-allow-replica-migration` flag for primaries during `CLUSTER SETSLOT`.
This commit fixes the issue by also evaluating this flag for replicas in the
`CLUSTER SETSLOT` path, ensuring correct replica migration behavior.
Closes #970
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Ping Xie <pingxie@google.com>
The node may not be able to initiate an election in time due to
problems with cluster communication. If an election is initiated,
make sure its offset is 0.
Closes #967.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ping Xie <pingxie@google.com>
The cluster configuration file is the metadata "database" for the
cluster. It is best to trigger a save when shutting down the server, to
avoid inconsistent content that is not refreshed.
We save the nodes.conf whenever something that affects the nodes.conf
has changed. But we are saving nodes.conf in clusterBeforeSleep, and
some events may save it without an fsync, so there is a time gap.
Having shutdown do its own save seems good to me; it doesn't need to
care about the others.
At the same time, a comment is added where we unlock nodes.conf to
explain why we actively unlock when shutting down.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ping Xie <pingxie@google.com>
clang generates warning if there is no newline at the end of the source
file.
Update .clang-format to handle the missing newline at eof.
Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
While doing some profiling, I noticed that getKeySlot() was a fairly
large part (~0.7%) of samples when running perf with a high pipeline in
standalone mode. I think this is because we do a very late check for
server.cluster_mode, we first call getKeySlot() and then call
calculateKeySlot(). (calculateKeySlot was surprisingly not automatically
inlined, we were doing a jump into it and then immediately returning
zero). We then also do useless work in the form of caching zero in
client->slot, which will further mess with cache lines.
So, this PR tries to accomplish a few things.
1) The usage of the `slot` name made a lot more sense before the
introduction of the kvstore. Now with kvstore, we call this the database
index, so all the references to slot in standalone are no longer really
accurate.
2) Pull the cluster mode check all the way out of getKeySlot(), so
hopefully a bit more performant.
3) Remove calculateKeySlot() as independent from getKeySlot().
calculateKeySlot used to have 3 call sites outside of db.c, which
warranted its own function. It's now only called in two places,
pubsub.c and networking.c.
I ran some profiling, and saw about ~0.3% improvement, but don't really
trust it because you'll see a much higher (~2%) variance in test runs
just by how the branch predictions will get changed with a new memory
layout. Running perf again showed no samples in getKeySlot() and a
reduction in samples in lookupKey(), so maybe this will help a little
bit.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Maybe partially resolves https://github.com/valkey-io/valkey/issues/952.
The hostnames test relies on an assumption that node zero and node six
don't communicate with each other to test a bunch of behavior in the
handshake state. This was done by previously dropping all meet packets,
however it seems like there was some case where node zero was sending a
single pong message to node 6, which was partially initializing the
state.
I couldn't track down why this happened, but I adjusted the test to
simply pause node zero which also correctly emulates the state we want
to be in since we're just testing state on node 6, and removes the
chance of errant messages. The test was failing about 5% of the time
locally, and I wasn't able to reproduce a failure with this new
configuration.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>