Fix#784
Prior to the change, `CLUSTER SHARDS` command processing might pick a
failed primary node which won't have the slot coverage information and
the slots `output` in turn would be empty. This change finds an
appropriate node which has the slot coverage information served by a
given shard and correctly displays it as part of `CLUSTER SHARDS`
output.
Before:
```
1) 1) "slots"
2) (empty array)
3) "nodes"
4) 1) 1) "id"
2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
...
9) "role"
10) "master"
...
13) "health"
14) "fail"
```
After:
```
1) 1) "slots"
2) 1) 0
2) 5461
3) "nodes"
4) 1) 1) "id"
2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
...
9) "role"
10) "master"
...
13) "health"
14) "fail"
```
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
The full test is very flaky running on a VM inside GitHub worker, so we
have to settle for only building and running a small smoke test.
Signed-off-by: Ping Xie <pingxie@google.com>
We already have similar changes to check-rdb / check-aof, apply
this change to sentinel.
Fixes#719.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Module Authentication using a blocking implementation currently gets
rejected when the "cluster is down" from the client timeout cron job
(`clientsCronHandleTimeout`).
This PR exempts clients blocked on Module Authentication from being
rejected here.
---------
Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>
Signed-off-by: Ping Xie <pingxie@google.com>
In markNodeAsFailingIfNeeded we will count needed_quorum and failures,
needed_quorum is the half the cluster->size and plus one, and
cluster-size
is the size of primary node which contain slots, but when counting
failures, we dit not check if primary has slots.
Only the primary has slots that has the rights to vote, adding a new
clusterNodeIsVotingPrimary to formalize this concept.
Release notes:
bugfix where nodes not in the quorum group might spuriously mark nodes
as failed
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Ping Xie <pingxie@google.com>
When there is a link failure while an ongoing MEET request is sent the
sending node stops sending anymore MEET and starts sending PINGs. Since
every node responds to PINGs from unknown nodes with a PONG, the
receiving node never adds the sending node. But the sending node adds
the receiving node when it sees a PONG. This can lead to asymmetry in
cluster membership. This changes makes the sender keep sending MEET
until it sees a PONG, avoiding the asymmetry.
---------
Signed-off-by: Sankar <1890648+srgsanky@users.noreply.github.com>
Signed-off-by: Ping Xie <pingxie@google.com>
In #11012, we changed the way command durations were computed to handle
the same command being executed multiple times. In #11970, we added an
assert if the duration is not properly reset, potentially indicating
that a call to report statistics was missed.
I found an edge case where this happens - easily reproduced by blocking
a client on `XGROUPREAD` and migrating the stream's slot. This causes
the engine to process the `XGROUPREAD` command twice:
1. First time, we are blocked on the stream, so we wait for unblock to
come back to it a second time. In most cases, when we come back to
process the command second time after unblock, we process the command
normally, which includes recording the duration and then resetting it.
2. After unblocking we come back to process the command, and this is
where we hit the edge case - at this point, we had already migrated the
slot to another node, so we return a `MOVED` response. But when we do
that, we don’t reset the duration field.
Fix: also reset the duration when returning a `MOVED` response. I think
this is right, because the client should redirect the command to the
right node, which in turn will calculate the execution duration.
Also wrote a test which reproduces this, it fails without the fix and
passes with it.
---------
Signed-off-by: Nitai Caro <caronita@amazon.com>
Co-authored-by: Nitai Caro <caronita@amazon.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Don't let the Make valiable `USE_REDIS_SYMLINKS` affect the build.
If it does, it causes the second line in the example below (`make
install`) to recompile what was already compiled on the line above, and
this time it's built without BUILD_TLS=yes USE_SYSTEMD=yes.
make BUILD_TLS=yes USE_SYSTEMD=yes
make PREFIX=custom/usr USE_REDIS_SYMLINKS=no install
Fixes#377
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ping Xie <pingxie@google.com>
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.
This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
Signed-off-by: Ping Xie <pingxie@google.com>
The check in fileIsManifest misjudged the manifest file. For example,
if resp aof contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```
In #12951, if the preamble aof also contains it, it will also fail.
Fixes#12951.
the bug was happening if the the word "file" is mentioned
in the first 1024 lines of the AOF. and now as soon as it finds
a non-comment line it'll break (if it contains "file" or doesn't)
Signed-off-by: Ping Xie <pingxie@google.com>
Since lua_Number is not explicitly an integer or a double, we need to
make an effort
to convert it as an integer when that's possible, since the string could
later be used
in a context that doesn't support scientific notation (e.g. 1e9 instead
of 100000000).
Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
which ever is shorter,
this would break if we try to pass a long integer number to a command
that takes integer.
we'll get an implicit conversion to string in Lua, and then the parsing
in getLongLongFromObjectOrReply will fail.
```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```
Switch to using ll2string if the number can be safely represented as a
long long.
The problem was introduced in #10587 (Redis 7.2).
closes#13113.
---------
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Signed-off-by: Ping Xie <pingxie@google.com>
The --count option for redis-cli has been released in redis 7.2.
https://github.com/redis/redis/pull/12042
But I have found in code, that some logic was missing for using this
'count' option.
```
static redisReply *sendScan(unsigned long long *it) {
redisReply *reply;
if (config.pattern)
reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
*it, config.pattern, sdslen(config.pattern), config.count);
else
reply = redisCommand(context,"SCAN %llu",*it);
```
The intention was being able to using scan count.
But in this case, the --count will be only applied when 'pattern' is
declared.
So, I had fix it simply, to be worked properly - even if --pattern
option is not being used.
I tested it simply with time() command several times, and I could see it
works as intended with this commit.
The examples of test results are below:
```
# unstable build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)
real 0m1.287s
user 0m0.011s
sys 0m0.022s
# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)
real 0m1.117s
user 0m0.011s
sys 0m0.020s
# count is applied with --pattern
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)
real 0m0.045s
user 0m0.002s
sys 0m0.002s
```
```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)
real 0m1.084s
user 0m0.008s
sys 0m0.024s
# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)
real 0m0.043s
user 0m0.000s
sys 0m0.004s
# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)
real 0m0.031s
user 0m0.002s
sys 0m0.002s
```
Thanks a lot.
Signed-off-by: Ping Xie <pingxie@google.com>
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```
Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in #13004.
Signed-off-by: Ping Xie <pingxie@google.com>
Fix two crash introducted by #12955
When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring
nodes after inserting, and compress the current node and its siblings.
1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.
Solution: let `_quicklistMergeNodes()` return the merged nodes.
3. If the current node is a LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.
Solution: always recompress to 0 after merging.
Signed-off-by: Ping Xie <pingxie@google.com>
Fix#12864
The main reason for this crash is that when replacing a element of a
quicklist packed node with lpReplace() method,
if the final size is larger than 4GB, lpReplace() will fail and returns
NULL, causing `node->entry` to be incorrectly set to NULL.
Since the inserted data is not a large element, we can't just replace it
like a large element, first quicklistInsertAfter()
and then quicklistDelIndex(), because the current node may be merged and
invalidated in quicklistInsertAfter().
The solution of this PR:
When replacing a node fails (listpack exceeds 4GB), split the current
node, create a new node to put in the middle, and try to merge them.
This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.
Signed-off-by: Ping Xie <pingxie@google.com>
This was introduced in #13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.
The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
Signed-off-by: Ping Xie <pingxie@google.com>
In #11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.
This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.
Affected BLOCK cases:
- list / zset / stream, added test cases for each.
Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).
Fixes#12998.
Signed-off-by: Ping Xie <pingxie@google.com>
The test was introduced in #10745, but we forgot to add it to the
test_helper.tcl, so our CI did not actually run it. This PR adds it
and ensures it passes CI tests.
Signed-off-by: Ping Xie <pingxie@google.com>
`REGISTER_API` is supposed to register two functions: one starts with
`ValkeyModule_` and the other one with `RedisModule_`. It does so in
`unstable` branch. However there was a copy-paste mistake during
backporting it to 7.2. This caused modules built with `valkeymodule-rs`
to fail since they called `RedisModule_SetModuleAttribs` which caused
valkey to segfault.
This commit fixes the typo.
Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
Release notes for RC2 (really 7.2.5-rc1). Based on conversation, decided
to have first release on 7.2.5 to make it clear this was a patch release
over Redis. This could be considered a minor release, but we wanted to
clearly signal compatibility with Redis OSS.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Add an initial security release page. In the fullness of time I would
like to also include our version support here, but until that has been
decided I would like to keep this simple and just include links.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
* Tested it on local instance. This was originally part of
https://github.com/valkey-io/valkey/pull/288 but I am pushing this
separately, so that we can easily merge it into the upcoming release.
```
lua debugger> server ping
<redis> ping
<reply> "+PONG"
lua debugger> redis ping
<redis> ping
<reply> "+PONG"
```
* I also searched for lua debugger related unit tests to add coverage
for this but did not find any relevant test to modify. Leaving it at
that for now.
---------
Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
ValkeyModuleEvent_MasterLinkChange was updated to use more inclusive
language, but it was done in the compatibility layer as well
(RedisModuleEvent_).
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
This is a PR directly to the 7.2 branch.
The make variable `ENGINE_NAME` (from
38632278fd06fe186f7707e4fa099f666d805547) was lost during branching of
`7.2` and tag `7.2.4-rc1`
resulting in the creation of faulty symlinks:
```
INSTALL SYMLINK valkey-serverredis -> valkey-server
INSTALL SYMLINK valkey-cliredis -> valkey-cli
INSTALL SYMLINK valkey-benchmarkredis -> valkey-benchmark
INSTALL SYMLINK valkey-check-rdbredis -> valkey-check-rdb
INSTALL SYMLINK valkey-check-aofredis -> valkey-check-aof
INSTALL SYMLINK valkey-sentinelredis -> valkey-sentinel
```
By just adding the variable we get a minimal diff compared to unstable.
Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Updated to Valkey in valkey-cli.c file's comments and prints.
* The output of valkey-cli --help
* The output of the cli built-in HELP command
* The prompt in interactive valkey-cli -s unixsocket
* The history file and the default rc file (changed filename)
---------
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
The default pid file is created at /var/run/redis.pid based on the code
at
da831c0d22/src/server.h (L132).
Until we update it, we should reflect that in the conf file.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Ref: https://github.com/redis/redis/pull/12760
enabled by default) with older Redis cluster (< 7.0 - extensions not
handled) .
With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.
This fix introduces a flag `extensions_supported` on the clusterMsg
indicating the sender node can handle extensions parsing. Once, a
receiver nodes receives a message with this flag set to 1, it would
update clusterNode new field extensions_supported and start sending out
extensions if it has any.
This PR also introduces a DEBUG sub command to enable/disable cluster
message extensions `process-clustermsg-extensions` feature.
Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `extensions_supported` and then extensions message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).
TCL test verifying the cluster state is healthy irrespective of
enabling/disabling cluster message extensions feature.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Small changes to the log messages printed during startup and shutdown,
for Valkey branding.
SERVER_NAME is replaced by verbatim "Valkey" in one place, because
SERVER_NAME expands to "valkey" in lowercase. (Should we introduce
another macro that expands to "Valkey"?)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Apply new logo at startup.
It is one character wider and 2 characters taller than the original
Redis logo.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
These were flagged on the 7.2 build system, which is using the old spell
check. I think we should consider re-adding it as it missed some typos.
Relevant: https://github.com/valkey-io/valkey/pull/72
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Fixing redis -> valkey in the output of valkey-server --help.
Signed-off-by: Daniel House <daniel.house@huawei.com>
Co-authored-by: Daniel House <daniel.house@huawei.com>
Mostly comments, but one pre-filled config in this template config file
is changed:
pidfile /var/run/valkey-sentinel.pid
---------
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Remove trademarked wording on configuration layer.
Following changes for release notes:
1. Rename redis.conf to valkey.conf
2. Pre-filled config in the template config file: Changing pidfile to `/var/run/valkey_6379.pid`
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Adds a new make variable called `USE_REDIS_SYMLINKS`, with default value
`yes`. If yes, then `make install` creates additional symlinks to the
installed binaries:
* `valkey-server`
* `valkey-cli`
* `valkey-benchmark`
* `valkey-check-rdb`
* `valkey-check-aof`
* `valkey-sentinel`
The names of the symlinks are the legacy redis binary names
(`redis-server`, etc.). The purpose is to provide backward compatibility
for scripts expecting the these filenames. The symlinks are installed in
the same directory as the binaries (typically `/usr/local/bin/` or
similar).
Similarly, `make uninstall` removes these symlinks if
`USE_REDIS_SYMLINKS` is `yes`.
This is described in a note in README.md.
Fixes#147
---------
Signed-off-by: Vitah Lin <vitahlin@gmail.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>