1609 Commits

Author SHA1 Message Date
uriyage
b72e43ed16
Adjust query buffer resized correctly test to non-jemalloc allocators. (#593)
Test `query buffer resized correctly` start to fail
(https://github.com/valkey-io/valkey/actions/runs/9278013807) with
non-jemalloc allocators after
https://github.com/valkey-io/valkey/pull/258 PR.

With Jemalloc we allocate ~20K for the query buffer, in the test we read
1 byte in the first read, in the second read we make sure we have at
least 16KB free place in the query buffer and we have as Jemalloc
allocated 20KB, But with non jemalloc we allocate in the first read
exactly 16KB. in the second read we check and see that we don't have
16KB free space as we already read 1 byte hence we reallocate this time
greedly (*2 of the requested size of 16KB+1) hence the test condition
that the querybuf size is < 32KB is no longer true

The `query buffer resized correctly test` starts
[failing](https://github.com/valkey-io/valkey/actions/runs/9278013807)
with non-jemalloc allocators after PR #258 .

With jemalloc, we allocate ~20KB for the query buffer. In the test, we
read 1 byte initially and then ensure there is at least 16KB of free
space in the buffer for the second read, which is satisfied by
jemalloc's 20KB allocation. However, with non-jemalloc allocators, the
first read allocates exactly 16KB. When we check again, we don't have
16KB free due to the 1 byte already read. This triggers a greedy
reallocation (doubling the requested size of 16KB+1), causing the query
buffer size to exceed the 32KB limit, thus failing the test condition.

This PR adjusted the test query buffer upper limit to be 32KB +2.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-06-03 11:15:28 -07:00
Chen Tianjie
d16b4ec1b9
Unshare object to avoid LRU/LFU being messed up (#250)
When LRU/LFU enabled, Valkey does not allow using shared objects, as
value objects may be shared among many different keys and they can't
share LRU/LFU information.

However `maxmemory-policy` is modifiable at runtime. If LRU/LFU is not
enabled at start, but then enabled when some shared objects are already
used, there could be some confusion in LRU/LFU information.

For `set` command it is OK since it is going to create a new object when
LRU/LFU enabled, but `get` command will not unshare the object and just
update LRU/LFU information.

So we may duplicate the object in this case. It is a one-time task for
each key using shared objects, unless this is the case for so many keys,
there should be no serious performance degradation.

Still, LRU will be updated anyway, no matter LRU/LFU is enabled or not,
because `OBJECT IDLETIME` needs it, unless `maxmemory-policy` is set to
LFU. So idle time of a key may still be messed up.

---------

Signed-off-by: chentianjie.ctj <chentianjie.ctj@alibaba-inc.com>
Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-06-01 10:09:20 +02:00
nitaicaro
6fb90adf4b
Fix crash where command duration is not reset when client is blocked … (#526)
In #11012, we changed the way command durations were computed to handle
the same command being executed multiple times. In #11970, we added an
assert if the duration is not properly reset, potentially indicating
that a call to report statistics was missed.

I found an edge case where this happens - easily reproduced by blocking
a client on `XGROUPREAD` and migrating the stream's slot. This causes
the engine to process the `XGROUPREAD` command twice:

1. First time, we are blocked on the stream, so we wait for unblock to
come back to it a second time. In most cases, when we come back to
process the command second time after unblock, we process the command
normally, which includes recording the duration and then resetting it.
2. After unblocking we come back to process the command, and this is
where we hit the edge case - at this point, we had already migrated the
slot to another node, so we return a `MOVED` response. But when we do
that, we don’t reset the duration field.

Fix: also reset the duration when returning a `MOVED` response. I think
this is right, because the client should redirect the command to the
right node, which in turn will calculate the execution duration.

Also wrote a test which reproduces this, it fails without the fix and
passes with it.

---------

Signed-off-by: Nitai Caro <caronita@amazon.com>
Co-authored-by: Nitai Caro <caronita@amazon.com>
2024-05-30 12:55:00 -07:00
Binbin
6bab2d7968
Make sure clear the CLUSTER SLOTS cache on time when updating hostname (#564)
In #53, we will cache the CLUSTER SLOTS response to improve the
throughput and reduct the latency.

In the code snippet below, the second cluster slots will use the
old hostname:
```
config set cluster-preferred-endpoint-type hostname
config set cluster-announce-hostname old-hostname.com
multi
cluster slots
config set cluster-announce-hostname new-hostname.com
cluster slots
exec
```

When updating the hostname, in updateAnnouncedHostname, we will set
CLUSTER_TODO_SAVE_CONFIG and we will do a clearCachedClusterSlotsResponse
in clusterSaveConfigOrDie, so harmless in most cases.

Move the clearCachedClusterSlotsResponse call to clusterDoBeforeSleep
instead of scheduling it to be called in clusterSaveConfigOrDie.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-05-30 10:44:12 +08:00
LiiNen
96dcd1183a
Change BITCOUNT 'end' as optional like BITPOS (#118)
_This change is the thing I suggested to redis when it was BSD, and is
not just migration - this is of course more advanced_

### Issue
There is weird difference in syntax between BITPOS and BITCOUNT:
```
BITPOS key bit [start [end [BYTE | BIT]]]
BITCOUNT key [start end [BYTE | BIT]]
```

I think this might cause confusion in terms of usability.
It was not just a syntax typo error, and really works differently.
The results below are with unstable build:
```
> get TEST:ABCD
"ABCD"
> BITPOS TEST:ABCD 1 0 -1
(integer) 1
> BITCOUNT TEST:ABCD 0 -1
(integer) 9
> BITPOS TEST:ABCD 1 0
(integer) 1
> BITCOUNT TEST:ABCD 0
(error) ERR syntax error
```

### What did I fix

simply changes logic, to accept BITCOUNT also without 'end' - 'end'
become optional, like BITPOS
```
> GET TEST:ABCD
"ABCD"
> BITPOS TEST:ABCD 1 0 -1
(integer) 1
> BITCOUNT TEST:ABCD 0 -1
(integer) 9
> BITPOS TEST:ABCD 1 0
(integer) 1
> BITCOUNT TEST:ABCD 0
(integer) 9
```

Of course, I also fixed syntax hint:
```
# ASIS 
> BITCOUNT key [start end [BYTE|BIT]]
# TOBE
> BITCOUNT key [start [end [BYTE|BIT]]]
```

![image](https://github.com/valkey-io/valkey/assets/38001238/8485f58e-6785-4106-9f3f-45e62f90d24b)


### Moreover ...
I hadn't noticed that there was very small dead code in these command
logic, when I wrote PR to redis.
I found it now, when write code again, so I wrote it in valkey.
``` c
/* asis unstable */

/* bitcountCommand() */
if (!strcasecmp(c->argv[4]->ptr,"bit")) isbit = 1;
// ...
if (c->argc < 4) {
    if (isbit) end = (totlen<<3) + 7;
    else end = totlen-1;
}

/* bitposCommand() */
if (!strcasecmp(c->argv[5]->ptr,"bit")) isbit = 1;
// ...
if (c->argc < 5) {
    if (isbit) end = (totlen<<3) + 7;
    else end = totlen-1;
}
```
Bit variable (actually int) "isbit" is only being set as 1, when 'BIT'
is declared.
But we were checking whether 'isbit' is true or false in this 'if'
phrase, even if isbit could never be 1, because argc is always less than
4 (or 5 in bitpos).



I think this minor fixes will make valkey command operation more
consistent.
Of course, this PR contains just changing args from "required" to
"optional", so it will never hurt previous users.

Thanks,

---------

Signed-off-by: LiiNen <kjeonghoon065@gmail.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2024-05-28 15:01:28 -04:00
uriyage
fd58b73f0a
Introduce shared query buffer for client reads (#258)
This PR optimizes client query buffer handling in Valkey by introducing
a shared query buffer that is used by default for client reads. This
reduces memory usage by ~20KB per client by avoiding allocations for
most clients using short (<16KB) complete commands. For larger or
partial commands, the client still gets its own private buffer.

The primary changes are:

* Adding a shared query buffer `shared_qb` that clients use by default
* Modifying client querybuf initialization and reset logic
* Copying any partial query from shared to private buffer before command
execution
* Freeing idle client query buffers when empty to allow reuse of shared
buffer
* Master client query buffers are kept private as their contents need to
be preserved for replication stream

In addition to the memory savings, this change shows a 3% improvement in
latency and throughput when running with 1000 active clients.

The memory reduction may also help reduce the need to evict clients when
reaching max memory limit, as the query buffer is the main memory
consumer per client.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-28 11:09:37 -07:00
Viktor Söderqvist
4e44f5aae9
Fix races in test for tot-net-in, tot-net-out, tot-cmds (#559)
The races are between the '$rd' client and the 'r' client in the test
case.

Test case "client input output and command process statistics" in
unit/introspection.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-28 17:13:16 +02:00
Ping Xie
e4ead9442b
Make CLUSTER SETSLOT with TIMEOUT 0 block indefinitely (#556)
This aligns the behaviour with established Valkey commands with a
TIMEOUT argument, such as BLPOP.

Fix #422

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-27 07:11:24 -07:00
Viktor Söderqvist
d72ba06dd0
Make cluster replicas return ASK and TRYAGAIN (#495)
After READONLY, make a cluster replica behave as its primary regarding
returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key
doesn't exist or if it has already been migrated to another shard as
part of an ongoing slot migration. Therefore, without an ASK redirect in
this situation, offloading reads to cluster replicas wasn't reliable.

Note: The target of a redirect is always a primary. If a client wants to
continue reading from a replica after following a redirect, it needs to
figure out the replicas of that new primary using CLUSTER SHARDS or
similar.

This is related to #21 and has been made possible by the introduction of
Replication of Slot Migration States in #445.

----

Release notes:

During cluster slot migration, replicas are able to return -ASK
redirects and -TRYAGAIN.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-24 17:58:03 +02:00
Roshan Khatri
c4782066e7
Cache CLUSTER SLOTS response for improving throughput and reduced latency. (#53)
This commit adds a logic to cache `CLUSTER SLOTS` response for reduced
latency and also updates the cache when a change in the cluster is
detected.

Historically, `CLUSTER SLOTS` command was deprecated, however all the
server clients have been using `CLUSTER SLOTS` and have not migrated to
`CLUSTER SHARDS`. In future this logic can be added to any other
commands to improve the performance of the engine.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-05-22 14:21:41 -07:00
Viktor Söderqvist
efa8ba519b
Finish postponed SCAN changes (#501)
Commit 07ed0eafa98a66 introduced some SCAN improvements, but some
changes were postponed to a later version (8.0), which this PR finishes:

1. Prepare to move the TYPE filtering to the scan callback as well. this
was put on hold since it has side effects that can be considered a
breaking change, which is that we will not attempt to do lazy expire
(delete) a key that was filtered by not matching the TYPE (changing it
would mean TYPE filter starts behaving the same as MATCH filter already
does in that respect).
2. when the specified key TYPE filter is an unknown type, server will
reply a error immediately instead of doing a full scan that comes back
empty handed.

Fixes #235

Release notes:

> SCAN: Expired keys that don't match the TYPE argument for the SCAN are
no longer deleted by SCAN

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-17 13:35:31 +02:00
Ping Xie
fd53f17a61
Use pause_process to stop a node to make Valgrind happy, hopefully (#508)
Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-16 22:59:00 -07:00
Lipeng Zhu
7a9951fb80
Correct the actual allocated size from allocator when call sdsRedize to align the logic with sdsnewlen function. (#476)
This patch try to correct the actual allocated size from allocator
when call sdsRedize to align the logic with sdsnewlen function.

Maybe the https://github.com/valkey-io/valkey/pull/453 optimization
should depend on this.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-05-15 18:22:50 -07:00
Ping Xie
ac47ca2d47
Suppress ASAN errors on tests that intentially crash the server via crash-memcheck-enabled no (#489)
Fix daily CI run errors like
https://github.com/valkey-io/valkey/actions/runs/9039450198/job/24842308071#step:6:4176

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-12 16:08:47 -07:00
Binbin
fdd023ff82
Migrate cluster mode tests to normal framework (#442)
We currently has two disjoint TCL frameworks:
1. Normal testing framework, which trigger by runtest, which individually
launches nodes for testing.
2. Cluster framework, which trigger by runtest-cluster, which pre-allocates
N nodes and uses them for testing large configurations.

The normal TCL testing framework is much more readily tested and is also
automatically run as part of the CI for new PRs. The runtest-cluster since
it runs very slowly (cannot be parallelized), it currently only runs in daily
CI, this results in some changes to the cluster not being exposed in PR CI
in time.

This PR migrate the Cluster mode tests to normal framework. Some cluster
tests are kept in runtest-cluster because of timing issues or not yet
supported, we can process them later.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-05-09 10:14:47 +08:00
Chen Tianjie
ba9dd7b23a
Add noscores option to command ZSCAN. (#324)
Command syntax is now:
```
ZSCAN key cursor [MATCH pattern] [COUNT count] [NOSCORES]
```
Return format:
```
127.0.0.1:6379> zadd z 1 a 2 b 3 c
(integer) 3
127.0.0.1:6379> zscan z 0
1) "0"
2) 1) "a"
   2) "1"
   3) "b"
   4) "2"
   5) "c"
   6) "3"
127.0.0.1:6379> zscan z 0 noscores
1) "0"
2) 1) "a"
   2) "b"
   3) "c"
```
when NOSCORES is on, the command will only return members in the zset,
without scores.

For client side parsing the command return, I believe it is fine as long
as the command is backwards compatible. The return structure are still
lists, what has changed is the content. And clients can tell the
difference by the subcommand they use.

Since `novalues` option of `HSCAN` is already accepted
(redis/redis#12765), I think similar thing can be done to `ZSCAN`.

---------

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-05-07 14:39:28 +08:00
Ping Xie
6e7af9471c
Slot migration improvement (#445) 2024-05-06 21:40:28 -07:00
Chen Tianjie
cc703aa3bc
Input output traffic stats and command process count for each client. (#327)
We already have global stats for input traffic, output traffic and how
many commands have been executed.

However, some users have the difficulty of locating the IP(s) which have
heavy network traffic. So here some stats for single client are
introduced.
```              
tot-net-in   // Total network input bytes read from the client
tot-net-out  // Total network output bytes sent to the client
tot-cmds     // Total commands the client has executed     
```                             
These three stats are shown in `CLIENT LIST` and `CLIENT INFO`.

Though the metrics are handled in hot paths of the code, personally I
don't think it will slow down the server. Considering all other complex
operations handled nearby, this is only a small and simple operation.

However we do need to be cautious when adding more and more metrics, as
discussed in redis/redis#12640, we may need to find a way to tell
whether this has obvious performance degradation.

---------

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-05-05 21:52:59 -07:00
Shivshankar
52f9291f79
Rename redis to valkey in test suite logs and test names. (#366)
This PR covers below cases.
1. test suite's prints(i.e., puts statement logs).
2. Links refering to redis issues.
3. test names contains redis.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-25 15:13:21 +08:00
Shivshankar
8baf322742
Rename remaining test procedures (#355)
Renamed below procedures and variables (missed in #287) as follows.

redis_cluster             ->     valkey_cluster
redis1                    ->     valkey1
redis2                    ->     valkey2
get_redis_dir             ->     get_valkey_dir
test_redis_cli_rdb_dump   ->     test_valkey_cli_rdb_dump
test_redis_cli_repl       ->     test_valkey_cli_repl
redis-cli                 ->     valkey-cli
redis_reset_state         ->     valkey_reset_state
redisHandle               ->     valkeyHandle
redis_safe_read           ->     valkey_safe_read
redis_safe_gets           ->     valkey_safe_gets
redis_write               ->     valkey_write
redis_read_reply          ->     valkey_read_reply
redis_readable            ->     valkey_readable
redis_readnl              ->     valkey_readnl
redis_writenl             ->     valkey_writenl
redis_read_map            ->     valkey_read_map
redis_read_line           ->     valkey_read_line
redis_read_null           ->     valkey_read_null
redis_read_bool           ->     valkey_read_bool
redis_read_double         ->     valkey_read_double
redis_read_verbatim_str   ->     valkey_read_verbatim_str
redis_call_callback       ->     valkey_call_callback

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-24 18:01:33 +02:00
Shivshankar
669f1d3014
redisbenchmark to valkeybenchmark in test directory and some test name rename. (#347)
This pr covers following chnages.
1. redisbenchmark to valkeybenchmark in test directory 
2. Removed redis from some test's title and changed the name
accordingly.
3. Updated test suite name and redis-server to valkey readme in test
directory.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-23 10:51:53 -07:00
Shivshankar
34413e0862
Replace "redis" with "valkey" test code (#287)
Occurrences of "redis" in TCL test suites and helpers, such as TCL
client used in tests, are replaced with "valkey".

Occurrences of uppercase "Redis" are not changed in this PR.

No files are renamed in this PR.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-18 15:57:17 +02:00
Viktor Söderqvist
9e2b7838ea
Add 'extended-redis-compatibility' config (#306)
New config 'extended-redis-compatibility' (yes/no) default no

* When yes:
  * Use "Redis" in the following error replies:
    - `-LOADING Redis is loading the dataset in memory`
    - `-BUSY Redis is busy`...
    - `-MISCONF Redis is configured to`...
* Use `=== REDIS BUG REPORT` in the crash log delimiters (START and
END).
* The HELLO command returns `"server" => "redis"` and `"version" =>
"7.2.4"` (our Redis OSS compatibility version).
  * The INFO field for mode is called `"redis_mode"`.
* When no:
* Use "Valkey" instead of "Redis" in the mentioned errors and crash log
delimiters.
* The HELLO command returns `"server" => "valkey"` and the Valkey
version for `"version"`.
  * The INFO field for mode is called `"server_mode"`.

* Documentation added in valkey.conf:

> Valkey is largely compatible with Redis OSS, apart from a few cases
where
> Redis OSS compatibility mode makes Valkey pretend to be Redis. Enable
this
  > only if you have problems with tools or clients. This is a temporary
> configuration added in Valkey 8.0 and is scheduled to have no effect
in Valkey
  > 9.0 and be completely removed in Valkey 10.0.

* A test case for the config is added. It is designed to fail if the
config is not deprecated (has no effect) in Valkey 9 and deleted in
Valkey 10.

* Other test cases are adjusted to work regardless of this config.

Fixes #274
Fixes #61

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-18 14:10:24 +02:00
Chen Tianjie
5d23f8f58a
Complete fields in client list and client info test. (#326)
Add `lib-name` and `lib-ver` check.

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-04-17 11:33:25 +08:00
Viktor Söderqvist
8dcc8ebba4
Remove 'Redis' in error replies (#206)
Low-risk error replies containing "Redis" are changed.

In most cases, the word "Redis" is simply removed from the error message,
such as in "This Redis instance is not configured to use an ACL file. (...)",
the message is changed to "This instance is not configured to use an ACL
file. (...)".

Additionally, error replies from `redis.call` in a Lua script are
affected, such as

* "Please specify at least one argument for this redis lib call"
* "Wrong number of args calling Redis command from script"
* "Unknown Redis command called from script"
* "Invalid command passed to redis.acl_check_cmd()"

The name Redis is simply removed from these error message. In the last
one above, "redis.acl_check_cmd()" is replaced by
"server.acl_check_cmd()" in the error message.

The following error replies are considered high of causing problems for
clients, so they are not changed in this commit:

* (not in scope) "-MISCONF Redis is configured to save RDB snapshots
(...)"
* (not in scope) "-LOADING Redis is loading the dataset in memory"
* (not in scope) "-BUSY Redis is busy running a script (...)"

Fixes #204

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-16 21:17:38 +02:00
jonghoonpark
c090874ed4
List test files dynamically (#313)
Motivation: Currently we have to manually update the all_tests variable
when introducing new test files.

Fix: I've modified it to list test files dynamically, but rather than
modify it to add all test files, I've first modified it to only add test
files from the following 4 paths so that it doesn't deviate too much
from what we already do

- unit
- unit/type
- unit/cluster
- integration

Related issue: https://github.com/valkey-io/valkey/issues/302

---------

Signed-off-by: jonghoonpark <dev@jonghoonpark.com>
2024-04-15 14:25:33 +02:00
Parth
644692db79
Fixing a lua debugger bug that prevented use of 'server' for server.call invocations. (#303)
* Tested it on local instance. This was originally part of
https://github.com/valkey-io/valkey/pull/288 but I am pushing this
separately, so that we can easily merge it into the upcoming release.

```
lua debugger> server ping
<redis> ping
<reply> "+PONG"
lua debugger> redis ping
<redis> ping
<reply> "+PONG"
```

* I also searched for lua debugger related unit tests to add coverage
for this but did not find any relevant test to modify. Leaving it at
that for now.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-04-11 15:54:39 -07:00
Shivshankar
a054862b72
Rename redis_client* procedure to valkey_client* in test environment (#276)
Renamed redis-client* procedure to valkey_client*

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-10 10:18:47 -04:00
Shivshankar
da831c0d22
rename procedure redis_deferring_client to valkey_deferring_client (#270)
Updated procedure redis_deferring_client in test environent to
valkey_deferring_client.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-09 10:38:09 -04:00
Jacob Murphy
df5db0627f
Remove trademarked language in code comments (#223)
This includes comments used for module API documentation.

* Strategy for replacement: Regex search: `(//|/\*| \*|#).* ("|\()?(r|R)edis( |\.
  |'|\n|,|-|\)|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)`
* Don't edit copyright comments
* Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish
from newly licensed repository
* Replace "Redis Object" -> "Object"
* Exclude markdown for now
* Don't edit Lua scripting comments referring to redis.X API
* Replace "Redis Protocol" -> "RESP"
* Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-"
prefix
* Most other places, I use best judgement to either remove "Redis", or
replace with "the server" or "server"

Fixes #148

---------

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-09 10:24:03 +02:00
Harkrishn Patro
ebfb440629
Pass extensions to node if extension processing is handled by it (#52)
Ref: https://github.com/redis/redis/pull/12760

### Description

#### Fixes compatibilty of PlaceholderKV cluster (7.2 - extensions
enabled by default) with older Redis cluster (< 7.0 - extensions not
handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix introduces a flag `extensions_supported` on the clusterMsg
indicating the sender node can handle extensions parsing. Once, a
receiver nodes receives a message with this flag set to 1, it would
update clusterNode new field extensions_supported and start sending out
extensions if it has any.


This PR also introduces a DEBUG sub command to enable/disable cluster
message extensions `process-clustermsg-extensions` feature.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `extensions_supported` and then extensions message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

### Testing

TCL test verifying the cluster state is healthy irrespective of
enabling/disabling cluster message extensions feature.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-08 09:01:30 -07:00
Parth
620d325fdc
Adding server.call/pcall option to LUA scripting. (#136) (#213)
This commit does not remove redis.call/pcall just yet. It also does not
rename Redis in error messages such as "Please specify at least one
argument for this redis lib call". This allows users to maintain full
backwards compatibility while introducing an option to use server.call
for new code.

I verified that the unit tests pass. Also manually verified that the
redis-server responds to server.call invocations within lua scripting.
Also verified that function registration works as expected.

```
[ok]: EVAL - is Lua able to call Redis API? (0 ms)
[ok]: EVAL - is Lua able to call Server API? (1 ms)
[ok]: EVAL - No arguments to redis.call/pcall is considered an error (0 ms)
[ok]: EVAL - No arguments to server.call/pcall is considered an error (1 ms)
```

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-05 21:17:11 -07:00
Madelyn Olson
bc28fb4ac0
Update Server version to valkey version (#232)
This commit updates the following fields:
1. server_version -> valkey_version in server info. Since we would like
to advertise specific compatibility, we are making the version specific
to valkey. servername will remain as an optional indicator, and other
valkey compatible stores might choose to advertise something else.
1. We dropped redis-ver from the API. This isn't related to API
compatibility, but we didn't want to "fake" that valkey was creating an
rdb from a Redis version.
1. Renamed server-ver -> valkey_version in rdb info. Same as point one,
we want to explicitly indicate this was created by a valkey server.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-05 21:15:57 -07:00
Wen Hui
7f5bcc96f0
Update some valkey-cli related in tcl (#236)
Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-05 16:46:33 -04:00
Wen Hui
490f4ebb65
Update runtest test name and test filename (#214)
Update runtest test name and test filename

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-04 18:41:45 -07:00
Madelyn Olson
39d0f457a2
Update versioning fields for compatibility (#47)
New info information to be used to determine the valkey versioning info.

Internally, introduce new define values for "SERVER_VERSION" which is
different from the Redis compatibility version, "REDIS_VERSION".

Add two new info fields:
`server_version`: The Valkey server version
`server_name`: Indicates that the server is valkey.

Add one new RDB field: `server_ver`, which indicates the valkey version
that produced the server.

Add 3 new LUA globals: `SERVER_VERSION_NUM`, `SERVER_VERSION`, and
`SERVER_NAME`. Which reflect the valkey version instead of the Redis
compatibility version.

Also clean up various places where Redis and configuration was being
used that is no longer necessary.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-03 14:52:36 -07:00
Shivshankar
f3ccfbb01f
Rename TLS test cert files to valkey (#186)
This PR covers changing the redis.crt and redis.key to valkey certs for
TLS testing.

The files are generated by the gen-test-certs.sh script under tests/tls/.

Also covers comments provided.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
2024-04-03 23:04:52 +02:00
Harkrishn Patro
1736018aa9
Remove trademarked wording on configuration file and individual configs (#29)
Remove trademarked wording on configuration layer.

Following changes for release notes:

1. Rename redis.conf to valkey.conf
2. Pre-filled config in the template config file: Changing pidfile to `/var/run/valkey_6379.pid`

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-03 19:47:26 +02:00
John Vandenberg
83507d74b8
Update for latest 'typos' (#139)
Signed-off-by: john vandenberg <jayvdb@gmail.com>
Co-authored-by: john vandenberg <jayvdb@192-168-1-101.tpgi.com.au>
2024-04-02 14:51:14 +02:00
John Vandenberg
253fe9dced
Fix typos and replace 'codespell' with 'typos' (#72)
Uses https://github.com/taiki-e/install-action to install
https://github.com/crate-ci/typos in CI

This finds many more/different typos than
https://github.com/codespell-project/codespell , while having very few
false positives.

Signed-off-by: John Vandenberg <jayvdb@gmail.com>
2024-03-31 12:38:22 -07:00
Madelyn Olson
57789d4d08
Update naming to to Valkey (#62)
Documentation references should use `Valkey` while server and cli
references are all under `valkey`.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-03-28 09:58:28 -07:00
Harkrishn Patro
c120a45874
Sharded pubsub command execution within multi/exec (#13)
Allow SPUBLISH command within multi/exec on replica.
2024-03-27 21:29:44 -07:00
Madelyn Olson
084ab10e17 Disabled some workflows for now 2024-03-21 19:50:02 -07:00
Madelyn Olson
3586485355 Moved to correct cli and benchmark 2024-03-21 19:16:03 -07:00
Madelyn Olson
ee107481e5 Refactor some tests to reference new executable 2024-03-21 19:11:08 -07:00
Yanqi Lv
bad33f8738
fix wrong data type conversion in zrangeResultBeginStore (#13148)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
2024-03-19 08:52:55 +02:00
Binbin
e04d41d78d
Prevent lua error_reply abuse from causing errorstats to become larger (#13141)
Users who abuse lua error_reply will generate a new error object on each
error call, which can make server.errors get bigger and bigger. This
will
cause the server to block when calling INFO (we also return errorstats
by
default).

To prevent the damage it can cause, when a misuse is detected, we will
print a warning log and disable the errorstats to avoid adding more new
errors. It can be re-enabled via CONFIG RESETSTAT.

Because server.errors may be very large (it may be better now since we
have the limit), config resetstat may block for a while. So in
resetErrorTableStats, we will try to lazyfree server.errors.

See the related discussion at the end of #8217.
2024-03-19 08:18:22 +02:00
Binbin
7b070423b8
Fix dictionary use-after-free in active expire and make kvstore iter to respect EMPTY flag (#13135)
After #13072, there is an use-after-free error. In expireScanCallback, we
will delete the dict, and then in dictScan we will continue to use the dict,
like we will doing `dictResumeRehashing(d)` in the end, this casued an error.

In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, don't
delete the dict yet, and then when scan returns try to delete it again.

At the same time, we noticed that there will be similar problems in iterator.
We may also delete elements during the iteration process, causing the dict
to be deleted, so the part related to iter in the PR has also been modified.
dictResetIterator was also missing from the previous kvstoreIteratorNextDict,
we currently have no scenario that elements will be deleted in kvstoreIterator
process, deal with it together to avoid future problems. Added some simple
tests to verify the changes.

In addition, the modification in #13072 omitted initTempDb and emptyDbAsync,
and they were also added. This PR also remove the slow flag from the expire
test (consumes 1.3s) so that problems can be found in CI in the future.
2024-03-18 17:41:54 +02:00
Binbin
ad28d222ed
Lua eval scripts first in first out LRU eviction (#13108)
In some cases, users will abuse lua eval. Each EVAL call generates
a new lua script, which is added to the lua interpreter and cached
to redis-server, consuming a large amount of memory over time.

Since EVAL is mostly the one that abuses the lua cache, and these
won't have pipeline issues (i.e. the script won't disappear
unexpectedly,
and cause errors like it would with SCRIPT LOAD and EVALSHA),
we implement a plain FIFO LRU eviction only for these (not for
scripts loaded with SCRIPT LOAD).

### Implementation notes:
When not abused we'll probably have less than 100 scripts, and when
abused we'll have many thousands. So we use a hard coded value of 500
scripts. And considering that we don't have many scripts, then unlike
keys, we don't need to worry about the memory usage of keeping a true
sorted LRU linked list. We compute the SHA of each script anyway,
and put the script in a dict, we can store a listNode there, and use
it for quick removal and re-insertion into an LRU list each time the
script is used.

### New interfaces:
At the same time, a new `evicted_scripts` field is added to
INFO, which represents the number of evicted eval scripts. Users
can check it to see if they are abusing EVAL.

### benchmark:
`./src/redis-benchmark -P 10 -n 1000000 -r 10000000000 eval "return
__rand_int__" 0`

The simple abuse of eval benchmark test that will create 1 million EVAL
scripts. The performance has been improved by 50%, and the max latency
has dropped from 500ms to 13ms (this may be caused by table expansion
inside Lua when the number of scripts is large). And in the INFO memory,
it used to consume 120MB (server cache) + 310MB (lua engine), but now
it only consumes 70KB (server cache) + 210KB (lua_engine) because of
the scripts eviction.

For non-abusive case of about 100 EVAL scripts, there's no noticeable
change in performance or memory usage.

### unlikely potentially breaking change:
in theory, a user can maybe load a
script with EVAL and then use EVALSHA to call it (by calculating the
SHA1 value on the client side), it could be that if we read the docs
carefully we'll realized it's a valid scenario, but we suppose it's
extremely rare. So it may happen that EVALSHA acts on a script created
by EVAL, and the script is evicted and EVALSHA returns a NOSCRIPT error.
that is if you have more than 500 scripts being used in the same
transaction / pipeline.

This solves the second point in #13102.
2024-03-13 08:27:41 +02:00
Ronen Kalish
a8e745117f
Xread last entry in stream (#7388) (#13117)
Allow using `+` as a special ID for last item in stream on XREAD
command.

This would allow to iterate on a stream with XREAD starting with the
last available message instead of the next one which `$` is used for.
I.e. the caller can use `BLOCK` and `+` on the first call, and change to
`$` on the next call.

Closes #7388

---------

Co-authored-by: Felipe Machado <462154+felipou@users.noreply.github.com>
2024-03-13 08:23:32 +02:00