9240 Commits

Author SHA1 Message Date
Binbin
a21fe718f4
Limit CLUSTER_CANT_FAILOVER_DATA_AGE log to 10 times period (#1189)
If a replica is step into data_age too old stage, it can not
trigger the failover and currently it can not be automatically
recovered and we will print a log every
CLUSTER_CANT_FAILOVER_RELOG_PERIOD,
which is every second. If the primary has not recovered or there is
no manual failover, this log will flood the log file.

In this case, limit its frequency to 10 times period, which is
10 seconds in our code. Also in this data_age too old stage,
the repeated logs also can stand for the progress of the failover.

See also #780 for more details about it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-10-24 16:38:47 +08:00
muelstefamzn
c419524c05
Trim free space from inline command argument strings to avoid excess memory usage (#1213)
The command argument strings created while parsing inline commands (see
`processInlineBuffer()`) can contain free capacity. Since some commands
,such as `SET`, store these strings in the database, that free capacity
increases the memory usage. In the worst case, it could double the
memory usage.

This only occurs if the inline command format is used. The argument
strings are built by appending character by character in
`sdssplitargs()`. Regular RESP commands are not affected.

This change trims the strings within `processInlineBuffer()`.

### Why `trimStringObjectIfNeeded()` within `object.c` is not solving
this?

When the command argument string is packed into an object,
`trimStringObjectIfNeeded()` is called.

This does only trim the string if it is larger than
`PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would
ever need trimming if the command it sent using the bulk string format.

We could modify this condition, but that would potentially have a
performance impact on commands using the bulk format. Since those make
up for the vast majority of executed commands, limiting this change to
inline commands seems prudent.

### Experiment Results

* 1 million `SET [key] [value]` commands
* Random keys (16 bytes)
* 600 bytes values

Memory usage without this change:

```
used_memory:1089327888
used_memory_human:1.01G
used_memory_rss:1131696128
used_memory_rss_human:1.05G
used_memory_peak:1089348264
used_memory_peak_human:1.01G
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:1040025088
used_memory_dataset_perc:95.55%
```

Memory usage with this change:
```
used_memory:705327888
used_memory_human:672.65M
used_memory_rss:718802944
used_memory_rss_human:685.50M
used_memory_peak:705348256
used_memory_peak_human:672.67M
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:656025088
used_memory_dataset_perc:93.13%
```

If the same experiment is repeated using the normal RESP array of bulk
string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB
with and without of this change.

If a replica is attached, its memory usage is 672MB with and without
this change, since the replication link never uses inline commands.

Signed-off-by: Stefan Mueller <muelstef@amazon.com>
2024-10-23 16:56:32 -07:00
danish-mehmood
c176de4251
Clarify the wording from dually to the more common doubly (#1214)
Clarify documentation is ziplist.c

Signed-off-by: danish-mehmood <rdm355190@gmail.com>
2024-10-23 14:30:42 -07:00
Binbin
b803f7aeff
Cleaned up getSlotOrReply is return -1 instead of C_ERR (#1211)
Minor cleanup since getSlotOrReply return -1 on error, not return C_ERR.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-23 17:11:42 +08:00
Binbin
5d70ccd70e
Make replica CLUSTER RESET flush async based on lazyfree-lazy-user-flush (#1190)
Currently, if the replica has a lot of data, CLUSTER RESET
will block for a while and report the slowlog, and it seems
that there is no harm in making it async so external components
can be easier when monitoring it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-10-23 10:22:25 +08:00
Shivshankar
285064b114
fix typo (#1202)
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-21 22:54:40 -04:00
Shivshankar
771918e4bf
Updating command.def by running the generate-command-code.py (#1203)
Part of https://github.com/valkey-io/valkey/pull/1200 PR, since feild is
changed. Looks like commands.def is missed to get genereated based on
the changes so that is causing CI failure on unstable.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-21 13:48:29 -07:00
Viktor Söderqvist
5885dc56bd
Fix BGSAVE CANCEL since and history fields (#1200)
Fixes wrong "since" and "history" introduced in #757.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-21 16:04:47 +02:00
ranshid
29b83f1ac8
Introduce bgsave cancel (#757)
In some cases bgsave child process can run for a long time exhausting
system resources. Although it is possible to kill the bgsave child
process from the system shell, sometimes it is not possible allowing OS
level access.

This PR adds a new subcommand to the BGSAVE command.
When user will issue `BGSAVE CANCEL`, it will do one of the 2:

1. In case a bgsave child process is currently running, the child
   process would be immediately killed thus terminating any
   save/replication full sync process.
2. In case a bgsave child process is SCHEDULED to run, the scheduled
   execution will be cancelled.

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-21 11:56:44 +02:00
zhenwei pi
71f8c34eed
RDMA: Fix listener priv opaque pointer (#1194)
struct connListener.priv should be used by connection type specific
data, static local listener data should not use this.

A RDMA config structure is going to be introduced in the next step:

```
typedef struct serverRdmaContextConfig {
    char *bindaddr;
    int bindaddr_count;
    int port;
    int rx_size;
    int comp_vector;
    ...
} serverRdmaContextConfig;
```

Then a builtin RDMA will be supported.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-21 10:11:27 +02:00
Binbin
2743b7e04b
Fix SORT GET to ignore special pattern # in cluster slot check (#1182)
This special pattern '#' is used to get the element itself,
it does not actually participate in the slot check.

In this case, passing `GET #` will cause '#' to participate
in the slot check, causing the command to get an
`pattern may be in different slots` error.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-19 14:56:10 +08:00
zhenwei pi
64cfdf61eb
Introduce connection context for Unix socket (#1160)
Hide 'unixsocketgroup' and 'unixsocketperm' into a Unix socket specific
data structure. A single opaque pointer 'void *priv' is enough for a
listener. Once any new config is added, we don't need 'void *priv2',
'void *priv3' and so on.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-18 17:48:18 -07:00
Lipeng Zhu
a62d1f177b
Fix false sharing issue between main thread and io-threads when access used_memory_thread. (#1179)
When profiling some workloads with `io-threads` enabled. We found the
false sharing issue is heavy.

This patch try to split the the elements accessed by main thread and
io-threads into different cache line by padding the elements in the head
of `used_memory_thread_padded` array. 

This design helps mitigate the false sharing between main
thread and io-threads, because the main thread has been the bottleneck
with io-threads enabled. We didn't put each element in an individual
cache line is that we don't want to bring the additional cache line
fetch operation (3 vs 16 cache line) when call function like
`zmalloc_used_memory()`.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Signed-off-by: Lipeng Zhu <zhu.lipeng@outlook.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-17 12:37:10 +02:00
Binbin
701ab72429
Remove the restriction that cli --cluster create requires at least 3 primary nodes (#1075)
There is no limitation in Valkey to create a cluster with 1 or 2 primaries,
only that it cannot do automatic failover. Remove this restriction and
add `are you sure` prompt to prompt the user.

This allow we use it to create a test cluster by cli or by
create-cluster.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-17 13:33:44 +08:00
Nadav Levanoni
136d0fd212
Add 'WithDictIndex' expiry API and update RANDOMKEY command (#1155)
https://github.com/valkey-io/valkey/issues/1145

First part of a two-step effort to add `WithSlot` API for expiry. This
PR is to fix a crash that occurs when a RANDOMKEY uses a different slot
than the cached slot of a client during a multi-exec.

The next part will be to utilize the new API as an optimization to
prevent duplicate work when calculating the slot for a key.

---------

Signed-off-by: Nadav Levanoni <nadavl@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Nadav Levanoni <nadavl@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-16 17:40:11 -07:00
Amit Nagler
b0f23df165
Refactor return and goto statements (#945)
Consolidate the cleanup of local variables to a single point within the
method, ensuring proper resource management and p
reventing memory leaks or double-free issues.

Previoslly descused here:
- https://github.com/valkey-io/valkey/pull/60#discussion_r1667872633
- https://github.com/valkey-io/valkey/pull/60#discussion_r1668045666

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-10-15 09:26:42 -07:00
Binbin
247a8f23c5
Fix FUNCTION KILL error message being displayed as SCRIPT KILL (#1171)
The client that was killed by FUNCTION KILL received a reply of
SCRIPT KILL and the server log also showed SCRIPT KILL.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 23:32:42 +08:00
Binbin
dc05a327f9
Take hz into account in activerehashing to avoid CPU spikes (#977)
Currently in conf we describe activerehashing as: Active rehashing
uses 1 millisecond every 100 milliseconds of CPU time. This is the
case for hz = 10.

If we change hz, the description in conf will be inaccurate. Users
may notice that the server spends some CPU (used in activerehashing)
at high hz but don't know why, since our cron calls are fixed to 1ms.

This PR takes hz into account and fixed the CPU usage at 1% (this may
not be accurate in some cases because we do 100 step rehashing in
dictRehashMicroseconds but it can avoid CPU spikes in this case).

This PR also improves the description of the activerehashing
configuration item to explain this change.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-15 23:32:22 +08:00
Romain Geissler @ Amadeus
e30ae762a8
Rename z{malloc,calloc,realloc,free} into valkey_{malloc,calloc,realloc,free} (#1169)
The zcalloc symbol is a symbol name already used by zlib, which is
defining other names using the "z" prefix specific to zlib. In practice,
linking valkey with a static openssl, which itself might depend on a
static libz will result in link time error rejecting multiple symbol
definitions.

Fixes: #1157

Signed-off-by: Romain Geissler <romain.geissler@amadeus.com>
2024-10-15 13:05:22 +02:00
Binbin
87b5e13465
Use listLast to replace listIndex -1 (#1163)
Minor cleanup, listLast do the same thing and is widely used
and easier to understand (less code).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 10:29:52 +08:00
Binbin
1a5c80fe90
Minor comments cleanup around replication.c (#1154)
Typo, comment cleanups.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-14 12:37:19 +08:00
Masahiro Ide
262d970a50
Move prepareClientToWrite out of loop for HGETALL command (#1119)
Similar to #860 but this is for HGETALL families (HGETALL/HKEYS/HVALS).
This patch moves `prepareClientToWrite` out of the loop to reduce the
function overhead.

Signed-off-by: Masahiro Ide <imasahiro9@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-11 21:28:42 -07:00
Binbin
014219879d
Fix typo last_procssed -> last_processed (#1142)
Minor typo.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-11 00:09:22 +08:00
Shivshankar
079f18ad97
Add io-threads-do-reads config to deprecated config table to have no effect. (#1138)
this fixes: https://github.com/valkey-io/valkey/issues/1116

_Issue details from #1116 by @zuiderkwast_ 

> This config is undocumented since #758. The default was changed to
"yes" and it is quite useless to set it to "no". Yet, it can happen that
some user has an old config file where it is explicitly set to "no". The
result will be bad performace, since I/O threads will not do all the
I/O.
> 
> It's indeed confusing.
> 
> 1. Either remove the whole option from the code. And thus no need for
documentation. _OR:_
> 2. Introduce the option back in the configuration, just as a comment
is fine. And showing the default value "yes": `# io-threads-do-reads
yes` with additional text.
> 
> _Originally posted by @melroy89 in [#1019 (reply in
thread)](https://github.com/orgs/valkey-io/discussions/1019#discussioncomment-10824778)_

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-10 17:46:09 +02:00
Roshan Khatri
9b8a06137c
Fix empty response for ACL CAT category subcommand for module defined categories (#1140)
The module commands which were added to acl categories were getting
skipped when `ACL CAT category` command was executed.

This PR fixes the bug.
Before:
```
127.0.0.1:6379> ACL CAT foocategory
(empty array)
```
After:
```
127.0.0.1:6379> ACL CAT foocategory
aclcheck.module.command.test.add.new.aclcategories
```

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
2024-10-09 21:20:47 -07:00
kronwerk
cd8de095c4
Add flush-before-load option for repl-diskless-load (#909)
A new option for diskless replication on the replica side.

After a network failure, the replica may need to perform a full sync.
The other option for diskless full sync is `swapdb`, but it uses twice
as much memory, temporarily. In situations where this is not acceptable,
and where losing data is acceptable, the `flush-before-load` can be
useful. If the full sync fails, the old data is lost though. Therefore,
the new option is marked as "dangerous".

---------

Signed-off-by: kronwerk <ca11e5e22g@gmail.com>
Signed-off-by: kronwerk <kronwerk@users.noreply.github.com>
Co-authored-by: kronwerk <ca11e5e22g@gmail.com>
2024-10-09 13:11:53 +02:00
Binbin
1892f8a731
Add server log when module load fails with busy name (#1084)
Currently when module loading fails due to busy name, we
don't have a clean way to assist to troubleshooting.

Case 1: when loading the same module multiple times, we can
not detemine the cause of its failure without referring to
the module list or the earliest module load log. The log
may not exist and sometimes it is difficult for people
to associate module list.

Case 2: when multiple modules use the same module name,
we can not quickly associate the busy name without referring
to the module list and the earliest module load log.
Different people wrote modules with the same module name,
they don't easily associate module name.

So in this PR, when doing module onload, we will try to
print a busy name log if this happen. Currently we check
ctx.module since if it is NULL it means the Init call
failed, and Init currently only fails with busy name.

It's kind of ugly. It would have been nice if we could have had a
better way for onload to signal why the load failed.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-09 16:10:29 +08:00
chx9
cba8eaf4c9
fix typo (#1136)
Signed-off-by: chx9 <cheng.huan@icloud.com>
2024-10-08 08:07:51 -07:00
Madelyn Olson
e617bf2ddc
Removing incorrect comment about a warning (#1132)
There is a lot of bad legacy usage of `default:` with enums, which is an
anti-pattern. If you omit the default, the compiler will tell you if a
new enum value was added and that it is missing from a switch statement.

Someone mentioned on another PR they used `default:` because of this
warning, so just removing it, but might create an issue to do a wider
cleanup.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-07 11:56:15 -07:00
Masahiro Ide
b5eb793079
Eliminate hashTypeIterator memory allocation by assigning it on stack (#1105)
Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
Signed-off-by: Masahiro Ide <imasahiro9@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
2024-10-06 21:34:45 +02:00
otheng
a1cc7c263a
Reuse obey_client variable in processCommand() function (#1101)
I’ve prepared a minor fix for `processCommand()` function. 

In `processCommand()`, the `obey_client` variable is created, but some
conditional statements call the `mustObeyClient()` function instead of
reusing `obey_client`.

I’ve modified these statements to `reuse obey_client`.

Since I’m relatively new to Redis, please let me know if there are any
reasons why the conditional statements need to call `mustObeyClient()`
again.

Thank you for taking the time to review my PR.

Signed-off-by: otheng03 <07c00h@gmail.com>
2024-10-06 10:40:58 -07:00
Parth
d8cd3527bf
Removing Redis from internal lua function names and comments (#1102)
Improved documentation and readability of lua code as well as removed references to Redis.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-10-04 12:58:42 -07:00
Ricardo Dias
6a8540cefe
Fix some unitialized fields in client struct (#1126)
This commit adds initialization code for the fields
`io_last_reply_block` and `io_last_bufpos` of the `client` struct.

While in the current code flow, these fields are only accessed after
being written in the `trySendWriteToIOThreads`, I discovered that they
were not being initialized while doing some changes to the code flow of
IO threads.

I believe it's good pratice to initialize all fields of a struct upon
creation, and will avoid future bugs which are usually hard to debug.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2024-10-04 09:17:49 -07:00
Viktor Söderqvist
dcac3e1499
Fix undefined-santitizer warning in rax test (#1122)
Fix the warning introduced in #688:

```
unit/test_rax.c:168:15: runtime error: left shift of 36625 by 16 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior unit/test_rax.c:168:15 in 
Fuzz test in mode 1 [7504]: 
```

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-03 17:34:03 +02:00
Madelyn Olson
150c197bdd
Apply CVE patches for CVE-2024-31449, CVE-2024-31227, CVE-2024-31228 (#1115)
Applying the CVEs against mainline.

(CVE-2024-31449) Lua library commands may lead to stack overflow and
potential RCE.
(CVE-2024-31227) Potential Denial-of-service due to malformed ACL
selectors.
(CVE-2024-31228) Potential Denial-of-service due to unbounded pattern
matching.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-02 19:22:09 -04:00
Guillaume Koenig
f85d8bfde9
Rax size tracking (#688)
Introduce a `size_t` field into the rax struct to track allocation size.
Update the allocation size on rax insert and deletes.
Return the allocation size when `raxAllocSize` is called.

This size tracking is now used in MEMORY USAGE and MEMORY STATS in place
of the previous method based on sampling.

The module API allows to create sorted dictionaries, which are backed by
rax. Users now also get precise memory allocation for them (through
`ValkeyModule_MallocSizeDict`).

Fixes #677.

For the release notes:

* MEMORY USAGE and MEMORY STATS are now exact for streams, rather than
based on sampling.

---------

Signed-off-by: Guillaume Koenig <knggk@amazon.com>
Signed-off-by: Guillaume Koenig <106696198+knggk@users.noreply.github.com>
Co-authored-by: Joey <yzhaon@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-02 19:28:55 +02:00
Wen Hui
613e4e028f
Update keyspace notifications link to valkey.io in code comment (#1100)
As title description


![image](https://github.com/user-attachments/assets/655324e6-b042-4c2f-b558-b912a7d2c10c)

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-10-01 04:30:35 -04:00
Masahiro Ide
ac569c09f8
Create empty lua tables with specified initial capacity as much as possible (#1092)
Currently, we create a Lua table without initial capacity even when the
capacity is known. As a result, we need to resize the Lua tables
repeatedly when converting RESP serialized object to Lua object and it
consumes extra cpu resources a bit when we need to transfer
RESP-serialized data to Lua world.

This patch try to remove this extra resize to reduce (re-)allocation
overhead.

| name | unstable bb57dfe6303 (rps) | this patch(rps) | improvements |
| --------------- | -------- | --------- | -------------- |
| evalsha - hgetall h1 | 60565.68 | 64487.01 |  6.47% |
| evalsha - hgetall h10 | 47023.41 | 50602.17 | 7.61% |
| evalsha - hgetall h25 | 33572.82 | 37345.48 | 11.23% |
| evalsha - hgetall h50 | 24206.63 | 25276.14 | 4.42% |
| evalsha - hgetall h100 | 15068.87 | 15656.8 | 3.90% |
| evalsha - hgetall h300 | 5948.56 | 6094.74 | 2.46% |

Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
Co-authored-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
2024-09-30 20:59:22 -07:00
ranshid
c873287d16
avoid double close on replica main channel (#1097)
fixes #1088

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-09-30 07:41:05 -07:00
zhenwei pi
a7cbca4066
RDMA: Support .is_local method (#1089)
There is no ethernet style virtual device (like lo 127.0.0.1) for RDMA,
however a connection with the same local address and peer address are
considered as local.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-09-30 11:54:05 +02:00
Shivshankar
a37dee4b3a
Change return value of aeTimeProc callback function to long long. (#1057)
moduleTimerHandler is aeTimeProc handler and event loop gets created
with this. However, found that the function return type is int but
actually returns "long long" value(i.e., next_period). and return value
being assigned to int variable in processTimeEvents(where time events
are processed), this might cause an overflow of the timer values. So
changed the return type of the function to long long. And also updated
other callback function return type to be consistent.

I found this when I was checking functions reported in
https://github.com/valkey-io/valkey/issues/1054 issue stacktrace. (FYI,
this is just to update the return type to be consistent and it will not
the fix for the issue reported)

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-27 12:20:47 -07:00
zhenwei pi
983bb5110d
Fix RDMA build dependence (#1074)
RDMA module has dependence on '$(SERVER_NAME)' rather than the old style
'$(REDIS_SERVER_NAME)'.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-09-25 11:30:45 +02:00
Viktor Söderqvist
99865b197c
Fix bug for CLUSTER SLOTS from EVAL over TLS (#1072)
For fake clients like the ones used for Lua and modules, we don't
determine TLS in the right way, causing CLUSTER SLOTS from EVAL over TLS
to fail a debug-assert.

This error was introduced when the caching of CLUSTER SLOTS was
introduced, i.e. in 8.0.0.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-25 03:55:53 -04:00
Binbin
80fcbd3fec
Fix module / script call CLUSTER SLOTS / SHARDS fake client check crash (#1063)
The reason is VM_Call will use a fake client without connection,
so we also need to check if c->conn is NULL.

This also affects scripts. If they are called in the script, the
server will crash. Injecting commands into AOF will also cause
startup failure.

Fixes #1054.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-25 14:50:48 +08:00
Binbin
6e0216471d
Trigger the election as soon as possible when doing a forced manual failover (#1067)
In CLUSTER FAILOVER FORCE case, we will set mf_can_start to
1 and wait for a cron to trigger the election. We can also set a
CLUSTER_TODO_HANDLE_MANUALFAILOVER flag so that we
can start the election as soon as possible instead of waiting for
the cron, so that we won't have a 100ms delay (clusterCron).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-25 12:08:48 +08:00
Mikhail Koviazin
af811748e7
clang-format: set ColumnLimit to 0 and reformat (#1045)
This commit hopefully improves the formatting of the codebase by setting
ColumnLimit to 0 and hence stopping clang-format from trying to put as
much stuff in one line as possible.

This change enabled us to remove most of `clang-format off` directives
and fixed a bunch of lines that looked like this:

```c
#define KEY \
    VALUE /* comment */
```

Additionally, one pair of `clang-format off` / `clang-format on` had
`clang-format off` as the second comment and hence didn't enable the
formatting for the rest of the file. This commit addresses this issue as
well.

Please tell me if anything in the changes seem off. If everything is
fine, I will add this commit to `.git-blame-ignore-revs` later.

---------

Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
2024-09-25 01:22:54 +02:00
Ricardo Dias
c15eee3407
Changes tcmalloc.h header location (#1039)
This commit changes the `tcmalloc.h` header location from the deprecated
location `google/` to `gperftools/`.

**Why we're doing this now?**

The location `google/tcmalloc.h` has been deprecated for more than 10
years in favor of `gperftools/tcmalloc.h`, and the deprecated location
will be removed in the next release of gperftools.

Fixes #1033

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2024-09-23 10:23:48 +02:00
Binbin
56fba564b6
Print an empty primary log when primary lost its last slot (#1064)
The one in CLUSTER SETSLOT help us keep track of state better,
of course it also can make the test case happy.

The one in gossip process fixes a problem that a replica can
print a log saying it is an empty primary.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-09-23 13:14:09 +08:00
Binbin
d07c29791a
Use _Thread_local to solve threads.h build issue (#1053)
Apparently this will fail to compile in some masOS version.
And internet claims _Thread_local is portable.

Fixes #1051.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-22 20:20:55 +08:00
Binbin
7fab15795f
Add log about old primary after myself failover (#1058)
Sometims it is hard to see the old primary during a
multi primaries failover, adding this log can help
use to find the old primary node.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-09-20 14:15:19 +08:00