When the cluster changes, we need to persist the cluster configuration,
and these file IO operations may cause latency.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Imagine we have a cluster, for example a three-shard cluster,
if shard 1 doing a CLUSTER RESET HARD, it will change the node
name, and then other nodes will mark it as NOADR since the node
name received by PONG has changed.
In the eyes of other nodes, there is one working primary node
left but with no address, and in this case, the address report
in MOVED will be invalid and will confuse the clients. And in
the same time, the replica will not failover since its primary
is not in the FAIL state. And the cluster looks OK to everyone.
This leaves a cluster that appears OK, but with no coverage for
shard 1, obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.
But the point in here, we can mark the NOADDR node as FAIL to
advance the cluster state. If a node is NOADDR means it does
not have a valid address, so we won't reconnect it, we won't
send PING, we won't gossip it, it seems reasonable to mark it
as FAIL.
Signed-off-by: Binbin <binloveplay1314@qq.com>
When multiple primary nodes fail simultaneously, the cluster can not recover
within the default effective time (data_age limit). The main reason is that
the vote is without ranking among multiple replica nodes, which case too many
epoch conflicts.
Therefore, we introduced into ranking based on the failed primary shard-id.
Introduced a new failed_primary_rank var, this var means the rank of this
myself instance in the context of all failed primary list. This var will be
used in failover and we will do the failover election packets in order based
on the rank, this can effectively avoid the voting conflicts.
If a single primary is down, the behavior is the same as before. If multiple
primaries are down, their replica election initiation time will be delayed
by 500ms according to the ranking.
Signed-off-by: Binbin <binloveplay1314@qq.com>
When latency-monitor-threshold is set to 0, it means the latency monitor
is disabled, and in VM_LatencyAddSample, we wrote the condition
incorrectly, causing us to record latency when latency was turned off.
This bug was introduced in the very first day, see e3b1d6d, it was merged
in 2019.
Signed-off-by: Binbin <binloveplay1314@qq.com>
In #1441, we found a assert, and decided remove this assert and instead
just free the newly created node and close the link, since if we cannot
get the IP from the link it probably means the connection was closed.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED ===
17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true
------ STACK TRACE ------
17847 valkey-server *
src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634]
src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de]
/__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e]
src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea]
src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547]
/lib64/libc.so.6(+0x40c8) [0x7f083985a0c8]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b]
src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5]
```
But it also introduces another assert. The reason is that this new node
is not added to the cluster nodes dict.
```
17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED ===
17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true
------ STACK TRACE ------
17128 valkey-server *
src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4]
src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2]
src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618]
/__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278]
src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09]
src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5]
src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e]
```
This closes#1527.
Signed-off-by: Binbin <binloveplay1314@qq.com>
The fix that Redis gave us for the CVE-2024-46981 was freeing lctx.lua,
and I didn't merge it correctly. We made some changes so that we are
able to async free the lua context, so we need to free the passed in
context. This was applied correctly on the two released versions (8.0
and 7.2) just not on unstable.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
This PR is to cleanup the `SERVER_TEST` compiler flag from cmake compile
definitions, as it is no longer required in the new unit test framework, see #428.
Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
This PR introduces improvements to the hashtable iterator, implementing
prefetching technique described in the blog post [Unlock One Million RPS
- Part 2](https://valkey.io/blog/unlock-one-million-rps-part2/) . The
changes lay the groundwork for further enhancements in use cases
involving iterators. Future PRs will build upon this foundation to
improve performance and functionality in various iterator-dependent
operations.
In the pursuit of maximizing iterator performance, I conducted a
comprehensive series of experiments. My tests encompassed a wide range
of approaches, including processing multiple bucket indices in parallel,
prefetching the next bucket upon completion of the current one, and
several other timing and quantity variations. Surprisingly, after
rigorous testing and performance analysis, the simplest implementation
presented in this PR consistently outperformed all other more complex
strategies.
## Implementation
Each time we start iterating over a bucket, we prefetch data for future
iterations:
- We prefetch the entries of the next bucket (if it exists).
- We prefetch the structure (but not the entries) of the bucket after
the next.
This prefetching is done when we pick up a new bucket, increasing the
chance that the data will be in cache by the time we need it.
## Performance
The data below was taken by conducting keys command on 64cores Graviton
3 Amazon EC2 instance with 50 mil keys in size of 100 bytes each. The
results regarding the duration of “keys *” command was taken from “info
all” command.
```
+--------------------+------------------+-----------------------------+
| prefetching | Time (seconds) | Keys Processed per Second |
+--------------------+------------------+-----------------------------+
| No | 11.112279 | 4,499,529 |
| Yes | 3.141916 | 15,913,862 |
+--------------------+------------------+-----------------------------+
Improvement:
Comparing the iterator without prefetching to the one with prefetching,
we can see a speed improvement of 11.112279 / 3.141916 ≈ 3.54 times faster.
```
### Save command improvment
#### Setup:
- 64cores Graviton 3 Amazon EC2 instance.
- 50 mil keys in size of 100 bytes each.
- Running valkey server over RAM file system.
- crc checksum and comperssion off.
#### Results
```
+--------------------+------------------+-----------------------------+
| prefetching | Time (seconds) | Keys Processed per Second |
+--------------------+------------------+-----------------------------+
| No | 28 | 1,785,700 |
| Yes | 19.6 | 2,550,000 |
+--------------------+------------------+-----------------------------+
Improvement:
- Reduced SAVE time by 30% (8.4 seconds faster)
- Increased key processing rate by 42.8% (764,300 more keys/second)
```
Signed-off-by: NadavGigi <nadavgigi102@gmail.com>
Resolves issue with valkey-cli not auto exiting from subscribed mode on
reaching zero pub/sub subscription (previously filed on Redis)
https://github.com/redis/redis/issues/12592
---------
Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
This PR replaces dict with hashtable in the ZSET datatype. Instead of
mapping key to score as dict did, the hashtable maps key to a node in
the skiplist, which contains the score. This takes advantage of
hashtable performance improvements and saves 15 bytes per set item - 24
bytes overhead before, 9 bytes after.
Closes#1096
---------
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
# Refactor client structure to use modular data components
## Current State
The client structure allocates memory for replication / pubsub /
multi-keys / module / blocked data for every client, despite these
features being used by only a small subset of clients. In addition the
current field layout in the client struct is suboptimal, with poor
alignment and unnecessary padding between fields, leading to a larger
than necessary memory footprint of 896 bytes per client. Furthermore,
fields that are frequently accessed together during operations are
scattered throughout the struct, resulting in poor cache locality.
## This PR's Change
1. Lazy Initialization
- **Components are only allocated when first used:**
- PubSubData: Created on first SUBSCRIBE/PUBLISH operation
- ReplicationData: Initialized only for replica connections
- ModuleData: Allocated when module interaction begins
- BlockingState: Created when first blocking command is issued
- MultiState: Initialized on MULTI command
2. Memory Layout Optimization:
- Grouped related fields for better locality
- Moved rarely accessed fields (e.g., client->name) to struct end
- Optimized field alignment to eliminate padding
3. Additional changes:
- Moved watched_keys to be static allocated in the `mstate` struct
- Relocated replication init logic to replication.c
### Key Benefits
- **Efficient Memory Usage:**
- 45% smaller base client structure - Basic clients now use 528 bytes
(down from 896).
- Better memory locality for related operations
- Performance improvement in high throughput scenarios. No performance
regressions in other cases.
### Performance Impact
Tested with 650 clients and 512 bytes values.
#### Single Thread Performance
| Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET | 1 key | 261,799 | 258,261 | +1.37% |
| SET | 3M keys | 209,134 | ~209,000 | ~0% |
| GET | 1 key | 281,564 | 277,965 | +1.29% |
| GET | 3M keys | 231,158 | 228,410 | +1.20% |
#### 8 IO Threads Performance
| Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET | 1 key | 1,331,578 | 1,331,626 | -0.00% |
| SET | 3M keys | 1,254,441 | 1,152,645 | +8.83% |
| GET | 1 key | 1,293,149 | 1,289,503 | +0.28% |
| GET | 3M keys | 1,152,898 | 1,101,791 | +4.64% |
#### Pipeline Performance (3M keys)
| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|--------------|---------------|---------------|-----------|
| SET | 10 | 548,964 | 538,498 | +1.94% |
| SET | 20 | 606,148 | 594,872 | +1.89% |
| SET | 30 | 631,122 | 616,606 | +2.35% |
| GET | 10 | 628,482 | 624,166 | +0.69% |
| GET | 20 | 687,371 | 681,659 | +0.84% |
| GET | 30 | 725,855 | 721,102 | +0.66% |
### Observations:
1. Single-threaded operations show consistent improvements (1-1.4%)
2. Multi-threaded performance shows significant gains for large
datasets:
- SET with 3M keys: +8.83% improvement
- GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
- SET operations: +1.89% to +2.35%
- GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario
Related issue:https://github.com/valkey-io/valkey/issues/761
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This commit, https://github.com/valkey-io/valkey/pull/1504, moved the
wrong worker to ubuntu 22. We wanted to move codecov and not coverity.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
It's inconvenient for client implementations to extract the
`availability_zone` information from the `INFO` response. The `INFO`
response contains a lot of information that a client implementation
typically doesn't need.
This PR adds the availability zone to the `HELLO` response. Clients
usually already use the `HELLO` command for protocol negotiation and
also get the server `version` and `role` from its response. To keep the
`HELLO` response small, the field is only added if availability zone is
configured.
---------
Signed-off-by: Rueian <rueiancsie@gmail.com>
Reset GC state before closing the lua VM to prevent user data to be
wrongly freed while still might be used on destructor callbacks.
Created and publish by Redis in their OSS branch.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
The explanation on the original commit was wrong. Key based access must
have a `~` in order to correctly configure whey key prefixes to apply
the selector to. If this is missing, a server assert will be triggered
later.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
This may speed up the transition to the fail state a bit.
Previously we would only check when we received a pfail/fail
report from others in gossip. If myself is the last vote,
we can directly switch to fail in here without waiting for
the next gossip packet.
Signed-off-by: Binbin <binloveplay1314@qq.com>
When building with `CMake` (especially the targets `valkey-cli`,
`valkey-server` and `valkey-benchmark`) it is possible to have a
successful build while having warnings.
This PR fixes this - which is aligned with how the `Makefile` is working
today:
- Enable `-Wall` + `-Werror` for valkey targets
- Fixed warning in valkey-cli:jsonStringOutput method
Signed-off-by: Eran Ifrah <eifrah@amazon.com>
The issues in #1453 seem to
have only shown up since we moved to ubuntu 24, as part of the rolling
`ubunut-latest` migration from 22->24.
Closes#1453.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.
When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.
This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.
Fixes#1498
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
In this commit we move all structures and functions declarations related
to Valkey modules from `server.h` to the recently added `module.h` file.
This re-organization makes it easier for new contributors to find the
valkey modules related code, as well as reducing the compilation times
when changes are made to the modules code.
---------
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
### Fix race with pending writes in replica state transition
#### The Problem
In #60 (Dual channel replication) a new `connWrite` call was added
before the `waitForClientIO` check. This created a race condition where
the main thread may attempt to write to a client that could have pending
writes in IO threads.
#### The Fix
Moved the `waitForClientIO()` call earlier in `syncCommand`, before any
`connWrite` call. This ensures all pending IO operations are completed
before attempting to write to the client.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Currently, in case a blocked command is unblocked externally (eg. due to
the relevant slot being migrated or the CLIENT UNBLOCK command was
issued, the command statistics will always update the failed_calls error
statistic. This leads to missalignment with
90b9f08e5d
as well as some inconsistencies. For example when a key is migrated
during cluster slot migration, clients blocked on XREADGROUP will be
unblocked and update the rejected_calls stat, while clients blocked on
BLPOP will get unblocked updating the failed_calls stat.
In this PR we add explicit indication in updateStatsOnUnblock thet
indicates if the command was rejected or failed.
---------
Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Add `meet_sent` field in `clusterNode` indicating the last time we sent
a MEET packet. Use this field to only (re-)send a MEET packet once every
handshake timeout period when detecting a node without an inbound link.
When receiving multiple MEET packets on the same link while the node is
in handshake state, instead of dropping the packet, we now simply
prevent the creation of a new node. This way we still process the MEET
packet's gossip and reply with a PONG as any other packets.
Improve some logging messages to include `human_nodename`. Add
`nodeExceedsHandshakeTimeout()` function.
This is a follow-up to this previous PR:
https://github.com/valkey-io/valkey/pull/1307
And a partial fix to the crash described in:
https://github.com/valkey-io/valkey/pull/1436
---------
Signed-off-by: Pierre Turin <pieturin@amazon.com>
In the `arguments` section, the `arguments` key is only used for
arguments of type `block` or `oneof`.
Consequently, the `arguments` given for `IFEQ` are ignored by the
server. However, they lead to strange results when rendering the
command's page for the web documentation.
Fix this by removing `arguments` for `IFEQ`.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
This PR fixes the missing stat update for `total_net_repl_output_bytes`
that was removed during the refactoring in PR #758. The metric was not
being updated when writing to replica connections.
Changes:
- Restored the stat update in postWriteToClient for replica connections
- Added integration test to verify the metric is properly updated
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
This PR introduces a new mechanism for temporarily changing the
server's loading_rio context during RDB loading operations. The new
`RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the
`server.loading_rio` value, ensuring that it's automatically restored
to its original value when the scope ends.
Introduces a dedicated flag to `rio` to signal immediate abort,
preventing
potential use-after-free scenarios during replication disconnection in
dual-channel load. This ensures proper termination of
`rdbLoadRioWithLoadingCtx`
when replication is cancelled due to connection loss on main connection.
Fixes https://github.com/valkey-io/valkey/issues/1152
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
- By not waiting `repl-diskless-sync-delay` when we don't have to, we
can reduce ~30% of dual channel tests execution time.
- This commit also drops one test which is not required for regular sync
(`Sync should continue if not all slaves dropped`).
- Skip dual channel test with master diskless disabled because it will
initiate the same synchronization process as the non-dual channel test,
making it redundant.
Before:
```
Execution time of different units:
171 seconds - integration/dual-channel-replication
305 seconds - integration/replication-psync
\o/ All tests passed without errors!
```
After:
```
Execution time of different units:
120 seconds - integration/dual-channel-replication
236 seconds - integration/replication-psync
\o/ All tests passed without errors!
```
Discused on https://github.com/valkey-io/valkey/pull/1173
---------
Signed-off-by: naglera <anagler123@gmail.com>
As part of #1463, I made a small refactor between the PR and the daily
test I submitted to try to improve readability by adding a function to
abstract the extraction of the message types. However, that change
apparently caused GCC to throw another warning, so reverting the
abstraction on just one line.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
This PR extends the module API to support the addition of different
scripting engines to execute user defined functions.
The scripting engine can be implemented as a Valkey module, and can be
dynamically loaded with the `loadmodule` config directive, or with the
`MODULE LOAD` command.
This PR also adds an example of a dummy scripting engine module, to show
how to use the new module API. The dummy module is implemented in
`tests/modules/helloscripting.c`.
The current module API support, only allows to load scripting engines to
run functions using `FCALL` command.
The additions to the module API are the following:
```c
/* This struct represents a scripting engine function that results from the
* compilation of a script by the engine implementation. */
struct ValkeyModuleScriptingEngineCompiledFunction
typedef ValkeyModuleScriptingEngineCompiledFunction **(*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)(
ValkeyModuleScriptingEngineCtx *engine_ctx,
const char *code,
size_t timeout,
size_t *out_num_compiled_functions,
char **err);
typedef void (*ValkeyModuleScriptingEngineCallFunctionFunc)(
ValkeyModuleCtx *module_ctx,
ValkeyModuleScriptingEngineCtx *engine_ctx,
ValkeyModuleScriptingEngineFunctionCtx *func_ctx,
void *compiled_function,
ValkeyModuleString **keys,
size_t nkeys,
ValkeyModuleString **args,
size_t nargs);
typedef size_t (*ValkeyModuleScriptingEngineGetUsedMemoryFunc)(
ValkeyModuleScriptingEngineCtx *engine_ctx);
typedef size_t (*ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc)(
void *compiled_function);
typedef size_t (*ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc)(
ValkeyModuleScriptingEngineCtx *engine_ctx);
typedef void (*ValkeyModuleScriptingEngineFreeFunctionFunc)(
ValkeyModuleScriptingEngineCtx *engine_ctx,
void *compiled_function);
/* This struct stores the callback functions implemented by the scripting
* engine to provide the functionality for the `FUNCTION *` commands. */
typedef struct ValkeyModuleScriptingEngineMethodsV1 {
uint64_t version; /* Version of this structure for ABI compat. */
/* Library create function callback. When a new script is loaded, this
* callback will be called with the script code, and returns a list of
* ValkeyModuleScriptingEngineCompiledFunc objects. */
ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc create_functions_library;
/* The callback function called when `FCALL` command is called on a function
* registered in this engine. */
ValkeyModuleScriptingEngineCallFunctionFunc call_function;
/* Function callback to get current used memory by the engine. */
ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory;
/* Function callback to return memory overhead for a given function. */
ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead;
/* Function callback to return memory overhead of the engine. */
ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead;
/* Function callback to free the memory of a registered engine function. */
ValkeyModuleScriptingEngineFreeFunctionFunc free_function;
} ValkeyModuleScriptingEngineMethodsV1;
/* Registers a new scripting engine in the server.
*
* - `engine_name`: the name of the scripting engine. This name will match
* against the engine name specified in the script header using a shebang.
*
* - `engine_ctx`: engine specific context pointer.
*
* - `engine_methods`: the struct with the scripting engine callback functions
* pointers.
*/
int ValkeyModule_RegisterScriptingEngine(ValkeyModuleCtx *ctx,
const char *engine_name,
void *engine_ctx,
ValkeyModuleScriptingEngineMethods engine_methods);
/* Removes the scripting engine from the server.
*
* `engine_name` is the name of the scripting engine.
*
*/
int ValkeyModule_UnregisterScriptingEngine(ValkeyModuleCtx *ctx, const char *engine_name);
```
---------
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
We are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```
Which is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was make the message type optional, so that it
wasn't included in the size. I think this also makes some sense, since
it's really just a helper for us to easily cast the object around.
I considered disabling this error, but it is generally pretty useful
since it can catch real issues. Another solution would be to
over-allocate to the largest possible object, which could hurt
performance as we initialize it to zero.
Results:
https://github.com/madolson/valkey/actions/runs/12423414811/job/34686899884
This is a slightly cleaned up version of
https://github.com/valkey-io/valkey/pull/1439. I thought I had another
strategy but alas, it didn't work out.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
We set the client output buffer limits to 10 bytes, and then execute
`info stats` which produces more than 10 bytes of output, which can
cause that command to throw an error.
I'm not sure why it wasn't consistently erroring before, might have been
some change related to the ubuntu upgrade though.
Issues related to ubuntu-tls are hopefully resolved now.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
In #1459, I missed that the data was also used to keep track of the PID
files so if the testing framework crashed it would no longer be able to
cleanup the extra servers. So now we properly extract the PID and store
it so we can clean up PIDs.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
There are two changes here:
1. The one in clusterNodeCleanupFailureReports, only primary with slots can
report the failure report, if the primary became a replica its failure report
should be cleared. This may lead to inaccurate node fail judgment in some network
partition cases i guess, it will also affect the CLUSTER COUNT-FAILURE-REPORTS
command.
2. The one in clusterProcessGossipSection, it is not that important, but it can
print a "node is back online" log helps us troubleshoot the problem, although
it may conflict with 1 at some points.
Signed-off-by: Binbin <binloveplay1314@qq.com>
- Moves `build-config.json` to workflow dir to build old versions with
new configs.
- Enables contributors to test release Wf on private repo by adding
`github.event_name == 'workflow_dispatch' ||`
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>