Might close https://github.com/valkey-io/valkey/issues/1484.
I noticed that we don't disable pause after fork on the last test that
was getting executed, so it might getting stuck in pause loops after the
test ends if it tries another psync for any reason.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
change the format of the dual channel replication logs so that it will
not
conflict with existing log formats like modules.
Fixes: https://github.com/valkey-io/valkey/issues/1509
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
This PR introduces a new mechanism for temporarily changing the
server's loading_rio context during RDB loading operations. The new
`RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the
`server.loading_rio` value, ensuring that it's automatically restored
to its original value when the scope ends.
Introduces a dedicated flag to `rio` to signal immediate abort,
preventing
potential use-after-free scenarios during replication disconnection in
dual-channel load. This ensures proper termination of
`rdbLoadRioWithLoadingCtx`
when replication is cancelled due to connection loss on main connection.
Fixes https://github.com/valkey-io/valkey/issues/1152
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
- By not waiting `repl-diskless-sync-delay` when we don't have to, we
can reduce ~30% of dual channel tests execution time.
- This commit also drops one test which is not required for regular sync
(`Sync should continue if not all slaves dropped`).
- Skip dual channel test with master diskless disabled because it will
initiate the same synchronization process as the non-dual channel test,
making it redundant.
Before:
```
Execution time of different units:
171 seconds - integration/dual-channel-replication
305 seconds - integration/replication-psync
\o/ All tests passed without errors!
```
After:
```
Execution time of different units:
120 seconds - integration/dual-channel-replication
236 seconds - integration/replication-psync
\o/ All tests passed without errors!
```
Discused on https://github.com/valkey-io/valkey/pull/1173
---------
Signed-off-by: naglera <anagler123@gmail.com>
1. The test isn't waiting long enough for the output buffer to overrun.
This problem is happening because an error from the previous test is
bleeding into the current test's logs. The simplest fix would be to
split these tests.
2. Increased replication timeout to ensure sync fails due to output
buffer overrun before a timeout occurs.
Fixes#1367
Signed-off-by: naglera <anagler123@gmail.com>
This PR introduces a consistent tagging system for dual-channel logs.
The goal is to improve log readability and filterability, making it
easier for operators to manage and analyze log entries.
Resolves https://github.com/valkey-io/valkey/issues/986
---------
Signed-off-by: naglera <anagler123@gmail.com>
When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.
In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Sometimes when dual-channel is turned off the tested replica might
disconnect on COB overrun. disable the replica COB limit in order to
prevent such cases.
Fixes: #1153
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
in case of valgrind run, the replica might get disconnected from the
primary due to repl-timeout reached. Fix is to configure larger timeout
in case of valgrind test.
**Partially** fixes: #1152
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Currently, the `dual-channel-replication` feature flag is immutable if
`enable-protected-configs` is enabled, which is the default behavior.
This PR proposes to make the `dual-channel-replication` flag mutable,
allowing it to be changed dynamically without restarting the cluster.
**Motivation:**
The ability to change the `dual-channel-replication` flag dynamically is
essential for testing and validating the feature on real clusters
running in production environments. By making the flag mutable, we can
enable or disable the feature without disrupting the cluster's
operations, facilitating easier testing and experimentation.
Additionally, this change would provide more flexibility for users to
enable or disable the feature based on their specific requirements or
operational needs without requiring a cluster restart.
---------
Signed-off-by: naglera <anagler123@gmail.com>
There is a test that assumes that the backlog will get overrun, but
because of the recent changes to the default it no longer fails. It
seems like it is a bit flakey now though, so resetting the value in the
test back to 1mb. (This relates to the CoB of 1100k. So it should
consistently work with a 1mb limit).
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Test dual-channel-replication primary gets cob overrun during replica
rdb load` fails during the Valgrind run. This is due to the load
handlers disconnecting before the tests complete, resulting in a low
primary COB. Increasing the handlers' timeout should resolve this issue.
Failure:
https://github.com/valkey-io/valkey/actions/runs/10361286333/job/28681321393
Server logs reveals that the load handler clients were disconnected
before the test started
Also the two previus test took about 20 seconds which is the handler
timeout.
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
- Fix TLS bug where connection were shutdown by primary's main process
while the child process was still writing- causing main process to be
blocked.
- TLS connection fix -file descriptors are set to blocking mode in the
main thread, followed by a blocking write. This sets the file
descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`)
(@xbasel).
- Improve the reliability of dual-channel tests. Modify the pause
mechanism to verify process status directly, rather than relying on log.
- Ensure that `server.repl_offset` and `server.replid` are updated
correctly when dual channel synchronization completes successfully.
Thist led to failures in replication tests that validate replication IDs
or compare replication offsets.
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Currently lastbgsave_status is used in bgsave or disk-replication,
and the target is the disk. In #60, we update it when transfer error,
i think it is mainly used in tests, so we can use log to replace it.
It changes lastbgsave_status to err in this case, but it is strange
that it does not set ok or err in the above if and the following else.
Also noted this will affect stop-writes-on-bgsave-error.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Update the dual channel-replication tests to wait for the pause to begin
before attempting to unpause.
---------
Signed-off-by: naglera <anagler123@gmail.com>
Introduce several improvements to improve the stability of dual-channel
replication and fix compatibility issues.
1. Make dual-channel-replication tests more reliable: use pause instead
of forced sleep.
2. Fix race conditions when freeing RDB client.
3. Check if sync was stopped during local buffer streaming.
4. Fix $ENDOFFSET reply format to work on 32-bit machines too.
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
In this PR we introduce the main benefit of dual channel replication by
continuously steaming the COB (client output buffers) in parallel to the
RDB and thus keeping the primary's side COB small AND accelerating the
overall sync process. By streaming the replication data to the replica
during the full sync, we reduce
1. Memory load from the primary's node.
2. CPU load from the primary's main process. [Latest performance
tests](#data)
## Motivation
* Reduce primary memory load. We do that by moving the COB tracking to
the replica side. This also decrease the chance for COB overruns. Note
that primary's input buffer limits at the replica side are less
restricted then primary's COB as the replica plays less critical part in
the replication group. While increasing the primary’s COB may end up
with primary reaching swap and clients suffering, at replica side we’re
more at ease with it. Larger COB means better chance to sync
successfully.
* Reduce primary main process CPU load. By opening a new, dedicated
connection for the RDB transfer, child processes can have direct access
to the new connection. Due to TLS connection restrictions, this was not
possible using one main connection. We eliminate the need for the child
process to use the primary's child-proc -> main-proc pipeline, thus
freeing up the main process to process clients queries.
## Dual Channel Replication high level interface design
- Dual channel replication begins when the replica sends a `REPLCONF
CAPA DUALCHANNEL` to the primary during initial
handshake. This is used to state that the replica is capable of dual
channel sync and that this is the replica's main channel, which is not
used for snapshot transfer.
- When replica lacks sufficient data for PSYNC, the primary will send
`-FULLSYNCNEEDED` response instead
of RDB data. As a next step, the replica creates a new connection
(rdb-channel) and configures it against
the primary with the appropriate capabilities and requirements. The
replica then requests a sync
using the RDB channel.
- Prior to forking, the primary sends the replica the snapshot's end
repl-offset, and attaches the replica
to the replication backlog to keep repl data until the replica requests
psync. The replica uses the main
channel to request a PSYNC starting at the snapshot end offset.
- The primary main threads sends incremental changes via the main
channel, while the bgsave process
sends the RDB directly to the replica via the rdb-channel. As for the
replica, the incremental
changes are stored on a local buffer, while the RDB is loaded into
memory.
- Once the replica completes loading the rdb, it drops the
rdb-connection and streams the accumulated incremental
changes into memory. Repl steady state continues normally.
## New replica state machine

## Data <a name="data"></a>



## Explanation
These graphs demonstrate performance improvements during full sync
sessions using rdb-channel + streaming rdb directly from the background
process to the replica.
First graph- with at most 50 clients and light weight commands, we saw
5%-7.5% improvement in write latency during sync session.
Two graphs below- full sync was tested during heavy read commands from
the primary (such as sdiff, sunion on large sets). In that case, the
child process writes to the replica without sharing CPU with the loaded
main process. As a result, this not only improves client response time,
but may also shorten sync time by about 50%. The shorter sync time
results in less memory being used to store replication diffs (>60% in
some of the tested cases).
## Test setup
Both primary and replica in the performance tests ran on the same
machine. RDB size in all tests is 3.7gb. I generated write load using
valkey-benchmark ` ./valkey-benchmark -r 100000 -n 6000000 lpush my_list
__rand_int__`.
---------
Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>