
In this PR we introduce the main benefit of dual channel replication: continuously streaming the COB (client output buffers) in parallel to the RDB, thus keeping the primary-side COB small and accelerating the overall sync process. By streaming the replication data to the replica during the full sync, we reduce:

1. Memory load on the primary node.
2. CPU load on the primary's main process.

[Latest performance tests](#data)

## Motivation

* Reduce primary memory load. We do that by moving the COB tracking to the replica side. This also decreases the chance of COB overruns. Note that the primary's input buffer limits on the replica side are less restrictive than the primary's COB, since the replica plays a less critical part in the replication group. While increasing the primary's COB may end up with the primary reaching swap and clients suffering, on the replica side we are more at ease with a larger buffer. A larger COB means a better chance to sync successfully.
* Reduce primary main process CPU load. By opening a new, dedicated connection for the RDB transfer, the child process can have direct access to the new connection. Due to TLS connection restrictions, this was not possible using the single main connection. We eliminate the need for the child process to use the primary's child-proc -> main-proc pipeline, thus freeing up the main process to process client queries.

## Dual Channel Replication high level interface design

- Dual channel replication begins when the replica sends `REPLCONF CAPA DUALCHANNEL` to the primary during the initial handshake. This states that the replica is capable of dual channel sync and that this connection is the replica's main channel, which is not used for snapshot transfer.
- When the replica lacks sufficient data for a PSYNC, the primary sends a `-FULLSYNCNEEDED` response instead of RDB data. As a next step, the replica creates a new connection (rdb-channel) and configures it against the primary with the appropriate capabilities and requirements. The replica then requests a sync using the RDB channel.
- Prior to forking, the primary sends the replica the snapshot's end repl-offset and attaches the replica to the replication backlog, so replication data is kept until the replica requests a psync. The replica uses the main channel to request a PSYNC starting at the snapshot end offset.
- The primary's main thread sends incremental changes via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. On the replica, the incremental changes are stored in a local buffer while the RDB is loaded into memory.
- Once the replica completes loading the RDB, it drops the rdb-connection and streams the accumulated incremental changes into memory. Replication steady state then continues normally. (A sketch of how this flow might be exercised from the test suite is shown below.)
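For illustration only, here is a minimal Tcl sketch, in the style of the integration test suite described later in this document, of driving a full sync with dual channel replication enabled. The config name `dual-channel-replication-enabled` and the helper `wait_for_sync` are assumptions based on the suite's conventions, not definitions made by this PR.

```tcl
# Sketch: primary and replica with dual channel replication enabled,
# then a full sync is triggered and verified.
start_server {tags {"repl"}} {
    set primary [srv 0 client]
    set primary_host [srv 0 host]
    set primary_port [srv 0 port]
    $primary config set dual-channel-replication-enabled yes ;# assumed config name

    # Write some data so the full sync has an RDB payload to stream.
    for {set j 0} {$j < 1000} {incr j} {
        $primary set key:$j value:$j
    }

    start_server {} {
        set replica [srv 0 client]
        $replica config set dual-channel-replication-enabled yes ;# assumed config name

        test "Full sync completes over the dual channel setup" {
            $replica replicaof $primary_host $primary_port
            wait_for_sync $replica ;# suite helper: waits until the replication link is up
            assert_equal [$primary dbsize] [$replica dbsize]
        }
    }
}
```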
## New replica state machine

## Data <a name="data"></a>

## Explanation

These graphs demonstrate the performance improvement during full sync sessions using the rdb-channel, with the RDB streamed directly from the background process to the replica.

First graph: with at most 50 clients and lightweight commands, we saw a 5%-7.5% improvement in write latency during the sync session.

Two graphs below: full sync was tested under heavy read commands on the primary (such as SDIFF and SUNION on large sets). In that case, the child process writes to the replica without sharing CPU with the loaded main process. As a result, this not only improves client response time, but may also shorten the sync time by about 50%. The shorter sync time results in less memory being used to store replication diffs (a reduction of more than 60% in some of the tested cases).

## Test setup

In the performance tests, both primary and replica ran on the same machine. The RDB size in all tests is 3.7gb. Write load was generated using valkey-benchmark: `./valkey-benchmark -r 100000 -n 6000000 lpush my_list __rand_int__`.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
# Valkey Test Suite

## Overview
Integration tests are written in Tcl, a high-level, general-purpose, interpreted, dynamic programming language.

`runtest` is the main entry point for running integration tests. For example, to run a single test:

```
./runtest --single unit/your_test_name
# For additional arguments, you may refer to the `runtest` script itself.
```
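To give a feel for what such a test looks like, here is a minimal sketch of a single test file. The file name `unit/your_test_name.tcl` and the contents are illustrative only; they follow the suite's common idioms (`start_server`, `test`, `r`, `assert_equal`).

```tcl
# Hypothetical contents of tests/unit/your_test_name.tcl:
# start a throwaway server and run two tests against it.
start_server {tags {"your_test_name"}} {
    test "SET then GET returns the stored value" {
        r set mykey hello           ;# r sends a command to the current server
        assert_equal {hello} [r get mykey]
    }

    test "DEL removes the key" {
        r del mykey
        assert_equal 0 [r exists mykey]
    }
}
```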
The normal execution mode of the test suite involves starting and manipulating local `valkey-server` instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the `--host` and `--port` parameters. When executing against an external server, tests tagged `external:skip` are skipped.
There are additional runtime options that can further adjust the test suite to match different external server configurations:
| Option | Impact |
|---|---|
| `--singledb` | Only use database 0, don't assume others are supported. |
| `--ignore-encoding` | Skip all checks for specific encoding. |
| `--ignore-digest` | Skip key value digest validations. |
| `--cluster-mode` | Run in strict Valkey Cluster compatibility mode. |
| `--large-memory` | Enables tests that consume more than 100mb. |
## Debugging

You can set a breakpoint and invoke a minimal debugger using the `bp` function.

```
... your test code before break-point
bp 1
... your test code after break-point
```

The `bp 1` call gives the Tcl interpreter back to the developer, allowing you to interactively print local variables (through `puts`), run functions, and so forth.

`bp` takes a single argument, which is `1` in the case above, and is used to label a breakpoint with a string. Labels are printed out when breakpoints are hit, so you can identify which breakpoint was triggered. Breakpoints can be skipped by setting the global variable `::bp_skip` to the labels you want to skip.
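For example, a hedged sketch of labeling breakpoints and skipping one of them. The labels `before-write` and `after-write` are made up, and this assumes `::bp_skip` holds a list of labels that `bp` checks before stopping.

```tcl
# Skip the breakpoint labeled "before-write" but stop at "after-write".
set ::bp_skip {before-write}

bp before-write              ;# skipped: its label is listed in ::bp_skip
r set mykey hello
bp after-write               ;# stops here and prints the label
```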
The minimal debugger comes with the following predefined functions:

- Press `c` to continue past the breakpoint.
- Press `i` to print local variables.
## Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied at different context levels, as shown in the sketch below:

- `start_server` context
- `tags` context that bundles several tests together
- A single test context
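A hedged sketch of all three levels in one file. The tag choices are only examples, and the per-test tag shown last assumes the `test` proc's optional trailing arguments (expected result, then tags).

```tcl
# Tags on the start_server context apply to every test inside it.
start_server {tags {"repl" "external:skip"}} {

    # A tags block applies its tags to the tests it bundles.
    tags {"needs:debug"} {
        test "Tagged via the surrounding tags block" {
            r debug sleep 0
        }
    }

    # A tag attached to a single test (assumed form: the empty third
    # argument is the expected result, the fourth is the tag list).
    test "Tagged on the test itself" {
        r set foo bar
        assert_equal {bar} [r get foo]
    } {} {needs:repl}
}
```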
The following compatibility and capability tags are currently used:
| Tag | Indicates |
|---|---|
| `external:skip` | Not compatible with external servers. |
| `cluster:skip` | Not compatible with `--cluster-mode`. |
| `large-memory` | Test that requires more than 100mb. |
| `tls:skip` | Not compatible with `--tls`. |
| `needs:repl` | Uses replication and needs to be able to `SYNC` from server. |
| `needs:debug` | Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`). |
| `needs:pfdebug` | Uses the `PFDEBUG` command. |
| `needs:config-maxmemory` | Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc. |
| `needs:config-resetstat` | Uses `CONFIG RESETSTAT` to reset statistics. |
| `needs:reset` | Uses `RESET` to reset client connections. |
| `needs:save` | Uses `SAVE` or `BGSAVE` to create an RDB file. |
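For example, a file whose tests rely on debugging-focused commands such as `OBJECT REFCOUNT` might carry the `needs:debug` tag so those tests are filtered out where such commands are unavailable. A minimal sketch, not taken from the suite:

```tcl
# OBJECT REFCOUNT is a debugging-focused command, so the whole file's
# tests are tagged needs:debug.
start_server {tags {"needs:debug"}} {
    test "OBJECT REFCOUNT reports at least one reference" {
        r set mykey hello
        assert {[r object refcount mykey] >= 1}
    }
}
```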
When using an external server (`--host` and `--port`), filtering using the `external:skip` tag is done automatically.

When using `--cluster-mode`, filtering using the `cluster:skip` tag is done automatically.

When not using `--large-memory`, filtering using the `largemem:skip` tag is done automatically.
In addition, it is possible to specify additional configuration. For example, to run tests on a server that does not permit `SYNC`, use:

```
./runtest --host <host> --port <port> --tags -needs:repl
```