History

In this PR we introduce the main benefit of dual channel replication by
continuously steaming the COB (client output buffers) in parallel to the
RDB and thus keeping the primary's side COB small AND accelerating the
overall sync process. By streaming the replication data to the replica
during the full sync, we reduce
1. Memory load from the primary's node.
2. CPU load from the primary's main process. [Latest performance
tests](#data)

## Motivation
* Reduce primary memory load. We do that by moving the COB tracking to
the replica side. This also decrease the chance for COB overruns. Note
that primary's input buffer limits at the replica side are less
restricted then primary's COB as the replica plays less critical part in
the replication group. While increasing the primary’s COB may end up
with primary reaching swap and clients suffering, at replica side we’re
more at ease with it. Larger COB means better chance to sync
successfully.
* Reduce primary main process CPU load. By opening a new, dedicated
connection for the RDB transfer, child processes can have direct access
to the new connection. Due to TLS connection restrictions, this was not
possible using one main connection. We eliminate the need for the child
process to use the primary's child-proc -> main-proc pipeline, thus
freeing up the main process to process clients queries.


 ## Dual Channel Replication high level interface design
- Dual channel replication begins when the replica sends a `REPLCONF
CAPA DUALCHANNEL` to the primary during initial
handshake. This is used to state that the replica is capable of dual
channel sync and that this is the replica's main channel, which is not
used for snapshot transfer.
- When replica lacks sufficient data for PSYNC, the primary will send
`-FULLSYNCNEEDED` response instead
of RDB data. As a next step, the replica creates a new connection
(rdb-channel) and configures it against
the primary with the appropriate capabilities and requirements. The
replica then requests a sync
     using the RDB channel. 
- Prior to forking, the primary sends the replica the snapshot's end
repl-offset, and attaches the replica
to the replication backlog to keep repl data until the replica requests
psync. The replica uses the main
     channel to request a PSYNC starting at the snapshot end offset. 
- The primary main threads sends incremental changes via the main
channel, while the bgsave process
sends the RDB directly to the replica via the rdb-channel. As for the
replica, the incremental
changes are stored on a local buffer, while the RDB is loaded into
memory.
- Once the replica completes loading the rdb, it drops the
rdb-connection and streams the accumulated incremental
     changes into memory. Repl steady state continues normally.

## New replica state machine


![image](https://github.com/user-attachments/assets/38fbfff0-60b9-4066-8b13-becdb87babc3)





## Data <a name="data"></a>

![image](https://github.com/user-attachments/assets/d73631a7-0a58-4958-a494-a7f4add9108f)


![image](https://github.com/user-attachments/assets/f44936ed-c59a-4223-905d-0fe48a6d31a6)


![image](https://github.com/user-attachments/assets/bd333ee2-3c47-47e5-b244-4ea75f77c836)

## Explanation 
These graphs demonstrate performance improvements during full sync
sessions using rdb-channel + streaming rdb directly from the background
process to the replica.

First graph- with at most 50 clients and light weight commands, we saw
5%-7.5% improvement in write latency during sync session.
Two graphs below- full sync was tested during heavy read commands from
the primary (such as sdiff, sunion on large sets). In that case, the
child process writes to the replica without sharing CPU with the loaded
main process. As a result, this not only improves client response time,
but may also shorten sync time by about 50%. The shorter sync time
results in less memory being used to store replication diffs (>60% in
some of the tested cases).

## Test setup 
Both primary and replica in the performance tests ran on the same
machine. RDB size in all tests is 3.7gb. I generated write load using
valkey-benchmark ` ./valkey-benchmark -r 100000 -n 6000000 lpush my_list
__rand_int__`.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

2024-07-17 13:59:33 -07:00

assets

Introduce enable-debug-assert to enable/disable debug asserts at runtime (#584 )

2024-05-31 22:50:08 -07:00

cluster

Cache CLUSTER SLOTS response for improving throughput and reduced latency. (#53 )

2024-05-22 14:21:41 -07:00

helpers

Dual channel replication (#60 )

2024-07-17 13:59:33 -07:00

integration

Dual channel replication (#60 )

2024-07-17 13:59:33 -07:00

modules

Remove master and slave from source code (#591 )

2024-06-07 14:21:33 -07:00

rdma

Introduce Valkey Over RDMA transport (experimental) (#477 )

2024-07-15 14:04:22 +02:00

sentinel

Replace master-reboot-down-after-period with primary-reboot-down-after-period in sentinel.conf (#647 )

2024-07-15 13:45:40 -04:00

support

Dual channel replication (#60 )

2024-07-17 13:59:33 -07:00

tmp

minor fixes to the new test suite, html doc updated

2010-05-14 18:48:33 +02:00

unit

Dual channel replication (#60 )

2024-07-17 13:59:33 -07:00

instances.tcl

Enable debug asserts for cluster and sentinel tests (#588 )

2024-06-02 13:15:08 -07:00

README.md

Introduce a minimal debugger for .tcl integration test suite. (#683 )

2024-06-25 10:24:53 -07:00

test_helper.tcl

Minor fix for --loops option in normal testing framework (#781 )

2024-07-13 23:25:51 +08:00

README.md

Valkey Test Suite

Overview

Integration tests are written in Tcl, a high-level, general-purpose, interpreted, dynamic programming language [source]. runtest is the main entrance point for running integration tests. For example, to run a single test;

./runtest --single unit/your_test_name
# For additional arguments, you may refer to the `runtest` script itself.

The normal execution mode of the test suite involves starting and manipulating local valkey-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the --host and --port parameters. When executing against an external server, tests tagged external:skip are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

Option	Impact
`--singledb`	Only use database 0, don't assume others are supported.
`--ignore-encoding`	Skip all checks for specific encoding.
`--ignore-digest`	Skip key value digest validations.
`--cluster-mode`	Run in strict Valkey Cluster compatibility mode.
`--large-memory`	Enables tests that consume more than 100mb

Debugging

You can set a breakpoint and invoke a minimal debugger using the bp function.

... your test code before break-point
bp 1
... your test code after break-point

The bp 1 will give back the tcl interpreter to the developer, and allow you to interactively print local variables (through puts), run functions and so forth [source]. bp takes a single argument, which is 1 for the case above, and is used to label a breakpoint with a string. Labels are printed out when breakpoints are hit, so you can identify which breakpoint was triggered. Breakpoints can be skipped by setting the global variable ::bp_skip, and by providing the labels you want to skip.

The minimal debugger comes with the following predefined functions.

Press c to continue past the breakpoint.
Press i to print local variables.

Tag	Indicates
`external:skip`	Not compatible with external servers.
`cluster:skip`	Not compatible with `--cluster-mode`.
`large-memory`	Test that requires more than 100mb
`tls:skip`	Not compatible with `--tls`.
`needs:repl`	Uses replication and needs to be able to `SYNC` from server.
`needs:debug`	Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`).
`needs:pfdebug`	Uses the `PFDEBUG` command.
`needs:config-maxmemory`	Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc.
`needs:config-resetstat`	Uses `CONFIG RESETSTAT` to reset statistics.
`needs:reset`	Uses `RESET` to reset client connections.
`needs:save`	Uses `SAVE` or `BGSAVE` to create an RDB file.

README.md

Valkey Test Suite

Overview

Debugging

Tags