futriix/tests/unit/auth.tcl
naglera ff6b780fe6
Dual channel replication (#60)
In this PR we introduce the main benefit of dual channel replication by
continuously steaming the COB (client output buffers) in parallel to the
RDB and thus keeping the primary's side COB small AND accelerating the
overall sync process. By streaming the replication data to the replica
during the full sync, we reduce
1. Memory load from the primary's node.
2. CPU load from the primary's main process. [Latest performance
tests](#data)

## Motivation
* Reduce primary memory load. We do that by moving the COB tracking to
the replica side. This also decrease the chance for COB overruns. Note
that primary's input buffer limits at the replica side are less
restricted then primary's COB as the replica plays less critical part in
the replication group. While increasing the primary’s COB may end up
with primary reaching swap and clients suffering, at replica side we’re
more at ease with it. Larger COB means better chance to sync
successfully.
* Reduce primary main process CPU load. By opening a new, dedicated
connection for the RDB transfer, child processes can have direct access
to the new connection. Due to TLS connection restrictions, this was not
possible using one main connection. We eliminate the need for the child
process to use the primary's child-proc -> main-proc pipeline, thus
freeing up the main process to process clients queries.


 ## Dual Channel Replication high level interface design
- Dual channel replication begins when the replica sends a `REPLCONF
CAPA DUALCHANNEL` to the primary during initial
handshake. This is used to state that the replica is capable of dual
channel sync and that this is the replica's main channel, which is not
used for snapshot transfer.
- When replica lacks sufficient data for PSYNC, the primary will send
`-FULLSYNCNEEDED` response instead
of RDB data. As a next step, the replica creates a new connection
(rdb-channel) and configures it against
the primary with the appropriate capabilities and requirements. The
replica then requests a sync
     using the RDB channel. 
- Prior to forking, the primary sends the replica the snapshot's end
repl-offset, and attaches the replica
to the replication backlog to keep repl data until the replica requests
psync. The replica uses the main
     channel to request a PSYNC starting at the snapshot end offset. 
- The primary main threads sends incremental changes via the main
channel, while the bgsave process
sends the RDB directly to the replica via the rdb-channel. As for the
replica, the incremental
changes are stored on a local buffer, while the RDB is loaded into
memory.
- Once the replica completes loading the rdb, it drops the
rdb-connection and streams the accumulated incremental
     changes into memory. Repl steady state continues normally.

## New replica state machine


![image](https://github.com/user-attachments/assets/38fbfff0-60b9-4066-8b13-becdb87babc3)





## Data <a name="data"></a>

![image](https://github.com/user-attachments/assets/d73631a7-0a58-4958-a494-a7f4add9108f)


![image](https://github.com/user-attachments/assets/f44936ed-c59a-4223-905d-0fe48a6d31a6)


![image](https://github.com/user-attachments/assets/bd333ee2-3c47-47e5-b244-4ea75f77c836)

## Explanation 
These graphs demonstrate performance improvements during full sync
sessions using rdb-channel + streaming rdb directly from the background
process to the replica.

First graph- with at most 50 clients and light weight commands, we saw
5%-7.5% improvement in write latency during sync session.
Two graphs below- full sync was tested during heavy read commands from
the primary (such as sdiff, sunion on large sets). In that case, the
child process writes to the replica without sharing CPU with the loaded
main process. As a result, this not only improves client response time,
but may also shorten sync time by about 50%. The shorter sync time
results in less memory being used to store replication diffs (>60% in
some of the tested cases).

## Test setup 
Both primary and replica in the performance tests ran on the same
machine. RDB size in all tests is 3.7gb. I generated write load using
valkey-benchmark ` ./valkey-benchmark -r 100000 -n 6000000 lpush my_list
__rand_int__`.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-17 13:59:33 -07:00

93 lines
3.0 KiB
Tcl

start_server {tags {"auth external:skip"}} {
test {AUTH fails if there is no password configured server side} {
catch {r auth foo} err
set _ $err
} {ERR *any password*}
test {Arity check for auth command} {
catch {r auth a b c} err
set _ $err
} {*syntax error*}
}
start_server {tags {"auth external:skip"} overrides {requirepass foobar}} {
test {AUTH fails when a wrong password is given} {
catch {r auth wrong!} err
set _ $err
} {WRONGPASS*}
test {Arbitrary command gives an error when AUTH is required} {
catch {r set foo bar} err
set _ $err
} {NOAUTH*}
test {AUTH succeeds when the right password is given} {
r auth foobar
} {OK}
test {Once AUTH succeeded we can actually send commands to the server} {
r set foo 100
r incr foo
} {101}
test {For unauthenticated clients multibulk and bulk length are limited} {
set rr [valkey [srv "host"] [srv "port"] 0 $::tls]
$rr write "*100\r\n"
$rr flush
catch {[$rr read]} e
assert_match {*unauthenticated multibulk length*} $e
$rr close
set rr [valkey [srv "host"] [srv "port"] 0 $::tls]
$rr write "*1\r\n\$100000000\r\n"
$rr flush
catch {[$rr read]} e
assert_match {*unauthenticated bulk length*} $e
$rr close
}
}
foreach dualchannel {yes no} {
start_server {tags {"auth_binary_password external:skip"}} {
test {AUTH fails when binary password is wrong} {
r config set requirepass "abc\x00def"
catch {r auth abc} err
set _ $err
} {WRONGPASS*}
test {AUTH succeeds when binary password is correct} {
r config set requirepass "abc\x00def"
r auth "abc\x00def"
} {OK}
start_server {tags {"primaryauth"}} {
set master [srv -1 client]
set master_host [srv -1 host]
set master_port [srv -1 port]
set slave [srv 0 client]
test "primaryauth test with binary password dualchannel = $dualchannel" {
$master config set requirepass "abc\x00def"
$master config set dual-channel-replication-enabled $dualchannel
# Configure the replica with primaryauth
set loglines [count_log_lines 0]
$slave config set primaryauth "abc"
$slave config set dual-channel-replication-enabled $dualchannel
$slave slaveof $master_host $master_port
# Verify replica is not able to sync with master
wait_for_log_messages 0 {"*Unable to AUTH to PRIMARY*"} $loglines 1000 10
assert_equal {down} [s 0 master_link_status]
# Test replica with the correct primaryauth
$slave config set primaryauth "abc\x00def"
wait_for_condition 50 100 {
[s 0 master_link_status] eq {up}
} else {
fail "Can't turn the instance into a replica"
}
}
}
}
}