
Test on ARM + TLS often fail with this error: ``` *** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl Replica is not able to detect timeout ``` https://github.com/redis/redis-extra-ci/actions/runs/3727554226/jobs/6321797837 The replica logs show that in this case the replica got timeout before even getting a response to the PING command (instead of the SYNC command). it should have shown these: ``` * MASTER <-> REPLICA sync started * REPLICAOF 127.0.0.1:22112 enabled .... ### Starting test Slave enters handshake in tests/integration/replication.tcl * Non blocking connect for SYNC fired the event. ``` then: ``` * Master replied to PING, replication can continue... * Trying a partial resynchronization (request 50da9eff70d774f4e6cb723eb4b091440f215772:1). ``` and then hang for 5 seconds: ``` # Timeout connecting to the MASTER... * Reconnecting to MASTER 127.0.0.1:21112 after failure ``` but instead it got this (looks like it disconnected too early, and then tried to re-connect): ``` 10890:M 19 Dec 2022 01:32:54.794 * Ready to accept connections tls 10890:M 19 Dec 2022 01:32:54.809 - Accepted 127.0.0.1:41047 10890:M 19 Dec 2022 01:32:54.878 - Reading from client: error:0A000126:SSL routines::unexpected eof while reading 10890:M 19 Dec 2022 01:32:54.925 - Accepted 127.0.0.1:39207 10890:S 19 Dec 2022 01:32:55.463 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer. 10890:S 19 Dec 2022 01:32:55.463 * Connecting to MASTER 127.0.0.1:24126 10890:S 19 Dec 2022 01:32:55.463 * MASTER <-> REPLICA sync started 10890:S 19 Dec 2022 01:32:55.463 * REPLICAOF 127.0.0.1:24126 enabled (user request from 'id=4 addr=127.0.0.1:39207 laddr=127.0.0.1:24125 fd=8 name= age=1 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=43 qbuf-free=20431 argv-mem=21 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=22317 events=r cmd=slaveof user=default redir=-1 resp=2') ### Starting test Slave enters handshake in tests/integration/replication.tcl 10890:S 19 Dec 2022 01:32:55.476 * Non blocking connect for SYNC fired the event. 10890:S 19 Dec 2022 01:33:00.701 # Failed to read response from the server: (null) <- note this!! 10890:S 19 Dec 2022 01:33:00.701 # Master did not respond to command during SYNC handshake 10890:S 19 Dec 2022 01:33:01.002 * Connecting to MASTER 127.0.0.1:24126 10890:S 19 Dec 2022 01:33:01.002 * MASTER <-> REPLICA sync started ### Starting test Slave is able to detect timeout during handshake in tests/integration/replication.tcl 10890:S 19 Dec 2022 01:33:05.497 * Non blocking connect for SYNC fired the event. 10890:S 19 Dec 2022 01:33:05.500 * Master replied to PING, replication can continue... 10890:S 19 Dec 2022 01:33:05.510 * Trying a partial resynchronization (request 947e1956372a0e6c819cfec51c42cc7979b0c221:1). 10890:S 19 Dec 2022 01:34:05.833 # Failed to read response from the server: error:0A000126:SSL routines::unexpected eof while reading 10890:S 19 Dec 2022 01:34:05.833 # Master did not reply to PSYNC, will try later ``` This PR sets enables the 5 seconds timeout at a later stage to try and prevent the early disconnection.
Redis Test Suite
The normal execution mode of the test suite involves starting and manipulating
local redis-server
instances, inspecting process state, log files, etc.
The test suite also supports execution against an external server, which is
enabled using the --host
and --port
parameters. When executing against an
external server, tests tagged external:skip
are skipped.
There are additional runtime options that can further adjust the test suite to match different external server configurations:
Option | Impact |
---|---|
--singledb |
Only use database 0, don't assume others are supported. |
--ignore-encoding |
Skip all checks for specific encoding. |
--ignore-digest |
Skip key value digest validations. |
--cluster-mode |
Run in strict Redis Cluster compatibility mode. |
--large-memory |
Enables tests that consume more than 100mb |
Tags
Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.
Tags can be applied in different context levels:
start_server
contexttags
context that bundles several tests together- A single test context.
The following compatibility and capability tags are currently used:
Tag | Indicates |
---|---|
external:skip |
Not compatible with external servers. |
cluster:skip |
Not compatible with --cluster-mode . |
large-memory |
Test that requires more than 100mb |
tls:skip |
Not compatible with --tls . |
needs:repl |
Uses replication and needs to be able to SYNC from server. |
needs:debug |
Uses the DEBUG command or other debugging focused commands (like OBJECT REFCOUNT ). |
needs:pfdebug |
Uses the PFDEBUG command. |
needs:config-maxmemory |
Uses CONFIG SET to manipulate memory limit, eviction policies, etc. |
needs:config-resetstat |
Uses CONFIG RESETSTAT to reset statistics. |
needs:reset |
Uses RESET to reset client connections. |
needs:save |
Uses SAVE or BGSAVE to create an RDB file. |
When using an external server (--host
and --port
), filtering using the
external:skip
tags is done automatically.
When using --cluster-mode
, filtering using the cluster:skip
tag is done
automatically.
When not using --large-memory
, filtering using the largemem:skip
tag is done
automatically.
In addition, it is possible to specify additional configuration. For example, to
run tests on a server that does not permit SYNC
use:
./runtest --host <host> --port <port> --tags -needs:repl